
Started by
50 comments, last by Promit 19 years, 1 month ago
The idea of parallelizing at the instruction level is not a new one. Intel's Itanium does it, but it doesn't try to do it "automatically" like your design; it relies on the compiler to tell it when statements can be run in parallel. That is, it has special instructions that tell the processor which things can run in parallel.

The circuitry needed to synchronize two related instructions to run in parallel would be far too complex to be practical. The Itanium idea takes unrelated statements, where the output of one is not used as input for others, and runs them in parallel. That's far cheaper in terms of transistors and synchronization.

So for example, say you have some high-level code like:

A = (B + C) * D
W = (X + Y) * Z

which is a perfectly plausible bit of code. Now, in your idea (if I understand it correctly) you'd try to add B and C and multiply the result by D in the same cycle. On the Itanium, the compiler would reorder the assembly so that B + C and X + Y are computed at the same time, and then the two multiplications are also done at the same time. Because all the results are unrelated, there's no synchronization needed, and the circuitry is all rather simple.
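The compiler's job here can be sketched as a toy dependence-aware scheduler. This is an illustration of the idea only (not Itanium's actual EPIC bundling rules), and the instruction encoding is made up for the example:

```python
# Toy list scheduler: group instructions into "cycles" so that no
# instruction reads a value produced in that same cycle (a data dependence).

def schedule(instrs):
    """instrs: list of (dest, src1, src2). Returns a list of cycles,
    each cycle being the list of instructions issued together."""
    cycles = []
    done = set()              # values computed in *earlier* cycles
    remaining = list(instrs)
    while remaining:
        this_cycle = []
        for ins in list(remaining):
            dest, s1, s2 = ins
            # Ready if both sources are original inputs (lowercase names
            # in this toy encoding) or were produced in a previous cycle.
            if all(s in done or s.islower() for s in (s1, s2)):
                this_cycle.append(ins)
                remaining.remove(ins)
        cycles.append(this_cycle)
        done.update(dest for dest, _, _ in this_cycle)
    return cycles

# A = (b + c) * d ;  W = (x + y) * z  -- T1, T2 hold the intermediate sums
program = [
    ("T1", "b", "c"),    # T1 = b + c
    ("A",  "T1", "d"),   # A  = T1 * d   (depends on T1)
    ("T2", "x", "y"),    # T2 = x + y
    ("W",  "T2", "z"),   # W  = T2 * z   (depends on T2)
]

cycles = schedule(program)
print(len(cycles))   # 2 -- both adds issue together, then both multiplies
```

Four operations finish in two "cycles" without any cross-instruction synchronization hardware, precisely because the scheduler only pairs up operations with no dependence between them.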

On the other hand, Intel's idea puts a lot more emphasis on smart compilers (and they are getting smarter) so you do need to have a good compiler to make the most of it.
A fundamental principle of any hardware design is that smaller equals faster. The more busses, wires, gates, register files, caches, ROMs, etc. you add to the processor, the slower it's going to be by virtue of propagation delay. Remember that your clock cycle has to be long enough to handle the slowest single-cycle operation your processor performs. The concept you're thinking of, where results from one instruction are routed to another instruction without going through the primary store (memory/register file), is similar to bypassing in a pipelined processor design. However, I think what you're suggesting is an extremely complex bypassing/forwarding scheme. I'm not really clear on your design, but it sounds like you'd run into some major synchronization and race issues, or it would just have some tremendously complex control logic and run slowly.
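To make the bypassing comparison concrete, here is a back-of-the-envelope stall count for one read-after-write dependence in the classic textbook 5-stage pipeline (IF ID EX MEM WB). The stage numbers and the "write in first half, read in second half" register-file convention are the standard idealization, not figures for any real chip:

```python
# Stage positions (1-indexed) of execute and write-back in a 5-stage pipeline.
EX, WB = 3, 5

def stalls(producer_ready_stage, consumer_needs_stage=EX):
    """Cycles a dependent instruction issued one cycle behind must stall,
    given the stage at which the producer's result becomes available and
    the stage at which the consumer needs it."""
    # The consumer's EX naturally runs one cycle after the producer's EX,
    # so the stall is however many extra stages the result is delayed by.
    return max(0, producer_ready_stage - consumer_needs_stage)

print(stalls(WB))   # no forwarding: wait until write-back -> 2 stall cycles
print(stalls(EX))   # forward straight out of EX -> 0 stall cycles
```

One forwarding path already needs a mux and comparison logic in front of the ALU; routing results between thousands of banks multiplies that control logic accordingly.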
Do you know anything about parallel computation at all? Everything you've posted assumes a perfectly linear speed-up, which does not occur because of dependencies. You're throwing out buzzwords in butchered English and claiming that somehow, in your spare time, you're going to create a processor that's light-years faster and cheaper than the best hardware Intel, IBM, Apple and AMD have out. You've given no answer to how you're going to synchronize this thing to prevent race conditions, and you also haven't taken into account the heat that would be produced if your linear speed-up were possible. I hate to crush your dreams, but you have a ton of research to do before you can even talk about processor design seriously. If you don't want to be taken as a joke, I suggest you start doing some lit review. And Asimov does not count as a lit review.
Again, the central idea behind what you're proposing is already known as forwarding (in the context of a pipeline, but it's the same notion here). However, I don't think you've actually considered the amount of circuitry and control required to manipulate and synchronize thousands of banks. You seem to be just waving your hands and saying it won't be a problem, but have you actually considered the number of components and gate delays required to implement this processor? Second, you're also overlooking all the possible structural and data hazards that could occur due to parallel processing, e.g. two parallel instructions finish at the same time but write to the same register. And what about multi-cycle operations? If you plan on turning these into single-cycle instructions by "unrolling" the cycles, then you further increase the amount of circuitry and propagation time. And what about fetching all these instructions from memory? Or the time required to read from and write to the register file? Non-ALU instructions such as loads, stores, and branches need to be taken into consideration too. If you accidentally start processing thousands of mispredicted instructions, things could turn ugly.

While I think you're on the right track by considering ways to increase throughput, hundreds or thousands of banks with all the extra control will make your processor bigger and slower. You can't just wave around imaginary numbers and say that's how fast things will be. Maybe, theoretically, you could get away with somehow parallelizing thousands of instructions per cycle (if you're lucky), but the synchronization combined with the control and delay would kill your clock.
You can't just have multiple instructions running through several of these banks in a single cycle and then not worry about synchronization, because this is an ideal race condition scenario. The problem is that circuits have no notion of timing. They'll just operate as fast as they can (not instantaneously, since a transistor takes time to switch states). This behavior can be unpredictable and unreliable, which is why you introduce a more reliable timing mechanism - the clock. Another problem is that circuits have no notion of doing "work". Transistors respond to changes in voltage with altered current flow, but that's it. So if you have something like a ripple-carry adder or carry-lookahead adder that takes a certain amount of time to propagate before you have a meaningful result, the clock needs to "know" how long that propagation takes. And once you start introducing multiple concurrent instructions, no individual instruction knows it's running in parallel with any other; throughput is about increasing work done per unit time, and transistors have no notion of meaningful work or computation, so nothing at that level will coordinate them for you.
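The adder point can be put in rough numbers. The gate-delay counts below are the usual textbook approximations (about two gate delays per ripple stage, a few per lookahead tree level); they're illustrative assumptions, not figures for any real process:

```python
# Rough critical-path gate-delay comparison for an n-bit adder.

def ripple_carry_delay(n):
    # Each full adder adds roughly 2 gate delays to the carry chain,
    # so the carry must ripple through all n stages.
    return 2 * n

def lookahead_delay(n, group=4):
    # A tree of 4-bit carry-lookahead blocks: count tree levels needed
    # to cover n bits, at ~2 gate delays per level, plus setup.
    levels, span = 0, 1
    while span < n:
        span *= group
        levels += 1
    return 2 + 2 * levels

for bits in (8, 32, 64):
    print(bits, ripple_carry_delay(bits), lookahead_delay(bits))
```

The clock period has to cover the worst of these paths. That's the trade the original poster keeps waving away: every extra bank and forwarding path adds to some critical path, and the slowest one sets the cycle time for the whole chip.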

I guess my point is that you can't just leave these things up to their own devices. You need the extra synchronization, and when concurrent processing is involved in a single processor or datapath you need to add the extra cycles so that you can increase your throughput without falling victim to unwanted hazards. Essentially these basic electrical components have no idea what kind of grander scheme they're involved in.

This all comes back to how pipelined processors managed to increase throughput. If you want to do this at the instruction level, then you need to split an instruction into atomic stages. This is what allows multiple instructions to be executed at once: while one instruction is in a later stage, a new instruction can start in an earlier stage, and since the two sections of circuitry are independent they can operate in parallel. Pipelining your processor increases both the cycles per instruction and the cycles per second (the clock rate), which roughly cancel each other out for a single instruction's latency. However, you get that raw increase in throughput, which lets you get more work done in a fixed period of time. Your worst case is that only one instruction is in the datapath at once (i.e. one stage holds an active instruction and the rest are nops), and your best case is having as many concurrent instructions in flight as there are stages.
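The latency/throughput cancellation above works out like this with made-up but typical numbers (a 10 ns unpipelined datapath split into 5 equal stages, ignoring latch overhead and hazards):

```python
# Latency vs. throughput for an idealized 5-stage pipeline.

UNPIPELINED_NS = 10.0            # assumed single-cycle datapath delay
STAGES = 5
CYCLE_NS = UNPIPELINED_NS / STAGES   # 2 ns clock after pipelining

def pipelined_time(n_instructions):
    # The first instruction takes STAGES cycles to drain through;
    # after that, one instruction completes every cycle.
    cycles = STAGES + (n_instructions - 1)
    return cycles * CYCLE_NS

print(pipelined_time(1))      # 10.0 -- one instruction is no faster
print(pipelined_time(1000))   # 2008.0 ns vs 10000 ns unpipelined: ~5x throughput
```

A single instruction still takes 10 ns end to end, but a long stream finishes nearly five times sooner: the per-instruction numbers cancel while throughput scales with the stage count.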

[Edited by - Zipster on March 13, 2005 3:00:06 PM]
Quote:Original post by Nice Coder
But i'm relying on smart compilers/coders!.


That's one of many reasons your processor is a bad idea.

Quote:
Also, how would i do lit review? (litterary review?)


A literature review is what real researchers and engineers do. They read the recent literature in a field so that they actually know what they're talking about (gasp). You do a lit review by reading journals, conference proceedings and textbooks.

Quote:
And please don't insult my spelling/grammer. Its hard to write clearly when your thinking....


It's hard to write clearly when my thinking? What does that mean? And a contraction of "it is" to "it's" has an apostrophe. Seriously, though, if you can't communicate an idea then nobody will be able to understand you. Your posts look like crap when you don't take the time to proofread them and/or use an appropriate style to convey an idea. Other posters don't seem to have difficulty conveying their ideas in writing. This is not the only thread where I have seen your inability to wield the English language lead to crap posts. I would be sympathetic and understanding if you were from a country where English was not the standard language, but you are from Australia.

Quote:
how am i throwing out buzzwords?


When other posters who understand hardware design have posted valid reasons why your processor will not work, you throw out buzzwords and start waving your hands to answer them. For example, many people have posted replies talking about the need for synchronization. Most of these replies have noted that synchronization is hard and often requires a great deal of extra circuitry. You respond to these posters by tossing out buzzwords like "bus" and "barrel shifter" and say that synchronization won't be needed. Nobody is going to think that you know more about hardware design because you googled for barrel shifter. The fact that you know very little about actual hardware design is quite transparent.
First of all, parallelising a problem is REALLY HARD in the general case. And when you do succeed you tend to require lots of communication. Look up Amdahl's law.
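Amdahl's law is worth writing down, because it puts a hard ceiling on the claimed speed-ups. With parallel fraction p and n parallel units, speedup = 1 / ((1 - p) + p / n):

```python
# Amdahl's law: the serial fraction (1 - p) caps overall speedup no
# matter how many parallel units n you throw at the problem.

def amdahl_speedup(p, n):
    """Overall speedup for parallel fraction p on n parallel units."""
    return 1.0 / ((1.0 - p) + p / n)

# Even 90%-parallel code on 1000 units speeds up less than 10x, not 1000x.
print(round(amdahl_speedup(0.9, 1000), 2))   # 9.91
```

So "thousands of banks" buys at most 1 / (1 - p) no matter what: the serial 10% dominates long before the hardware runs out.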

And I hate to be blunt, but you don't know jack shit about processor design. Read the book "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson ( http://www.amazon.com/exec/obidos/ASIN/1558603298/102-9711485-7224123 ). You'll learn LOTS, I promise.
I'm moving the thread to "General Programming" since the thread is not so much about math & physics.
Graham Rhodes Moderator, Math & Physics forum @ gamedev.net


weirdest thread in a while ..
Someone who thinks he's too smart to educate himself about a complex field throws out a bunch of random ideas, thinking they're the best thing since sliced bread. And even starts talking about patenting... Bloody classic!

