
How a CPU Works? The Nitty Gritty

Recommended Posts

Zmurf    100
I understand the basics of how a CPU works, like the theory behind it. But I'm finding it hard to get information on how the electrical pulses running through the buses allow the processor to execute instructions, how the computer caches data, and how it stores information in memory and on the hard drive. Can anyone help me find some information on the actual science behind the computer? Thanks in advance.

Promit    13246
I recommend reading Computer Organization and Design. It may not be the cheapest way, but it's the standard textbook on the matter and serves as the intro to the nitty gritty for many thousands of students every year. I loved it, and I consider the book and the course to be one of the most significant parts of my computer science education.

On a related note, you should probably get a good operating systems textbook as well. No recommendations on that one, I'm afraid -- I wasn't too fond of the text we used.

superpig    1825
When we studied it last year I didn't use a textbook... the course was structured so that we started with simple logic gates and how they're built from transistors, then looked at how those gates can be combined to make things like binary adders and registers.

Divide and conquer is the approach to take. A 32-bit adder is made up of 1-bit adders. RAM is made of millions of individual bit-stores. And so on.
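In Verilog, that composition comes out roughly like this (an untested sketch; module and signal names are made up):

// A 1-bit full adder: sum and carry-out from two input bits plus a carry-in.
module full_adder(input a, input b, input cin,
                  output sum, output cout);
    assign sum  = a ^ b ^ cin;
    assign cout = (a & b) | (a & cin) | (b & cin);
endmodule

// A 32-bit ripple-carry adder built by chaining 32 of the 1-bit adders:
// each bit's carry-out feeds the next bit's carry-in.
module ripple_adder32(input  [31:0] a, input [31:0] b, input cin,
                      output [31:0] sum, output cout);
    wire [32:0] carry;
    assign carry[0] = cin;

    genvar i;
    generate
        for (i = 0; i < 32; i = i + 1) begin : stage
            full_adder fa(.a(a[i]), .b(b[i]), .cin(carry[i]),
                          .sum(sum[i]), .cout(carry[i+1]));
        end
    endgenerate

    assign cout = carry[32];
endmodule

The downside is that the carry has to ripple through all 32 stages before the top bits settle.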

Zipster    2365
Quote:
Original post by Promit
On a related note, you should probably get a good operating systems textbook as well. No recommendations on that one, I'm afraid -- I wasn't too fond of the text we used.

Modern Operating Systems? That was the book we used in my operating systems course, and it was pretty good. But I got more out of actually implementing a simple OS in the lab than just reading about one. The same goes for the architecture course. Once you have to implement a simple single- and multi-cycle pipelined processor in Verilog, you really get down and dirty with the details.

Endar    668
Quote:
Original post by Zipster
But I got more out of actually implementing a simple OS in the lab than just reading about one. The same goes for the architecture course. Once you have to implement a simple single- and multi-cycle pipelined processor in Verilog, you really get down and dirty with the details.


That's just not fair. I did both the Architecture and Operating Systems subjects (unfortunately I didn't pay any attention in architecture, but I was able to pass), but we didn't do anything approaching implementing a processor in Verilog.

In the architecture subject, all we did was MIPS stuff, and the labs had no relation to the lectures. That was annoying. Perhaps if they had, I would have paid more attention to the lectures.

In OS, we did a couple of relevant things, like process scheduling and creating a basic file system and file allocation table. The hardest thing was writing a basic Linux shell, and, to be honest, I didn't learn anything in that prac that I have used since.

Implementing a processor? That would have been useful.

Anyway, that was kind of OT. It may not be quite what you're looking for, but "Modern Operating Systems" was a good book.

Extrarius    1412
Quote:
Original post by superpig
[...] Divide and conquer is the approach to take. A 32-bit adder is made up of 1-bit adders. RAM is made of millions of individual bit-stores. And so on.
While it's possible to make an N-bit adder from 1-bit full adders, it's generally not the best way to do it (the latency is too high).
Anyway, another proper name for this subject is "digital logic". Once you understand transistors, it's not too big a leap to logic gates and flip-flops (the simplest unit of memory), and once you have those, it's almost trivial to make an ALU (arithmetic logic unit, which does the adding etc.). For something simple, you'd just have every possible type of calculation performed (addition, shift, etc.) and then have a multiplexer that picks which value to output based on the instruction bits in the opcode/microcode.
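Something like this in Verilog, as a minimal sketch (the width and the opcode assignments here are made up):

// Toy 8-bit ALU: every operation is computed, and a multiplexer
// (the case statement) picks the result named by the 2-bit opcode.
module tiny_alu(input      [7:0] a,
                input      [7:0] b,
                input      [1:0] op,
                output reg [7:0] result);
    always @(*) begin
        case (op)
            2'b00: result = a + b;    // add
            2'b01: result = a - b;    // subtract
            2'b10: result = a & b;    // bitwise AND
            2'b11: result = a << 1;   // shift left by one
        endcase
    end
endmodule

All four results are effectively computed in parallel; the opcode just selects which one reaches the output.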

Implementing pipelining and caching etc. is an advanced topic because of all the exceptions and corner cases (making sure you flush things at the correct time, and so on), but really it's an extension of the above. When the logic unit needs to read memory, you send the read to the cache subsystem instead of the memory interface. When the cache gets the request, it compares the registers storing 'addresses cached here' against the requested address, and then it either returns the value from its cache registers or it forwards the request on to the memory subsystem to read a full cache line into one of its register sets.
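The address comparison itself is just combinational logic. A very rough sketch of the hit check for one cache line (the tag width and the address split are arbitrary):

// Does this cache line hold the requested address?
module cache_hit_check(input  [31:0] addr,        // address being read
                       input  [19:0] stored_tag,  // tag remembered for this line
                       input         line_valid,  // does the line hold real data?
                       output        hit);
    wire [19:0] addr_tag = addr[31:12];           // top bits identify the block
    assign hit = line_valid & (addr_tag == stored_tag);
endmodule

A real cache would also use some of the address bits to pick which line (or set) to check, and would have a state machine for fetching the line from memory on a miss.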

If you've never used Verilog or VHDL, you might want to download a simulator from someplace like Xilinx (they make FPGAs that let you actually do this stuff; they even have a beginner kit that looks really nice for $150, which is very inexpensive for hardware development boards). Once you mess around with a hardware description language, this stuff becomes a lot clearer.

sanch3x    483
Quote:
Original post by Endar
That's just not fair. I did both the Architecture and Operating Systems subjects (unfortunately I didn't pay any attention in architecture, but I was able to pass), but we didn't do anything approaching implementing a processor in Verilog.

In the architecture subject, all we did was MIPS stuff, and the labs had no relation to the lectures. That was annoying. Perhaps if they had, I would have paid more attention to the lectures.

In OS, we did a couple of relevant things, like process scheduling and creating a basic file system and file allocation table. The hardest thing was writing a basic Linux shell, and, to be honest, I didn't learn anything in that prac that I have used since.

Implementing a processor? That would have been useful.

Anyway, that was kind of OT. It may not be quite what you're looking for, but "Modern Operating Systems" was a good book.


My architecture course's lab had us build the ALU for a processor using MAX+Plus II... that was the most interesting thing I ever did in those classes. Aside from that, Architecture I and II were horribly boring. OS was neat though.

Raghar    96
Quote:
Original post by Zmurf
I understand the basics of how a CPU works, like the theory behind it. But I'm finding it hard to get information on how the electrical pulses running through the buses allow the processor to execute instructions, how the computer caches data, and how it stores information in memory and on the hard drive. Can anyone help me find some information on the actual science behind the computer?


Do you mean the mathematical foundations, or the principles behind current register-based computers? These are two different things.

If you mean the latter, the best way is to write a few test programs in Intel-syntax assembly and learn from them.

Do you know what a register is? Have you heard terms like flat (linear) memory model?

The Intel developer manuals are quite a nice source of information about current CPUs. Ars Technica also has detailed articles.

If you'd like to see how the actual numeric operations are done, the best place to look would be my Java libraries for 128-bit integer arithmetic. They are straightforward, and were written with ease of conversion to an ASM/hardware implementation in mind.

Skizz    794
Well, this is weird. Where I'm working they use Xilinx FPGAs, and I'd thought about putting together a small board that I could connect up to a PC and download CPU designs to, with the idea of writing an article about CPU design. I was thinking of nothing more complex than a Z80. Back in 1991 I did much the same thing for my final-year degree project, a configurable co-processor, only that was an ISA board that plugged into a PC expansion slot. Also, the FPGA-as-a-co-processor idea is starting to appear in supercomputer design circles.

Skizz

BeanDog    1065
Quote:
Original post by Endar
That's just not fair. I did both the Architecture and Operating Systems subjects (unfortunately I didn't pay any attention in architecture, but I was able to pass), but we didn't do anything approaching implementing a processor in Verilog.

That's too bad -- we built a simple processor in Verilog in a 200-level class at my university (a required course for CS and EE students). It was a really eye-opening project for me. To pass the class, you had to run an undisclosed program provided by the professor on your processor (flashed onto an FPGA) and get the correct output.

trzy    100
Zmurf: To understand this, you'll want to study digital logic. There are two broad categories: combinational and sequential. Combinational logic produces an output that is a pure function of its current inputs (think AND, OR, NOT, XOR, etc.). Sequential logic retains state, so its output can also depend on previous inputs. Sequential circuits are made out of combinational elements plus logic circuits which have "memory" -- latches and flip-flops.
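In an HDL the distinction is easy to see. A toy Verilog sketch:

// Combinational: the outputs are a pure function of the current inputs.
module gates(input a, input b, output x, output y);
    assign x = a & b;   // AND
    assign y = a | b;   // OR
endmodule

// Sequential: a D flip-flop samples its input on the clock edge and
// holds it, so the output depends on what happened in the past.
module dff(input clk, input d, output reg q);
    always @(posedge clk)
        q <= d;
endmodule

The flip-flop only updates on the clock edge, which is what gives it memory.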

From there on out, it's just a matter of more abstraction. If you're in college, take some digital logic courses and learn an HDL.

Monder    993
Quote:
Has anyone used or read this book (Computer Architecture: A Quantitative Approach)? Amazon recommends it in conjunction with Computer Organization and Design. How much is there in the way of overlap - and how do they compare?


My Computer Design supervisor informs me that it covers pretty much everything Computer Organization and Design does, plus more.

I've read a bit of Computer Architecture, and it means it when it says 'A Quantitative Approach': there's a fair amount of (simple) maths comparing the various architectural approaches. It pretty much assumes you know the basics, though (e.g. the basics of pipelining are covered in an appendix), so Computer Organization probably gives more in-depth coverage of those.

Though really, if you want to know the nitty gritty of how a CPU works, I recommend (as others in this thread have) grabbing some free tools from Xilinx or Altera and a hardware board, and making a CPU yourself. Books generally don't go into precisely how these things work; for example, you may get lots of detail about how to deal with pipeline hazards, but when it comes to, say, stalling or flushing the pipeline, they don't say how this is actually done in hardware (probably because it can vary quite a bit depending on how you've constructed the thing).
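For what it's worth, one common way to do it is just a pipeline register with a couple of control inputs. A made-up Verilog sketch (the NOP encoding is the MIPS all-zeros one, since MIPS keeps coming up in this thread):

// Pipeline register between two stages, with stall and flush controls.
module pipe_reg(input             clk,
                input             stall,     // hold the current instruction
                input             flush,     // squash it and insert a bubble
                input      [31:0] instr_in,  // instruction from the previous stage
                output reg [31:0] instr_out);
    localparam NOP = 32'h0000_0000;          // MIPS nop (sll $0, $0, 0)

    always @(posedge clk) begin
        if (flush)
            instr_out <= NOP;                // flush: the next stage sees a no-op
        else if (!stall)
            instr_out <= instr_in;           // normal case: advance the pipeline
        // on stall: do nothing, so instr_out keeps its old value
    end
endmodule

Stalling is just "don't update the register this cycle"; flushing is "overwrite it with something harmless".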

nobodynews    3126
Quote:
Original post by Extrarius
While it's possible to make an N-bit adder from 1-bit full adders, it's generally not the best way to do it (the latency is too high).
Anyway, another proper name for this subject is "digital logic". Once you understand transistors, it's not too big a leap to logic gates and flip-flops (the simplest unit of memory), and once you have those, it's almost trivial to make an ALU (arithmetic logic unit, which does the adding etc.). For something simple, you'd just have every possible type of calculation performed (addition, shift, etc.) and then have a multiplexer that picks which value to output based on the instruction bits in the opcode/microcode.


I was thinking about this recently while some people in a group I'm in were taking a digital circuitry class that involved making a 4-bit ALU (exactly as you describe). I knew that a, say, 64-bit adder would have too long a propagation delay as just a series of 1-bit full adders. Just now, I came up with a method for adding that acts somewhat like a Fast Fourier Transform algorithm, and I wanted to know if it is similar to what is done in real processors.

Say you have two 64-bit numbers A and B and you wish to generate the sum C. Break A, B, and C in half into A1, A2, B1, B2, C1, and C2, where X1 is the lower-order bits and X2 the higher-order bits.

A1 + B1 = C1 (with or without a carry out)
A2 + B2 + Carry = C2a
A2 + B2 = C2b

Once C2a and C2b are calculated, it simply becomes a matter of outputting whichever version of C2 corresponds to whether there was a carry or not.

This method can be broken down further. Divide A1, B1, and C1 in half and apply the same technique. Continue until you reach a size where the propagation delay is negligible.

My question is: is this how it's done, or is there a better trick?
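To make that concrete, here's a rough Verilog sketch of the 8-bit version of the idea (untested; wider adders would just repeat the pattern):

// 8-bit "carry-select" style adder: the high half is computed twice,
// once assuming carry-in 0 and once assuming 1, and the real carry
// out of the low half picks which result to keep.
module select_adder8(input  [7:0] a, input [7:0] b,
                     output [7:0] sum, output cout);
    wire [3:0] sum_lo, sum_hi0, sum_hi1;
    wire       c_lo, c_hi0, c_hi1;

    assign {c_lo,  sum_lo } = a[3:0] + b[3:0];          // C1, with its carry
    assign {c_hi0, sum_hi0} = a[7:4] + b[7:4];          // C2b (no carry assumed)
    assign {c_hi1, sum_hi1} = a[7:4] + b[7:4] + 4'd1;   // C2a (carry assumed)

    assign sum  = {(c_lo ? sum_hi1 : sum_hi0), sum_lo}; // pick C2a or C2b
    assign cout = c_lo ? c_hi1 : c_hi0;
endmodule

Both high-half additions run in parallel with the low half, so the only thing waiting on the low half's carry is the final multiplexer.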

nicksterdomus    270
Quote:
Original post by nobodynews
Say you have two 64-bit numbers A and B and you wish to generate the sum C. Break A, B, and C in half into A1, A2, B1, B2, C1, and C2, where X1 is the lower-order bits and X2 the higher-order bits.

A1 + B1 = C1 (with or without a carry out)
A2 + B2 + Carry = C2a
A2 + B2 = C2b

Once C2a and C2b are calculated, it simply becomes a matter of outputting whichever version of C2 corresponds to whether there was a carry or not.

This method can be broken down further. Divide A1, B1, and C1 in half and apply the same technique. Continue until you reach a size where the propagation delay is negligible.

My question is: is this how it's done, or is there a better trick?


Your explanation is high-level enough that it's hard for me to tell which implementation details you have in mind that would lessen propagation delay. The 1-bit series implementation is actually exactly your description if you break it up all the way down to 1 bit.

Break all the additions down into 1-bit slices. Then have the carry-out of the previous slice feed into the carry-in of the next slice; that carry determines whether C2a or C2b is chosen, because it will either be 1 (implicitly choosing C2a) or 0.

If you are going to stop breaking it down at some point, for example 8 blocks of 8 bits, then you have to explain how you are going to make the 8-bit addition in each block fast. This is not really a problem, but whatever circuit you use to implement the 8-bit blocks does not automatically make the design faster than the 1-bit-slices-in-series design. You would have to calculate the propagation delay of those circuits and see whether the total is actually less than that of the 1-bit adders in series.

Carry-lookahead adders are a modification of the ripple-carry adder that provably reduces propagation delay. Read the Wikipedia article, and also follow the lookahead carry unit link at the bottom, and you can see them build up a 64-bit adder.

It's too bad they just show block diagrams and not the gates within. If they did, it would be easy to compare and see that the longest path is much shorter using carry-lookahead than ripple carry.

Don't think ripple-carry adders are never used, though. If you need a design that takes up little physical area and the speed of the ripple-carry adder is not a problem, it's a fine choice.
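To give a feel for the gates inside one lookahead stage, here's a 4-bit sketch in Verilog (my own illustration, not taken from the article):

// 4-bit carry-lookahead adder: each carry is computed directly from
// generate (g) and propagate (p) terms instead of rippling bit by bit.
module cla4(input  [3:0] a, input [3:0] b, input cin,
            output [3:0] sum, output cout);
    wire [3:0] g = a & b;   // this bit generates a carry by itself
    wire [3:0] p = a ^ b;   // this bit propagates an incoming carry
    wire [4:0] c;

    assign c[0] = cin;
    assign c[1] = g[0] | (p[0] & c[0]);
    assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
    assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                       | (p[2] & p[1] & p[0] & c[0]);
    assign c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                       | (p[3] & p[2] & p[1] & g[0])
                       | (p[3] & p[2] & p[1] & p[0] & c[0]);

    assign sum  = p ^ c[3:0];
    assign cout = c[4];
endmodule

Every carry is a two-level AND-OR function of g, p, and cin, so the longest path no longer grows by one full-adder delay per bit.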

Nathan Baum    1027
Quote:
Original post by nobodynews
I was thinking about this recently while some people in a group I'm in were taking a digital circuitry class that involved making a 4-bit ALU (exactly as you describe). I knew that a, say, 64-bit adder would have too long a propagation delay as just a series of 1-bit full adders. Just now, I came up with a method for adding that acts somewhat like a Fast Fourier Transform algorithm, and I wanted to know if it is similar to what is done in real processors.

Say you have two 64-bit numbers A and B and you wish to generate the sum C. Break A, B, and C in half into A1, A2, B1, B2, C1, and C2, where X1 is the lower-order bits and X2 the higher-order bits.

A1 + B1 = C1 (with or without a carry out)
A2 + B2 + Carry = C2a
A2 + B2 = C2b

Once C2a and C2b are calculated, it simply becomes a matter of outputting whichever version of C2 corresponds to whether there was a carry or not.

This method can be broken down further. Divide A1, B1, and C1 in half and apply the same technique. Continue until you reach a size where the propagation delay is negligible.

My question is: is this how it's done, or is there a better trick?

Interesting. I came up with this idea three days ago.

I've sketched out a couple of circuits which (purport to) implement the scheme. I am no kind of expert on digital schematics. There may well be errors.



In this one, A and B come in from the left, Cin comes in from the top, the sum goes out to the right, and Cout goes out to the bottom. In the 1-bit case there is no speed-up, because we still need to propagate the carry in order to select which sum we want, and it happens that we use the same circuitry (two ANDs and an OR) as we'd use in a conventional full adder.



This one uses three 74283 4-bit full adders and a 74157 multiplexer (which chooses between two sets of 4 wires). Both the 74283 and the 74157 have a gate delay of 4, so the total gate delay of the circuit is 8. The gate delay of two 74283s placed in series (i.e. the conventional ripple-carry adder) is also 8. Result: no speed-up.

I conjecture that the minimum gate delay of any n-bit multiplexer is equal to the minimum gate delay of any n-bit full adder.

Zmurf    100
Quote:
Original post by trzy
Zmurf: To understand this, you'll want to study digital logic.


AH YES, that was the word I was looking for! It escaped my mind for some reason when I wanted to make this post.

Digital logic is what I want to find information about. Thanks everyone for all the links and info. I'll most likely be studying at ANU (an Australian university) next year, and I'm definitely going to look into some of these courses. I think most of them are prescribed anyway for Software Engineering; I'd take Computer Science, but I just can't be bothered going for the UAI of 98...

Mastaba    761
For a slightly different perspective, one might consider reading Feynman's Lectures on Computation, which covers things more from the physics-of-information point of view.

Ahnfelt    176
Quote:
Original post by Endar
Implementing a processor? That would have been useful.


Since that's something you're going to do regularly? ;-)

Actually, we did get to implement a CPU in a language called Kreds (I hear it's become visual since). That was pretty interesting :) But it's more fun than it is useful.
