boogyman19946

Memory access and CPU cycles

boogyman19946    1487
I've been reading Art of Assembly in my spare time and I'm currently on the sections about system timing and memory access. After reading the sections about memory access times, I went ahead and looked up the access times of my RAM and, using my CPU frequency (2.6GHz), tried to calculate how many cycles it takes to access memory.

So with 2,600,000,000 cycles per second (give or take), one cycle takes about 0.3846 nanoseconds (or 384.6 picoseconds but it's easier to go with ns). My RAM is of type DDR2-800 so CAS latency (for the sake of simplicity that's all to be considered) is expected to be 5 nanoseconds.

To find how many clock cycles we need to cover 5 nanoseconds, we divide the access time by the cycle time and obtain 13 cycles.
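The division above can be written out as a small C sketch. This is only the arithmetic from this post; the function names are made up for illustration, and the 5 ns latency is the figure assumed above, not a measured value.

```c
/* Duration of one clock cycle in nanoseconds, for a clock given in GHz.
   (1 GHz = 1 cycle per nanosecond, so the period is simply 1 / GHz.) */
double cycle_time_ns(double clock_ghz) {
    return 1.0 / clock_ghz;
}

/* Number of CPU cycles that elapse during a latency given in nanoseconds.
   Equivalent to latency_ns / cycle_time_ns(clock_ghz). */
double cycles_for_latency(double latency_ns, double clock_ghz) {
    return latency_ns * clock_ghz;
}
```

With the numbers from this post, `cycle_time_ns(2.6)` is roughly 0.3846 and `cycles_for_latency(5.0, 2.6)` comes out at about 13.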

Is this correct? Does it really take 13 CPU cycles just for the memory to fetch the data? (That's to say, from the time the memory sets out to access the data to the moment it ends up on the data bus.) That's a ridiculous impact on CPU performance! I understand there is also cache memory, which is supposed to be the uber fast memory that reduces the latency to much smaller amounts. But say that wasn't available. For the sake of argument, say I was operating with truly random access memory, where the next address to access is a flip of a coin away, and I've somehow managed to remove all helper technology such as cache memory and whatnot. If I sacrifice 13 CPU cycles on every instruction, that divides my total operating frequency from 2.6GHz down to 200MHz (2.6GHz / 13).

It looks to me like careless programming that ignores how data is being tossed around can heavily handicap performance. Obviously in actual situations there is cache memory, pipelines, even multi-core processors which may (or may not :D) make the timings a negligible threat, but I still can't believe there's so much penalty for it. Are my calculations correct here?

EDIT: I just realized... This doesn't really pertain to games and it doesn't seem particularly like a "For Beginners" question.... Sorry about that XD This was the only forum I knew where I could possibly get an answer to that.

TheTroll    883
Not sure about the real numbers, but it could be pretty close. Any IO takes time, that is just the nature of the beast.

So should you be worried about this? Not nearly as much as you seem to be. No, you don't want to be wasting CPU cycles for no reason, but the amount of time you would waste trying to optimize everything is just not worth it. Understanding what is going on under the hood doesn't hurt, but in 99.99% of the applications people write, that kind of performance is never an issue. Profile and fix problems; don't try to optimize preemptively, it rarely works.

Exception: microcontrollers. Optimize the heck out of them.

KulSeran    3267
[quote name='boogyman19946' timestamp='1302467986' post='4796797']
To find how many clock cycles we need to cover 5 nanoseconds, we divide the access time by the cycle time and obtain 13 cycles.
Is this correct?
[/quote]
A recent presentation at the [url="http://www.gdcvault.com/play/1014645/-SPONSORED-Hotspots-FLOPS-and"]GDC by Intel[/url] quotes:
1 cycle to read a register
4 cycles to reach L1 cache
10 cycles to reach L2 cache
75 cycles to reach L3 cache
and hundreds of cycles to reach main memory.

This is mostly because there is computation going on to figure out the addresses in the higher cache levels, and also because each larger cache will likely fetch more memory, as caches fill by line, and each line contains many bytes.
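If you want a rough feel for these latencies on your own machine, one common trick is a pointer-chasing loop: every load depends on the result of the previous one, so the CPU cannot overlap them. The sketch below is only an illustration, not anything from the presentation; the function names are made up, it uses `clock()` for coarse timing, and the numbers it reports will depend on the hardware, the compiler flags and the chain size.

```c
#include <stdlib.h>
#include <time.h>

/* Fill next[] with a random permutation that forms one single cycle
   (Sattolo's algorithm), so following next[i] visits every slot once
   before returning to the start. Note: where RAND_MAX is small, very
   large chains are shuffled less thoroughly, but still form one cycle. */
void build_single_cycle(size_t *next, size_t n) {
    for (size_t i = 0; i < n; i++) {
        next[i] = i;
    }
    for (size_t i = n; i > 1; i--) {
        size_t last = i - 1;
        size_t j = (size_t)rand() % last;  /* j in [0, last - 1], never last */
        size_t tmp = next[last];
        next[last] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` hops. Each load needs the previous result,
   so the hops are serialized. Returns the final index. */
size_t chase(const size_t *next, size_t start, size_t steps) {
    size_t i = start;
    while (steps--) {
        i = next[i];
    }
    return i;
}

/* Average nanoseconds per dependent load over an n-element chain.
   Returns -1.0 on bad arguments or allocation failure. */
double ns_per_load(size_t n, size_t steps) {
    if (n == 0 || steps == 0) {
        return -1.0;
    }
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL) {
        return -1.0;
    }
    build_single_cycle(next, n);

    clock_t t0 = clock();
    volatile size_t sink = chase(next, 0, steps);  /* volatile keeps the loop alive */
    clock_t t1 = clock();
    (void)sink;

    free(next);
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

Calling `ns_per_load` with a chain that fits in L1 and then with one far larger than the last-level cache should show the gap between the cache levels; multiplying the result by the clock speed in GHz gives an approximate cycle count.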

[quote]
If by every instruction I sacrifice 13 CPU cycles, that divides my total operating frequency from 2.6GHz to 200MHz (2.6GHz / 13).
[/quote]
But you're not. You're going to have several load instructions, then some computation, and maybe only one store. Most compilers push the loads to the top so as many as possible happen first, allowing all your computations to take place in registers. Out-of-order CPUs pipeline that even more, and just keep reading instructions till their pipeline is full in hopes of keeping the ALU busy on what data it did manage to pull in.
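To put a rough number on that, here is a deliberately crude C sketch. It assumes (my assumption, not a real CPU model) that instructions which don't wait on memory take 1 cycle and that nothing overlaps, so it still overstates the cost on an out-of-order CPU. The function names and the 25% figure in the note below are made up for illustration.

```c
/* Average cycles per instruction when only a fraction of the instructions
   wait on memory. Assumes every other instruction costs exactly 1 cycle
   and that no latency is hidden by pipelining or out-of-order execution. */
double average_cpi(double memory_fraction, double memory_cycles) {
    return memory_fraction * memory_cycles + (1.0 - memory_fraction) * 1.0;
}

/* Effective instruction rate in MHz for a given clock and average CPI. */
double effective_mhz(double clock_mhz, double cpi) {
    return clock_mhz / cpi;
}
```

With `memory_fraction = 1.0` and 13 cycles this reproduces the 200MHz figure from the quote; if only a quarter of the instructions paid the 13 cycles, the same model gives 650MHz.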

But, yes, you discovered how computationally easy problems quickly become IO bound. The best thing you can do is take a "Data Oriented Design" approach to your code, and try to structure your data such that accesses are localized. The most common advice is to change from an "array of structures" (AOS) to a "structure of arrays" (SOA). The idea is that updating the position (3*4 bytes) of all your objects doesn't require the rest of the object data. However, in AOS form a cache read will pull in much of the rest of the object, and that data won't be used, while in SOA form it will pull in this object's position and the following few objects' positions, which needed updating anyway.
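A minimal C sketch of the two layouts, to make the difference concrete. The type names, the 116 bytes of "other" data and the `MAX_OBJECTS` limit are all hypothetical; a real object would have whatever fields your game needs.

```c
#include <stddef.h>

#define MAX_OBJECTS 1024

typedef struct { float x, y, z; } Vec3;   /* 3 * 4 bytes */

/* Array of structures: each position is followed by data the position
   update never touches, so every cache line fetched is mostly wasted. */
typedef struct {
    Vec3 position;
    char other_data[116];                  /* hypothetical rest of the object */
} ObjectAoS;

/* Structure of arrays: all positions are packed back to back, so one
   cache line holds several positions that all need updating. */
typedef struct {
    Vec3 position[MAX_OBJECTS];
    char other_data[MAX_OBJECTS][116];
} ObjectsSoA;

/* Move every object by delta, AOS form: strides over whole objects. */
void translate_aos(ObjectAoS *objects, size_t count, Vec3 delta) {
    for (size_t i = 0; i < count; i++) {
        objects[i].position.x += delta.x;
        objects[i].position.y += delta.y;
        objects[i].position.z += delta.z;
    }
}

/* Move every object by delta, SOA form: walks one contiguous array. */
void translate_soa(ObjectsSoA *objects, size_t count, Vec3 delta) {
    for (size_t i = 0; i < count; i++) {
        objects->position[i].x += delta.x;
        objects->position[i].y += delta.y;
        objects->position[i].z += delta.z;
    }
}
```

Both functions compute the same result; the only difference is how much unrelated data gets dragged through the cache along the way.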

rip-off    10979
Memory is the new disk - don't access it if you can avoid it. Batching is your friend: structure the performance critical parts of your code to work sequentially on contiguous arrays of data.
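As a small illustration of what "sequentially on contiguous arrays" means in practice, here is a C sketch that sums the same row-major grid in two orders. The function names are made up for this example, and how big the speed difference is will depend on the grid size and the machine.

```c
#include <stddef.h>

/* Sum a rows x cols grid stored row-major in one flat array, visiting the
   elements in memory order: the inner loop touches adjacent addresses. */
long long sum_sequential(const int *grid, size_t rows, size_t cols) {
    long long total = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            total += grid[r * cols + c];
        }
    }
    return total;
}

/* Same sum, but walking down the columns: each step of the inner loop
   jumps cols * sizeof(int) bytes, so large grids keep missing the cache. */
long long sum_strided(const int *grid, size_t rows, size_t cols) {
    long long total = 0;
    for (size_t c = 0; c < cols; c++) {
        for (size_t r = 0; r < rows; r++) {
            total += grid[r * cols + c];
        }
    }
    return total;
}
```

Both functions return the same total; only the access pattern differs, which is exactly the part the hardware cares about.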

[url="http://video.google.com/videoplay?docid=-4714369049736584770"]This[/url] might interest you, even if it is a few years old. The thesis is simple: you can buy more bandwidth, but latency has physical limits. At around the 25 minute mark, he is talking about 14 cycles for [b]L2 cache[/b]. For RAM, he talks about [b]~200 cycles[/b] for access.

boogyman19946    1487
@TheTroll, I stopped worrying about performance as soon as I made those calculations ^.^ I became overly concerned when I started to use Java and experiment with scripts and file parsing, but when I made those calculations and realized how much the underlying hardware actually optimizes things to run smoothly, I stopped caring too much. Now, obviously with a huge throughput of calculations that have to be done repeatedly VERY fast (like collision detection, pathfinding, etc.) the algorithm could use some attention to be optimized, but I tend to focus more on the so-called "macro optimization" than on how fast my memory is being accessed. I'm just interested in this low level stuff, and I happened to be reading a section on it, that's all. :D

@KulSeran, Thanks for that video. I've watched some of it but I only got the general gist of what they were talking about. Also, I realize there is a lot more between the CPU and RAM than I described, but I was only doing that for simplicity :) Just checking if what I read so far is more or less correct. Thanks!

@rip-off, Thanks, I'll be sure to check out that video when I have time ^.^

Hodgman    51324
[quote name='boogyman19946' timestamp='1302467986' post='4796797']
Does it really take 13 CPU cycles just for the memory to fetch the data? That's a ridiculous impact on CPU performance!
EDIT: I just realized... This doesn't seem particularly like a "For Beginners" question.... [/quote]As mentioned above, it's actually hundreds of cycles ([i]or tens of thousands if you hit a page of virtual memory located on disk[/i]) - memory is [b]extremely[/b] slow compared to the CPU, accessing it is a huge performance penalty.
Memory accesses are [b]the [/b]main area of concern when writing high performance code as a professional games programmer these days, but yes, it's not something you should concern yourself with as a beginner ;) It's more important to learn how to write good code in general than to learn all the performance tricks that are currently in vogue.

The [url="http://www.google.com.au/search?q=pitfalls+of+oop"]pitfalls of oop[/url] presentation is another good example of this issue.

boogyman19946    1487
I'm not particularly worried about the performance of my games really. I'm not making large scale stuff where a million calculations take place, so if I ever find myself doing micro optimizations like trying to force the cache to do something for me, then I'll know I'm doing something wrong. I read about this stuff because it interests me, and out of all the things I've ever tried to do (music being one of them) CS is my greatest strength, so I take what I have and I expand when I can :)

Thanks for the replies though, guys. Every time I come to these forums and read a reply from the Pros I get reminded of how little I know and how much there is to learn XD
