Metric used for memory bandwidth


2 replies to this topic

#1 joystick-hero   Members   -  Reputation: 148


Posted 04 September 2014 - 01:50 PM

Hi guys. I've developed a DirectX application and I need to measure the performance it achieves in several scenarios. One question I really need answered, and couldn't find an answer to anywhere, is which units are used for a GPU's memory bandwidth.

For instance, I own a Gigabyte GeForce GTX 660, and the technical specifications I've found say its memory bandwidth is 144.2 GB/s. My question is: in this case, is 1 GB = 1,000,000,000 bytes (10^9) or 1 GB = 2^30 bytes?
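One way to tell which prefix a spec sheet uses is to recompute the peak from the memory bus width and the effective data rate. The sketch below assumes the reference GTX 660 memory configuration, a 192-bit bus at 6008 MT/s effective GDDR5 data rate; those two figures are not from this thread, so check them against your own card's specifications.

```python
# Sketch: recompute theoretical peak DRAM bandwidth from bus width and data rate.
# ASSUMPTION: reference GTX 660 memory config (192-bit bus, 6008 MT/s effective).
BUS_WIDTH_BITS = 192
EFFECTIVE_RATE_TRANSFERS_PER_S = 6008e6


def peak_bandwidth_bytes_per_s(bus_width_bits, transfers_per_s):
    """Bytes per transfer (bus width / 8) times transfers per second."""
    return (bus_width_bits / 8) * transfers_per_s


peak = peak_bandwidth_bytes_per_s(BUS_WIDTH_BITS, EFFECTIVE_RATE_TRANSFERS_PER_S)
print(f"{peak / 1e9:.1f} GB/s  (1 GB = 10^9 bytes)")
print(f"{peak / 2**30:.1f} GB/s  (1 GB = 2^30 bytes)")
```

Under these assumptions the peak is 144.192 x 10^9 bytes/s, which rounds to the quoted 144.2 only with the 10^9 reading; the 2^30 reading gives about 134.3.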

I thought it was the former, but my profiler says my application reaches peaks of 155.83 GB/s. My profiler could be wrong too. That's why I'd like to know whether I should change the unit used in my performance calculation, whether my Gigabyte GeForce GTX 660 is somehow better than I hoped, or whether my profiler needs checking. I really hope it's not the last option hehe.
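Whichever convention the spec uses, the comparison only makes sense in raw bytes per second. Assuming the profiler's 155.83 figure was computed with 1 GB = 10^9 bytes (the convention the post says it started from), this sketch converts both numbers to bytes per second under each reading of the spec.

```python
# Sketch: compare a measured rate with the spec in raw bytes/s.
# ASSUMPTION: the profiler's 155.83 "GB/s" used 1 GB = 10^9 bytes.
MEASURED_GB_PER_S = 155.83
SPEC_GB_PER_S = 144.2

measured = MEASURED_GB_PER_S * 1e9  # measured rate in bytes/s

for label, unit in (("10^9", 1e9), ("2^30", 2**30)):
    spec = SPEC_GB_PER_S * unit  # spec peak in bytes/s under this reading
    print(f"spec read as 1 GB = {label}: measured/spec = {measured / spec:.3f}")
```

Both ratios come out above 1 (about 1.081 and 1.006), so under either reading the measured rate exceeds the theoretical DRAM peak, which suggests something other than the unit choice, such as cache hits or a timing error, is inflating the figure.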
 
And regarding the GFLOPS metric, I have the same question: is 1 GFLOP = 10^9 FLOPs or 2^30?
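Peak GFLOPS figures are normally derived as shader cores x clock x FLOPs per core per clock, then divided by 10^9. The sketch assumes 960 CUDA cores at a 980 MHz base clock and counts one fused multiply-add as 2 FLOPs per core per clock; these are assumptions about the reference GTX 660, not figures from the thread.

```python
# Sketch: theoretical single-precision peak, expressed with an SI (10^9) prefix.
# ASSUMPTIONS: 960 cores, 980 MHz base clock, 1 FMA (= 2 FLOPs) per core per clock.
CORES = 960
CLOCK_HZ = 980e6
FLOPS_PER_CORE_PER_CLOCK = 2

peak_flops = CORES * CLOCK_HZ * FLOPS_PER_CORE_PER_CLOCK
print(f"{peak_flops / 1e9:.1f} GFLOPS")
```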
 
 
Regards.

 




#2 MJP   Moderators   -  Reputation: 14338


Posted 04 September 2014 - 03:05 PM

Bandwidth will use base-2 prefixes, so 1 GB/s == 2^30 bytes per second. FLOPS will use base-10 prefixes just like SI units, so 1 GFLOP == 10^9 FLOPs.

How exactly are you measuring bandwidth usage? Is it your own profiler? Typically you need hardware counters to get exact bandwidth measurements, since the GPU will have internal caches that can provide higher effective rates for transactions that don't miss the cache.


Edited by MJP, 04 September 2014 - 03:08 PM.


#3 joystick-hero   Members   -  Reputation: 148


Posted 04 September 2014 - 03:21 PM

Thanks for your reply. Yes, it is my own profiler. Judging by the values it returns, it seems to be working just fine: if I take 1 GB == 2^30 bytes as you said, I reach peaks of 143 GB/s or so, which makes total sense. But I could still be doing it wrong, so here's how I do it: I use the effective bandwidth equation presented here, except that I now divide by 2^30. To measure the time I use the QueryPerformanceCounter function, but only after telling the GPU, via a D3D11_QUERY_EVENT, to finish all the work in its internal queue.
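The measurement described above boils down to (bytes read + bytes written) / elapsed seconds. A minimal sketch, assuming the start and end tick counts come from QueryPerformanceCounter and the tick rate from QueryPerformanceFrequency; the function name and the sample numbers are hypothetical, and the divisor is a parameter so the prefix convention is stated explicitly rather than baked in.

```python
# Sketch of effective bandwidth = (bytes read + bytes written) / time.
# ASSUMPTION: ticks come from QueryPerformanceCounter, the tick rate from
# QueryPerformanceFrequency, and the interval covers only the measured GPU work.
def effective_bandwidth(bytes_read, bytes_written, start_ticks, end_ticks,
                        ticks_per_second, bytes_per_gb=2**30):
    """Return bandwidth in 'GB/s' using the given definition of 1 GB."""
    elapsed_s = (end_ticks - start_ticks) / ticks_per_second
    return (bytes_read + bytes_written) / elapsed_s / bytes_per_gb


# Hypothetical run: 256 MiB read + 256 MiB written in 4 ms at a 10 MHz tick rate.
mib = 2**20
binary = effective_bandwidth(256 * mib, 256 * mib, 0, 40_000, 10_000_000)
decimal = effective_bandwidth(256 * mib, 256 * mib, 0, 40_000, 10_000_000, 1e9)
print(f"{binary:.2f} GB/s (2^30), {decimal:.2f} GB/s (10^9)")
```

Keeping the divisor as an argument makes it harder to compare a 2^30-based measurement against a spec quoted with 10^9, or vice versa, by accident.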

 

In my shaders each thread reads from a unique memory location only once and writes to a unique memory location, so I think the cache is not an issue?
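With that access pattern (one read and one write per thread, no reuse), the byte counts that feed the bandwidth formula follow directly from the dispatch size. The thread-group counts, the numthreads value and the 16-byte element size (e.g. a float4) below are hypothetical.

```python
# Sketch: bytes requested by a compute dispatch where every thread reads one
# element and writes one element, each at a distinct address.
# ASSUMPTIONS (hypothetical): Dispatch(4096, 1, 1), numthreads(256, 1, 1),
# 16-byte elements for both input and output.
def bytes_moved(groups, threads_per_group, read_elem_bytes, write_elem_bytes):
    gx, gy, gz = groups
    tx, ty, tz = threads_per_group
    threads = gx * gy * gz * tx * ty * tz
    return threads * read_elem_bytes, threads * write_elem_bytes


bytes_read, bytes_written = bytes_moved((4096, 1, 1), (256, 1, 1), 16, 16)
print(bytes_read, bytes_written)
```

This counts the bytes the shader asks for; the actual DRAM traffic can differ (partially used cache lines, for example, are still fetched whole), which is why hardware counters give the exact figure.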








