Jump to content
  • Advertisement
ddengster

no performance gains from morton code vs x + y * width?

Recommended Posts

 

Hi, I've implemented Morton Code via the 'magic bits' method as mentioned in https://www.forceflow.be/2013/10/07/morton-encodingdecoding-through-bit-interleaving-implementations/  and the real-time collision detection book, but I've yet to see the performance gains that it promises over array[x + y * width] even when increasing sizes to 16k * 16k. Do I have to implement the LUT method? Compiler is MSVC, and the particular problem I've integrated it into is a marching squares opcode determination program. The morton encoding functions are inlined, i've checked.

Seems to me it might not be the best for known/fixed spaces. Has anyone reached similar conclusions?

Edited by ddengster
title

Share this post


Link to post
Share on other sites
Advertisement

Are you compiling and timing in release or debug mode?  What level of optimizations are you doing with your release build?  And if you're using debug do not trust your numbers in the least.

Share this post


Link to post
Share on other sites
3 hours ago, CrazyCdn said:

Are you compiling and timing in release or debug mode?  What level of optimizations are you doing with your release build?  And if you're using debug do not trust your numbers in the least.

it's release, /O2.

Share this post


Link to post
Share on other sites

What are the data access patterns of the algorithm? How big is the data set that you're working on? Did you have data before showing that the cache hit ratio was lower than ideal? What's the new cache hit ratio after the change?

Share this post


Link to post
Share on other sites
On 2/26/2019 at 5:22 PM, Hodgman said:

What are the data access patterns of the algorithm? How big is the data set that you're working on? Did you have data before showing that the cache hit ratio was lower than ideal? What's the new cache hit ratio after the change?

Much of it is opcode generation, where you arrange bits from (x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1) to form a 4bit opcode. Then using this new grid of opcodes you iterate left to right, top to bottom and when find a contour you trace it. Looking at how Morton encoding works, there should be fewer cache misses when you move 'up' the y-axis compared to a grid that has a large width beyond a cache line.

Unfortunately I don't have a profiler that measures cache hits/misses on MSVC2017, if there's a free one that's easy to integrate do tell:)

 

Share this post


Link to post
Share on other sites

If you're using Morton encoding, then you should never iterate over your grid left-to-right top-to-bottom.  Doing so completely defeats the point of using Morton encoding in the first place.  Instead, iterate in memory layout order.

Given your description of what you're doing with your array, only the actual contour tracing would actually benefit from Morton encoding, and it's unlikely that that's your bottleneck.  But if you're iterating in memory layout order, then at least the other bits won't be any slower than if you didn't use Morton encoding.

Share this post


Link to post
Share on other sites
18 hours ago, a light breeze said:

If you're using Morton encoding, then you should never iterate over your grid left-to-right top-to-bottom.  Doing so completely defeats the point of using Morton encoding in the first place.  Instead, iterate in memory layout order.

Given your description of what you're doing with your array, only the actual contour tracing would actually benefit from Morton encoding, and it's unlikely that that's your bottleneck.  But if you're iterating in memory layout order, then at least the other bits won't be any slower than if you didn't use Morton encoding.

 

I corrected the iteration order and it's still not showing gains. Looks like morton coding won't be helping for this algorithm. Oh well!

Edited by ddengster

Share this post


Link to post
Share on other sites
2 hours ago, ddengster said:

Looks like morton coding won't be helping for this algorithm.

One of the important rules of optimization is to measure before you start, ensuring you have a problem that the optimization actually addresses.

Typical grid encoding is a full line at a time.  If you access the cells sequentially and only to the right or left, this is great on the cache.  If you access multiple rows as a set, again working sequentially, this is also great on the cache.  But if you're accessing the cells above, below, or diagonally, and not scanning across the row, the access pattern can be terrible for the CPU cache.

The encoding creates a more cache-friendly pattern for those up/down/diagonal access when not scanning in a row.

If that was not the problem you were having, then the coding probably won't help.  

The problem and the optimization have everything to do with data access patterns.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!