[HLSL] Coping without bitwise operators

This topic is 3742 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Hi,

I've got a function which I want to move onto the GPU. Unfortunately, I am using DX9, so don't have access to SM4.0! Therefore my HLSL has no bitwise operators.

So how can I express bitwise functions in HLSL? For example:

int i = val & 15;

and

if ( i&1 ) DoStuff;

Thanks for any advice! My binary maths is pretty poor!

Simon

Given that there is no integer instruction set either, you're not really dealing with integers despite your int types; the compiler emulates them with floating-point maths.

I'd imagine a good compromise is a look-up texture of some kind, especially if point-filtered. A 256x256 texture allows for all combinations of two 8-bit operands, indexed via 0..1 texture coordinates. Point filtering and texture addressing convert your colour values to integers and do the bitwise comparison all in one go [cool]

if( tex2D( sampLookup, float2( A, B ) ).r > 0.5f ) { /* stuff */ }
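
A minimal sketch of the lookup-table idea, written as CPU-side C++ rather than HLSL so it can be run anywhere. It assumes the table stores the value of a & b for every pair of 8-bit operands; buildAndTable, samplePoint and toCoord are hypothetical names, not part of any D3D API. In a real 8-bit texture the fetched value would come back normalised (value / 255), so the shader would rescale or threshold it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds a 256x256 single-channel table in which the
// texel at column a, row b stores (a & b). On the GPU this would be uploaded
// once as a texture; here it is just a flat array.
std::vector<std::uint8_t> buildAndTable() {
    std::vector<std::uint8_t> table(256 * 256);
    for (int b = 0; b < 256; ++b) {
        for (int a = 0; a < 256; ++a) {
            table[b * 256 + a] = static_cast<std::uint8_t>(a & b);
        }
    }
    return table;
}

// Converts an integer operand in 0..255 to a 0..1 coordinate at the texel
// centre, so a point-sampled fetch never lands on a texel boundary.
float toCoord(int i) {
    assert(i >= 0 && i < 256);
    return (static_cast<float>(i) + 0.5f) / 256.0f;
}

// Emulates a point-filtered fetch with normalised coordinates:
// scale by the texture size and truncate to pick a texel.
std::uint8_t samplePoint(const std::vector<std::uint8_t>& table, float u, float v) {
    assert(u >= 0.0f && u < 1.0f && v >= 0.0f && v < 1.0f);
    const int a = static_cast<int>(u * 256.0f);
    const int b = static_cast<int>(v * 256.0f);
    return table[b * 256 + a];
}
```

The same table layout works for OR or XOR; only the expression used to fill it changes.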

hth
Jack

I've never actually tried this before, but I'm sure there are plenty of ways you can be clever if you really want to emulate bitwise operations.

For instance, take bitwise-AND. If you have the case x & (2^N - 1), such as your first example, the result is just frac(x / 2^N) * 2^N. In other words, you divide by the power of two, take the fractional component of the result, and multiply that by the power of two. Since these are all floating-point operations you might not get exact results, but for the purposes of conditionals and the like it should suffice.

If you have x & 2^N, then you first divide by 2^N, discard the fraction, and check whether the result is even or odd. If it's even then the result of the whole operation is 0, otherwise it's 2^N. This can be expressed mathematically as ceil(frac(floor(x / 2^N) / 2)) * 2^N, because the result of the even/odd check, frac(floor(x / 2^N) / 2), will be either 0.0 or 0.5, which we can 'ceil' to get 0.0 or 1.0 and then multiply by the original power of two.

And finally, for some other value x & K, just remember that K is a sum of powers of two and repeat the previous test for each set bit, adding up the results. You can get even more clever here with certain numbers. Take x & 239. Since there is only a single 0 bit to check here (239 = 255 - 16), it makes more sense to do x & 255 and then subtract x & 16 than to individually sum x & 1, x & 2, ..., all the way up to x & 128.
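
Here is a rough sketch of the AND emulation described above, in plain C++ using only float operations that have HLSL counterparts (floor, ceil, and a hand-written frac). The function names are made up for illustration, and x is assumed to be a non-negative, integer-valued float. Note that the single-bit test floors the quotient before the even/odd check; without that step the check does not yield exactly 0.0 or 0.5.

```cpp
#include <cassert>
#include <cmath>

// frac() as in HLSL: the fractional part of x.
float frac(float x) { return x - std::floor(x); }

// x & (2^n - 1): keep the low n bits.
float andLowBits(float x, int n) {
    assert(x >= 0.0f && n >= 0 && n < 24);
    const float p = std::ldexp(1.0f, n);  // 2^n
    return frac(x / p) * p;
}

// x & 2^n: isolate a single bit.
float andSingleBit(float x, int n) {
    assert(x >= 0.0f && n >= 0 && n < 24);
    const float p = std::ldexp(1.0f, n);                  // 2^n
    const float parity = frac(std::floor(x / p) * 0.5f);  // 0.0 if even, 0.5 if odd
    return std::ceil(parity) * p;                         // 0 or 2^n
}

// x & k for a constant 8-bit mask k: sum the single-bit results.
// The bit test on k uses real integer ops because k is a compile-time
// constant; in a shader this loop would simply be written out by hand.
float andMask8(float x, int k) {
    assert(k >= 0 && k < 256);
    float result = 0.0f;
    for (int n = 0; n < 8; ++n) {
        if ((k >> n) & 1) {
            result += andSingleBit(x, n);
        }
    }
    return result;
}

// The x & 239 shortcut: 239 has one zero bit, so mask with 255 and
// subtract the bit that must be cleared.
float and239(float x) { return andLowBits(x, 8) - andSingleBit(x, 4); }
```

On a CPU these results are exact for small integers because dividing and multiplying by a power of two loses nothing; shader hardware may be less precise, which is why the conditional checks should leave some tolerance.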

Along those same lines you can probably devise tests for OR, XOR, NOT, NOR, or whatever else you need. Just remember at all times to be careful when checking the results and computing values, since they're all floating-point computations, and you should be fine.
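
One possible way to get the other operators, sketched in C++ with float-only arithmetic: once you have AND for two variable operands, OR and XOR follow from the identities a | b = a + b - (a & b) and a ^ b = a + b - 2 * (a & b), and NOT on 8 bits is 255 - a. The names and8, or8, xor8 and not8 are hypothetical, and the inputs are assumed to be integer-valued floats in 0..255.

```cpp
#include <cassert>
#include <cmath>

// a & b for two 8-bit values held in floats, one bit at a time.
float and8(float a, float b) {
    assert(a >= 0.0f && a <= 255.0f && b >= 0.0f && b <= 255.0f);
    float result = 0.0f;
    float weight = 1.0f;  // value of the current bit: 1, 2, 4, ...
    for (int i = 0; i < 8; ++i) {
        const float bitA = std::fmod(a, 2.0f);  // lowest bit of a: 0 or 1
        const float bitB = std::fmod(b, 2.0f);  // lowest bit of b: 0 or 1
        result += bitA * bitB * weight;         // product of two bits is their AND
        a = std::floor(a * 0.5f);               // shift right by one
        b = std::floor(b * 0.5f);
        weight *= 2.0f;
    }
    return result;
}

// a + b counts the shared bits twice, so subtract them once for OR...
float or8(float a, float b) { return a + b - and8(a, b); }

// ...and twice for XOR.
float xor8(float a, float b) { return a + b - 2.0f * and8(a, b); }

// 8-bit NOT flips every bit, which is the same as subtracting from 255.
float not8(float a) {
    assert(a >= 0.0f && a <= 255.0f);
    return 255.0f - a;
}
```

The eight-iteration loop costs a fair number of instructions per AND, so for variable operands a lookup texture may well be cheaper where one is available.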

Jack,

Nice approach, but DX9 can't do texture lookups in the vertex shader, right?

Zipster.... I will ponder this further. Looks interesting.

Before I embark on such a mission, does anyone have any theory about how much performance I could expect using a GPU instead of CPU to manipulate a Vertex Buffer each frame?

Ball park guess?

Factor of 2? 10?

Thanks

Si

Quote:
Original post by sipickles
Nice approach, but DX9 can't do texture lookups in the vertex shader, right?
vs_3_0 can (vertex texture fetch) on NVIDIA hardware, but otherwise no. You never mentioned vertex shaders, though [razz]

Quote:
Original post by sipickles
Before I embark on such a mission, does anyone have any theory about how much performance I could expect using a GPU instead of CPU to manipulate a Vertex Buffer each frame?
But unless you're using R2VB (render to vertex buffer), how are you expecting to store the data? Or are you just offloading it all, so that you're re-computing static results every time it's rendered?

I guess you're doing some sort of noise-function based perturbation of vertex data. This sounds like a good parallelisable task, so throw a bounded buffer and OpenMP based solution at it and you should be fine for it on the CPU.

Jack

Quote:

This sounds like a good parallelisable task, so throw a bounded buffer and OpenMP based solution at it and you should be fine for it on the CPU.



Care to elaborate?!

Damn, I thought I could pull off that whole sounding clever thing...

The exact MP mechanics will vary depending on how you've got your app set up, but you can conceptualise the deformation of a vertex buffer by a noise function as a "task". Usually this'll be very nice, as you've got separate inputs and you're not going to have to deal with synchronization or locking or nasty stuff with sharing writable memory with other threads.

OpenMP ships as part of VS'05 and is pretty easy to use (although I'm no expert), so you can set up a bounded buffer and have each 'task' running in the thread pool, crunching away as fast as it can. Obviously performance scales for dual and quad core CPUs. These CPU-based tasks are the producers, and the idea is that the GPU is the consumer: a simple lock-copy-unlock operation sends the generated data up for rendering.
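
As a rough illustration of the data-parallel part only, here is a sketch in C++ with an OpenMP parallel for. It does not show the bounded buffer or the lock-copy-unlock upload. Vertex, makeGrid, fakeNoise and deform are all made-up names, and fakeNoise is just a cheap stand-in for whatever noise function you really use. If OpenMP is not enabled the pragma is ignored and the loop simply runs on one thread.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vertex {
    float x, y, z;
};

// Hypothetical helper: a flat side x side grid of vertices in the XZ plane.
std::vector<Vertex> makeGrid(int side) {
    assert(side > 0);
    std::vector<Vertex> verts;
    verts.reserve(static_cast<std::size_t>(side) * side);
    for (int z = 0; z < side; ++z) {
        for (int x = 0; x < side; ++x) {
            verts.push_back(Vertex{static_cast<float>(x), 0.0f, static_cast<float>(z)});
        }
    }
    return verts;
}

// Placeholder for the real noise function (assumption: it depends only on
// position and time, so every vertex can be computed independently).
float fakeNoise(float x, float z, float t) {
    return 0.5f * std::sin(x + t) * std::cos(z + t);
}

// Each iteration writes only to its own vertex, so no locking is needed.
// The loop index is a signed int because OpenMP 2.0 requires it.
void deform(std::vector<Vertex>& verts, float time) {
    const int count = static_cast<int>(verts.size());
#pragma omp parallel for
    for (int i = 0; i < count; ++i) {
        verts[i].y = fakeNoise(verts[i].x, verts[i].z, time);
    }
}
```

The result would then be copied into the locked vertex buffer on the rendering thread.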

You also have more control over how often the geometry is updated: you may want to render faster than the geometry needs changing (e.g. I had a noise-based water renderer that only updated at 10 Hz despite rendering as fast as it could).
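
A small sketch of that kind of throttling, assuming you already have the frame time in seconds each frame. UpdateThrottle is a hypothetical class, not something from D3D or any library: it accumulates elapsed time and only reports that an update is due once a fixed interval has passed.

```cpp
#include <cassert>

// Decouples the geometry update rate from the render rate.
class UpdateThrottle {
public:
    explicit UpdateThrottle(float intervalSeconds)
        : interval_(intervalSeconds), accumulated_(0.0f) {
        assert(intervalSeconds > 0.0f);
    }

    // Call once per rendered frame with that frame's duration.
    // Returns true when the geometry should be regenerated.
    bool tick(float dtSeconds) {
        accumulated_ += dtSeconds;
        if (accumulated_ < interval_) {
            return false;
        }
        accumulated_ -= interval_;  // carry the remainder into the next interval
        return true;
    }

private:
    float interval_;
    float accumulated_;
};
```

In the render loop you would call tick(frameTime) every frame and only run the noise update (and the vertex buffer upload) when it returns true; an interval of 0.1f would correspond to the 10 Hz mentioned above.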


hth
Jack

Wow, I was not aware of OpenMP. It looks fantastic.

My problem is, as an indie developer, I haven't got £1000 to throw at MSVC2005 Pro, so am running Standard. No OpenMP :(

Strange that there is an option in Properties>Configuration>C++>Language to enable OpenMP support if MSVC Standard doesn't support it.

Is there any way to download and install OpenMP?

Microsoft give you hope then they take it away! :)

----

EDIT: Hmm, I even have vcomp.dll in C:\Program Files\Microsoft Visual Studio 8\VC\redist\x86\Microsoft.VC80.OPENMP

Quote:
Original post by sipickles
int i = val & 15;

if ( i&1 )

These translate to (val % 16) and (i % 2). The '%' operator in HLSL works for floating point types.

General bitwise ops don't translate so well, but if all you need is this kind of calculation, then you shouldn't have a problem.
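
For what it's worth, here is how those two translations look as a sketch in C++, with std::fmod standing in for the HLSL '%' operator on floats. low4 and isOdd are made-up names, and the inputs are assumed to be non-negative, integer-valued floats (with std::fmod a negative input gives a negative remainder, which would not match the bitwise result).

```cpp
#include <cassert>
#include <cmath>

// Equivalent of (val & 15) for a non-negative, integer-valued float.
float low4(float val) {
    assert(val >= 0.0f);
    return std::fmod(val, 16.0f);
}

// Equivalent of the test (i & 1) != 0. Comparing against 0.5 rather than
// testing for exactly 1.0 leaves some room for floating-point error.
bool isOdd(float i) {
    assert(i >= 0.0f);
    return std::fmod(i, 2.0f) >= 0.5f;
}
```

So the original snippet becomes something like: i = low4(val), then branch on isOdd(i).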

Quote:
Original post by sipickles
Wow, I was not aware of OpenMP, It looks fantastic.
It's where parallel programming needs to be. We shouldn't have to deal with OS-level threads unless we're doing something very specialized.

Quote:
Original post by sipickles
My problem is, as an indie developer, I havent got £1000 to throw at MSVC2005 Pro, so am running Standard. No OpenMP :(
I didn't realise it was only in the Pro/TS SKUs [headshake]

Quote:
Original post by sipickles
Is there any way to download and install OpenMP?
Not any that I know of, but I have VSTS so I've not needed to look elsewhere. Maybe someone else can tell you some names...

hth
Jack

Shame, looks like I will have to resort to boost::thread.

On the plus side, I took your advice and throttled the update call to my simplex noise water generator to every 50ms ....

Bingo! FPS x 200% !!!

Sometimes you forget the simplest answers in the search for complex ones [totally]
