software pixel shaders

Started by
33 comments, last by Uthman 20 years, 7 months ago
hm.. i think those ddXYZ funcs are sort of bullshit anyways, as pixels should be treated individually. especially with branching/looping they simply make NO sense anymore (well, numeric derivatives at least.. i DO understand their purpose for the real situation..)

i'm interested how often they would get used at all.. well, i wouldn't use them. what i would definitely need is a very fast branching/looping unit, which does not execute different pathways, but only the required ones.

..

i have no real solution for this either.. but yeah, for non-branching pixel shaders, you could process 4 pixels in parallel and gain much speed. tested it myself, got a 2x increase over normal sse code in my raytracer..

definitely a good thing for amd64, which has 2x the number of sse registers :D can't wait for mine.. and i love that it isn't that deeply pipelined => branches don't hurt that much. and its memory performance is amazing. i can't wait :D

there's one other way (possibly) which you "could" do..

a scanline-derivative buffer..

means you allocate a buffer, width x 2 x dd-instructions in size.

and then you execute your code, and as you execute one pixel after the other, you could just store its values at the dd-instructions in the buffer. why width x 2 x dd? because for ddy you need the info of the previous scanline above you.. (or so.. could be done more optimally, but i'm tired..)
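// then, calculating the ddx would mean
void ddx(float4& dst, float4 cur) {
    // difference to the value the pixel to the left stored for this dd-instruction
    dst = cur - dd_cur_buffer[ddinstr + ddinstrcount*(pixelxpos - 1)];
    dd_cur_buffer[ddinstr + ddinstrcount*pixelxpos] = cur;
    ++ddinstr;
}
// and
void ddy(float4& dst, float4 cur) {
    // difference to the value the pixel above stored for this dd-instruction
    dst = cur - dd_old_buffer[ddinstr + ddinstrcount*pixelxpos];
    dd_cur_buffer[ddinstr + ddinstrcount*pixelxpos] = cur;
    ++ddinstr;
}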


and for each scanline you would
std::swap(dd_cur_buffer,dd_old_buffer); // just pointers:D 


ddinstrcount == number of dd-instructions in the whole program, pixelxpos == current x position from the left of the screen..
ddinstr, initialized to zero for each incoming pixel, gets incremented per dd-instruction..

something like this.. i hope you get the idea..
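A minimal, self-contained sketch of the scanline-derivative buffer described above. It uses plain float instead of float4 to stay short, and the struct and method names (ScanlineDerivatives, beginPixel, nextScanline) are made up for illustration. Returning 0 at the left edge and on the first scanline is an assumption, not something the post specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: keeps the operand of every dd-instruction for the
// current scanline and for the one above it.
struct ScanlineDerivatives {
    std::size_t ddinstrcount;      // number of dd-instructions in the shader
    std::vector<float> cur, old;   // width * ddinstrcount entries each
    std::size_t ddinstr = 0;       // index of the next dd-instruction, reset per pixel
    std::size_t pixelxpos = 0;     // x position of the pixel being shaded
    bool hasPrev = false;          // true once a full scanline has been stored

    ScanlineDerivatives(std::size_t width, std::size_t count)
        : ddinstrcount(count), cur(width * count, 0.0f), old(width * count, 0.0f) {
        assert(width > 0 && count > 0);
    }

    void beginPixel(std::size_t x) { pixelxpos = x; ddinstr = 0; }

    // Called once per finished scanline; swapping vectors only swaps pointers.
    void nextScanline() { std::swap(cur, old); hasPrev = true; }

    float ddx(float value) {
        const std::size_t slot = ddinstr + ddinstrcount * pixelxpos;
        // Assumption: no left neighbour at x == 0, so report 0 there.
        const float d = pixelxpos > 0 ? value - cur[slot - ddinstrcount] : 0.0f;
        cur[slot] = value;
        ++ddinstr;
        return d;
    }

    float ddy(float value) {
        const std::size_t slot = ddinstr + ddinstrcount * pixelxpos;
        // Assumption: no scanline above the first one, so report 0 there.
        const float d = hasPrev ? value - old[slot] : 0.0f;
        cur[slot] = value;
        ++ddinstr;
        return d;
    }
};
```

This only works if every pixel runs the same dd-instructions in the same order, which is exactly what branching breaks.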


is that a good idea? or not? dunno.. as i said already, i'm tired as hell..



If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia

davepermen.net

quote:Original post by davepermen
hm.. i think those ddXYZ funcs are sort of bullshit anyways, as pixels should be treated individually. especially with branching/looping they simply make NO sense anymore (well, numeric derivatives at least.. i DO understand their purpose for the real situation..)


I agree. They had issues from the beginning and shouldn't have been added to ps 3.0. In ps 2.0 they are no problem because there is no branching, but then it's a bad idea to remove them again for ps 3.0.

quote:i'm interested how often they would get used at all.. well, i wouldn't use them. what i would definitely need is a very fast branching/looping unit, which does not execute different pathways, but only the required ones.


They can be used very often. Every time you apply a transformation to your texture coordinates, you have to use them to get correct gradients for the mipmap level selection. This is very important, because otherwise the texture might look very blurry or aliased. I've seen this many times in demos with environment mapping and other situations where the texture coordinates do not vary linearly across the polygon. In fact, this method of calculating the mipmap level is already done implicitly for the fixed-function pipeline, as it is nearly free.

The 'ideal' solution is to compute the gradients analytically. For example, if you apply log(x) to your texture coordinates, you scale your gradients by 1/x. The problem is that it's not an attractive solution. First of all, you would need a method to get your original gradients. However, these vary non-linearly over the polygon, so it requires extra operations to do it right. And a division like in the above example isn't exactly fast either. So simply looking at the neighboring pixels is a much faster solution.
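To make the log(x) example concrete, here is a small sketch comparing the analytic gradient (chain rule: the gradient of log(u) is the gradient of u divided by u) with the neighbor-difference approximation. The function names are made up for illustration, and a single scalar coordinate stands in for a full texture coordinate.

```cpp
#include <cassert>
#include <cmath>

// Analytic gradient of log(u) along x, given du/dx (chain rule).
// Needs the original gradient and one division per coordinate.
inline float gradLogAnalytic(float u, float dudx) {
    return dudx / u;
}

// Numeric gradient: difference between this pixel and its right-hand
// neighbour, which is what a ddx-style instruction approximates.
inline float gradLogNumeric(float u, float uNeighbour) {
    return std::log(uNeighbour) - std::log(u);
}
```

For a small step both agree closely, e.g. with u = 2 and a neighbor at 2.01 the two results differ by far less than the gradient itself.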

But like I said, it doesn't work with branching. How bad is that? Well, the DirectX REF returns 0 as the gradient when the surrounding pixels execute a different branch. The result is that the highest-resolution mipmap is always selected. So pixels that are surrounded by pixels executing another branch use completely different mipmap levels. This causes very strange-looking aliasing effects, as if the polygon isn't 'smooth' any more. It's really ugly.
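Why a zero gradient ends up at the most detailed mipmap can be seen from the usual textbook level-of-detail formula, sketched below. This is an assumption about how the level is picked (log2 of the larger texel-space gradient length), not the exact behavior of the REF rasterizer or of any particular hardware; mipLevel and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Picks a mipmap level from texture-coordinate gradients.
// dudx, dvdx: change of (u, v) per pixel in x; dudy, dvdy: per pixel in y.
// texSize: texture width/height in texels (square texture assumed).
float mipLevel(float dudx, float dvdx, float dudy, float dvdy,
               float texSize, int levelCount) {
    const float lenX = std::sqrt(dudx * dudx + dvdx * dvdx) * texSize;
    const float lenY = std::sqrt(dudy * dudy + dvdy * dvdy) * texSize;
    const float rho = std::max(lenX, lenY);   // texels covered per pixel step
    if (rho <= 1.0f) {
        return 0.0f;  // zero or tiny gradient: most detailed level
    }
    return std::min(std::log2(rho), static_cast<float>(levelCount - 1));
}
```

A pixel that gets gradient 0 because its neighbors took another branch lands on level 0, while the pixel next to it may legitimately be on level 2 or 3, hence the visible discontinuity.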

quote:i have no real solution for this either.. but yeah, for non-branching pixel shaders, you could process 4 pixels in parallel and gain much speed. tested it myself, got a 2x increase over normal sse code in my raytracer..


Nice! I'm certainly going to try it for my ps 2.0 emulator.

quote:definitely a good thing for amd64, which has 2x the number of sse registers :D can't wait for mine.. and i love that it isn't that deeply pipelined => branches don't hurt that much. and its memory performance is amazing. i can't wait :D


Yes, although I'm waiting more for the Prescott, AMD has finally made a comeback at the high end. It's going to be interesting to see how much of an advantage the extra registers and increased memory bandwidth will give once applications take advantage of them.

quote:there's one other way (possibly) which you "could" do..

a scanline-derivative buffer..


Yes, this solution has also been proposed on the Beyond3D forum. Its only shortcoming is that I'd have to return 0 or an invalid value for pixels with neighbors that take different branches. But it's probably the best practical compromise. Thanks for reminding me of it!

does anyone have info on how ps 3.0 should handle derivatives? because it IS rather impossible..

yeah, i'd go for the scanline buffer. and yep, you'll need quite some branching for the derivative.. namely because you need to know if the pixel above actually IS part of the triangle, else you cannot calc the derivative either.. but that's just sort of a cmov..
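A rough sketch of that "sort of a cmov" check, assuming each buffer entry carries a validity flag next to the stored value. DdSample and guardedDifference are hypothetical names, and returning 0 for an invalid neighbor is just one possible choice.

```cpp
#include <cassert>

// Hypothetical buffer entry: the stored operand plus a flag saying whether it
// was written by a pixel of the current triangle that reached this dd-instruction.
struct DdSample {
    float value;
    bool valid;
};

// Difference to a neighbouring pixel, or 0 if that neighbour is unusable.
// The difference is computed unconditionally and then selected, so a compiler
// can usually emit this without a jump.
inline float guardedDifference(float cur, const DdSample& neighbour) {
    const float diff = cur - neighbour.value;
    return neighbour.valid ? diff : 0.0f;
}
```

The flags would have to be cleared at the start of each triangle (and per scanline for the ddx case), which is extra bookkeeping on top of the buffer itself.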

but it's the only idea i can see that CAN actually work..

let's see the dx9 specs.. if i find any info




oh, and, yep, if you can, better calc the real derivative. that always works, even in ps2.0 :D




quote:The rate of change computed from the source register is an approximation on the contents of the same register in adjacent pixel(s) running the pixel shader in lock-step with the current pixel. This is designed to work even if adjacent pixels follow different paths due to flow control, because the hardware is required to run a group of lock-step pixel shaders, disabling writes as necessary when flow control goes down a path that a particular pixel does not take.
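A small sketch of what that lock-step scheme could look like in a software implementation: a 2x2 group of pixels runs both sides of a branch and a per-pixel mask disables the writes of the side a pixel did not take. The names (Quad, maskedWrite, branchLockStep, quadDdx) and the 2x2 group size are assumptions for illustration, not taken from the spec text.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One shader register for a 2x2 pixel group.
// Layout: 0 = (x, y), 1 = (x+1, y), 2 = (x, y+1), 3 = (x+1, y+1).
struct Quad {
    std::array<float, 4> r{};
};

// Writes src into dst only for pixels whose mask entry is true.
void maskedWrite(Quad& dst, const std::array<float, 4>& src,
                 const std::array<bool, 4>& mask) {
    for (std::size_t i = 0; i < 4; ++i) {
        if (mask[i]) dst.r[i] = src[i];
    }
}

// "if (cond) r = a; else r = b;" executed in lock-step for all four pixels:
// both paths run, each with the writes of non-participating pixels disabled.
Quad branchLockStep(const std::array<bool, 4>& cond,
                    const std::array<float, 4>& a,
                    const std::array<float, 4>& b) {
    Quad q;
    std::array<bool, 4> notCond{};
    for (std::size_t i = 0; i < 4; ++i) notCond[i] = !cond[i];
    maskedWrite(q, a, cond);     // "then" path
    maskedWrite(q, b, notCond);  // "else" path
    return q;
}

// Horizontal rate of change: difference within the pixel's row of the quad.
// The register always holds a value for the neighbour, whatever path it took.
float quadDdx(const Quad& q, std::size_t pixel) {
    const std::size_t row = pixel & 2u;
    return q.r[row + 1] - q.r[row];
}
```

The cost is the one complained about right below: whenever the four pixels disagree, both paths get executed.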


BAH.. stupid!! really. they do it the slow way




