Will software rasterizers ever pass hardware?

I was pondering this question the other day and wanted to ask the experts out there. Given the speed of the current and future crop of Intel/AMD processors and the speed of memory access, do you ever see a time when it will be just as fast to go back to pure software rasterizers as to rely on differing 3D hardware standards? Granted, the 3D hardware boards will increase in power along with the main CPUs, offer concurrent processing and are becoming more compatible over time. But given the existing compatibility issues between different 3D boards, a software rasterizer could eliminate those incompatibilities and reduce the overall cost of having to buy both a fast motherboard/CPU and a 3D board. Hence, your application could be available to a larger audience. Just wait for the Walmart computer with a 10GHz 64-bit CPU and 4GB of memory, which will probably be the standard in 3-5 years.

I'm also thinking of the flexibility a software rasterizer would give, since it would be totally programmable. Things like modifying your vertices per frame would not take the big hit they do now, BSPs might make a comeback, blending lots of textures, occlusion testing, etc. What I would really like to see is the elimination of the whole video/main memory split in favour of a unified memory model, but hey, I don't design the stuff. Just curious...

[edited by - Zarmax on September 23, 2003 5:02:20 PM]
First of all, you seem to forget that the CPU has to do other stuff as well - game logic, physics, AI. If you do all the rendering on the CPU, there's no time left for the "interesting" things. Essentially, it's like using the CPU for emulated floating-point ops even though you have a hardware FPU available. Emulating floating point is *a lot* slower, and nobody does it these days for obvious reasons.

Second, GPUs are specially designed for what they're doing, while CPUs are general purpose. Rendering on the CPU doesn't use the chip's transistors as efficiently as rendering on the GPU can.

The unified memory architecture thing is a valid point, though. In fact, some company (I think it was 3Dlabs) produced a graphics chip that treated the on-board graphics memory essentially as a huge cache, which does make sense.
Of course, one can emulate this caching behaviour in one's own programs, but it's probably more efficient to do it in hardware or in the driver.

cu,
Prefect
Widelands - laid back, free software strategy
Further, on the graphics hardware:

Many graphics operations are inherently parallelizable. For instance, one very often applies the same matrix transform to many vertices at the same time. The hardware is set up to do this in parallel, so you get many vertex transforms at once (I forget the number on the best cards). General-purpose CPUs are largely serial; even the hyper-threaded ones won't take advantage of the parallelism of graphics transforms unless you specifically create separate threads to transform the vertices. Even doing that, the maximum number of simultaneous transforms is going to be low compared to the GPU.
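A rough sketch of what that "separate threads to transform the vertices" approach looks like on the CPU side; all the types and names here are made up for illustration, and a GPU does the same thing across far more lanes at once:

```cpp
// Sketch: the same 4x4 matrix applied to a large batch of vertices,
// with the batch split across a few CPU worker threads by hand.
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };

Vec4 transform(const Mat4& m, const Vec4& v) {
    return {
        m.m[0][0]*v.x + m.m[0][1]*v.y + m.m[0][2]*v.z + m.m[0][3]*v.w,
        m.m[1][0]*v.x + m.m[1][1]*v.y + m.m[1][2]*v.z + m.m[1][3]*v.w,
        m.m[2][0]*v.x + m.m[2][1]*v.y + m.m[2][2]*v.z + m.m[2][3]*v.w,
        m.m[3][0]*v.x + m.m[3][1]*v.y + m.m[3][2]*v.z + m.m[3][3]*v.w
    };
}

// Transform all vertices, giving each worker thread a contiguous chunk.
void transformBatch(const Mat4& m, std::vector<Vec4>& verts, unsigned threads) {
    std::vector<std::thread> workers;
    std::size_t chunk = (verts.size() + threads - 1) / threads;
    for (unsigned t = 0; t < threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(verts.size(), begin + chunk);
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                verts[i] = transform(m, verts[i]);
        });
    }
    for (auto& w : workers) w.join();
}
```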

Also, GPUs are advancing at the same Moore's Law rate that CPUs are. So you have GPUs that are already significantly better than CPUs at rendering, getting better at the same rate. GPUs will continue to win.

-me
I can understand the point about not emulating things you don't have to. But MAME is a great example of the opposite: it runs games that took dedicated hardware to run years back. Java is another good example; with a 2GHz CPU it is almost acceptable.

What I am getting at is that 3-5 years from now, CPUs will be so fast as to make what we do now with the CPU/GPU combination easily doable on just the CPU. Granted, games will probably advance fast enough to soak up all CPU/GPU capabilities, but at what point do we hit diminishing returns? Real-time raytracing?

Using general-purpose hardware to run special-purpose stuff (like graphics) will always be slower, but balanced against the flexibility of a completely software-based rasterizer and the advancement of general CPU speed, will it ever make sense again?

I'm not advocating it, I'm just trying to understand the arguments for and against.
When GPUs get fast enough to avoid current issues like fill rate and memory bottlenecks, we'll start seeing more techniques employed in hardware to expand the power of the chips. Perhaps hardware-accelerated radiosity (hacked versions have been fairly successful) or more efficient cube mapping. There is also the reasonable possibility that GPUs will begin a trend towards raytracing-based (and likely photon-mapping-based) techniques once the speed is available. Hardware-accelerated raytracing has been shown in a few cases to be near-realtime, so I daresay in the next 5-6 years we'll start seeing prototypes and earnest experiments in producing a realtime raytracing system.

Of course, to get to this point, we have one major hurdle to cross: memory latency. Raytracing has a very random, non-coherent memory access pattern, so we cannot take advantage of a rasterizer's coherence and work with a contiguous block of memory for long periods of time. Hardware isn't my forte, so I have no idea how future innovators might get around this, but it will definitely be the largest challenge.
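A minimal sketch of the access-pattern difference being described; the structures and names are hypothetical, and only where each inner loop touches memory matters:

```cpp
// Contrast: a rasterizer's inner loop vs. a raytracer's inner loop.
#include <cstdint>
#include <vector>

struct Node {
    int child[2];           // indices of the two child nodes in a tree (e.g. BVH)
    int firstTri, triCount; // leaf data: triCount == 0 means interior node
};

// Rasterizer inner loop: walks one scanline of the framebuffer, so the
// writes hit consecutive addresses (cache- and prefetch-friendly).
void fillSpan(std::uint32_t* framebuffer, int pitch, int y, int x0, int x1,
              std::uint32_t color) {
    std::uint32_t* row = framebuffer + y * pitch;
    for (int x = x0; x < x1; ++x)
        row[x] = color;     // contiguous writes
}

// Raytracer inner loop: each ray hops between tree nodes chosen by the
// ray's direction, so successive reads land at unrelated addresses.
int traverse(const std::vector<Node>& nodes, int start, bool nearChildFirst) {
    int current = start;
    while (nodes[current].triCount == 0) {
        // which child we descend into depends on the ray, not on memory layout
        current = nodes[current].child[nearChildFirst ? 0 : 1];
    }
    return nodes[current].firstTri; // scattered reads all the way down
}
```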

I doubt hardware acceleration will go away any time soon, although rasterizers will almost certainly be slowly replaced by raytracers when they become available. The principle of hardware acceleration isn't "my CPU is too slow to do it all on the fly"; the principle is "I don't need to reinvent the wheel, someone else has taken care of the nasty part for me." Writing a 3D engine 10 years ago was hugely different than it is today; we didn't have things like DX or OGL to simplify our lives. The entrails of a 3D engine are vastly complex and take a lot of work to develop. Before hardware-accelerated rasterization there were very few 3D games, and most were based on engines by id Software. Once accelerators arrived, they freed up the developers who were unable to make 3D games before.

Anyways, just a few thoughts... there's still the whole idea of software flexibility to consider.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Hardware-accelerated raytracing has yet to be shown to be truly realtime. Software raytracing is on its way, and for not-too-complex scenes it can even be realtime today.

Software renderers always surpass any hardware in terms of features, scalability, and even actual performance. The last one needs clarification:
There is no way to use algorithms as intelligent and as optimized in hardware as you can in software. Take stencil shadows: in software you can implement them with some sort of c-buffer if you need to. The result looks the same, but it has one big advantage: it doesn't eat (much) fill rate, so you can beat out the biggest bottleneck of Doom 3. Then there are the no-overdraw rasterizers, and much more. You can have tree-like structures for meshes, textures, everything; all stuff hardware isn't able to handle well.
Simply put: a software renderer needs to do technically less work, or does less stupid work, than a hardware renderer. The hardware renderer is just so freakishly fast at its tasks that it can still win in raw performance.
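Here is a minimal sketch of the coverage-buffer ("c-buffer") idea mentioned above, assuming a per-scanline list of filled spans; names and layout are illustrative, not from any particular engine. The point is that a pixel already covered is never shaded again, which is exactly the fill-rate saving being claimed:

```cpp
// Minimal per-scanline coverage buffer: filled, non-overlapping [x0, x1)
// intervals kept sorted by x0. New spans are clipped against it.
#include <algorithm>
#include <list>
#include <utility>

class ScanlineCoverage {
    std::list<std::pair<int,int>> spans; // sorted, non-overlapping intervals
public:
    // Try to insert [x0, x1); returns how many pixels were actually uncovered.
    // A rasterizer would shade only those pixels, so there is no overdraw.
    int insert(int x0, int x1) {
        int newPixels = 0;
        auto it = spans.begin();
        while (x0 < x1) {
            if (it == spans.end() || x1 <= it->first) {
                spans.insert(it, {x0, x1});        // the rest fits in a gap
                newPixels += x1 - x0;
                break;
            }
            if (x0 < it->first) {                  // uncovered gap before this span
                spans.insert(it, {x0, it->first});
                newPixels += it->first - x0;
            }
            x0 = std::max(x0, it->second);         // skip the part already covered
            ++it;
        }
        return newPixels;
    }
};
```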

Will software beat hardware? Yes and no. I think they will get more intermixed: hardware is turning more and more into a data-stream processor, and thus more of a coprocessor. This is needed to further scale rendering speed anyway.

And yes, I think raytracing will come. Radiosity? No, it is just not useful. I think photon mapping has a much bigger chance, or Monte Carlo based raytracing, or a combination of the two.

A general raytracing solution should be able to generate photon maps, do "simple" raytracing, Monte Carlo based raytracing, path tracing, possibly even Metropolis-based solutions without much trouble, just as rasterizers can draw triangles into all sorts of buffers.

We'll see.

(I at least can't wait for my Athlon 64: branching at high speed, something both the P4 and GPUs can't provide.)
If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia

My Page davepermen.net | My Music on Bandcamp and on Soundcloud

Something which has been mentioned a lot of times when I've talked with people about this is the magic of multiple CPUs: giving one task (rasterizing) to one CPU and running the rest of the stuff on another.
That way you get dedicated processing, as with a hardware GPU, with all the flexibility of a CPU thrown in.
There is always the option that, instead of being a true CPU, it's some kind of SIMD chip which is naturally designed for graphics operations but is more flexible than the current GPUs we have, so that it's a bit closer to a CPU and also accesses the same RAM as the CPU does. And if you could unplug one of these chips and replace it with another, like you can with CPUs, even better.
(OK, I admit that idea needs more work/explaining, but I'm only just forming it in my head; I hope you can see where I'm going with it.)
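A rough sketch of that "one core rasterizes, the other runs the game" split; the Scene type and the loop bodies are hypothetical placeholders, and the point is only the division of labour between the two threads:

```cpp
// Sketch: dedicate one thread to software rasterization, keep game logic
// on the other. A real engine would double-buffer the scene snapshot.
#include <atomic>
#include <thread>

struct Scene { /* game-state snapshot handed to the rasterizer */ };

std::atomic<bool> running{true};

void softwareRasterizerLoop(const Scene* scene) {
    while (running.load()) {
        // transform, clip, rasterize the current snapshot into a framebuffer
        // and present it; this thread does nothing else
    }
}

int main() {
    Scene scene;
    std::thread renderer(softwareRasterizerLoop, &scene);

    // Game logic stays on this core; here we just run a fixed number of ticks.
    for (int tick = 0; tick < 1000; ++tick) {
        // input, AI, physics, updates to the scene snapshot
    }

    running = false;   // tell the rasterizer thread to stop
    renderer.join();
}
```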
I'm currently going through "wald_realtime_raytracing_review2003.pdf" - a huge file, tons of interesting things to read. Instant Global Illumination makes its way even into highly occluded environments, and raytracing makes its way onto single CPUs (without subsampling, don't forget! With subsampling, like RealStorm, it's a "piece of cake" today).

On a P4 2.5GHz (non-HT I think, and not one of the FSB800 parts; just imagine a P4EE with HT, an 800MHz FSB and 3.6GHz now, or Prescott soon), they report: "The “ERW6” and “soda” scenes (800 and 2.5 million triangles, respectively) rendered at 1024×1024 pixels on a single Pentium-IV 2.5GHz CPU using the RTRT kernel. Including shading, these scenes run at 2.3 and 1.8 frames per second, respectively. Only tracing the rays – i.e. without shading – RTRT achieves 7.1 respectively 4.1 frames per second, see Table 1." (figure 4, page 5).

That means up to roughly 30fps for shaded scenes at 320×240 (1024×1024 has about 13-14 times as many pixels as 320×240, so 2.3fps scales to around 30fps), withOUT subsampling, on scenes of up to 2.5 million triangles. Not bad for a single-CPU trace, and it's not one of the top-end CPUs either.



If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia

My Page davepermen.net | My Music on Bandcamp and on Soundcloud

quote:Will software rasterizers ever pass hardware?

They have already passed them; they always have. Of course, you have to specify the criterion for that. I think hardware rendering is just way too limiting to do what you actually want to do. Even shaders have limited programmability, and the trouble with gradient instructions is very serious for hardware implementation. Of course, for games, which need real-time rendering and for which an approximation of real 3D is sufficient, hardware rendering rules. We can't deny that. However, games that are just a few years old can now be rendered in software instead of hardware.

The newest CPUs surpass the performance of a TNT2 and are certainly more flexible. I am currently writing a software renderer that is functionally compatible with DirectX 9. The fixed-function pipeline is 64-bit and more precise than any DX8 hardware from ATI or NVidia. The programmable vertex and pixel pipelines are version 2.0 compliant and use IEEE 32-bit floating-point precision at all times. No compromises have been made there. Only my filtering quality is limited to trilinear, but this can easily be improved.

I get maximum performance by using MMX and SSE all the time, and by run-time assembling exactly the instructions that construct the pipeline. At the same time I do near-optimal register allocation, better than what I could do manually, so no time is wasted on branching and unnecessary swapping of data. I wrote the assembler for this myself and it's available under the LGPL: SoftWire. A proof-of-concept from several months ago can be found here: swShader.
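For illustration, a much simplified sketch of the underlying idea: resolve the pipeline's branches once when the render state changes, not once per pixel. This is not SoftWire's actual API (SoftWire goes further and emits real machine code at run time); it is just a stand-in using C++ templates and a function-pointer table, with dummy fog and blend math:

```cpp
// One fully specialized inner loop per state combination; the booleans are
// compile-time constants, so the per-pixel loop contains no branching.
#include <cstdint>

template <bool Fog, bool AlphaBlend>
void drawSpan(std::uint32_t* dst, const std::uint32_t* src, int count) {
    for (int i = 0; i < count; ++i) {
        std::uint32_t c = src[i];
        if (Fog)        c = (c >> 1) & 0x7F7F7F7F;           // stand-in for fog math
        if (AlphaBlend) c = ((c >> 1) & 0x7F7F7F7F)
                          + ((dst[i] >> 1) & 0x7F7F7F7F);     // stand-in for blending
        dst[i] = c;
    }
}

using SpanFn = void (*)(std::uint32_t*, const std::uint32_t*, int);

// Pick the pre-specialized loop once per state change, not once per pixel.
SpanFn selectSpan(bool fog, bool alphaBlend) {
    static const SpanFn table[2][2] = {
        { &drawSpan<false, false>, &drawSpan<false, true> },
        { &drawSpan<true,  false>, &drawSpan<true,  true> },
    };
    return table[fog][alphaBlend];
}
```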
quote:What I would really like to see is the elimination of the whole video/main memory split in favour of a unified memory model, but hey, I don't design the stuff.

Isn't that what the nForce chipset from NVidia does? But I don't see the need for it if you have a separate graphics card.
quote:Original post by Palidine
Many graphics operations are inherently parallelizable. For instance, one very often applies the same matrix transform to many vertices at the same time. The hardware is set up to do this in parallel, so you get many vertex transforms at once (I forget the number on the best cards). General-purpose CPUs are largely serial; even the hyper-threaded ones won't take advantage of the parallelism of graphics transforms unless you specifically create separate threads to transform the vertices. Even doing that, the maximum number of simultaneous transforms is going to be low compared to the GPU.

Current Hyper-Threading technology is just a start. We can run two threads in parallel now, but we still have the same number of execution units, so the only benefit it brings today is that the available execution units are used more efficiently.

But there is a great future for Hyper-Threading. There is theoretically no problem with running -all- the threads in parallel, and with extra execution units there is no reason why it couldn't approach the performance of GPUs. Plus, this CPU would still be totally programmable instead of hard-wired! With 0.09 micron processes, this could easily be implemented. And with Hyper-Threading you can save on branch prediction logic and invest in decoders. Long latency is not much of a problem any more, so you can increase the clock frequency even higher. Also, the FPU will become legacy because of its bad design, and SSE will be used all around. Expect big things from Prescott and its successors!

