State of the art in real-time software rendering?

2 minutes ago, Green_Baron said:

And I just wouldn't run a graphical surface at all on a safety-critical system. What for?

Many safety-critical computer vision tasks require image filtering, geometry transformation and texture sampling without ever drawing to a screen. Instead, the input is a camera stream and the output controls engines.

Fighter jets also need a GUI for showing the missile tracker to the pilot.


If you want to compare numbers:

  • what is the performance of your rasterizer (ms/frame)?
  • what hardware are you running on?
  • what test scenes have you used?

I've posted my one-billion-triangle real-time rasterization here somewhere; not sure where it has gone. I think it was in "screenshot of the day" or something. There were framerate numbers.

12 hours ago, Dawoodoz said:

* Ubuntu often comes with software-emulated GPU drivers that are many times slower for 2D overlays. Installing drivers is usually a month's work of rebooting the X server and manually modifying and recompiling the Linux kernel with a level 4 Nvidia support technician. Even our company's senior Linux admin called it a real headache. Now imagine a 12-year-old gamer given the same task with one hour of patience before rebooting into Windows.

I used to work at Canonical (the makers of Ubuntu) on 2D overlay software (the Unity desktop, the Mir display server). We had a full-time expert who worked with all the GPU vendors to make installing the proprietary drivers a smooth and hassle-free experience, and the vendors would fall over themselves to help (Linux is a huge market for the GPU vendors, what with commercial render farms, AI applications and bitcoin mining, unlike Windows, where the markup on consumer-grade hardware is negligible). Using vendor-supplied graphics drivers on Ubuntu has been a painless and easy experience for at least a decade now, and it's only gotten easier.

Ubuntu didn't install the proprietary drivers out of the box for licensing reasons.

12 hours ago, Dawoodoz said:

* In safety-critical systems, the customer will usually specify "CPU only" because of safety concerns.

I now work for a popular real-time embedded OS company on safety-critical systems. Customers are falling all over themselves trying to get GPU-based software to run on the virtual machines (!) they install in safety-critical systems. Of course, the truly hard-core safety systems don't have any kind of graphical display (think: the brakes in your car or the controller for the stabilizer trim on your 737 MAX), and the kernel itself bans even floating-point operations. The rest (lane-deviation detection systems, cockpit displays, CAT scanners) all use GPUs and SSE/NEON SIMD extensions like there's no tomorrow.

Honestly, I think it's scary what they (and the IEC 61508 / ISO 26262 auditors) consider safe. Stay off the roads and move to a cave, folks; it's all gonna collapse some day.

In the meantime, I encourage you to pursue a CPU-only rendering engine. I knew someone who worked for a company that did just that and could beat GPU-based software in many specialized cases ten years ago. It turned out there wasn't enough margin in it, and the company went in the direction of a side project in animation software that was actually making money. I think there's probably a niche market for what you're trying to do.

Stephen M. Webb
Professional Free Software Developer

57 minutes ago, Krypt0n said:

If you want to compare numbers:

  • what is the performance of your rasterizer (ms/frame)?
  • what hardware are you running on?
  • what test scenes have you used?

I've posted my one-billion-triangle real-time rasterization here somewhere; not sure where it has gone. I think it was in "screenshot of the day" or something. There were framerate numbers.

My perspective 3D rasterizer is not optimized yet, because what the CPU is really good at is isometric graphics, where you don't need a floating-point division per pixel.

The CPU used for testing is an old quad-core Intel Core i5-4690K, which I treat as the minimum requirement. Performance scales up well with more cores, so it ran a lot faster on an octa-core i7, which might become the recommended specification. Planned optimizations could make multiple lights faster by processing highly overlapping light sources in the same pass, reducing the cache load from reading positions and normals.

Results on my old quad-core i5 running the isometric deferred engine:

  • 940x512 runs at around 13 ms per frame with two dynamic global point lights and unlimited detail level.
  • 1920x1080 can currently handle one dynamic point light at around 17 ms per frame, so this resolution is currently too much for a quad-core but should be no problem for an octa-core.

Isometric 2.5D to take advantage of the CPU's cache prediction
Each model is pre-rendered into diffuse, height and normal maps. The baking application generates two triangles per pixel in the source texture, so baked models can use more polygons than a RenderMan scene. Maximum-height drawing then uses the diffuse, depth and normal buffers to render 3D models as 2D sprites, which the CPU is really good at because it reads memory in order.
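
A rough sketch of that per-pixel maximum-height test (the struct layout and names here are made up for illustration, not the engine's actual code):

```cpp
#include <cstdint>
#include <vector>

// Pre-rendered model data and the screen-sized G-buffer it is composited into.
struct Sprite {
    int width = 0, height = 0;
    std::vector<uint32_t> diffuse;    // packed RGBA
    std::vector<uint32_t> normal;     // packed normal
    std::vector<float>    heightMap;  // height per pixel
};

struct GBuffer {
    int width = 0, height = 0;
    std::vector<uint32_t> diffuse;
    std::vector<uint32_t> normal;
    std::vector<float>    heightMap;
};

// Composite 'sprite' into 'target' at (originX, originY), keeping the tallest sample.
// The height comparison doubles as a depth test for the isometric view, and rows are
// visited in order so the prefetcher can stream the buffers linearly.
void drawMaxHeight(GBuffer& target, const Sprite& sprite,
                   int originX, int originY, float baseHeight) {
    for (int y = 0; y < sprite.height; ++y) {
        int ty = originY + y;
        if (ty < 0 || ty >= target.height) continue;
        for (int x = 0; x < sprite.width; ++x) {
            int tx = originX + x;
            if (tx < 0 || tx >= target.width) continue;
            int s = y * sprite.width + x;
            int t = ty * target.width + tx;
            float h = baseHeight + sprite.heightMap[s];
            if (h > target.heightMap[t]) {   // keep the highest surface
                target.heightMap[t] = h;
                target.diffuse[t]   = sprite.diffuse[s];
                target.normal[t]    = sprite.normal[s];
            }
        }
    }
}
```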

Background
The static background still gets dynamic light by storing blocks of 512² diffuse, normal and height images. These are generated from background sprites while the camera moves, which allows having one sprite per brick in a castle if you want. Drawing the background is then mostly a sequence of memcpy calls, which is fast even when not aligned.
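
A minimal sketch of such a block copy (assuming 32-bit pixels; the helper name and clipping are mine, not the engine's actual code):

```cpp
#include <cstdint>
#include <cstring>

// Copy one pre-composited 512x512 background block into the frame buffer.
// Each row is a single contiguous memcpy, which is fast even without alignment.
constexpr int kBlockSize = 512;

void blitBlock(uint32_t* frame, int frameWidth, int frameHeight,
               const uint32_t* block, int destX, int destY) {
    for (int row = 0; row < kBlockSize; ++row) {
        int y = destY + row;
        if (y < 0 || y >= frameHeight) continue;
        // Clip horizontally so the copy stays inside the frame.
        int srcStart = destX < 0 ? -destX : 0;
        int dstStart = destX < 0 ? 0 : destX;
        int count = kBlockSize - srcStart;
        if (dstStart + count > frameWidth) count = frameWidth - dstStart;
        if (count <= 0) continue;
        std::memcpy(frame + y * frameWidth + dstStart,
                    block + row * kBlockSize + srcStart,
                    count * sizeof(uint32_t));
    }
}
```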

Light pass
The screen buffers for diffuse, normal, height and light all use 32 bits per pixel so that they align well when illuminating with deferred light sources in multi-threaded SIMD filters. Once the light image is ready, it is simply multiplied by diffuse times two to give the bright spots a little HDR feeling. I tried doing the multiplication with 16-bit integers, but Intel CPUs compute so fast relative to memory that it made no difference to performance.
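
As a sketch of that combine step (made-up function name, assuming packed RGBA8 buffers and a pixel count that is a multiple of four), the SSE2 version could look like this:

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2

// Final combine of the deferred light pass: out = saturate(diffuse * light * 2),
// processing four packed RGBA8 pixels per iteration.
void combineLight(uint32_t* out, const uint32_t* diffuse,
                  const uint32_t* light, int pixelCount) {
    const __m128i zero = _mm_setzero_si128();
    for (int i = 0; i < pixelCount; i += 4) {
        __m128i d = _mm_loadu_si128(reinterpret_cast<const __m128i*>(diffuse + i));
        __m128i l = _mm_loadu_si128(reinterpret_cast<const __m128i*>(light + i));

        // Widen 8-bit channels to 16 bits so the multiply cannot overflow.
        __m128i dLo = _mm_unpacklo_epi8(d, zero), dHi = _mm_unpackhi_epi8(d, zero);
        __m128i lLo = _mm_unpacklo_epi8(l, zero), lHi = _mm_unpackhi_epi8(l, zero);

        // (diffuse * light * 2) / 256 collapses to a single shift right by 7.
        __m128i rLo = _mm_srli_epi16(_mm_mullo_epi16(dLo, lLo), 7);
        __m128i rHi = _mm_srli_epi16(_mm_mullo_epi16(dHi, lHi), 7);

        // Pack back to 8 bits with saturation, which clips the over-bright spots.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i),
                         _mm_packus_epi16(rLo, rHi));
    }
}
```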

Broad-phase
Culling for sprite drawing and light rendering uses eight dynamically expanding octrees, one per direction from the origin, so the caller is free to place models at any location. Tiles were used in this example just to fill the background until I have a rendering system for ground and vegetation.
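
A possible shape for that octant split, where the sign of each coordinate picks one of the eight roots (the Octree body is only a placeholder here):

```cpp
#include <array>
#include <vector>

struct Vec3 { float x, y, z; };

// Placeholder: each root octree only ever grows away from the origin, so it can
// expand dynamically without re-rooting.
struct Octree {
    std::vector<int> models;  // ... real nodes, bounds, etc. would live here ...
    void insert(int modelIndex, const Vec3& /*position*/) { models.push_back(modelIndex); }
};

struct BroadPhase {
    std::array<Octree, 8> octants;  // index bit 0 = +x, bit 1 = +y, bit 2 = +z

    static int octantIndex(const Vec3& p) {
        return (p.x >= 0.0f ? 1 : 0) | (p.y >= 0.0f ? 2 : 0) | (p.z >= 0.0f ? 4 : 0);
    }

    void insert(int modelIndex, const Vec3& position) {
        octants[octantIndex(position)].insert(modelIndex, position);
    }
};
```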

Future plans
I plan to add the ability to combine polygon animation of characters with the pre-rendered sprites, so that animations can adapt well to bone animation.

[Attached image: Isometric_deferred_light.png]

1 hour ago, Bregma said:

and the kernel itself bans even floating-point operations.

I did that in my GUI rendering script by creating a fixed-point 16.16 type. This ensures 100% bit-exact determinism for the 2D overlays.
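
For reference, a minimal 16.16 type in C++ could look something like this (a generic sketch, not the actual type from that script):

```cpp
#include <cstdint>

// Minimal 16.16 fixed-point type: 16 integer bits, 16 fractional bits, stored in a
// 32-bit integer. All arithmetic stays integer-only, so results are bit-exact on
// every CPU regardless of floating-point hardware or compiler flags.
struct Fixed16_16 {
    int32_t raw;  // stored value is (real value * 65536)

    static Fixed16_16 fromInt(int32_t i) { return { i * 65536 }; }
    static Fixed16_16 fromRaw(int32_t r) { return { r }; }

    Fixed16_16 operator+(Fixed16_16 o) const { return { raw + o.raw }; }
    Fixed16_16 operator-(Fixed16_16 o) const { return { raw - o.raw }; }

    // Widen to 64 bits for the intermediate product/dividend, then shift back down.
    Fixed16_16 operator*(Fixed16_16 o) const {
        return { static_cast<int32_t>((static_cast<int64_t>(raw) * o.raw) >> 16) };
    }
    Fixed16_16 operator/(Fixed16_16 o) const {
        return { static_cast<int32_t>((static_cast<int64_t>(raw) << 16) / o.raw) };
    }

    int32_t toInt() const { return raw >> 16; }  // rounds toward negative infinity
};
```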

That looks really nice, and the performance seems to be good, but it's also a very specialized implementation; it would be hard to compare unless I implemented all the shading etc. as well.

Just now, Krypt0n said:

That looks really nice, and the performance seems to be good, but it's also a very specialized implementation; it would be hard to compare unless I implemented all the shading etc. as well.

The main drawback is that all shaders must be hand-coded using SIMD vectorization, preferably as separate full-screen passes. However, the engine comes with a hardware abstraction layer for SIMD and tries to stay within the SSE/NEON intersection. This is also why LLVM-based OpenGL implementations cannot be optimal on the CPU: automatic vectorization is difficult for compilers, and only some shaders can be vectorized efficiently.
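
To make that concrete, a tiny wrapper that stays within the SSE/NEON intersection could look like this (a sketch with made-up names, not the engine's real abstraction layer):

```cpp
// One 4-lane float type with the same interface on both instruction sets, so a
// full-screen shader pass is written once against the wrapper.
#if defined(__ARM_NEON)
    #include <arm_neon.h>
    struct F32x4 {
        float32x4_t v;
        static F32x4 load(const float* p) { return { vld1q_f32(p) }; }
        void store(float* p) const        { vst1q_f32(p, v); }
        F32x4 operator+(F32x4 o) const    { return { vaddq_f32(v, o.v) }; }
        F32x4 operator*(F32x4 o) const    { return { vmulq_f32(v, o.v) }; }
    };
#else
    #include <xmmintrin.h>
    struct F32x4 {
        __m128 v;
        static F32x4 load(const float* p) { return { _mm_loadu_ps(p) }; }
        void store(float* p) const        { _mm_storeu_ps(p, v); }
        F32x4 operator+(F32x4 o) const    { return { _mm_add_ps(v, o.v) }; }
        F32x4 operator*(F32x4 o) const    { return { _mm_mul_ps(v, o.v) }; }
    };
#endif

// A pass written against the wrapper: out = a * b + c, four floats at a time
// (count is assumed to be a multiple of 4).
void fullscreenMulAdd(float* out, const float* a, const float* b,
                      const float* c, int count) {
    for (int i = 0; i < count; i += 4) {
        (F32x4::load(a + i) * F32x4::load(b + i) + F32x4::load(c + i)).store(out + i);
    }
}
```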

Maybe this article will help you write CPU shaders that compile to SIMD more easily:

https://software.intel.com/en-us/articles/easy-simd-through-wrappers

20 hours ago, Dawoodoz said:

There are many odd systems out there that cannot access the GPU properly, so that's the reason for my distinction.

* Google tried to block OpenCL on Android in order to promote their own RenderScript, which is not nearly as good.

* Ubuntu often comes with software-emulated GPU drivers that are many times slower for 2D overlays. Installing drivers is usually a month's work of rebooting the X server and manually modifying and recompiling the Linux kernel with a level 4 Nvidia support technician. Even our company's senior Linux admin called it a real headache. Now imagine a 12-year-old gamer given the same task with one hour of patience before rebooting into Windows.

* In safety-critical systems, the customer will usually specify "CPU only" because of safety concerns.

* I also developed firmware for a platform where the CPU outperformed its low-end GPU because of missing essential OpenGL extensions.

Fair enough. Just thought I'd point some of those things out, though, since they could still potentially work on a CPU as well, given that they're not relying on hardware rasterization (or at least not nearly as much). I would also take a look at some of the recent work on raycasting into voxel grids, and at the cool things people have been doing with directly rendering signed distance fields.
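
As a rough illustration of the latter, a bare-bones sphere-tracing loop over a signed distance field looks something like this (the single-sphere scene is purely for the example):

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 add(Vec3 a, Vec3 b)    { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static Vec3 scale(Vec3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }
static float length(Vec3 a)        { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }

// Example scene: a unit sphere at the origin. Any signed distance function works here.
static float sceneSDF(Vec3 p) { return length(p) - 1.0f; }

// March a ray through the distance field, stepping by the distance to the nearest
// surface each time. Returns the hit distance along the ray, or -1 on a miss.
float sphereTrace(Vec3 origin, Vec3 dir, int maxSteps = 64, float maxDist = 100.0f) {
    float t = 0.0f;
    for (int i = 0; i < maxSteps; ++i) {
        float d = sceneSDF(add(origin, scale(dir, t)));
        if (d < 1e-4f) return t;   // close enough to the surface: hit
        t += d;                    // safe step: nothing is closer than d
        if (t > maxDist) break;
    }
    return -1.0f;                  // miss
}
```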

6 hours ago, Dawoodoz said:

The main drawback is that all shaders must be hand-coded using SIMD vectorization, preferably as separate full-screen passes. However, the engine comes with a hardware abstraction layer for SIMD and tries to stay within the SSE/NEON intersection. This is also why LLVM-based OpenGL implementations cannot be optimal on the CPU: automatic vectorization is difficult for compilers, and only some shaders can be vectorized efficiently.

I may not be understanding the issues here, but I don't think there's anything stopping you from doing the same "1 shader thread per SIMD lane" style of vectorization that's used on GPUs. ISPC supports this model natively, with the added bonus that it's explicit about scalar vs. vector operations.
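
To illustrate that model, here's a hand-rolled C++/SSE sketch of what ISPC effectively generates for you: the same "scalar" per-pixel program runs across four lanes, and a branch becomes a per-lane mask plus a blend (the shader parameters here are made up):

```cpp
#include <xmmintrin.h>  // SSE

// Per-pixel program: "if (p > threshold) p *= brightness;" executed on four pixels
// at once. count is assumed to be a multiple of 4.
void thresholdBoost(float* pixels, int count, float threshold, float brightness) {
    const __m128 thresh = _mm_set1_ps(threshold);
    const __m128 boost  = _mm_set1_ps(brightness);
    for (int i = 0; i < count; i += 4) {
        __m128 p = _mm_loadu_ps(pixels + i);
        __m128 mask    = _mm_cmpgt_ps(p, thresh);   // per-lane condition
        __m128 boosted = _mm_mul_ps(p, boost);      // both branch sides are computed
        // Select 'boosted' where the mask is set, the original value elsewhere.
        __m128 result  = _mm_or_ps(_mm_and_ps(mask, boosted),
                                   _mm_andnot_ps(mask, p));
        _mm_storeu_ps(pixels + i, result);
    }
}
```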

This topic is closed to new replies.
