Increased frame time with less workload

10 comments, last by thatguyfromthething 8 years, 10 months ago

Hi all

As my title already suggests, I am having a slightly weird problem. First of all, let me show you a picture of what I'm talking about.

This is how the scene normally looks right now:

DzslL8e.png

Currently there is virtually no optimization, just frustum culling and "not-rendering" of invisible blocks. As you can see, it renders at over 250 FPS. However, if I turn 90° to the left, watch what happens:

bHtKBbu.png

There are fewer triangles rendered (indices/3) and also fewer draw calls in the second image, but the frame time nearly doubled. I am baffled why this happens. The geometry is exactly the same, with the exact same rendering setup and the same shaders, and there is no branching in the shaders that might only happen in the second picture. It also doesn't depend on the light direction: when I invert the light direction so that the first image gets hit by specular light and the second is that dull gray, the effect is the same. The maps are also generated randomly at each start, so a particular map can be ruled out as well.

I really wonder what could cause this issue. Does anyone have an idea what I should start checking? I already did a lot of CPU profiling, spending a lot of time in each of the two situations and comparing the render loop for changes, but so far I haven't had any luck. Could it be something on the GPU, even if everything is pretty much identical?

Thanks in advance,

Plerion


Is backface culling enabled?

SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.

Oh, I thought it was, but actually it wasn't. I forgot to turn it back on after the last experiments, which makes it even stranger to me, since it means every triangle went through the entire stage. I have enabled it now, but the difference remains, just at a higher level:

PQcNm1I.png

Wqaq4Kp.png

One's taking 2.67 milliseconds, and the other one is taking 3.61 milliseconds. FPS can give misleading measurements. Or rather, FPS makes the measurements seem more dramatic than they already are.

Losing a single frame from 20 FPS to 19 FPS is a much bigger deal than losing a single frame from 100 FPS to 99 FPS.

That said, you're still measuring some difference (one is taking about 1 millisecond more per frame).

Your two screenshots look different though. One looks like the blocks are casting shadows (or is that AO?) with some gradient shading and rounded edges, and the other is really sharp with flat shading. They look significantly different, at least to my beginner's eye.

Are you positive you didn't accidentally map your rotation keys also to toggling on/off special effects?

Yes, I'm aware of the FPS vs. frame time difference. I also calculated that 1 ms difference, which seemed strange to me.

The "slow" version is facing in the direction where the specular reflection on the blocks is at its maximum; the "fast" version is 180° in the other direction, with absolutely no specular lighting. The shadows are AO, and everything is rendered using the same code on CPU and GPU. Since AO and the exact lighting aren't fully implemented yet, it looks a bit odd: the specular lighting seems to trump the AO, which makes the two views look very different. The important part is that there is currently only one path for rendering the blocks, and it uses all the same states.

I suspected that it might be that in one direction my frustum/box intersection code might get more cases where it can early out, but profiling both situations over a longer time period yielded no major difference (fast version: 2.6% of the samples in intersection, slow version 3.1% of the samples in intersection, the difference might just be because one version was sampled over a longer period).
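For readers wondering what "early out" means here, this is a minimal sketch of a common plane-based frustum/box test. It is not the actual Math::Frustum::intersects from my code; the types Vec3, Plane and Aabb are hypothetical, and the plane convention (a point is inside when dot(n, p) + d >= 0) is an assumption.

```cpp
#include <array>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };   // assumed: inside when dot(n, p) + d >= 0
struct Aabb { Vec3 lo, hi; };        // minimum and maximum corner of the box

// Conservative frustum/box test: returns false as soon as one plane
// has the entire box on its outside.
bool intersects(const std::array<Plane, 6>& frustum, const Aabb& box) {
    for (const Plane& p : frustum) {
        // Pick the box corner that lies farthest along the plane normal.
        const Vec3 v{ p.n.x >= 0.0f ? box.hi.x : box.lo.x,
                      p.n.y >= 0.0f ? box.hi.y : box.lo.y,
                      p.n.z >= 0.0f ? box.hi.z : box.lo.z };
        // If even that corner is behind the plane, the whole box is outside.
        if (p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0.0f) {
            return false;            // early out, remaining planes are skipped
        }
    }
    return true;                     // inside or intersecting (may over-accept)
}
```

How many planes get checked before the early out depends on the view direction and the plane order, which is why I suspected this function first.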

Here is what the profiler says for both versions, looking at the functions doing the most work (exclusive samples).

Fast version:

something in nvoglv32.dll (the actual rendering, I guess) -> 28.67%
NtGdiDdDDIEscape (no idea what that is, I have never seen it in any other project) -> 26.09%
NtGdiDdDDIGetDeviceState (same) -> 9.59%
Math::Frustum::intersects -> 4.34%
RtlQueryPerformanceCounter -> 3.16%

Slow version (same functions, same order):

something in nvoglv32.dll -> 28.59%
NtGdiDdDDIEscape -> 26.22%
NtGdiDdDDIGetDeviceState -> 9.74%
Math::Frustum::intersects -> 4.31%
RtlQueryPerformanceCounter -> 3.40%

So it is pretty much the same for CPU sampling.

I would guess that one image is simply drawing more pixels than the other.

What's your overdraw like? Are the triangles drawn in any particular order?

@Hodgman I tried to find two positions where they have about the same frame rate and frame time:

The so called "slow" facing:

YUKQrG1.png

The "fast" facing:

xgpwW2n.png

The frame times are very interesting, I think. "Pre Frame Time" is a slightly misleading title: it's the actual time from glClear until SwapBuffers. "Post Frame Time" is the time from the beginning of SwapBuffers to the end of SwapBuffers. SwapBuffers is also what calls NtGdiDdDDIEscape, which takes up a large share of the samples in the profiler. This is also where the difference shows up when I'm using roughly the same part of the landscape for both facings: the slower one spends 2.6 ms in SwapBuffers, the faster one 1.6 ms. The "Pre Frame Time" (the drawing itself) is proportional to the number of indices/draw calls.

I assume the above is because the draw commands are actually executed when SwapBuffers flushes the queue? But what is this NtGdiDdDDIEscape function that's using up most of the time in SwapBuffers according to the profiler?

I suspect your low frame rate is hit when you look in a direction that causes polys to be drawn to the viewport back to front.

A depth prepass would be an easy test.
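Another cheap way to check the back-to-front theory is to sort the visible chunks by distance from the camera before issuing draw calls, so near geometry fills the depth buffer first and farther fragments can be rejected early. A minimal sketch follows; the Chunk record and its center field are hypothetical stand-ins for whatever the engine actually stores, and this is a substitute experiment rather than a depth prepass.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical per-chunk record: only the centre is needed for sorting.
struct Chunk {
    int id;
    Vec3 center;
};

// Squared distance between two points (no sqrt needed just for ordering).
float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Reorder the visible chunks so the ones nearest the camera are drawn first.
void sortFrontToBack(std::vector<Chunk>& visible, const Vec3& cameraPos) {
    std::sort(visible.begin(), visible.end(),
              [&cameraPos](const Chunk& a, const Chunk& b) {
                  return distanceSquared(a.center, cameraPos) <
                         distanceSquared(b.center, cameraPos);
              });
}
```

If the frame time in the "slow" direction drops after sorting, overdraw from unlucky draw order is a likely culprit. Reversing the comparison gives a deliberate worst case to compare against.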
Is your hidden surfaces removal aware of other chunks or only the current one? (assuming chunks)

(Are hidden surfaces on chunks edges removed?)
That could be why render order might matter.
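To illustrate the question, here is a rough sketch of a face test that looks into the neighbouring chunk at a chunk edge. Everything in it is assumed for illustration: the chunk size kSize, the flat bool storage, and the restriction to the +X face only.

```cpp
#include <array>
#include <cstddef>

constexpr int kSize = 4;  // hypothetical chunk edge length in blocks

// A chunk stored as a flat array of solid/empty flags (all empty by default).
struct Chunk {
    std::array<bool, kSize * kSize * kSize> solid{};

    static std::size_t index(int x, int y, int z) {
        return static_cast<std::size_t>((z * kSize + y) * kSize + x);
    }
    bool at(int x, int y, int z) const { return solid[index(x, y, z)]; }
    void set(int x, int y, int z, bool v) { solid[index(x, y, z)] = v; }
};

// Should the +X face of the block at (x, y, z) be emitted?
// neighborPosX is the chunk adjacent in +X, or nullptr at the world edge.
bool isFaceVisiblePosX(const Chunk& chunk, const Chunk* neighborPosX,
                       int x, int y, int z) {
    if (x + 1 < kSize) {
        return !chunk.at(x + 1, y, z);       // neighbour block is in this chunk
    }
    if (neighborPosX != nullptr) {
        return !neighborPosX->at(0, y, z);   // look across the chunk border
    }
    return true;                             // nothing beyond: keep the face
}
```

If the mesher only ever takes the first branch and treats the chunk border as empty, every border face gets emitted even when it is buried, and those faces are pure overdraw whose cost depends on draw order.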

o3o

Not really about the problem, but for more visual realism, even with just the light you have: consider all blocks dark/black from the start, then only add positive diffuse/specular, multiplied by occlusion (normalized from 0 to 1). That way occlusion only reduces added light, and neither diffuse nor specular can subtract brightness.

This topic is closed to new replies.
