Measuring frame time with AFR enabled

1 comment, last by Paxi 9 years ago

I am currently measuring the performance of a scene with about 80k vertices that implements soft shadow mapping with a PCF filter.

When I measure the frame time with SLI enabled (using "Force alternate frame rendering 1"), both GPUs are utilized, but performance drops. At low resolution, where I get 1000+ FPS, I can understand this as GPU synchronization overhead. What I don't understand is why performance is still lower when I increase the rendering load and the resolution. I have already looked at the NVIDIA SLI optimization guide (http://http.download.nvidia.com/developer/presentations/2005/GDC/OpenGL_Day/OpenGL_SLI.pdf), but none of the issues described in that presentation should apply here.

Can anyone tell me what I might be missing? Do I need an SLI profile for my application?

I am using two GeForce GTX 970 cards in SLI with the latest drivers installed.


PS: I am trying to investigate this with NVIDIA Nsight, but it does not capture any GPU frames when "Force alternate frame rendering 1" is enabled.

PS2: Meanwhile I found that glGetQueryObjectui64v takes longer when SLI is enabled. I noticed this because the FPS increases as expected, but the frame time reported through glGetQueryObjectui64v does not. I am now double-buffering my queries, which probably gives more accurate results (http://www.lighthouse3d.com/tutorials/opengl-short-tutorials/opengl-timer-query/).

...

Meanwhile, I assume the problem is simply related to the way I am measuring.

So what I actually want to know is how to correctly measure frame time with AFR enabled using the OpenGL timer_query object, without stalling the GPU or CPU.
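One common way to avoid stalling is to never call glGetQueryObjectui64v with GL_QUERY_RESULT on a query that is still in flight, and instead first ask GL_QUERY_RESULT_AVAILABLE whether it has finished. The sketch below is a hypothetical helper, not an existing API: it keeps issued query ids in submission order and only reads the oldest one once it reports available. To keep it testable without a GL context, the two GL calls are abstracted as callbacks; in real code isAvailable would wrap glGetQueryObjectiv(id, GL_QUERY_RESULT_AVAILABLE, &avail) and readNs would wrap glGetQueryObjectui64v(id, GL_QUERY_RESULT, &ns).

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical non-blocking reader for GL_TIME_ELAPSED queries (C++17).
// Callbacks stand in for the real GL calls (assumption, see lead-in):
//   isAvailable(id) -> glGetQueryObjectiv(id, GL_QUERY_RESULT_AVAILABLE, &avail)
//   readNs(id)      -> glGetQueryObjectui64v(id, GL_QUERY_RESULT, &ns)
class NonBlockingTimerReader {
public:
    using Id = std::uint32_t;

    NonBlockingTimerReader(std::function<bool(Id)> isAvailable,
                           std::function<std::uint64_t(Id)> readNs)
        : isAvailable_(std::move(isAvailable)), readNs_(std::move(readNs)) {}

    // Call after glEndQuery. The id must not be reused by glBeginQuery
    // until poll() has returned its result.
    void issued(Id id) { pending_.push_back(id); }

    // Returns the oldest finished result in milliseconds, or nothing if the
    // oldest query is still in flight. Results come back in issue order and
    // this never waits on the GPU.
    std::optional<double> poll() {
        if (pending_.empty() || !isAvailable_(pending_.front())) {
            return std::nullopt;
        }
        const std::uint64_t ns = readNs_(pending_.front());
        pending_.pop_front();
        return static_cast<double>(ns) / 1.0e6;  // ns -> ms
    }

    std::size_t inFlight() const { return pending_.size(); }

private:
    std::function<bool(Id)> isAvailable_;
    std::function<std::uint64_t(Id)> readNs_;
    std::deque<Id> pending_;
};
```

Reading strictly in issue order keeps the measured frames in sequence, at the cost of waiting for the oldest query even if a newer one (for example from the other GPU in AFR) finished first.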


Query objects can affect SLI behavior: you usually want to read a query result N frames after it was submitted, where N is the number of GPUs in the AFR group. There should be a more up-to-date SLI best-practices document from NVIDIA floating around the web, but I don't remember its exact name or have a link.
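The "read N frames later" rule can be sketched as a ring of N query slots: at the start of each frame, read the slot you are about to reuse (it was issued exactly N frames earlier), then begin the new query in it. The class below is a hypothetical illustration of that schedule only; the GL calls and the `ids` array in the comment are assumptions about how it would be wired up.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical schedule for AFR timer queries: a ring of N slots, where N is
// the number of GPUs in the AFR group (must be >= 1). Each frame reads the
// slot it is about to reuse, i.e. the query issued exactly N frames earlier.
//
// Per-frame use with real GL calls (sketch, `ids` is a hypothetical GLuint[N]):
//   auto p = ring.plan(frame);
//   if (p.readFrame) glGetQueryObjectui64v(ids[p.slot], GL_QUERY_RESULT, &ns);
//   glBeginQuery(GL_TIME_ELAPSED, ids[p.slot]); /* draw */ glEndQuery(GL_TIME_ELAPSED);
class AfrQueryRing {
public:
    explicit AfrQueryRing(std::size_t gpuCount) : n_(gpuCount) {}

    struct FramePlan {
        std::size_t slot;                        // slot to read, then reuse
        std::optional<std::uint64_t> readFrame;  // frame whose result is read
    };

    FramePlan plan(std::uint64_t frame) const {
        FramePlan p{static_cast<std::size_t>(frame % n_), std::nullopt};
        if (frame >= n_) {
            p.readFrame = frame - n_;  // the slot was last written N frames ago
        }
        return p;
    }

private:
    std::size_t n_;
};
```

With two GPUs, frame 5 reuses slot 1 and reads the result of frame 3, so each GPU is only ever asked about work it submitted one full AFR cycle earlier.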

Thanks for your input!
You may be thinking of these slides: http://developer.download.nvidia.com/whitepapers/2011/SLI_Best_Practices_2011_Feb.pdf, right?

In the meantime I am quad-buffering my queries (the code is not final yet, but it works for a hardcoded quad-buffered example):


#include <cstddef>  // std::size_t
// Assumes a current OpenGL 3.3+ context with GL functions loaded (e.g. via a loader such as GLEW).

// Ring buffer of GL_TIME_ELAPSED queries: each frame begins a query in one slot
// and reads back the oldest slot, which was issued BufferCount - 1 frames earlier.
template<std::size_t QueryCount, std::size_t BufferCount = 4>
class BufferedAsyncTimer {
public:
	void init() {
		for (std::size_t i = 0; i < BufferCount; ++i)
			glGenQueries(static_cast<GLsizei>(QueryCount), mQueries[i]);
		mWrite = 0;
		mIssued = 0;
	}

	void start() {
		glBeginQuery(GL_TIME_ELAPSED, mQueries[mWrite][0]);
	}

	void stop() {
		glEndQuery(GL_TIME_ELAPSED);
	}

	// Call once per frame after stop(). Returns -1.0 for the first
	// BufferCount - 1 frames, while the oldest slot has never been issued.
	double getElapsedTimeMS() {
		if (mIssued < BufferCount)
			++mIssued;
		// The next slot to be written holds the oldest issued query.
		mWrite = (mWrite + 1) % BufferCount;
		if (mIssued < BufferCount)
			return -1.0;

		GLuint64 result = 0;
		glGetQueryObjectui64v(mQueries[mWrite][0], GL_QUERY_RESULT, &result);
		return static_cast<double>(result) / 1000000.0;  // ns -> ms
	}

private:
	GLuint mQueries[BufferCount][QueryCount];
	std::size_t mWrite = 0;   // slot used by the next start()
	std::size_t mIssued = 0;  // queries issued so far, capped at BufferCount
};

With AFR enabled, the measured frame time is no longer slower, but it is exactly the same as with a single GPU.
However, the FPS increases from ~600 (single GPU) to ~930 even while the results are being read back, so there should be no more stalls. I guess there is still some small thing I am missing...
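One likely reading of these numbers, assuming GL_TIME_ELAPSED measures only the GPU work of one frame on whichever GPU rendered it: AFR overlaps consecutive frames on different GPUs, which shortens the interval between frames (higher FPS) but not the time a single frame spends on its GPU. An unchanged per-frame query time alongside higher FPS would then be expected. The small helper below (hypothetical, for illustration) just converts the two reported FPS values into frame intervals and a scaling factor.

```cpp
// Converts FPS figures into frame intervals and an AFR scaling factor.
struct AfrScaling {
    double singleIntervalMs;  // time between frames with one GPU
    double afrIntervalMs;     // time between frames with AFR
    double speedup;           // afrFps / singleFps
    double efficiency;        // speedup / gpuCount (1.0 = perfect scaling)
};

AfrScaling afrScaling(double singleFps, double afrFps, int gpuCount) {
    AfrScaling s{};
    s.singleIntervalMs = 1000.0 / singleFps;
    s.afrIntervalMs = 1000.0 / afrFps;
    s.speedup = afrFps / singleFps;
    s.efficiency = s.speedup / gpuCount;
    return s;
}
```

For ~600 and ~930 FPS on two GPUs this gives about 1.67 ms versus 1.08 ms between frames, a 1.55x speedup, or roughly 77.5% of ideal two-GPU scaling. The frame interval shrinks while the per-frame GPU time reported by the query need not.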

This topic is closed to new replies.
