Erik Rufelt

Member Since 17 Apr 2002
Online Last Active Today, 03:07 PM

#5208885 Next-Gen OpenGL To Be Shown Off Next Month

Posted by Erik Rufelt on 05 February 2015 - 08:51 AM

So, if GLNext comes out soon and is technically equal to Mantle, then maybe it will end up with the first strike bonus, and will kill off Mantle before it's even born.


Didn't AMD offer Mantle as a base for GLNext?

One possibility is that AMD simply settled with NV and Intel and merged with GLNext instead of trying to compete with them.

Or that Mantle will be to GL/D3D like CUDA is to CS/CL for Nvidia.

#5208404 Is it possible to create some sort of "custom" stencil buffer?

Posted by Erik Rufelt on 03 February 2015 - 08:50 AM


Check the bottom of that page about unordered access buffers, seems like it would do what you want.

#5208129 Comparison of STL list performance

Posted by Erik Rufelt on 01 February 2015 - 08:50 PM

I think there would be more allocations/deallocations in the C++ version due to the copy constructor (if the compiler doesn't optimize that out).


In your custom implementation, you first malloc() the 'Widget', and then you malloc a 'ListLink' to store it in, 2 allocations for each node.

std::list will create an internal type that is 'sizeof(Widget) + sizeof(ListLink)' and do one single allocation per node.

That's why you saw the same performance when you changed the std::list to store pointers that were allocated with new, as that version also had 2 allocations per node: 'new Widget' for the object, and a second allocation of 'sizeof(pointer) + sizeof(ListLink)' for the list node.
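
To check those counts rather than take my word for it, a minimal sketch could replace the global operator new and compare a list of values against a list of pointers. The names Widget, g_allocations, allocations_by_value and allocations_by_pointer are made up for this example and are not from the original benchmark.

```cpp
#include <cstddef>
#include <cstdlib>
#include <list>
#include <new>

// Hypothetical counter, incremented on every global allocation.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* mem = std::malloc(size ? size : 1)) return mem;
    throw std::bad_alloc();
}

void operator delete(void* mem) noexcept { std::free(mem); }
void operator delete(void* mem, std::size_t) noexcept { std::free(mem); }

struct Widget { int a, b, c; };  // stand-in for the benchmark's payload

// Allocations needed to append `count` widgets stored by value.
std::size_t allocations_by_value(std::size_t count) {
    std::list<Widget> widgets;
    const std::size_t before = g_allocations;
    for (std::size_t i = 0; i < count; ++i) widgets.push_back(Widget{});
    return g_allocations - before;
}

// Allocations needed to append `count` widgets stored as owning pointers.
std::size_t allocations_by_pointer(std::size_t count) {
    std::list<Widget*> widgets;
    const std::size_t before = g_allocations;
    for (std::size_t i = 0; i < count; ++i) widgets.push_back(new Widget{});
    const std::size_t used = g_allocations - before;
    for (Widget* w : widgets) delete w;  // the list only frees its own nodes
    return used;
}
```

The counts are taken as differences around the push_back loop, so any allocation a particular std::list implementation makes when constructing the empty list is left out.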


Try timing the loop with 10 million allocations only without any list, and I think you will find that it's not too far from the time of the allocation-step in your list-test.
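
A rough sketch of such an allocation-only timing, assuming std::chrono is available. time_allocations_ms is a hypothetical helper name, and the volatile pointer is only there so an optimizing compiler is less likely to remove the new/delete pair.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: time `count` new/delete pairs, in milliseconds.
double time_allocations_ms(std::size_t count) {
    using clock = std::chrono::steady_clock;
    int* volatile sink = nullptr;  // volatile store keeps each allocation observable
    const auto start = clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        sink = new int(0);
        delete sink;
    }
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count();
}
```

Calling it with 1000*1000*10 gives a number to compare against the allocation step of the list test.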


If you want to count the number of allocations for a C++ implementation, you can override operator new and increment a counter before calling malloc from it.

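#include <stdlib.h>  // malloc, free
int g_countNew = 0;
int g_countDelete = 0;

void* operator new(size_t size) {
  ++g_countNew;  // count every allocation
  return malloc(size);
}

void operator delete(void *mem) {
  ++g_countDelete;  // count every deallocation
  free(mem);
}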

#5207979 HLSL d3dcompiler_47 problems

Posted by Erik Rufelt on 31 January 2015 - 07:30 PM

How do you know it's not optimizing it?

Did you actually benchmark the different methods, and if so what GPU and driver were you using?

Any output from the HLSL compiler is some form of cross-platform ASM and I'm pretty sure the driver will re-arrange it in all kinds of ways depending on the underlying architecture.

#5207807 Comparison of STL list performance

Posted by Erik Rufelt on 31 January 2015 - 12:25 AM

I certainly don't mean to contradict any of the good points raised for real applications, but the particular synthetic "benchmark" posted is much much simpler, and can be pretty nicely explained by comparing the running time of the following for-loops.

for(i=0; i < 1000*1000*10; i++) {
  delete (new int);
}

for(i=0; i < 1000*1000*10; i++) {

#5207618 Comparison of STL list performance

Posted by Erik Rufelt on 30 January 2015 - 12:15 AM

Investigate the ASM output to find out.

I suspect you're actually measuring the time to do 2 million malloc calls in your C code compared to ~1 million allocations with std::list.

#5207417 Efficient way to erase an element from std::vector

Posted by Erik Rufelt on 29 January 2015 - 06:44 AM


Wouldn't it be enough to test for std::is_trivially_copyable? Requiring the whole more restrictive POD-property seems unnecessary.

Yep. I forgot that C++'s definition of POD is now stupidly restrictive. I usually use "POD" to mean "memcpy-safe", which apparently C++ calls "trivially copyable" now.



I would even say is_trivially_destructible. Though I'm not entirely sure on that one, and actually for performance-reasons I don't think it can even be determined from such traits, as it would depend on their internals. But for the general complexity it's interesting.


As far as I can see, swap is always better than copy/assign (trivial or not) when parts of the object need to be destructed. Otherwise any non-trivial original contents of [index] would be destructed before being replaced with the new values from back(), and then back() itself would be destructed at some point (the assignment could of course sometimes be optimized down to a copy, but that's the worst case).

When swapping the object's internals, no potential extra destruction can take place.


However, there is another case where swap is worse, which is when there is no destructor so the object removed at back() never needs to be destructed. At that time it is completely unnecessary to swap the contents instead of just copying them (if copying would be faster, which may or may not be the case of course). For a trivially copyable object it's obvious, one memcpy instead of swapping.


What I was originally wondering was whether there is any chance the compiler can assume that the pop_back() means the memory at back() is undefined, and therefore optimize away the swap and replace it with a memcpy when appropriate. If destroying an object means the contents of the memory it occupied are undefined afterwards, it theoretically could, and perhaps even realistically in simple inlined cases like this. I'm not sure that is actually a rule though?

#5205890 How to properly switch to fullscreen with DXGI?

Posted by Erik Rufelt on 21 January 2015 - 06:42 PM

I am talking about windowed mode switches. Let's say you want 720p and have a 768p monitor. No problem. But now if you switch to 768p, because of your window border and probably non-zero window location, the bottom right corner of the window is off-screen. This wouldn't be a problem except for the Windows API: once the sizable window goes out of bounds, the Win API will start to clip it. Asking for a 900p window for example will clip it randomly to a value that is neither 768p nor 900p. Even asking for 768p will clip it to a bit below. This only happens if the border is sizable and I found no way around it. For frame-less windows it behaves as expected.


You are really doing things you shouldn't be doing.

If you want a 720p front buffer on a 768p monitor, then you're not in fullscreen in the first place. In that case yes, covering the area you want with a window is a reasonable option, but then most certainly disable DXGI fullscreen. And again, the swap-chain is not and should not be in fullscreen then.

One simple option in that case is to simply set the view-port to a 720p area on a normal 768p fullscreen back-buffer, to avoid the problem.
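
For the view-port option, a small sketch of the arithmetic, assuming the 720p area should be centered (the centering, the ViewportRect struct and the centered_viewport helper are my own illustration; the four values map onto the TopLeftX, TopLeftY, Width and Height members of D3D11_VIEWPORT).

```cpp
// Hypothetical stand-in for the D3D11_VIEWPORT fields that matter here.
struct ViewportRect {
    float topLeftX;
    float topLeftY;
    float width;
    float height;
};

// Center a view of viewWidth x viewHeight inside a back-buffer that is
// at least as large in both dimensions.
ViewportRect centered_viewport(unsigned backWidth, unsigned backHeight,
                               unsigned viewWidth, unsigned viewHeight) {
    ViewportRect vp;
    vp.topLeftX = (static_cast<float>(backWidth) - static_cast<float>(viewWidth)) * 0.5f;
    vp.topLeftY = (static_cast<float>(backHeight) - static_cast<float>(viewHeight)) * 0.5f;
    vp.width = static_cast<float>(viewWidth);
    vp.height = static_cast<float>(viewHeight);
    return vp;
}
```

Assuming the 768p mode is 1366x768, a 1280x720 view ends up 43 pixels in from the left and 24 pixels down from the top.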


You seem to be doing some pretty weird things that make matters much more complicated than they need to be.

#5205659 How to properly switch to fullscreen with DXGI?

Posted by Erik Rufelt on 20 January 2015 - 06:37 PM

If you want DXGI to handle your fullscreen switches for you, don't touch the window. If you want to change window parameters, disable DXGI handling using MakeWindowAssociation and handle it yourself. When DXGI monitors your window it will automatically handle changes to the window as well as change the window when needed. Two separate programs (yours and DXGI) trying to simultaneously handle the same window is asking for trouble and sync issues.


Not sure what you mean by switching to larger modes. If you let DXGI handle your window and change to an actual enumerated mode of an output, it should automatically handle the window so it covers that output. If you make the window cover more area than the monitor and want the offscreen parts drawn to (or areas on another monitor), then you don't want fullscreen mode (and can't really have it, as it is defined as exactly the area of one monitor).


As far as I can tell the window itself doesn't really matter in true fullscreen, and you can get pretty interesting issues by resizing the window to a smaller size than the screen while still drawing in fullscreen. What happens is the screen will still be filled without clipping to the window, and any background windows will look funny at best afterwards.

#5205187 How to properly switch to fullscreen with DXGI?

Posted by Erik Rufelt on 18 January 2015 - 07:55 PM

The application starts correctly, but first Alt-Enter leaves me with a nonfunctional window. Second crashes.

It's better to start in window mode and then switch to fullscreen. It's discussed under Remarks here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb174537%28v=vs.85%29.aspx


I updated the sample to allow starting in fullscreen and changed to the "unknown" format for the back-buffer, which is probably better if there are different formats (like 30-bit support on some modes). Check the link below for the new code.


@Erik Rufelt: Can you throw that code into a GitHub / Codeplex repository? It would be great to have a few samples to point at when questions like these come up (which honestly seems to be more frequent than expected...).


#5205009 How to properly switch to fullscreen with DXGI?

Posted by Erik Rufelt on 17 January 2015 - 09:43 PM

No, I know what stretching looks like. My artifacts look as if scan-lines were randomly shortened. So should DXGI be in charge of going fullscreen or not? Because if not, I'll disable the mode switch and do it myself. It can't work any worse than it does today...

But my trend is not up, but down. Something is terribly wrong. Even if I ignore this, the difference compounds. With vsync and 8x MSAA and an empty scene, my framerate drops from 60 to 38 when going fullscreen.


The scanlines sound like a broken GPU or a driver bug. The performance drop could be fullscreen somehow failing, with DXGI thinking the window is occluded by another window or something, I guess.



Does anybody have a link to a very well behaved DirectX 10/11 tutorial in C++?


Found this test that I did a couple of years ago. It still works well for me on Win7. Numpad + and - change between the available modes, and the standard DXGI Alt+Enter toggles fullscreen.

I know that they have changed some things for Win8 and store apps so if you're using that you probably want to read the docs on the new swap effects and format requirements...

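#include <windows.h>
#include <D3D11.h>
#include <stdio.h>

#pragma comment (lib, "D3D11.lib")

// Window procedure (defined at the bottom)
LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam);

// Main
int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow) {
	// Register windowclass
	WNDCLASSEX wc;
	ZeroMemory(&wc, sizeof(wc));
	wc.cbSize = sizeof(wc);
	wc.lpszClassName = TEXT("MyClass");
	wc.hInstance = hInstance;
	wc.lpfnWndProc = WndProc;
	wc.hCursor = LoadCursor(NULL, IDC_ARROW);
	wc.hIcon = LoadIcon(NULL, IDI_APPLICATION);
	RegisterClassEx(&wc);

	// Create window (style, position and size are assumed defaults)
	HWND hWnd = CreateWindow(
		wc.lpszClassName,
		TEXT("D3D11 Window"),
		WS_OVERLAPPEDWINDOW,
		CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT,
		NULL, NULL, hInstance, NULL
	);

	// Create device and swapchain
	DXGI_SWAP_CHAIN_DESC scd;
	IDXGISwapChain *pSwapChain;
	ID3D11Device *pDevice;
	ID3D11DeviceContext *pDeviceContext;
	ZeroMemory(&scd, sizeof(scd));
	scd.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
	scd.SampleDesc.Count = 1;
	scd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;	// assumed, needed to create the render target view
	scd.BufferCount = 1;
	scd.OutputWindow = hWnd;
	scd.Windowed = TRUE;
	scd.Flags = DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH;	// assumed, lets ResizeTarget change the display mode
	HRESULT hResult = D3D11CreateDeviceAndSwapChain(
		NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, 0, NULL, 0, D3D11_SDK_VERSION,
		&scd, &pSwapChain, &pDevice, NULL, &pDeviceContext
	);
	if(FAILED(hResult)) {
		MessageBox(NULL, TEXT("D3D11CreateDeviceAndSwapChain"), TEXT("D3D11CreateDeviceAndSwapChain"), MB_OK);
		return 0;
	}

	// Render target
	ID3D11Texture2D *pBackBuffer;
	ID3D11RenderTargetView *pRTV;
	D3D11_TEXTURE2D_DESC backBufferDesc;
	pSwapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), (void**)&pBackBuffer);
	pBackBuffer->GetDesc(&backBufferDesc);
	pDevice->CreateRenderTargetView(pBackBuffer, NULL, &pRTV);

	// Mode switching
	UINT currentMode = 0;
	bool modeChanged = false;
	BOOL currentFullscreen = FALSE;
	pSwapChain->GetFullscreenState(&currentFullscreen, NULL);

	// Main loop
	ShowWindow(hWnd, nCmdShow);
	bool loop = true;
	while(loop) {
		MSG msg;
		if(PeekMessage(&msg, NULL, 0, 0, PM_REMOVE) != 0) {
			if(msg.message == WM_QUIT)
				loop = false;
			else {
				TranslateMessage(&msg);
				DispatchMessage(&msg);
			}
		}
		else {
			// Check for mode change input (modeChanged blocks repeats while the key is held)
			bool changeMode = false;
			if(GetAsyncKeyState(VK_ADD) & 0x8000) {
				if(!modeChanged) {
					++currentMode;
					changeMode = true;
				}
			}
			else if(GetAsyncKeyState(VK_SUBTRACT) & 0x8000) {
				if(!modeChanged) {
					if(currentMode > 0)
						--currentMode;
					changeMode = true;
				}
			}
			else {
				modeChanged = false;
			}

			// Change mode
			bool changedThisFrame = false;
			if(changeMode) {
				IDXGIOutput *pOutput = NULL;
				UINT numModes = 1024;
				DXGI_MODE_DESC modes[1024];
				if(FAILED(pSwapChain->GetContainingOutput(&pOutput)) ||
					FAILED(pOutput->GetDisplayModeList(scd.BufferDesc.Format, 0, &numModes, modes)))
					numModes = 0;
				if(pOutput != NULL)
					pOutput->Release();

				if(currentMode < numModes) {
					DXGI_MODE_DESC mode = modes[currentMode];
					TCHAR str[255];
					wsprintf(str, TEXT("Switching to mode: %u / %u, %ux%u@%uHz (%u, %u, %u)\n"),
						currentMode + 1, numModes, mode.Width, mode.Height,
						mode.RefreshRate.Numerator / mode.RefreshRate.Denominator,
						mode.Format, mode.ScanlineOrdering, mode.Scaling);
					OutputDebugString(str);
					pSwapChain->ResizeTarget(&mode);
					changedThisFrame = true;
				}
				modeChanged = true;
			}

			// Check fullscreen state
			BOOL newFullscreen;
			pSwapChain->GetFullscreenState(&newFullscreen, NULL);

			// Resize if needed
			RECT rect;
			GetClientRect(hWnd, &rect);
			UINT width = static_cast<UINT>(rect.right);
			UINT height = static_cast<UINT>(rect.bottom);
			if(width != backBufferDesc.Width || height != backBufferDesc.Height || changedThisFrame || newFullscreen != currentFullscreen) {
				// Release everything that references the back-buffer, then resize it
				pDeviceContext->ClearState();
				pRTV->Release();
				pBackBuffer->Release();
				pSwapChain->ResizeBuffers(scd.BufferCount, width, height, scd.BufferDesc.Format, scd.Flags);

				// Recreate render target
				pSwapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), (void**)&pBackBuffer);
				pBackBuffer->GetDesc(&backBufferDesc);
				pDevice->CreateRenderTargetView(pBackBuffer, NULL, &pRTV);
			}

			// Remember fullscreen state
			currentFullscreen = newFullscreen;

			// Clear backbuffer
			float color[4] = {1.0f, 0.0f, 1.0f, 1.0f};
			pDeviceContext->ClearRenderTargetView(pRTV, color);

			// Present
			pSwapChain->Present(1, 0);
		}
	}

	// Release
	pSwapChain->SetFullscreenState(FALSE, NULL);
	pDeviceContext->ClearState();
	pRTV->Release();
	pBackBuffer->Release();
	pSwapChain->Release();
	pDeviceContext->Release();

	ID3D11Debug *debugInterface = NULL;
	//hResult = pDevice->QueryInterface(__uuidof(ID3D11Debug), reinterpret_cast<void**>(&debugInterface));
	//if(FAILED(hResult)) {
	//}
	if(debugInterface != NULL) {
		// assumed: report anything still alive when the debug layer is in use
		debugInterface->ReportLiveDeviceObjects(D3D11_RLDO_DETAIL);
		debugInterface->Release();
	}
	pDevice->Release();

	UnregisterClass(wc.lpszClassName, hInstance);
	return 0;
}

// Window procedure
LRESULT CALLBACK WndProc(HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam) {
	switch(msg) {
		case WM_DESTROY:
			PostQuitMessage(0);
		return 0;
	}
	return DefWindowProc(hWnd, msg, wParam, lParam);
}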

#5204932 Porting my game to Linux

Posted by Erik Rufelt on 17 January 2015 - 10:07 AM

Will g++ suffice as my compiler? My game uses C and C++. I'm also a bit of a noob to makefiles...


I haven't done any extensive Linux development, but I ported a game to Linux using this IDE: http://codelite.org/

It feels reasonably close to VC++ and Xcode in general.

#5204736 How to properly switch to fullscreen with DXGI?

Posted by Erik Rufelt on 16 January 2015 - 10:59 AM

Your FPS numbers don't matter when they're that high; going from 670 to 630 FPS adds about 95 microseconds per frame, which is nothing.

Display average frame-time in milliseconds or microseconds instead of the number of frames per second to get an accurate picture.
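
The conversion is just a division. A minimal sketch, with frame_time_us and frame_time_delta_us as made-up helper names:

```cpp
// Frame time in microseconds for a given frames-per-second value.
double frame_time_us(double fps) {
    return 1000000.0 / fps;
}

// Extra time per frame, in microseconds, when dropping from fpsBefore to fpsAfter.
double frame_time_delta_us(double fpsBefore, double fpsAfter) {
    return frame_time_us(fpsAfter) - frame_time_us(fpsBefore);
}
```

frame_time_delta_us(670.0, 630.0) comes out at roughly 95 microseconds, whereas the same 40 FPS drop from 60 to 20 would add over 33 milliseconds per frame.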


The GPU doesn't become faster in fullscreen. What it will do is use (usually) much better synchronization, to avoid artifacts from bad vsync etc.

It can often make sure that if one frame takes 10 ms, the next takes 13 ms, and then the third 10 ms again, they will all be displayed at 11 ms intervals if that matches the monitor refresh rate, instead of allowing such small random changes to cause stuttering.


DXGI can either be instructed to simply go into fullscreen mode on the current desktop resolution (default), or actually try to change the display mode to match your back-buffer resolution (mode-switch). I would recommend against mode switching.


Lastly, if the back-buffer size doesn't match the monitor size, it will be stretched. I guess this is where your artifacts come from. You need to handle for example WM_SIZE or in some other way detect when your window is resized, and when that happens Release your render target(s) and references to the back-buffer. Then call ResizeBuffers to resize the back-buffer to the new correct size and re-create your render target views and reset the view-port to the new size etc.

#5201802 unexplainable errors? Ever have one of those?

Posted by Erik Rufelt on 04 January 2015 - 02:57 PM

I would guess outdated files in the build folder or similar. If you first do a clean, then exit the IDE, delete all temporary object files, precompiled headers etc. and re-open the project, it might fix it. I often get such errors and have to clean and rebuild because the compiler doesn't notice that something has changed. It then uses, for example, an outdated version of an updated struct in one file, so sizeof() doesn't match what is used in another file.

#5201754 Is it a good idea to block android phones that cause lots of trouble?

Posted by Erik Rufelt on 04 January 2015 - 10:05 AM

Try to get a device like that and find the bug, or if your app has internet-permissions possibly add logging to a remote database so you can see what happens.

If that's not possible, then probably yes, block them.