fastest way to grab imagebits from a game via injection and backbuffer

4 comments, last by sayezz 11 years, 2 months ago

Hey,

I need to grab the full-screen frame (the image bits) from a game into a char array. To do that, I inject my grabber DLL via Detours into d3d.dll and hook "EndScene".

I read that accessing the backbuffer is the fastest way to do it. Here is how I'm doing it:


unsigned char *bits;

IDirect3DSurface9 *pRenderTarget = NULL;
IDirect3DSurface9 *pDestTarget = NULL;
D3DSURFACE_DESC rtDesc;
D3DLOCKED_RECT rect;

void dumb_buffer(LPDIRECT3DDEVICE9 pDevice){

    pDevice->GetRenderTarget(0,&pRenderTarget);
    pRenderTarget->GetDesc(&rtDesc);

    pDevice->SetRenderTarget(0, pRenderTarget);

    pDevice->CreateOffscreenPlainSurface(rtDesc.Width, rtDesc.Height,rtDesc.Format, D3DPOOL_SYSTEMMEM, &pDestTarget, NULL);

    pDevice->GetRenderTargetData(pRenderTarget,pDestTarget);

    if(pDestTarget != NULL){
        pDestTarget->LockRect(&rect,0, D3DLOCK_READONLY);
        bits = (unsigned char*)rect.pBits;
        pDestTarget->UnlockRect();
        pDestTarget->Release();
    }
    pRenderTarget->Release();

}

This works, but it is very slow. As soon as I start this method the game stutters. If I change


  pDevice->CreateOffscreenPlainSurface(rtDesc.Width, rtDesc.Height,rtDesc.Format, D3DPOOL_SYSTEMMEM, &pDestTarget, NULL);

to


  pDevice->CreateOffscreenPlainSurface(rtDesc.Width, rtDesc.Height,rtDesc.Format, D3DPOOL_DEFAULT, &pDestTarget, NULL);

it works a little bit faster, but then I get a black image. "bits" is then 0.

Another way is this variation: Instead of using this:


pDevice->CreateOffscreenPlainSurface(rtDesc.Width, rtDesc.Height,rtDesc.Format, D3DPOOL_SYSTEMMEM, &pDestTarget, NULL);
pDevice->GetRenderTargetData(pRenderTarget,pDestTarget); 

if(pDestTarget != NULL){
   pDestTarget->LockRect(&rect,0, D3DLOCK_READONLY);
   bits = (unsigned char*)rect.pBits;
   pDestTarget->UnlockRect();
}

I could also use a dynamic texture:


pDevice->CreateTexture(rtDesc.Width, rtDesc.Height,1, D3DUSAGE_DYNAMIC,rtDesc.Format,D3DPOOL_DEFAULT,&texture, NULL);

texture->LockRect(0,&rect, NULL, D3DLOCK_DISCARD);
bits = (unsigned char*)rect.pBits;
texture->UnlockRect(0);

but this does not help. The game still stutters.

Someone suggested using "GetBackBuffer", so the code would look like this:


IDirect3DSurface9* pSrcSurface;
IDirect3DSurface9* pTempSurface;
IDirect3DSurface9* pDestSurface;
D3DSURFACE_DESC pDesc;

void dumb_buffer(LPDIRECT3DDEVICE9 pDevice){

    if(FAILED(pDevice->GetBackBuffer(0,0,D3DBACKBUFFER_TYPE_MONO,&pSrcSurface)))
        return;
    pSrcSurface->GetDesc(&pDesc);

    if(FAILED(pDevice->CreateRenderTarget(pDesc.Width,pDesc.Height,pDesc.Format,D3DMULTISAMPLE_NONE,0,FALSE,&pTempSurface,NULL)))
        return;
    if(FAILED(pDevice->CreateOffscreenPlainSurface(pDesc.Width,pDesc.Height,pDesc.Format,D3DPOOL_SYSTEMMEM,&pDestSurface,NULL)))
        return;
    if(FAILED(pDevice->StretchRect(pSrcSurface,NULL,pTempSurface,NULL,D3DTEXF_NONE )))
        return;
    if(FAILED(pDevice->GetRenderTargetData(pTempSurface,pDestSurface)))
        return;

    pDestSurface->LockRect(&rect,0, D3DLOCK_READONLY);
    bits = (unsigned char*)rect.pBits;
    pDestSurface->UnlockRect();

    pDestSurface->Release();
    pTempSurface->Release();
    pSrcSurface->Release();

}

This does not work for me. I get a black image. "bits" is 0.

In all of these methods I'm using "LockRect(..)". When I comment out these three lines:


	//pDestSurface->LockRect(&rect,0, D3DLOCK_READONLY);
	//bits = (unsigned char*)rect.pBits;
	//pDestSurface->UnlockRect();

the game does not stutter, but of course then I get no image.

I wanted to mention that in the code snippets above I always create a new surface/texture each frame and release it at the end of my method. I also tried initializing the surface/texture only once and reusing it every frame, so that I don't have to create a new one each time. This does not help with the performance issue; there is no significant improvement in the frame rate.

Maybe the problem is that by calling "LockRect(..)" I'm blocking my graphics card, so the game stutters/freezes for a very short time.
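
One way to check that suspicion, rather than guess, is to time each call in the hook separately and see which one actually takes long. The following is only a minimal sketch: `time_call_us` is a hypothetical helper, not part of Direct3D, and it measures CPU wall-clock time, so it shows where the calling thread waits, not what the GPU is doing.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of Direct3D): runs `fn` once and returns
// the elapsed wall-clock time in microseconds, measured on the calling thread.
template <typename Fn>
long long time_call_us(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Inside the hook this could wrap one call at a time, for example `time_call_us([&]{ pDevice->GetRenderTargetData(pRenderTarget, pDestTarget); })` and then the same for `LockRect`, logging both numbers for a few hundred frames.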

The only thing I want to do is to grab each frame of the game. I do not want to change anything. The problem is that the game has no API to grab the frame and stream it, so I have to do it via injection.

Is there a more efficient way to accomplish this? Maybe with another injection or another method? Maybe using pipelines or queues? I'm not a DirectX expert, so any help would be great.

Is there a way to grab the frame without using "LockRect(..)"?

I have a 64-bit, 8-core i7 @ 3.60 GHz PC with 16 GB RAM and an NVIDIA GeForce GTX 670 with 2 GB RAM. The game is VBS2 2.02, which uses the engine of ArmA II. So my PC should be powerful enough to handle such an old game.

Thanks!

Regards

Denis

PS: I also tried to grab the screen/desktop, and I also used DirectShow and GDI, but this does not work when the game is in fullscreen.


Are you sure rect.pBits pointer doesn't get invalidated when you call UnlockRect()? I believe that might be the reason you get 0s.
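
If that concern is right and the pointer is only safe to use while the surface is locked, the cautious pattern is to copy the pixels out between LockRect() and UnlockRect(). Here is a minimal sketch of such a copy. It goes row by row because the pitch reported in D3DLOCKED_RECT can be larger than the visible row. The name `copy_locked_rows` is hypothetical, and the function uses no Direct3D calls, so it can be tried outside the game.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `height` rows of `row_bytes` bytes each out of
// a locked surface into a tightly packed vector. `pitch` is the distance in
// bytes between the starts of consecutive source rows (D3DLOCKED_RECT::Pitch),
// which may be larger than `row_bytes` because of padding.
std::vector<unsigned char> copy_locked_rows(const unsigned char* src,
                                            int pitch,
                                            std::size_t row_bytes,
                                            std::size_t height) {
    std::vector<unsigned char> out(row_bytes * height);
    for (std::size_t y = 0; y < height; ++y) {
        std::memcpy(out.data() + y * row_bytes,
                    src + static_cast<std::ptrdiff_t>(y) * pitch,
                    row_bytes);
    }
    return out;
}
```

In the hook this would be called with `(const unsigned char*)rect.pBits` and `rect.Pitch` while the lock is still held, and the returned vector can then be used after UnlockRect().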

Do you need the full sized image? Getting the GPU to downsize it first will reduce the memory bandwidth you need. If you want to encode the result as a video I'd also let the GPU do the YUV encoding, as that will reduce the size too.

You probably also want to at least double buffer the system memory copy (that is, don't lock it on the same frame that you do the copy).

Alternatively just use FRAPS which has already solved the problem.

Thanks for your quick response! I don't think that rect.pBits is invalidated when I call UnlockRect().

Using


 pDevice->CreateOffscreenPlainSurface(rtDesc.Width, rtDesc.Height,rtDesc.Format, D3DPOOL_SYSTEMMEM, &pDestTarget, NULL);
 
pDevice->GetRenderTargetData(pRenderTarget,pDestTarget);
if(pDestTarget != NULL){
pDestTarget->LockRect(&rect,0, D3DLOCK_READONLY);
bits = (unsigned char*)rect.pBits;
SetSharedMemVoid(bits, width, height); // my shared buffer where I save my image
pDestTarget->UnlockRect();
pDestTarget->Release();
}

works, and this also works:


 pDevice->CreateOffscreenPlainSurface(rtDesc.Width, rtDesc.Height,rtDesc.Format, D3DPOOL_SYSTEMMEM, &pDestTarget, NULL);
 
pDevice->GetRenderTargetData(pRenderTarget,pDestTarget);
if(pDestTarget != NULL){
pDestTarget->LockRect(&rect,0, D3DLOCK_READONLY);
bits = (unsigned char*)rect.pBits;
pDestTarget->UnlockRect();
SetSharedMemVoid(bits, width, height); // my shared buffer where I save my image
pDestTarget->Release();
}

But neither works when using:


pDevice->CreateTexture(rtDesc.Width, rtDesc.Height,1, D3DUSAGE_DYNAMIC,rtDesc.Format,D3DPOOL_DEFAULT,&texture, NULL);

I need the full-sized image because it is a military application for simulating a real EO camera onboard a UAV. That means we have a real camera with an HD resolution and e.g. 30 fps, so we have to grab a simulated image with the same resolution and frame rate.
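
To put a rough number on that requirement, here is a small calculation. It assumes that "HD" means 1920x1080 and that the render target uses a 32-bit format (4 bytes per pixel); both are assumptions, and the function names are made up for this sketch. Under those assumptions one frame is 8,294,400 bytes, and 30 frames per second comes to 248,832,000 bytes per second that have to cross from video memory to system memory.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one frame of `width` x `height` pixels.
constexpr std::uint64_t frame_bytes(std::uint64_t width,
                                    std::uint64_t height,
                                    std::uint64_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Bytes that must be read back per second at a given frame rate.
constexpr std::uint64_t bytes_per_second(std::uint64_t bytes_per_frame,
                                         std::uint64_t fps) {
    return bytes_per_frame * fps;
}
```

For comparison, `frame_bytes(960, 540, 4)` is a quarter of the full-size value, which is the kind of saving the downsizing suggestion is about.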

I can't use FRAPS because I need to pass the bitstream to e.g. OpenCV image processing algorithms and then to another lib which we use to simulate a GigE Vision camera. So it is not enough for me to just record the image of the game or just to show the image on the screen. I need access to the bits so I can use them for further processing.

Double buffering sounds very interesting. I will google for more information. Do you have more detailed information or links?

Thanks!


Regards

Double buffering is used here to hide some of the inherent latency of the GPU. The problem is that almost all GPU commands get put in a queue to be executed later. D3D allows up to about 3 frames of commands issued by the CPU to be buffered up, ready for the GPU to execute. However, when you lock a buffer that you've just issued draw commands to, the CPU has to stall, waiting for the GPU to finish executing those commands before it can copy the buffer to system memory and use it. That synchronization is typically done when you call Lock() to read data (although you may want to experiment to find the best things to double buffer).

The solution is to create two (or more) buffers. Here's some pseudocode.


// Init code
int current_buffer = 0;
LPDIRECT3DSURFACE9 buffers[2];
CreateOffscreenPlainSurface(..., &buffers[0]);
CreateOffscreenPlainSurface(..., &buffers[1]);
 
// Render code
// Ask the GPU to copy the current frame to the current buffer
GetRenderTargetData(..., buffers[current_buffer]);
// Lock the buffer we filled in last frame (might want to skip this the very first time)
buffers[current_buffer ^ 1]->LockRect(...);
// Do stuff with data
buffers[current_buffer ^ 1]->UnlockRect();
// Swap buffers
current_buffer ^= 1;
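
The index bookkeeping in that pseudocode generalizes to more than two buffers. The following is only a sketch of the bookkeeping, with no Direct3D calls; `ReadbackRing` and its members are hypothetical names. The design choice here is to read the oldest buffer in the ring, so with N buffers the GPU gets N-1 frames to finish each copy before the CPU touches it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for a ring of N readback surfaces (N must be >= 1).
// Each frame: issue the copy into write_index(), then, if read_ready(),
// lock and read read_index(), then call advance().
struct ReadbackRing {
    std::size_t count;      // number of buffers (2 = double, 3 = triple buffering)
    std::size_t frame = 0;  // number of frames completed so far

    explicit ReadbackRing(std::size_t n) : count(n) {}

    // Buffer that receives this frame's copy.
    std::size_t write_index() const { return frame % count; }

    // Oldest buffer in the ring: its copy was issued count-1 frames ago.
    std::size_t read_index() const { return (frame + 1) % count; }

    // False during the first count-1 frames, when read_index() has never been filled.
    bool read_ready() const { return frame + 1 >= count; }

    void advance() { ++frame; }
};
```

With `ReadbackRing ring(2)` this gives the same indices as the `current_buffer ^ 1` trick above, including skipping the lock on the very first frame.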

The double/triple buffer trick didn't help me. Maybe I'm doing something wrong. Here's what I have so far; does anyone see an error?


IDirect3DSurface9 *pRenderTarget = NULL;
LPDIRECT3DSURFACE9 buffers[3];
int n = 0;
...
void dumb_buffer(LPDIRECT3DDEVICE9 pDevice){ // is called each frame
    if(init){ // is called only on the first run, so I dont have to create the Surface on each step
        pDevice->GetRenderTarget(0,&pRenderTarget);
        pRenderTarget->GetDesc(&rtDesc);
        pDevice->CreateOffscreenPlainSurface(rtDesc.Width, rtDesc.Height,rtDesc.Format, D3DPOOL_SYSTEMMEM, &buffers[0], NULL);
        pDevice->CreateOffscreenPlainSurface(rtDesc.Width, rtDesc.Height,rtDesc.Format, D3DPOOL_SYSTEMMEM, &buffers[1], NULL);
        pDevice->CreateOffscreenPlainSurface(rtDesc.Width, rtDesc.Height,rtDesc.Format, D3DPOOL_SYSTEMMEM, &buffers[2], NULL);
    }

    pDevice->GetRenderTargetData(pRenderTarget,buffers[n]);

    int prev= 0;
    if(n ==2){
        prev = 1;
    } else if(n ==1){
        prev = 0;
    } else if(n==0){
        prev = 2;
    }
    if(buffers[prev] != NULL){
        buffers[prev]->LockRect(&rect,0,  D3DLOCK_READONLY | D3DLOCK_NOSYSLOCK);
        bits = (unsigned char*)rect.pBits;
        buffers[prev]->UnlockRect();
    }
    n++;
    if(n>=3){
        n=0;
    }
}

I have read in another post that I should not lock the current buffer but the previous one, so there is more time to transfer the data:

"By locking the buffer used on the previous frame, you give the driver more time to get the data transferred to the system memory copy - when you LockRect() the surface, the driver has to make sure that it's finished with the surface so it can yield it to the CPU. It introduces a frame of lag because you're processing the data from the previous frame, but it should improve performance." http://www.gamedev.net/topic/576721-directx9-depth-buffer-to-memory-fast/

I have read something about "Vertex Buffers". Should I maybe use them?

Thanks!

Is there something faster than


pDevice->GetRenderTargetData(pRenderTarget,buffer)

Because when using this, my game flickers; not much, but you can notice it. When using


pDevice->GetFrontBufferData(0,buffer)

it's the same.

And


pDevice->GetBackBuffer(0,0,D3DBACKBUFFER_TYPE_MONO, &buffer)

does not work at all :(
