David Lake

Display screen remotely

59 posts in this topic

I have a program I made for remotely controlling PCs. It uses BitBlt, bitmaps, and a bit of processing and compression, and I'd like to improve the way it captures and displays the desktop, using SlimDX if that can be used to increase performance.

At the moment the capture side is very heavy on the CPU. I use the thread pool for threading and split the screen capture and the processing/sending of frames into different threads, but I'd like to reduce CPU utilization without compromising performance if possible.

The main problem is that I can't control the login screen; I'd like to know if there's a way around that.

Also, I need a faster way than GDI+ to scale the images before sending; I'm hoping SlimDX can speed this up using the GPU.

Is there a relatively easy way to use SlimDX to capture the screen and display it remotely that's at least as fast as using BitBlt, GDI+, and bitmaps?


Use DX to scale your image, not GDI; it will be much faster. I have a similar working project and that's what I do, except I use OpenGL, but it's almost the same thing.

For the CPU, I don't really see any option other than adding a sleep before or after you capture the screen. It might be a bit slower, but keep in mind that your image is going to be streamed over the network anyway, which isn't that fast. In fact, I multithreaded mine so that after sending a screenshot it starts a thread to capture the next one while the current one is being sent. Keep in mind, though, that multithreading adds some complexity, like synchronization, but I think it's easier in C# than in the C++ I used.

Good luck.

EDIT: Oh, I didn't see that you'd already multithreaded it. Experiment with adding some Sleep calls; it might help.

Edited by Vortez

Thanks. I can easily slow it down, but ultimately I want it to go faster without using two whole cores.

I'm looking for code samples for fast transfer of screen captures, possibly with SlimDX, over the network.

I have tried it before, but it was too slow because I had to convert everything to a Drawing.Bitmap. If there's a way to avoid System.Drawing altogether and convert the sprite (or whatever it was) into a byte array to send over the network, that might make it faster.

Also, I'm a complete n00b when it comes to any sort of GPU programming, but if I could process and compress the frames on the GPU, that would be nerdgasmtastic!

Edited by David Lake

I don't know C# very well, but in C++ I have direct access to the bitmap buffer memory, so I can't say. I'm using Delphi for the interface and a C++ DLL to do the hard work (networking, screenshots, hooks, compression).

 

I'm not sure compression on the GPU is feasible; I'm using zlib to compress mine, but the real speedup isn't from that. In fact, I use a quadtree: basically, I split the image in 4, recursively, 4 or 5 times, then check which blocks have changed and send only the parts that changed. The quadtree lets me send different-sized parts of the image to the other side, so most of the time I only need to update part of the texture, and if nothing changed it just sends an empty screenshot message. It's a bit complex, but it's the best optimization I've found yet.
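The quadtree dirty-region idea described above can be sketched roughly like this (a minimal C++ sketch; the names, the 24-bit pixel layout, and the fixed recursion depth are my own assumptions, not code from the poster's project):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Rect { int x, y, w, h; };

// True if any pixel (3 bytes each) inside r differs between the two frames.
static bool regionChanged(const uint8_t* a, const uint8_t* b,
                          int stride, const Rect& r) {
    for (int y = r.y; y < r.y + r.h; ++y) {
        const uint8_t* rowA = a + y * stride + r.x * 3;
        const uint8_t* rowB = b + y * stride + r.x * 3;
        if (std::memcmp(rowA, rowB, static_cast<std::size_t>(r.w) * 3) != 0)
            return true;
    }
    return false;
}

// Recursively split the frame in 4; emit only the smallest cells that
// actually changed. Unchanged regions are skipped entirely, so a static
// screen produces no rectangles at all.
void diffQuadTree(const uint8_t* prev, const uint8_t* cur, int stride,
                  Rect r, int depth, std::vector<Rect>& out) {
    if (!regionChanged(prev, cur, stride, r))
        return;                                  // nothing to send
    if (depth == 0 || r.w < 2 || r.h < 2) {
        out.push_back(r);                        // smallest cell: send it
        return;
    }
    int hw = r.w / 2, hh = r.h / 2;
    Rect quads[4] = { { r.x,      r.y,      hw,       hh       },
                      { r.x + hw, r.y,      r.w - hw, hh       },
                      { r.x,      r.y + hh, hw,       r.h - hh },
                      { r.x + hw, r.y + hh, r.w - hw, r.h - hh } };
    for (const Rect& q : quads)
        diffQuadTree(prev, cur, stride, q, depth - 1, out);
}
```

The sender would then transmit only the pixels inside the returned rectangles, and the receiver would patch just those sub-regions of its texture.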

 

It's a bit like optimizing a terrain mesh with a quadtree, but applied to an image instead. I also tried using MPEG-1 compression to send frames, like a movie, and it worked, but the image was blurry and it was slower than my other method, so I see no reason to use it.

 

Let DirectX do the scaling for you. And make sure to update, not recreate, the texture each time you receive a screenshot; it's much faster that way.

Edited by Vortez

The main problem with using the GPU for this is that transfers to and from the GPU are relatively slow; the CPU can read and write system memory much faster. This means that unless the required processing is expensive, it probably won't help compared to optimized CPU code. However, if you want to go that way you may find http://forums.getpaint.net/index.php?/topic/18989-gpu-motion-blur-effect-using-directcompute/ useful; there's some source code available there.

 

For data compression I'd go with using the xor operator on pairs of adjacent frames. The result will be zeros where they are identical. You can then apply zlib to the results, which should compress all those zeros really well. Reconstructing at the other end is done with xor too. As the xor operator is really cheap that should be reasonably quick to do even on a CPU.
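A minimal sketch of the xor-delta idea, assuming two tightly packed frame buffers of equal size (the function name is mine, not from the thread):

```cpp
#include <cstddef>
#include <cstdint>

// out[i] = prev[i] ^ cur[i]: unchanged bytes become 0, which a general
// compressor (zlib, QuickLZ, ...) shrinks extremely well. Running the
// same function on (prev, delta) restores cur, so encoding and decoding
// are the same operation.
void xorDelta(const uint8_t* prev, const uint8_t* cur,
              uint8_t* out, std::size_t n) {
    // A word-at-a-time loop would be faster; bytes keep the idea clear.
    for (std::size_t i = 0; i < n; ++i)
        out[i] = prev[i] ^ cur[i];
}
```

On the receiving end, xor-ing the delta against the previously reconstructed frame yields the current frame, so both sides only ever keep one extra frame buffer around.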

 

To cut down CPU load, make sure you have a frame-rate limiter in there. There's no point processing more than 60 frames per second, and you can probably get away with far less than that.
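One way to sketch such a limiter (a generic sleep_until pacing loop in C++; the class name is mine, and this is not code from the thread):

```cpp
#include <chrono>
#include <thread>

// Caps how often the capture loop runs: wait() sleeps until the next
// frame slot, so calling it once per iteration paces the loop at
// roughly `fps` frames per second instead of spinning flat out.
class FrameLimiter {
    using Clock = std::chrono::steady_clock;
public:
    explicit FrameLimiter(double fps)
        : period_(std::chrono::duration_cast<Clock::duration>(
              std::chrono::duration<double>(1.0 / fps))),
          next_(Clock::now() + period_) {}

    void wait() {
        std::this_thread::sleep_until(next_);
        next_ += period_;  // schedule the next frame slot
    }

private:
    Clock::duration period_;    // target time between frames
    Clock::time_point next_;    // when the next frame may start
};
```

Sleeping via an absolute deadline (`sleep_until`) rather than a fixed `sleep_for` keeps the rate steady even when capture and compression take a variable amount of time.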


I store the previous frame and zero the ARGB of any unchanged pixels (xor), then compress twice with QuickLZ; this makes highly compressible frames much smaller than a single compression pass. All the processing is done by an optimized DLL built with the Intel C++ compiler.

Then, when displaying the frame (in a PictureBox with GDI+), I simply draw over the previous one, and since the alpha of the unchanged pixels is zero, the unchanged pixels of the previous frame show through (ingenious, I know!).

 

What's the fastest way to capture the screen, scale it, and get the frame into a byte array using the GPU, then display it without using GDI? I find GDI slows down when rendering at high resolutions such as 1920x1200, even on an i7 3820 at 4.3 GHz.

 

Oh, and as for framerate, I like it to be as fast as possible; that's why I don't use Remote Desktop. And if it did go over 60 FPS, I wouldn't know how to limit it to exactly that.

Edited by David Lake



 

That's actually a very good idea. Dunno why I didn't think of it before...


If anything that uses the GPU is slower, then won't using it for scaling be slow too?

Edited by David Lake

GPUs are not slower than CPUs; they are just optimized for parallel tasks and for working with vectors and 3D/graphics/texture stuff, while CPUs are better at serial operations. Most compression algorithms are serial by nature, I think. I think what Adam_42 meant is that it takes time to transfer the data to be compressed from system memory to GPU memory and back, and that that time could instead be spent compressing on the CPU, making GPU compression pointless.

 

I can't really tell you why scaling a texture on the GPU is faster, but it is; it's one of the things the GPU is good at. Also, think about it: isn't it better to send pictures at a fixed size and render them at whatever size you want on the other side, rather than scaling first and being stuck with that size on the other side? I prefer the first solution. That way you can resize the window that draws the screenshot and DirectX will scale it for you effortlessly; all you have to do is draw a quad the size of the render window and the texture will stretch with it automatically. If you don't want the picture distorted it's a little more work, since you have to leave black borders, but that's not complicated to compute either. (In fact, it's not borders you draw; you just adjust the quad size so it leaves some of the area black, or whatever you set the background color to.)
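The "adjust the quad size" step above is a small aspect-fit computation; a sketch (names are mine; a renderer would use the returned rectangle as the quad's position and size, leaving the rest of the window as background):

```cpp
#include <algorithm>

struct Quad { int x, y, w, h; };

// Scale the image to fit the window while preserving its aspect ratio,
// centered; the uncovered window area stays the background color
// (the "black borders" effect described above).
Quad fitQuad(int winW, int winH, int imgW, int imgH) {
    double scale = std::min(double(winW) / imgW, double(winH) / imgH);
    int w = int(imgW * scale + 0.5);
    int h = int(imgH * scale + 0.5);
    return { (winW - w) / 2, (winH - h) / 2, w, h };
}
```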

 

PS: Sorry if I'm not explaining very well; English is not my native language.

Edited by Vortez


 

I think he meant the data transfer. GPUs are good for games because the frame stays on the GPU. For this application he would need to constantly send the frame to the GPU, scale it, and then copy it back to the CPU to transmit, which may be too slow (PCI-E has very real bandwidth limits) and can add nontrivial latency.

 

There is always some overhead in doing GPU work, which is why in some cases it is best to let the CPU do it: the cost of getting the GPU involved exceeds the benefits.


 

Yes, thank you.

 

I suppose I'd better look for performance optimizations elsewhere in the code; after doing a performance analysis, it seems array copying is quite slow.



 

Of course it is. Just multiply the width and height of your screen, then multiply that by 3, to get an idea of how many bytes you need to copy: my 1360x768 screen is 1360 * 768 * 3 = 3,133,440 bytes, or about 3 MB. Without some form of compression or optimization that takes quite a while to transfer. Although, in my program at least, using 2, 8, 16, or 24 bits per pixel doesn't seem to help much; I might be bottlenecked somewhere else, but I still get pretty decent results (a couple of frames per second).

Edited by Vortez

I am aware; my test was at 1920x1200 * 4 bytes (32-bit), but even a 9,216,000-byte frame is nothing for quad-channel DDR3 at 2133 MHz, with an effective bandwidth of 45 GB/s and 26.8 GIPS per core.

Edited by David Lake

Another optimization I do is to paint the desktop background black when the connection is made, then restore it afterward; a black desktop is a lot easier to compress. Although, with the xor trick Adam pointed out, that becomes rather useless.


I just meant that this is about three times more pixels than my example, so without compression it must take a while to transfer a single frame over the network.

With my 8/0.8 mbit/s connection, that would take about 100 seconds to upload and 10 to download... hence the "ouch!", haha.

Edited by Vortez

I'm curious how you got xor to work. I check whole pixels and use the alpha channel to flag a pixel as unchanged rather than just black. If the xor is done on each subpixel with no alpha channel, isn't there no way to tell whether a zero means black or unchanged?

Edited by David Lake

Why would you need to draw a distinction between black and unchanged?

(assuming black is zero)

previous[3] = { green, black, green };
current[3]  = { black, black, green };

encode (xor previous with current frame):

green xor black = green
black xor black = black
green xor green = black

so pak[3] = { green, black, black }

decode (xor previous with "packed" frame):

green xor green = black
black xor black = black
green xor black = green

which is back to the current frame.

If you're doing this for the first frame, just use zero-filled memory / black as the previous frame.

Edited by aqrit

The point is not to check whether a pixel has changed, but to turn every pixel that hasn't changed into black. Then, when compressing, if two images are identical you get a buffer full of zeros, which is very compressible. I haven't tried it yet, but I know it works; it's like xor encryption. All you need is a buffer with the previous image and one with the current image, and you xor all those bits before sending and again after receiving; the second pass restores the original image.

I'm pretty sure it's faster than my quadtree algorithm.

Edited by Vortez

Btw, you can extract your bitmap at 24 bits per pixel if you wish, by setting the LPBITMAPINFO's bmiHeader.biBitCount member to 24:

// ... some code removed

// De-select our hbmp
SelectObject(s_hdc, ex_hbmp);

// Allocate a BITMAPINFO buffer (BMISize is defined elsewhere)
LPBITMAPINFO lpbi = (LPBITMAPINFO)(new BYTE[BMISize]);
ZeroMemory(lpbi, sizeof(BITMAPINFO));
lpbi->bmiHeader.biSize = sizeof(BITMAPINFOHEADER);

// Get information about the screenshot image format
GetDIBits(s_hdc, hbmp, 0, h, NULL, lpbi, DIB_RGB_COLORS);
lpbi->bmiHeader.biCompression = BI_RGB;

// Make sure it's going to be extracted in 24-bit format
lpbi->bmiHeader.biBitCount  = 24;
lpbi->bmiHeader.biSizeImage = NumPixels * 3;

// Extract the image in 24-bit format
GetDIBits(s_hdc, hbmp, 0, h, pSrc->GetBuffer(), lpbi, DIB_RGB_COLORS);

// ...
Edited by Vortez

Oh yeah, I understand now. My brain doesn't work as well as it used to, and I'm only 24!

Yippee, that sped it up a bit!

Edited by David Lake

1 xor 1 = 0
1 xor 0 = 1
0 xor 1 = 1
0 xor 0 = 0

 

Do some exercises with pen and paper on two bytes: try it twice with identical values, then twice with non-identical values. You'll get it eventually.

Edit: Oh, now it was me who thought you were being sarcastic, haha.

Edited by Vortez

Now I'd like a faster way to display the image than a PictureBox, if possible.

I also need to remove the alpha channel, since BitBlt gives me no choice about that. What's the best way to do that? In the xor loop in my DLL?

Edited by David Lake
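One hedged sketch of dropping the alpha byte during a copy (a stand-alone C++ loop with names of my own, not the poster's DLL code; a real implementation would fold this into the same pass as the xor delta):

```cpp
#include <cstddef>
#include <cstdint>

// Pack 32-bit BGRA pixels (as delivered by a 32-bit DIB from BitBlt)
// down to 24-bit BGR by dropping every fourth byte. Returns the number
// of bytes written (pixelCount * 3).
std::size_t stripAlpha(const uint8_t* src, uint8_t* dst,
                       std::size_t pixelCount) {
    std::size_t d = 0;
    for (std::size_t i = 0; i < pixelCount; ++i) {
        dst[d++] = src[i * 4 + 0];  // blue
        dst[d++] = src[i * 4 + 1];  // green
        dst[d++] = src[i * 4 + 2];  // red
        // src[i * 4 + 3] is the alpha byte: discarded
    }
    return d;
}
```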

You need DirectX or OpenGL for that; in C# I don't know exactly how that would work. All you have to do is create a texture once, replace its contents with the new frame each time, and draw it on a quad the size of the screen using that texture.

As for the alpha channel, I can post all my code, but it's in C++:

//-----------------------------------------------------------------------------
// Draw the cursor
//-----------------------------------------------------------------------------
void CScreenShot::DrawCurcor(HDC hDC)
{
	CURSORINFO CursorInfo;
	CursorInfo.cbSize = sizeof(CURSORINFO);
	GetCursorInfo(&CursorInfo);

	static DWORD Version = WinVer.DetectWindowsVersion();
	//static HCURSOR hCur = LoadCursor(NULL, IDC_ARROW);

	DWORD CursorWidth  = GetSystemMetrics(SM_CXCURSOR);
	DWORD CursorHeight = GetSystemMetrics(SM_CYCURSOR);

	POINT CursorPos;
	GetCursorPos(&CursorPos);
	
	// Needed for XP or older windows
	if(Version < _WIN_VISTA_){
		CursorPos.x -= CursorWidth  >> 2;
		CursorPos.y -= CursorHeight >> 2;
	}

	DrawIconEx(hDC, CursorPos.x, CursorPos.y, CursorInfo.hCursor, CursorWidth, CursorHeight, 0, NULL, DI_NORMAL);
}

//-----------------------------------------------------------------------------
// Take a screenshot, extract it to a buffer in 24 bits, and compress it
//-----------------------------------------------------------------------------
int CScreenShot::GenMPEGScreenShot(CVideoEncoder *pVideoEncoder, BOOL ShowCursor)
{
	HWND hDesktopWnd = GetDesktopWindow();
	HDC  hdc = GetDC(hDesktopWnd);

	int x = 0;
	int y = 0;
	int w = GetSystemMetrics(SM_CXSCREEN);
	int h = GetSystemMetrics(SM_CYSCREEN);

	HDC     s_hdc   = CreateCompatibleDC(hdc);
	HBITMAP hbmp    = CreateCompatibleBitmap(hdc, w,h);
	HBITMAP ex_hbmp = (HBITMAP)SelectObject(s_hdc, hbmp);

/////////////////////////////////////////////////////////////////////////////////////////

	// Copy the screen image in our bitmap
	BitBlt(s_hdc, x,y,w,h, hdc, x,y, SRCCOPY);

	// Draw the cursor over the image
	if(ShowCursor)
		DrawCurcor(s_hdc);
	
	ReleaseDC(hDesktopWnd, hdc);

/////////////////////////////////////////////////////////////////////////////////////////

	// Create pointers to our buffers object
	CRawBuffer *pSrc = &Buffers.MPEG.ScreenShot;
	CRawBuffer *pDst = &Buffers.MPEG.Encoded;

	// Allocate buffers
	DWORD NumPixels = w * h;
	if(pSrc->GetBufferSize() != NumPixels * 3)
		pSrc->Allocate(NumPixels * 3);

	// Allocate a BITMAPINFO buffer 
	LPBITMAPINFO lpbi = (LPBITMAPINFO)(new BYTE[BMISize]);
	ZeroMemory(lpbi, sizeof(BITMAPINFO));
	lpbi->bmiHeader.biSize = sizeof(BITMAPINFOHEADER);

	// De-select our hbmp
	SelectObject(s_hdc, ex_hbmp);

	// Get information about the screenshot image format
	GetDIBits(s_hdc, hbmp, 0, h, NULL, lpbi, DIB_RGB_COLORS);
	lpbi->bmiHeader.biCompression = BI_RGB;
	// Make sure it's gonna be extracted in 24 bits format
	lpbi->bmiHeader.biBitCount  = 24;
	lpbi->bmiHeader.biSizeImage = NumPixels * 3;

	// Extract the image in 24 bits format
	GetDIBits(s_hdc, hbmp, 0, h, pSrc->GetBuffer(), lpbi, DIB_RGB_COLORS);

	// Delete the BITMAPINFO buffer
	SAFE_DELETE_ARRAY(lpbi);

	// Release the bitmap handles
	if(SelectObject(s_hdc, hbmp)){
		DeleteObject(hbmp);
		DeleteDC(s_hdc);
	}

/////////////////////////////////////////////////////////////////////////////////////////

	// Convert from BGR to RGB
	Convert24bitsBGRTORGB(pSrc->GetBuffer(), pSrc->GetBufferSize());

/////////////////////////////////////////////////////////////////////////////////////////

	// Compress the frame using ffmpeg
	int FrameSize = pVideoEncoder->EncodeFrame(pDst->GetBuffer(6), pSrc->GetBuffer(), pSrc->GetBufferSize());
	
	// Write the packet header 
	WORD MsgID = MSG_MP1_IMG_REQUEST;
	memcpy(pDst->GetBuffer(0), &FrameSize, sizeof(DWORD));
	memcpy(pDst->GetBuffer(4), &MsgID,     sizeof(WORD));

	// Free our source buffer
	pSrc->Free(); 

	// Return compressed buffer size
	Size = FrameSize;
	return Size;
}

