Jimkm

OpenGL
Need suggestions to optimize realtime BitBlt->StretchBlt (scaling takes too much CPU)

27 posts in this topic

Here's my problem. I need to copy and scale the screen ~30 times a second (live screen drawing).
I do this by first getting a handle to the screen DC:

GetDC(NULL)

Then I create a compatible DC with CreateCompatibleDC, create a DIB with CreateDIBSection, and select the DIB into the new DC. Now I StretchBlt from the screen DC to my own DC (blit and scale the image in one call).

So we have

while (1)
{
    StretchBlt(...);  // grab the screen and resize from 1440x900 to 1200x700
}

As you probably know, the scaling takes most of the CPU time.
If I do only

while (1) { BitBlt(); } I get ~400 FPS, i.e. 400 BitBlt calls per second.

If I use while (1) { StretchBlt(); } I get only ~25 FPS, i.e. 25 StretchBlt calls per second.

(I need 30.) Also, since I am doing other things as well, the CPU will not be idle, so this code needs to run at 34+ FPS.

As said, the bottleneck is not the blitting from the screen; the bottleneck is the scaling (I use SetStretchBltMode(hCompatible, HALFTONE)).
With COLORONCOLOR I get 60 FPS, but the quality is not good (if only there were something in between HALFTONE and COLORONCOLOR...).
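That in-between is roughly what bilinear filtering gives: more smoothing than COLORONCOLOR (nearest-neighbor) but less work than HALFTONE's averaging. A minimal CPU sketch in plain C, assuming tightly packed 32-bit RGBA buffers (resize_bilinear is a hypothetical helper, not a GDI call):

```c
#include <stdint.h>

/* Bilinear resize of a 32-bit RGBA image. src/dst are tightly packed,
 * one pixel = 4 bytes. A quality middle ground between nearest-neighbor
 * (COLORONCOLOR) and full averaging (HALFTONE). */
void resize_bilinear(const uint8_t *src, int sw, int sh,
                     uint8_t *dst, int dw, int dh)
{
    for (int y = 0; y < dh; y++) {
        float fy = (float)y * (sh - 1) / (dh > 1 ? dh - 1 : 1);
        int y0 = (int)fy, y1 = y0 + 1 < sh ? y0 + 1 : y0;
        float wy = fy - y0;
        for (int x = 0; x < dw; x++) {
            float fx = (float)x * (sw - 1) / (dw > 1 ? dw - 1 : 1);
            int x0 = (int)fx, x1 = x0 + 1 < sw ? x0 + 1 : x0;
            float wx = fx - x0;
            for (int c = 0; c < 4; c++) {
                /* interpolate horizontally on the two source rows, then vertically */
                float top = src[(y0*sw + x0)*4 + c] * (1 - wx)
                          + src[(y0*sw + x1)*4 + c] * wx;
                float bot = src[(y1*sw + x0)*4 + c] * (1 - wx)
                          + src[(y1*sw + x1)*4 + c] * wx;
                dst[(y*dw + x)*4 + c] = (uint8_t)(top*(1 - wy) + bot*wy + 0.5f);
            }
        }
    }
}
```

Fixed-point weights and SIMD (SSE2) would be the usual next steps if a scalar version like this is still too slow.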

For now I do not want to use DirectX or OpenGL.

Any idea how to make it faster? (I only need ~10 more FPS. :))
I have an idea how to make it faster, but I do not know how to do it, or whether it is possible.
The idea is:
BitBlt to get the screen into buffer A.
StretchBlt to scale the image into buffer B.
On the next captures, use SRCINVERT (XOR) to blit only the differences between the new capture and the old one, and use those differences to update the scaled buffer B, so no further full-frame scaling is needed.

Again, just an idea - I am not sure it is possible.
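The detection half of that idea is doable on the CPU: XOR (or just compare) the new and previous frames, find the bounding box of the changed pixels, and re-scale only that rectangle (padded by the filter radius) into the cached scaled frame. A rough sketch in plain C, with hypothetical frame buffers as arrays of 32-bit pixels:

```c
#include <stdint.h>
#include <string.h>

/* Find the bounding box of pixels that differ between two 32-bit
 * frames of w*h pixels. Returns 0 if nothing changed. Only this
 * rectangle (padded by the filter radius) would then be re-scaled
 * and patched into the cached scaled frame. */
int diff_bbox(const uint32_t *prev, const uint32_t *cur, int w, int h,
              int *x0, int *y0, int *x1, int *y1)
{
    int minx = w, miny = h, maxx = -1, maxy = -1;
    for (int y = 0; y < h; y++) {
        const uint32_t *p = prev + y*w, *c = cur + y*w;
        if (memcmp(p, c, w * sizeof(uint32_t)) == 0)
            continue;  /* fast skip of unchanged rows */
        for (int x = 0; x < w; x++) {
            if (p[x] ^ c[x]) {  /* XOR != 0 -> pixel changed */
                if (x < minx) minx = x;
                if (x > maxx) maxx = x;
                if (y < miny) miny = y;
                if (y > maxy) maxy = y;
            }
        }
    }
    if (maxx < 0) return 0;
    *x0 = minx; *y0 = miny; *x1 = maxx; *y1 = maxy;
    return 1;
}
```

Whether this wins depends on how much of the screen changes per frame; a full-screen video defeats it.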

I would love to hear your suggestions.

I tried BitBlt with some custom scaling functions (instead of StretchBlt) that I found on the Internet, and with image libs like FreeImage, but the results were dramatically worse than StretchBlt (even using BOX filtering).

Again, I know that using DirectX/OpenGL may be faster, but currently I do not have the time to learn or deal with that - so for now I would like to use GDI/GDI+ only.

Thanks!

 


Hi. Are you running this StretchBlt in a loop like this?

while (1)
{
    StretchBlt(...);  // grab the screen and resize from 1440x900 to 1200x700
}

Maybe you only need to update it on WM_PAINT, or whatever the right message is.


I do not think there is a way to catch when WM_PAINT is sent for the screen itself, i.e. to catch some event that is thrown whenever what you see on the computer screen changes.


It seems the HALFTONE blitter really is that slow. I couldn't believe it, so I did some quick testing - it shows the same result you're talking about here, about 40 ms. What system do you have? It may depend on the system.

I don't know - some say that on Win7 it may run even worse than on XP ;/

(Found something on this:

http://vjforums.info/threads/stretchblt-can-hang-windows-by-hogging-gdi-lock.38179/ )

Anyway, the normal-mode blitter works OK, and you could rescale with hand-written code, or even try some "OpenGL/DX blitter" with fast onboard hardware resizing - which I have not tested yet.


Hi fir.

I tried custom scaling - some even with ASM - but could not get even near StretchBlt; with the custom scaling function I got ~10 FPS...

And as I said, I do not want to use OpenGL/DX.

I am using Win7.



You could show the code - we can take a look here and maybe some advice or conclusions will appear. I could test some down- or upscaling on bitmap arrays; I'm not sure, but I think it could work at 50 (100?) Hz or more - not 100% sure though.

PS: I did a simple test of down- and upscaling a frame bitmap into a buffer and then copying it back to the framebuffer array - and even with this double copying I had no trouble.

For a screen size like 500p it was 4 ms (scaling out and copying back); for screens like 1000p it was 12 ms (for the two-way scale + copy back), so no problem there.

I've got no idea what HALFTONE is doing to be so slow, but this linear scaling works OK.

 

Edited by fir

First I tried this:

http://www.geisswerks.com/ryan/FAQS/resize.html

It gave me around 10 FPS only.

I then tried the FreeImage lib (FreeImage_Rescale) and got around 7 FPS.

None came even close to StretchBlt performance.

I think the answer should be:

1) to find the fastest (good quality) scaling function,

AND/OR

2) to somehow use the scaling history,

i.e. to try to scale only the parts that changed since the last scale - but I am not sure how to do that...

Maybe with SRCINVERT (XOR) - somehow.


 

 

I've got no idea what HALFTONE is doing to be so slow, but this linear scaling works OK.

Well, without HALFTONE the image quality is not good.
Text does not look right and some of it is unreadable.
HALFTONE smooths the image by averaging pixels as it scales, or something like that.
 


Scaling is expensive. This is why, for instance, Fraps only offers fullscreen or half-screen resolution when it captures the screen, so that scaling is either unnecessary or very easy. It's just too costly to handle cases where you're not scaling down to a power of two of the original size, because you have to handle filtering of multiple overlapping pixels, which also blows your cache because the resulting memory access patterns are... suboptimal, to say the least. There is one thing you can try, which drastically helped me back when I was on my old laptop trying to capture stuff: set your screen resolution to 16-bit colors.

 

Otherwise, what I would try instead is to get a copy of the screen into a GPU-bound texture using your favorite graphics API, scale it with a pixel shader (which ought to be fast enough, even with bilinear filtering), and read it back on the CPU. But this might incur significant capture latency and might not be feasible, and you said you don't want to do this anyway, so...

 

I'm not sure what your specific needs are, but have you considered simply getting one of those screen recording devices? They don't have any system overhead and you might be able to get one to feed back into the computer to deliver the screen contents in realtime.


Hi Bacterius.

Yes, I know scaling is expensive - that is why using the scaling history might help.
I just do not know how to do it, i.e. scale only the parts of the image that changed since the last scale. That way I scale once in full, and on subsequent frames I only scale the parts that changed - but I am not sure if or how that can be done.

P.S.
Just tried CxImgLib and got only 20 FPS - still lower than StretchBlt...

P.S.2: Lowering to 16-bit did not help, and in any case this has to run on all systems without changing settings - which is also why a recording device is not a solution.

Edited by Jimkm

Could you try comparing each pixel to the corresponding pixel of the previous frame to detect differences, then divvy the screen up into 32x32 chunks and only StretchBlt the chunks that contained at least one differing pixel?
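A minimal sketch of that bookkeeping in plain C (mark_dirty_tiles is a hypothetical helper over 32-bit pixel arrays; the actual per-chunk StretchBlt calls are left out):

```c
#include <stdint.h>
#include <string.h>

#define TILE 32

/* Flag which TILE x TILE chunks changed between two frames of w*h
 * 32-bit pixels. dirty[] must hold ceil(w/TILE)*ceil(h/TILE) bytes.
 * Returns the number of dirty chunks; only those would be re-stretched. */
int mark_dirty_tiles(const uint32_t *prev, const uint32_t *cur,
                     int w, int h, uint8_t *dirty)
{
    int tx = (w + TILE - 1) / TILE, ty = (h + TILE - 1) / TILE;
    int ndirty = 0;
    memset(dirty, 0, (size_t)tx * ty);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            if (prev[y*w + x] != cur[y*w + x]) {
                uint8_t *d = &dirty[(y / TILE) * tx + x / TILE];
                if (!*d) { *d = 1; ndirty++; }
            }
        }
    }
    return ndirty;
}
```

Each dirty chunk would then be stretched individually, ideally with a small overlap so the filter seams don't show.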


Hi Samith.

I am not sure this will work, since the scaling also does some smoothing, and I think that doing it your way will make the borders of the 32x32 squares visible...

I can try it, but I am not sure how to do it.

The simplest solution would be to perform the resizing of the images in parallel on multiple threads. You could put the captured images into a queue and have the threads grab images from said queue and resize them.

It doesn't change the fact that 40 ms is awfully slow for a resize, but if you have the CPU cores to spare, why not use them.
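Besides a frame queue, the same spare cores can be used inside a single frame by giving each thread a horizontal strip of destination rows. A sketch in plain C with POSIX threads (resize_parallel and its Strip struct are hypothetical names; nearest-neighbor stands in for the real filter):

```c
#include <stdint.h>
#include <pthread.h>

/* Split the destination rows of a resize across threads. Nearest-neighbor
 * is a placeholder: any filter that writes one destination row at a time
 * parallelizes the same way, since threads only read the shared source. */
typedef struct {
    const uint32_t *src; uint32_t *dst;
    int sw, sh, dw, dh, row0, row1;
} Strip;

static void *scale_strip(void *arg)
{
    Strip *s = arg;
    for (int y = s->row0; y < s->row1; y++) {
        int sy = y * s->sh / s->dh;
        for (int x = 0; x < s->dw; x++)
            s->dst[y * s->dw + x] = s->src[sy * s->sw + x * s->sw / s->dw];
    }
    return 0;
}

void resize_parallel(const uint32_t *src, int sw, int sh,
                     uint32_t *dst, int dw, int dh, int nthreads)
{
    pthread_t tid[16];
    Strip strips[16];
    if (nthreads > 16) nthreads = 16;
    for (int i = 0; i < nthreads; i++) {
        strips[i] = (Strip){ src, dst, sw, sh, dw, dh,
                             dh * i / nthreads, dh * (i + 1) / nthreads };
        pthread_create(&tid[i], 0, scale_strip, &strips[i]);
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], 0);
}
```

No locking is needed because each destination row belongs to exactly one strip.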

Hi Samith.
I am not sure this will work, since the scaling also does some smoothing, and I think that doing it your way will make the borders of the 32x32 squares visible...
I can try it, but I am not sure how to do it.

 

You might be able to work around that by having the grid squares overlap by a border as wide as the kernel of the filter.

It would mean processing some pixels twice, but still far fewer than processing every pixel.

Apart from that, and using multiple cores to update squares in parallel, I don't think there is much you can do without using graphics hardware (which is a massively parallel processor designed for image processing - it seems pretty ideal for the task).

Why not use graphics hardware? It would not be that complex to get working.

Maybe a bit off-topic, but why exactly do you need to scale anyway? (I'm not sure what you mean by "live screen drawing" and why it implies scaling.)

Edited by Olof Hedman

Hi Olof.

I grab a screenshot, scale it to the destination device's resolution, and transmit it to that device.

The device the screens are sent to cannot do the scaling itself.

I already use the extra CPU to encode the images, so scaling on the spare CPU will not help.

I am more looking for a way to decrease CPU time than to get more FPS (if I can decrease CPU time, the extra FPS will come by themselves).

Hi. I don't know why you have that while(1) in your render loop.
What breaks it?
Can you show your render code?

What breaks the while(1) is the user pressing the stop button in the GUI.

It is:

while (user did not press the stop button)
{
    capture a screenshot and scale it (StretchBlt)
}


I say the slowness is coming from your code setup.

Hi ankhd.

I do not think so, since I believe this is the most FPS my CPU can give me with StretchBlt, which is done on the CPU (I use a GPU monitor and can see that it is not done on the GPU).

But if you have more details on why it would be coming from my code, I would love to hear them.

What exactly are your requirements on quality?

You wrote that COLORONCOLOR, which is nearest-neighbor downsampling, is not good enough. What about downsizing the image to exactly half its original size with a box filter, compressing and sending that to the target device, and linearly upsampling there?
You can simulate that in GIMP/Photoshop to see what it looks like.
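A half-size box filter is cheap because every output pixel averages exactly four inputs with no fractional weights. A minimal sketch in plain C (halve_box is a hypothetical helper over packed RGBA; even dimensions assumed):

```c
#include <stdint.h>

/* Downscale a 32-bit RGBA image to exactly half size with a 2x2 box
 * filter: each output pixel is the rounded average of 4 input pixels.
 * w and h are assumed even. */
void halve_box(const uint8_t *src, int w, int h, uint8_t *dst)
{
    int dw = w / 2;
    for (int y = 0; y < h / 2; y++)
        for (int x = 0; x < dw; x++)
            for (int c = 0; c < 4; c++) {
                int a = src[((2*y)  *w + 2*x)  *4 + c];
                int b = src[((2*y)  *w + 2*x+1)*4 + c];
                int d = src[((2*y+1)*w + 2*x)  *4 + c];
                int e = src[((2*y+1)*w + 2*x+1)*4 + c];
                dst[(y*dw + x)*4 + c] = (uint8_t)((a + b + d + e + 2) / 4);
            }
}
```

Integer-only math like this vectorizes well, and reading 2x2 blocks keeps memory access sequential and cache-friendly.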

Take my advice and do it this way:

1) Write your own resizing code (in between raw scaling and HALFTONE). This shouldn't be hard - you will not be as fast as the simple resizing blitter when doing something more complex than raw scaling, but you also should not be as slow as HALFTONE when writing something simpler than HALFTONE.

2) Find some library - the same reasoning applies: anything in between raw scaling and HALFTONE should be both slower than raw scaling and faster than HALFTONE.

That's all.



I already tried a few pro image libraries - all gave me worse FPS than StretchBlt.


I don't know how much time per day you've been spending on this, but in the time since you started this thread you could have already written a GPU solution. I know you say it's not what you want to do, but seriously, it wouldn't be that hard and you'd have excellent performance.

What CPU are you using for benchmarking?

Some possible solutions:
1. Scale to half, then upscale to the resolution you want, e.g. 1440x900 -> 720x450 -> 1200x700.

2. Write a custom scaler for the solution you need.

3. Maybe your source is in some YCbCr format or similar? Then you could scale in the source format.

4. Interlaced scaling (scale just every 2nd line per frame -> double the FPS).

5. Scale to half and use a letterbox.

6. Don't interpolate; just fetch the closest pixel.
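Option 4 can be sketched in plain C (resize_interlaced is a hypothetical helper; nearest-neighbor stands in for the real filter). Each call refreshes only one field of destination rows, so each frame costs half the work at the price of half-rate updates per row:

```c
#include <stdint.h>

/* Interlaced resize: on each call only destination rows with
 * y % 2 == field are refreshed; the other rows keep the previous
 * frame's contents, halving the per-frame scaling work. */
void resize_interlaced(const uint32_t *src, int sw, int sh,
                       uint32_t *dst, int dw, int dh, int field)
{
    for (int y = field & 1; y < dh; y += 2) {
        int sy = y * sh / dh;
        for (int x = 0; x < dw; x++)
            dst[y * dw + x] = src[sy * sw + x * sw / dw];
    }
}
```

The caller alternates field 0/1 on successive frames; fast motion will show combing, which is the usual interlacing trade-off.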
