  • Similar Content

    • By fleissi
      Hey guys!

      I'm new here and I recently started developing my own rendering engine. It's open source, based on OpenGL/DirectX and C++.
      The full source code is hosted on github:

      I would appreciate it if people with experience in game development / engine design could take a look at my source code. I'm looking for honest, constructive criticism on how to improve the engine.
      I'm currently writing my master's thesis in computer science, and over the past year I've gone through all the basics of graphics programming, learned DirectX and OpenGL, read some articles from Nvidia GPU Gems, read books, and integrated some of this step by step into the engine.

      I know about the basics, but I feel like there is some missing link that I didn't get yet to merge all those little pieces together.

      Features I have so far:
      - Dynamic shader generation based on material properties
      - Dynamic sorting of meshes to be rendered based on shader and material
      - Rendering large amounts of static meshes
      - Hierarchical culling (detail + view frustum)
      - Limited support for dynamic (i.e. moving) meshes
      - Normal, Parallax and Relief Mapping implementations
      - Wind animations based on vertex displacement
      - A very basic integration of the Bullet physics engine
      - Procedural Grass generation
      - Some post processing effects (Depth of Field, Light Volumes, Screen Space Reflections, God Rays)
      - Caching mechanisms for textures, shaders, materials and meshes

      Features I would like to have:
      - Global illumination methods
      - Scalable physics
      - Occlusion culling
      - A nice procedural terrain generator
      - Scripting
      - Level Editing
      - Sound system
      - Optimization techniques

      Books I have so far:
      - Real-Time Rendering Third Edition
      - 3D Game Programming with DirectX 11
      - Vulkan Cookbook (not started yet)

      I hope you guys can take a look at my source code and if you're really motivated, feel free to contribute :-)
      There are some videos on youtube that demonstrate some of the features:
      Procedural grass on the GPU
      Procedural Terrain Engine
      Quadtree detail and view frustum culling

      The long term goal is to turn this into a commercial game engine. I'm aware that this is a very ambitious goal, but I'm sure it's possible if you work hard for it.


    • By tj8146
      I have attached my project in a .zip file if you wish to run it for yourself.
      I am making a simple 2D top-down game, and I am trying to run my code to see whether my window creation works and whether my timer works with it. Every time I run it, though, I get errors; when I fix those errors, more come, and then the same errors keep reappearing. I end up just going round in circles. Is there anyone who could help with this?
      Errors when I build my code:
      1>Renderer.cpp
      1>c:\users\documents\opengl\game\game\renderer.h(15): error C2039: 'string': is not a member of 'std'
      1>c:\program files (x86)\windows kits\10\include\10.0.16299.0\ucrt\stddef.h(18): note: see declaration of 'std'
      1>c:\users\documents\opengl\game\game\renderer.h(15): error C2061: syntax error: identifier 'string'
      1>c:\users\documents\opengl\game\game\renderer.cpp(28): error C2511: 'bool Game::Rendering::initialize(int,int,bool,std::string)': overloaded member function not found in 'Game::Rendering'
      1>c:\users\documents\opengl\game\game\renderer.h(9): note: see declaration of 'Game::Rendering'
      1>c:\users\documents\opengl\game\game\renderer.cpp(35): error C2597: illegal reference to non-static member 'Game::Rendering::window'
      1>c:\users\documents\opengl\game\game\renderer.cpp(36): error C2597: illegal reference to non-static member 'Game::Rendering::window'
      1>c:\users\documents\opengl\game\game\renderer.cpp(43): error C2597: illegal reference to non-static member 'Game::Rendering::window'
      1>Done building project "Game.vcxproj" -- FAILED.
      ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
      Renderer.cpp:

      #include <GL/glew.h>
      #include <GLFW/glfw3.h>
      #include "Renderer.h"
      #include "Timer.h"
      #include <iostream>

      namespace Game
      {
          GLFWwindow* window;

          /* Initialize the library */
          Rendering::Rendering()
          {
              mClock = new Clock;
          }

          Rendering::~Rendering()
          {
              shutdown();
          }

          bool Rendering::initialize(uint width, uint height, bool fullscreen, std::string window_title)
          {
              if (!glfwInit())
              {
                  return -1;
              }

              /* Create a windowed mode window and its OpenGL context */
              window = glfwCreateWindow(640, 480, "Hello World", NULL, NULL);
              if (!window)
              {
                  glfwTerminate();
                  return -1;
              }

              /* Make the window's context current */
              glfwMakeContextCurrent(window);
              glViewport(0, 0, (GLsizei)width, (GLsizei)height);
              glOrtho(0, (GLsizei)width, (GLsizei)height, 0, 1, -1);
              glMatrixMode(GL_PROJECTION);
              glLoadIdentity();
              glfwSwapInterval(1);
              glEnable(GL_SMOOTH);
              glEnable(GL_DEPTH_TEST);
              glEnable(GL_BLEND);
              glDepthFunc(GL_LEQUAL);
              glHint(GL_PERSPECTIVE_CORRECTION_HINT, GL_NICEST);
              glEnable(GL_TEXTURE_2D);
              glLoadIdentity();
              return true;
          }

          bool Rendering::render()
          {
              /* Loop until the user closes the window */
              if (!glfwWindowShouldClose(window))
                  return false;

              /* Render here */
              mClock->reset();
              glfwPollEvents();
              if (mClock->step())
              {
                  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
                  glfwSwapBuffers(window);
                  mClock->update();
              }
              return true;
          }

          void Rendering::shutdown()
          {
              glfwDestroyWindow(window);
              glfwTerminate();
          }

          GLFWwindow* Rendering::getCurrentWindow()
          {
              return window;
          }
      }

      Renderer.h:

      #pragma once

      namespace Game
      {
          class Clock;

          class Rendering
          {
          public:
              Rendering();
              ~Rendering();
              bool initialize(uint width, uint height, bool fullscreen, std::string window_title = "Rendering window");
              void shutdown();
              bool render();
              GLFWwindow* getCurrentWindow();

          private:
              GLFWwindow* window;
              Clock* mClock;
          };
      }

      Timer.cpp:

      #include <GL/glew.h>
      #include <GLFW/glfw3.h>
      #include <time.h>
      #include "Timer.h"

      namespace Game
      {
          Clock::Clock()
              : mTicksPerSecond(50),
                mSkipTics(1000 / mTicksPerSecond),
                mMaxFrameSkip(10),
                mLoops(0)
          {
              mLastTick = tick();
          }

          Clock::~Clock()
          {
          }

          bool Clock::step()
          {
              if (tick() > mLastTick && mLoops < mMaxFrameSkip)
                  return true;
              return false;
          }

          void Clock::reset()
          {
              mLoops = 0;
          }

          void Clock::update()
          {
              mLastTick += mSkipTics;
              mLoops++;
          }

          clock_t Clock::tick()
          {
              return clock();
          }
      }

      Timer.h:

      #pragma once
      #include "Common.h"

      namespace Game
      {
          class Clock
          {
          public:
              Clock();
              ~Clock();
              void update();
              bool step();
              void reset();
              clock_t tick();

          private:
              uint mTicksPerSecond;
              ufloat mSkipTics;
              uint mMaxFrameSkip;
              uint mLoops;
              uint mLastTick;
          };
      }

      Common.h:

      #pragma once
      #include <cstdio>
      #include <cstdlib>
      #include <ctime>
      #include <cstring>
      #include <cmath>
      #include <iostream>

      namespace Game
      {
          typedef unsigned char uchar;
          typedef unsigned short ushort;
          typedef unsigned int uint;
          typedef unsigned long ulong;
          typedef float ufloat;
      }
    • By lxjk
      Hi guys,
      There are many ways to do light culling in tile-based shading. I've been playing with this idea for a while, and just want to throw it out there.
      Because tile frustums are generally small compared to the light radius, I tried using a cone test to reduce the false positives introduced by the commonly used sphere-frustum test.
      On top of that, I use distance to camera rather than depth for near/far test (aka. sliced by spheres).
      This method can be naturally extended to clustered light culling as well.
      The following image shows the general ideas

      Performance-wise I get around a 15% improvement over the sphere-frustum test. You can also see how a single light performs in the following, from left to right: (1) standard rendering of a point light; then the tiles that passed (2) the sphere-frustum test, (3) the cone test, and (4) the spherical-sliced cone test.

      I put the details in my blog post (https://lxjk.github.io/2018/03/25/Improve-Tile-based-Light-Culling-with-Spherical-sliced-Cone.html), GLSL source code included!
    • By Fadey Duh
      Good evening everyone!

      I was wondering if there is something equivalent of  GL_NV_blend_equation_advanced for AMD?
      Basically I'm trying to find a more compatible version of it.

      Thank you!
    • By Jens Eckervogt
      Hello guys, 
      Please tell me: why does my Wavefront model not show for me? I already checked, and there are no errors yet.

      using OpenTK;
      using System.Collections.Generic;
      using System.IO;
      using System.Text;

      namespace Tutorial_08.net.sourceskyboxer
      {
          public class WaveFrontLoader
          {
              private static List<Vector3> inPositions;
              private static List<Vector2> inTexcoords;
              private static List<Vector3> inNormals;
              private static List<float> positions;
              private static List<float> texcoords;
              private static List<int> indices;

              public static RawModel LoadObjModel(string filename, Loader loader)
              {
                  inPositions = new List<Vector3>();
                  inTexcoords = new List<Vector2>();
                  inNormals = new List<Vector3>();
                  positions = new List<float>();
                  texcoords = new List<float>();
                  indices = new List<int>();
                  int nextIdx = 0;

                  using (var reader = new StreamReader(File.Open("Contents/" + filename + ".obj", FileMode.Open), Encoding.UTF8))
                  {
                      string line = reader.ReadLine();
                      int i = reader.Read();
                      while (true)
                      {
                          string[] currentLine = line.Split();
                          if (currentLine[0] == "v")
                          {
                              Vector3 pos = new Vector3(float.Parse(currentLine[1]), float.Parse(currentLine[2]), float.Parse(currentLine[3]));
                              inPositions.Add(pos);
                              if (currentLine[1] == "t")
                              {
                                  Vector2 tex = new Vector2(float.Parse(currentLine[1]), float.Parse(currentLine[2]));
                                  inTexcoords.Add(tex);
                              }
                              if (currentLine[1] == "n")
                              {
                                  Vector3 nom = new Vector3(float.Parse(currentLine[1]), float.Parse(currentLine[2]), float.Parse(currentLine[3]));
                                  inNormals.Add(nom);
                              }
                          }
                          if (currentLine[0] == "f")
                          {
                              Vector3 pos = inPositions[0];
                              positions.Add(pos.X);
                              positions.Add(pos.Y);
                              positions.Add(pos.Z);
                              Vector2 tc = inTexcoords[0];
                              texcoords.Add(tc.X);
                              texcoords.Add(tc.Y);
                              indices.Add(nextIdx);
                              ++nextIdx;
                          }
                          reader.Close();
                          return loader.loadToVAO(positions.ToArray(), texcoords.ToArray(), indices.ToArray());
                      }
                  }
              }
          }
      }

      I have also tried another method, but the model still doesn't show. I am frustrated, because no OpenTK developers will help me.
      Please help me figure out how to fix this.

      My download (mega.nz) should be the original, but I tried it with no success...
      I have also tried adding the blend source and PNG file here.
      PS: Why is our community not active? I have been waiting a very long time.
      Thanks !
OpenGL Need suggestions to optimize realtime BitBlt->StretchBlt (scaling takes too much CPU)

This topic is 1393 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Here's my problem. I need to copy and scale the screen, ~30 times a second (live screen drawing).
I do this by first getting a handle to the screen DC.

Then, I create a compatible DC using CreateCompatibleDC, create a DIB using CreateDIBSection, and select the DIB into the new DC. Now I StretchBlt from the screen DC to my own DC (blit and scale the image).

So we have

StretchBlt ... (get the screen and resize from 1440*900 to 1200*700)
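The pipeline described above can be sketched in (Windows-only) GDI code roughly as follows. This is a minimal, untested sketch, not the poster's actual code: error handling is omitted, and `CaptureScaled` and its parameters are hypothetical names of my own.

```cpp
// Sketch of the capture-and-scale pipeline: screen DC -> compatible DC with a
// DIB selected -> one StretchBlt that blits and scales in a single call.
#include <windows.h>

HBITMAP CaptureScaled(int srcW, int srcH, int dstW, int dstH, void** bits)
{
    HDC screenDC = GetDC(NULL);                // handle to the screen DC
    HDC memDC = CreateCompatibleDC(screenDC);  // compatible memory DC

    // Describe a 32-bit top-down DIB of the target size.
    BITMAPINFO bmi = {};
    bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth = dstW;
    bmi.bmiHeader.biHeight = -dstH;            // negative height = top-down rows
    bmi.bmiHeader.biPlanes = 1;
    bmi.bmiHeader.biBitCount = 32;
    bmi.bmiHeader.biCompression = BI_RGB;

    HBITMAP dib = CreateDIBSection(memDC, &bmi, DIB_RGB_COLORS, bits, NULL, 0);
    SelectObject(memDC, dib);

    // HALFTONE is the high-quality (and slow) mode discussed in this thread;
    // MSDN requires SetBrushOrgEx after setting it.
    SetStretchBltMode(memDC, HALFTONE);
    SetBrushOrgEx(memDC, 0, 0, NULL);

    // Blit and scale the screen into the DIB in one call.
    StretchBlt(memDC, 0, 0, dstW, dstH, screenDC, 0, 0, srcW, srcH, SRCCOPY);

    DeleteDC(memDC);
    ReleaseDC(NULL, screenDC);
    return dib;  // caller owns the DIB; pixels are reachable via *bits
}
```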


As you probably know, the scaling takes most of the CPU time.
If I do only

while(1){ BitBlt(); } I get ~400 FPS, i.e. 400 calls of BitBlt per second.

If I use while(1){ StretchBlt(); } I get only ~25 FPS, i.e. 25 calls of StretchBlt per second.

(I need 30.) Also, since I am doing other things as well, the CPU will not be idle, so this code needs to run at no less than 34 FPS.

As said, the bottleneck is not the blitting from the screen; the bottleneck is the scaling (I use SetStretchBltMode(hCompatible, HALFTONE)).
With COLORONCOLOR I get 60 FPS, but the quality is not good (if only there were something in the middle between HALFTONE and COLORONCOLOR...).

For now I do not want to use DirectX or OpenGL.

Any idea how to make it faster? (I only need ~10 more FPS :))
I have an idea how to make it faster, but I don't know how to do it, or whether it is even possible.
The idea is:
BitBlt to get the screen into buffer A.
StretchBlt to scale the image into buffer B.
On the next BitBlts, use SRCINVERT (XOR) to blit only the differences between the new capture and the old one, and use those differences to update the scaled buffer B, so no more full scaling is needed.

Again, it's just an idea, and I am not sure it is possible.
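The detection half of this idea can be prototyped without GDI at all: keep the previous capture, compare it tile by tile against the new one (the XOR of two equal pixels is zero), and rescale only the tiles that differ. A rough sketch of just the detection step, with hypothetical names of my own; the selective rescale of the dirty tiles would follow:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Returns the top-left corners of the tile x tile blocks whose pixels differ
// between two frames of the same size. A tile whose XOR is all zeros is
// unchanged and can keep its previously scaled pixels; only the returned
// tiles need to be rescaled.
std::vector<std::pair<int, int>> changedTiles(const std::vector<uint32_t>& prev,
                                              const std::vector<uint32_t>& cur,
                                              int width, int height, int tile)
{
    std::vector<std::pair<int, int>> dirty;
    for (int ty = 0; ty < height; ty += tile) {
        for (int tx = 0; tx < width; tx += tile) {
            bool diff = false;
            // Scan this tile; stop as soon as one differing pixel is found.
            for (int y = ty; y < ty + tile && y < height && !diff; ++y)
                for (int x = tx; x < tx + tile && x < width; ++x)
                    if ((prev[y * width + x] ^ cur[y * width + x]) != 0) {
                        diff = true;
                        break;
                    }
            if (diff)
                dirty.push_back({tx, ty});
        }
    }
    return dirty;
}
```

Note the catch for the scaled buffer: with a non-integer scale factor like 1440->1200, a source tile's footprint in the destination has fractional edges, so neighbouring dirty tiles must be re-filtered together at their shared border.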

I would love to hear your suggestions.

I tried to use BitBlt with some custom scaling functions (instead of StretchBlt) that I found on the Internet, or with image libs like FreeImage, but the results were dramatically worse than StretchBlt (even using box filtering).
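For reference, this is roughly the work a HALFTONE-quality scaler does per output pixel: averaging a block of source pixels instead of point-sampling one, as COLORONCOLOR does. A minimal sketch of such a custom scaler, grayscale and integer scale factors only; the real 1440x900 -> 1200x700 case needs fractional weights, which is where the cost explodes. The function name is mine, not from any library mentioned in the thread.

```cpp
#include <cstdint>
#include <vector>

// Downscale a grayscale image by an integer factor using a box filter:
// each output pixel is the average of a factor x factor block of input pixels.
std::vector<uint8_t> boxDownscale(const std::vector<uint8_t>& src,
                                  int srcW, int srcH, int factor)
{
    int dstW = srcW / factor;
    int dstH = srcH / factor;
    std::vector<uint8_t> dst(dstW * dstH);
    for (int y = 0; y < dstH; ++y) {
        for (int x = 0; x < dstW; ++x) {
            unsigned sum = 0;
            // Sum the factor x factor source block for this output pixel.
            for (int sy = 0; sy < factor; ++sy)
                for (int sx = 0; sx < factor; ++sx)
                    sum += src[(y * factor + sy) * srcW + (x * factor + sx)];
            dst[y * dstW + x] = (uint8_t)(sum / (unsigned)(factor * factor));
        }
    }
    return dst;
}
```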

Again, I know that using DirectX/OpenGL may be faster, but currently I do not have the time to learn or deal with that, so for now I would like to use GDI/GDI+ only.




Hi. Are you running this BitBlt in a loop, like here?


StretchBlt ... (get the screen and resize from 1440*900 to 1200*700)


Maybe you only need to update it on WM_PAINT, or whatever the right message is.


I do not think there is a way to catch when WM_PAINT is sent to the screen itself, i.e. to catch some event that fires whenever what you see on the computer screen changes.


It seems that the HALFTONE blitter is really slow. I couldn't believe it was that slow, so I did some quick testing, and it shows the same result you're talking about here: about 40 ms. What system do you have? It may depend on the system.

I don't know; some say that on Win7 it may run even worse than on XP ;/ ?

(I found something on this:

http://vjforums.info/threads/stretchblt-can-hang-windows-by-hogging-gdi-lock.38179/ )


Anyway, the normal-mode blitter works OK, and you could rescale by hand-written code, or even try some "OpenGL/DX blitter" with fast onboard hardware resizing, which I have not tested yet.


Hi Fir.

I tried to use custom scaling, some even with ASM, but I could not get even near StretchBlt; with the custom scaling function I got ~10 FPS...

And as I said, I do not want to use OpenGL/DX.

I am using Win7.


You may show the code; we can take a look here and maybe some advice or conclusions will appear. I could do a test of some down- or upscaling in bitmap arrays; I'm not sure, but I think it could work at 50 (100?) Hz or more, though I'm not 100% sure.

PS: I did a simple test of down- and upscaling a frame bitmap into a buffer, then copying it back into the framebuffer array, and even with this double copying I had no trouble:

for screen sizes around 500p it was 4 ms (scaling out and copying back); for screens around 1000p it was 12 ms (for the two-way trip: scaling + copying back). So that is not a problem here.

I have no idea what HALFTONE is doing to be so slow, but this linear scaling works OK.


Edited by fir


First I tried this:


It gave me around 10 FPS only.

I then tried the FreeImage lib (FreeImage_Rescale) and got around 7 FPS.

Neither came even close to StretchBlt performance.


I think the answer should be either:

1) to find the fastest rescaling function with good quality, or

2) to somehow use the scaling history, i.e. to try to scale only the parts that changed since the last scale, but I'm not sure how to do that...

Maybe with SRCINVERT (XOR), somehow.




got no idea what this halftone is doing to be so slow but this linear scaling works ok

Well, without HALFTONE the image quality is not good.

Text does not look good, and some of it is unreadable.

HALFTONE smooths the image by averaging pixels before it scales, or something like that.


Scaling is expensive. This is why, for instance, Fraps only offers fullscreen or half-screen resolution when it captures the screen, so that scaling is either unnecessary or very easy. It's just too costly to handle cases where you're not scaling down by a power of two of the original size, because you have to handle filtering of multiple overlapping pixels, which also blows your cache because the resulting memory access patterns are, to say the least, suboptimal. There is one thing you can try, which drastically helped me back when I was on my old laptop trying to capture things: set your screen resolution to 16-bit colors.


Otherwise, what I would try is instead get a copy of the screen in a GPU-bound texture, using your favorite graphics API, scale it using the pixel shader (which ought to be fast enough, even with bilinear filtering) and read it back on the CPU. But this might incur significant capture latency, and might not be feasible, and you said you don't want to do this anyway, so...


I'm not sure what your specific needs are, but have you considered simply getting one of those screen recording devices? They don't have any system overhead and you might be able to get one to feed back into the computer to deliver the screen contents in realtime.


Hi Bacterius.

Yes, I know scaling is expensive; that is why using the scaling history might help.

I just do not know how to do it, i.e. to scale only the parts of the image that changed since the last scale. That way I scale once, and on the next scales I only rescale the parts that changed. But I'm not sure whether that can be done, or how.


P.S. I just tried CxImgLib and got only 20 FPS, still lower than StretchBlt...

P.S.2: Lowering to 16-bit did not help, and in any case this needs to run on all systems without changes; that is why a recording device is not a solution.

Edited by Jimkm
