
Member Since 16 Jan 2012
Offline Last Active Yesterday, 07:19 PM

#5219974 Multithreaded mesh loading and 0xcc returned from glGenVertexArrays

Posted by PunCrathod on 29 March 2015 - 09:25 AM

You can't use the same context in multiple threads at the same time. You can "release" a context and "bind" it to another thread, but that obviously makes you unable to use it in the original thread. You can, however, use multiple contexts and have them share resources with each other, which is probably what you want.



#5197453 game running slow but only consuming a tiny proportion of the CPU

Posted by PunCrathod on 10 December 2014 - 02:37 PM

I can't count the number of times I've seen people try to fix a performance problem without profiling. It almost always happens like this:

Yeah, same here. And some people fail to find the problem even with a profiler because they can't figure out how to use the profiler or how to interpret the data.



VS Express may not have a profiler tool, but you can do your own profiling with the .NET StopWatch class.

Except he stated in the OP that he is working on a C++ project, and earlier today he said that he got a new VS which does have a profiler. I thought that telling people once not to skim the OP and the thread would be enough. Skimming the posts and giving wrong advice because of it only confuses the OP more.



thanks for all the helpful answers - VS Express doesn't have a profiler - but I've now got Community which does. I've checked and I'm running a dual core i5 with HT, so there's 4 logical cores and the VS profiler is showing it maxing out at 25% CPU, where Task Manager was showing 12-13% - so it is one thread maxed out. And I'll not be trusting numbers from Task Manager again...

Great. Now I would suggest that you run the CPU sampling (it may require admin privileges the first time). You can see which function uses the most CPU, and then you can even look at which lines take the most time inside that function. Tell us what you find and hopefully we can give you some easy optimization advice that will get your fps up to the desired level.


Also, if you can, could you post the lines that use most of the time? It would be easier to see what needs to be done.

#5197251 game running slow but only consuming a tiny proportion of the CPU

Posted by PunCrathod on 09 December 2014 - 02:14 PM

Yes, I did. To be honest my first guess would be that one core is 100% busy, and on an 8 core machine it would look exactly like 12%-13%. However I didn't mention it as OP says he has only 2 cores. Even if it's 2 actual cores + 2 virtual ones, it's still not enough. I agree it is a guessing game here, but even with the code available it will be. Only a profiler can give some results, but unless OP upgrades from Express to at least Community edition the profiler won't be available and guessing is all we have.


Yeah, without profiling, the code itself will only be helpful if someone is lucky enough to spot something that's off (more likely if that person has dealt with a similar cause in their own code recently). But the CPU not maxing out is only possible when some function is waiting for an event to fire, which in a single-threaded program is most likely sleep(), a polling function, reading from slow media (HDD) or a blocking call to the GPU. But as said before, we can only guess.


The reason I'm asking about the way you poll your socket is that a socket.poll function is usually, by default, a blocking call that waits for a timeout. This timeout can only expire when the OS "ticks", which happens in Windows every ~16ms if I remember correctly. Depending on what library you use, you can probably set it to non-blocking.
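
A minimal sketch of the blocking versus non-blocking difference, assuming Python's standard `select` module stands in for whatever socket library the game actually uses (the thread does not say). The helper name `has_pending_data` is made up for illustration. A timeout of zero makes the readiness check return immediately instead of waiting for the next tick:

```python
import select
import socket


def has_pending_data(sock: socket.socket) -> bool:
    """Return True if sock can be read right now, without ever waiting."""
    # A timeout of 0 turns select() into a pure poll: it reports the current
    # state and returns immediately instead of sleeping until data arrives.
    readable, _, _ = select.select([sock], [], [], 0)
    return bool(readable)


# Two connected sockets in one process, just to have something to poll.
left, right = socket.socketpair()
idle = has_pending_data(left)       # nothing has been sent yet
right.sendall(b"hello")
ready = has_pending_data(left)      # data is waiting now
received = left.recv(16) if ready else b""
left.close()
right.close()
```

Calling a check like this once per frame keeps the game loop from stalling on the network when there is nothing to read.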

#5195714 Best way to follow a moving object

Posted by PunCrathod on 01 December 2014 - 09:21 AM

If you only have a 100x100 grid it would probably be faster to just flood fill from the player. That way you only have to calculate ~10 000 distances once after the player moves from one node to another, and every node knows the fastest direction to go to reach the player. No matter how many enemies you have, the pathfinding time is constant, as they only have to look up the next direction from the grid. If it still isn't fast enough you can spread the flood fill over multiple frames, say 1000 nodes per frame. The additional lag for the nodes further away shouldn't be very noticeable, as the path far away would stay the same most of the time.
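
The flood fill described above can be sketched roughly like this in Python. It is only an illustration under assumptions that are not in the original post: 4-way movement, every step costing the same, and walls given as a set of blocked cells. The names `build_distance_field` and `next_step` are hypothetical.

```python
from collections import deque


def build_distance_field(width, height, player, blocked=frozenset()):
    """Breadth-first flood fill from the player; returns {cell: steps to player}."""
    distance = {player: 0}
    frontier = deque([player])
    while frontier:
        x, y = frontier.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and (nx, ny) not in blocked and (nx, ny) not in distance:
                distance[(nx, ny)] = distance[(x, y)] + 1
                frontier.append((nx, ny))
    return distance


def next_step(distance, cell):
    """Cheapest move for an enemy standing on cell: a plain lookup, no search."""
    x, y = cell
    neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    # The cell itself is listed first, so an enemy already on the player's
    # node (or cut off from the player) simply stays where it is.
    candidates = [cell] + [n for n in neighbours if n in distance]
    return min(candidates, key=lambda c: distance.get(c, float("inf")))


# Rebuild the field only when the player changes node; enemies just look it up.
field = build_distance_field(100, 100, player=(50, 50))
step = next_step(field, (0, 0))
```

To spread the work over several frames, the `while` loop could be stopped after a fixed number of nodes and resumed on the next frame with the same `frontier`.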

#5191657 My high score do not update after restart

Posted by PunCrathod on 07 November 2014 - 06:58 AM

= is the assignment operator. What it basically does is put whatever is on the right side into whatever is on the left side. Though I can see how people new to programming could assume it would work differently.

#5191646 My high score do not update after restart

Posted by PunCrathod on 07 November 2014 - 05:51 AM

I don't know how gameEnd is called or how your gameoverscript works but I'll make a guess that

points = highscore;

is probably the wrong way around and should be

highscore = points;
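
As a small illustration of the direction of assignment, written in Python rather than the original script's language. The function name and the `points > highscore` guard are assumptions made for the example, since the original game-over code is not shown:

```python
def updated_highscore(points, highscore):
    """Return the high score to store after a finished game."""
    # Assignment copies the right-hand side into the left-hand side, so the
    # value being overwritten (the stored high score) must be on the left.
    if points > highscore:
        highscore = points
    return highscore
```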


#5170289 Rendering differences in OpenGL versions?

Posted by PunCrathod on 30 July 2014 - 05:44 AM


I don't think the graphics card type should make the difference

This is most likely the reason. Different hardware and drivers will often result in different behavior, ranging from unsupported features and simple driver bugs to different interpretations of shader code (AMD = more strict, NVIDIA = more relaxed).



From my experience it's more about the driver than the hardware. Also, AMD=strict, NVIDIA=relaxed hasn't been true for a long, long time. It varies based on driver versions more than which hardware you are on. With the most up-to-date drivers AMD is actually more relaxed, allowing some incomplete textures and buffers as well as implicit casts that truncate values in shader code, while NVIDIA just gives GL invalid operation errors. Neither AMD nor NVIDIA goes strictly by the standard, and the strictness varies with driver versions and what you want to do, so you always have to test everything with both.


Back to op.


Yes, you should use shaders for everything possible. They will almost always be faster and more reliable. And once you get used to rendering everything with shaders it actually gets easier than immediate mode. And don't be afraid of using multiple shaders. A lot of people will always say to write generic code so you can reuse it as much as possible. But with shaders you actually want to do the opposite and make your shaders as specific as possible. This is because generic all-purpose shaders are usually painfully slow, and it would be heaps faster to just use multiple slightly different shaders even if you have to split some draw calls into two or three.

That being said, I still sometimes use immediate mode when I'm debugging the non-rendering code before I have established a proper rendering system using shaders, because it is super simple to get a few triangles on the screen with it.


Also be careful about the version numbers. The GLSL version numbers don't match the OpenGL version numbers before OpenGL 3.3. See http://en.wikipedia.org/wiki/OpenGL_Shading_Language#Versions for a complete list of GLSL versions.

#5164930 Using VBOs for dynamic geometry

Posted by PunCrathod on 05 July 2014 - 02:31 PM

You don't actually need to invalidate the VBO when using GL.BufferData(). Inputting new data already does everything you need. If you want to use the invalidating you need to use GL.MapBuffer(). More on the subject here -> http://www.opentk.com/doc/graphics/geometry/vertex-buffer-objects.

Also, if you are rendering the buffer that you just updated, then the potential speedup of using multiple buffers goes to waste, seeing as the draw call has to wait for the data transfer to complete before starting the actual rendering, in which case you might as well use a single buffer. You need to update the data of the buffer that is going to be rendered next frame instead. Besides, multi-buffering VBOs isn't usually going to give you much anyway, as the bottleneck is most of the time somewhere else.


Most times when graphics programmers talk about double or triple buffering, what they mean is that they have two or three "screens" to which they do all the rendering. In the case of double buffering they swap the buffers after all rendering for the current frame has been completed. In triple buffering they swap the two background rendering buffers after rendering is finished, and swap the rendering buffer not currently in use with the displayed buffer when the monitor has finished presenting it.


Be careful of over-optimization. What you should do is set yourself a target fps, and only start optimizing if you get below it. Anything above it shouldn't matter at all. If you want 60+ fps and you add a feature that drops your fps from 200 to 120, just shrug it off and continue with the next feature. And always start with the easiest optimizations, as they take less time to implement, and over half the time they will get you above the target fps.
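
To put numbers on the 200 to 120 fps example, it can help to convert fps into time per frame. This is just the arithmetic, written as a small Python snippet (the variable names are illustrative):

```python
def frame_time_ms(fps):
    """Milliseconds available for (or spent on) one frame at a given fps."""
    return 1000.0 / fps


budget = frame_time_ms(60)                                   # about 16.67 ms
cost_of_feature = frame_time_ms(120) - frame_time_ms(200)    # about 3.33 ms
headroom = budget - frame_time_ms(120)                       # about 8.33 ms left
```

Seen this way, the feature cost a little over 3 ms of a budget of nearly 17 ms, which is why the drop is not worth worrying about yet.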


Edit: oh, and before you start to optimize anything, profile the damn thing thoroughly so you avoid spending tens of hours optimizing the part that takes 0.01% of the actual process. Use http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx to measure the time it takes on "your" end of the process in different parts of the program. Use GL.BeginQuery(QueryTarget.TimeElapsed,...); and GL.EndQuery(QueryTarget.TimeElapsed); to measure the time it takes for the driver and the GPU to perform the tasks that were issued between them.
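
The same stopwatch idea, sketched in Python with `time.perf_counter` instead of the .NET Stopwatch class. The section names and the two workload functions are hypothetical stand-ins; only CPU-side time is measured here, so GPU time would still need timer queries as described above.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

section_totals = defaultdict(float)   # seconds accumulated per named section


@contextmanager
def timed(section):
    """Add the wall-clock time spent inside the with-block to section_totals."""
    start = time.perf_counter()
    try:
        yield
    finally:
        section_totals[section] += time.perf_counter() - start


def update_world():        # hypothetical stand-in for game logic
    sum(i * i for i in range(20_000))


def build_vertices():      # hypothetical stand-in for geometry generation
    [i * 0.5 for i in range(5_000)]


for _ in range(10):        # ten simulated frames
    with timed("update"):
        update_world()
    with timed("vertices"):
        build_vertices()

# The section with the largest total is the first candidate for optimization.
slowest = max(section_totals, key=section_totals.get)
```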

#5164059 Using VBOs for dynamic geometry

Posted by PunCrathod on 01 July 2014 - 10:17 AM

Creating a pool and resizing is a good idea. However there is another problem I'm facing when I use VBOs:

I'm currently using a List to store vertex data and render them as described above. This is very comfortable, because I can add/remove vertices from the list every frame. The downside is that rendering a huge amount of vertices takes ages. So VBOs are much faster for rendering, but a lot of flexibility is taken away, because now I need to use arrays to store vertex/index data. With VBOs I have to build a new vertex/index/normal buffer whenever the geometry changes and in my case, this can happen every frame. I know that this is not a VBO/OpenGL problem per se, but maybe someone has good ideas to solve that.

Your code looks like C#, so I'm assuming you haven't looked at the list.ToArray() function, which is almost free, so you can GL.bufferdata(list.toarray,count). Also, you don't need to generate new buffers. Just update the old ones.

#5148731 Fantasy Guild Management Sim

Posted by PunCrathod on 22 April 2014 - 08:35 AM

Have you ever tried the Ogre Battle series? It had an interesting take on combat where you couldn't directly control any of the characters but instead chose the strategy they used. For example, you could set the strategy to kill lowest hp first, party leader first, strongest attacker first, etc. Then the combat just happened on its own.


I suppose management games aren't that popular, but I certainly would play a fantasy guild management game.

#5122165 measuring latency

Posted by PunCrathod on 08 January 2014 - 08:14 AM

I know that in order to apply a 'smoothing out' algorithm (client side prediction, extrapolation), one has to take into account the latency factor. But it kinda puzzled me as to who should calculate it and how it is generally measured.


1. Who gets the burden?

-Do the server do the calculation and tell the client overtime?

-Do the client do it (since it is him who needs to be smoothened)

-Do the server and client each has their own pingtime?


2. How would one calculate it?

right now I simply use the simplistic ping approach (e.g., send a "ping" message and the server quickly throws back the reply).

however, it was too simplistic and the ping time generally changes over time (it sure is very random). Thus the second question. But if we use the average ping time, it sometimes won't be correct, right? plus if I have to use the average ping time, that means I have to already have several ping results at hand. What if the game has just begun? should I spend my time calculating ping time first?


3. How was it generally used?

Example scenario:

-client send message that he just pressed "move forward" button. In order to hide latency, he directly moves his avatar forward locally.

-by the time the message reaches the server, however, it's already too late. perhaps by 100 ms. then the server should simulate as if player pressed the button 100 ms ago, thus the server must do "extrapolation" by 100 ms. CMIIW.

-client stopped pressing forward key, thus he sends "stop pressing forward" message to server.

-the server gets the message 100 ms later, oops that means our player is moved too far by 100 ms, so his position must be corrected by 100ms

(pos = pos + vel * -0.1). as such, the server send a position update message to player

-the client received the message, assume it's late by 100 ms. but since this is corrective position, the client simply stick his current position to the server's corrective position. He might also use interpolation (so it looks like sliding over rather than snapping) to alleviate graphical jerkiness.


So am I right on number 3?


4. bonus question. I recently had a chat with some gamedevs over IRC, we were discussing minecraft. I threw in a question regarding its networking, and some fellow said that the send rate is very low (a particular guy even said that it's only 5 TIMES A SECOND!! WTF!!). that means it sends every 200 ms. now I wonder about its update rate. since minecraft is kinda "physics"-ish, I bet they use at least a 30 fps update rate. But since I can't get a look at their sources, I can't confirm my curiosity. Do any of you guys know about this? thanks a lot.


Those are my questions, I hope you guys could help this confused guy. Anyway, thanks for your time reading this blocks of text.

1. This depends on where and why you need it. For smoothing out latency in an FPS you would probably calculate latency on the server, since it needs to know where each client was when one pressed the trigger to see who got shot.


2. There are a few ways. Easiest being just to measure how long the server or client takes to respond to a simple short message.


3. There are quite a few things you could use the latency for. Movement prediction and correction is perhaps the most common use.


4. Minecraft surprisingly doesn't have that much physics in it. And as far as I know its network update rate depends on the server's tick rate, so with a lot of mods it can be as low as once every 2-10 seconds. The tick rate is capped at 20, meaning it would update 20 times a second at most. But the update rate is not that important in a game like Minecraft. A 5 Hz rate would be more than enough. For example, Battlefield 3 and 4 update 10 times a second, and they are twitch shooters where life and death depend on a few milliseconds. This is exactly where latency corrections and such come into play to smooth things out.
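
For point 2, one possible way to turn noisy ping samples into a usable latency figure is an exponentially weighted moving average. The sketch below is an assumption on my part, not something from the thread: the class name, the method names and the default weight of 0.125 (the value TCP uses for its own round-trip estimate) are all illustrative. The first sample is used as-is, so there is a usable value right after the first ping of a new game.

```python
import time


class LatencyEstimator:
    """Smooths round-trip times measured from ping/pong message pairs."""

    def __init__(self, smoothing=0.125):
        self.smoothing = smoothing    # weight given to each new sample
        self.smoothed_rtt = None      # seconds; None until the first pong
        self._sent_at = {}            # ping id -> time the ping was sent

    def ping_sent(self, ping_id, now=None):
        self._sent_at[ping_id] = time.monotonic() if now is None else now

    def pong_received(self, ping_id, now=None):
        now = time.monotonic() if now is None else now
        sample = now - self._sent_at.pop(ping_id)
        if self.smoothed_rtt is None:
            self.smoothed_rtt = sample            # first measurement
        else:
            # Move a fraction of the way toward the new sample so a single
            # slow packet does not swing the estimate.
            self.smoothed_rtt += self.smoothing * (sample - self.smoothed_rtt)
        return self.smoothed_rtt

    @property
    def one_way_estimate(self):
        """Half the round trip, assuming both directions take equally long."""
        return None if self.smoothed_rtt is None else self.smoothed_rtt / 2
```

The `now` parameters exist only so the timestamps can be supplied by hand when testing; in a game the defaults would be used.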

#5117566 Is GPU-to-CPU data transfer a performance bottle-neck?

Posted by PunCrathod on 17 December 2013 - 08:05 AM

And do you need to use doubles? Computing your v expression with doubles is slower than with floats (the double division will be really slow) and also converting it to float takes some time.



You don't really need to worry about doubles or floats. As long as your memory bandwidth does not run out, doubles are almost always as fast as floats on a 64 bit processor, as they are both usually calculated with the same ALU using the same 80bit registers. Granted, with 1 million 3d points you have 3 million doubles, and that's about 24 megabytes, and there goes your bandwidth. If all the data could fit in L3 cache (or L4 if you have it) then you probably would not get much of a difference in performance without using some fancy stuff like SSE (which can in fact process twice as many floats as doubles in one cycle). But in this case even with SSE you wouldn't get much of an improvement, as your bandwidth is already holding you back.



Maybe you could sort on GPU and keep the sorted array on GPU and use it as an indirect parameter to your rendering, perhaps as an index buffer. That way, you wouldn't have to stall at all.



This is probably the best bet for getting a million particles sorted inside a reasonable amount of time. Just as a comparison, a game that I'm making with a couple of friends has a particle system that has two textures containing the particle data: one for reading and one for writing, and they are swapped around after rendering. Granted, it's only in 2d and there is no need for sorting, but since all the data is kept in VRAM all the time it's blazing fast, and we can have 5 million particles without any noticeable decrease in performance (less than 1ms difference in frame time between 100k and 5mil particles with an NVIDIA GTX 660). And as OP said in the first post, the shader did the job in less than 1ms too, so the best solution would be to figure out a way to avoid transferring all the data between CPU and GPU.


We actually did ours by having three different shaders. One that "rendered" new particles in the particle data texture, one that updated the data between frames and one that rendered them to a framebuffer to be displayed on the screen. So the only data that needed to be sent anywhere was a list of particle emitters active during a frame sent to the first shader and the rest was just a few draw calls. I believe they call this approach ping-ponging. I'm not too familiar with how it works as my two friends do most of the rendering code in our project but I hope this gives you the motivation to try something similar yourself.
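
The read-one-buffer, write-the-other, then swap pattern can be shown on the CPU with two plain Python lists standing in for the two particle textures. This is only a sketch of the buffer handling; in the setup described above the update itself runs in a shader, and the particle layout `(x, y, vx, vy)` is an assumption made for the example.

```python
def step_particles(read_buf, write_buf, dt):
    """Read every particle from read_buf and write its new state to write_buf."""
    for i, (x, y, vx, vy) in enumerate(read_buf):
        write_buf[i] = (x + vx * dt, y + vy * dt, vx, vy)


# Two equally sized buffers: one holds the current state, one is scratch space.
current = [(0.0, 0.0, 1.0, 2.0), (5.0, 5.0, -1.0, 0.0)]
scratch = [None] * len(current)

for _ in range(3):                         # three simulated frames
    step_particles(current, scratch, dt=0.5)
    # Swap roles: what was just written becomes next frame's input.
    current, scratch = scratch, current
```

Because the update never writes to the buffer it is reading from, every particle sees a consistent previous state, which is what makes the approach safe to run in parallel on the GPU.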


However if your particles interact with the rest of the simulation then it gets complicated and I have no idea how to make that happen.

#5114960 Offset mass angular/linear velocity?

Posted by PunCrathod on 06 December 2013 - 03:25 PM

It seems you are looking for rigid body collisions.


1. This gets complicated real fast. Reading a book on physics is a good start but if you don't want to read a book you can take a look at this http://www.myphysicslab.com/collision.html

Just be aware that there is a lot of vector math and many physics-related terms, and if you are not familiar with them this will probably not help you much.


3. As long as your objects aren't bolted down to an axle, then yes. Objects always rotate around their center of mass.


4. This one is also a lot of complicated math involving a lot of vectors. The link in #1 mostly covers this if you can read all the scientific terms and know vector math.


Seeing as you are only 14 my advice is to pay close attention in math and physics classes at school. It may sometimes seem boring but when you start to understand all of it and you get your first physics simulation program working it gets fun real fast.

#5098774 How do I use multithreading?

Posted by PunCrathod on 04 October 2013 - 11:05 AM

Boolean flags aren't atomic (so you have to know details of the platform and ensure there's enough padding that other mutable data won't be too close to the flag), and there's no ordering guarantee that the changed flag won't become visible to the main thread before the star system data is actually committed to RAM (which creates a potential race condition where the main thread sees the flag is true, but reads the star system data before it's been written/completed). As you mention, this then requires memory barriers (both compile-time and run-time varieties) to be inserted after writing to the flag and before reading from the flag (not mentioned above). If you're writing that kind of low-level synchronization code though, you really need to understand why those 4 barriers are required, which is not a suitable beginner task.
Beginners should instead use pre-made synchronization primitives, like critical-sections/mutexes, or a higher level parallelism library.

You don't need to protect the flag. A memory barrier in the generating thread just before setting the flag to true is all that's needed. The data is then guaranteed to exist before the main thread sees that the flag is set to true. If you are advanced enough to be thinking about pregenerating the game world in advance during runtime then understanding this should not be a problem for you.

I'm not saying you shouldn't study how memory barriers etc. work, but a background worker thread isn't that complicated. I just think it's arrogant to claim someone can't possibly comprehend multithreading at all if he hasn't been programming for 10 years and hasn't learned the inner workings of whatever platform they are targeting. Where I studied programming a few years back, background worker threads were considered beginner stuff.

#5098719 How do I use multithreading?

Posted by PunCrathod on 04 October 2013 - 05:16 AM

Threads are not as hard as people make them out to be. If I understood correctly what you want to do, this is the easiest multithreading case there is. Correct me if I'm wrong.

You want to have another thread generate your starsystem while your main thread is handling the game. Without the additional thread you would do a small portion of the generation between each frame and the performance is not good enough for you.

There is no data race or any other complicated stuff going on if you generate the star system in a single thread and make sure you do not access the star system's data from anywhere until the generation is complete. This kind of threading does not take that much effort to get right and will increase your game's performance quite a bit. You only need a thread-safe way to tell the main thread whether a star system is still being generated. You can also use n generation threads to generate n star systems at the same time.

My tip is to not use any global variables to manage the threading, as those get complicated rather quickly. Have a "generated" flag that defaults to false in your star system class or struct, or whatever you use to store your star systems in memory, and set it to true after generation is complete. You may add a memory barrier after the star system is generated to make sure all threads see the same data.
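
A rough Python sketch of this setup follows. Python has no explicit memory barriers, so `threading.Event` is swapped in here for the "generated" flag plus barrier; it is one of the pre-made primitives and takes care of visibility itself. The `StarSystem` class, its fields and the placeholder generation code are all hypothetical.

```python
import threading
import time


class StarSystem:
    def __init__(self, seed):
        self.seed = seed
        self.planets = []
        # Per-object "generated" flag, unset by default, instead of a global.
        self.generated = threading.Event()

    def generate(self):
        """Runs on the worker thread. Nothing else touches planets meanwhile."""
        self.planets = [f"planet-{self.seed}-{i}" for i in range(5)]
        # Set the flag only after all the data is in place.
        self.generated.set()


system = StarSystem(seed=42)
worker = threading.Thread(target=system.generate, daemon=True)
worker.start()

# Main loop: keep running frames and only read the data once the flag is set.
frames = 0
while not system.generated.is_set():
    frames += 1
    time.sleep(0.001)      # stand-in for updating and rendering one frame

worker.join()
planet_count = len(system.planets)
```

Starting one such thread per star system gives the "n generation threads" variant, since each object carries its own flag.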

This way you'll have a full thread generating the star system as fast as possible without any need for complicated thread management stuff. A lot of people are trying to scare people away from threads because they don't understand them themselves. Granted, there is a lot of complicated stuff to do if you want to multithread realtime simulations, but a simple background thread to generate the next part of the game world before the player is close enough to interact with it is a simple task.

You seem to have managed to get a thread going. Just try putting the code that generates a star system into the thread and see what happens. If something breaks and you can't figure out why, then come back here and explain what and how, and someone will probably be able to help. Multithreading is not rocket science.