
Member Since 27 Aug 2002
Offline Last Active Today, 07:03 AM

#5222197 Relation between TFLOPS and Threads in a GPU?

Posted by Krohm on 09 April 2015 - 04:37 AM

There's no relationship because GPUs have no real "threads" in the CPU sense. The choice (of DirectCompute) to use the word thread is so bogus I cannot believe it made it into the final documentation. That said...


It depends on the chip family and even on the specific segment.

The number of operations executing per second is just: $$ops = processingElements * clockRate_{hz}$$
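
As a rough worked example of the formula above, here is a minimal C++ sketch. The function names and the 1 GHz clock used below are assumptions picked for round numbers, not the specification of any real chip. The factor of two assumes a fused multiply-add is counted as two floating-point operations, which is how advertised FLOPS figures are usually derived.

```cpp
#include <cassert>
#include <cstdint>

// Peak operations per second: one operation per processing element per clock.
// Hypothetical helper; the name and signature are illustrative only.
double peak_ops_per_second(std::uint32_t processing_elements, double clock_rate_hz) {
    assert(clock_rate_hz >= 0.0);
    return static_cast<double>(processing_elements) * clock_rate_hz;
}

// Assumption: a fused multiply-add counts as two floating-point operations.
double peak_flops_with_fma(std::uint32_t processing_elements, double clock_rate_hz) {
    return 2.0 * peak_ops_per_second(processing_elements, clock_rate_hz);
}
```

With 3072 processing elements at the assumed 1 GHz, peak_ops_per_second gives 3.072e12 operations per second, and the FMA convention doubles that to 6.144 TFLOPS.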



Which takes us to the magic world of processing elements: what are those?

They are the parts of the ALU carrying out the useful work. Many people think one PE ~ 1 thread, and given current GPU capabilities you can indeed program them that way. In practice, though, a PE is a much finer-grained element, and you are given the choice of how to set up the PEs to make up "threads".


The native concept of a "thread" for GPU is the Wavefront (AMD GCN, OpenCL) or the Warp (NV). They are basically the same thing: packs of 64/32 processing elements.

I am going to use the word thread for your convenience, but be warned that it is an inaccurate term.

Marketing wants you to believe a PE is a thread, but if that were the case then a single CPU thread using SSE would be quad-threaded.


The number of "threads" executing at a given time (assuming you always saturate the device) is

$$threads=processingElements / threadSize$$


So for example the GM200 Titan X has 3072 "cores" (marketing jargon) which are really 3072 PEs (CL jargon). With a warp size of 32, you have 96 threads in flight. WRONG! You have 96 warps!

If tomorrow NV decides their warp size becomes 16, you'll have 96*2 warps.

This is at each given clock.
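
The same arithmetic as a minimal C++ sketch. The helper name groups_executing is hypothetical; the 3072 processing elements and the warp sizes of 32 and 16 are the figures from the example above.

```cpp
#include <cassert>
#include <cstdint>

// Number of warps/wavefronts executing at a given clock on a saturated device.
// Hypothetical helper; group_size is the warp (NV) or wavefront (AMD) width.
std::uint32_t groups_executing(std::uint32_t processing_elements, std::uint32_t group_size) {
    assert(group_size > 0);
    return processing_elements / group_size;
}
```

groups_executing(3072, 32) yields 96, and halving the warp size to 16 yields 192, i.e. 96*2.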


During processing, the GPU will switch across several warps. The number of warps in flight depends on the device and on the actual program being executed. There's usually an upper bound, but I'm not very familiar with NV architectures.


EDIT: I messed up the second formula somehow.

#5218417 Amateur Looking For Advice On Where To Go Next

Posted by Krohm on 23 March 2015 - 03:24 AM

I'm not sure Java is going anywhere so I support your idea of going away.

I honestly wouldn't start from C++ nowadays. C# and JavaScript are better candidates in my opinion for the time being.


I'm pretty sure UE4 allows some fairly extensive scripting without even writing code (through Blueprints). I strongly suggest playing a bit with some engine, maybe only from a level design perspective. It lets you keep a view of the target at a higher level. It likely doesn't make you a better programmer, but if you want to ship a whole product you must think about the whole picture.

#5217627 Mantle programming guide and white paper released

Posted by Krohm on 19 March 2015 - 07:20 AM

Pretty ironically,


From Anandtech

So pull up a chair, get comfortable, and find large quantities of caffeine as this isn’t the sort of material for a quick read – the PDF weighs in at a hefty 435 pages. That’s pretty much par for the course when it comes to API guides though – the Direct3D 11 API is almost certainly just as long (though I couldn’t seem to find a comparable PDF).


OpenGL 4.5 core: 825 pages.

OpenGL 4.5 compatibility: 999 (yikes!) - no idea how much is really shared

GLSL: 209 pages.




#5215991 Decompressing PNG / JPG images on the GPU

Posted by Krohm on 12 March 2015 - 12:42 AM

I'm pretty sure AMD has a GPU-accelerated media pipeline. No idea how much of it is available, how much is GPU and how much is their own internal ASICs. Odds are it works with their GPUs only, anyway...




Give up, man. This is only going to be painful. D3D9-level devices do not have enough flexibility, and compared to modern CPUs they might not have enough of a performance advantage either. D3D9-SM3 devices might have been worth talking about in 2008. Maybe.

#5214730 Placing enemies in the map/world

Posted by Krohm on 05 March 2015 - 07:59 AM

Interesting question.


Having tried to prototype a SHMUP game some time ago, I can appreciate some of the complications.

Is your game something like this?


Personally I haven't found existing editors much of a help - maybe I haven't searched hard enough. The main problem is not so much the code but rather figuring out the numbers to put there, as enemy patterns must be effectively authored in screen space and easily visualized with full time control for fast iteration/tweaking.


What I would do today: 1) search again for existing tools 2) hack together an HTML5 <canvas> utility to pour out JSON data.


This is in contrast to more static design such as in FPS: I've had no trouble adapting Blender in this case but note enemy movements in this case are very different things. Given the amount of paths, it would be quite inconvenient to pack this data in Blender.

#5213844 Dynamic Memory and throwing Exceptions

Posted by Krohm on 02 March 2015 - 02:27 AM

It's maybe worth recalling that not everybody is AAA. Exceptions are extremely handy, and considering the first few posts of this thread are clearly written by someone who doesn't have an accurate view of what's going on, I'd suggest sticking to what C++ suggests as canon as long as there isn't a specific product to talk about.

#5213151 Component-based architecture - entity and component storage

Posted by Krohm on 26 February 2015 - 01:06 PM

As a side note, if physics is your stuff, play with some physics API first!



Then you have a really weird definition of a component. His game objects are very clearly composed and not monolithic. That's all using components means, in any context; ECS is _hardly_ the first place the word "component" has ever been used in computer science or even game development.

Well, you got me there. I should have been more explicit: in this case I meant the word component strictly in the CES sense.


#5213091 Component-based architecture - entity and component storage

Posted by Krohm on 26 February 2015 - 09:06 AM

No idea what exactly is going on there but what you have done isn't a component thing to me.

Just because you can put arbitrary "component" object handles inside an array, which allows you to build "entities" does not mean you are component based.

The above is not component based either, it's switching behaviors exactly like a monolithic entity would do. I'll agree that has some very slight flexibility added.


No idea what a "physics" component is supposed to be either. I assume it is a rigid body representation.


Here is ECS, condensed to its core.


There are no entities. There are only the components.

See Fig-2.gif



The message "between the lines" and showcased in the above diagram is: the execution/update of components is independent from other types and can be - in theory - completely asynchronous. I dare anyone to write a fully async, fully ECS-only system, but that's for another time.


So, what does that mean? It means basically the opposite of this:

One idea that came to mind was having a vector of pointers for each type of component and passing the corresponding vector to the corresponding system.

You should really have the components exist in the systems only and link to them when needed, rather than keep them floating around and put them back in when needed. Seriously, where do you think rigid bodies are going to go each frame? In the physics library. Where do you think the models will go if not in the rendering subsystem? There is really no point in taking them out: you take out references / pointers to them and let them live in their own land, using a base class or a proxy of some sort. Internally the subsystem accesses everything it needs, while externally you don't.
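
A minimal C++ sketch of that ownership idea, under the assumption that a plain index is an acceptable handle. PhysicsSystem, RigidBody and the one-dimensional "simulation" are all hypothetical stand-ins, not any real physics library's API: the point is only that the bodies never leave the subsystem and outside code holds nothing but a handle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical component: a rigid body reduced to one axis for brevity.
struct RigidBody {
    float y;
    float velocity_y;
};

// Hypothetical physics subsystem: it owns every rigid body for its whole life.
class PhysicsSystem {
public:
    using Handle = std::size_t;  // what the rest of the game holds on to

    Handle create(float y, float velocity_y) {
        bodies_.push_back(RigidBody{y, velocity_y});
        return bodies_.size() - 1;
    }

    // Internally the subsystem touches all of its components directly.
    void step(float dt) {
        for (RigidBody& body : bodies_) {
            body.y += body.velocity_y * dt;
        }
    }

    // Externally only a narrow query through the handle is exposed.
    float height(Handle handle) const {
        assert(handle < bodies_.size());
        return bodies_[handle].y;
    }

private:
    std::vector<RigidBody> bodies_;
};
```

An "entity" is then little more than a bag of such handles, one per subsystem. A production version would likely want handles that survive removal (generation counters, for instance), which this sketch leaves out.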

#5211676 Problems with partial OpenCL kernel dispatch

Posted by Krohm on 19 February 2015 - 08:00 AM

Wait, you can specify read_only on globals? It was my understanding it was for image objects only.

Parameter 7 to clEnqueueNDRangeKernel is currently (int)ArraySize(waitOn). Leaving aside that it should be a cl_uint, the events pointed to must complete, and I have no idea what is going on with them.


Considering PNG typically goes with integers, I would also check the way you mangle the resulting data.


Ultimately, some drivers have watchdogs and will kill dispatches if they take too much time to run. Considering the inner loop seems to be doing nothing (the value is trashed right away) I think that's fairly indicative. Am I missing some side effect?

#5211671 terrain editor resolution based on height

Posted by Krohm on 19 February 2015 - 07:39 AM

i want to have the ability to adjust the resolution on the area where the terrain are raised/lowered,

Interesting idea in theory. In practice this would require a perfectly regular structure such as the point grid to become irregular. I guess you could do some sort of quadtree to provide more resolution. I remember a paper about quadtree-accelerated parallax occlusion mapping which could be adapted to your uses. It's not very complicated but I have doubts it's really worth it. Last time I checked the Unreal developer network it looked like not even Unreal (3) had support for that so I have doubts about its usefulness.


how do i handle or store the height data? right now the height data are just stored in a 2x2 array and each array is equivalent to one vertex in terrain grid

That's fairly peculiar. Why are you doing that? I would just use a 16-bit grayscale map. Or perhaps RGBA32 for super extra precision, but I don't see much of a point in thinking of this as a two-dimensional sample of sorts. Please elaborate, I'm curious.


I'm pretty sure what you're looking for can be done using quadtrees.
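
A minimal C++ sketch of the 16-bit grayscale idea: one unsigned 16-bit sample per grid vertex, scaled to world units on read. The Heightmap class, its method names and the max_height scale are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical heightmap: one 16-bit sample per vertex of a regular grid.
class Heightmap {
public:
    Heightmap(std::size_t width, std::size_t height, float max_height)
        : width_(width), height_(height), max_height_(max_height),
          samples_(width * height, 0) {}

    void set_raw(std::size_t x, std::size_t y, std::uint16_t value) {
        samples_[index(x, y)] = value;
    }

    // Map the raw 0..65535 sample to 0..max_height world units.
    float height_at(std::size_t x, std::size_t y) const {
        return samples_[index(x, y)] / 65535.0f * max_height_;
    }

private:
    std::size_t index(std::size_t x, std::size_t y) const {
        assert(x < width_ && y < height_);
        return y * width_ + x;  // row-major, like an image
    }

    std::size_t width_;
    std::size_t height_;
    float max_height_;
    std::vector<std::uint16_t> samples_;
};
```

Raising or lowering terrain then amounts to editing raw samples, and the same buffer can be saved or loaded as a grayscale image.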

#5210219 Moving capsule and ball intersection?

Posted by Krohm on 12 February 2015 - 02:39 AM

Considering your last reply, I'd strongly suggest use of a proper physics library.

#5208872 Next-Gen OpenGL To Be Shown Off Next Month

Posted by Krohm on 05 February 2015 - 07:44 AM

I'm not sure this is good news. Considering the long history of success the ARB and Khronos have had with GL, I cannot help thinking they'll do something incredibly dumb (such as the nonsensical fences added recently) for the good of marketing.

#5207395 Is there a way to draw super precise lines?

Posted by Krohm on 29 January 2015 - 04:49 AM

I honestly don't know how anyone could think "hardware" lines to be usable. Even assuming they get drawn "where expected" they still don't render "as expected" (they don't interact with zooming properly).


A relatively old NVidia article on filtered lines.


I also want to quote for emphasis:

If you want thicker/softer lines, the foolproof solution is to render a quad that bounds every pixel-center that could possibly need shading, and then in the pixel shader, derive an opacity/alpha value from that pixel's distance to the mathematical line.

I'd go that route myself... even though I've often been confronted with endpoint rendering, mitering and other issues in the past. It would probably be possible to do all this in a set of shaders nowadays and it would sure provide enough quality for 99% of uses in games.
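
The per-pixel math of the quoted approach can be sketched on the CPU in C++; in a real renderer it would live in the pixel shader. This is only a sketch under stated assumptions: the linear falloff and the half_width / feather parameters are illustrative choices, not taken from the quote or from the NVidia article, and mitering is not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 {
    float x;
    float y;
};

// Distance from point p to the segment a-b (the "mathematical line").
float distance_to_segment(Vec2 p, Vec2 a, Vec2 b) {
    const float abx = b.x - a.x;
    const float aby = b.y - a.y;
    const float length_sq = abx * abx + aby * aby;
    float t = 0.0f;  // degenerate segment: measure distance to a
    if (length_sq > 0.0f) {
        t = ((p.x - a.x) * abx + (p.y - a.y) * aby) / length_sq;
        t = std::min(1.0f, std::max(0.0f, t));  // clamp to the endpoints
    }
    const float dx = p.x - (a.x + t * abx);
    const float dy = p.y - (a.y + t * aby);
    return std::sqrt(dx * dx + dy * dy);
}

// Opaque within half_width of the line, fading linearly to zero over feather.
float line_alpha(float distance, float half_width, float feather) {
    assert(feather > 0.0f);
    const float alpha = (half_width + feather - distance) / feather;
    return std::min(1.0f, std::max(0.0f, alpha));
}
```

Because the distance is computed per pixel in whatever space the quad was built in, the line keeps its look under zooming, which is where "hardware" lines fall short.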

#5207385 Standard structure of a large scale game

Posted by Krohm on 29 January 2015 - 04:10 AM

Try considering games such as Quake 1. You will see they were considerably more homogeneous in mechanics.


Special cases in code are... I wouldn't say they are canon, but they happen and sometimes they save the day. I recall someone admitting they shipped a game with code such as:

if(levelIndex == 5) {
    entity[12].position.y += 1.0f; // because it doesn't get placed correctly for some reason and we really need to ship!
}

Of course it is bad practice.


What changed from old games to new games? They became more data-oriented and, as they grew more complicated, they embraced scripting.


While my "full scale" game has been in long-term cryo-stasis for a while, I still think its design was really well thought-out. It had no gamestates at all. Everything went through something similar to components and the engine knew nothing about the relationships between those components. Those were provided by gameplay-specific code through scripting.



Now, I'm not suggesting you go scripting right away. It involves complicated things. Data oriented design could still get you far.


Example: we have an FPS with a hit-spacebar-to-act mechanic.

We enumerate the "actions" required. Doors open. Buttons get pushed. Pinballs get played.

When loading up the data, we find an entity using action[2]. We create a PinballAction object and attach it to the entity.

PinballAction will be very complicated (push current keybinding, set new keys, set new camera parameters, activate the complicated pinball simulation with the newly bound keys... ).

That still requires a switch to instantiate the correct class, but that switch is no longer in the "live" parts but rather in the loading paths (which are hopefully simpler).


If you go scripting, you save that switch as well. Of course, if your actions are simple, going scripting will likely not be worth it.
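
A minimal C++ sketch of that load-time switch, assuming the actions are numbered in the order enumerated above (doors 0, buttons 1, pinball 2). Every class and function name other than PinballAction is hypothetical, and the actions just return a description instead of doing real work.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Action {
    virtual ~Action() = default;
    virtual std::string describe() const = 0;
};

struct DoorAction : Action {
    std::string describe() const override { return "door opens"; }
};

struct ButtonAction : Action {
    std::string describe() const override { return "button pushed"; }
};

struct PinballAction : Action {
    std::string describe() const override { return "pinball starts"; }
};

// Called by the loader only: this is the one place that knows the mapping
// from the index stored in the level data to a concrete class.
std::unique_ptr<Action> make_action(int action_index) {
    switch (action_index) {
        case 0: return std::make_unique<DoorAction>();
        case 1: return std::make_unique<ButtonAction>();
        case 2: return std::make_unique<PinballAction>();
        default: return nullptr;  // unknown index in the data
    }
}

struct Entity {
    std::unique_ptr<Action> action;
};

// "Live" code path for the spacebar: no switch, just a virtual call.
std::string use(const Entity& entity) {
    assert(entity.action && "entity has no action attached");
    return entity.action->describe();
}
```

The gameplay loop only ever calls use, so adding a new action touches the loader and one new class, not the live code.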

#5204464 Multiple Lights on game map with forward rendering

Posted by Krohm on 15 January 2015 - 07:22 AM

NB: all the game I've seen showed like ~32 number of lights. But I'm not sure if it's "hardcoded" to their engine. (I can't imagine looping thru 32 light properties in the shader just to compute 1 pixel's color)

Well, I did that as an experiment years ago on a GeForce 6600 GT, 128bit GDDR3. It did work (with quite fine parallax occlusion mapping) albeit not smooth enough to be used in production.


I would expect modern mobile to match that performance level at least on some silicon but since a lot of devices have hi-dpi screens I cannot really promise anything.


If you are sure you always have a high light count, consider deferred.


TL;DR: it is possible and sometimes viable to iterate them. Whether it is in your case is another matter.




If only I could store light properties in a texture and read em back in the fragment shader, that'd be great.

Wouldn't this be the same as a lightmap (or more like deferred shading)?


Absolutely not. Deferred is (sort of) splatting your lights on the (not quite) "finished" scene.

What he plans to do here is to use a texture as an array (as a side note, this is what I did years ago).

It is doable... but be careful with instruction counts. Your shader might get killed after a while. This will probably be both hardware and driver-dependent. Proceed with caution!
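
A minimal C++ sketch of the CPU side of the texture-as-array idea: light properties packed into RGBA float texels, ready to be uploaded as a texture with one row of texels per light list. The layout (two texels per light) and all names are assumptions made for illustration, not the layout I used back then; the upload itself and the shader loop are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical light description.
struct Light {
    float position[3];
    float radius;
    float color[3];
    float intensity;
};

constexpr std::size_t kTexelsPerLight = 2;  // assumed layout, see below
constexpr std::size_t kFloatsPerTexel = 4;  // RGBA

// Offset of a light's texel inside the packed float array.
std::size_t texel_float_offset(std::size_t light_index, std::size_t texel) {
    assert(texel < kTexelsPerLight);
    return (light_index * kTexelsPerLight + texel) * kFloatsPerTexel;
}

// Texel 0 = (position.xyz, radius), texel 1 = (color.rgb, intensity).
std::vector<float> pack_lights(const std::vector<Light>& lights) {
    std::vector<float> texels;
    texels.reserve(lights.size() * kTexelsPerLight * kFloatsPerTexel);
    for (const Light& light : lights) {
        texels.insert(texels.end(), {light.position[0], light.position[1],
                                     light.position[2], light.radius});
        texels.insert(texels.end(), {light.color[0], light.color[1],
                                     light.color[2], light.intensity});
    }
    return texels;
}
```

The fragment shader would then fetch texels 2*i and 2*i+1 for light i inside its loop, which is exactly where the instruction count warning applies.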