

Member Since 13 Sep 2012

#5269520 [SOLVED] Uniform buffer actually viable?

Posted by on 05 January 2016 - 06:48 PM

It also does really feel like a "hack", and it really only works with instancing.


You think that is a hack? Well, Unity stores instance ID numbers as part of the texture coords, so I'd say it's an improvement. And let me remind you that you're using textures to store things that aren't textures; what is that if not a hack? Providing an instance ID has always been a hack because the API doesn't provide a direct way to do it. Only very recently did gl_DrawID show up (ARB_shader_draw_parameters), and even that isn't widely supported, nor does it have good performance. It's an issue you need to work around given what you have.


Anyway, here is Mathias' explanation of how you can have IDs for whatever you want to draw, as implemented in Ogre3D:




TL;DR: Set up, once, a fixed per-instance attribute holding the indices 0 to MAX_INSTANCES - 1, and then just select which of them each draw sees through the instanced call's base instance. Oh, and before you say "I'd have to draw everything with instanced draw calls!": Mantle doesn't even have non-instanced drawing IIRC, and Vulkan probably won't either, so I'd suggest getting used to it.
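A CPU-side Java model of why that fixed attribute works, as a sketch rather than GL code. It assumes an integer attribute (glVertexAttribIPointer) with glVertexAttribDivisor(index, 1), and draws issued with glDrawElementsInstancedBaseInstance (GL 4.2 / ARB_base_instance). Per-instance attributes are fetched at element instanceId / divisor + baseInstance, while gl_InstanceID itself ignores baseInstance, so the attribute hands each instance a global index. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** CPU-side model of the fixed "draw ID" attribute (hypothetical names, no GL calls). */
final class DrawIdAttribute {

    /** Contents of the attribute buffer, uploaded once: 0, 1, ..., maxInstances - 1. */
    static int[] buildDrawIds(int maxInstances) {
        int[] ids = new int[maxInstances];
        for (int i = 0; i < maxInstances; i++) {
            ids[i] = i;
        }
        return ids;
    }

    /**
     * What the vertex shader reads for an instanced attribute:
     * element instanceId / divisor + baseInstance of the buffer.
     */
    static int fetch(int[] drawIds, int baseInstance, int instanceId, int divisor) {
        return drawIds[instanceId / divisor + baseInstance];
    }

    public static void main(String[] args) {
        int[] drawIds = buildDrawIds(4096);
        // Draw 1: 3 instances with baseInstance 0. Draw 2: 2 instances with baseInstance 3.
        int[] first = new int[3];
        for (int i = 0; i < first.length; i++) {
            first[i] = fetch(drawIds, 0, i, 1);
        }
        int[] second = new int[2];
        for (int i = 0; i < second.length; i++) {
            second[i] = fetch(drawIds, 3, i, 1);
        }
        System.out.println(Arrays.toString(first) + " " + Arrays.toString(second)); // [0, 1, 2] [3, 4]
    }
}
```

The second draw reads indices 3 and 4 even though its gl_InstanceID runs from 0 to 1, so a single buffer of IDs serves every draw.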


 so this is technically doable for some shaders where it's clear how much data will be written from the start


Not really necessary. You don't need to know that; you just need to know the size of each struct instance and how much memory glBindBufferRange can bind on your GPU (GL_MAX_UNIFORM_BLOCK_SIZE). Like this:


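// Repeat until every task has been drawn.
while (!tasks.empty()) {
  // Allocate temp buffer up to max bindable UBO range.
  Buffer buffer = alloc(MAX_UBO_RANGE);
  // Uploaded task counter.
  int tasksUploaded = 0;
  while (!buffer.full() && !tasks.empty()) {
    Task t = tasks.next();
    buffer.put(t.data());
    // Pad to vec4 if necessary here.
    ++tasksUploaded;
  }
  // Upload the temp buffer to the next free range of the UBO (glBufferSubData or
  // glMapBuffer) and bind that range. Here the ring buffer works its magic.
  ubo.bindNextRange(TRANSFORM_SLOT, buffer);
  // Draw what you have uploaded so far.
  drawTasks(tasks, tasksUploaded);
}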


There, something like that. It's just an iteration: write all the data to a temp buffer, glBufferSubData or glMapBuffer it into your UBO, bind the range and draw. Then repeat until you have drawn everything.
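A runnable Java version of that loop's batching logic, as a sketch under two assumptions: every task contributes one fixed-size struct (already padded to a multiple of 16 bytes for std140), and maxRange stands in for GL_MAX_UNIFORM_BLOCK_SIZE. The GL calls are left as comments; UboBatcher, Batch and splitIntoBatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class UboBatcher {

    /** A run of consecutive tasks whose structs fit in one bindable range. */
    static final class Batch {
        final int firstTask;
        final int taskCount;

        Batch(int firstTask, int taskCount) {
            this.firstTask = firstTask;
            this.taskCount = taskCount;
        }
    }

    /** Splits taskCount tasks into batches of at most maxRange / structSize tasks each. */
    static List<Batch> splitIntoBatches(int taskCount, int structSize, int maxRange) {
        int perBatch = maxRange / structSize;
        if (perBatch == 0) {
            throw new IllegalArgumentException("struct larger than the bindable range");
        }
        List<Batch> batches = new ArrayList<>();
        for (int first = 0; first < taskCount; first += perBatch) {
            int count = Math.min(perBatch, taskCount - first);
            // Here: write count structs to the temp buffer, glBufferSubData/glMapBufferRange
            // them into the UBO, glBindBufferRange the range, then draw those count tasks.
            batches.add(new Batch(first, count));
        }
        return batches;
    }

    public static void main(String[] args) {
        for (Batch b : splitIntoBatches(1000, 128, 65536)) {
            System.out.println(b.firstTask + " +" + b.taskCount); // "0 +512" then "512 +488"
        }
    }
}
```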


(Here you can find a nice explanation of the internal differences between UBOs and TBOs: http://www.yosoygames.com.ar/wp/2015/01/uniform-buffers-vs-texture-buffers-the-2015-edition/ )


Again, this reduces state changes and interaction with the driver dramatically. A similar technique is suggested in an NVIDIA GTC presentation, although they do things slightly differently. Look it up; it was called something like "Advanced OpenGL scene rendering". There are various PDFs of that presentation around with data on how indexing in the shader impacts GPU performance versus the time it saves on the CPU side (an overall win even on older hardware, from what I read).


Graham Sellers, of AMD/Mantle/Vulkan/modern OpenGL fame, also suggested in an Ogre3D thread storing all meshes in as few buffers as possible, separated only by vertex input format. Read the whole thread, there's good stuff in it:




That way you can also reduce buffer/VAO binding to a minimum.
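A minimal sketch of the bookkeeping behind that advice, assuming meshes are appended into one shared vertex/index buffer pair per vertex format and later drawn with glDrawElementsBaseVertex using the recorded offsets (so each mesh keeps its own 0-based indices). MeshPool, MeshRange and the format strings are hypothetical; the actual GL uploads are omitted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical bookkeeping for "one shared vertex/index buffer pair per vertex format". */
final class MeshPool {

    /** Where a mesh lives inside the shared buffers of its vertex format. */
    static final class MeshRange {
        final String format;
        final int baseVertex; // basevertex argument of glDrawElementsBaseVertex
        final int firstIndex; // in elements; multiply by the index size for the byte offset
        final int indexCount;

        MeshRange(String format, int baseVertex, int firstIndex, int indexCount) {
            this.format = format;
            this.baseVertex = baseVertex;
            this.firstIndex = firstIndex;
            this.indexCount = indexCount;
        }
    }

    /** Running totals for one format's vertex/index buffers. */
    private static final class Arena {
        int vertexCount;
        int indexCount;
    }

    private final Map<String, Arena> arenas = new HashMap<>();

    /** Appends a mesh to the buffers of its format and returns where it landed. */
    MeshRange add(String format, int vertexCount, int indexCount) {
        Arena arena = arenas.computeIfAbsent(format, f -> new Arena());
        MeshRange range = new MeshRange(format, arena.vertexCount, arena.indexCount, indexCount);
        // The real upload (glBufferSubData at these offsets) would happen here.
        arena.vertexCount += vertexCount;
        arena.indexCount += indexCount;
        return range;
    }

    /** How many buffer pairs (and VAOs) are needed in total. */
    int bufferCount() {
        return arenas.size();
    }
}
```

Meshes that share a format then share one VAO, so switching meshes only changes the draw parameters, not the bindings.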


I don't think I'm able to do that from Java besides hoping that the memcopy function I'm using for unsafe memory access outside the Java heap (= C performance) does that under the hood, which seems unlikely.


Now that you mention it, maaaybe HotSpot does something like that, although it probably works only for copies between Java arrays. Maybe Spasi can do something about this in LWJGL...

#5269326 Questions About Blur Effect

Posted by on 04 January 2016 - 09:02 PM

This Intel article is a good resource; it discusses blurring techniques, optimizations and other considerations:



#5269275 Java, still being a good option for game dev in 2016 or there are other optio...

Posted by on 04 January 2016 - 04:41 PM

The single biggest problem you're going to have with Java is version-of-the-week hell.  Write once run anywhere is a joke. 
Have you actually had desktop Java applications break in between updates of the same Java version (ie, from Java 7u60 to Java 7u72 or something)? Otherwise it's an unfounded myth.


I've run my projects on more OpenJDK and OracleJDK VM versions than I can count and played Minecraft through all the damn updates that came every two weeks or so, yet never had an actual VM compatibility problem. They take backwards compatibility very seriously (to a fault, even), which is why no API was ever actually removed from the runtime.


Hell, in years I've never had Eclipse crash because some VM was incompatible, and Eclipse is a massive application.


And please, don't even mention applets. They shouldn't exist; it's a Good Thing™ that most of them stopped working.


In any case, there wouldn't be a single damn problem if you just provided a link to download the JRE, but Oracle is composed top to bottom of a pile of steaming stinky assholes and they bundle crapware with their damn JRE distributions (luckily they don't do it with the JDK). So yeah, bundle a JRE (20 MB to 40 MB; the libGDX guys provide a tool to reduce the size of the VM by removing unwanted crap).


Still, this is an issue you will have to some degree whatever you choose. C# needs the .NET runtime (or Mono depending on the platform, which is a whole other issue altogether), Java needs the JRE, C++ needs whatever MSVC runtime you're using (or some specific glibc version depending on the platform), etc.


The answer to all of those is: Ship the dependencies with your application, and save yourself a headache.

#5269080 [SOLVED] Uniform buffer actually viable?

Posted by on 03 January 2016 - 05:51 PM

Here: http://www.gamedev.net/topic/655969-speed-gluniform-vs-uniform-buffer-objects/ I ended up implementing the idea I had at that time.


There was also a discussion with Mathias that I can't seem to find, where he explained the instance ID approach in more detail.


Anyway, say we have some per-instance data, like mv and mvp matrices.


That's a single struct:


struct Transform
{
  mat4 mvp;
  mat4 mv;
  // Pad to a multiple of 16 bytes (vec4) for std140 if necessary, 12 bytes / vec3 at most.
  // Two mat4s are already 128 bytes, so no padding is needed here.
};


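Now, that's 128 bytes per struct, right? If you wanted to place them sequentially in a buffer and bind a range for each Transform struct, yeah, you'd need to place 128 bytes and then pad up to GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT (often 256 bytes) for every Transform instance, since every bound range has to start at a multiple of that alignment.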


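Let's say we've got our typical 64 kB UBO and we define it like this (TRANSFORM_SLOT and MAX_TRANSFORMS have to be compile-time constants, eg #defines prepended to the shader source, and the binding qualifier needs GL 4.2 or ARB_shading_language_420pack):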

layout (std140, binding = TRANSFORM_SLOT) uniform TransformsBlock
{
  Transform transforms[MAX_TRANSFORMS];
};


where MAX_TRANSFORMS is the max UBO size divided by the Transform instance size. Given our 64 kB UBO, that'd be 512 instances (65536 / 128), tightly packed.
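A quick Java check of that arithmetic, assuming a GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT of 256 bytes (the real value should be queried with glGetIntegerv): binding one range per Transform rounds every 128-byte struct up to the alignment, while the tightly packed array needs no padding at all. UboPadding is a hypothetical helper.

```java
final class UboPadding {

    /** Rounds value up to the next multiple of alignment (alignment must be > 0). */
    static int alignUp(int value, int alignment) {
        return ((value + alignment - 1) / alignment) * alignment;
    }

    /** Bytes used when every instance gets its own bound range. */
    static int perInstanceRangesSize(int count, int structSize, int offsetAlignment) {
        return count * alignUp(structSize, offsetAlignment);
    }

    /** Bytes used when instances are packed into one array and bound once. */
    static int packedSize(int count, int structSize) {
        return count * structSize;
    }

    public static void main(String[] args) {
        int transformSize = 2 * 64; // two mat4s, 64 bytes each
        System.out.println(perInstanceRangesSize(512, transformSize, 256)); // 131072
        System.out.println(packedSize(512, transformSize));                 // 65536
        System.out.println(65536 / transformSize);                          // 512 per 64 kB block
    }
}
```

Same 512 transforms, half the memory, and one bind instead of 512.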


Now, while you no longer need to pad, since you're binding a lot of transforms at the same time you need to index into the array to get the proper one for whatever you're drawing. There are many ways of providing an instance index: an additional attribute, a normal uniform, the instance ID, a combination of instance ID and a vertex attribute, etc. I think Mathias talked about instance indexing in the Vulkan thread, can't remember.


Anyway, once you've got the per-instance index, it's as straightforward as:


mat4 mvp = transforms[instanceId].mvp;


There. Now you just need to bind the whole range to that TRANSFORM_SLOT, with no more padding in between instance data.


Also, keep in mind that you shouldn't put everything into a single buffer. Separate them into "globals" (stuff that never changes), per-frame parameters and per-instance parameters, and choose appropriate update strategies for each. Mapping a global or per-frame buffer for a single tiny update is probably a waste; glBufferSubData would suffice.


The strategy I use right now is to have a sort of ring buffer of a couple of MB. I compute the maximum number of instances I can upload in a single pass, given the kinds of blocks that pass needs (say, TransformBlock and MaterialBlock).


Say that the max is 512. First I bind the ring buffer. Then I iterate over the transform data, upload those 512 instances and bind that range to the transform slot. Then I iterate over the material data, upload 512 instances and bind that range to the material slot. Then I draw those 512 instances. Rinse and repeat for the rest of the draw tasks. The only padding I have is in between the different blocks I'm updating; each block itself has its internal array of structs tightly packed.


Since it's a ring buffer, I just upload to the next available range until it wraps around and starts again. By the time it wraps around, that data will be quite a few frames old if you give it a couple of megabytes.
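A minimal sketch of such a ring allocator in Java. It assumes every range offset must be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT and, as described above, that the ring is big enough for wrapped-over data to be several frames old; it does no fencing, so a more defensive version would check a glFenceSync before reusing a region. UboRing is a hypothetical name.

```java
final class UboRing {
    private final int capacity;
    private final int alignment;
    private int head = 0;

    UboRing(int capacity, int alignment) {
        if (alignment <= 0 || capacity <= 0) {
            throw new IllegalArgumentException("capacity and alignment must be positive");
        }
        this.capacity = capacity;
        this.alignment = alignment;
    }

    /**
     * Returns the byte offset where the next size bytes should be written
     * (glBufferSubData) and then bound (glBindBufferRange with this offset and size).
     */
    int allocate(int size) {
        if (size > capacity) {
            throw new IllegalArgumentException("range larger than the ring");
        }
        // Bound ranges must start at a multiple of the offset alignment.
        int offset = ((head + alignment - 1) / alignment) * alignment;
        if (offset + size > capacity) {
            offset = 0; // wrap around; assumes the data at the start is no longer in use
        }
        head = offset + size;
        return offset;
    }
}
```

With a 2 MB ring and 256-byte alignment, a 64 kB range lands at 0, the next at 65536, and a following tiny range at 65792, the next aligned offset after it.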


That means I can draw 10 thousand different things with 20 updates, 20 range bindings and only one buffer binding (draw call count is a different matter; ideally you can also draw each instance batch with a single draw).
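The arithmetic behind those numbers, as a small Java sketch: the block with the largest struct decides how many instances fit per pass, assuming every block in the pass is capped by the same GL_MAX_UNIFORM_BLOCK_SIZE. The 64-byte material struct is a made-up example; PassLimits is a hypothetical name.

```java
final class PassLimits {

    /** The block with the largest struct decides how many instances fit in one pass. */
    static int maxInstancesPerPass(int maxBlockSize, int... structSizes) {
        int max = Integer.MAX_VALUE;
        for (int size : structSizes) {
            max = Math.min(max, maxBlockSize / size);
        }
        return max;
    }

    /** Number of upload/bind/draw passes needed for instanceCount instances. */
    static int passCount(int instanceCount, int perPass) {
        return (instanceCount + perPass - 1) / perPass;
    }

    public static void main(String[] args) {
        // 128-byte Transform, hypothetical 64-byte Material, 64 kB blocks.
        int perPass = maxInstancesPerPass(65536, 128, 64);
        System.out.println(perPass);                    // 512
        System.out.println(passCount(10_000, perPass)); // 20
    }
}
```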


You can get smarter and pack UBOs to reduce the calls even further: pass reduced forms of the matrices, pack different kinds of data into the same struct (ie, instead of having separate slots for transforms and materials, just put them in the same struct), upload all of the instance data in one step and then just loop bindRange-draw over all of it, and so on. That way you can handle batches of thousands with a dozen calls tops.

#5268936 Criticism of C++

Posted by on 02 January 2016 - 09:55 PM

Yup. Language wars are silly because it doesn't make sense to use only one language in the first place. Every language has its weak points and every language except for Java has its strong points.
I'm above that shit, Khat, I'll just upvote you.

#5268767 Benefits from manual function loading.

Posted by on 01 January 2016 - 08:24 PM

I was wondering if there was any benefit to manually loading functions for OpenGL rather than using a library like GLEW or similar.
If your goal is to waste your time, then that would be one benefit.


Use a lib: GLFW, GLEW, SFML, SDL, etc. OpenGL is hard enough as it is; you don't need to artificially create more complexity around it.

#5268750 Help understanding glewinfo.txt

Posted by on 01 January 2016 - 06:50 PM

There are two things in play here: The OpenGL context that you can create and the OpenGL features you can use.


As you guessed, with your card in Windows you can create an OpenGL 3.1 context, and an OpenGL 3.3 context in Linux.


What does that mean? That you can use all the features up to that context version. Contexts are inclusive, so an OpenGL 3.3 context has the 3.0, 3.1 and 3.2 features. If your card didn't support a 3.2 feature but supported all of the 3.3 features, you still couldn't create a 3.3 context, since you're missing required 3.2 features and 3.3 includes 3.2.


Now, while your card supports those contexts, it also supports some of the features of later OpenGL versions. Thing is, since it only supports some of those features, your card can't give you a higher context. However, you can use those other features as extensions.


So you create the highest context your card supports and, if you want, import the rest of the features your card supports as extensions. That way, while you can't, say, create an OpenGL 4.2 context, you could use some of the features introduced in that version via extensions.


For example, I use a 3.3 core context in my application, but most of the D3D10 cards out there support a few useful extensions from higher OpenGL versions, like ARB_texture_storage, ARB_shading_language_420pack, etc.
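A small Java sketch of that decision, assuming you've already read the context version (glGetIntegerv with GL_MAJOR_VERSION/GL_MINOR_VERSION) and the extension names (glGetStringi(GL_EXTENSIONS, i)). A feature is usable if the context is at or above the version where it went core (contexts being inclusive), or if the driver exposes its extension. Both extensions mentioned above went core in OpenGL 4.2; GlFeatures is a hypothetical name.

```java
import java.util.Set;

final class GlFeatures {

    /** True if a (major, minor) context includes everything from (coreMajor, coreMinor). */
    static boolean contextIncludes(int major, int minor, int coreMajor, int coreMinor) {
        return major > coreMajor || (major == coreMajor && minor >= coreMinor);
    }

    /** Usable either as core functionality or through its extension. */
    static boolean isAvailable(int major, int minor, Set<String> extensions,
                               int coreMajor, int coreMinor, String extension) {
        return contextIncludes(major, minor, coreMajor, coreMinor) || extensions.contains(extension);
    }

    public static void main(String[] args) {
        // A 3.3 core context on a D3D10-class card that exposes one 4.2 extension.
        Set<String> exts = Set.of("GL_ARB_texture_storage");
        System.out.println(isAvailable(3, 3, exts, 4, 2, "GL_ARB_texture_storage"));          // true
        System.out.println(isAvailable(3, 3, exts, 4, 2, "GL_ARB_shading_language_420pack")); // false
        System.out.println(contextIncludes(3, 3, 3, 2));                                      // true
    }
}
```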

#5267051 My A* Hierarchical pathfinding.

Posted by on 19 December 2015 - 12:08 PM

I see various places where you can simplify equality tests a bit:


// This is a more canonical equals, with the instanceof operator, which is a bit more robust.
@Override
public final boolean equals(Object o) {
  if (this == o) return true;
  if (o == null || !(o instanceof Node)) return false;
  return this.equalsNonNull((Node) o);
}
// Now we do an equals for when we know we're comparing nodes.
public final boolean equals(Node o) {
  if (this == o) return true;
  if (o == null) return false;
  return this.equalsNonNull(o);
}
// Most reduced scenario, we know it's a Node and that it isn't null.
public final boolean equalsNonNull(Node o) {
  // Here we can use the reduced equalsNonNull of Coordinate.
  // If it could be null, use equals(Coordinate) instead.
  return this.position.equalsNonNull(o.position);
}

// Now equals for Coordinate objects:

// Most generic case.
@Override
public final boolean equals(Object o) {
  if (this == o) return true;
  if (o == null || !(o instanceof Coordinate)) return false;
  return this.equalsNonNull((Coordinate) o);
}
// Equals for when we know we're comparing coordinates.
public final boolean equals(Coordinate o) {
  if (this == o) return true;
  if (o == null) return false;
  return this.equalsNonNull(o);
}
// Most reduced scenario, we know it's a Coordinate and that it isn't null.
public final boolean equalsNonNull(Coordinate o) {
  return (this.x == o.x && this.y == o.y);
}


And use them where they're needed given assumptions in the surrounding code (ie, you don't always need the generic equals(Object), sometimes you know you're comparing Node/Coordinate, or that something can't be null).


Also, reduce your object count: don't use a "Coordinate" class that only has two fields. Simply put the x/y on the Node or something.
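A minimal sketch of the flattened version, assuming a Node only needs its grid position for equality (costs, parent links and so on are left out): x and y live directly on the Node, and hashCode is overridden alongside equals, which HashMap/HashSet need for that equality to actually be used.

```java
final class Node {
    final int x;
    final int y;

    Node(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Node)) return false; // instanceof is false for null too
        Node other = (Node) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // Must agree with equals: equal nodes produce equal hashes.
        return 31 * x + y;
    }
}
```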


Each object carries 12 bytes of header overhead (on a 64-bit HotSpot VM with compressed oops), and each object access is more or less a pointer indirection (unless HotSpot can work some magic there too). So a typical "Position2D" object would look like this in memory:


12 bytes overhead
+ 4 bytes x coord
+ 4 bytes y coord
+ 4 bytes padding for 8-byte alignment (objects are 8-byte aligned).


That's a total of 24 bytes and one mandatory indirection for only 8 bytes of useful data. So flatten your structures a bit. Also look up alternative HashMap and ArrayList implementations, like fastutil's.
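Taking the flattening one step further, a minimal Java sketch (not from the original post) that packs x/y into a single long: 8 bytes per position with no object and no indirection, usable in primitive arrays or as the key of a primitive-keyed map such as fastutil's. It assumes coordinates fit in 32-bit ints; PackedCoords is a hypothetical name.

```java
final class PackedCoords {

    /** Packs two ints into one long: x in the high 32 bits, y in the low 32 bits. */
    static long pack(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFFFFFFL); // mask y so its sign doesn't smear into x
    }

    static int unpackX(long packed) {
        return (int) (packed >> 32);
    }

    static int unpackY(long packed) {
        return (int) packed; // keeps the low 32 bits
    }
}
```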


Can't help with the algorithmic complexity though.

#5266369 good approach to randomly generate the contents of a dungeon or room

Posted by on 14 December 2015 - 05:35 PM



... hehehehehehehe



#5266055 Cross Platforming: Switching to Java?

Posted by on 12 December 2015 - 01:43 PM

Oh, well, but is not maintaining Java on multiple platforms way easier?
I might get downvoted but I'm going to say yes. Library linking/loading is standard and behaves the same in all the OSes that (desktop) Java supports.


Now the issue is that desktop Java isn't the same as Android Java. Yes, with desktop Java you get fairly simple multi platform support in all major OSes, with deployment of the application itself as simple as copying exactly the same .jars in all of them. LWJGL is well made, it will automatically load the native lib of the platform you're running the application on (for all the combinations between Linux, Windows, OSX, x86, x86_64).


But Android is different: you will have to code specific parts for it (input, display, sound, etc.). Moreover, you will need to "downgrade" your language support to whatever Java 6/7 bastard Android supports nowadays. iOS was supported through RoboVM for free, but Xamarin bought the company and moved it to their kind of strategy (ie, gotta pay up monthly). Same scenario if you want to use C#.


Also, while you can reasonably expect the runtime of any OS to run your C++ program (or at worst you need to bundle some tiny binary, say a Visual C++ 2015 redistributable), with Java you need to either bundle a 40-60 MB VM (not as complex as it sounds, though) or provide a link for the user to download the VM from Oracle's site (and remember, Oracle bundles crapware with their JRE installers). Moreover, the "executable" itself might be multi platform, but it won't get you an OSX installer, a Windows installer or a .deb package. That part you have to do on your own, probably regardless of the language you're using.


I still think it's a better scenario than what you're left with in C++: there are plenty of parts of the standard lib that are the same across desktop and Android, deployment is simpler albeit heavier, and ultimately, Java is a much simpler language to manage than C++.

#5265828 ECS: Systems operating on different component types.

Posted by on 10 December 2015 - 10:27 PM

That's an issue I often find myself trying to solve.


See, the thing is, you totally can (and I've seen it done) create a "messaging" API between systems, ie, one system fires an event that goes straight to another system. That would be the case for the "activate" event, for example.


Thing is, I feel like an event/messaging API is bolted on. The system "should" write data into an entity's component, and the system that handles the response should fetch that data when its turn comes. As you described, this isn't simple to do in many cases.


The specific issue here is *when* this happens. With event systems you need to figure that out, because it's very important. Imagine the physics system casts a ray and fires an event; the receiver does something with it, but the receiver will probably be a system that also iterates over its own entities. If you make a single event system, events might get processed in the middle of the receiver's own iteration. If you do it per system, you could make a queue that only gets processed when that system is processed. Or you could try to do without events: just write data, read data, and make it part of the normal entity-component iteration.


Order of system processing and system inter-dependencies are very important for getting consistent results, finding bottlenecks and possibly multi threading some of it.


It's still one of the things I haven't decided on, which is why I haven't added an event API on top of dustArtemis.


One idea I had for this kind of problem is to have one system in charge of maintaining the needed spatial structure. When entities get added/removed, the system iterates over the affected entities and maintains its spatial structure.


Different systems can hold a reference to that system and issue queries to it. So the system that's in charge of activating stuff (SystemA) can ask "well, for this direction and reach, SystemB, give me what I am hitting". That's a data query, and it means SystemA depends on SystemB, so SystemA has to be executed after SystemB has updated its internal spatial structure for that frame. That makes dependencies obvious.
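A minimal Java sketch of that arrangement, with hypothetical names (SpatialSystem, ActivationSystem) that are not dustArtemis API. A plain list stands in for the spatial structure, and a radius check replaces the direction-and-reach ray query to keep it short. The activation system holds a reference to the spatial system and queries it, and the update order makes the dependency explicit.

```java
import java.util.ArrayList;
import java.util.List;

final class SystemsSketch {

    /** Owns the spatial data; a real one would keep a grid or BVH. */
    static final class SpatialSystem {
        final List<int[]> entities = new ArrayList<>(); // each entry is {id, x, y}

        void update() {
            // Rebuild or refit the spatial structure for this frame here.
        }

        /** Ids of entities within reach of (x, y). */
        List<Integer> queryRadius(int x, int y, int reach) {
            List<Integer> hits = new ArrayList<>();
            for (int[] e : entities) {
                int dx = e[1] - x;
                int dy = e[2] - y;
                if (dx * dx + dy * dy <= reach * reach) {
                    hits.add(e[0]);
                }
            }
            return hits;
        }
    }

    /** Depends on SpatialSystem, so it must run after it every frame. */
    static final class ActivationSystem {
        private final SpatialSystem spatial;
        final List<Integer> activated = new ArrayList<>();

        ActivationSystem(SpatialSystem spatial) {
            this.spatial = spatial;
        }

        void update(int px, int py, int reach) {
            activated.clear();
            activated.addAll(spatial.queryRadius(px, py, reach));
        }
    }

    public static void main(String[] args) {
        SpatialSystem spatial = new SpatialSystem();
        spatial.entities.add(new int[] {1, 0, 0});
        spatial.entities.add(new int[] {2, 10, 10});
        ActivationSystem activation = new ActivationSystem(spatial);
        // The update order encodes the dependency: spatial first, then activation.
        spatial.update();
        activation.update(1, 1, 3);
        System.out.println(activation.activated); // [1]
    }
}
```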


Then it would be a matter of:


for (entity in entitiesBeingHit)
  if (entity.hasComponent(ActivableBehavior))
    entity.addComponent(Activated) // picked up later by the system handling activated entities


Then the system in charge of activated stuff can iterate over the entities activated in that frame and (possibly) execute trigger scripts or something, which in turn could feed events into other subsystems. For this to work properly, I think adding/removing components to entities, and whatever consequences that has in the engine, should be a fast operation.


It might get overly complex, and maybe a straightforward but well-defined messaging API between systems is better. As I said, I'm still evaluating my options.

#5265494 Convert triangle list to a triangle strip?

Posted by on 08 December 2015 - 03:52 PM

I know of two tools: NVTriStrip, from nVidia, and TriStripper, which tried to improve on it.


Here you can read about both of them: http://users.telenet.be/tfautre/softdev/tristripper/vs_nvtristrip.htm


TriStripper http://users.telenet.be/tfautre/softdev/tristripper/


NVTriStrip http://www.nvidia.com/object/nvtristrip_library.html

#5265484 entity system implementation

Posted by on 08 December 2015 - 03:09 PM

Sure that would work. I can't think of any occurrences where i've wanted to see if it has a specific component without actually wanting to do something with it though, not personally.


The way I do it, I use those bits to "filter" entities per system.


Each system maintains a list of the entity IDs it can process. For this to happen, all systems get notified when an entity is added, removed, or any of its components swapped. Now each system needs a fast and sure way to know if the entity has the components required for the system to be "interested" in it. That's where the bit set comes in, all you need to do is:


if ((entityBits & systemBits) == systemBits)
  // Keep entity: it has every component the system requires.
else
  // Remove entity: it's missing at least one of them.


It's cheap enough that the only "performance concern" there is keeping the entity list ordered, not checking the entities themselves.
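The same check written with java.util.BitSet, as a minimal sketch; it isn't necessarily how dustArtemis stores its bits, and AspectFilter is a hypothetical name. The entity is of interest to the system when every bit the system requires is also set on the entity, ie, (entity AND system) equals system.

```java
import java.util.BitSet;

final class AspectFilter {

    /** True if entityBits contains every bit set in systemBits. */
    static boolean matches(BitSet entityBits, BitSet systemBits) {
        BitSet common = (BitSet) systemBits.clone(); // don't mutate the system's bits
        common.and(entityBits);                      // systemBits & entityBits
        return common.equals(systemBits);
    }
}
```

Run it once per added/removed/changed entity and keep or drop the entity ID from the system's list accordingly.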


EDIT: Also, in case it wasn't clear, this way the systems never have to check whether an entity has component XYZ in the main processing loop (ie, foreach entity doStuff(entity)). They just use the components directly, since validation was done in the added/removed/changed events at the start of the frame.


You can find many of those "tricks" in my repo https://github.com/dustContributor/dustArtemis


The interesting parts are EntitySystem, Aspect, ComponentManager and EntityManager. The rest you can figure out by following those.


That's Java; there are more things you can do in a language that allows you to flatten objects or use sequentially stored structs.

#5264179 Instancing, and the various ways to supply per-instance data

Posted by on 29 November 2015 - 08:15 PM

It seems like this could alleviate restrictions with attrib locations. However, you are limited by GL_MAX_UNIFORM_BLOCK_SIZE

Yes you are... but not totally.


The limit applies specifically to the amount of memory that can be bound at a particular UBO slot. AFAIK it's 64 kB on everything except AMD these days. The trick is that the buffer itself can be bigger, much bigger.


Taking into account that the expensive part here is often uploading the data (many tiny uploads == bad), you can allocate something like 1 MB, upload all your data there with a couple of calls at most (ideally only one), then just call glBindBufferRange in between draw calls (keeping each range offset a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT).


Not sure how AMD handles glBindBufferRange, but nVidia said in their presentations that glBindBufferRange is a super cheap call to make, and since GCN seems to have the upper hand on memory management, it's probably cheap on AMD too. Adding to that, since you already pre-uploaded all (or most) of your data to the big buffer, the state changes in between draw calls should be minimal.


EDIT: The memory you can bind to a UBO slot can actually be as low as 16 kB. 64 kB is a common limit, but the spec only guarantees 16 kB (16384 bytes) as the minimum. I recall an Intel forum post asking why in D3D11 you could have 64 kB for a constant buffer but in OpenGL only 16 kB; only then did the Intel driver team raise the limit to 32 kB. That was for the Windows drivers; Linux Intel drivers apparently tend to have better support.

#5264138 Vulkan is Next-Gen OpenGL

Posted by on 29 November 2015 - 12:48 PM


All the current information I know is under NDA, at best I can say 'there is movement'


I'll sign an NDA if you'll tell me what you know ;D



Khronos membership is like 15k USD a year. Then you can sign the NDA :P