

Member Since 18 Jan 2008

#5207139 Nice fast XML parsers with validation?

Posted by samoth on 28 January 2015 - 04:58 AM

I think that "validation" and either "nice" or "fast" are mutually exclusive.


Do you really need validation in that part, though? Since you are in control of generating the XML (UI layout editor), you should be able to ensure that the document is created in a well-formed and, well... valid (according to your DTD/XSD) way.


If you are a little paranoid (that is, you don't trust yourself to write a correct generator), you can use any not-so-pretty, not-so-fast validating parser inside the UI layout editor to read the document back in immediately after writing it out.

If that passes, then your nice-and-fast non-validating parser that you use everywhere and for which you've already written tons of code (code that works and has been tested) will be mighty fine to read it (after all, the document is valid and that doesn't change, so there's no need to check that over and over again).

If it doesn't pass, you need to fix the layout editor until valid documents come out, but there's no point validating the known-to-be-broken document on the user end anyway.

#5206731 OO where do entity type definitions go?

Posted by samoth on 26 January 2015 - 11:52 AM

Generally if the only difference between the enemies is their statistics and models, you shouldn't need different classes. Polymorphism in this context is a tool for implementing different behaviours, not different properties.




I would, as a very simplified example, do something like:

struct script; // provided by whatever scripting system you use

struct monster
{
    const char* name;
    int hp;
    int ac;
    int damage;
    script* special_attack;
    // ...
    // mesh* model;
};

struct instance
{
    monster* type;
    int current_hp;

    instance(monster* type_in) : type(type_in), current_hp(type_in->hp) {}
};

extern script lightning_coming_outa_my_butt;
extern script cowardish_retreat;

monster orc    = { "orc",    60, 20, 20, nullptr /* orcs have no special attack */ };
monster elf    = { "elf",    20, 40, 20, &lightning_coming_outa_my_butt };
monster goblin = { "goblin",  5,  5,  5, &cowardish_retreat };

That way, the 753213 orcs in your game that are "just orcs" all use the same template for data (and with the compiler merging string constants they all use the exact same memory for the name, too), whereas a special "hero" monster can have its own stats but can still be based on the stock template (make a copy, add some ability modifiers, and assign a name).
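A minimal, self-contained sketch of the template/instance split, including the "hero" case. It uses std::string instead of const char* so a hero can be renamed safely; make_hero, its bonus parameters and the name "Grishnak" are illustrative assumptions, not part of any real API.

```cpp
#include <string>
#include <utility>

// Shared, read-only stats for one kind of monster (simplified for the sketch).
struct monster
{
    std::string name;
    int hp;
    int ac;
    int damage;
};

// Per-creature state: points at the shared template, stores only what changes.
struct instance
{
    const monster* type;
    int current_hp;

    explicit instance(const monster* type_in) : type(type_in), current_hp(type_in->hp) {}
};

// Hypothetical helper: derive a named "hero" from a stock template by copying
// it and applying modifiers. The copy must outlive any instance pointing at it.
monster make_hero(const monster& base, std::string name, int hp_bonus, int damage_bonus)
{
    monster hero = base;          // start from the stock stats
    hero.name = std::move(name);  // give it its own name
    hero.hp += hp_bonus;
    hero.damage += damage_bonus;
    return hero;
}
```

Every ordinary orc instance points at the same orc template; only the hero gets its own copy of the data.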


As a refinement, you'll probably want to load the monsters from a datafile rather than hardcoding.
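One possible shape for that, as a sketch only: the file format here (one monster per line as `name hp ac damage`, with `#` comment lines) is made up for the example, and a real project would more likely use whatever format its tools already write.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct monster
{
    std::string name;
    int hp = 0;
    int ac = 0;
    int damage = 0;
};

// Reads monster templates from a stream in the (assumed) line format
// "name hp ac damage". Blank lines and lines starting with '#' are skipped;
// malformed lines are silently ignored.
std::vector<monster> load_monsters(std::istream& in)
{
    std::vector<monster> result;
    std::string line;
    while (std::getline(in, line))
    {
        if (line.empty() || line[0] == '#')
            continue;
        std::istringstream fields(line);
        monster m;
        if (fields >> m.name >> m.hp >> m.ac >> m.damage)
            result.push_back(m);
    }
    return result;
}
```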

#5206672 (Hex)Shape-Pattern Shading?

Posted by samoth on 26 January 2015 - 03:57 AM

You mean something like this?

#5206003 What's a good way to play random environment sounds?

Posted by samoth on 22 January 2015 - 10:10 AM

An alternative, if you want something more regular with no possibility of clusters, but still "kind of random" may be to initialize your counter to a configurable value plus random (for example, something like 1000 + random(200)). This idea is stolen from an old Gamasutra article on random tree placement.
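A sketch of that counter initialization using the C++ `<random>` header, assuming "random(200)" means a value in [0, 200). next_sound_delay is a hypothetical helper name; the caller would count the returned value down every tick and play a sound when it reaches zero.

```cpp
#include <random>

// Ticks until the next ambient sound: a fixed base plus a bounded random
// jitter. With base = 1000 and jitter = 200 the result lies in [1000, 1199],
// so two sounds can never be closer together than `base` ticks.
// Assumes jitter >= 1.
int next_sound_delay(std::mt19937& rng, int base, int jitter)
{
    std::uniform_int_distribution<int> dist(0, jitter - 1);
    return base + dist(rng);
}
```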


Yet another approach is described in Not so random random, the basic idea is trying to avoid events that "never" occur and events that occur too often by weighting the threshold.

#5203869 How unreliable is UDP?

Posted by samoth on 13 January 2015 - 03:53 AM

Let's just ignore fragmentation and assume everything you send is under the old 512 byte (or 3kb according to who you ask or where) limitation of IP/UDP.
This depends on the protocol version, not so much whom you ask :)

IPv4 devices must be able to handle 576 byte datagrams whereas IPv6 devices must be able to handle a minimum of 1280 bytes.

Though even if you use IPv4, your devices and all intermediate routers are almost certainly IPv6-capable devices anyway (unless you're in a third or fourth world country and using a 15 year old router). Virtually the entire internet backbone has been IPv6 for years, so 1280 byte datagrams are a pretty safe bet.


Have you reinvented TCP? Not hardly. TCP does way more (like flow control, fragmentation, sessions, etc).

Flow control is something that you most likely have to implement as well, since flow control is the one thing that prevents packet loss.


This experiment shows the very problem:

When sending small packets with at least one ms sleep in between, 100% of all packets were received and in order. As soon as I dropped that sleep, however, I saw something like 15-20% packet losses

Here, the sleep was a very basic "congestion control". You can in theory push 8Gbit/s to your network card over PCIe, but the cable will only take 1Gbit. What is the network card supposed to do? It can buffer a few datagrams, and then it will have to drop the rest until the buffered ones are sent. This is why congestion control is needed.


It is very similar when there are half a dozen routers in between, only then you're not the only one sending datagrams.
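The sleep in that experiment is the crudest form of pacing. A common, slightly smarter alternative is a token bucket, sketched below under assumptions made for this example (tokens are bytes, and the caller advances time in seconds). Note that it only limits your own send rate; unlike real congestion control, it does not react to loss further down the path.

```cpp
#include <algorithm>
#include <cstddef>

// Minimal token-bucket pacer. Tokens are bytes; they refill at `rate` bytes
// per second, up to `burst` bytes. A datagram may be sent only if enough
// tokens are available; otherwise the caller should wait or queue it.
class token_bucket
{
public:
    token_bucket(double rate_bytes_per_sec, double burst_bytes)
        : rate_(rate_bytes_per_sec), burst_(burst_bytes), tokens_(burst_bytes) {}

    // Advance time by `dt` seconds and refill, capped at the burst size.
    void advance(double dt) { tokens_ = std::min(burst_, tokens_ + rate_ * dt); }

    // Try to spend `size` bytes; returns false if the datagram must wait.
    bool try_send(std::size_t size)
    {
        if (tokens_ < static_cast<double>(size))
            return false;
        tokens_ -= static_cast<double>(size);
        return true;
    }

private:
    double rate_;
    double burst_;
    double tokens_;
};
```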

#5202210 Are Subroutines Efficient?

Posted by samoth on 06 January 2015 - 06:40 AM

Whether subroutines are efficient or not depends on what you use them for. In your case, the answer will be: No.


A subroutine necessarily has a certain (small) overhead within the shader, because it is the equivalent of calling a function via a function pointer obtained from a uniform variable. This means it involves (at least) one additional instruction that must access constant memory on the GPU. It also means that the optimizer cannot inline or optimize across the function call. (On the other hand, the fact that a uniform decides where the branch goes eliminates the overhead of divergent branches that a normal dynamic branch would have. Uniforms are the same for the entire draw call, so there is no such thing as a divergent subroutine call.)


On the other hand, switching shaders is one of the most expensive state changes (only topped by switching render targets) whereas uniform updates are the cheapest possible state changes.


That means if you can use subroutines so you have 2-3 fewer shader switches per frame, you win. (Let's say you need to draw three different kinds of objects at three levels of detail, all of which use a somewhat different lighting formula). Yes, the shader will run very slightly slower, but overall you win because you avoid expensive state changes.


On the other hand, if you use subroutines for something like toggling specular lighting or normal maps on/off because the user changed a config switch, you lose. This is something that doesn't happen three times per frame; it happens once or twice during the whole program's execution, if that. Using #ifdef here is perfectly adequate. It doesn't really matter whether it takes half a second to recompile the shader, either. Nobody will notice; you just switch shaders when they're ready.

#5200745 glDrawElementsInstanced() vs glDrawElementsIndirect()

Posted by samoth on 30 December 2014 - 04:09 AM

glDrawElementsIndirect() appears to be a wrapper to glDrawElementsInstanced()
I am unsure what might possibly make you think that.


glDrawElementsInstanced is basically the same as (and, on hardware that doesn't support instancing natively, indeed implemented as)

for(gl_InstanceID = 0; gl_InstanceID < n; ++gl_InstanceID)
    glDrawElements(mode, count, type, indices); // gl_InstanceID visible to the vertex shader

... whereas glDrawElementsIndirect is something much more complicated:

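  • pull a struct from a buffer
  • interpret the struct to figure out how many indices and instances the user wants, and at which offsets
  • issue the corresponding draw call
  • repeat for every struct in the buffer (with glMultiDrawElementsIndirect; plain glDrawElementsIndirect reads just one)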

Of course you could trivially implement glDrawElementsInstanced in terms of glDrawElementsIndirect, but that would not be very memory-efficient (think about someone drawing 100k instances). Implementing indirect draw in terms of instancing, on the other hand, is something I couldn't imagine. How would that work?
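To make the difference concrete, here is a CPU-side sketch of what an indirect multi-draw conceptually does. The struct mirrors the five-field DrawElementsIndirectCommand layout from the OpenGL specification; emulate_multi_draw_indirect and the draw callback are hypothetical stand-ins for work that the driver and GPU actually do. A plain instanced draw corresponds to a single command whose instanceCount is n.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Same field layout as the command that glDrawElementsIndirect reads from
// the buffer bound to GL_DRAW_INDIRECT_BUFFER.
struct DrawElementsIndirectCommand
{
    std::uint32_t count;          // number of indices
    std::uint32_t instanceCount;  // number of instances
    std::uint32_t firstIndex;     // offset into the index buffer, in indices
    std::int32_t  baseVertex;     // added to each fetched index
    std::uint32_t baseInstance;   // offset for instanced vertex attributes
};

// Stand-in for one instanced draw (in reality done by the driver/GPU).
using instanced_draw_fn = std::function<void(const DrawElementsIndirectCommand&)>;

// Pull each struct from the "buffer", interpret it, issue one draw per struct.
// Returns the total number of instances drawn.
std::uint64_t emulate_multi_draw_indirect(const std::vector<DrawElementsIndirectCommand>& buffer,
                                          const instanced_draw_fn& draw)
{
    std::uint64_t instances = 0;
    for (const auto& cmd : buffer)
    {
        draw(cmd);
        instances += cmd.instanceCount;
    }
    return instances;
}
```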

#5200458 What is better? TransformFeedback or OpenCL Kernel?

Posted by samoth on 28 December 2014 - 02:17 PM

OpenCL has the disadvantage that GL/CL synchronization is implemented as a pair of extensions and not very well supported (it's not likely to improve any time soon either, since at least one major IHV has absolutely nothing to gain from making OpenCL work any better). Which means that properly synchronizing is a nuisance.

#5195745 NULL vs nullptr

Posted by samoth on 01 December 2014 - 11:52 AM

nullptr: Pointer with an address of 0
Only by coincidence; formally, this is a wrong and possibly dangerous assumption (though admittedly it holds on most, if not all, current mainstream systems).


nullptr is a null pointer constant of type std::nullptr_t. That type is not itself a pointer type, but sizeof(std::nullptr_t) equals sizeof(void*), and nullptr converts to the null pointer value of any pointer type, whose representation is unspecified. A zero integral constant can also be converted to a null pointer value (but isn't one). A null pointer value compares equal to another null pointer value or a null pointer constant of the same type (it might also compare equal to something else, but nothing is said about that). nullptr can be converted to the bool prvalue false (in direct-initialization) and, via reinterpret_cast, to an integral type, resulting in the same value as converting (void*)0 (which is not necessarily 0).
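A small illustration of the practical difference. The static_asserts check the two type facts stated above, and the overload pair (a hypothetical which function) shows why nullptr is preferred over a literal 0: it can only ever select a pointer overload. NULL is left out on purpose, since whether it expands to 0, 0L or nullptr is implementation-defined.

```cpp
#include <cstddef>
#include <type_traits>

// std::nullptr_t is its own fundamental type, not a pointer type,
// but the standard guarantees it has the same size as void*.
static_assert(!std::is_pointer<std::nullptr_t>::value, "nullptr_t is not a pointer type");
static_assert(sizeof(std::nullptr_t) == sizeof(void*), "same size as void*");

// Hypothetical overloads that tell integers and pointers apart.
const char* which(int)   { return "int"; }
const char* which(char*) { return "pointer"; }

// which(0)       -> "int": 0 is an int first, a null pointer constant second.
// which(nullptr) -> "pointer": nullptr_t converts to any pointer type, never to int.
```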

#5195688 30k concurrent players on a (private) MMO server...is this possible ?

Posted by samoth on 01 December 2014 - 07:04 AM

For something round-based, this is pretty trivial, but for something kind of "action, PvP", I think the claim is a bit unreasonable for a single server (without, e.g., doing some P2P trickery, which is not really "a single server", as suggested above).


Here's all the game logic you need
Well yes, almost. In reality, though, it needs to be a little more complex, and the "little bits" are unfortunately quite expensive. Something like:
while((result = system_poll()) != timer_object_id)
    handle_event(result);     // hypothetical: process whatever the poll returned

for(auto& p : player)
{
    auto newpos = p.pos + p.vel;

    if (!map.blocked(newpos))
        p.pos = newpos;
    else          // a non-cheating client software should normally
        log(...); // not allow this move in the first place
}
for(auto& c : clients)
    send_updates(c);

In particular, send_updates must figure out what to send to each client. Sending the complete map is out of the question (death by bandwidth), as is sending a fixed small subsection of the map (which includes tiles with no changes but may exclude interesting tiles outside your area). Deciding what to send is an O(N²) problem, and N² for large N is not exactly trivial to cope with. Of course you can prune away some of those 30k (zones, spatial hashing, you name it), but the OP makes explicit mention of "PvP", "siege" and "very few small areas", which suggests that no matter what you do, N will be considerable. If only 1,000 of those 30,000 fight against each other in each siege, then collision/interest queries are 1 million operations (so, 30 million altogether, for 30 such sieges). That's without actually creating a response, encrypting it, copying data to the network stack or processing it. 30k calls to send are not entirely free either (assuming most people are actively fighting, they will want updates every round!).
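The "spatial hashing" kind of pruning mentioned above can look roughly like this uniform-grid sketch (vec2, count_pairs_in_range and the cell size are assumptions for illustration). It skips far-away players entirely, but when 1,000 players crowd into a few cells during a siege, the inner loops still degenerate to the O(N²) behaviour described above.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct vec2 { float x, y; };

// Uniform-grid spatial hashing for interest queries. Players are bucketed
// into square cells of side `cell` (assumed >= `radius`), so any player
// within `radius` sits in the same or one of the 8 neighbouring cells.
// Returns the number of player pairs that are within `radius`.
std::size_t count_pairs_in_range(const std::vector<vec2>& pos, float radius, float cell)
{
    auto coord = [cell](float v) { return static_cast<std::int64_t>(std::floor(v / cell)); };
    auto key = [](std::int64_t cx, std::int64_t cy) {
        // pack two 32-bit cell coordinates into one 64-bit hash key
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32)
             | static_cast<std::uint32_t>(cy);
    };

    std::unordered_map<std::uint64_t, std::vector<std::size_t>> grid;
    for (std::size_t i = 0; i < pos.size(); ++i)
        grid[key(coord(pos[i].x), coord(pos[i].y))].push_back(i);

    const float r2 = radius * radius;
    std::size_t pairs = 0;
    for (std::size_t i = 0; i < pos.size(); ++i)
    {
        const std::int64_t cx = coord(pos[i].x), cy = coord(pos[i].y);
        for (std::int64_t dx = -1; dx <= 1; ++dx)
            for (std::int64_t dy = -1; dy <= 1; ++dy)
            {
                auto it = grid.find(key(cx + dx, cy + dy));
                if (it == grid.end())
                    continue;
                for (std::size_t j : it->second)
                {
                    if (j <= i)
                        continue;  // count each unordered pair once
                    const float ddx = pos[i].x - pos[j].x;
                    const float ddy = pos[i].y - pos[j].y;
                    if (ddx * ddx + ddy * ddy <= r2)
                        ++pairs;
                }
            }
    }
    return pairs;
}
```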


Also, system_poll() (which might be epoll_wait or whatever) and actually reading and processing the ready sockets will have considerable overhead, more than just a few cache misses. The usual assumption is that with 30k connected clients, most are inactive most of the time anyway, and epoll is O(1) with respect to that. However, it is of course still O(N) with respect to the actual stuff that happens. There is no way it could be any different: if N events happen, you must process N events.

For "PvP in constrained areas" that "most are inactive anyway" assumption does not hold -- most are active most of the time.


A single call to any not-totally-trivial kernel function like epoll_wait or recvfrom costs a minimum of 5,000 clocks (I haven't actually measured this to be honest, but I think it's a rather conservative estimate for those two functions -- in fact, receiving a datagram may easily cost 4-5 times as much if you include the kernel overhead for interrupt, going through iptables, reassembly etc, and copying data on the user end), so doing maybe 5k calls to epoll_wait and 20k calls to recvfrom per second might already consume roughly 40 milliseconds, and there you haven't actually done anything useful yet (yes, 40ms does not look like a lot, there are 1,000ms in a second... but mind you, that's just for receiving stuff, not actually doing something yet -- plus you may want more than just one update per second, at 10 ticks per second, you only have 100ms available).
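The arithmetic behind that estimate, assuming a CPU clock of roughly 3 GHz (the post does not state one):

$$
(5{,}000 + 20{,}000)\ \text{calls} \times 5{,}000\ \frac{\text{cycles}}{\text{call}} = 1.25 \times 10^{8}\ \text{cycles}
$$

$$
\frac{1.25 \times 10^{8}\ \text{cycles}}{3 \times 10^{9}\ \text{cycles/s}} \approx 0.042\ \text{s} \approx 42\ \text{ms}
$$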


Especially in a PvP game, regular complaints about cheaters will inevitably come, and you will have to deal with them in a manner that satisfies the (never satisfied anyway) audience. Which means that, as the most basic thing, you need to log pretty much everything, and logging must be reasonably reliable and failsafe, and in a format that lets your customer service / moderators / bot heuristics easily and efficiently replay, parse, or query data. Which, of course, is possible but not at all a "free" operation.

#5194743 OpenGL Shader cache

Posted by samoth on 26 November 2014 - 04:15 AM

I'm personally not a great fan of separable programs. Yes, the official IHV point of view is "GPUs have worked that way for years anyway" (quoted from some old nVidia presentation), but linking a program gives the optimizer the opportunity to optimize across stages, say what you will.

So for me, it's between #3 and #4 (I'm using #4, but I'm sure #3 would work just fine anyway).



Have a way set up to detect GPU and driver version.
That isn't even necessary; luckily, it just works (and since detection is a lot of trouble and error-prone, I would like to avoid it).


The command void ProgramBinary( ... ); loads a program object
Loading a program binary may also fail if the implementation determines that
there has been a change in hardware or software configuration
LinkProgram and ProgramBinary both set the program object’s LINK_STATUS to TRUE or FALSE, as queried with GetProgramiv, to reflect success or failure

(Section 7.5 of the OpenGL 4.5 specification)


Also, calling glUseProgram will generate INVALID_OPERATION if loading the binary has failed (section 7.3). Thus, luckily, there is no reason to query some obscure non-standardized values from the operating system and try to figure out a particular model and driver combination from these.

#5194408 What heap wil dynamic memory be created on

Posted by samoth on 24 November 2014 - 08:02 AM

No. There is only 1 heap.
The reason some libraries can’t deallocate allocations made by other libraries is because each library puts meta-data before an allocation to tell it how much has been allocated and etc.

That's correct for Linux/Unix. For Windows (which this seems to be about), as a blanket statement, this is wrong (although the part about metadata is correct).


DLLs do not necessarily have their own heap, but they are allowed to create one, and some indeed do. This creates a "private" heap which reserves a contiguous area of address space (explained in the Remarks section here) with a handle different from the one returned by GetProcessHeap.


One notable example is Microsoft's CRT (and hence any library that dynamically links against it), which will lead to problems if you mix statically linked executables with dynamically linked DLLs, as outlined for example here. The bottom line is that the linker is smart enough not to let you mix the two CRTs in the executable, but it is possible (and in principle legitimate, and undetectable by the linker) to have some other DLL linked to a different CRT than the main program, and boom.


Other DLLs may create private heaps because on Windows XP / Server 2003 and earlier, the default heap is the "classic" one, not the "low fragmentation" version (starting with Windows Vista, the system uses the low fragmentation heap as needed). For some libraries, it may make sense to request the low fragmentation heap explicitly.

SQLite is one example of a library that calls HeapCreate.


In addition to that, every compiler (including different versions of the same compiler that use a different ABI, such as GCC/MinGW 4.7 → 4.8) adds some metadata to allocations. When you write something like new or delete in your program, this usually maps to malloc and free, plus calling constructors and destructors (the C++ standard does not say that, so this is in principle not correct, and you could also overload the operators, but it is correct in practice).

malloc and free, on their part, do some voodoo (usually something like add 4 bytes for the allocation length, and then round up to 8-byte alignment, which is the "real" address they'll return and accept), and call HeapAlloc and HeapFree. Whatever they do is not specified and may vary among compilers.
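To illustrate that kind of bookkeeping, here is a toy size-prefixed allocator on top of std::malloc. The header layout (one max_align_t-sized slot in front of each block) is an assumption of the sketch, not what any particular CRT does. The point is that toy_free must undo exactly what toy_alloc did; handing such a pointer to plain free(), or to another CRT's free(), passes an address that allocator never returned.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Header in front of every block; sized to keep the user pointer aligned.
constexpr std::size_t header_size = alignof(std::max_align_t);
static_assert(header_size >= sizeof(std::size_t), "header must hold the size");

// Allocate n bytes plus a header, store n in the header (the "metadata"),
// and return the address just past it.
void* toy_alloc(std::size_t n)
{
    auto* raw = static_cast<unsigned char*>(std::malloc(header_size + n));
    if (!raw)
        return nullptr;
    std::memcpy(raw, &n, sizeof n);
    return raw + header_size;
}

// Read back the size stored in front of a block returned by toy_alloc.
std::size_t toy_size(const void* p)
{
    std::size_t n;
    std::memcpy(&n, static_cast<const unsigned char*>(p) - header_size, sizeof n);
    return n;
}

// Undo toy_alloc: step back over the header before calling free().
void toy_free(void* p)
{
    if (p)
        std::free(static_cast<unsigned char*>(p) - header_size);
}
```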


So, in short, you can't expect that deleting something will work reliably (it might work, accidentally!) if the whole combination of compiler plus module is not 100% identical with respect to how the object was allocated.




So if I create a structure like so:

struct Blob
{
    unsigned char* pData; // std::vector< unsigned char > Data
    size_t Length;
};

(Would the vector deallocate itself from the right heap?)

The vector will, depending on the situation, call the standard allocator's deallocate function or operator delete, which may in fact do nothing (it might put the memory block on a free list) or which may eventually (possibly via a round trip through free()) call HeapFree. It will not magically figure out the correct heap if there are several.


By the way, I'm not quite sure what the intent is with that raw pointer, the length, and the vector. In any case, the vector would need to have a "life" in some other place, or the raw pointer could not possibly be valid (and if you do something that causes the vector to resize, it won't stay valid either). There's probably an easier and safer way of doing the same thing, such as using blob = std::vector<unsigned char>; since vector already knows its length (size()), and data() always gives you the matching pointer.
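For reference, a minimal version of that suggestion. The checksum function is a hypothetical consumer that takes the classic (pointer, length) pair, which data() and size() provide directly. Note that this only settles ownership; a vector created in one module and destroyed in another that uses a different CRT has the same heap problem as before.

```cpp
#include <cstddef>
#include <vector>

// One owning type instead of pointer + length + vector: the vector owns the
// bytes, knows its size, and frees them through the allocator that made them.
using blob = std::vector<unsigned char>;

// Hypothetical consumer taking the classic (pointer, length) pair.
std::size_t checksum(const unsigned char* data, std::size_t length)
{
    std::size_t sum = 0;
    for (std::size_t i = 0; i < length; ++i)
        sum += data[i];
    return sum;
}
```

Because push_back may reallocate, call data() again after modifying the vector instead of caching the pointer.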

#5194284 using technology as magic

Posted by samoth on 23 November 2014 - 10:21 AM

That idea with draining life is not too different from what already exists, e.g., in Ryzom, and if you leave out the fact that Ryzom was a total commercial failure (several times), it worked quite well. It was even harsher in Ryzom.


Spells (or any action, for that matter) need to be balanced with a "counterweight". That's usually mana for magic and stamina for melee. However, unless you use spells that are way below your level, you do not have enough counterweights in mana alone to cast a spell (and you cannot possibly cast the highest spells even at maximum level). You have the choice to put in "time" as counterweight, or "hp".


"Time" means your spells take a lot longer, and during cast time you will be hit automatically by all but the lowest creatures and your spell fails. In other words, you're dead. So "hp" is the only working solution. Which means, of course, you can fire the biggest badass spells, but you are also much easier to kill.

It's certainly a risk tradeoff, but it isn't necessarily creating frustration, it may very well add to challenge (and very strongly encourages team playing).

#5193416 is there a better way top refer to assets in a game?

Posted by samoth on 18 November 2014 - 06:22 AM

I strongly disagree with anyone recommending integer values or enumerations. They're ugly as hell and they can seriously damage your "flow." What happens if I want my artist and designers to constantly iterate their work? They'll have to get knee deep in my source code just to add a few lines in strange places.

That's why you shouldn't have them in the source at all in my opinion. Neither strings nor integers/enums.


Artist edits the "resource definition file" or whatever you call it, preferably with a special editor for easier workflow, but in the simplest case that can happen in a text editor, writing out XML or JSON or any other format, even a custom one if you want.


Artist refers to "kaboom.wav" as "explosion_sound" when referencing it from within "grenade". The toolchain packs the whole stuff together into a binary file. That file can contain the strings and you look up assets by strings (but this requires using the equivalent of a map structure at runtime), or the build system will translate "explosion_sound" to, say, 51 and "grenade" to, say, 213. If the artist edits the file, the numbers may change, but that doesn't matter since only the build system has to worry that the mapping is consistent (that is, if asset #213 references #51 and due to a change #51 becomes #63, then #213 references #63). The application only uses what the datafile provides; it need not care about consistency.
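The "build system translates names to numbers" step might look roughly like this sketch. asset_def, packed_asset and pack are hypothetical names; an asset's ID is simply its position in the output, which is exactly why IDs can change between builds without the game caring.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// What the artist edits (names only), e.g. parsed from XML/JSON.
struct asset_def
{
    std::string name;               // e.g. "grenade"
    std::string file;               // e.g. "grenade.mesh"
    std::vector<std::string> refs;  // e.g. {"explosion_sound"}
};

// What the game loads: names replaced by numeric IDs.
struct packed_asset
{
    std::string file;
    std::vector<std::uint32_t> refs;
};

// Hypothetical build-tool step: assign IDs and rewrite references to IDs.
// Errors (duplicates, dangling references) surface at build time, not in the game.
std::vector<packed_asset> pack(const std::vector<asset_def>& defs)
{
    std::map<std::string, std::uint32_t> id_of;
    for (std::size_t i = 0; i < defs.size(); ++i)
        if (!id_of.emplace(defs[i].name, static_cast<std::uint32_t>(i)).second)
            throw std::runtime_error("duplicate asset name: " + defs[i].name);

    std::vector<packed_asset> out;
    for (const auto& def : defs)
    {
        packed_asset p{def.file, {}};
        for (const auto& ref : def.refs)
        {
            auto it = id_of.find(ref);
            if (it == id_of.end())
                throw std::runtime_error("unknown asset: " + ref);
            p.refs.push_back(it->second);
        }
        out.push_back(std::move(p));
    }
    return out;
}
```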


While it is true that the overhead of hashing a string or even looking up a string in a map is negligible compared to disk I/O, it is also true that this overhead is completely unnecessary. Hashes or IDs can be calculated at compile time if you insist on having the names hardcoded (but I recommend against that unless you really only have 5 assets), and are otherwise calculated by the build tool.
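If the names do end up hardcoded, a constexpr string hash moves the work to compile time. The sketch uses 32-bit FNV-1a, which is one common choice rather than anything the post prescribes; with any 32-bit hash, a build tool would still want to check the full asset list for collisions.

```cpp
#include <cstdint>

// 32-bit FNV-1a (offset basis 2166136261, prime 16777619), written as a
// single-expression constexpr function so it is evaluated at compile time
// when given a string literal. Unsigned arithmetic wraps modulo 2^32.
constexpr std::uint32_t fnv1a(const char* s, std::uint32_t h = 2166136261u)
{
    return *s == '\0' ? h : fnv1a(s + 1, (h ^ static_cast<unsigned char>(*s)) * 16777619u);
}

// A compile-time constant, usable e.g. as a switch label.
constexpr std::uint32_t explosion_sound_id = fnv1a("explosion_sound");

// Known FNV-1a test vector, checked by the compiler.
static_assert(fnv1a("a") == 0xe40c292cu, "FNV-1a test vector");
```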


Most of us are not on systems any more where encoding "filename.mus" in the source code causes too much data in the executable.

But it's not really about the size of that string (nor the overhead).


Artists do not want to, and should not tamper with source files. And you do not want to, nor should you need to recompile the whole program only because the artist decided to add another sound or another sprite. Making the application run is your responsibility -- keep it there. Putting the "art stuff" together is the artist's responsibility -- keep it there, as well. Don't mix the two, and don't mess with something that's not your responsibility. Changing one component should not require rebuilding the other, nor should it possibly make it fail. Saying that hardcoding assets and having artists edit source files is a guarantee for failure would probably be going too far, but you get me. It's something that can break, and things that can break will eventually break.




if i'm drawing 16,000 non-instanced meshes, i don't want to be looking up the array index for the mesh filename of each one

Good grief, who is modelling all these? Surely you mean 160 -- not 16,000?

#5192291 Resource management

Posted by samoth on 11 November 2014 - 12:38 PM

It allows you only to rehydrate a shared_ptr, which means you're taking an owning interest in the resource, if only temporarily. There's a risk, then, of mistakenly forgetting to release that shared_ptr, causing the resource to leak. [...] The main expense of using shared_ptr isn't the size of the control block, or even the fact that it's twice as large as a raw pointer -- it's that every time you create or destroy a shared_ptr to the same resource you have to jump through hoops to be thread-safe while you change the use_count and potentially destroy the object.

Maybe I don't understand your issue correctly, but isn't that exactly what one would want?


The asset/resource cache/loader/manager/whatever will almost certainly run asynchronously. Which means unless you can register a (temporary) ownership, the cache will pull the chair you're sitting on from beneath your butt, figuratively. You're in the middle of uploading a model's vertices when the "manager" frees the memory to make room for something else it wants to load. Bang, you're dead.


Yes, you can do clever stuff like the resource IDs in that bitsquid article, and they are easier, faster, blah blah... but they are also not threadsafe at all. Making that beast threadsafe (lockfree, of course) is a nightmare. Or, you would have to design it so the cache is only allowed to delete objects at end-of-frame or such... not exactly pretty. Or, you would have to do without threads.


Holding shared_ptrs and handing out weak_ptrs avoids all problems one could think of. Yes, incrementing the refcount on the shared_ptr is not a free operation. It's an atomic increment, which is like 6-7 cycles instead of 2. But in the light of a single context switch or even a disk access, that's really no biggie. You don't have ten million assets loaded simultaneously, you maybe have a thousand or so (most likely less). So that's a few thousand atomic increments.


Yes, ownership is somewhat "blurred" because the moment you lock() the weak_ptr you obtain a (temporary) ownership, provided that the object is still in the cache. But hey, that is just what you want; it is what you need. And it doesn't impair the cache's ability to do its job. The cache/manager/whatever is still in complete control of what's being kept and what's ditched. It will hold a single shared_ptr to each object that it wishes to keep cached, and ditch that one once an object is to be destroyed. And destroyed it will be, but at the right time, when nobody is reading from it -- not randomly somewhere mid-frame. You could call it a kind of "deferred" delete.


The nice thing is, you have the guarantee that it will work, there is no way it could possibly fail (shared_ptr makes sure of that!), and it's something like a dozen lines of code. Short of an infinite loop (and then you have a different problem!), there's no way you could leak either, since the shared_ptr you get from locking the weak_ptr has automatic storage duration.
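A minimal sketch of that arrangement, with model as a placeholder resource and resource_cache as a hypothetical class (the string "loading" is a stand-in for real I/O). shared_ptr's reference counting makes the deferred delete itself thread-safe, but the map inside this cache is not; an asynchronous cache would still need a mutex (or similar) around entries_.

```cpp
#include <map>
#include <memory>
#include <string>

struct model { std::string vertices; };  // placeholder resource

// The cache is the only long-term owner (one shared_ptr per entry) and hands
// out weak_ptrs. Users lock() for the duration of their work, which defers
// destruction past an eviction until the last user lets go.
class resource_cache
{
public:
    std::weak_ptr<model> get(const std::string& name)
    {
        auto& slot = entries_[name];
        if (!slot)
            slot = std::make_shared<model>(model{"vertices of " + name});  // stand-in for loading
        return slot;
    }

    // The cache decides what to ditch; it only drops its own reference.
    void evict(const std::string& name) { entries_.erase(name); }

private:
    std::map<std::string, std::shared_ptr<model>> entries_;
};
```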