Juliean

Member Since 29 Jun 2010
Offline Last Active Yesterday, 07:59 PM

#5301211 Finding that balance between optimization and legibility.

Posted by Juliean on 18 July 2016 - 08:30 AM

2) Eliminate all branches (even if it means more calculations)

 

Except when the calculations actually outweigh the cost of a mispredicted branch... right? I'm not sure on the details, but shouldn't the misprediction cost be something like 15-20 cycles on modern desktop CPUs? So if you can skip calculations that take significantly longer than that, a branch is the better choice.

 

Also, on desktop, branches that are easy to predict have very little cost. Take something that checks for a memory allocation error and terminates the program when it fires: the branch will almost always be false, so the only cost should be the branch instruction itself. That's different on consoles without branch prediction (I don't think the current generation added it, did they?), but I haven't programmed on consoles myself so far, so I can't say much about it.




#5300597 Overall Strategy For Move-Semantics? [C++11]

Posted by Juliean on 13 July 2016 - 02:25 PM

Hello,

 

so, something I've been wondering for a while: how do you generally account for move semantics/rvalue references overall? What I mean is, in order to take advantage of move semantics, I've generally been doing the following until now:

 

1) Give all classes move-ctors, where possible and advantageous. Since I'm making heavy use of STL wrappers, most classes already have functioning default move-ctors, with the exceptions being classes making use of std::unique_ptr and the like.

 

2) When I know that a function will take an object that is expensive to copy, but I know that it's only going to be a temporary in all foreseeable use cases (i.e. a local function that is only called for one-time object initialization/serialization), I declare the parameter as &&, like this:

class MyClass
{
    using MyVector = std::vector<ComplexClass>;
    void Function(MyVector&& vData)
    {
        m_vData = std::move(vData);
    }

private:

    MyVector m_vData;
};

However, things get a bit more complicated when I cannot foresee how a variable is going to be used, or when I know I will call the function both with temporaries and with fixed data members that cannot be moved anyway.

 

So to all you fellow C++11 users, how do you take care of this? I basically see 4 options:

 

1) Do not account for it at all. Just use

void Function(const MyVector& vData)
{
    m_vData = vData;
}

like you would've done without C++11 and move semantics, and accept the additional copies, memory allocations and deletions. It doesn't matter in most cases (so we're basically in premature-optimization land), and/or maybe the compiler can figure this out by itself (which I doubt in any case where the compiler won't inline the function, but I'm by far no expert).

 

2) Write both versions, for move and no-move operations:

void Function(const MyVector& vData)
{
    m_vData = vData;
}

void Function(MyVector&& vData)
{
    m_vData = std::move(vData);
}

Obviously this takes care of both use cases, but it requires additional work for every function that benefits from move semantics, and produces code duplication for non-trivial functions. Also, it gets really messy once there are multiple parameters that could have move semantics.

 

3) Write only the move-semantic version

void Function(MyVector&& vData)
{
    m_vData = std::move(vData);
}

and when calling the function with a non-temporary, explicitly create a temporary:

const MyVector vDataFromSomewhere;
object.Function(MyVector(vDataFromSomewhere));

While this is just as efficient in this case as before (the temporary I create will be moved in), it requires additional typing for every non-temporary I pass in (so I now have to specify, via either a temporary ctor or std::move, how I want to pass in every single parameter, ugh).

 

4) Now the next option was pretty much what sparked the question. I found out that I could do this:

void Function(MyVector vData)
{
    m_vData = std::move(vData);
}

MyVector vTemporaryData;
object.Function(std::move(vTemporaryData));

This moves vTemporaryData into vData, and then vData into m_vData. So for non-temporaries I don't have to do any additional typing, and it should be equally efficient. For temporaries it should also work just like with MyVector&&, though in both cases there is an additional move-ctor call that would otherwise be avoided (though I imagine the compiler might be able to optimize this out, and besides, a move-ctor call is really nothing compared to copying a vector of 1000 elements).

 

________________________________________________________________________

 

So I know this is not the most important problem in the world and I probably shouldn't worry about it, but it's just something I found interesting and wanted some opinions/real-world stories on. Do you account for move semantics in your code (if you generally use C++11, of course)? If so, do you use any of the four options I presented, or do you decide on a per-case basis like I used to do before (or maybe there is something completely different that I didn't see, like option 4 for the longest time)?

 

I find option 4 the best in this regard, though it feels weird and unusual to suddenly be passing all the expensive objects by value instead of by reference... though I guess the same applied before I found out that I could safely return things like MyVector from functions thanks to RVO/move semantics.




#5300582 How to automate texture deleting?

Posted by Juliean on 13 July 2016 - 12:36 PM

By the way, I have the 4th edition of the C++ Programming Language by the Bjarne guy, which covers c++11, and I haven't even started it. Should I read it or should I search for something that covers C++14. What do you think?

 

C++11 is fine. Before starting on C++14 you need C++11 anyway, because 14 is for the most part a small addition to 11, and 11 has far more ground-breaking and important changes than 14. So learn C++11, see how you can apply it to your everyday work, and then figure out what C++14 adds on top of that.




#5300511 How to automate texture deleting?

Posted by Juliean on 13 July 2016 - 06:44 AM

Second question: I have 10-20 textures, for every enemy, and i'm wondering, is it better to load all the textures in one place/function ( loadMedia() for example ), or load every texture in its own enemy class, depending on the enemy?

 

Definitely load them in one place. Enemies should only reference a texture, and neither own nor load it themselves. A texture should just be an attribute, not functionality, of an enemy class, so it could look something like this:

class Enemy
{
public:

    SDL_Texture* pTexture; // I'm not familiar with SDL, so this is just pseudo-code
};

where you set pTexture when creating an enemy. There are lots of things that you can do to improve upon this design, but it should be a good starting ground.




#5300449 Getting address of an item in a std::vector

Posted by Juliean on 12 July 2016 - 04:28 PM

... it will cause a heap corruption. Which I am guessing is caused by 'data' being in the wrong place, created by the class member 'data_set()'.

 

Are you using a C++11-conformant compiler, in the case of MSVC at least version 2015 or above? If not, then your problem most likely has to do with your "asset" class not having an explicit copy constructor.

 

What happens is this: when you push a second asset to the vector, it has to allocate a new memory buffer and copy the assets over from the old one. Since you don't have a copy constructor, it performs a shallow copy, essentially just setting the memblock pointer to the value from the old asset instance. Then the old instance is destroyed, calling "delete memblock" in the destructor (note: if(p) delete p is not required; delete on nullptr is a valid operation). Now the memory memblock points to is already deleted, but there is a second instance referencing the same memory, namely the copy! And once that is destroyed (i.e. when you add another asset, or when the vectorAsset variable goes out of scope), its destructor will try to delete the now-invalid memory region from before, via the memblock member.

 

The reason I mentioned C++11 is that with it there is a move-ctor, which can also be implicitly generated (in Visual Studio only from 2015 onwards), and the problem would not happen with that. So if you are already using such a setup, your issue lies elsewhere. Either way, this is something you should address, because it is almost certain to create problems at some point, e.g. if you forget to take a reference to an asset and take a value copy by accident. You should do one of the following:

 

1) Create a custom copy constructor that makes a deep copy of the memory block owned by the asset.

2) Use a wrapper like std::vector<char> or std::string, which will automatically take care of that without you having to write a custom copy-ctor.

3) Disallow copying of the asset altogether (by making the copy-ctor private or deleting it via the delete modifier), though this requires C++11 and a move-ctor instead.

 

Option 3 would be preferable, though if you require assets to be copyable at some point, you will have to use either 1 or 2 anyway. Having a move-ctor in C++11 really is a bonus either way, because otherwise adding any single asset can have a huge overhead if it requires copying all existing assets via deep copies.




#5300276 Vec4D: SSE-ASM, SSE-INTRINSICS, NORMAL

Posted by Juliean on 11 July 2016 - 06:33 PM

In what way? Obviously there'd be the extra instruction to move the value out of the register, but what other performance do you lose? It probably will create a dependency, but OTOH if you used the original value there'd be a dependency anyway. Maybe I'm missing something?

 

You will trigger a very, very costly load-hit-store, which is especially expensive on consoles but also far from trivial on desktop CPUs. The same happens for conversions from int to float. To put it shortly, in case of a TL;DR of the article: you mess up your CPU's pipelining by doing so, which is way worse than an additional move instruction.




#5300270 Spatial Partitioning in a Hyper Light Drifter Kind of Game (Question)

Posted by Juliean on 11 July 2016 - 06:03 PM

Yup, size is the important factor here. Let's say you wanted to make exactly Hyper Light Drifter. I'll just go all out and say:

 

A simple grid will be enough. Given the number of interactive NPCs per screen, a simple O(n^2) "check every NPC against every other NPC" approach, e.g. for collision detection, will be entirely fast enough for this game (from what I've seen there is a max of ~10 moving NPCs active at once, mostly enemies, and until you get to multiple hundreds or maybe thousands of items, n^2 algorithms will generally be fast enough).

 

As long as you don't foresee having something like 10000 NPCs that all need full movement with collision detection at the same time, using something like a quadtree will not gain you much - yes, you will increase performance, but by a factor that doesn't matter. You have 16.66 ms per frame that you can fill. Why would you not choose the faster algorithm anyway? Well, because of the added complexity, which takes development time that could be spent on more important tasks.

 

(Thinking about it, for a small number of "entities" the simple each-vs-every style algorithms might actually even be faster than having to update and query a complicated data structure like a quadtree, due to cache locality.)

 

I'd also like to know how should I store the map. For instance, if there is a zone that is 500x500 tiles, should I separate the map in different smaller "zones"? Would that increase performance?

 

Well, this won't magically increase performance by itself; it depends on what you do with those zones. For example, when rendering a uniform-size tilemap, you can easily render any size of tilemap by looking only at as many tiles as the screen can fit, like in this pseudo-code:

const Vector2 startTile = cameraPos / TILE_SIZE;
const Vector2 numTilesToLookAt = screenSize / TILE_SIZE;
const Vector2 endTile = startTile + numTilesToLookAt;

for(int i = startTile.x; i < endTile.x; i++)
{
    for(int j = startTile.y; j < endTile.y; j++)
    {
         vTilemap[i][j];
    }
}

This runs equally fast for a 100*100 tilemap as for a 100000*10000000 one, so you gain nothing by dividing the tilemap into zones. In fact, you might make matters worse, as you now have to handle drawing from multiple zones instead of just accessing the one zone vector.

I would probably only do this separation if you have to - say, if your tilemap is so huge, without transitions, that you cannot possibly fit it into memory at once, or if it is huge and you have to append/remove huge blocks at runtime. So not the common case, as far as I can see.




#5299917 Finding that balance between optimization and legibility.

Posted by Juliean on 09 July 2016 - 05:25 PM

So regardless of how you calculated what your cache hit rate is supposed to be, did you actually measure it?

 

Second, your function has a whole bunch of if-conditions, some of which are easy to predict (idx == -1), and others that aren't.

for( i = 0; i < 8; i++ )
    if( rads == rads_cache[i] )
        break;

if( i == 8 )
{
    i = idx;
    rads_cache[i] = rads;
    cosf_cache[i] = cosf( rads );
    sinf_cache[i] = sinf( rads );
    idx = ( idx + 1 ) % 8;
}

While you need to measure this, at least "if( rads == rads_cache[i] )" is going to be very hard if not impossible to predict (the loop might get unrolled, and if not, it is easier to predict, but there is still no guarantee; the same goes for the i == 8 check). So if your use case really is cos/sin being called exactly 3 times, then there has got to be a cleaner solution. If it's really just random calls which sometimes share the same value, having an unpredictable branch will cost you more than calling sinf/cosf multiple times. As always, make sure to actually profile, but in general the days when caching singular trigonometric functions was worth it are over (especially at the cost of a branch). If you can cache a whole set of calculations/trig functions, sure, but if it's just 3x sinf+cosf vs. sinf+cosf+branch, I'd generally take the former (and don't forget to actually measure which is faster, let alone whether this even makes a measurable difference).




#5298500 [MSVC] Why does SDL initialize member variables?

Posted by Juliean on 29 June 2016 - 03:33 AM

I think you're missing the point that the "S" in "SDL" stands for "Security".

 

It's nothing to do with assisting debugging, it's nothing to do with spotting non-security issues in your code.

 

So in the case of a pointer it's either initialized or it's not, and if it's not initialized then it's going to be pointing at some memory address that's effectively random.  Hence security: a 3rd party could use your uninitialized pointer to gain access to some other arbitrary data in your process address space.

 

Truth be told, I really didn't get that "security" means this kind of security, lol. I just thought security meant making the program less prone to crashing, but seeing how it also incorporates the /GS switch (for protecting against buffer overruns), it makes perfect sense.

 

So it seems that is my answer right there. In that case you will obviously want to have SDL turned on all the time, in all builds, because there is no point in protecting just your development/debug build. So I think I'll just turn it off altogether; as of now I don't require the additional security (I dare you to hack my offline games :D), and I certainly don't want the overhead of additional checks for things that I could easily have found using something like cppcheck.




#5298383 [MSVC] Why does SDL initialize member variables?

Posted by Juliean on 28 June 2016 - 07:54 AM

Well, if you have such an error then it's safer to have it at nullptr than some random value. The random value might be one you are allowed to access; nullptr reliably crashes.

 

True, for the case where the pointer should never be null anyway it's safer, but in the case where nullptr is a legitimate value, like in my example, it can lead to bugs that otherwise would not occur. Especially if it's off for shipping builds, which I always assumed is normal with SDL.

 

This seems vague -- does it check for uninitialized members, or zero them, or both? If it generates a breakpoint when a debugger is attached, but also zero-fills if no debugger is attached, then it's arguably helpful -- helps find the bug and mitigates its harm in the wild.

 

Well, further on they say that it zero-fills the values before running the constructor, so it will probably just fill the entire memory range of the class with 0s. Thus it doesn't produce a breakpoint in the debugger, but really just masks away all your uninitialized pointers, integers, etc., as if you had set them to 0 yourself.

 

But yes, generally it's better for your compiler/runtime to initialize members to something like 0xcdcdcdcd by default, as that's likely to cause your code to crash, which lets you find the bug.

 

Well, that would actually be useful, as it would show the crash instead of just doing some random stuff with the uninitialized pointer. I'm pretty sure there is a compiler switch or something that does that, so the zeroing out mystifies me even more.




#5298308 Is using one the switch statement better then using multiple if statements?

Posted by Juliean on 27 June 2016 - 05:36 PM

First, you mislead the readers on what switch is. Since switch is named so improperly, switch is not a switch unless there is a break keyword in every sub-block of a condition.
What switch does is evaluate all conditions in their order and execute their subblocks if they are true. Logically, switch is equal to this

{// the escape block

 if (A){}

if(B){}

if (C){}

{} // the default:

}

 

What? That's not even close to true. A switch without break will jump to the matching label, and then execute all labels after it.

 

So this switch

switch(c)
{
case 'a':
	A();
case 'b':
	B();
case 'c':
	C();
default:
	D();
}

is not even close to equal to what you posted; if you really want code that is executionally equivalent, then this is it:

if(c == 'a')
{
	A();
	B();
	C();
	D();
}
else if(c =='b')
{
	B();
	C();
	D();
}
else if(c == 'c')
{
	C();
	D();
}
else
{
	D();
}

But really, that is quite pointless; I think we can all agree that when talking about switch, we are talking about switch with break. Missing breaks are, AFAIK, mostly bugs, or a shorthand for reducing duplicate code when blocks are shared:

switch(direction)
{
case UP:
case DOWN:
     // y-movement
     break;
case LEFT:
case RIGHT:
     // x-movement
     break;
}

@Servant:

 

If you are using switch() statements for micro-optimizations, there are other tricks to be aware of also; putting your more-frequently used branches closer to the top of the switch() supposedly helps, though I've never tried it.

 

 

Does it? I've heard that this can help for regular if/else chains (on said "dumb" compilers), but for switch all I've heard is that you should put the labels in order, like always going from 0 at the top to 10 at the bottom, ideally without holes (so 0..10 or 10..0 is optimal, "10, 5, 9, 1" not so much). The reason I've read is that this makes it much easier for the compiler to calculate where to jump to, though I believe any compiler that can compile an if/else chain to the same assembly as a switch should be able to figure out the ordering on its own.




#5298087 Is using one the switch statement better then using multiple if statements?

Posted by Juliean on 26 June 2016 - 03:16 AM

 

Better how? Switch statements can be way faster than if/else chains, specifically if there are many items to check.
 
The compiler is free to optimize the switch statement as it pleases.


The compiler is free to optimize almost anything as it pleases. There is absolutely nothing whatsoever that specially allows a switch statement to be better optimized than if-statements.

Older compilers _wouldn't_ optimize the if-statements as well, but that was a technology limitation. The switch statement is easier to recognize as what it is, while slightly more clever analysis is required to determine that a series of if-statements can be similarly optimized.

Modern compilers are not so stupid, however. Example: https://godbolt.org/g/i5NBKH Note that foo() and bar() are both compiled into jump tables despite one using a switch statement and the other an if-statement chain.

 

 

That's neat, I didn't know that; yet as has been said, MSVC isn't that clever (that's where my assumption originally came from).

 

There appears to be one case where a switch is faster/produces less assembly even on clang, though: if you can tell the compiler that the default case will never be hit (by replacing return 0 with __builtin_assume(0)). This gets rid of four lines of assembly in your example. Doing the same with the if/else chain doesn't work, so I guess if you are working on a real tight loop that switches on a value within a known range (like an enum class), then this can still be a point for switch (unless there is a way to get clang to emit the same assembly for the if/else chain).




#5298047 Is using one the switch statement better then using multiple if statements?

Posted by Juliean on 25 June 2016 - 05:59 PM

Better how? Switch statements can be way faster than if/else chains, specifically if there are many items to check.

 

The compiler is free to optimize the switch statement as it pleases. For a small number of items, the compiler might just produce an if/else chain in the background. However, if you have, say, 20-30 items, then the switch can e.g. be converted into a table lookup, which is way cheaper than having to perform up to 30 if-checks.

 

A reason for using multiple if/elses is mostly that you cannot use a switch, e.g. if you are comparing strings or other non-literal types.

 

But keep in mind that unless you are calling that block of code many, many times per frame, this won't matter at all. At this level, always prefer code clarity over performance. So if you are asking whether switch statements are better in that regard: it largely depends. For many items, switch can easily be more readable, and safer (since you cannot accidentally check for the same item multiple times).

However, for just one or two items, if/else tends to be more readable, if only for being shorter. Consider this:

if(x == 2)
    foo();
else
    bar();

// switch is moar awesome!

switch(x)
{
case 2:
    foo();
    break;
default:
    bar();
    break;
}

So to sum up, both have their uses. Worry about efficiency only if you know you are working on performance-critical code and/or have done proper benchmarking. Otherwise, choose whichever is more appropriate to the situation in terms of readability and clarity - just don't go around throwing switch statements at every single if statement, and you should be fine :)




#5296677 Batching draws - keeping buffers up to date

Posted by Juliean on 15 June 2016 - 10:12 AM

The problem is that even if only 10% of your data actually changes, it's not guaranteed that this data sits in the same area of memory. You could have 100k sprites of which 10k are dynamic, but if those 10k are scattered around the buffer, you actually need up to 10k bufferSubData calls to update it (which is pretty bad compared to one update that discards the entire last buffer).

 

But if you have vertex data that can be identified or marked as static, then you can put it into a separate buffer that isn't updated on a per-frame basis, only when needed. You then have a second buffer that is refilled each frame, and you draw from both of them.




#5296490 How to avoid Singletons/global variables

Posted by Juliean on 14 June 2016 - 09:30 AM

There's only one way I know of to not use a global variable, but it does not make sense to me

 

Even in your second example you use a global variable; you just moved it to the other class. A common way to avoid this is dependency injection: passing your texture manager into the texture class:

class Texture
{
public:
   Texture(string a_name, TextureManager& textures){
         m_name = a_name;
         m_ID = textures.GenTexture(a_name);
   }
};

Now you have TextureManager as a regular non-static class somewhere, and pass its instance in whenever needed.

 

Or better yet, remove the Texture<-TextureManager dependency completely, and have your texture just take an int, while the TextureManager gets a CreateTexture method:

class Texture
{
public:
   Texture(string a_name, int ID){
         m_name = a_name;
         m_ID = ID;
   }
};

//TextureManager.h
class TextureManager
{
 public:

    Texture& CreateTexture(string a_name)
    {
        // pseudo-code: return the existing texture if the name is already known
        if(m_textures.Contains(a_name))
        {
            return *m_textures[a_name];
        }

        int id = OPENGL_genTexture(a_name);

        auto texture = new Texture(a_name, id);
        m_textures.Add(a_name, texture);

        return *texture;
    }

private:

 //this stores the name of the texture and the texture itself
 Map<string, Texture*> m_textures;
};

Note that I've also changed the TextureManager to store a Texture pointer instead of just the id. I'm not sure whether your Texture class is actually meant to represent a handle? In that case this change doesn't make much sense, but on the other hand it would be very bad to store a string inside a handle, since that makes passing it around/storing it very costly. So either create one Texture object, store it in the map and return it on subsequent calls; or remove everything but the ID from the texture class, so you can use it like a primitive type, if you plan on using an approach like that.





