Jump to content

  • Log In with Google      Sign In   
  • Create Account

snowmanZOMG

Member Since 29 Aug 2010
Offline Last Active Today, 11:15 AM
-----

#5057628 Draw Call Sorting Using std::map

Posted by snowmanZOMG on 28 April 2013 - 08:44 PM

Sorting data is a huge way to get gains in performance.  This is due to a number of effects, the chief ones being caching and driver or state switching overhead.  When you sort data, you're more likely to hit the instruction cache since you're probably going to be doing similar operations for a large group of data at once.  You may also get data cache hit rate improvements as well.

 

With respect to graphics in particular, the GPU is very intolerant of state switching.  Its entire performance potential is contingent on not having special cases!  Whether this means branching or switching off entire buffers/textures, it doesn't really matter, they all have pretty significant costs on execution speed.  The reasons for this are due to GPU architectures which I'm not very well versed in, but if you have a basic understanding of CPU architectures and pipelines, and extrapolate that to hundreds of simple execution units, then you can get an idea of why the GPU is so intolerant of handling "uncommon cases" (branching, state switching, etc).

 

The article you refer to touches upon many different ideas used by people to improve the performance of game engines; more specifically, rendering speed.  Suppose you weren't sorting your draw calls and instead you just drew every item in your game individually and kicked off a draw call for each item.  Since your items may not be sorted, you may have to set up state just to draw the item.  When you finish with that, you end up going to the next item which could be totally different and requires a different set of state to be enabled.  This is very costly.

 

Remember, the GPU wants to be fed HUGE batches of data at once for it to munch on.  It's completely against what the GPU wants for you to process data and feed it in this manner.  When you sort the data/state, you can give it larger batches for it to draw.  Because you're switching state much less often and you're giving it larger amounts of data to process on each draw, it should lead to higher performance during rendering.

 

Additionally, it may be required for you to sort to even get a correct result.  Transparency requires you to draw back to front.  If you violate this, you get an image that does not look correct.  There are many ways people solve this problem, but they pretty much all revolve around doing some kind of sort to get the primitives fed to the GPU in the right order for alpha blending.

 

Now, to your main question:

 

Do people use std::map for sorting?  Almost never.  Not if you care about performance.  STL map is incredibly slow.  Its slowness can be attributed to the nature of the data structure: pointer based tree.  Almost certainly, the STL is allocating a new node on the heap for every single element.  You have no real guarantees on where that allocation is in the memory space.  This leads to poor cache behavior.  Additionally, each node contains a lot of extra linkage information just to make the tree work, so it's pretty bad in memory use as well.

 

If you're only interested the data and its sorted version, there is no reason for you to use the map.  Just throw it into an array and then sort the array.  This will be significantly faster.  Your example of 2000 items is just far too small to be able to see any difference given the resolution of your timer.

 

Here's an example piece of code I wrote to illustrate (requires C++11 support, I use g++ 4.8.0 to compile):

#include <vector>
#include <map>
#include <algorithm>
#include <chrono>
#include <random>
#include <utility>
#include <cstddef>
#include <cstdint>
#include <cstdio>

using namespace std;

class Clock
{
public:
    Clock()
        : m_constructTime(GetTime())
    {
    }

    static double GetTime()
    {
        TPSeconds now(HighResClock::now());

        return now.time_since_epoch().count();
    }

    double GetTimeSinceConstruction() const
    {
        return GetTime() - m_constructTime;
    }

private:
    typedef std::chrono::high_resolution_clock HighResClock;
    typedef std::chrono::duration<double, std::ratio<1>> Seconds;
    typedef std::chrono::time_point<HighResClock, Seconds> TPSeconds;

    double m_constructTime;
};

int main()
{
    const size_t N = 5000000;
    mt19937 rng(0);
    vector<pair<uint32_t, size_t>> randomData;
    randomData.reserve(N);

    // Generate random data.
    {
        Clock clock;

        for (size_t i = 0; i < N; ++i)
        {
            randomData.emplace_back(rng(), i);
        }

        printf("Random data generation took %f seconds!\n", clock.GetTimeSinceConstruction());
    }

    // Insert random data into a map.
    {
        multimap<uint32_t, size_t> data;
        Clock clock;

        for (auto iter = randomData.cbegin(); iter != randomData.cend(); ++iter)
        {
            data.insert(*iter);
        }

        printf("Map sort time: %f seconds.\n", clock.GetTimeSinceConstruction());

        double start = Clock::GetTime();
        int sum = 0;

        for (auto iter = data.cbegin(); iter != data.cend(); ++iter)
        {
            sum += iter->second;
        }

        double end = Clock::GetTime();
        printf("Map iteration time: %f seconds, sum %d.\n", end - start, sum);
    }

    // Sort our random data.
    {
        Clock clock;

        sort(randomData.begin(), randomData.end());
        printf("Vector sort time: %f seconds.\n", clock.GetTimeSinceConstruction());

        double start = Clock::GetTime();
        int sum = 0;

        for (auto iter = randomData.cbegin(); iter != randomData.cend(); ++iter)
        {
            sum += iter->second;
        }

        double end = Clock::GetTime();
        printf("Vector iteration time: %f seconds, sum %d.\n", end - start, sum);
    }

    return 0;
}

 

Compiled with:

 

g++ --std=c++11 sortspeed.cpp -O3 -o sortspeed

 

This code generates 5 million random pieces of data (just some unsigned ints) and then inserts them into a map and sorts an array version of the data.  Time is measured for each version and printed out.  The sum performed in the code is bogus.  It's just there to give the iteration something to look at and forces the compiler to actually generate code that touches each element in the containers.

 

On my Intel Q9550 @ 2.83 ghz, I get the following output:

Random data generation took 0.096909 seconds!
Map sort time: 5.418803 seconds.
Map iteration time: 0.550815 seconds, sum 1642668640.
Vector sort time: 0.563086 seconds.
Vector iteration time: 0.013663 seconds, sum 1642668640.

 

As you can see, the map is significantly slower to the vector, both in terms of sorting speed and iteration speed, by a factor of 10 or more.  You need to carefully determine your requirements for the algorithm you're trying to execute and use the correct data structures to maximize performance.  Although a toy piece of code, this illustrates that certain structures may provide what you need, but they aren't always the minimum requirement.  The map might be more suitable for a problem if it were necessary for you to insert/delete from the list many times.  In games, especially for rendering, insertion and deletion is often not required.  Many engines simply keep a memory block for queuing up draw requests and once the frame is finished, it resets a pointer back to the beginning of the list to effectively erase it in constant time.  Then, the engine just requeues up all elements that need to be drawn again for the next frame.  Because of the fact that the engine doesn't always know what needs to be drawn at any given time, it's often simpler and faster to just rebuild the list of drawables instead of trying to apply deltas to it (inserting/removing the elements that are viewable or not).

 

Edit:  I decided to run some perf tools on the code to provide a little more context.

 

Map only:

Random data generation took 0.097099 seconds!
Map sort time: 5.419488 seconds.
Map iteration time: 0.553151 seconds, sum 1642668640.

 Performance counter stats for './sortspeed':

       6846.409743 task-clock                #    1.000 CPUs utilized          
       263,232,170 cache-references          #   38.448 M/sec                   [33.36%]
        81,311,169 cache-misses              #   30.890 % of all cache refs     [33.40%]
    19,219,819,177 cycles                    #    2.807 GHz                     [33.38%]
     3,390,728,460 instructions              #    0.18  insns per cycle         [50.03%]
       762,888,168 branches                  #  111.429 M/sec                   [49.96%]
        80,913,764 branch-misses             #   10.61% of all branches         [49.97%]

       6.849046381 seconds time elapsed

Vector only:

Random data generation took 0.097330 seconds!
Vector sort time: 0.595129 seconds.
Vector iteration time: 0.013621 seconds, sum 1642668640.

 Performance counter stats for './sortspeed':

        716.372024 task-clock                #    0.997 CPUs utilized          
        22,809,109 cache-references          #   31.840 M/sec                   [33.01%]
           849,197 cache-misses              #    3.723 % of all cache refs     [33.01%]
     1,845,627,135 cycles                    #    2.576 GHz                     [33.88%]
     1,508,337,419 instructions              #    0.82  insns per cycle         [50.53%]
       352,990,250 branches                  #  492.747 M/sec                   [50.24%]
        48,613,614 branch-misses             #   13.77% of all branches         [50.26%]

       0.718218847 seconds time elapsed

Notice the cache miss rate on each: 30.89 % (map) vs 3.723 % (vector).

 

Edit 2: I decided to run my code with 2000 elements.

Random data generation took 0.000035 seconds!
Map sort time: 0.000456 seconds.
Map iteration time: 0.000042 seconds, sum 1999000.
Vector sort time: 0.000149 seconds.
Vector iteration time: 0.000004 seconds, sum 1999000.

Your timer does not have enough resolution to be able to detect the difference.  Again, 2000 items on a computer is very small.  You need to be thinking about hundreds of thousands or millions for it to really matter to computers (unless your algorithm is very slow...).




#5047891 How to update a lot of random tiles in a 2D array?

Posted by snowmanZOMG on 28 March 2013 - 11:55 PM

You haven't provided nearly enough detail about your problem to really get at the potential problems.  Based on what I've read you may have one or more of the following problems (among others which may not be listed here):

  1. Bad algorithm.
  2. Poor memory access patterns.
  3. Bad memory allocation and usage causing excess garbage collection pressure.

Updating only tiles within your "update radius" should be very fast unless your radius is so large, that it contains the entire map.  Even then, you'd have to have a colossal map to have it take enough time to notice a really long pause (~1 second or more).  How are you determining which tiles are in your "update radius"?

 

Maintaining homogeneous lists of each tile type could be advantageous, but you pay for it with memory.  Suppose you put each type of tile in its own list and each list element points directly to the actual tile in your grid.  This gives you the advantage of not having to search the entire grid for all elements of a certain type.  But you're paying for this advantage in speed (by not having to search) by spending memory to have all that information already computed.  I would say it's a worthy trade as long as your game requires you to perform many operations/queries on only subsets of your data, where your subsets typically match your tile types.  If you have operations that require iterating over the entire 2D grid anyways, then this may not be worth doing, since at each tile you visit, you could perform the operation right then and there.

 

Creating new tiles entirely for the update is probably a bad idea.  C# is garbage collected.  If you create tons of new tiles every frame and remove references to the old tiles, you could be creating enormous pressure on the garbage collector.  You should avoid orphaning data at all costs and just mutate existing memory if you want to avoid garbage collection hiccups.

 

Your description of "update" is woefully vague, but I suspect your update is performing something pathologically expensive or your overall algorithm for iteration is just far too excessive for what you want to accomplish.  Even for large 2D grids, you shouldn't have much of a problem updating the entire grid if you've carefully designed your algorithms and data structures.




#5042735 How to analyze run time complexity in code in a trivial way

Posted by snowmanZOMG on 13 March 2013 - 10:37 AM

For a very large number of algorithms you'll end up writing in every day work, it's pretty easy. They tend to be simply counting up the number of times a loop iterates and possibly accounting for nested loops.
 
O(n):

for (int i = 0; i < n; ++i)
{
    ...
}

 
O(n2):

for (int i = 0; i < n; ++i)
{
    for (int j = 0; j < n; ++j)
    {
        ...
    }
}

 
The same method can be analogized to include any function calls made within the loop bodies.  Say, for instance:

for (int i = 0; i < n; ++i)
{
    for (int j = 0; j < n; ++j)
    {
        f(j, n); // f() is O(n).
    }
}

The function call is basically another nested loop, hence O(n3).

But this kind of analysis really only works for very simple algorithms that do simple iteration. Once you get into recursion, you need to start doing some more math to be able to come to asymptotically tight analyses. For some illustrative examples, take a look at some of the material here: http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/

 

In particular:

 

http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/notes/99-recurrences.pdf

http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/notes/98-induction.pdf

 

http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/notes/00-intro.pdf

http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/notes/05-dynprog.pdf

http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/notes/06-sparsedynprog.pdf

http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/notes/14-amortize.pdf

http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/notes/16-unionfind.pdf

http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/notes/27-lowerbounds.pdf

 

The first two links are particularly important/relevant.  The rest sort of just show you the analyses in action.  I would highly recommend knowing how to solve for recurrences.  You cannot hope to analyze anything but the simplest algorithms without knowing how to solve for recurrences.




#5032718 Performance & Struct sizes

Posted by snowmanZOMG on 15 February 2013 - 10:57 AM

* Optimal struct size
On a 32bit CPU, are there advised sizes (32,64, 128, ...)? And, do these double on a 64bit CPU? When dealing with vectors and structs, the size often grows quickly.

 

Optimal struct size depends entirely on the nature of the memory access patterns and the entire memory system for the machine on which the algorithm is being executed.  Your question here seems to imply that if you choose a certain size for the struct, you'll get good performance (or some other nice optimization by some other metric), when no such thing is guaranteed.  What matters to the CPU is that you're always grabbing memory locations in an order that guarantees a high likelihood of touching something already in the cache.  A small struct size can make that easier to achieve, but a small struct size can be worthless if you're grabbing array[0] then you grab array[1024].

 

 

* Filling records
Lets say an optimal size would be 64 bytes, but I only have 60. Is it wise to add some dummy bytes to fill (not concerned about memory usage)?

I guess this is what Delphi does by default btw, unless you declare a "packed record".

 

Let your compiler handle this.  In fact, I would say don't worry about this at all unless you truly have some time critical piece of code.  The only places I've seen that consistently had gains from something like this were acceleration structures for graphics algorithms where you could literally be doing millions of queries.  I remember stressing over bits to try to put a Kd-tree node into a cache line for a particular system, and in the end, it only netted me 10%-15% performance improvement.  Don't get me wrong, that kind of performance improvement can be huge, but it's one of those last resort optimizations and all you're doing is trying to reduce the size of the constant in your asymptotic analysis.

 

 

* Splitting in multiple records
In case 64bytes would be an optimal size, but I need at least 100 bytes... Would it be smart to divide the data in 2 records, each 64 bytes each?

 

Again, this depends entirely on your memory access pattern and the nature of the machine you're running on.  The only correct answer here is: "Maybe".  If we were to assume that the algorithm you're executing touches a particular set of data more often than another, you may find gains by splitting your algorithm and data into one that relies on a hot data and cold data split.  You put things you commonly access into the hot data struct and things you access less frequently into the cold data.  The goal with this sort of strategy is to pack more of the hot data into cache lines so you can take advantage of the caching system.  The rationale for the performance gain hinges on the fact that you should be accessing hot data much more frequently so it is very costly if you cache miss on this memory access, however, if you access cold data, you're likely to cache miss, but since you don't access this data very often it shouldn't have a huge negative impact on performance.  It's using Amdahl's law to arrange memory.

 

 

* Dunno how caching works exactly, but I guess there rules apply especially for looping through arrays structs. But, does it also apply when passing just a single struct to a function?

 

The memory hierarchy for modern computer systems are both simple and very complicated.  They are simple in the high level concept, but very complicated in the nitty gritty details.  I would highly recommend learning up on computer architecture to get a better idea of how the hardware works in general.  You can find some great amount of detail of memory and caches in http://www.akkadia.org/drepper/cpumemory.pdf.  But, to answer your question, caching always applies.  Every single data access you do will be affected by the cache hierarchy on modern CPUs (since just about every single CPU now has a cache).  You have your instruction cache, your data cache, and nowadays, multiple levels of data cache (sometimes shared between cores, sometimes not).  The CPU works pretty much exclusively with registers and the cache, not with main memory.

 

Whether or not these things matter to you in your particular situation is another question.  I'm of the opinion that performance always matters, but you only have so much time in your life to implement your algorithms.  The real issue is which parts of your program have performance implications that actually matter to your end user.  Again, Amdahl's law.  Use it, use it everywhere.




#5016258 Relation between memory used and frames per second

Posted by snowmanZOMG on 31 December 2012 - 09:53 PM

You really should strive to have no memory leaks though.  Even though desktop platforms can be forgiving, it can be a huge nuisance to end users if you leak memory since it will cause the whole system to drag even though it's just your application that's leaking.  Mozilla has been fighting this exact fight for years now, and only recently have they made significant improvements to the way Firefox reclaims memory, specifically from addons, which are a prime source of memory leaks.

 

There are some good points made in this GDC 2008 presentation by Elan Ruskin about a bunch of things related to game development: http://www.valvesoftware.com/publications/2008/GDC2008_CrossPlatformDevelopment.pdf.  I would highly recommend taking a look at slide 25 and onwards for some treatment of memory in games.  A lot of those things are really only necessary for large studios or games that really push limits of systems, but there are a lot of useful tips in there, such as knowing where all the memory is going.




#5016052 Relation between memory used and frames per second

Posted by snowmanZOMG on 31 December 2012 - 07:45 AM

It depends on the platform.  On console platforms, it's bad.  You almost cannot leak any memory at all, because consoles don't have a sophisticated virtual memory system that modern desktop PCs do.  When you run out of memory on a console, you just crash and burn.  But it's unlikely you're working on a console.

 

Desktop PCs, because of their fancy virtual memory systems, memory leaks are still bad but they're not nearly as catastrophic as on consoles.  The amount of "fast" working memory for the operating system to hand out to running processes will slowly decrease and it will start to page out processes onto the disk, which is extremely slow.  I like to think of memory use on desktop PC platforms as mostly about being what I like to call "A good software citizen".  Use your share of memory; don't be greedy and send back what you don't need.  If you do end up using too much, it's usually not the end of the world, but everyone is going to hate you, especially the user of the computer.




#5013657 Relation between memory used and frames per second

Posted by snowmanZOMG on 23 December 2012 - 07:40 AM

You should definitely profile.  Optimizing without a profiler is just hopeless and foolish for all but the simplest of programs.  What profiler to use depends on your system.  I haven't used Visual Studio in quite some time, but if you're using that then you may be able to use the Performance Analyzer.

 

Another thing you could do is to add timers to your code; enclose pieces of code you want to time so you can figure out how long that portion took.  Be sure to also take note of the total frame time as well so you can see how much time that portion takes up in relation to the entire frame.

 

If you want a poor man's sampling profiler:

 

http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux/378024#378024

 

The profiler will tell you what portion of the code to focus your attention on, but it doesn't really tell you anything about why something is slow.  You need to at least have an understanding of algorithms and computer architecture (probably also a little dash of operating systems) to be able to know what that "why" is.  Typically, people inspect the algorithm first, since that usually yields the largest gains for relatively minimal effort.  A terrible algorithm replaced with a good one can yield huge gains, especially as problem sizes increase.  But once you're at a fast algorithm, you may be stuck at a wall that's limited by your particular implementation.  This is where computer architecture knowledge often becomes useful.  People then proceed to optimize out slow instruction sequences with fast ones and also rearrange data to allow for faster access.  Sometimes, people flat out "cheat" because they know something specific about the problem and can precompute things and start some computation further along because of those precomputed results.

 

The top answer to this question gives a pretty good account of how optimizations usually go: http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort




#5013619 How do you make an AI follow an A* path in a 2D platformer game

Posted by snowmanZOMG on 23 December 2012 - 03:08 AM

IADaveMark basically just told you the answer you're looking for.

 

You have to remember that A* is just a search algorithm that goes through different states and pruning out those states that aren't desirable so you can find the path you're looking for.  It doesn't care about the exact details of how you get to each location in the state graph.  It just matters that you can get from one state to another in some way that reflects the cost of that transition.

 

If you properly set up the graph, the graph itself retains the information about where you can jump to or drop to.  The A* search simply tells you to get from A to B to F to G, where there might be a drop from B to F.  It is up to your agent to realize that you need to drop down from B to F when you reach B.




#5013546 Relation between memory used and frames per second

Posted by snowmanZOMG on 22 December 2012 - 07:54 PM

In my experience, how much memory you use matters very little as long as you're not paging out to disk.  What matters more is how your memory is laid out and the subsequent access patterns to those layouts.  If you don't access memory in a cache friendly way, your performance will die.




#5006558 Is C++ too complex?

Posted by snowmanZOMG on 03 December 2012 - 04:04 AM

C++ is too complex. Anyone who doesn't (or isn't willing to) admit that probably hasn't worked with the language enough to realize that or is delusional. Try writing your own parser for the language... But that doesn't mean one can't use it in a manageable way. Of course, I'm speaking mostly from a language implementer's point of view, but many language complexity issues come to light to every day programmers through things like syntax and language semantics.

I recently graduated with a BS in computer science and I have a few friends who are lower classmen and they're now looking to get jobs and I have often looked over their resumes to at least do a bullshit check on it. My school teaches Java as the first language (gross...), but eventually forces people to learn C or C++ through data structures classes and a dash of functional programming in a compilers and languages class. Almost everyone LOVES Java and hates C, C++, and OCaml. Anyways, some people become what I would call "Syntactically proficient" in C++, where I would trust them to implement very small pieces of code or features. Nothing more. It amazes me the audacity people have when they claim their proficiencies in particular languages, even though they have absolutely ZERO credibility in making them. My favorite, of course, is the listing of languages they know and they write something like:

Languages Known:

  • C++ (Expert)
  • Java (Expert)
  • Python (Intermediate)


I know of only a handful of names in the world (We're talking like, 5-10 people) which could be on such a resume that could claim EXPERT in a language as complex as C++... Anyways, this turned out to be more of a rant. The reason why we (game developers) stick with such a complex and nasty language is for many different reasons. For me, one of the biggest reasons is simply it's the best we've got for the job we have. When you need to write high performance code, C++ surely can let you do that. If you need very precise control of hardware, C++ can let you do that. But be wary, if you need both of those, you will likely need to know what you're doing, and C++ does not spare those who don't know what they're doing.

My advice for people who want to know C++ more intimately:
  • Write more C++ code and READ other people's C++ code. Figure out what you do and what they do. Why are you the same or different?
  • IMPLEMENT YOUR OWN COMPILER!
  • LEARN ABOUT LANGUAGE DESIGN!
  • Learn other languages.
  • Read the C++ language standard.

I put 1 up first since I think it's most easily done by the vast majority of people. 2 and 3 though I think really lets you understand some of the fundamental issues that drive the decisions the C++ language committee has to make. You also begin to understand why entire features are designed the way they are in terms of semantics and syntax. It also helps you put into perspective other programming languages (sure helped me). The last point, isn't really going to be helpful if you aren't already pretty proficient in C++ and programming language design issues. In fact, it may be of very little help to anyone unless you're interested in implementing a C++ compiler.

*Edit*
I should clarify my suggestion #2: Implement your own compiler for a simple, object oriented language. COOL perhaps. Implementing your own C++ compiler would be a herculean task. It takes teams of experts to implement one.


#5002778 object-oriented vs. data-oriented design?

Posted by snowmanZOMG on 20 November 2012 - 04:05 PM

which one of these paradigms is better to follow in gamedev? or maybe both in combination?


Use what's most appropriate for the problem you're trying to solve. It may be difficult to know which solution is "more appropriate" if you don't have the requisite experience. You can often use pure DoD and avoid OOP but you might have to work much harder to get the result you want. The same can be said of the other direction, but the cost of "work much harder" might appear to you in a different way. One may just be awkward to write the code for or you may have to write excessive amounts of code. The other may obscure the purpose of code unnecessarily or require extreme amounts of detailed knowledge to be able to use properly.

Data oriented designs can lend itself to very high performance code, but you need to do it correctly. OOP can make things more convenient, but you may end up with a monstrous hierarchy with strange dependencies. And you may have poor performing code.

Understand your tools well and use them wisely.


#4984267 Minimal Path Problem

Posted by snowmanZOMG on 27 September 2012 - 02:03 AM

If all you're doing is modeling terrain elevation costs in a path, I don't see why you need such complicated conditional edge weights. You could easily account for elevation cost in a graph by simply defining a baseline cost between two nodes at the same elevation. If the elevation differs, simply add or subtract from that base cost to simulate increased or decreased difficulty of that transition (be sure to never go negative, or else Dijkstra or A* is no longer correct!).

Modeling the problem in this manner allows you to use local information at each step to build up a more global solution without having to inspect the actual decisions made to determine future values. Your path cost here means the sum total of the "ease of travel" and if you minimize this cost, your path will be the one that is easiest to follow.


#4979927 Advanced pathfinding with robot in 3D

Posted by snowmanZOMG on 13 September 2012 - 08:56 PM

You're dealing with motion planning here. Check out http://planning.cs.uiuc.edu/


#4961854 A* turning and unit direction

Posted by snowmanZOMG on 21 July 2012 - 10:01 PM

I have also been investigating various pathfinding solutions that take into account a turn radius and agent orientation. The most useful article I've seen so far on this matter is http://www.gamasutra...pathfinding.php where they essentially make the orientation information a part of their search state space and rule out certain tiles based on the given turning radius during the search.

It's a rather old article, but I think lots of the information there is still relevant. There's also a presentation floating around on aigamedev about some pathing goodness from Company of Heroes, but if I recall, that presentation is a little light on details with regard to their vehicular movement.


#4917063 Organizing and managing code?

Posted by snowmanZOMG on 27 February 2012 - 11:54 AM

Smart pointers aren't garbage collected. The different variants of smart pointers do different things to keep track of when something should be deleted, but garbage collection isn't what's going on here. Garbage collection is a process that inspects the heap and searches for reachable memory objects and ones that aren't and deletes the unreachable ones. Smart pointers, such as shared_ptr use something called reference counting which keeps track of how many things currently have access to the pointer. When the reference count reaches zero, the shared_ptr knows that nothing can access the pointer and deletes it.

It may seem like a subtle distinction, but it's very different from garbage collection. The key difference is that garbage collection actually spends time to inspect large portions of the heap and builds lists of what's reachable and what's not. shared_ptr maintains an internal reference counting mechanism that is appropriately modified on construction, copy construction, assignment, and destruction of the instances of the shared_ptr.

As far as your question is concerned, the pointer you've declared:

b2Body * body;

Is a variable that lives on the stack which is a pointer to b2Body. There's nothing special going on here and currently you've not used shared_ptr. body isn't garbage collected, or reference counted, or anything fancy like that. It's just standard stack frame allocation and deallocation when it scopes in and out.

However, if you later on perform a:

body = new b2Body();

There are several things that can happen. One thing that could happen is that you forget to ever put a delete in there. What happens is as soon as the scope of the variable body has ended, the stack memory for that pointer is deallocated but the heap object you allocated through the new is not. You've just leaked memory.

You could also add in all of the necessary deletes. This can be anything from just deleting right at the end of the scope if you don't need the object after that (in which case, you should have probably just put the entire object on the stack in the first place) or following all the places where the object's lifetime is relevant and finding when it dies and placing deletes there.

But, notice there's no garbage collection or reference counting sorcery going on here. You're using old school, manual heap allocations, where you have to do everything yourself. If you had used a shared_ptr() from the beginning, you wouldn't have to worry about those things (as much...). shared_ptr and many of the other smart pointers aren't perfect (shared_ptr can still leak if you aren't careful), but they are extremely convenient if you can afford to use them.


PARTNERS