

#5297713 What maximum amount of flood fill can a dijkstra handle?

Posted by on 23 June 2016 - 10:43 AM

Dijkstra's shortest-path-between-a-pair-of-points routine is the famous algorithm you are probably asking about; he produced several major algorithms, so you might mean another.

The single-pair algorithm can take quite a lot of memory when implemented naively; on the order of N^3 sounds about right. Most games use the A* (A-star) algorithm, which is Dijkstra's single-pair shortest-path algorithm plus a small heuristic value to speed up the search, at the cost of potentially not being optimal.

Since you mention "flood fill", I'm guessing you are talking about an all-pairs shortest-path algorithm? Those exist, but they are much more memory-intensive for anything beyond storing a binary reachability status. Running the single-pair shortest path over and over for all pairs is a bit extreme.

You might also mean a path that follows a space-filling curve across an entire board. That can take as many resources as your path requires, entirely dependent on your implementation.

Finally, there are many implementations of these algorithms online, some quite efficient and others terribly inefficient. Can you share which implementation you are following?


As has been said, so long as your heuristic does not overestimate (i.e. it is always less than or equal to the actual remaining cost of the optimal path), A* is guaranteed to be optimal. Also note that A* can run faster if you do not require an optimal path, so that's something to keep in mind... though if you really want greedy best-first search, don't use A*, since A* carries the bookkeeping needed to track the cost of the path so far, which is exactly what keeps it from being greedy.

Without knowing exactly what he's doing, though, it is impossible to recommend an ideal solution. Flood fill can be used for anything from pathfinding to image editing to determining whether two voxel volumes connect; functionally they are all graph-processing algorithms. There are other ways to do pathfinding as well (generally coupled with A*), including preprocessing certain pairs (jump point search, which basically identifies obstructions and preprocesses the pairs at the corners so you can get around a blocking corner with one test), using a hierarchy above the nodes to quickly check path validity, and so on. On top of that, different algorithms are used depending on whether your map is dynamic or static.
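
To make the A*/Dijkstra relationship concrete, here is a minimal sketch of A* on a 4-connected grid (the grid representation, Node struct and Manhattan heuristic are assumptions for illustration, not anything from the original question). With the heuristic forced to zero it degenerates into plain Dijkstra; with an admissible heuristic like this one it stays optimal while expanding fewer nodes.

#include <climits>
#include <cstdlib>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct Node { int x, y; };

// Manhattan distance: admissible on a 4-connected grid with unit step costs.
static int heuristic(Node a, Node b)
{
    return std::abs(a.x - b.x) + std::abs(a.y - b.y);
}

// Returns the cost of the cheapest path from start to goal, or -1 if unreachable.
// A heuristic that always returns 0 turns this into plain Dijkstra.
int astar(const std::vector<std::vector<int>>& walls, Node start, Node goal)
{
    const int h = static_cast<int>(walls.size());
    const int w = static_cast<int>(walls[0].size());
    std::vector<int> best(w * h, INT_MAX);               // cheapest known cost-so-far per cell

    using Entry = std::pair<int, int>;                   // (f = g + h, cell index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;

    best[start.y * w + start.x] = 0;
    open.push({ heuristic(start, goal), start.y * w + start.x });

    const int dx[4] = { 1, -1, 0, 0 };
    const int dy[4] = { 0, 0, 1, -1 };

    while (!open.empty()) {
        const int idx = open.top().second;
        open.pop();
        const Node cur{ idx % w, idx / w };
        const int g = best[idx];
        if (cur.x == goal.x && cur.y == goal.y)
            return g;                                    // goal reached with its final cost
        for (int d = 0; d < 4; ++d) {
            const int nx = cur.x + dx[d], ny = cur.y + dy[d];
            if (nx < 0 || ny < 0 || nx >= w || ny >= h || walls[ny][nx])
                continue;                                // off the grid or blocked
            const int nidx = ny * w + nx;
            if (g + 1 < best[nidx]) {                    // found a cheaper way into this cell
                best[nidx] = g + 1;
                open.push({ g + 1 + heuristic({ nx, ny }, goal), nidx });
            }
        }
    }
    return -1;                                           // open set exhausted: unreachable
}

Memory here is proportional to width x height, which is also the footprint a flood fill over the same board would need.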

#5275796 MSVC generating much slower code compared to GCC

Posted by on 15 February 2016 - 02:03 PM

__restrict cannot be used on references though, for whatever reason. And even though I read an explanation on those forums that references are already assumed not to be rebindable, testing with assembler showed significantly worse generated code than with pointers and __restrict (or whatever the equivalent is).


__restrict can be used on references as of Visual C++ 2015. It was one of the things I was glad they added. You can also effectively mark member functions as restrict in 2015 (that is, the this pointer is treated as restrict).
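
A minimal sketch of what that enables, assuming Visual C++ 2015 or later (the function and parameter names here are illustrative, not code from the thread):

void accumulate(float& __restrict sum, const float* __restrict data, int n)
{
    // Because the reference is marked __restrict, the optimizer may assume sum
    // does not alias data, so it can keep the running total in a register
    // instead of storing and reloading it on every iteration.
    for (int i = 0; i < n; ++i)
        sum += data[i];
}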

#5275439 MSVC generating much slower code compared to GCC

Posted by on 12 February 2016 - 12:52 PM

The first version calls std::vector<>::size() every iteration. The second does so only once and stores the value in a local variable.


I've seen this come up in profiles before (and have corrected it). Visual C++, at least, has a lot of difficulty with this situation. For optimization purposes it treats m_sum practically as a global variable, so every single operation done on it in that loop incurs a load-hit-store. In the latter case, the compiler knows that sum is local to the function, so it generally performs the operations in a register and stores the result once at the end.

The local-variable version can be hugely faster in some cases, depending on what you're doing. A load-hit-store is merely bad on x86, but on the PPC consoles (like the 360 or the PS3), where an LHS was death, this pattern caused major slowdowns.
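
A minimal sketch of the two patterns being contrasted (the class and member names are illustrative, not the code from the thread):

#include <cstddef>

struct Accumulator {
    float m_sum = 0.0f;

    // Member accumulation: the compiler must assume m_sum may alias data, so it
    // stores and reloads the member every iteration - the load-hit-store case.
    void add_slow(const float* data, std::size_t count)
    {
        for (std::size_t i = 0; i < count; ++i)
            m_sum += data[i];
    }

    // Local accumulation: sum provably lives only in this function, so the
    // compiler keeps it in a register and writes the member back once.
    void add_fast(const float* data, std::size_t count)
    {
        float sum = m_sum;
        for (std::size_t i = 0; i < count; ++i)
            sum += data[i];
        m_sum = sum;
    }
};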

#5219390 Why didn't somebody tell me?

Posted by on 26 March 2015 - 12:30 PM

I like to talk (and text message) in fake Olde English from time to time, for my own amusement.


Apparently, nobody bothered to correct my mispronunciation of the word 'ye'. It's the exact same word as "the", with the exact same pronunciation!


I have always pronounced it like "yee", but it's pronounced closer to "duh", because it's not actually the letter 'y' but a special letter we dropped from the English alphabet, a single character standing for the combined 'th' sound (as in 'there').


Unless you're referring to the pronoun ye, which was also the Early Modern English second person plural... as in, "I am talking to ye five gentlemen!".


Also, yes, the y was used in those cases because typesets lacked thorn (þ), so printers substituted y. There was also another character, eth (ð), which was similar.


*Ye*, the article, is pronounced *the*. *Ye*, the pronoun, is pronounced *ye*.

#5219139 Why didn't somebody tell me?

Posted by on 25 March 2015 - 01:47 PM

I called duct tape duck tape for years without really wondering why it was called that.


Because it is duck tape. It was originally a tape made from duck cloth, and "duck tape" was the original name. It should not be used on ducts.

#5217815 How 3D engines were made in the 80's and 90's without using any graph...

Posted by on 19 March 2015 - 11:36 PM

Something worth noting is that computers at that time were 16-bit (or MS-DOS was, at least).

This means that only 64 KB could be addressed at once through a single segment.

Today we can allocate a buffer of 640x480x32bpp (or 1920x1080 if you want), store the address in a pointer and write to all those pixels in a for loop. Even 1920x1080x32bpp is a mere 7.91 MB. Easy peasy.

With a 16-bit machine, you had to access video memory in 64 KB chunks, paging in and out what you needed. You also had to be sure you had access to the video memory region you needed and still left enough address space for your own data. Oh, and while doing all that, you had to switch those 64 KB chunks as rarely as possible.


Not always true; you could, and often did, use DOS extenders (like DOS/4G), which allowed your program to execute in protected mode, giving you 32-bit addressing. Even on the 286, you had four segment registers, so you could access 256 KiB of memory at a time.

Watcom, for instance, included an extender.
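
For concreteness, here is a minimal sketch in the style of the era under discussion (Borland/Watcom-flavoured real-mode C/C++ using the nonstandard far pointer extension; the function name is an illustration, not anyone's actual code). VGA mode 13h is 320x200 at 8 bpp, so its 64000-byte framebuffer fits entirely inside the single 64 KB segment at A000:0000; higher-resolution modes did not fit and required bank switching.

// Real-mode VGA mode 13h: the whole framebuffer lives in segment A000.
unsigned char far* vga = (unsigned char far*)0xA0000000L;

void put_pixel(int x, int y, unsigned char color)
{
    // y * 320 + x stays below 64000, so no bank switching is needed here.
    vga[y * 320 + x] = color;
}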

#5216162 Forward+ vs Deferred rendering

Posted by on 12 March 2015 - 05:03 PM

There was also this variant of clustered rendering written about on this forum.


I would actually love to see a benchmark between all variants.

#5214043 null a vector of pointers in a parameter?

Posted by on 02 March 2015 - 04:17 PM

If an empty container would work, you can now do this:

foo( {} );


That will pass an empty container down.
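
A minimal sketch of how that call lines up against a declaration (foo's signature here is an assumption to match the thread title, not the original poster's code):

#include <vector>

struct Widget {};

void foo(const std::vector<Widget*>& items)
{
    // items.empty() is true when the caller writes foo({}).
}

int main()
{
    foo({});                      // passes a temporary, empty vector
    foo({ nullptr, nullptr });    // or an initializer list of (null) pointers
}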

#5213428 Best way of ensuring semantic type consistency?

Posted by on 27 February 2015 - 06:49 PM

I use the user-defined literal approach (I refer to them as Concrete Types), but with a few more additions. Mainly, I have dimensional units - that is, 'time' would have exponents of 0 for distance, 0 for mass, and 1 for time.


You'd see something like this as the declaration:


template <typename T, int dimLength, int dimMass, int dimTime, ...etc...> class concrete_type...

There is also more significant work inside the concrete_type class to enforce type safety and make it harder to accidentally assign values that don't make sense.


The nice thing about it is that:


time val1 = 1_s;
length val2 = 1_m;

auto val3 = val2 / val1; // decltype(val3) == speed.

float val4 = val2 / val1; // error
time val5 = val2 / val1; // error
float val6 = val3; // error
float val7 = val3.raw(); // OK
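
A minimal, self-contained C++14-style sketch of the idea (quantity, time_q and the literal operators are illustrative stand-ins, not the actual concrete_type implementation described above):

#include <iostream>

// Exponents for length, mass and time; dividing two quantities subtracts them.
template <int Length, int Mass, int Time>
struct quantity {
    double value;
    explicit constexpr quantity(double v) : value(v) {}
    constexpr double raw() const { return value; }   // the only way out to a raw number
};

template <int L1, int M1, int T1, int L2, int M2, int T2>
constexpr auto operator/(quantity<L1, M1, T1> a, quantity<L2, M2, T2> b)
{
    return quantity<L1 - L2, M1 - M2, T1 - T2>(a.value / b.value);
}

using length = quantity<1, 0, 0>;
using time_q = quantity<0, 0, 1>;   // named time_q so it doesn't clash with ::time()
using speed  = quantity<1, 0, -1>;

constexpr length operator""_m(long double v) { return length(static_cast<double>(v)); }
constexpr time_q operator""_s(long double v) { return time_q(static_cast<double>(v)); }

int main()
{
    time_q val1 = 1.0_s;
    length val2 = 1.0_m;

    auto val3 = val2 / val1;          // deduced as speed
    // double bad = val3;             // error: no implicit conversion to a raw number
    double ok = val3.raw();           // OK: explicit escape hatch
    std::cout << ok << " m/s\n";
}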

#5196172 Default font texts from windows

Posted by on 03 December 2014 - 08:12 PM

I'm a fan of the Liberation Fonts.

#5168965 Funniest line of code ever ?

Posted by on 24 July 2014 - 02:10 PM


When I was trying to teach somebody to code, they wanted to use 'or' to randomly pick between two numbers:
value = 1 or 2
It's understandable how a non-programmer would expect this to work, so I didn't think he was dumb for doing it, but I still found it funny.

I remember trying to do the same thing myself when I was learning C++.



You should be able to do something like that with variadic templates, though... make a proper templated class and a helper function to simplify the syntax, and you should be able to write something akin to:

if (multi(foo, bar, 4) == some_var) ...
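
A minimal C++17-style sketch of how such a helper could look (multi and any_of_values are names invented here for illustration, not an existing facility):

#include <iostream>
#include <tuple>

// Holds a pack of candidate values; operator== asks "does the other side match any of them?"
template <typename... Ts>
struct any_of_values {
    std::tuple<Ts...> values;

    template <typename U>
    friend bool operator==(const any_of_values& lhs, const U& rhs)
    {
        return std::apply([&](const Ts&... vs) { return ((vs == rhs) || ...); },
                          lhs.values);
    }

    template <typename U>
    friend bool operator==(const U& lhs, const any_of_values& rhs)
    {
        return rhs == lhs;
    }
};

// Helper so the call site stays terse, as in the snippet above.
template <typename... Ts>
any_of_values<Ts...> multi(Ts... vs)
{
    return any_of_values<Ts...>{ std::make_tuple(vs...) };
}

int main()
{
    const int some_var = 4;
    if (multi(1, 2, 4) == some_var)
        std::cout << "matched\n";
}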

#5129144 fractal result by accident

Posted by on 05 February 2014 - 03:24 PM



Clearly an aliasing effect and nothing fractal about it.


If you say so ;\ (yawn)



Why do you bother asking if you're just going to reject any answer that doesn't match your preconceptions? I concur that it just looks like a Moiré pattern.

#5128463 Why would devs be opposed to the PS4`s x86 architecture?

Posted by on 03 February 2014 - 11:12 AM

The X360's x86 architecture could perform 77 GFLOPS; the PS3 could perform 230 GFLOPS.

Just to nitpick, but the Xbox 360 didn't have an x86 processor - it was a PowerPC, just like the PS3, but with a different internal architecture. Perhaps you meant that the design of the Xenon CPU is more similar to a PowerPC version of common x86 chips than it is to the distributed design of the Cell CPU?

#5080275 Better FPS with high-poly terrain

Posted by on 24 July 2013 - 05:51 PM

In other words, you implemented an O(log n) solution à la spatial subdivision?

#5079031 Generating Terrain on the GPU vs on the CPU

Posted by on 19 July 2013 - 04:14 PM

Graphical applications are usually bottlenecked by the pixel shader, while the vertex shader and the CPU sit idle most of the time. People usually move computations from the CPU to the vertex shader to free CPU/memory resources for other CPU-specific tasks like AI and physics.

In your case, you moved some calculation from the CPU, which was idle most of the time, to the vertex shader, which was also idle most of the time. The performance of your application was determined by the pixel shader, which was not changed, so the overall performance did not change either. But you should be able to see a difference in load on the CPU and the vertex shader using a performance profiling tool like NVIDIA PerfHUD.


Most modern GPUs use a unified shader architecture, so there are no dedicated vertex or fragment units; the same execution units are shared between stages.