Member Since 27 Aug 2002
Offline Last Active Dec 18 2014 12:59 PM

#5128355 Why would devs be opposed to the PS4's x86 architecture?

Posted by Krohm on 03 February 2014 - 02:10 AM

People who have spent their whole programming lives on the x86 platform don't really notice
I have spent my life on x86, yet I feel uneasy with it. One just has to look at the details.

Example: compare the die area spent on actual ALUs with the area spent on control circuitry; most of the latter is there for out-of-order execution. All the instruction dependencies could be accounted for at compile time, so what is OOE useful for? To hide memory latency, I suppose... a problem that has arisen exactly because there's no way to put the right amount of memory in the right place. OOE makes no sense for closed architectures where SW is heavily scrutinized.


OOE reloaded: Intel puts two cores into one, then figures out most threads are dependency-limited and provides HyperThreading to perform OOE across two threads. Kudos to them for the technical feat, but I wonder if that's really necessary and, BTW, there is still no HT-enabled low-end processor years after its introduction.


There's no such thing as x86: x86-Intel is a thing. x86-AMD is another. The latter is at least somewhat coherent in features. With Intel the thing is a bit more complicated (although we could debate whether those advanced features are currently useful).


So, I can absolutely understand at least one good reason why developers didn't take this nicely. x86 is just cheap, in all senses, good and bad.

#5127400 Raw draw functions in Game Engine (C++)?

Posted by Krohm on 30 January 2014 - 01:33 AM

Don't use .obj to do animations. Don't even try. It is a long, painful way with nothing at the end.

Personally, I would suggest prototyping it using an existing game. UDK will surely do, but it's a bit complicated. You should be able to set up simple time-based scripting by just dealing with delays (from the editor) and input mangling.

Do not hardcode limits in your system. It's just easier to ensure all your levels are "boxed" correctly. The collision library Bullet has a special primitive for this specific purpose: a wall of infinite thickness if you want to have it easy. Personally I don't use it as its restitution is a bit different compared to standard "finite" rigid bodies.

#5126652 Concatenation of vertex buffers and performance

Posted by Krohm on 27 January 2014 - 12:18 AM

If I use 16-bit indices then I could no longer use more than 65535 vertices, right?

It means that each single batch can consume up to 64*1024 vertices, since there's no way to address more. I suggest staying away from index 0xFFFF, however: on some hardware I tested, it had odd performance behavior (likely because it interacts with primitive-restart functionality). Those implementations are old, and the issue is probably gone by now. You can still put a million vertices in each VB and use batch offsets to select the correct sub-region.
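To make the batch-offset idea concrete, here is a minimal sketch. All names (Batch, splitIntoBatches, resolveVertex) are made up for illustration, and it assumes the buffer can be cut at arbitrary vertex boundaries, which real meshes only allow if no triangle straddles a cut. The per-batch offset is what graphics APIs usually call a "base vertex": it is added to every 16-bit index at draw time.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one draw batch inside a shared vertex buffer.
struct Batch {
    std::uint32_t baseVertex;   // added to every 16-bit index at draw time
    std::uint32_t vertexCount;  // vertices this batch can address
};

// Indices 0..0xFFFE are used; 0xFFFF is deliberately left out because it
// may double as the primitive-restart marker.
const std::uint32_t kMaxVerticesPerBatch = 0xFFFF;

// Cut a buffer of totalVertices vertices into ranges that 16-bit indices
// can address. Assumption: cuts may fall on any vertex boundary.
std::vector<Batch> splitIntoBatches(std::uint32_t totalVertices) {
    std::vector<Batch> batches;
    std::uint32_t base = 0;
    while (base < totalVertices) {
        const std::uint32_t remaining = totalVertices - base;
        const std::uint32_t count =
            remaining < kMaxVerticesPerBatch ? remaining : kMaxVerticesPerBatch;
        batches.push_back({base, count});
        base += count;
    }
    return batches;
}

// Vertex actually fetched for a given 16-bit index within a batch.
std::uint32_t resolveVertex(const Batch& batch, std::uint16_t index) {
    assert(index < batch.vertexCount);
    return batch.baseVertex + index;
}
```

With a million vertices this gives 16 batches; index 10 of the second batch resolves to vertex 65545 of the shared buffer.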




How far can I stretch vertex buffer merging for performance gains?

Switching vertex buffers is not nearly half as expensive as it used to be. Creating a new buffer for each resource is often doable. Personally, I merge my buffers according to "resource load rounds" and vertex format, so I don't have to write complex code and the inspected buffer looks good in the debugger.

Are there any rules of thumb of how many vertices that should be in the same buffer?
Dynamic buffers should be below 4MiB for best performance (according to AMD, referring to GCN drivers). Static buffers can likely be a gazillion vertices with no problem.
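As a quick back-of-the-envelope check of what that 4 MiB figure means in vertices, here is a tiny sketch. The function name and the 32-byte vertex layout mentioned below are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Budget taken from the figure quoted above for dynamic buffers.
const std::size_t kDynamicBudgetBytes = 4u * 1024u * 1024u;

// How many vertices of a given stride fit within the budget.
std::size_t maxVerticesInBudget(std::size_t strideBytes) {
    assert(strideBytes > 0);
    return kDynamicBudgetBytes / strideBytes;
}
```

For a hypothetical 32-byte vertex (position, normal and UV as floats: 8 floats), that is 131072 vertices, so a dynamic buffer of that format should hold no more than that.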

#5126645 Making a new game?

Posted by Krohm on 26 January 2014 - 11:57 PM

I strongly suggest producing a document explaining your game first. If the game is "extremely complex", do not even start. The game must be well understood and "compact", at least in your mind. I suggest starting with a game design document.

#5126641 Physics and huge game worlds

Posted by Krohm on 26 January 2014 - 11:28 PM

Performance is bad because the algorithm cannot make any assumption about the collision data. This is not the case for other collision shapes, which also have a fully parametric representation.

I am currently using hand-made proxy hulls for everything. This is pretty much standard by now, although we could discuss the amount of automation.

#5125789 Physics and huge game worlds

Posted by Krohm on 22 January 2014 - 11:12 PM

As an aside, Havok deals with that kind of problem with their concept of Islands.  I'm guessing Bullet doesn't have a similar built in structure? 

It does, but sadly, it doesn't expose them at library level.


Also btBvhTriangleMeshShape takes some time to build the hierarchy, so  serialize or not to serialize??
Do not serialize, do not use. Performance is terrible to start with and I cannot understand why so many people go for it.

#5124087 I'm having doubt about my map loading method and its performance.

Posted by Krohm on 16 January 2014 - 02:17 AM

For starters, switch from text to binary. It will be much faster, since you don't have to go through hundreds of characters, you just read a piece of data and there you are.
I have to point out that considering binary files to be just "the data, right there" is likely a good way to get in trouble. Soon. TheUnnamable points out a first example, dealing with endianness. While a true problem, its relevance is overstated in my opinion.

What a binary file does is guarantee that each chunk of data has a known footprint, easily inferred from state. It does not guarantee the value itself is coherent with previous state. Input sanitization is still required; what you gain is a much, much more compact parser, which can ideally be 5 LOC for a pure data blob.
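A minimal sketch of what "known footprint, but still validate" can look like. The header layout (two little-endian 32-bit sizes followed by one byte per tile), the names and the size limit are all assumptions made up for this example, not a real map format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical map header; the layout is an assumption for illustration.
struct MapHeader {
    std::uint32_t width;
    std::uint32_t height;
};

// Assemble a little-endian u32 byte by byte, so the result does not depend
// on the endianness of the machine doing the loading.
std::uint32_t readU32LE(const std::uint8_t* p) {
    assert(p != nullptr);
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

const std::uint32_t kMaxMapSide = 4096;  // arbitrary sanity limit
const std::size_t kHeaderBytes = 8;

// Returns true and fills 'out' only if the blob is coherent:
// footprint is known up front, values are still checked.
bool parseHeader(const std::vector<std::uint8_t>& blob, MapHeader& out) {
    if (blob.size() < kHeaderBytes) return false;
    const MapHeader h = { readU32LE(blob.data()), readU32LE(blob.data() + 4) };
    if (h.width == 0 || h.height == 0) return false;
    if (h.width > kMaxMapSide || h.height > kMaxMapSide) return false;
    // Payload must hold exactly one byte per tile.
    const std::size_t expected = static_cast<std::size_t>(h.width) * h.height;
    if (blob.size() - kHeaderBytes != expected) return false;
    out = h;
    return true;
}
```

The reading part really is a handful of lines; most of the function is the sanity checking that a binary format does not do for you.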


With small modifications by me

Binary formats are not necessarily faster and they have a bazillion other drawbacks.

  1. Disk seek time will completely swamp any CPU processing time.
  2. Text files can in many cases compress better than binary equivalents and so (using a ZIP asset file or zlib to compress your source assets)...
  3. you can actually get faster loading time with text than binary.
  4. To be sure, measure, measure, measure.

I would like to know what those drawbacks are supposed to be as...

  1. not a binary file problem. Seek is always a problem, no matter if binary or text. But with text you have the additional complexity of parsing, especially if the format is designed to be human-usable;
  2. to a smaller file? Or in percentage? Information is information. Both files store the same amount of information, with data preferring one representation or the other depending on values themselves;
  3. sure you "can" in some circumstances. I don't recall it happening to me however;
  4. please stop with that "to be sure..." thing. Time is not free. Either you think something is worth it or not.

Now, back to the original problem.

Text files are suitable for small amounts of data, such as level configuration in a tower defense game, a set of tile indices for parts of levels in a tile-based game.

If you can guarantee the syntax is simple, loading text has the inconvenience of variable-length tokens, but loading complexity will still stay low. If you care about performance, you might cheat by temporarily terminating each token with a null and removing it after processing the token. Or, more nicely, you might switch to a pointer-length approach where string termination is not assumed. This way, memory allocations are lower and performance goes up.
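Here is a minimal sketch of the pointer-length approach, assuming a trivially simple syntax of whitespace-separated unsigned numbers. The names Token, tokenize and parseUnsigned are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A token is a (pointer, length) pair into the original buffer:
// nothing is copied and no null terminator is needed.
struct Token {
    const char* ptr;
    std::size_t len;
};

static bool isSeparator(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Split [text, text + size) on whitespace without modifying the buffer.
std::vector<Token> tokenize(const char* text, std::size_t size) {
    assert(text != nullptr || size == 0);
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < size) {
        while (i < size && isSeparator(text[i])) ++i;   // skip separators
        const std::size_t start = i;
        while (i < size && !isSeparator(text[i])) ++i;  // consume the token
        if (i > start) tokens.push_back({text + start, i - start});
    }
    return tokens;
}

// Parse an unsigned decimal token using its length, not a terminator.
// Overflow is not checked here; a real loader should reject it.
bool parseUnsigned(const Token& t, unsigned& out) {
    if (t.len == 0) return false;
    unsigned value = 0;
    for (std::size_t i = 0; i < t.len; ++i) {
        const char c = t.ptr[i];
        if (c < '0' || c > '9') return false;
        value = value * 10u + static_cast<unsigned>(c - '0');
    }
    out = value;
    return true;
}
```

The only allocation is the token array itself; the file buffer is read in place, so the same code works on a memory-mapped or read-only buffer.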

If you need even more performance, it's probably time to drop text. Filters to binary can cook the data for you in awesome ways.

Also consider JSON.

#5121713 How efficient are current GPU schedulers?

Posted by Krohm on 06 January 2014 - 02:42 PM

I am very well aware of the latency-hiding strategies involving block read/write.

There's no heavy math in the memory-heavy section. I don't understand what you're saying.

while you're doing extra work ... the kernel actually runs faster while doing the heavy math at peak rates

I don't understand what kind of extra work you are referring to: changing layout or decompressing/transforming data appears to be something I'd have to do, not the HW.

if you wonder about some optimization you come up with, it's indeed best if you just try and profile it.

Are you reading my posts? We don't have easy access to either GCN or Kepler devices. I'm sorry to write this, but so far I haven't read anything I didn't know already, and I'm starting to realize I am not able to express my thoughts.

#5121585 How efficient are current GPU schedulers?

Posted by Krohm on 06 January 2014 - 03:08 AM

While I have had a quick look at GPU releases over the last few years, I've been focused on Development (as opposed to Research), so I haven't had the time to deeply inspect GPU performance patterns.

On this forum I see that a lot of people dealing with graphics are still optimizing a lot in terms of data packing and such. It seems very little has changed so far, but then on this forum we care about a widespread installed base.


In an attempt to bring my knowledge up-to-date, I've joined a friend of mine in learning CL, and we're studying various publicly available kernels.

In particular, there's one having the following "shape":


The kernel is extremely linear up to a certain point, where it starts using a lot of temporary registers. After a while, that massive amount of values is only read and then becomes irrelevant.


What I expect to happen is that the various kernel instances will

  1. Be instanced in a number that fits the execution clusters, according to the amount of memory consumed.
    What happens to other ALUs in the same cluster?
  2. Happily churn along until memory starts to be used. At this point, they will starve one after the other due to the low arithmetic intensity.
  3. Get swapped massively by the scheduler every time they starve for bandwidth.
  4. Stop being swapped, possibly, once a "thread" gets near the final, compact phase.

It is unclear to me if the compiler/scheduler is currently smart enough to figure out the kernel is in fact made of three phases with different performance behavior.


However, back when CL was not even in the works and GPGPU was the way to do this, the goal was to make "threads" somehow regular. The whole point was that scheduling was very weak and the ALUs should have been working conceptually "in locksteps". This spurred discussions about the "pixel batch size" back in the day.


Now, I am wondering if simplifying the scheduling could improve the performance on modern architectures such as GCN (or Kepler).

The real bet I'm making with him is that the slowdown introduced by increased communication (which is highly coherent) will be smaller than the benefit given by improved execution flow.


Unfortunately, we don't have easy access to either GCN or Kepler systems, so all this is pure speculation. Do you think it still makes sense to think in those terms?


Edit: punctuation.

#5117266 Rendering with only 8 lights active

Posted by Krohm on 16 December 2013 - 02:04 AM

Personally, if I had a realistic test case in which I needed more than 100x the amount of supported lights, I'd start considering other methodologies, such as deferred shading.

#5116440 Snake Gone Wild

Posted by Krohm on 12 December 2013 - 02:01 AM

Well done, but I'd just make everything bigger (say 40x40 map each tile being twice as big), seconding Jay.

#5116436 So, I want to make a game engine...

Posted by Krohm on 12 December 2013 - 01:30 AM

Having a "taste" of different languages is indeed an excellent idea.

I'd personally stay away from Java. It's just too verbose and HTML5 can do many, perhaps even most things Java excels at. The library is quite verbose and system integration is, in my opinion, still lacking.

However, no matter what you do, you will never be able to build an engine (in the sense of multi-game shared platform) without writing a few games first, possibly from different genres. Your resulting design would just not interact well with the gameplay constructs or the data flow involved.

So, next step in your path to engine design is: write an engine (in terms of logic for a game).

#5115590 Bullet physics undefined behavior with a dummy simulation

Posted by Krohm on 09 December 2013 - 02:06 AM

Isn't it generally considered a bad idea to explicitly set an object's velocity in a physics engine? I've always read that you should apply an appropriate force instead.
I'd agree with that. Setting velocities directly has always had repercussions on my systems. Dynamic objects are supposed to be completely simulated by the library; if you want to change them, at least make sure you wake them up from sleeping. Try calling ->activate().

#5115584 [Terrain-RTS map] Advanced shadings for better detail!

Posted by Krohm on 09 December 2013 - 01:45 AM

And the result is not bad, but still can't compare to current games today (like Starcraft 2, Civilization V, Shogun 2 campaign map....).
You will never be able to compare. To do so, you first need an artist and you probably need a budget, counting at least 4 digits.

What I'm trying to say is that you don't make a game using technical feats alone. You need artwork, you need an aesthetic vision, a look and feel. Those concepts are not currently present in this thread, and that's terrible. It's not just a matter of doing HDR or bloom (strange, it's part of HDR) or DX11 lighting (whatever this is supposed to be) or bumps.

#5113679 Forcing Code To Work !

Posted by Krohm on 02 December 2013 - 01:55 AM

Mph. That made me uncomfortable but I guess it's just PHP? :P