
Member Since 15 Dec 2001
Offline Last Active Today, 05:28 PM

#5206760 OO where do entity type definitions go?

Posted by phantom on Today, 03:21 PM

I'd like to apologise; when I gave you the earlier advice I assumed your data structures would contain the data required for their respective tasks and not be... well... crap.

Apparently we have WILDLY different meanings when it comes to the words 'not very big' as that animal struct is FAR from not being big... because it contains... well... crap.

Let me show you what I see;
struct animalTypeRec
{
  char craptakingupacacheline[64];      // 64 bytes
  char craptakingupmorecachespace[36];  // 36 bytes
  int bunchOfNotPerFrameStuff[22];      // 88 bytes
  float heySomethingUsefull;            // 4 bytes
  float oppsUnRelatedAagain[2];         // 8 bytes
  float somethingRelatedTo8BytesAgo;    // 4 bytes
};
Aka 204 bytes (already a multiple of 4, so with only char, int and float members the compiler has no need to pad the structure size any further) of which most is crap.
(For the rest of this we are assuming these live in isolation; the problem changes depending on what is around it in an "animal instance" although with the 100 bytes of crap at the start that just means that whatever is before it in memory is going to be pulling in some amount of 64bytes of rubbish on access.)

Now, I've got no idea wtf your update loop looks like (although I do recall a cache missing mess a couple of years ago so I'm going to go with.. hideous) but at a guess I'm going to say that 'speed' and 'turnrate' are the two useful 'frame by frame' values in that structure.

"Speed" lives at an offset of 188 into the structure; CPUs on the other hand fetch 64 bytes at a time, aligned to a cache line. As we are 188 bytes in, the CPU will naturally 'skip' the first 128 bytes as we don't need them and will drop us 128 bytes into the data, meaning it will read from 'trinkets' onwards in your original structure definition.

Or as I like to think of it [60 bytes of crap we don't need][4 bytes of useful].

At which point I'm taking a guess that 'turnrate' will come into play. In this case it is only 8 bytes further on, but that puts it at offset 200, which is in the next cache line (the one starting at 192). That means we need to read in 12 bytes of this structure + 52 bytes of whatever follows (probably the opening 52 characters of crap in animals[1]).

So in order to update ONE creature; we've had to read in 128bytes, of which 8 bytes are useful.
1 in 16 bytes transferred was data we wanted.

By anyone's metric that is terrible.
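
To put numbers on that, here is a minimal sketch of the arithmetic. It assumes 64-byte cache lines, 4-byte int and float, and a structure that starts on a cache line boundary; the struct mirrors my mock-up above and the helper names (kCacheLine, cacheLineStart, cacheLinesTouched, bytesFetched) are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cache line size; 64 bytes is the figure used above.
constexpr std::size_t kCacheLine = 64;

// Same layout as the mock-up struct (assumes 4-byte int and float).
struct AnimalTypeRec {
    char  craptakingupacacheline[64];
    char  craptakingupmorecachespace[36];
    int   bunchOfNotPerFrameStuff[22];
    float heySomethingUsefull;          // stands in for 'speed'
    float oppsUnRelatedAagain[2];
    float somethingRelatedTo8BytesAgo;  // stands in for 'turnrate'
};

// Start offset of the cache line that contains the given byte offset.
constexpr std::size_t cacheLineStart(std::size_t offset) {
    return (offset / kCacheLine) * kCacheLine;
}

// Number of distinct cache lines needed to read two small fields,
// assuming neither field straddles a line boundary.
constexpr std::size_t cacheLinesTouched(std::size_t offsetA, std::size_t offsetB) {
    return cacheLineStart(offsetA) == cacheLineStart(offsetB) ? 1 : 2;
}

// Total bytes pulled into the cache to read those two fields.
constexpr std::size_t bytesFetched(std::size_t offsetA, std::size_t offsetB) {
    return cacheLinesTouched(offsetA, offsetB) * kCacheLine;
}
```

Feeding it offsetof() for the two floats gives 188 and 200, which land in the lines starting at 128 and 192, so two lines (128 bytes) are fetched for 8 bytes of payload.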

Welcome to the world of 'hot-cold analysis' wherein you work out which data you need together and split your data structures accordingly.

Firstly, I'd dump the char[100] array; that's a char * to somewhere else, it has no place taking up 100bytes in that structure.
Secondly, 'Number appearing' doesn't seem like per-instance data so fuck it off somewhere else.
Third, arrange things by access if you really must store them in this mess.

At a punt;
struct lessBullShitty
{
  int hp, tohit, takehit_wav;
  int atkdmg, attack_wav;
  float speed, turnrate;
  bool can_climb;
  bool avian;

  int rad, AI;  // no idea what these are...
  int xp, meat, bone, trinkets, hides;  // assume these are loot things

  int mesh, texture;
  float y_offset, scale;
  int animations[5]; // being lazy, name them if you wish

  char * name;
};
If nothing else your structure is now 96bytes smaller and more logically arranged.
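
If you want to take the hot-cold idea a step further than just reordering, a rough sketch could look like the following. This is only an illustration under my own assumptions: the split into AnimalHot/AnimalCold, the AnimalTable wrapper and totalDistance are hypothetical names, and which fields are really 'hot' depends on what your update loop actually touches.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Touched every frame: keep it small and contiguous.
struct AnimalHot {
    float speed;
    float turnrate;
};

// Touched rarely (spawning, loot, loading): kept out of the update loop's way.
struct AnimalCold {
    const char* name;
    int hp, xp, meat, bone;
    int mesh, texture;
};

// Parallel arrays: index i in 'hot' and 'cold' describes the same animal type.
struct AnimalTable {
    std::vector<AnimalHot>  hot;
    std::vector<AnimalCold> cold;

    std::size_t add(const AnimalHot& h, const AnimalCold& c) {
        hot.push_back(h);
        cold.push_back(c);
        assert(hot.size() == cold.size());
        return hot.size() - 1;
    }
};

// Example per-frame pass: only the hot array is read.
float totalDistance(const AnimalTable& table, float dt) {
    float total = 0.0f;
    for (const AnimalHot& a : table.hot) total += a.speed * dt;
    return total;
}
```

With 4-byte floats an AnimalHot is 8 bytes, so a single 64-byte cache line holds the per-frame data for eight entries instead of a fraction of one.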

I could do the same on your other structures but frankly the massive one is just making me feel sick even thinking about it, so trying to demangle that is a case of taking the Fuck This Train to Nope City.

I will say however don't make a massive "all the fucking things lol" structure for things which don't require it, comments like 'used by food' in with 'used by missiles' is a big blinking sign which says 'warning; this structure is fucked up' which can be seen from space.

I seem to recall taking you to task over your data layout and update loop two years ago on this forum, where I called them out for being piles of shit, so the fact we are here again now with the same bullshit is just frankly annoying.

#5205778 How to get a job in Graphic programming?

Posted by phantom on 21 January 2015 - 08:37 AM

A more important thing is that Graphics Programming is very rarely an Entry-Level position. Most people go into it after working as a gameplay programmer or similar (Or a crapton of school), so if you are only applying to Graphics Programming positions, it's very unlikely you will find a job.

This is key in my opinion.
While it isn't impossible to be hired directly to the role you'll need a very strong portfolio behind you showing that you can cut the mustard. I've not looked at the links but judging by the comments made so far it would seem you don't have a degree, which is also going to hold you back from getting a job (I know from experience; applied to a company before I had my result and got nothing. Got result and suddenly the same company were 'desperate to talk' to me. The paper matters).

But, ultimately, unless you can show you are very good you are unlikely to get hired into a graphics programming role directly, and the UK has no shortage of more experienced graphics programmers kicking about right now, which probably doesn't help you :)

While it might not be your passion I would see about getting into the industry first, get a year or two under your belt before trying to grab a graphics role and during that time work on your skills outside of work hours to make that more likely.

#5203550 Triangles can't keep up?

Posted by phantom on 11 January 2015 - 03:50 PM

Unfortunately the geo-shader stage is a performance sinkhole and generally best avoided.

Most likely the best method (given the OP's current choice) is the one set forward by Hodgman earlier where you would store the data once per particle and then use the vertex shader to generate the triangles - these won't get culled on their centre point by the hardware either, as culling happens AFTER the VS has run.

(The 'best' way relies on probably newer hardware/features than the OP is using; the simulation is done in a compute shader, the output buffers are then bound as inputs to the VS, the draw call is executed with a null vertex buffer and a vertex count of particles * 4, then the vertex shader uses the vertex ID to index into the data buffers and generate all the data itself.)
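
The vertex ID indexing is simple enough to sketch on the CPU side. This is plain C++ standing in for what the vertex shader would do, not real shader code; Particle, Vertex, vertexCount and expandVertex are hypothetical names and the corner layout is just one reasonable choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Particle { float x, y, size; };   // one entry per particle in the data buffer
struct Vertex   { float x, y, u, v; };   // what the "vertex shader" outputs

// The draw call is issued with this many vertices and no vertex buffer.
std::size_t vertexCount(const std::vector<Particle>& particles) {
    return particles.size() * 4;
}

// Mimics the shader: derive the particle and its corner purely from the vertex ID.
Vertex expandVertex(const std::vector<Particle>& particles, std::uint32_t vertexId) {
    const std::uint32_t particleIndex = vertexId / 4;  // which particle
    const std::uint32_t corner        = vertexId % 4;  // which corner of its quad
    assert(particleIndex < particles.size());

    const Particle& p = particles[particleIndex];
    const float u = (corner & 1u) ? 1.0f : 0.0f;       // bit 0 selects left/right
    const float v = (corner & 2u) ? 1.0f : 0.0f;       // bit 1 selects bottom/top
    const float half = p.size * 0.5f;

    return { p.x + (u * 2.0f - 1.0f) * half,
             p.y + (v * 2.0f - 1.0f) * half,
             u, v };
}
```

In a real renderer the same divide and modulo would be done on the built-in vertex ID in the shader, with the particle buffer bound as a shader resource.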

#5203083 C# seems good, but....

Posted by phantom on 09 January 2015 - 07:50 AM

Our low-end phones are faster than the high-end gaming systems of that era. You don't need to be obsessed with speed any more!

The counter argument to that is one of power; the faster you can get back to a 'low power state' the better on phones, so while languages like JavaScript are popular I weep at the amount of resources which are wasted by them.

However this is very much an aside and doesn't really have an impact on the OP because they are just starting out so I would echo the recommendation to use C# with the additional consideration that, depending on their aims, using an existing engine to make a game might be a good direction to head in too.

#5202835 Why does this benchmark behave this way?

Posted by phantom on 08 January 2015 - 08:19 AM

I'm not sure anyone completely understands what you are asking?

Are you asking why the Intel processors give different results to the Elbrus-2C+(E2K)? If so that'll be down to the internal design of the Intel chips vs the Elbrus. (Remember: the assembly you see is NOT what the CPU executes in the ALU units - while the assembly might look like CISC, under the hood Intel CPUs have been decoding it into RISC-like micro-ops for some time now.)

If you are asking why /Od gives different results to /O2, well look at the assembly output - the /O2 is clearly doing less work instruction wise and the processor is doing the rest.

Simply put the superscalar out-of-order architecture of the Intel processor is better than the Elbrus's architecture which is why it can do the same work in less time.

If you are asking something else then you need to be clearer in your intent.

#5202609 Convert index<1> idx in parallel_for_each to integer index

Posted by phantom on 07 January 2015 - 09:51 AM

The [=] is part of C++'s lambda syntax, so you'll probably want to read up on that before doing much else.

As to your original problem; here is the solution.

index<n> is a class which represents a value with 'n' rank elements; in order to convert this to an int you'll need to access the correct rank via the operator[] function. In this case you want to do this;

unsigned int x = idx[0] % VPWidth;
unsigned int y = idx[0] / VPWidth;  // divide by the row width, not the height
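
For reference, the same conversion as a tiny stand-alone sketch without any AMP types. It assumes the 1D index walks the viewport row by row (row-major), in which case both the modulo and the divide use the width; Pixel and toPixel are made-up names.

```cpp
#include <cassert>

struct Pixel { unsigned int x, y; };

// Convert a row-major linear index into 2D coordinates for a buffer
// that is 'width' elements wide.
Pixel toPixel(unsigned int linearIndex, unsigned int width) {
    assert(width > 0);
    return { linearIndex % width,    // column within the row
             linearIndex / width };  // number of whole rows before it
}
```

So with a 640 wide viewport, index 641 is the second pixel on the second row, i.e. (1, 1).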

#5202063 do most games do a lot of dynamic memory allocation?

Posted by phantom on 05 January 2015 - 03:06 PM

For what it's worth: in my experience reading blog posts and listening to talks by experienced programmers (i.e. engine programmers at naughty dog, insomniac, ubisoft, etc.) the topics are usually about data-oriented design, cache efficiency, dislike of C++/STL and its implementation, absolutely no exceptions, a _very strong_ hate against OOP and "modern design". Engine programmers seem to prefer old school approaches: plain C language,  plain data structs and functions, circular and fixed-size arrays for everything.

You should qualify that with 'some engine programmers' and even then these things come with qualifiers.

For example C++ is fine, no one really has a problem with it as a core, what people dislike is things covered in 'typical C++ bullshit' type talks which go directly against data orientated design considerations - but the language itself is no more disliked than any other language out there.

The Standard Template Library tends to be a love/hate thing - the ideas are sound but lack of memory control is a problem (which on consoles is a real consideration), and the code is general which might not have the performance of hand tuned stuff. ('might' being the keyword; some years past now someone here benchmarked the Vs.Net 2002 Std containers vs Quake3's handrolled ones and found they were much faster). That also tends to be legacy due to existing code bases - some places are considering switching for some things now. There is also a deep distrust of template code which may or may not be 100% justified again depending on the situation.

Exceptions are a two fold thing; historical because of poor implementations and overhead (certainly on older systems) and simply a case of 'not needed' because games tend not to run into real exceptional cases which aren't already allowed for in the code flow. Largely a performance thing because the same people can/will use exceptions in code.

The hate for 'OOP' isn't about OOP but about the 'classes for everything!' over-engineered, Java-trained bullshit which is spat out by graduates these days; OOP in the right place and with the right considerations is fine but it's knowing the right time and place to do things. A Vector might be an object and you are still dealing with it as an object but that doesn't mean you forget everything else. This is more an anti-bullshit sentiment than an anti-OOP one (and C can do OOP just as much as C++, you just have to hand roll things).

"Modern design" is also gaining traction because a lot of 'modern design' is simply common sense; things like smart pointers have existed in engine code bases for a long time already so much of this isn't new. The difference tends to be hand-rolled things and a lack of fear about using raw pointers when it is the correct time to do so. Even containers are common, they just tend to be hand rolled.

You'll find plenty of engine devs who use C++, and are even using C++11 features (auto, lambdas, move operators to name 3) as the compilers support them and those people will also use POD structures, free functions when it makes sense (both globally and in namespaces), fixed sized arrays for the right thing and circular buffers when it is correct.

The truly great engine programmers do not tie themselves to old dogma, they continue to evolve and improve their knowledge and their methods, learning from mistakes and improving their practises as data suggests they should.

I speak from experience as an engine programmer on AAA games.

#5202023 Seperate update and draw code by thread - an Idea

Posted by phantom on 05 January 2015 - 01:05 PM

This isn't an uncommon approach already; the main difficulty is dealing with data which needs to flow between 'update' and 'render' threads - do you copy? double buffer? more?

But as things go, yes, it has been a tried and tested idea for a while now :)

The biggest issue, going forward, is that it doesn't scale - you are using two threads and thus two cores but CPUs these days can come with upwards of 4 cores so you are under-utilising the CPU. At this point people go off into 'task' systems so that work can be spread even more (although rendering submission gets stuck on a single thread right now).

However as a starter; yes this is a sane way to go about things :)
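
As a rough idea of the 'double buffer' option for passing data from update to render, here is a minimal sketch. DoubleBuffer and WorldState are hypothetical names, and the synchronisation is deliberately left out: it assumes flip() is only called at a point where neither thread is touching either buffer (e.g. at a frame boundary).

```cpp
#include <cassert>

// Two copies of the state: update writes one while render reads the other.
template <typename State>
class DoubleBuffer {
public:
    State&       writable()       { return buffers_[writeIndex_]; }      // update thread
    const State& readable() const { return buffers_[1 - writeIndex_]; }  // render thread

    // Swap roles. Must only be called while neither side is using the buffers.
    void flip() { writeIndex_ = 1 - writeIndex_; }

private:
    State buffers_[2] = {};
    int   writeIndex_ = 0;
};

// Example payload; a real game would put whatever the renderer needs in here.
struct WorldState {
    float playerX;
    int   frame;
};
```

The cost is a second copy of the shared state; the gain is that render never sees a half-updated frame.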

#5202015 do most games do a lot of dynamic memory allocation?

Posted by phantom on 05 January 2015 - 12:46 PM

but as i said before, i'm not doing monster engines. i'm mostly doing basic sims with 100 targets max kinda thing. so setting sizes upfront isn't too difficult. start with MAXTGTS=100. maybe kick it to 200 before release, that kind of thing.

Maybe you aren't, but you've rocked up here questioning things as though your minor project is in some way shining a light as to The Best way when it is just one example of a specific thing done in a specific way which works for you - let's not pretend that playing the static allocation game is anything less than wasteful bad practice which just happens to work for you.

as for global, yes its bad. its unsafe - easy to misuse.  the safety lacking in the code syntax must be replaced with coding policies and methodologies which must be well documented, and rigorously followed with strict coder discipline to avoid problems. but its the way i started, so i'm used to it. in the long run, all i really use it for its to avoid calling setter and getter methods everywhere. i suspect that it might be possible to write a game where all data is in private modules acessable only via getter and setter methods, and then you have control code modules that call getter and setter methods and perform operations on the values. the only thing that any data module would have to know about any other module would be any custom data structure definitions used to get and set its values. the only thing controller code modules would need to know would be custom data structures used to get and set values of data modules they use. an extremely modular system.  but as you can see, there would be a lot of get and set calls.

No, what I can see is a straw man argument built up using a lot of poor examples and something which screams 'bad design' at me - a good system design does not have 'getters and setters everywhere' and does not require module after module to know about the structures or internal setup of other systems.

A good system design decouples. A good system design hides. A good system design does not vomit all over your code base which is typically the case with global objects.

Fun example: Previous place I worked had a system to manage sharing system textures/render targets. This system was global. It was written by a junior with little experience and it was a mess. I designed and wrote a system to replace it (plus do more) which was not global (because the bloody thing was only used in the renderer anyway). Once the replacement was completed it took a couple of days to unwire the old system which had gotten everywhere. The new system was faster, cleaner, had more functionality and never once had a single bug tracked back to it. Nor did it have loads of 'get and set' functions.

granted , many might get inlined, but i just bypass them and do the assignments directly, IE:

And wasn't it you who not long ago had to make a change which required searching all over your code base because the thing you were changing was accessed from so many places?

think about it, if you were going to code breakout or galaga, or space invaders or missile command or pong right quick and dirty, you wouldn't break out unity, and start creating CES systems and whatnot - its overkill: "using a tank to squash an ant" as they used to say in design sciences. you'd load up a couple bitmaps, and declare a few variables, and go for it. especially if you'd done it dozens of times before. for me, writing these sims (other than caveman - its a whole different sort of beast) is kind of like that.

Just because I wouldn't use a CES system doesn't mean I'd go around hardcoding things into arrays in data segments either; no, more likely I would grab existing code to read config files... hell, with projects as trivial as that I'd probably just grab Lua and use that for the logic and just plug it into some C++ framework.

But I'm assuming your 'sims' are more detailed than the trivial examples you gave so that is again not a sane comparison nor one which is representative of the scale of things.

The point is, regardless of the scale of things I would spare some brain time to do it properly because you don't know where things are going to go and spinning a few brain cells to correctly split up code rather than vomiting out some monstrosity is the way I will do things.

By all means continue developing and coding as you've done forever... it's no skin off my nose and when I do see your code I get a wonderful amusement out of it... but at the same time don't pretend it is in any way, shape, or form 'good practise' to do things your way because it simply isn't.

#5201329 do most games do a lot of dynamic memory allocation?

Posted by phantom on 02 January 2015 - 10:43 AM

my same exact question.  from the video that inspired the post, it seems as though newing and deleting things like entities, dropped objects, and projectiles was something that games in general were commonly doing on the fly as needed. which struck me as inefficient and error prone.

Except in most cases the new/delete is going via a pre-allocated heap, so the cost of creating and deleting isn't that great and depends on the nature of the thing being created/destroyed. Most of the overhead is going to be in object deinit where its child objects are being cleaned up, which, unless you are leaving your objects in some kind of zombie state, you should be doing anyway, and the cost is practically identical.

As mentioned it depends on the things being allocated and how of course; big game entities are infrequently allocated and de-allocated so a sensible clean up via a new/delete (placement, in the C++ case) isn't going to cost you much in overall run time. Transient objects, such as structures which only exist for a frame, are likely to be simple and will more than likely use a simpler allocation scheme ('stack allocation' in the sense that you have a scratch buffer which objects are placement-new'd into for ease of construction but never deleted from; the allocation pointer is just reset to the start each frame; useful for things like rendering when you need to build temp data structures).
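
A minimal sketch of that kind of per-frame scratch allocator, to make the idea concrete. FrameScratch is a made-up name and this is not code from any particular engine; it assumes objects are trivially destructible (nothing is ever destroyed, the pointer is just rewound) and that a failed allocation is reported by returning nullptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

class FrameScratch {
public:
    // One allocation up front; everything else is pointer bumping.
    explicit FrameScratch(std::size_t bytes) : buffer_(bytes), offset_(0) {}

    // Placement-new a T into the buffer; returns nullptr if it doesn't fit.
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "objects are never destroyed, so they must not need it");
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignof(T)) - 1;
        const std::uintptr_t aligned = (base + offset_ + mask) & ~mask;  // round up
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start + sizeof(T) > buffer_.size()) return nullptr;
        offset_ = start + sizeof(T);
        return ::new (static_cast<void*>(buffer_.data() + start))
            T{std::forward<Args>(args)...};
    }

    // Called once per frame: "frees" everything by rewinding the pointer.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};
```

Typical use would be to create() whatever temp structures the frame needs while building render data, then call reset() at the end of the frame.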

games taking over memory management when asset size > ram makes much more sense. its what any good software engineer would do.

This has nothing to do with 'asset size > ram'; this is all about keeping things clean. If I don't need that memory allocated then why hang on to it? Why effectively waste it? If you know in your front end you'll need 4Meg for something but only in the front end then you might as well share that pool with the game and overlap the memory usage, allowing the system (OS) to make better use of the physical ram for other things. If you had a 4Meg chunk in the data segment then that 4Meg is now gone forever and while it might not seem much the OS can still make use of it.

If you are allocating a scratch buffer every frame of the same size then just allocate once, at start up, and just reuse it.

i'm saying - that's what i would do.

So why does the allocation speed matter? Most allocations are so infrequent as to not make a difference and frequent ones would be spotted and replaced pretty early on (or not done at all if the developers have any amount of experience).

well, fortunately for me, its not that hard to determine required data structure sizes at compile time. caveman has taken about 2-3 man-years to make. MAXPLAYERS, MAXTARGETS, and MAXPROJECILES have never changed since first being defined.  occasionally i'll up MAXTEXTURES from 300 to 400 etc, as the number of assets grows. but that's about it. and i could have simply declared all those really big, then just right-sized them once before release.

And that's a very small subset of buffers which might exist in a game and a very specific example where you apparently don't care about wastage or good software design (hint: global things are bad software design but having seen your code in the past this doesn't surprise me in the least...) but as a general case solution it does not work and it does not scale. Fixed compile time buffers are the devil; the flexibility gained from just pulling from the heap for a value pulled from a config file far outweighs anything else in the general case.

#5201327 do most games do a lot of dynamic memory allocation?

Posted by phantom on 02 January 2015 - 10:20 AM

quite true, one extra de-reference per access, i believe.  hardly anything to write home about, unless you do a couple million or billion per frame unnecessarily.

I'm intrigued as to where you have pulled this 'extra dereference' from?
A pointer to a chunk of dynamic memory and a pointer to a chunk of memory which came in with the exe are going to be the same...

#5201134 do most games do a lot of dynamic memory allocation?

Posted by phantom on 01 January 2015 - 09:12 AM

Yes, a single allocation every frame can soon add up but my question would be; why are you constantly reallocating anyway?

If you are allocating a scratch buffer every frame of the same size then just allocate once, at start up, and just reuse it.

At which point you might say 'well, why not just allocate a static array and be done with it?' and sure, you could (and for some things this is sane because it is a hard limit which you won't want to change for various reasons) but what happens when that buffer is suddenly too small? You now have to rebuild to test and find a sane number; if you had it in a data file and dynamically allocated it you'd fiddle with one buffer, reload the compiled exe and see if it works. Job done.
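
As a sketch of what 'have it in a data file and dynamically allocate it' might look like: the key=value format, the key name scratch_bytes, the 1024 fallback and the function names here are all assumptions made up for the example, not anything from a real engine.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Look for a line of the form "key=<number>" and return the number,
// or 'fallback' if the key is missing or its value doesn't parse.
std::size_t readSize(std::istream& config, const std::string& key, std::size_t fallback) {
    assert(!key.empty());
    std::string line;
    while (std::getline(config, line)) {
        const std::size_t eq = line.find('=');
        if (eq == std::string::npos || line.compare(0, eq, key) != 0) continue;
        std::istringstream value(line.substr(eq + 1));
        std::size_t n = 0;
        if (value >> n) return n;
    }
    return fallback;
}

// Allocate the scratch buffer once, at start up, sized from the config.
std::vector<unsigned char> makeScratchBuffer(std::istream& config) {
    return std::vector<unsigned char>(readSize(config, "scratch_bytes", 1024));
}
```

Changing the buffer size is then an edit to the data file and a restart, with no rebuild involved.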

There are reasons to allocate statically sized buffers, generally architectural ones such as, on the PS3, references to SPUs or on the 360 a buffer to represent GPU constant slots, but for the sake of one allocation, which can be made at start up, the flexibility far outweighs any perceived advantage of trying to guess the amounts in advance.

As for games, on the ones I've worked on, during the normal runtime phase memory allocation is very light; allocations might get made from existing pools (more than likely by allocating chunks in order and then just resetting the pointer at the start/end of the frame so it doesn't persist). Dynamic allocations tend to only happen when data is streamed in, at which point you have to pay the cost (because it is dynamic), and you can control that so you only service so many requests per frame. (and to be fair, even that can come from pre-allocated pools so you don't have to touch the system allocator and instead can use one of the numerous faster allocators out there)

#5200787 Multi-threading for performance gains

Posted by phantom on 30 December 2014 - 07:39 AM

I would argue that in a game, where you have complete control of things, that isn't a great use either; you'll either over subscribe the cores, meaning that you'll run the risk of important tasks getting starved out or at least delayed, which hurts the update rate (and wastes resources as the OS has to schedule them in, with the resulting overhead of that), OR you'll need to under use them, meaning that at any given time resources are sitting idle if they didn't have work to do.

If you've got work to do which isn't quick and needs to be run over a number of frames then write the code in such a way that it can do that so that it uses a fixed amount of time per frame to do some work and allow it to continue next time it is called.

Just because results aren't due in a frame doesn't mean the work can't be broken down to make better use of the cores and control the dispatch of the work.
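
A minimal sketch of work written so it can be resumed across frames. To keep the example deterministic it uses a fixed budget of items per call as a stand-in for a time budget (a real version might check a clock instead); IncrementalSum is a made-up example job.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A long-running job that keeps its own progress so it can be
// called a little each frame until it reports completion.
class IncrementalSum {
public:
    explicit IncrementalSum(std::vector<int> data) : data_(std::move(data)) {}

    // Process at most 'maxItems' items, then return; true once all work is done.
    bool step(std::size_t maxItems) {
        const std::size_t end = std::min(next_ + maxItems, data_.size());
        for (; next_ < end; ++next_) total_ += data_[next_];
        return done();
    }

    bool done() const { return next_ == data_.size(); }
    long long total() const { return total_; }

private:
    std::vector<int> data_;
    std::size_t next_ = 0;   // where to resume next call
    long long total_ = 0;
};
```

Each frame the scheduler calls step() with whatever budget it can spare, and the result is picked up on the frame where step() returns true.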

#5200700 Multi-threading for performance gains

Posted by phantom on 29 December 2014 - 07:14 PM

I'm not sure I buy the latency argument, not when your suggested solution is 'push messages to another thread and have that do some work to kick the read and wait on the result'; the latency difference is unlikely to be all that critical in a game situation anyway when dealing with disk IO which is already pretty high latency.

The solution I came up with, and we are talking a good 3 or 4 years back now so I don't have the code to hand (largely because I'm away from my PC), involved the Win32 ReadFile family of functions (probably ReadFileEx, but I'm not 100% on that).

The code involved was very short, in the order of 10 or 20 lines maybe, and if memory serves was a case of;
- TTB Task requests chunk of memory to load into and uses ReadFileEx to start the async read and records the handle.
- Handles were collected up
- IO check task was used to check the state of the handles (WaitForMultipleObjects series, immediate time out)
- For all completed file handles; push tasks into completed task queue for execution

As it was a proof-of-concept it basically only looped on point 3 until all the files were done.
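
Since I don't have the original to hand, here is a rough sketch of just the 'check the handles and hand off the completed ones' step. It is not the original code and it doesn't touch the Win32 API: the OS query is hidden behind a caller-supplied isComplete predicate (hypothetical), which on Windows would wrap a zero-timeout wait on the handle. PendingRead and collectCompleted are made-up names too.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One outstanding async read: an opaque handle plus the memory it loads into.
struct PendingRead {
    int                handle;       // stand-in for an OS handle
    std::vector<char>* destination;  // chunk of memory requested by the task
};

// Poll every pending read once without blocking. Finished reads are moved
// to 'completed' (ready to be turned into follow-up tasks); the rest stay
// pending. Returns how many reads are still outstanding.
template <typename IsComplete>
std::size_t collectCompleted(std::vector<PendingRead>& pending,
                             std::vector<PendingRead>& completed,
                             IsComplete isComplete) {
    std::vector<PendingRead> stillPending;
    for (const PendingRead& read : pending) {
        assert(read.destination != nullptr);
        if (isComplete(read.handle)) completed.push_back(read);
        else                         stillPending.push_back(read);
    }
    pending.swap(stillPending);
    return pending.size();
}
```

The IO check task would call this at most once a frame and push a task for each entry that lands in completed.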

The code really couldn't have been any simpler if I had tried; the most complex bit was setting up the file IO as I seem to recall the docs being a little less than clear at the time; heck that was probably the biggest segment of code in the whole test.

I'd be VERY surprised if you could write something with a lower bug count than that (only likely bugs are going to be post-hand-off in the decoding stage or whatever; memory was owned by a data structure which got passed about so it wasn't like it leaked or anything, was a direct in place load) and I'd be really surprised if the latency was anything to write home about considering that I've got one task on one thread polling at most once a frame.

If I remember I'll try to dig the code up when I get back home, although that's likely only going to happen if this thread is still on the front page as I'm going to be destroying my brain probably 3 more times before I get back to my PC where the code might still live...

#5200473 Is Unity good for learning?

Posted by phantom on 28 December 2014 - 06:58 PM

UDK isn't really a Thing any more; it is no longer being updated.

Instead if you are interested in getting hold of the Unreal Engine to use then you should be looking at UE4 at http://www.unrealengine.com.