Endian. When do you care?

Quote:Original post by asp_
If I were to venture a guess, I doubt your bottleneck would ever be that code, no matter if you split it into one call per field. I doubt there's any sane reason to do it except laziness.

It's also not just platform specific but compiler specific and affected by a lot of factors such as build settings and so on.

I'd say this is as close to a never as you'll get.


Your ventured guess is wrong. Most console games (at least the ones that have decent load times) do something exactly like this. Data structures are read straight off disk into memory, pointers are fixed up, and off you go: loading is nothing more than a sequential read into memory and a pointer fix-up pass. Hell, some games even skip the pointer fix-up and read the pointers in unchanged, but that really is playing with fire.
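A minimal sketch of the idea (hypothetical blob layout and field names, not from any particular engine): pointers are stored on disk as byte offsets from the start of the blob, and a fix-up pass turns them back into real addresses after one big sequential read.

#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical runtime format: a pointer field is written to disk as an
// offset from the start of the file.
struct Mesh
{
    uint32_t vertexCount;
    float*   positions;   // on disk: byte offset into the blob
};

// One sequential read, then a fix-up pass converting offsets to pointers.
Mesh* LoadMesh(const char* path, std::vector<char>& blob)
{
    FILE* f = std::fopen(path, "rb");
    if (!f) return 0;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    blob.resize(size);
    std::fread(&blob[0], 1, size, f);
    std::fclose(f);

    Mesh* mesh = reinterpret_cast<Mesh*>(&blob[0]);
    uintptr_t offset = reinterpret_cast<uintptr_t>(mesh->positions);
    mesh->positions = reinterpret_cast<float*>(&blob[0] + offset);
    return mesh;
}

Everything the mesh needs lives in one contiguous allocation, so there is exactly one read request no matter how complex the data is.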

It is platform specific but that's what you have asset pipelines for - to convert your non-platform-specific source assets into very-platform-specific runtime assets that load quickly and are formatted optimally for the target platform.

It may not be 'clean' or very cross-platform but sometimes this kind of thing is necessary for performance, particularly on consoles. If more PC games did things this way maybe we wouldn't see PC games that take way longer to load from hard disk than console games load from optical disks.

Game Programming Blog: www.mattnewport.com/blog

Yeah, I overlooked DVD and CD drives, where the penalty for a seek and cache misses is extremely high. I'm having some trouble seeing how doing large block reads with a high percentage of wasted data is efficient when there is a hardware cache, which would clearly prevent repeated seeks on each read.

I can see this coming with a fairly large penalty if there is no software caching, resulting in data being requested over the bus for each read. I guess older consoles, which are RAM-starved, would generally have a small stream cache.

Seems to me that the best method in this scenario would be to implement a generic container object which serializes to and from memory, reading and writing optimally sized blocks to disk. It would also mean you do not waste throughput on crap data.
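Something along those lines might look like this - a hypothetical buffered reader (class name and block size are illustrative) that pulls large blocks off the device and serves small field-sized requests out of memory:

#include <algorithm>
#include <cstdio>
#include <cstring>
#include <vector>

class BufferedReader
{
    FILE*             m_file;
    std::vector<char> m_buffer;
    size_t            m_pos;    // read position within the buffer
    size_t            m_valid;  // bytes of valid data in the buffer

public:
    explicit BufferedReader(FILE* file, size_t blockSize = 64 * 1024)
        : m_file(file), m_buffer(blockSize), m_pos(0), m_valid(0) {}

    // Serve small reads from the in-memory block; hit the device only
    // when the block runs dry.
    bool Read(void* dest, size_t bytes)
    {
        char* out = static_cast<char*>(dest);
        while (bytes > 0)
        {
            if (m_pos == m_valid)
            {
                m_valid = std::fread(&m_buffer[0], 1, m_buffer.size(), m_file);
                m_pos = 0;
                if (m_valid == 0) return false; // EOF or error
            }
            size_t chunk = std::min(bytes, m_valid - m_pos);
            std::memcpy(out, &m_buffer[m_pos], chunk);
            m_pos += chunk; out += chunk; bytes -= chunk;
        }
        return true;
    }
};

Small Read() calls then cost a memcpy rather than a trip to the device.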

Quote:
If more PC games did things this way maybe we wouldn't see PC games that take way longer to load from hard disk than console games load from optical disks.

I don't think there's any performance difference on systems with larger software caches. Maybe I'm mistaken here?

Quote:
Never simply dump a structure to file like this. Write each field separately because the structure will contain padding between fields in order to align data effectively. This means you'll end up writing crap to the file and the padding may be different between platforms.


This is safe (and the most efficient way of working, as you don't have to bother with lots of little reads from disk) as long as you manually pad the structure so all members are aligned to the size of their types. So actually, to be safe, the structure should be rewritten as:


struct MyStruct
{
    char  name[32];
    int   a;
    float c;       // on some platforms this will align to a 32-bit boundary
    short b;
    char  pad[2];  // explicitly pad to a 32-bit boundary just to be ultra safe (don't think this is necessary)
};
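If you take this route, it's also worth asserting the layout at compile time, so that a drift in padding on one platform doesn't silently break the file format. A sketch using the classic negative-array-size trick (static_assert didn't exist yet when this thread was written; the macro name is made up):

// Fails to compile if the expression is false: an array of size -1 is illegal.
#define COMPILE_TIME_ASSERT(expr) typedef char compile_time_check[(expr) ? 1 : -1]

// 32 + 4 + 4 + 2 + 2 = 44 bytes, the size the file format expects.
COMPILE_TIME_ASSERT(sizeof(MyStruct) == 44);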



Quote:
So actually to be safe the structure should be re-written as...

That sounds like an even worse idea. Now you're exposing padding to the user of the data structure. What about IA-64? Are you going to include #if defined()s for all possible platforms?
Quote:Original post by asp_
Yeah, I overlooked DVD and CD drives, where the penalty for a seek and cache misses is extremely high. I'm having some trouble seeing how doing large block reads with a high percentage of wasted data is efficient when there is a hardware cache, which would clearly prevent repeated seeks on each read.
It's less the seek penalty - if your small reads are sequential, you won't really be seeking around - and more the housekeeping overhead: checking that you don't need to seek, packing and sending the read request over the bus, etcetera.

Quote:I can see this coming with a fairly large penalty if there is no software caching, resulting in data being requested over the bus for each read. I guess older consoles, which are RAM-starved, would generally have a small stream cache.
Aye.

Quote:Seems to me that the best method in this scenario would be to implement a generic container object which serializes to and from memory, reading and writing optimally sized blocks to disk. It would also mean you do not waste throughput on crap data.
So... a software cache [wink] Moving stuff around in memory is going to be faster, sure. But why move it around at all when you can prepare all the data offline? Bandwidth on the consoles we're talking about isn't usually the problem - it's more the latency. Making a tenth of the read requests with ten times the data per request can still end up faster.
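Illustrative numbers (mine, not from the discussion): suppose each request to an optical drive costs on the order of 100 ms in seek and housekeeping latency, and the drive streams at about 10 MB/s. Ten separate 1 MB reads then cost roughly 10 x (0.1 s + 0.1 s) = 2 s, while a single 10 MB read - even if half of it is padding you throw away - costs roughly 0.1 s + 1 s = 1.1 s.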

Quote:
I don't think there's any performance difference on systems with larger software caches. Maybe I'm mistaken here?
There's a performance difference - moving stuff around in memory doesn't come for free - but it's not nearly as big as reading field-by-field from a CD/DVD.

Another benefit of single-large-read is asynchronous transfer: after telling it "read N bytes from position X to location Y," you can get on with running a loading screen or FMV or something without needing to micromanage the data. Sure, on some platforms you can achieve that using a separate thread, but if the hardware itself has support for asynchronous reads from CD/DVD, it's best to use it. Especially if the CPU isn't multi-core, where running a CPU thread for it would cut into whatever else you want to be doing. (I believe the original Xbox is like this.)
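On Windows the same pattern is overlapped I/O; a rough sketch (error handling omitted, and the file name, buffer, and offset are hypothetical stand-ins for real asset data):

#include <windows.h>

void LoadAsync()
{
    static char destination[1024 * 1024];      // "location Y"
    const DWORD numBytes = sizeof(destination); // "N bytes"
    const DWORD position = 0;                   // "position X"

    HANDLE file = CreateFileA("assets.pak", GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);

    OVERLAPPED ov = {};
    ov.Offset = position;

    // Returns immediately; the read proceeds while we do other work.
    ReadFile(file, destination, numBytes, NULL, &ov);

    // ... run the loading screen / FMV here ...

    DWORD bytesRead = 0;
    GetOverlappedResult(file, &ov, &bytesRead, TRUE); // block until done
    CloseHandle(file);
}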

Richard "Superpig" Fine - saving pigs from untimely fates - Microsoft DirectX MVP 2006/2007/2008/2009
"Shaders are not meant to do everything. Of course you can try to use it for everything, but it's like playing football using cabbage." - MickeyMouse

Quote:Original post by asp_
That sounds like an even worse idea.


Not at all - it's only a small number of structures and classes that are transferred by disc or network, and it's pretty easy to ensure these are padded correctly.

Quote:
Now you're exposing padding to the user of the data structure. What about IA-64? Are you going to include #if defined()s for all possible platforms?


They are problematic whichever route you take. The easiest way is to define explicitly sized floats and ints and always use these, e.g.:



struct MyStruct
{
    char      name[32];
    MyInt32   a;
    MyFloat32 c;       // on some platforms this will align to a 32-bit boundary
    MyInt16   b;
    char      pad[2];  // explicitly pad to a 32-bit boundary just to be ultra safe (don't think this is necessary)
};
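The MyInt32/MyFloat32/MyInt16 names above aren't from a standard header; one hypothetical way to define them, with a compile-time check that the sizes really are what the names promise:

// Hypothetical explicitly-sized types. On common ILP32/LP64 compilers these
// underlying types have the advertised sizes, but every port should verify it.
typedef int   MyInt32;
typedef short MyInt16;
typedef float MyFloat32;

// An array of negative size is illegal, so these fail to compile on any
// platform where the assumption breaks.
typedef char AssertMyInt32  [sizeof(MyInt32)   == 4 ? 1 : -1];
typedef char AssertMyInt16  [sizeof(MyInt16)   == 2 ? 1 : -1];
typedef char AssertMyFloat32[sizeof(MyFloat32) == 4 ? 1 : -1];

(Where C99's <stdint.h> is available, int32_t and int16_t cover the integer half of this for you.)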

Quote:
So... a software cache ...

Yeah, of course - it's a standard way of doing it because it's a good solution. It would basically just be slapping a buffer onto the stream (which should have been there all along, imho).

Quote:
There's a performance difference - moving stuff around in memory doesn't come for free.

Copying data from one location to another in RAM isn't exactly critical when loading from a DVD or CD... [grin]

Quote:
Another benefit of single-large-read is asynchronous transfer...

This is probably the largest benefit from a simplicity viewpoint.
Quote:Original post by griffin2000
Quote:Original post by asp_
That sounds like an even worse idea.


Not at all - it's only a small number of structures and classes that are transferred by disc or network, and it's pretty easy to ensure these are padded correctly.


That's a whopper of an assumption... in the codebase I'm currently working with I'd estimate we have something in the region of 500 different message-data structures that have to be sent over the network. Most of them are tiny, but padding them all by hand would be a fairly mammoth task.

Richard "Superpig" Fine - saving pigs from untimely fates - Microsoft DirectX MVP 2006/2007/2008/2009
"Shaders are not meant to do everything. Of course you can try to use it for everything, but it's like playing football using cabbage." - MickeyMouse

Quote:Original post by asp_
Quote:
So... a software cache ...

Yeah, of course - it's a standard way of doing it because it's a good solution. It would basically just be slapping a buffer onto the stream (which should have been there all along, imho).
Aye. Though I wonder if YHO would change if I only gave you a system with 32MB of RAM [wink]

Quote:
Quote:
There's a performance difference - moving stuff around in memory doesn't come for free.

Copying data from one location to another in RAM isn't exactly critical when loading from a DVD or CD... [grin]
Aww. You're not a fan of trying to squeeze the last few drops of blood from a stone, eh? [grin]

Richard "Superpig" Fine - saving pigs from untimely fates - Microsoft DirectX MVP 2006/2007/2008/2009
"Shaders are not meant to do everything. Of course you can try to use it for everything, but it's like playing football using cabbage." - MickeyMouse

Quote:Original post by superpig

That's a whopper of an assumption... in the codebase I'm currently working with I'd estimate we have something in the region of 500 different message-data structures that have to be sent over the network. Most of them are tiny, but padding them all by hand would be a fairly mammoth task.


True... so in that case you would have to rely on compiler packing flags, or write/read them one element at a time. But to me, the best solution is to explicitly design your streamable structures with platform independence in mind. If you do this, and comment it appropriately, it should help avoid someone creating a structure that no amount of packing flags will make streamable (such as one with a pointer, a vtable pointer, or bit-field flags).
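One way to enforce part of that design rule at compile time (modern C++ type traits, so not something available when this thread was written - purely illustrative):

#include <type_traits>

// Rejects structures with vtable pointers, non-trivial members, etc.
// Raw pointers still pass this check, so it complements a design review
// rather than replacing it.
template <typename T>
void StreamOut(const T& value)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types may be streamed");
    // ... write sizeof(T) bytes of 'value' to the stream ...
    (void)value;
}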

