


Why is hexadecimal used in binary model files?



20 replies to this topic

#1 gchris6810   Members   -  Reputation: 207


Posted 07 July 2013 - 12:42 PM

Hi,

 

I am trying to write a Blender exporter for my game engine's data. When I look at other exporters, they often use hexadecimal values to identify the different chunks and sub-chunks in the binary file. Why is hexadecimal used, and what does it represent in the file?




#2 Brother Bob   Moderators   -  Reputation: 7929


Posted 07 July 2013 - 12:47 PM

There's nothing special about a hexadecimal value, but its representation is convenient because each hexadecimal digit is exactly 4 bits. Thus, each pair of hexadecimal digits represents 8 bits, or exactly one byte. That is, the hexadecimal value 0x1234 is stored as the byte sequence {0x12, 0x34}, assuming big-endian storage, but as a value it is no different from the decimal value 4660.
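To see this concretely, here is a small Python sketch (Python only because Blender exporters are written in it). The literals 0x1234 and 4660 produce the same integer; byte order only comes into play when the value is serialized, here with the standard `struct` module. Which byte order a particular file format uses is defined by that format; both are shown just for comparison.

```python
import struct

value = 0x1234          # hexadecimal literal
print(value == 4660)    # True: same integer as the decimal literal

# Serialize as an unsigned 16-bit integer in both byte orders.
big = struct.pack(">H", value)     # big-endian: most significant byte first
little = struct.pack("<H", value)  # little-endian: least significant byte first

print(big.hex(" "))     # 12 34
print(little.hex(" "))  # 34 12
```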


Edited by Brother Bob, 07 July 2013 - 12:48 PM.


#3 Servant of the Lord   Crossbones+   -  Reputation: 18284


Posted 07 July 2013 - 02:28 PM

To add onto what Brother Bob says, since memory is laid out mostly as powers of two*, and 16 is a power of two, it makes it very convenient to describe memory-related values in hexadecimal.

 

1 byte = 2 digit hex exactly

2 bytes = 4 digit hex exactly

4 bytes = 8 digit hex exactly

 

Binary would work just as well, but it is too wordy: 11101010 11010101 10010011 11011101 versus just 0xEAD593DD. Hex is a lot more compact to display.

Decimal is also a lot more compact to display (3939865565), but it doesn't line up with powers of two the way binary and hexadecimal do.

Some people also use octal (base 8), though I don't see that all too often.
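Here is the same 32-bit value printed in each base, as a quick Python sketch. The per-byte loop at the end shows why hex lines up with memory: every byte is exactly two hex digits, while octal digits (3 bits each) straddle byte boundaries.

```python
value = 0xEAD593DD

# Binary grouped per byte: 8 binary digits per byte.
binary = " ".join(f"{b:08b}" for b in value.to_bytes(4, "big"))
hexa = f"0x{value:08X}"   # 8 hex digits, exactly 2 per byte
octal = f"0o{value:o}"    # octal digits don't line up with byte boundaries
decimal = str(value)

print(binary)
print(hexa)
print(octal)
print(decimal)

# Each byte maps to exactly two hex digits.
for b in value.to_bytes(4, "big"):
    print(f"{b:08b} -> {b:02X}")
```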

 

The final benefit of hexadecimal is that you can spell words in it: 0xDEADC0DE

 

*The basic units of memory (bytes) don't have to be laid out in powers of two, but they almost always are nowadays. Examples: 32-bit PCs and 64-bit PCs. Sure, there are some weird systems where bytes are 10 bits, but hey,


Edited by Servant of the Lord, 07 July 2013 - 02:32 PM.

It's perfectly fine to abbreviate my username to 'Servant' rather than copy+pasting it all the time.

[Fly with me on Twitter] [Google+] [My broken website]

All glory be to the Man at the right hand... On David's throne the King will reign, and the Government will rest upon His shoulders. All the earth will see the salvation of God. [Need web hosting? I personally like A Small Orange]
Of Stranger Flames - [indie turn-based rpg set in a para-historical French colony] | Indie RPG development journal


#4 marcClintDion   Members   -  Reputation: 431


Posted 07 July 2013 - 03:07 PM

Quote (Servant of the Lord): "The final benefit of hexadecimal is that you can spell words in it: 0xDEADC0DE"

Funny!

 

//-----------------------------------------------------------------------------------------------------------------------------------------------

 

Here are some pages that deal with what you are working on. They use a format that is a little more human-readable (decimal), I think anyway.

They aren't entirely complete but they may be the start you need.

 

 

http://38leinad.wordpress.com/2011/11/02/practical-blender-with-glkit-part-2-blender-scripting-with-python/

http://stackoverflow.com/questions/13327379/how-to-export-per-vertex-uv-coordinates-in-blender-export-script


Consider it pure joy, my brothers and sisters, whenever you face trials of many kinds, because you know that the testing of your faith produces perseverance. Let perseverance finish its work so that you may be mature and complete, not lacking anything.


#5 gchris6810   Members   -  Reputation: 207


Posted 08 July 2013 - 12:44 AM

Thanks for the great replies. There is still something I don't get. When hexadecimal is used to describe the chunks of a binary format, does it represent the size of each chunk in memory?



#6 MarkS   Prime Members   -  Reputation: 880


Posted 08 July 2013 - 02:43 AM

Binary files don't have "chunks". I'm more than a little confused by that. A binary file is a string of bytes that span the length of the file. The only time I've seen "chunks" was in a human readable format.

Typically a binary file will be formatted to have a header with offsets to the various data and that header will be immediately followed by the file data. What a binary (hexadecimal) string represents to the file loader is entirely defined by the file format.

Here is a binary file. It is a simple 5x5 pixel TGA file. The first 18 bytes are the file header and the image data starts at byte 19 (the 00 immediately following the 08).
00 00 02 00 00 00 00 00 00 00 00 00 05 00 05 00 20 08 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF 00 00 00 FF
The "02" in the header tells the loader that this is an uncompressed, true-color image.
The first "05 00" is a 16-bit little-endian word that tells the loader the width of the image; in this case, 5 pixels.
The second "05 00" is a 16-bit little-endian word that tells the loader the height of the image.
The "20" that follows is hexadecimal for 32, telling the loader that each pixel in the pixel data is 32 bits (4 bytes).
The "08" is an encoded byte, whose bits tell the loader if the image is flipped as well as the number of alpha channel bits.
The other bytes that I did not explain have special meaning as well, in certain cases, but are unused in this file.
The image data is stored as BGRA (blue, green, red, alpha) and in this case, each pixel has the values "00 00 00 FF".
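A minimal Python sketch that decodes the header of the 5x5 file above with `struct`. The field names follow the commonly documented 18-byte TGA header layout; the sketch assumes there is no color map (true for image type 2 here) and only reads the first pixel.

```python
import struct

# The 5x5 TGA from the post: 18-byte header followed by 25 BGRA pixels.
header_bytes = bytes.fromhex("00 00 02 00 00 00 00 00 00 00 00 00 05 00 05 00 20 08")
data = header_bytes + bytes.fromhex("00 00 00 FF") * 25

# TGA header fields, little-endian with no padding ("<").
TGA_HEADER = struct.Struct("<BBBHHBHHHHBB")
(id_length, colormap_type, image_type,
 cmap_first, cmap_length, cmap_entry_size,
 x_origin, y_origin, width, height,
 pixel_depth, descriptor) = TGA_HEADER.unpack_from(data, 0)

alpha_bits = descriptor & 0x0F  # low 4 bits of the descriptor: alpha channel depth
bytes_per_pixel = pixel_depth // 8

# Pixel data follows the header and the optional image ID field.
# Assumption: no color map, which holds for this image type 2 file.
pixel_offset = TGA_HEADER.size + id_length
b, g, r, a = data[pixel_offset:pixel_offset + bytes_per_pixel]

print(image_type, width, height, pixel_depth, alpha_bits)  # 2 5 5 32 8
print((b, g, r, a))                                        # (0, 0, 0, 255)
```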

If you are seeing anything other than hexadecimal in a file, it isn't a binary file.

Edited by MarkS, 08 July 2013 - 02:56 AM.


#7 dave j   Members   -  Reputation: 587


Posted 08 July 2013 - 04:51 AM

Quote (Servant of the Lord): "The final benefit of hexadecimal is that you can spell words in it: 0xDEADC0DE"

One version of the BBC Micro OS had the value 0xD1ED in the two bytes starting at location 0xD0D0.

#8 gchris6810   Members   -  Reputation: 207


Posted 08 July 2013 - 09:20 AM

So how would I know where the different pieces of the file are, so that I can reference them in hexadecimal? Surely that would depend on the size of the data in the file.



#9 Brother Bob   Moderators   -  Reputation: 7929


Posted 08 July 2013 - 09:26 AM

Sounds like your problem has nothing to do with hexadecimal or any other number base, but with the format itself. What actual data to write is determined by the format you want to read or write. You said you're trying to write an exporter for your own game engine, so you should have a good idea of what has to be written to the file. Can you explain in more detail what your problem actually is?


Edited by Brother Bob, 08 July 2013 - 09:27 AM.


#10 gchris6810   Members   -  Reputation: 207


Posted 08 July 2013 - 09:31 AM

Well, my problem is that I need to be able to load model data (vertices, UVs, lighting attributes, textures) into my engine, but I also want skeletal animation. Most of the formats I found that support animation were either too bloated and not really suitable for game use (COLLADA, FBX) or text-based (MD5), which I don't really want. So I resolved to make my own binary format. After looking at a few example Python exporters (3DS, FBX), I found hex was often used, and I needed to know exactly how those exporters worked so I could produce one myself that wasn't just a copy.

 

I hope this is enough explanation. 



#11 Brother Bob   Moderators   -  Reputation: 7929


Posted 08 July 2013 - 09:58 AM

As stated earlier, a hexadecimal value is nothing more than a value. You can use decimal values instead if you like; there is nothing special about hexadecimal values, only about their textual representation.

 

But as I suspected, it sounds like your question is not why hexadecimal values are used, but why specific hexadecimal values are used. It is not a question why for example 0x1234 is used instead of 4660, but why the value 0x1234, or equivalently 4660, was used in the first place and what it actually means. That can only be determined by studying the format the code is written to export which states what data has to be written where in the file.



#12 gchris6810   Members   -  Reputation: 207


Posted 08 July 2013 - 10:04 AM

Okay, thanks. As you said, I think I'll look further into other file formats to figure out exactly how a binary file is made.


Edited by gchris6810, 08 July 2013 - 10:04 AM.


#13 Servant of the Lord   Crossbones+   -  Reputation: 18284


Posted 08 July 2013 - 12:10 PM

Quote (MarkS): Binary files don't have "chunks". I'm more than a little confused by that. A binary file is a string of bytes that span the length of the file. The only time I've seen "chunks" was in a human readable format.

The terms 'chunks' and 'sections' are used to describe different portions of some binary file formats. In your TGA file example, the file format might call the header portion of bytes the "header chunk", and the pixel portion of bytes the "pixel chunk" (for example).
Some binary file formats even allow their 'chunks' to be put in whatever order, but use code values to identify them so you know how to process them.
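As a sketch of that idea, here is a Python loop that walks chunks in whatever order they appear and skips IDs it doesn't recognize. It assumes a 3DS-style header (a 2-byte little-endian ID followed by a 4-byte length that includes the 6-byte header); the chunk IDs in the sample buffer and the `iter_chunks` name are hypothetical.

```python
import struct

CHUNK_HEADER = struct.Struct("<HI")  # 2-byte ID, 4-byte length (includes the header)


def iter_chunks(buf, start=0, end=None):
    """Yield (chunk_id, payload) for each chunk between start and end."""
    end = len(buf) if end is None else end
    pos = start
    while pos + CHUNK_HEADER.size <= end:
        chunk_id, length = CHUNK_HEADER.unpack_from(buf, pos)
        if length < CHUNK_HEADER.size or pos + length > end:
            raise ValueError(f"bad chunk 0x{chunk_id:04X} at offset {pos}")
        yield chunk_id, buf[pos + CHUNK_HEADER.size:pos + length]
        pos += length  # the length lets us skip chunks we don't understand


# Two hypothetical chunks: 0x0002 with a 4-byte payload, then 0xBEEF with 1 byte.
buf = (struct.pack("<HI", 0x0002, 10) + b"\x03\x00\x00\x00"
       + struct.pack("<HI", 0xBEEF, 7) + b"\x2a")

handlers = {0x0002: lambda p: print("version", struct.unpack("<I", p)[0])}
for cid, payload in iter_chunks(buf):
    handler = handlers.get(cid)
    if handler:
        handler(payload)
    else:
        print(f"skipping unknown chunk 0x{cid:04X} ({len(payload)} bytes)")
```

Because every chunk carries its own length, a loader written against an older version of a format can still step over chunk types added later.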



#14 MarkS   Prime Members   -  Reputation: 880


Posted 08 July 2013 - 12:17 PM

Quote (Servant of the Lord):
    Quote (MarkS): Binary files don't have "chunks". I'm more than a little confused by that. A binary file is a string of bytes that span the length of the file. The only time I've seen "chunks" was in a human readable format.
    The terms 'chunks' and 'sections' are used to describe different portions of some binary file formats. In your TGA file example, the file format might call the header portion of bytes the "header chunk", and the pixel portion of bytes the "pixel chunk" (for example).
    Some binary file formats even allow their 'chunks' to be put in whatever order, but use code values to identify them so you know how to process them.


I thought of that, but it didn't seem to be what he is asking. I'm more certain that he is looking at a human readable file format with hexadecimal tags.

#15 gchris6810   Members   -  Reputation: 207


Posted 08 July 2013 - 01:23 PM

No, I don't think that's what I'm looking for. The file format I was referencing was the 3DS file format, which, now that I have looked into it, seems fairly unusual in using chunks identified by hexadecimal values. What I was trying to find out was why certain hex numbers were used. In the 3DS file format spec the primary chunk is 0x4D4D, which seems quite random. Why is this used? Here is the link to the spec if necessary: http://www.martinreddy.net/gfx/3d/3DS.spec. Thanks.



#16 Brother Bob   Moderators   -  Reputation: 7929


Posted 08 July 2013 - 01:40 PM

The document you linked is not the specification itself, but apparently reverse-engineered documentation. It explicitly says that the specification had not (at the time) been released.

 

If you want to know the reasoning behind the choice of the value 19789 for the main chunk and it isn't written down anywhere, you have to ask the authors of the original format (and, as I implied in the first paragraph, the linked document is not the specification and thus not written by the original authors). Perhaps there was a reason for that particular value, or perhaps it was just random.

 

Now, some formats do encode properties in the chunk name, so it is not an unreasonable question to ask. But the specification should note that if the information is important. Otherwise the number is just arbitrary.



#17 MarkS   Prime Members   -  Reputation: 880


Posted 08 July 2013 - 01:42 PM

I see now.

These values may be arbitrary, or more likely, they are chosen because they are unlikely/less likely to show up as data values.

[edit]
I was doing several things at once and typed slowly. Brother Bob beat me to it.


Edited by MarkS, 08 July 2013 - 01:43 PM.


#18 Bregma   Crossbones+   -  Reputation: 4873


Posted 08 July 2013 - 01:46 PM

While the numbers do look fairly arbitrary, they do seem systematic.  I'd like to point out that the code for the main chunk (0x4d4d) is 'MM' in ASCII.  Some other codes are likewise two-letter ASCII combinations, but not all are.
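Bregma's observation can be checked in a couple of lines of Python. 3DS stores these IDs as 16-bit little-endian values; for 0x4D4D both bytes are the same, so it reads "MM" in either byte order, while an ID such as 0x4100 is not printable text.

```python
import struct

MAIN_CHUNK = 0x4D4D

print(MAIN_CHUNK)                                      # 19789
print(MAIN_CHUNK.to_bytes(2, "little"))                # b'MM'
print(struct.pack("<H", MAIN_CHUNK).decode("ascii"))   # MM

# Not every ID is readable: 0x4100 (Triangular Mesh) is b'\x00A' on disk.
print((0x4100).to_bytes(2, "little"))
```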


Stephen M. Webb
Professional Free Software Developer

#19 Servant of the Lord   Crossbones+   -  Reputation: 18284


Posted 08 July 2013 - 03:10 PM

Looking here, assuming this is the same format you're talking about, it seems some of the numbers are spaced so as to provide future expansion or revisions.

 

For example:

0x4000 // Object Block
├─ 0x4100 // Triangular Mesh
│  ├─ 0x4110 // Vertices List
│  ├─ 0x4120 // Faces Description
│  │  ├─ 0x4130 // Faces Material
│  │  └─ 0x4150 // Smoothing Group List
│  ├─ 0x4140 // Mapping Coordinates List
│  └─ 0x4160 // Local Coordinates System
├─ 0x4600 // Light
│  └─ 0x4610 // Spotlight
└─ 0x4700 // Camera

You'll notice they leave spaces for future blocks, and I'd bet that the lowest digit (0x414X) is for future versions of the same block.

Once they start putting out files with numbers, those numbers are set in stone. They can add new numbers, but they can't reuse old numbers without breaking backwards compatibility. If they need an arbitrary number for identification, e.g. to identify a "light" chunk (0x4600) of the object block (0x4000), then they might as well space out their numbers enough to leave room for additional chunk types, additional subchunks (like Spotlight, 0x4610), and versions (0x461X). I'm speculating that the final digit is for versions.
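For an exporter, nested IDs like these suggest one way to write chunks: emit the ID with a placeholder length, write the contents (including child chunks), then go back and patch in the real length. This Python sketch assumes the same 3DS-style 6-byte header described in the spec linked earlier in the thread (2-byte ID, 4-byte length that includes the header); the `chunk` and `write_vertices` helpers, the object name, and the vertex data are hypothetical, and a complete 3DS file needs more chunks than this.

```python
import io
import struct
from contextlib import contextmanager

# IDs from the tree above (the spacing leaves gaps for future chunk types).
OBJECT_BLOCK = 0x4000
TRIANGULAR_MESH = 0x4100
VERTICES_LIST = 0x4110


@contextmanager
def chunk(out, chunk_id):
    """Write a 6-byte chunk header, run the body, then patch in the real length."""
    start = out.tell()
    out.write(struct.pack("<HI", chunk_id, 0))  # placeholder length
    yield
    end = out.tell()
    out.seek(start + 2)                         # jump back to the length field
    out.write(struct.pack("<I", end - start))   # length counts the header too
    out.seek(end)


def write_vertices(out, vertices):
    """Hypothetical payload: a uint16 count followed by xyz float triples."""
    out.write(struct.pack("<H", len(vertices)))
    for x, y, z in vertices:
        out.write(struct.pack("<3f", x, y, z))


out = io.BytesIO()
with chunk(out, OBJECT_BLOCK):
    out.write(b"cube\x00")  # hypothetical null-terminated object name
    with chunk(out, TRIANGULAR_MESH):
        with chunk(out, VERTICES_LIST):
            write_vertices(out, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])

data = out.getvalue()
print(len(data), data[:6].hex(" "))
```

The back-patching means the exporter never has to compute a child's size in advance, which keeps the nesting code simple.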

 

See FourCC (did someone already post that? I thought someone did). 

"In 1985, Electronic Arts introduced the Interchange File Format (IFF) meta-format (family of file formats), originally devised for use on the Amiga. These files consisted of a sequence of "chunks" which could contain arbitrary data, each chunk prefixed by a four-byte ID. The IFF specification explicitly mentions that the origins of the FourCC idea lie with Apple.

...

Other file formats that make important use of the four-byte ID concept are the Standard MIDI File Format, the PNG image file format, the 3DS (3D Studio Max) mesh file format and the ICC profile format."

 

It also says, "Four byte identifiers are useful because they can be made up of four human-readable characters with mnemonic qualities, while still fitting in the four byte memory space typically allocated for integers in 32-bit systems (although endian issues may make them less readable). Thus, the codes can be used efficiently in program code as integers as well as giving cues in binary data streams when inspected."
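A small Python sketch of packing four characters into an integer and back. Putting the first character in the lowest byte (little-endian) is an assumption that matches how the bytes sit in memory on x86; it also shows the "endian issues" from the quote, since the hex value reads backwards.

```python
import struct


def fourcc(code: str) -> int:
    """Pack four ASCII characters into a 32-bit int, first character in the lowest byte."""
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a FourCC needs exactly four ASCII characters")
    return struct.unpack("<I", raw)[0]


def fourcc_to_str(value: int) -> str:
    """Inverse of fourcc(): turn the 32-bit int back into its four characters."""
    return struct.pack("<I", value).decode("ascii")


riff = fourcc("RIFF")
print(hex(riff))            # 0x46464952: the characters appear reversed as a number
print(fourcc_to_str(riff))  # RIFF
print(fourcc("IDP2"))       # 844121161, the MD2 magic number
```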

 

So, when looking at binary files in a hex editor, it's easier to spot the chunk identifiers. See 0xDEADC0DE again.

This kind of thing is useful for debugging. Imagine trying to figure out where you are in a pile of hex values in RAM in Microsoft Visual Studio, and suddenly you see 0xBAADF00D. You know the memory was A) Allocated on the heap and B) never initialized properly.


Edited by Servant of the Lord, 08 July 2013 - 03:20 PM.



#20 kburkhart84   Members   -  Reputation: 1600


Posted 08 July 2013 - 03:57 PM

An example of a binary model format I've worked with in the past is MD2, originally used by Quake 2. The header of every model starts with 4 bytes that form a single int and, read as 4 characters, spell "IDP2". ID and '2' make sense, but I don't know what the P is for. The next 4 bytes must be the integer value 8. This is supposedly the MD2 version, but I don't think it was ever used for anything. Other parts of the header hold things like the number of frames, UVs, vertices, etc. Also included are offsets that say how far to move the "file cursor" to get to certain things in the file.
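Here is a hedged Python sketch of reading that header. It assumes the commonly documented layout of 17 little-endian 32-bit integers (magic, version, sizes, counts, then offsets); the field names are descriptive labels and may not match id Software's source exactly, and the test file is fabricated with an empty body.

```python
import struct

# Commonly documented MD2 header: 17 little-endian signed 32-bit ints.
MD2_HEADER = struct.Struct("<17i")
MD2_FIELDS = (
    "ident", "version", "skinwidth", "skinheight", "framesize",
    "num_skins", "num_vertices", "num_st", "num_tris", "num_glcmds", "num_frames",
    "ofs_skins", "ofs_st", "ofs_tris", "ofs_frames", "ofs_glcmds", "ofs_end",
)
MD2_IDENT = int.from_bytes(b"IDP2", "little")  # the 4 ASCII chars read as one int
MD2_VERSION = 8


def read_md2_header(buf: bytes) -> dict:
    header = dict(zip(MD2_FIELDS, MD2_HEADER.unpack_from(buf, 0)))
    if header["ident"] != MD2_IDENT or header["version"] != MD2_VERSION:
        raise ValueError("not an MD2 file")
    return header


# Hypothetical file with an empty body: every offset points just past the header.
fake = MD2_HEADER.pack(MD2_IDENT, MD2_VERSION, 256, 256, 0,
                       0, 0, 0, 0, 0, 0,
                       68, 68, 68, 68, 68, 68)
header = read_md2_header(fake)
print(header["ident"], header["version"], header["ofs_tris"])
# A real loader would now seek to header["ofs_tris"] and read num_tris triangles.
```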

 

I've also worked a bit with the OBJ file format. It is a text format, but you can learn something from it that could apply to binary formats. Instead of having offsets to "chunks", each line simply starts with a one- or two-letter prefix that says what the line is. So 'v' is a vertex, 'vt' is a UV coordinate, and 'vn' is a normal. These are basically lists of vertices, etc., and then you have a list of faces that index into those lists. A triangle could be 5/4/1 6/2/3 4/1/2, which means the vertex positions are the 5th, 6th, and 4th entries in the 'v' list, the UVs are the 4th, 2nd, and 1st entries in the 'vt' list, and the normals are the 1st, 3rd, and 2nd entries in the 'vn' list. That is how you would reconstruct each triangle from the lists.
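A minimal Python sketch of resolving those face indices. It assumes every face corner has all three indices in v/vt/vn form and uses positive 1-based indices; real OBJ files also allow forms like 5//1 and negative indices, which this does not handle. The sample data is made up so that the face from the post resolves.

```python
def parse_obj_triangles(text):
    """Resolve 'f v/vt/vn ...' corners against the v, vt and vn lists."""
    positions, uvs, normals, faces = [], [], [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        tag, args = parts[0], parts[1:]
        if tag == "v":
            positions.append(tuple(float(a) for a in args[:3]))
        elif tag == "vt":
            uvs.append(tuple(float(a) for a in args[:2]))
        elif tag == "vn":
            normals.append(tuple(float(a) for a in args[:3]))
        elif tag == "f":
            # OBJ indices are 1-based, so subtract 1 for Python lists.
            faces.append([tuple(int(i) - 1 for i in corner.split("/")) for corner in args])
    # Resolve indices only after all lists have been read.
    return [[(positions[v], uvs[vt], normals[vn]) for v, vt, vn in face] for face in faces]


sample = """
v 0 0 0
v 1 0 0
v 0 1 0
v 0 0 1
v 1 1 0
v 1 0 1
vt 0 0
vt 1 0
vt 0 1
vt 1 1
vn 0 0 1
vn 0 1 0
vn 1 0 0
f 5/4/1 6/2/3 4/1/2
"""
tri = parse_obj_triangles(sample)[0]
for position, uv, normal in tri:
    print(position, uv, normal)
```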

 

I have also created a bit of software for GameMaker that converts a series of OBJ files into my own binary format. The "header" simply says how many frames the file has and how many faces there are in each one. Instead of storing a list of vertices, I store the faces themselves, so the file may be slightly larger than it has to be, but it is easier to read and convert into data for GameMaker. Instead of using offsets, you simply read the number of bytes you need, so each frame follows the previous one in the binary file.
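A sketch of that kind of sequential, offset-free format in Python. The exact layout (two uint32 counts, then faces made of three xyz float vertices) is a hypothetical stand-in, not kburkhart's actual format; the point is that the reader simply consumes bytes in the order the writer produced them.

```python
import io
import struct

# Hypothetical layout: uint32 frame_count, uint32 faces_per_frame, then
# frame_count * faces_per_frame faces, each 3 vertices of 3 little-endian floats.
HEADER = struct.Struct("<II")
FACE = struct.Struct("<9f")


def write_frames(out, frames):
    faces_per_frame = len(frames[0]) if frames else 0
    out.write(HEADER.pack(len(frames), faces_per_frame))
    for frame in frames:
        if len(frame) != faces_per_frame:
            raise ValueError("every frame must have the same number of faces")
        for face in frame:
            out.write(FACE.pack(*(c for vertex in face for c in vertex)))


def read_frames(inp):
    frame_count, faces_per_frame = HEADER.unpack(inp.read(HEADER.size))
    frames = []
    for _ in range(frame_count):
        frame = []
        for _ in range(faces_per_frame):
            v = FACE.unpack(inp.read(FACE.size))  # no offsets: just read the next face
            frame.append((v[0:3], v[3:6], v[6:9]))
        frames.append(frame)
    return frames


frames = [
    [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))],  # frame 0
    [((0.0, 0.0, 0.5), (1.0, 0.0, 0.5), (0.0, 1.0, 0.5))],  # frame 1
]
buf = io.BytesIO()
write_frames(buf, frames)
buf.seek(0)
print(read_frames(buf) == frames, len(buf.getvalue()))
```

The trade-off is that you cannot jump straight to frame N without reading or skipping everything before it; offset tables like MD2's exist to make that kind of random access cheap.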

 

The thing to understand about file formats is that they can contain basically whatever you need.  You choose whatever you want to be in them, and as long as whoever is doing the reading knows the format, they should be able to read it.  Some formats store things based on offsets, while others simply assume that you are going to read through the whole thing, and would do so sequentially and therefore not need offsets.

 

My honest opinion is that you are likely better off just using a known format, whichever one best fits your needs. I'm not saying to blatantly waste space, but on modern PCs a bit of waste in media files most likely won't hurt anything. And you'll save time in the long run, time that could be better spent on actual game design than on creating file formats and exporters.










