4 Bytes into an unsigned short?

8 comments, last by MrPickle 15 years, 5 months ago
I've been looking at some 3DS loading tutorials, and there's something I don't understand. Or it may just be an error I introduced while converting the code from C to C++. Here's a little snippet to make it easier to explain:
struct ChunkInfo {
    unsigned short ID; //2 Bytes, yes?
    int Size; //4 Bytes, yes?
};

//data is std::vector<unsigned char> and is set like this:
//...
    in.seekg(0, std::ios_base::end);
    size = in.tellg();
    in.seekg(0, std::ios_base::beg);

    data.resize(size);
    if(size > 0) {
        in.read((char*)&data[0], sizeof(unsigned char)*size);
        in.close();
    }

//...

ChunkInfo C3DS::getChunkInfo(int offset) {
    std::copy(&data[offset], &data[offset+2], &info.ID); //Read 2 Bytes from data?
    std::copy(&data[offset+2], &data[offset+4], &info.Size); //Read 4 Bytes data?
}
What I don't get is this: info.ID is meant to be something like 0x4D4D (or lots of other values like that), but isn't 0x4D4D 4 bytes? info.ID is an unsigned short, so it's 2 bytes. So info.ID is never going to equal 0x4D4D; it can be 0x4D, but not 0x4D4D??
Here to learn :)
std::copy(&data[offset], &data[offset+2], &info.ID); //Read 2 Bytes from data? Yes, bytes 0 and 1 (but not 2)
std::copy(&data[offset+2], &data[offset+4], &info.Size); //Read 4 Bytes data? No, 2 bytes here also: bytes 2 and 3 (but not 4)


The idiom is std::copy(&data[start],&data[end_that_is_not_included],...);
This idiom isn't endian- or architecture-safe either, so you may want a better tutorial source. And yes, they are reading a different number of bytes into info.Size than info.Size itself occupies (typically 4 bytes).
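To make the half-open [start, end) convention concrete, here is a minimal sketch. `copy_range` is a hypothetical helper, not from any tutorial; it copies `data[start]` through `data[end - 1]` and reports how many bytes that was, so you can see that `[offset+2, offset+4)` is only 2 bytes while a 4-byte field needs `[offset+2, offset+6)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies data[start] .. data[end - 1] into `out`, using the
// same half-open [start, end) convention as std::copy(&data[start], &data[end], out).
// The element at index `end` is NOT copied. Returns the number of bytes copied.
std::size_t copy_range(const std::vector<unsigned char>& data,
                       std::size_t start, std::size_t end,
                       unsigned char* out) {
    assert(start <= end && end <= data.size());
    const unsigned char* first = data.data() + start;
    const unsigned char* last = data.data() + end;  // one past the last byte copied
    unsigned char* written_end = std::copy(first, last, out);
    return static_cast<std::size_t>(written_end - out);
}
```

With a 6-byte header buffer, `copy_range(d, 2, 4, buf)` returns 2 and `copy_range(d, 2, 6, buf)` returns 4. Using `data.data() + end` instead of `&data[end]` also avoids calling `operator[]` one past the end when `end == data.size()`.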

Quote:Original post by MrPickle
What I don't get is that, info.ID is meant to be something like 0x4D4D or lots of other things like that, isn't 0x4D4D 4 Bytes

There are two hexadecimal digits in a byte, if you're representing them that way. A byte can store an unsigned number between 0 (0x00) and 255 (0xFF).

So info.ID can be 0x4D4D just fine: that fits in 2 bytes. It could be any number between 0x0000 and 0xFFFF, but 0x10000 is right out.
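A small sketch of that range limit, assuming a platform where unsigned short is exactly 16 bits (the standard only guarantees at least 16). `to_ushort` is a hypothetical helper; it relies on the rule that converting to an unsigned type reduces the value modulo 2^N, so 0x4D4D survives intact while 0x10000 wraps to 0.

```cpp
#include <cassert>  // assert() for checking results
#include <limits>

// The standard guarantees unsigned short holds at least 0..0xFFFF (16 bits);
// on common desktop platforms it is exactly 16 bits, i.e. 2 bytes.
static_assert(std::numeric_limits<unsigned short>::digits >= 16,
              "unsigned short is at least 16 bits");

// Converting to an unsigned type is well defined: the value is reduced
// modulo 2^N, where N is the bit width of unsigned short. With a 16-bit
// unsigned short, 0x4D4D is kept as-is, but 0x10000 wraps around to 0.
unsigned short to_ushort(unsigned long value) {
    return static_cast<unsigned short>(value);
}
```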
Quote:Original post by MrPickle
isn't 0x4D4D 4 Bytes and info.ID is an unsigned short so it's 2 Bytes??


No, 0x4D4D is 2 bytes...

One hex digit covers 0-15 (0-9, then A-F), so two digits cover 16^2 = 256 values (0-255)...
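The same digit counting, written out as a worked equation (assuming 8-bit bytes):

```latex
\begin{aligned}
16^1 &= 2^4 &&\Rightarrow \text{one hex digit} = 4 \text{ bits} \\
16^2 &= 2^8 = 256 &&\Rightarrow \text{two hex digits} = 1 \text{ byte } (\texttt{0x00}\text{ to }\texttt{0xFF}) \\
16^4 &= 2^{16} = 65536 &&\Rightarrow \text{four hex digits} = 2 \text{ bytes } (\texttt{0x0000}\text{ to }\texttt{0xFFFF}) \\
\texttt{0x4D4D} &= 77 \cdot 256 + 77 = 19789 < 65536 &&\Rightarrow \text{fits in an unsigned short}
\end{aligned}
```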
Whoops, actually, both of the lines I added comments to are completely broken. Since std::copy works on iterators and doesn't do any char-casting naughtiness itself, the first line copies two bytes into an array of two unsigned shorts, one byte per element. If you're looking at info.ID, which is a single unsigned short rather than an array, and wondering where the second one is coming from... well, now you see why it's completely broken: the second byte gets written past the end of info.ID.

The second line does the same thing with ints: it copies two bytes into two ints, the second of which lies past the end of info.Size.

Both may cause data corruption and crashing.
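A minimal sketch of the element-by-element behaviour described above. `copy_bytes_elementwise` is a hypothetical helper; its destination is a genuine two-element array, so the second write is safe to observe here. It shows each 0x4D byte becoming its own unsigned short, which matches seeing a single "4D" in info.ID.

```cpp
#include <algorithm>
#include <cassert>  // assert() for checking results

// std::copy assigns element by element, converting each unsigned char to an
// unsigned short. Copying 2 bytes therefore writes 2 unsigned shorts. With a
// real 2-element array that is fine; with a single unsigned short (like
// info.ID) the second write would land past the end of the object.
void copy_bytes_elementwise(const unsigned char (&bytes)[2],
                            unsigned short (&out)[2]) {
    std::copy(bytes, bytes + 2, out);
}
```

After copying `{0x4D, 0x4D}`, both `out[0]` and `out[1]` hold 0x4D rather than one element holding 0x4D4D.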
Ahh, I see.

Thanks :)

[Edit]
The std::copy errors were my own fault. I was trying to use it instead of memcpy because, *I think*, it's the C++ version?

Your post also explains why I was only getting a single "4D" as info.ID. How do I make std::copy copy the two bytes into a single unsigned short?
Quote:Original post by MrPickle
[Edit]
The std::copy errors were my own fault, I was trying to use it instead of memcpy because *I think* it's the C++ version?

Basically, yes: it's the type-safe version. Since you're trying to do something inherently non-type-safe, this is one of the very few situations where it actually causes a bug instead of preventing one. I almost missed it!

Quote:Your post also explains why I was only getting a single "4D" as info.ID, how do I make std::copy copy the two bytes into a single unsigned short?

Honestly, if you just plan to reinterpret bytes like this, I'd use memcpy. Sure, it sticks out like a sore thumb, but that's kind of the point :P.

But the C++y equivalent would be:
std::copy(&data[offset], &data[offset+2], reinterpret_cast<char*>(&info.ID));
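A sketch comparing the two byte-reinterpreting approaches. `read_u16_memcpy` and `read_u16_copy` are hypothetical names, and the std::copy variant casts to `unsigned char*` rather than the `char*` used above (a small swap so source and destination element types match). Both produce a value in the host's byte order, so the bytes `{0x34, 0x12}` read as 0x1234 on a little-endian machine but 0x3412 on a big-endian one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Reinterpret two bytes as an unsigned short with std::memcpy.
// The result is in the HOST's byte order.
unsigned short read_u16_memcpy(const std::vector<unsigned char>& data,
                               std::size_t offset) {
    assert(offset + sizeof(unsigned short) <= data.size());
    unsigned short value;
    std::memcpy(&value, &data[offset], sizeof value);
    return value;
}

// Same reinterpretation with std::copy: casting the destination to a byte
// pointer makes std::copy write byte by byte into the unsigned short's storage.
unsigned short read_u16_copy(const std::vector<unsigned char>& data,
                             std::size_t offset) {
    assert(offset + sizeof(unsigned short) <= data.size());
    unsigned short value;
    std::copy(&data[offset], &data[offset] + sizeof value,
              reinterpret_cast<unsigned char*>(&value));
    return value;
}
```

Because the result depends on the machine, these only match the file's layout when the host and the file share the same byte order.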

Of course, there's also the endian independent, safe version:
info.ID = data[0] + data[1]<<8;
Thanks, you've been a great help and cleared lots of stuff up. I think I'll go find some tutorials explaining stuff like this so I'm not running around like a headless chicken.
I was just looking at what this actually does:
info.ID = data[0] + data[1] << 8;

with this:
#include <iostream>

int main(int argc, char** argv) {
    unsigned short us;
    unsigned char *hex;
    hex = new unsigned char[2];
    hex[0] = 0x4D;
    hex[1] = 0x4D;
    us = hex[0] + hex[1] << 8;
    std::cout << std::hex << std::uppercase << us << "\n";
    std::cin.get();
    delete[] hex;
    return 0;
}


But it makes us equal 0x9A00, not 0x4D4D. How is that endian independent?

[Edited by - MrPickle on November 19, 2008 10:00:46 AM]
Quote:Original post by MrPickle
I was just looking at what this actually does:
info.ID = data[0] + data[1] << 8;


But it's making us be 0x9A00, not 0x4D4D, how is that endian independent?


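The answer to this is easy: + binds more tightly than <<, so the expression is evaluated as (hex[0] + hex[1]) << 8. And 0x4D + 0x4D = 0x9A, which shifted left by 8 bits gives 0x9A00.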
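A minimal sketch of the precedence problem, using two hypothetical helpers. `combine_as_written` spells out with explicit parentheses how `lo + hi << 8` actually parses; `combine_intended` shifts the high byte first and then merges in the low byte.

```cpp
#include <cassert>  // assert() for checking results

// '+' has higher precedence than '<<', so  lo + hi << 8  is parsed as
// (lo + hi) << 8. The parentheses below make that explicit.
unsigned short combine_as_written(unsigned char lo, unsigned char hi) {
    return static_cast<unsigned short>((lo + hi) << 8);
}

// Shift first, then OR in the low byte: hi lands in the upper 8 bits,
// lo in the lower 8 bits.
unsigned short combine_intended(unsigned char lo, unsigned char hi) {
    return static_cast<unsigned short>(lo | (hi << 8));
}
```

Once the shift is parenthesised, `+` would also work in place of `|`, because the two bytes occupy non-overlapping bits; `|` just states the intent more clearly.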

Alex
I've worked out what it should be to get the result I wanted:

info.ID = data[offset] | (data[offset+1] << 8);

I think.
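That expression does read the two bytes as a little-endian value independent of the host's byte order. Extending the same idea to the whole chunk header gives the minimal sketch below. It assumes the header is a 2-byte ID followed by a 4-byte size, both stored little-endian, as 3DS chunk headers are generally described. `read_u16_le`, `read_u32_le`, `ChunkHeader` and `read_chunk_header` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assemble a little-endian 16-bit value from two bytes with shifts, so the
// result does not depend on the host's byte order.
std::uint16_t read_u16_le(const std::vector<unsigned char>& data, std::size_t offset) {
    assert(offset + 2 <= data.size());
    return static_cast<std::uint16_t>(data[offset] | (data[offset + 1] << 8));
}

// Same for 32 bits. Each byte is widened to uint32_t BEFORE shifting, because
// a byte promoted to int and shifted left by 24 can overflow a signed int.
std::uint32_t read_u32_le(const std::vector<unsigned char>& data, std::size_t offset) {
    assert(offset + 4 <= data.size());
    return  static_cast<std::uint32_t>(data[offset])
         | (static_cast<std::uint32_t>(data[offset + 1]) << 8)
         | (static_cast<std::uint32_t>(data[offset + 2]) << 16)
         | (static_cast<std::uint32_t>(data[offset + 3]) << 24);
}

// Assumed layout: 2-byte ID at offset, 4-byte size at offset + 2.
struct ChunkHeader {
    std::uint16_t id;
    std::uint32_t size;
};

ChunkHeader read_chunk_header(const std::vector<unsigned char>& data, std::size_t offset) {
    return ChunkHeader{read_u16_le(data, offset), read_u32_le(data, offset + 2)};
}
```

Note that the size field spans `[offset+2, offset+6)`, four bytes, unlike the two-byte range in the original snippet.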

[Edited by - MrPickle on November 19, 2008 3:52:40 PM]

