Damocles

2 byte ints from 4 bytes?

I'm looking to save a fair amount of data in binary form, and was wondering how I can make the ints be 2 bytes instead of 4, even though I'm on a 64-bit dev machine? If I can save out only 2 byte ints, I'd almost halve the size of the files, which would be very useful. I'm hoping there's some compiler definition I can use? I'm really hoping it's not going to be a case of having to replace every int declaration with a new typedef. I'm using Visual C++ 2003. Or perhaps I can make some new file read/write methods that convert the data on the fly? Is it possible to get the data types from variables, then I could cast them to the appropriate types, and read/write to file?

If you really want to control the number of bytes a value takes, int, short etc. won't help much - not in any portable way anyway, so you're better off using typedefs of VS's __int16, __int32 etc (which is also non-portable, but at least it's explicit, and you can change the typedefs if you ever end up compiling on another platform).

What you definitely don't want to do is say to the compiler, "I want all my ints to be only two bytes!" because, assuming they are already four bytes long, anything in any part of any program you write would suddenly find it's using two bytes for ints instead of four... and I guarantee to you that will break stuff.

It sounds like the exact value types you're saving out should be __int16s instead of ints (or short, probably). Alternatively, apply a static_cast<__int16> at the time of saving, though you'll need to check that the values are definitely in the range -32768 to +32767 or you'll lose information. I'd definitely use something like typedef __int16 s16 ("signed 16-bit value", as opposed to u16, "unsigned 16-bit value") to make future changes easier.

Your last question might be solved by reflection, but that's a very difficult and involved thing to do in C++. Basically you just want to change your file writing functions to write out the values in the most appropriate size.
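
A minimal sketch of the typedef and range check suggested above. The names s16, u16, fits_s16 and to_s16 are hypothetical, and short is assumed to be 16 bits wide (true on Visual C++, where __int16 could be used in the typedef instead). The array typedef is an old-style compile-time check that fails the build if that assumption is wrong.

```cpp
#include <cassert>

// Hypothetical names. On Visual C++ these could be typedefs of __int16.
typedef short          s16;   // signed 16-bit value
typedef unsigned short u16;   // unsigned 16-bit value

// Compile-time check: the array size becomes -1 (an error) if s16 is not 2 bytes.
typedef char s16_must_be_two_bytes[sizeof(s16) == 2 ? 1 : -1];

// True if value can be stored in an s16 without losing information.
inline bool fits_s16(int value)
{
    return value >= -32768 && value <= 32767;
}

// Narrow an int to s16. Debug builds stop on out-of-range input.
inline s16 to_s16(int value)
{
    assert(fits_s16(value));
    return static_cast<s16>(value);
}
```

If the typedefs ever need to change for another compiler, only these few lines are affected; the rest of the program keeps using plain int.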

So in theory I should simply be able to static_cast to __int16 when saving, then when loading, load them into __int16 vars and cast them back to regular ints for use in the rest of the program. The whole -32768 to +32767 range won't be a problem - I've never been able to leave the mindset that an int is only 2 bytes anyway, so I always write code assuming that ints are restricted to 2 bytes.

At least that's not as much work as replacing all the int declarations with __int16 :)

Thanks for your help.
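
For reference, the save/load plan described in this post could be sketched roughly as follows. The function names save_int16, load_int16 and round_trips are hypothetical, short is assumed to be 16 bits, and the bytes are written in the host's byte order, so the file is only readable on machines with the same endianness.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

typedef short s16;  // assumed 16 bits wide, as on Visual C++

// Save an int as 2 bytes. The caller guarantees the value fits in 16 bits.
inline void save_int16(std::ostream& out, int value)
{
    assert(value >= -32768 && value <= 32767);
    const s16 narrow = static_cast<s16>(value);
    out.write(reinterpret_cast<const char*>(&narrow), sizeof(narrow));
}

// Load 2 bytes and widen back to a regular int. Returns false on a short read.
inline bool load_int16(std::istream& in, int& value)
{
    s16 narrow = 0;
    if (!in.read(reinterpret_cast<char*>(&narrow), sizeof(narrow)))
        return false;
    value = narrow;  // implicit widening keeps the sign
    return true;
}

// Round-trip check through an in-memory stream. A std::fstream opened
// with std::ios::binary could be passed to the two functions the same way.
inline bool round_trips(int value)
{
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    save_int16(buffer, value);
    int loaded = 0;
    return load_int16(buffer, loaded) && loaded == value
        && buffer.str().size() == 2;
}
```

Only the file I/O code knows about the 16-bit type; everything else keeps working with int.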

Why not just store each int as a linked list of bytes, where the highest bit indicates whether another byte follows or not? Small values can then be stored in 1-3 bytes. The drawback is that huge values are stored in 4 or even 5 bytes!
This is used in database systems a lot to *compress* integers.

cu,
Chris

Quote:
Original post by Christian Weis
Why not just store each int as a linked list of bytes, where the highest bit indicates whether another byte follows or not? Small values can then be stored in 1-3 bytes. The drawback is that huge values are stored in 4 or even 5 bytes!
This is used in database systems a lot to *compress* integers.

cu,
Chris


Surely not in a *linked list* (consider how much overhead there is per node), but rather in a "string" (in the ancient C sense) - except that instead of being null terminated, it's "clear high bit terminated" (and the terminator carries information).

And yes, I've used this scheme before (and then found it wasn't really appropriate to my situation). :)
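
A rough sketch of the "clear high bit terminated" byte string described above, for unsigned values. The names put_varint and get_varint are hypothetical, and putting the least significant 7-bit group first is one common choice rather than something the posts above specify. Values up to 32 bits take between 1 and 5 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Append value to out, 7 bits per byte, least significant group first.
// A set high bit means "another byte follows"; a clear high bit ends the value.
inline void put_varint(std::vector<unsigned char>& out, unsigned long value)
{
    while (value >= 0x80) {
        out.push_back(static_cast<unsigned char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<unsigned char>(value));
}

// Decode one value starting at data[pos] and advance pos past it.
// Returns false if the data ends before a terminating byte is seen.
inline bool get_varint(const std::vector<unsigned char>& data,
                       std::size_t& pos, unsigned long& value)
{
    assert(pos <= data.size());
    unsigned long result = 0;
    unsigned shift = 0;
    while (pos < data.size() && shift < sizeof(unsigned long) * 8) {
        const unsigned char byte = data[pos++];
        result |= static_cast<unsigned long>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {
            value = result;
            return true;
        }
        shift += 7;
    }
    return false;
}
```

Signed values would need an extra mapping step before encoding, since a small negative number has all its high bits set and would always take the maximum number of bytes.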

Quote:
Original post by Julian90
#include <boost/cstdint.hpp>
// yeah yeah, its the quickest way to make all existing code
// use 16 bits.
#define int int_least16_t;


boost.cstdint


I'm fairly certain it's illegal to #define a reserved keyword (which is not to say some compilers won't let you do it [rolleyes]).

Quote:
Original post by Driv3MeFar
I'm fairly certain it's illegal to #define a reserved keyword (which is not to say some compilers won't let you do it [rolleyes]).
AFAIK, it's perfectly legal. It's simple text substitution.

Quote:
Original post by raz0r
Quote:
Original post by Driv3MeFar
I'm fairly certain it's illegal to #define a reserved keyword (which is not to say some compilers won't let you do it [rolleyes]).
AFAIK, it's perfectly legal. It's simple text substitution.


Yep reserved words are valid, only syntax is invalid.
For Example: #define { l

Quote:
Original post by raz0r
Quote:
Original post by Driv3MeFar
I'm fairly certain it's illegal to #define a reserved keyword (which is not to say some compilers won't let you do it [rolleyes]).
AFAIK, it's perfectly legal. It's simple text substitution.
It's also a really cool April Fool's :)

Quote:
Original post by Julian90
Quote:
Original post by raz0r
Quote:
Original post by Driv3MeFar
I'm fairly certain it's illegal to #define a reserved keyword (which is not to say some compilers won't let you do it [rolleyes]).
AFAIK, it's perfectly legal. It's simple text substitution.


Yep reserved words are valid, only syntax is invalid.
For Example: #define { l


Hmmm, this GOTW problem says otherwise (see criminal #2).

Quote:
Original post by Driv3MeFar
Quote:
Original post by Julian90
Yep reserved words are valid, only syntax is invalid.
For Example: #define { l

Hmmm, this GOTW problem says otherwise (see criminal #2).

In this case GOTW seems to be wrong, or they have a loose definition of "illegal". Remember that the preprocessor processes the #define before the compiler sees the code and does not know about C++ other than delimiting tokens.

This code compiles with no errors in VS 2003.
#include <iostream>

// Defined after the #include so the standard headers are not rewritten too.
#define private public

class foo
{
private:
    void private_func()
    {
        std::cout << "private_func()" << std::endl;
    }
};

int main(int argc, char* argv[])
{
    foo f;
    f.private_func();
}
Anyway, I would fire any programmer that did that, or this:
Quote:
Original post by Julian90
#define int int_least16_t;

The typical solution used in such a scenario is the use of a RAM-based file buffer which is written to the file once it is prepared. This makes the data being written explicit as well as minimizing the required number of transfers between the program and the storage device.


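#include <stdio.h>
#include <stdlib.h>

// Read/write a big-endian 16-bit value at Addr (an unsigned char pointer).
// Arguments are parenthesized, and Put is wrapped so it acts as one statement.
#define GetUInt16(Addr) (((Addr)[0] << 8) | (Addr)[1])
#define PutUInt16(Addr, Num) \
    do { \
        (Addr)[0] = (unsigned char)(((Num) >> 8) & 0xFF); \
        (Addr)[1] = (unsigned char)((Num) & 0xFF); \
    } while (0)

int main(void) {
    FILE *FilePtr;
    long FileSize;
    unsigned char *FileBuff;
    unsigned short SomeNum;

    // Open the file
    FilePtr = fopen("File.dat", "rb");
    if (FilePtr == NULL)
        return 1;

    // Get the file's size
    fseek(FilePtr, 0, SEEK_END);
    FileSize = ftell(FilePtr);
    fseek(FilePtr, 0, SEEK_SET);

    // The edit below touches bytes 2 and 3, so the file needs at least 4 bytes
    if (FileSize < 4) {
        fclose(FilePtr);
        return 1;
    }

    // Read all the file data into memory
    FileBuff = (unsigned char *)malloc((size_t)FileSize);
    if (FileBuff == NULL ||
        fread(FileBuff, 1, (size_t)FileSize, FilePtr) != (size_t)FileSize) {
        free(FileBuff);
        fclose(FilePtr);
        return 1;
    }

    // Close the file
    fclose(FilePtr);

    // Read some data, change it, and put it back
    SomeNum = (unsigned short)GetUInt16(&FileBuff[2]);
    SomeNum /= 7;
    PutUInt16(&FileBuff[2], SomeNum);

    // Write the output file
    FilePtr = fopen("File_Edit.dat", "wb");
    if (FilePtr == NULL) {
        free(FileBuff);
        return 1;
    }
    fwrite(FileBuff, 1, (size_t)FileSize, FilePtr);
    fclose(FilePtr);

    // Free the memory
    free(FileBuff);

    return 0;
}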
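
The same buffer idea might look like this in C++, with small functions in place of the macros. This is only a sketch: ByteBuffer, append_u16, read_u16 and write_buffer are hypothetical names, and the most-significant-byte-first order simply mirrors the macros above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

typedef std::vector<unsigned char> ByteBuffer;  // hypothetical name

// Append a 16-bit value to the buffer, most significant byte first.
inline void append_u16(ByteBuffer& buf, unsigned value)
{
    assert(value <= 0xFFFF);
    buf.push_back(static_cast<unsigned char>(value >> 8));
    buf.push_back(static_cast<unsigned char>(value & 0xFF));
}

// Read the 16-bit value stored at byte offset pos.
inline unsigned read_u16(const ByteBuffer& buf, std::size_t pos)
{
    assert(pos + 2 <= buf.size());
    return (static_cast<unsigned>(buf[pos]) << 8) | buf[pos + 1];
}

// Write the whole buffer to disk in one call. Returns false on any failure.
inline bool write_buffer(const char* path, const ByteBuffer& buf)
{
    std::FILE* file = std::fopen(path, "wb");
    if (!file)
        return false;
    const std::size_t written =
        buf.empty() ? 0 : std::fwrite(&buf[0], 1, buf.size(), file);
    const bool closed = (std::fclose(file) == 0);
    return closed && written == buf.size();
}
```

Because each byte is placed explicitly, the file layout does not depend on the byte order or int size of the machine that wrote it.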

Quote:
Original post by Zahlman
Quote:
Original post by Christian Weis
Why not just store each int as a linked list of bytes, where the highest bit indicates whether another byte follows or not? Small values can then be stored in 1-3 bytes. The drawback is that huge values are stored in 4 or even 5 bytes!
This is used in database systems a lot to *compress* integers.

cu,
Chris


Surely not in a *linked list* (consider how much overhead there is per node), but rather in a "string" (in the ancient C sense) - except that instead of being null terminated, it's "clear high bit terminated" (and the terminator carries information).

And yes, I've used this scheme before (and then found it wasn't really appropriate to my situation). :)


Yes, the term *linked list* was somewhat misleading and ill-used. But you described exactly what I've meant. ;)

cu,
Chris

Quote:
Original post by JohnBolton
Quote:
Original post by Driv3MeFar
Quote:
Original post by Julian90
Yep reserved words are valid, only syntax is invalid.
For Example: #define { l

Hmmm, this GOTW problem says otherwise (see criminal #2).

In this case GOTW seems to be wrong, or they have a loose definition of "illegal". Remember that the preprocessor processes the #define before the compiler sees the code and does not know about C++ other than delimiting tokens.
Quote:
C++ Standard, Final Draft, Section 17.3.3.1.1, Paragraph 2
A translation unit that includes a header shall not contain any macros that define names declared or defined in that header. Nor shall such a translation unit define macros for names lexically identical to keywords.

Σnigma
