monid233

Convert hex to int

Hello, I am attempting to convert hex to an int.
I did some research, but have only found a way to convert, for example, [code]string HEX = "fffefffe";[/code] to an int.
However, I need to convert hex in the form, for example, [code]string HEX = "\x15\x03";[/code] to an int.

I did some experimenting and came up with the following code:
[code]
int hex_to_int(string HEX) //Converts hex to an integer.
{
int x = 0;

char* _hex = new char[HEX.size()];

for(int i = 0; i < HEX.size(); i++)
{
_hex[i] = HEX[i];
}
x = (int)*_hex;

delete _hex;

return x;
}[/code]
It is able to convert the first part, but I need to convert all of [code]string HEX = "\x15\x03";[/code] and not just "\x15".

How would I go about accomplishing my goal?
Thanks in advance!

Share this post


Link to post
Share on other sites
[quote name='monid233' timestamp='1307129167' post='4819191']
Hello, I am attempting to convert hex to an int.
[/quote]

It's good to get this stuff straight early on. Hexadecimal is merely one of many visual representations of an integer. The representation you're most familiar with is decimal (base10). 0x0A (hex) and 10 (decimal) are the same number -- the same 'int'.

I'm sure most people will understand what you mean, but your initial statement ("I am attempting to convert hex to an int") doesn't really make sense. Take some time to understand this, it will serve you well!

You appear to be using std::string to hold an array of bytes, rather than some text. While this is certainly do-able, the semantics of the word "string" in programming probably don't match up too well with what you're trying to do with the content of that container. I'd suggest switching to something like std::vector<unsigned char> instead.

Anyway, given an array of bytes, whether it be in a string, or a vector, or whatever else, the answer is to use memcpy:

[code]
#include <iostream>
#include <cstring>
#include <string>

int main()
{
    const std::string s1 = "\x3F\xD0\x21\x4A";
    const unsigned s2 = 0x4A21D03F; // NOTE: I'm on a little endian machine where sizeof(unsigned) == 4 and CHAR_BIT == 8. You probably are too :)

    unsigned s1_as_int = 0;
    std::memcpy(&s1_as_int, s1.c_str(), 4);

    std::cout << (s2 == s1_as_int) << '\n';

    return 0;
}
[/code]

However! I suspect what you've done is read in data from a file in to a string and are subsequently attempting to convert it to properly typed and structured data.

If you have a "binary file", it might be easier to read it piece by piece in to ints and other datatypes. This assumes that the machine and compiler that wrote the file matches your machine and compiler, but if you provide a little more information some better targeted advice can be offered in this area, I'm sure. So what's your higher-level goal?

Share this post


Link to post
Share on other sites
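string hex = "\x15\x14\x13" is basically 3 chars in a row, so you'd do something similar to:

unsigned i = 0;

for (size_t ct = 0; ct < hex.size(); ++ct)
{
i <<= 8;

// Go through unsigned char so that bytes >= 0x80 are not sign-extended.
i += static_cast<unsigned char>(hex[ct]);
}

This is just off the top of my head. It treats the first char as the most significant byte, and hex.size() cannot be bigger than sizeof(unsigned).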

Share this post


Link to post
Share on other sites
[quote name='edd²' timestamp='1307130445' post='4819196']
It's good to get this stuff straight early on. Hexadecimal is merely one of many visual representations of an integer. The representation you're most familiar with is decimal (base10). 0x0A (hex) and 10 (decimal) are the same number -- the same 'int'.

I'm sure most people will understand what you mean, but your initial statement ("I am attempting to convert hex to an int") doesn't really make sense. Take some time to understand this, it will serve you well![/quote]
Thanks for clearing that up!
You have guessed correctly, I am early on in this ;)


[quote name='edd²' timestamp='1307130445' post='4819196']
You appear to be using std::string to hold an array of bytes, rather than some text. While this is certainly do-able, the semantics of the word "string" in programming probably don't match up too well with what you're trying to do with the content of that container. I'd suggest switching to something like std::vector<unsigned char> instead.[/quote]
I realized string isn't the best for this, but...

[quote name='edd²' timestamp='1307130445' post='4819196']
1. However! I suspect what you've done is read in data from a file in to a string and are subsequently attempting to convert it to properly typed and structured data.

2. If you have a "binary file", it might be easier to read it piece by piece in to ints and other datatypes. This assumes that the machine and compiler that wrote the file matches your machine and compiler, but if you provide a little more information some better targeted advice can be offered in this area, I'm sure. So what's your higher-level goal?[/quote]
1. You are correct.

2. My higher goal is to create a tool that extracts and recompiles the data files for a 12 year old game.

[quote name='edd²' timestamp='1307130445' post='4819196']
Anyway, given an array of bytes, whether it be in a string, or a vector, or whatever else, the answer is to use memcpy:

[code]
#include <iostream>
#include <cstring>
#include <string>

int main()
{
const std::string s1 = "\x3F\xD0\x21\x4A";
const unsigned s2 = 0x4A21D03F; // NOTE: I'm on a little endian machine where sizeof(unsigned) == 4 and CHAR_BIT == 8. You probably are too :)


unsigned s1_as_int = 0;
std::memcpy(&s1_as_int, s1.c_str(), 4);

std::cout << (s2 == s1_as_int) << '\n';

return 0;
}
[/code][/quote]
I tried this code. It worked!
[code]

int hex_to_int(const string HEX) //Converts hex to an integer.
{

unsigned x = 0;
memcpy(&x, HEX.c_str(), HEX.size());


return x;
}[/code]Thank you very much!




[quote name='bbobak' timestamp='1307130612' post='4819197']
string hex = "\0x15\x14\0x13" is basically 3 chars in a row, so you'd do something similar to:

unsigned i = 0;

for (size_t ct = 0; ct < hex.size(); ++ct)
{
i <<= 8;

i += hex[ct];
}

This is just off of the top of my head, and hex.size() cannot be bigger than sizeof(unsigned).
[/quote]
Unfortunately, this doesn't work.
"\x15\x03" should convert into 789. When I ran my program with this code, it gave me 5379.

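The two results differ only in byte order: 5379 is 0x1503, while 789 is 0x0315. The loop quoted above treats the first byte as the most significant one, whereas the expected value treats it as the least significant one (little endian). A minimal sketch of a little-endian variant follows. The name decode_le and the 4-byte limit are assumptions made for illustration, not something from the thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: interprets up to 4 bytes as a little-endian
// unsigned integer (the first byte is the least significant one).
std::uint32_t decode_le(const std::string& bytes)
{
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < bytes.size() && i < 4; ++i)
    {
        // Convert through unsigned char so bytes >= 0x80 are not sign-extended.
        const std::uint32_t byte = static_cast<unsigned char>(bytes[i]);
        value |= byte << (8 * i);
    }
    return value;
}
```

Because it uses shifts rather than memcpy, this sketch should give the same result regardless of the byte order of the machine running it.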



Thank you all very much!

Do you mind if I follow this up with a question about how to convert, say, 789 back into "\x15\x03"?

Share this post


Link to post
Share on other sites
[quote name='monid233' timestamp='1307133924' post='4819215']
I tried this code. It worked!
[code]

int hex_to_int(const string HEX) //Converts hex to an integer.
{

unsigned x = 0;
memcpy(&x, HEX.c_str(), HEX.size());


return x;
}[/code]Thank you very much!
[/quote]
I'm glad. But take care to understand that this will fall apart if HEX.size() != sizeof(unsigned). If HEX contained a long string of chars, for example, the result of that function call would be disastrous, as the memcpy call would copy bytes over memory outside of x -- who knows where?! Ideally, when copying from an array of bytes in to some other primitive data type using memcpy, the third argument should be "sizeof(x)" if the first argument is "&x". This ensures that memcpy can't possibly write to places it's not supposed to. I should have done this in my initial example, really :/
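As a rough sketch of that advice, the copy can be guarded so that it never writes outside the destination. The helper name bytes_to_unsigned and the choice to report failure through a bool are assumptions for illustration. The numeric result still depends on the machine's byte order.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: copies the bytes of `bytes` into `out`.
// Returns false instead of overrunning `out` when the input is too long.
bool bytes_to_unsigned(const std::string& bytes, unsigned& out)
{
    if (bytes.size() > sizeof(out))
        return false;               // would write past the end of `out`

    out = 0;                        // bytes that are not copied stay zero
    std::memcpy(&out, bytes.data(), bytes.size());
    return true;
}
```

An input shorter than sizeof(unsigned) is accepted here and leaves the remaining bytes zero. Whether that is the right behaviour depends on the file format.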

For each chunk of bytes that you wish to 'decode' in to a primitive data type, you must know the number of bytes in that chunk and a C(++) data type that is the same size as that chunk.

I also briefly mentioned the phrase "little endian" in my last post. It would be worth spending a little bit of time understanding what big endian and little endian mean, and how they relate to reading in so-called binary data. Feel free to ask, but google will probably get you rather far.

The other thing I mentioned is that in C or C++, you can use fread() and std::istream::read() to directly read the bytes from a file in to a primitive data type.

For example:

[code]
#include <fstream>
#include <string>

unsigned get_unsigned_from_file(const std::string &filename)
{
    unsigned ret = 0;

    // TODO: add error checking!
    std::ifstream in(filename.c_str(), std::ios::binary);
    in.read(reinterpret_cast<char *>(&ret), sizeof(ret)); // read the raw bytes straight in to ret
    return ret;
}
[/code]

An equivalent using fread(), which can be used in C as well:

[code]
#include <stdio.h>

unsigned get_unsigned_from_file(const char *filename)
{
    unsigned ret = 0;
    FILE *fptr = fopen(filename, "rb");

    if (fptr)
    {
        /* TODO: add error checking! */
        fread(&ret, sizeof(ret), 1, fptr); /* sizeof(ret) rather than a hard-coded 4 */
        fclose(fptr);
    }
    return ret;
}
[/code]


[quote]
Don't mind if I follow this up with a question concerning how to convert, say, 789 back into "\x15\x03"?
[/quote]
It already is! If you don't understand what I mean by that, ask :)

If you want to put the value in to an array of bytes e.g. std::vector<unsigned char>, make sure the vector is big enough and then use memcpy to copy the data over [i]the elements[/i] of the vector (not over the vector itself).
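A minimal sketch of that last step might look like this. The helper name unsigned_to_bytes is hypothetical. The order of the resulting bytes is whatever the machine uses in memory.

```cpp
#include <cstring>
#include <vector>

// Hypothetical helper: returns the in-memory bytes of `value`.
std::vector<unsigned char> unsigned_to_bytes(unsigned value)
{
    std::vector<unsigned char> bytes(sizeof(value)); // make the vector big enough first
    std::memcpy(&bytes[0], &value, sizeof(value));   // copy over the elements, not over `bytes`
    return bytes;
}
```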

Share this post


Link to post
Share on other sites
[font="arial, verdana, tahoma, sans-serif"][size="2"][quote name='edd²' timestamp='1307135040' post='4819221']
I'm glad. But take care to understand that this will fall apart if HEX.size() != sizeof(unsigned). If HEX contained a long string chars, for example, the result of that function call would be disastrous as the memcpy call would copy bytes over memory outside of x -- who knows where?!
[/quote]
Don't worry, the tool checks to make sure it is a valid data file for this game, so it'll fall apart only if something messed up badly.
Or did you mean that this could fall apart if the compiler uses different sizes for the involved data types?


[quote name='edd²' timestamp='1307135040' post='4819221']
1. For each chunk of bytes that you wish to 'decode' in to a primitive data type, you must know the number of bytes in that chunk and a C(++) data type that is the same size as that chunk.

2. I also briefly mentioned the word "little endian" in my last post. It would be worth spending a little bit of time understanding what big endian and little endian mean, and how they relate to reading in so-called binary data. Feel free to ask, but google will probably get you rather far.
[/quote]
1. Well, the only chunks of bytes I need to decode are three parts of the header..

2. Yes, I know what little and big endian are :)


[quote name='edd²' timestamp='1307135040' post='4819221']
The other thing I mentioned is that in C or C++, you can use fread() and std::istream::read() to directly read the bytes from a file in to a primitive data type.

For example:

[code]
unsigned get_unsigned_from_file(const std::string &filename)
{
unsigned ret = 0;

// TODO: add error checking!
std::ifstream in(filename.c_str(), std::ios::binary);
in.read(reinterpret_cast<char *>(&ret, sizeof(ret));
return ret;
}
[/code]

An equivalent using fread(), which can be used in C as well:

[code]
unsigned get_unsigned_from_file(const char *filename)
{
unsigned ret = 0;
FILE *fptr = fopen(filename, "rb");

if (fptr)
{
/* TODO: add error checking! */
fread(&ret, 4, 1, fptr);
fclose(fptr);
}
return ret;
}
[/code]
[/quote]
I'll have to read up on fread...
I'm using this for reading in files:
[code]
FILE* file = fopen("myfile", "r");
int c = 0;

if(!file)
{
cout << "Failed to load file!\n";
return 0;
}

while (c != EOF)
{
c = getc(file);

//Do stuff.
}

fclose(file);
[/code]
Any chance bytes can be read in directly into primitive data types using this code?



[quote name='edd²' timestamp='1307135040' post='4819221']
It already is! [u]If you don't understand what I mean by that, ask[/u]

If you want to put the value in to an array of bytes e.g. std::vector<unsigned char>, make sure the vector is big enough and then use memcpy to copy the data over the elements of the vector (not the over the vector itself).
[/quote]
I'm asking ;)[/size][/font]

Share this post


Link to post
Share on other sites
[quote name='monid233' timestamp='1307137486' post='4819232']
Don't worry, the tool checks to make sure it is a valid data file for this game,
[/quote]
Ok, but [i]you[/i] need to check that you aren't decoding too much or too little data.

[quote]
so it'll fall apart only if something messed up badly.
[/quote]
There are two kinds of falling apart: crashing, and failing gracefully and (ideally) informing the user with details of the problem. Make sure when things fall apart, they're doing so in the second kind of way.

[quote]
Or did you mean that this could fall apart if the compiler uses different sizes for the involved data types?
[/quote]
That's one way in which things might crash, yes. Bad!

[quote]
[quote name='edd²' timestamp='1307135040' post='4819221']
For each chunk of bytes that you wish to 'decode' in to a primitive data type, you must know the number of bytes in that chunk and a C(++) data type that is the same size as that chunk.
[/quote]
1. Well, the only chunks of bytes I need to decode are three parts of the header..
[/quote]
And do you know the size of each chunk? Have you found C(++) data types that are of an equivalent size? Just making sure you've dotted the 'i's and crossed the 't's...


[quote]
I'll have to read up on fread...
I'm using this for reading in files:
[code]
FILE* file = fopen("myfile", "r");
int c = 0;

if(!file)
{
cout << "Failed to load file!\n";
return 0;
}

while (c != EOF)
{
c = getc(file);

//Do stuff.
}

fclose(file);
[/code]
Any chance bytes can be read in directly into primitive data types using this code?
[/quote]
With some changes, yes. Note that one of my code snippets reads data through a FILE* interface, like yours does. fread() takes a FILE* as one of its arguments.
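One possible shape for those changes is sketched below. It assumes the file was opened with "rb" rather than "r", and that the byte offset of the field is already known. The name read_unsigned_at is hypothetical.

```cpp
#include <cstdio>

// Hypothetical helper: reads sizeof(unsigned) bytes starting at byte
// `offset` of an already opened file. Returns false on any failure.
bool read_unsigned_at(std::FILE* file, long offset, unsigned& out)
{
    if (!file || std::fseek(file, offset, SEEK_SET) != 0)
        return false;

    unsigned tmp = 0;
    if (std::fread(&tmp, sizeof(tmp), 1, file) != 1)
        return false;               // short read or I/O error

    out = tmp;
    return true;
}
```

fseek() is what decides where the next fread() starts. Without it, each fread() simply continues from where the previous read stopped.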


[quote]
[quote name='edd²' timestamp='1307135040' post='4819221']
It already is! [u]If you don't understand what I mean by that, ask[/u]
[/quote]
I'm asking ;)[/quote]

Memory in all computers (worth talking about here) consists of bytes (usually defined to be 8 bits). To a reasonable approximation, every variable that exists and is used in a C(++) program will be represented by 1 or more bytes somewhere in your computer's memory.

The memcpy function literally takes some bytes from one place and copies them to another. In the first post where I mentioned memcpy, I copied some bytes out of the character buffer of the std::string in to the memory associated with an 'unsigned' variable.

[b]memcpy doesn't change the values of the bytes as it copies[/b].

So let's return to your question:
[quote]
[...] how to convert, say, 789 back into "\x15\x03"?
[/quote]

789 and "\x15\x03" [i]are the same thing[/i]. They are merely different representations of the same value.

If you have a std::string containing "\x15\x03" and an unsigned containing 789 in your program, then somewhere in different parts of your computer's memory, there are two identical copies of the same small byte sequence (on a little endian machine, the unsigned simply has extra zero bytes after it).

If you want to put the bytes from an unsigned in to a string, you can use memcpy.

Again, I recommend using vector<unsigned char> instead. And please be sure to preallocate enough memory in the vector/string buffer before doing the memcpy operation.

But again, there's a function fwrite(), that does the opposite of fread(), so you might not need to go through an intermediate buffer and may be able to write the bytes out of primitives in to your file directly.
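A minimal sketch of writing a primitive directly, assuming the file was opened in binary mode ("wb"). The name write_unsigned is hypothetical.

```cpp
#include <cstdio>

// Hypothetical helper: writes the in-memory bytes of `value` at the file's
// current position. Returns false if the write did not complete.
bool write_unsigned(std::FILE* file, unsigned value)
{
    return file && std::fwrite(&value, sizeof(value), 1, file) == 1;
}
```

Note that this writes all sizeof(unsigned) bytes. If the format stores a field in fewer bytes, the count passed to fwrite() has to match the field, not the type.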

Share this post


Link to post
Share on other sites
Assuming you can rely on the fact that the string will always be \x followed by 2 decimal digits (with a max value of 15), then this should work:

[code]#include <ios>
#include <sstream>
#include <stack>
#include <string>

template <class numberType>
bool NumberFromString(numberType& out, const std::string& s, std::ios_base& (*f)(std::ios_base&) = std::dec)
{
    std::istringstream iss(s);
    return !(iss >> f >> out).fail();
}

int main()
{
    // Note: this is the literal text \x15\x03\x01 (12 characters), not 3 raw bytes.
    std::string hex("\\x15\\x03\\x01");
    std::stack<std::string> numbers;
    std::string number;

    while(!hex.empty())
    {
        hex = hex.substr(2);       // skip the "\x" prefix
        number = hex.substr(0, 2); // take the two digits
        hex = hex.substr(2);       // drop them from the input
        numbers.push(number);
    }

    int num = 0;
    int current = 0;
    int multiple = 1;
    while(!numbers.empty())
    {
        NumberFromString<int>(current, numbers.top());
        num += current * multiple;
        multiple *= 16;
        numbers.pop();
    }
    // "num" should be integer value of the hex string
}[/code]

Not the fastest, no error checking either. The idea is, you convert the long string into a bunch of numbers and put them on a stack, the leftmost (most significant) number being on the bottom and the rightmost (least significant) being on the top. The far right number is how many 1s there are (hence multiple starting at 1). So you multiply it by 1 and add it to the total. The next number across is how many 16s there are (multiple is now 16). Next across is how many 256s (1*16*16), next along would be how many 4096s (1*16*16*16) and so on.

Share this post


Link to post
Share on other sites
[quote name='Nanoha' timestamp='1307141295' post='4819245']
[code]
int main()
{

std::string hex("\\x15\\x03\\x01");
...
[/code]
[/quote]
Note that the O.P's string is very different from this, in meaning (number of slashes).
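To make the difference concrete, here is a small sketch of the two literals side by side. The variable names are made up for illustration.

```cpp
#include <string>

// "\x15\x03"  : the compiler turns each \xNN escape into ONE byte.
// "\\x15\\x03": each \\ is a literal backslash, so the text \x15\x03
//               is stored character by character.
const std::string raw_bytes("\x15\x03");      // 2 bytes: 0x15, 0x03
const std::string escaped_text("\\x15\\x03"); // 8 characters of text
```

Data read from a binary file looks like the first form, which is why code written for the second form cannot parse it.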

Share this post


Link to post
Share on other sites
[quote name='edd²' timestamp='1307139749' post='4819240']
Ok, but you need to check that you aren't decoding too much or too little data. [/quote]
Luckily, the person who designed this data file separated these parts with padding :)


[quote name='edd²' timestamp='1307139749' post='4819240']
There are two kinds of falling apart: crashing, and failing gracefully and (ideally) informing the user with details of the problem. Make sure when things fall apart, they're doing so in the second kind of way.

That's one way in which things might crash, yes. Bad![/quote]
Oh boy, error checking. Very important.


[quote name='edd²' timestamp='1307139749' post='4819240']
And do you know the size of each chunk? Have you found C(++) data types that are of an equivalent size? Just making sure you've dotted the 'i's and crossed the 't's...[/quote]
That's a problem. The size of the chunks can be anything. But, as said, these chunks are surrounded by padding, so it's not too hard to tell what the sizes of the chunks are at runtime.


[quote name='edd²' timestamp='1307139749' post='4819240']
With some changes, yes. Note that one of my code snippets reads data through a FILE* interface, like yours does. fread() takes a FILE* as one of its arguments.
[/quote]
I noticed that. I've yet to take a look at how to tell where fread() should start reading.


[quote name='edd²' timestamp='1307139749' post='4819240']
So let's return to your question:
[quote]
[...] how to convert, say, 789 back into "\x15\x03"?
[/quote]

789 and "\x15\x03" are the same thing. They are merely different representations of the same value.

If you have a std::string containing "\x16\x03" and an unsigned containing 789 in your program, then somewhere in different parts of your computer's memory, there are two identical copies of the same small byte sequence.

If you want to put the bytes from an unsigned in to a string, you can use memcpy.

Again, I recommend using vector<unsigned char> instead. And please be sure to preallocate enough memory in the vector/string buffer before doing the memcpy operation.

But again, there's a function fwrite(), that does the opposite of fread(), so you might not need to go through an intermediate buffer and may be able to write the bytes out of primitives in to your file directly.
[/quote]
Here's the code I tried:
[code]
unsigned number = 728;
std::vector<unsigned char> v;
v.resize(sizeof(number));
memcpy(&v[0], &number, sizeof(number));

std::FILE *file = std::fopen("test.pwp", "w");
std::fwrite(&v, sizeof(unsigned char), v.size() - 1, file);
std::fclose(file);[/code]
It doesn't seem to work properly.



[quote name='Nanoha' timestamp='1307141295' post='4819245']
Assuming you can rely on the fact that the string will always be \x then 2 decimal digits (being a max of 15)
then this should work:

[...]

Not the fastest, no error checking either. The idea is, you convert the long string into a bunch of numbers and put them on a stack, the left (most significant number) most being on the bottom and the right most (least significant) being on the top. The far right number is how many 1s there are (hence multiple starting at 1). So you multiply it by 1 and add it to the total. The next number across is how many 16s there are (multiple is now 16). Next across is how many 256s (1*16*16), next along would be how many 4096s (1*16*16*16) and so on.
[/quote]
Tried it...
Apparently it doesn't accept "\x15\x03", which is what is being read in from the file.
[quote]terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
Aborted (core dumped)
[/quote]
Additionally, I can't rely on the fact that it will always be a max of 15.

Share this post


Link to post
Share on other sites
[quote name='monid233' timestamp='1307144098' post='4819260']
[quote name='edd²' timestamp='1307139749' post='4819240']
Ok, but you need to check that you aren't decoding too much or too little data. [/quote]
Luckily, the person who designed this data file separated these parts with padding :)
[/quote]
I'm not sure what difference that makes. All it changes is that you now also need to know where the padding is and how much of it there is.

[quote]
[quote name='edd²' timestamp='1307139749' post='4819240']
And do you know the size of each chunk? Have you found C(++) data types that are of an equivalent size? Just making sure you've dotted the 'i's and crossed the 't's...[/quote]
That's a problem. The size of the chunks can be anything. But, as said, these chunks are surrounded by padding, so it's not too hard to tell what the size of the chunks are at runtime.
[/quote]
How can you tell what's padding? If the file format is documented, it should state the number of bytes for each piece of data in the file.

[quote]
Here's the code I tried:
[code]
unsigned number = 728;
std::vector<unsigned char> v;
v.resize(sizeof(number));
memcpy(&v[0], &number, sizeof(number));

std::FILE *file = std::fopen("test.pwp", "w");
std::fwrite(&v, sizeof(unsigned char), v.size() - 1, file);
[/quote]
What's the -1 for, here?

Share this post


Link to post
Share on other sites
[quote name='edd²' timestamp='1307150584' post='4819278']
I'm not sure what difference that makes. All it changes is that you now also need to know where the padding is and how much of it there is. [/quote]
I know where it is and how much of it there is.

[quote name='edd²' timestamp='1307150584' post='4819278']
A, B)How can you tell what's padding? C) If the file format is documented, it should state the number of bytes for each piece of data in the file.[/quote]
A) Patterns. I looked at several of these files and noticed that the padding is repeated NULs.
B) This file format is simply the PlayStation version of the data file from the PC game, which has an extractor/compiler tool.
C) As said, each piece of data's size varies depending on the situation.

[quote name='edd²' timestamp='1307150584' post='4819278']
[quote]
Here's the code I tried:
[code]
unsigned number = 728;
std::vector<unsigned char> v;
v.resize(sizeof(number));
memcpy(&v[0], &number, sizeof(number));

std::FILE *file = std::fopen("test.pwp", "w");
std::fwrite(&v, sizeof(unsigned char), v.size() - 1, file);
[/code]
[/quote]
What's the -1 for, here?
[/quote]
Oops, I put that in out of habit from accessing arrays.
New code:
[code]
unsigned number = 728;
std::vector<unsigned char> v;
v.resize(sizeof(number));
memcpy(&v[0], &number, sizeof(number));

std::FILE *file = std::fopen("test.pwp", "w");
std::fwrite(&v, sizeof(unsigned char), v.size(), file);
std::fclose(file);
[/code]
Still doesn't work properly.
Expected output (in hex): 15 03
Actual output (in hex): 30 03 D0 00

Share this post


Link to post
Share on other sites
[quote name='monid233' timestamp='1307151888' post='4819284']
Still doesn't work properly.
Expected output (in hex): 15 03
Actual output (in hex): 30 03 D0 00
[/quote]

You really need to be more careful with your pointers.

The original code didn't work because you dereferenced a char* and THEN cast to int, which is rather pointless. What you meant to do was: x = *(int*)_hex;

And now you use a pointer to the vector object for output instead of a pointer to the actual data of the vector, which is &v[0] (or &*v.begin() if you really want to irritate whoever has to read the code).

Share this post


Link to post
Share on other sites
[quote name='Trienco' timestamp='1307165466' post='4819321']
You really need to be more careful with your pointers.

The original code didn't work because you dereferenced a char* and THEN cast to int, which is rather pointless. What you meant to do was: x = *(int*)_hex;

And now you use a pointer to the vector object for output instead of a pointer to the actual data of the vector, which is &v[0] (or &*v.begin() if you really want to irritate whoever has to read the code).
[/quote]


Actually,
[quote name='monid233' timestamp='1307151888' post='4819284']
Still doesn't work properly.
Expected output (in hex): 15 03
Actual output (in hex): 30 03 D0 00
[/quote]
is intended for the below code:
[quote name='monid233' timestamp='1307151888' post='4819284']
[code]
unsigned number = 728;
std::vector<unsigned char> v;
v.resize(sizeof(number));
memcpy(&v[0], &number, sizeof(number));

std::FILE *file = std::fopen("test.pwp", "w");
std::fwrite(&v, sizeof(unsigned char), v.size(), file);
std::fclose(file);
[/code]
[/quote]
The above code is an attempt to convert an int, say, 728, into this form: "\x15\x03", save it into a string (or vector of unsigned chars in this case), then finally write that out to the file.

Anyway, I tried your suggested fix for the original code...It only looked at _hex[0]
Here's the code:
[code] int x = 0;

char* _hex = new char[HEX.size()];
for(int i = 0; i < HEX.size(); i++)
{
_hex[i] = HEX[i];
}
x = *(int*)_hex;


delete _hex;

return x;[/code]

Edit: Now that I take a closer look, "\x15\x03" might be looked at in full...Only it fails to properly convert it.
output:[quote]10486549[/quote]

Share this post


Link to post
Share on other sites
[quote name='monid233' timestamp='1307151888' post='4819284']
[quote name='edd²' timestamp='1307150584' post='4819278']
A, B)How can you tell what's padding? C) If the file format is documented, it should state the number of bytes for each piece of data in the file.[/quote]
A) Patterns. I looked at several of these files and noticed that the padding is repeated NULLs
[/quote]
Can you be sure that what you think is padding isn't actually significant? For example, if you're reading a four byte number where the two most significant bytes are zeros, how do you know those zeros won't be used when larger numbers are present at that point in the file?

[quote]
B) This file format is simply the playstation version of the data file from the PC game. Which has an extractor/compiler tool.
C) As said, each piece of data's size varies depending on the situation.
[/quote]
Ok, I'll take your word for it.

[quote]
New code:
[code]
unsigned number = 728;
std::vector<unsigned char> v;
v.resize(sizeof(number));
memcpy(&v[0], &number, sizeof(number));

std::FILE *file = std::fopen("test.pwp", "w");
...
[/code]
[/quote]
In this instance, it would be better to use "wb" and "rb" for the second argument of fopen() (when writing and reading, respectively). You also want to check that opening the file succeeded, but I'll assume you omitted the check for brevity :)

[quote]
[code]
...
std::fwrite(&v, sizeof(unsigned char), v.size(), file);
[/code]
[/quote]
The first argument to this fwrite() call is wrong.
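To spell out the difference: &v is the address of the vector object itself (its internal bookkeeping), while &v[0] is the address of the first byte it stores. A minimal sketch of the write with the open check included follows; write_unsigned is a hypothetical helper name, not something from the posted code.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical helper: writes the raw bytes of `number` to `path`.
// Returns true only if the file opened, every byte was written, and it closed cleanly.
bool write_unsigned(const char *path, unsigned number)
{
    assert(path); // a null path is a caller bug

    std::vector<unsigned char> v(sizeof(number));
    std::memcpy(&v[0], &number, sizeof(number));

    std::FILE *file = std::fopen(path, "wb"); // binary mode: no newline translation
    if (!file)
        return false;

    // &v[0] points at the stored bytes; &v would point at the vector object instead.
    const std::size_t written = std::fwrite(&v[0], sizeof(unsigned char), v.size(), file);
    const bool closed = (std::fclose(file) == 0);
    return written == v.size() && closed;
}
```

On a little-endian machine with a 4-byte unsigned, write_unsigned("test.pwp", 728) should produce the bytes D8 02 00 00.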

Share this post


Link to post
Share on other sites
Look at your code. You do it right when you memcpy and you do it wrong when you fwrite.


[quote name='monid233' timestamp='1307170601' post='4819335']
[size="2"]Edit: Now that I take a closer look, "\x15\x03" might be looked at in full...Only it fails to properly convert it.[/size]
[size="2"]output:[quote]10486549[/quote][/size]
[/quote]

Well, convert it back to hex and you will find the value you got is 0x00A00315. On a little-endian machine that is stored in memory as the bytes 15 03 A0 00.

In your example the string is only 2 characters, so memory holds 15 03 XX XX, with XX being whatever garbage happens to follow. To make it work, you need to pad the string to sizeof(int), handle it in a loop if the string is longer than that, and be sure to put the padding 0s at the right end depending on your platform's byte order. This also applies to using memcpy (which I would prefer because it avoids alignment issues).

unsigned x = 0; // Important when hex is shorter than sizeof(unsigned)
memcpy(&x, hex.c_str(), min( sizeof(unsigned), hex.size() ));

This will get much uglier with a different byte order, as now you would have to offset the destination for memcpy by ( sizeof(unsigned) - min(sizeof(unsigned), hex.size()) ) bytes (i.e. memcpy( ((char*)&x) + offset, ... )).
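One way to sidestep the offset arithmetic is to build the value with shifts, which gives the same result whatever the host byte order is. This is an alternative to the memcpy approach, not what the posts above use; both function names are hypothetical, and the sketch assumes 8-bit bytes and input no longer than sizeof(unsigned).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: treats the first byte of `bytes` as the least significant.
unsigned from_little_endian(const std::string &bytes)
{
    assert(bytes.size() <= sizeof(unsigned));
    unsigned x = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i)
        x |= static_cast<unsigned>(static_cast<unsigned char>(bytes[i])) << (8 * i);
    return x;
}

// Hypothetical helper: treats the first byte of `bytes` as the most significant.
unsigned from_big_endian(const std::string &bytes)
{
    assert(bytes.size() <= sizeof(unsigned));
    unsigned x = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i)
        x = (x << 8) | static_cast<unsigned char>(bytes[i]);
    return x;
}
```

With these, "\x15\x03" reads as 789 (0x0315) in little-endian order and 5379 (0x1503) in big-endian order, while 728 (0x02D8) corresponds to the little-endian bytes D8 02.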

Share this post


Link to post
Share on other sites
[quote name='monid233' timestamp='1307129167' post='4819191']
However, I need to convert hex in form, for example, [code]string HEX = “\x15\x03”;[/code] to an int.[/quote]If that's how it's written then you don't need to convert anything, all you need to do is copy it, or simply cast what is holding it.

Share this post


Link to post
Share on other sites
[quote name='edd²' timestamp='1307182372' post='4819364']
In this instance, it would be better to use "wb" and "rb" for the second argument of fopen() (when writing and reading, respectively). You also want to check that opening the file succeeded, but I'll assume you omitted the check for brevity
[/quote]
Switched to "wb".
Yes, omitted ;)

[quote name='edd²' timestamp='1307182372' post='4819364']
[quote]
[code]
...
std::fwrite(&v, sizeof(unsigned char), v.size(), file);
[/code]
[/quote]
The first argument to this fwrite() call is wrong.
[/quote]
New code:
[code]
unsigned number = 728;
std::vector<unsigned char> v;
v.resize(sizeof(number));
memcpy(&v[0], &number, sizeof(number));

std::FILE *file = std::fopen("test.pwp", "wb");
std::fwrite(&v[0], sizeof(unsigned char), v.size(), file);
std::fclose(file);
[/code]
It works. It converted 728 to D8 02 :)
Hopefully the game will accept that, seeing that it's different from the 15 03 the original file has.


[quote name='Trienco' timestamp='1307182425' post='4819365']
In your example the string is only 2 characters, so memory holds 15 03 XX XX, with XX being whatever garbage happens to follow. To make it work, you need to pad the string to sizeof(int), handle it in a loop if the string is longer than that, and be sure to put the padding 0s at the right end depending on your platform's byte order. This also applies to using memcpy (which I would prefer because it avoids alignment issues).

unsigned x = 0; // Important when hex is shorter than sizeof(unsigned)
memcpy(&x, hex.c_str(), min( sizeof(unsigned), hex.size() ));

This will get much uglier with a different byte order, as now you would have to offset the destination for memcpy by ( sizeof(unsigned) - min(sizeof(unsigned), hex.size()) ) bytes (i.e. memcpy( ((char*)&x) + offset, ... )).
[/quote]
So...this?
[code]
int hex_to_int(const string HEX)
{
string _hex = HEX;

if(_hex.size() < sizeof(unsigned))
{
for(int i = _hex.size() - 1; i < sizeof(unsigned); i++)
{
_hex[i] = '\x00';
}
}

unsigned x = 0;
memcpy(&x, _hex.c_str(), _hex.size());

return x;
}
[/code]


[quote name='iMalc' timestamp='1307217743' post='4819504']
[quote name='monid233' timestamp='1307129167' post='4819191']
However, I need to convert hex in form, for example, [code]string HEX = “\x15\x03”;[/code] to an int.[/quote]If that's how it's written then you don't need to convert anything, all you need to do is copy it, or simply cast what is holding it.
[/quote]

So I could just do [code]unsigned x = HEX;[/code]?
Or [code]unsigned x = (unsigned) HEX.c_str();[/code]?

Share this post


Link to post
Share on other sites
[quote name='monid233' timestamp='1307222578' post='4819527']
So I could just do [code]unsigned x = HEX;[/code]?
[/quote]
The above should give you a compiler error.

[quote]
Or [code]unsigned x = (unsigned) HEX.c_str();[/code]?
[/quote]
Almost. If we're sure that HEX.size() == sizeof(unsigned), or we only need the first sizeof(unsigned) bytes in HEX, then we can do this:
[code]
unsigned x = *reinterpret_cast<const unsigned *>(HEX.c_str());
[/code]

So, we cast the pointer-to-const-char to a pointer-to-const-unsigned and then dereference the result to get an unsigned. I've opted to use reinterpret_cast<> as opposed to a C-style cast as it makes clear the reason for the cast. The C-style cast you wrote (once corrected) would also cast away the const-ness of the pointee unnecessarily, without allowing the compiler to warn you of the mistake.

With all that said, I tend to prefer memcpy in situations like this as it helps to steer clear of aliasing issues.
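For comparison, here is a minimal sketch of the memcpy form. It copies into a real unsigned object, so it works even when the bytes sit at an odd offset inside a larger buffer, where the cast version could run into alignment or aliasing trouble. load_unsigned_at is a hypothetical name, and the offset parameter is an assumption added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies sizeof(unsigned) bytes starting at `offset`
// into a real unsigned object, so no unsigned is ever read through a char pointer.
unsigned load_unsigned_at(const std::string &bytes, std::size_t offset)
{
    // Precondition: the requested bytes lie entirely inside the string.
    assert(offset <= bytes.size() && bytes.size() - offset >= sizeof(unsigned));

    unsigned x = 0;
    std::memcpy(&x, bytes.data() + offset, sizeof(x));
    return x;
}
```

The result is in the host's byte order, exactly as with the reinterpret_cast version; only the way the bytes are fetched differs.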

Share this post


Link to post
Share on other sites
[quote name='edd²' timestamp='1307225642' post='4819551']
Almost. If we're sure that HEX.size() == sizeof(unsigned), or we only need the first sizeof(unsigned) bytes in HEX, then we can do this:
[code]
unsigned x = *reinterpret_cast<const unsigned *>(HEX.c_str());
[/code]

So, we cast the pointer-to-const-char to a pointer-to-const-unsigned and then dereference the result to get an unsigned. I've opted to use reinterpret_cast<> as opposed to a C-style cast as it makes clear the reason for the cast. The C-style cast you wrote (once corrected) would also cast away the const-ness of the pointee unnecessarily, without allowing the compiler to warn you of the mistake.

With all that said, I tend to prefer memcpy in situations like this as it helps to steer clear of aliasing issues.
[/quote]



That means I can safely use either [code]unsigned x = *reinterpret_cast<const unsigned *>(HEX.c_str());[/code]
or the code you gave me earlier for the hex_to_int() function?

Share this post


Link to post
Share on other sites
[quote name='monid233' timestamp='1307226098' post='4819554']
That means I can safely use either [code]unsigned x = *reinterpret_cast<const unsigned *>(HEX.c_str());[/code]
or the code you gave me earlier for the hex_to_int() function?
[/quote]

If by "safely" you mean taking care to ensure we don't read/copy more bytes than we've got or write to any location that we aren't sure we control, then yes, either approach will work. If you don't understand why, please feel free to ask (but state which part you don't understand!).

The hex_to_int() function that you came up with earlier in post #4 should use the result of a sizeof expression in the third argument of the memcpy() call. I mentioned that earlier, but I want to stress the [i]safely[/i] qualification. You're not going to blow up your computer or anything :) but bugs caused by being "unsafe" in these areas are incredibly hard to track down.

Share this post


Link to post
Share on other sites
[quote name='edd²' timestamp='1307227714' post='4819565']
If by "safely" you mean taking care to ensure we don't read/copy more bytes than we've got or write to any location that we aren't sure we control, then yes, either approach will work. If you don't understand why, please feel free to ask (but state which part you don't understand!).[/quote]
Safely as in, "my computer won't blow up and no bugs will be caused by the use of this".

[quote name='edd²' timestamp='1307227714' post='4819565']
The hex_to_int() function that you came up with earlier in post #4 should use the result of a sizeof expression in the third argument of the memcpy() call. [/quote]

This is how it should be written?
[code]
memcpy(&x, HEX.c_str(), sizeof(unsigned));[/code]

[quote name='edd²' timestamp='1307227714' post='4819565']
I mentioned that earlier, but I want to stress the [i]safely[/i] qualification. You're not going to blow up your computer or anything :) but bugs caused by being "unsafe" in these areas are incredibly hard to track down.
[/quote]
This part is a bit unclear to me. Are you implying that

[code]
memcpy(&x, HEX.c_str(), HEX.size());[/code]
is unsafe?

Share this post


Link to post
Share on other sites
[quote name='monid233' timestamp='1307228420' post='4819566']
This is how it should be written?
[code]
memcpy(&x, HEX.c_str(), sizeof(unsigned));[/code]
[/quote]
Yes, but you *must* make sure that HEX.size() >= sizeof(unsigned), otherwise you'll be reading data from no man's land.

[quote]
Are you implying that

[code]
memcpy(&x, HEX.c_str(), HEX.size());[/code]
is unsafe?
[/quote]
Yes. HEX is an std::string. It could contain 10000 characters. Writing 10000 characters over a 4 byte variable means you scribble over 9996 bytes of data outside of x. Who knows what those bytes were being used for? If you're lucky your program will crash. If you're unlucky it'll continue to run but behave strangely, crashing 3 minutes later in a seemingly unrelated piece of code.
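A minimal sketch of a copy that cannot overrun x, however long the string is, clamps the byte count with std::min. hex_to_unsigned_clamped is a hypothetical name; whether silently ignoring the extra bytes is acceptable depends on the file format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical variant of hex_to_int(): never copies more than sizeof(unsigned) bytes.
unsigned hex_to_unsigned_clamped(const std::string &hex)
{
    unsigned x = 0; // bytes not overwritten below stay zero

    const std::size_t n = std::min(sizeof(x), hex.size());
    assert(n <= sizeof(x)); // the invariant that keeps the write inside x

    std::memcpy(&x, hex.data(), n);
    return x;
}
```

A 10000-character string now contributes only its first sizeof(unsigned) bytes, and a 2-character string leaves the remaining bytes of x at zero.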

Share this post


Link to post
Share on other sites
[quote name='edd²' timestamp='1307229016' post='4819571']
Yes, but you *must* make sure that HEX.size() >= sizeof(unsigned), otherwise you'll be reading data from no man's land.[/quote]
[code]

string _hex = HEX;

if(_hex.size() < sizeof(unsigned))
{
for(int i = _hex.size() - 1; i < sizeof(unsigned); i++)
{
_hex[i] = '\x00';
}
}

unsigned x = 0;
memcpy(&x, _hex.c_str(), sizeof(unsigned));

return x;[/code]
Applied the padding Trienco suggested. Better?

[quote name='edd²' timestamp='1307229016' post='4819571']
Yes. HEX is an std::string. It could contain 10000 characters. Writing 10000 characters over a 4 byte variable means you scribble over 9996 bytes of data outside of x. Who knows what those bytes were being used for? If you're lucky your program will crash. If you're unlucky it'll continue to run but behave strangely, crashing 3 minutes later in a seemingly unrelated piece of code.
[/quote]

Brings up recent memories...*shivers*

Share this post


Link to post
Share on other sites
[quote name='monid233' timestamp='1307229736' post='4819578']
Applied the padding Trienco suggested. Better?
[/quote]
Does that yield a value that makes sense? Or would having to add padding imply that the data is malformed?

In other words, if you're expecting to be able to read, say, 4 bytes from a file and there are only 3 left, adding some padding is probably not what you want to do. It makes things well defined as far as the C/C++ languages are concerned, but it might not make any sense algorithmically.
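If short input really does mean malformed data, a reader that reports the problem may fit better than one that pads. A rough sketch, with both function names hypothetical:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical strict reader: succeeds only when enough bytes are present.
// On failure `out` is left untouched so the caller can report malformed data.
bool try_read_unsigned(const std::string &bytes, unsigned &out)
{
    if (bytes.size() < sizeof(unsigned))
        return false; // too short: signal the problem instead of padding

    std::memcpy(&out, bytes.data(), sizeof(unsigned));
    return true;
}

// For callers that consider short input a programming error rather than bad data.
unsigned read_unsigned_or_assert(const std::string &bytes)
{
    unsigned x = 0;
    const bool ok = try_read_unsigned(bytes, x);
    assert(ok && "fewer than sizeof(unsigned) bytes available");
    (void)ok; // silences the unused-variable warning when NDEBUG disables assert
    return x;
}
```

Which of the two fits depends on whether a truncated field is something the file format allows or a sign of corruption.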

