methinks

turning string into int


I feel like an idiot, because I saw the answer here a while ago but now I can't seem to find it... I'm storing data by identifier (using std::map). My problem is that the identifier comes in the form of a string, which (the way I understand it) is a slow way of doing it, because string comparisons take a while... So what I would like to do is to change the string into a numerical value. A 64bit variable should be able to hold the equivalent of 8 chars, which should be sufficient. Now, how would I go about casting that?

From a string to an int: atoi(yourstring.c_str())
To a float: atof(yourstring.c_str())
There's also a Boost method, but I've never used it.

The other methods are in this thread.
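Written out correctly, the parsing route looks like the sketch below (`c_str()` is a member function, so the call is `yourstring.c_str()`). This only applies when the string holds numeric text such as "42"; it won't turn an arbitrary identifier into a key. `naiveInt`, `naiveFloat`, and `parseInt` are illustrative names, and the `std::strtol` error check is an addition that isn't mentioned in the thread.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// std::atoi / std::atof take a C string, so pass s.c_str().
// atoi returns 0 on bad input, which can't be told apart from a real "0".
int naiveInt(const std::string& s) { return std::atoi(s.c_str()); }
double naiveFloat(const std::string& s) { return std::atof(s.c_str()); }

// Hypothetical helper: std::strtol reports where parsing stopped,
// so input such as "abc" or "12xyz" can be rejected.
bool parseInt(const std::string& s, long& out)
{
    const char* begin = s.c_str();
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(begin, &end, 10);
    if (end == begin || *end != '\0' || errno == ERANGE)
        return false;   // nothing parsed, trailing junk, or out of range
    out = value;
    return true;
}
```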

Quote:
Original post by Sol462
From a string to an int: atoi(yourstring.c_str())
To a float: atof(yourstring.c_str())
There's also a Boost method, but I've never used it.

The other methods are in this thread.
I don't think that's the kind of 'casting' he means. I think he wants to map arbitrary strings to integer values which will then serve as keys in a std::map.

What you're looking for is called a hash function (a more general "data to number" formula). You won't have an 8-char limit with a hash function, and you could use 32-bit numbers, which are faster on most systems.

Some info and links on Wikipedia.
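Cocalus doesn't name a particular hash function. As one illustrative choice (an assumption, not something from the thread), here is a minimal sketch of 32-bit FNV-1a over the string's bytes; `fnv1a32` is a hypothetical name.

```cpp
#include <cstdint>
#include <string>

// 32-bit FNV-1a: xor each byte into the hash, then multiply by the FNV prime.
std::uint32_t fnv1a32(const std::string& s)
{
    std::uint32_t hash = 2166136261u;   // FNV offset basis
    for (unsigned char c : s) {
        hash ^= c;
        hash *= 16777619u;              // FNV prime; arithmetic wraps mod 2^32
    }
    return hash;
}
```

Two different strings can produce the same 32-bit value, so using the hash by itself as a std::map key is only safe if collisions are detected or can be ruled out for your identifiers.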

Quote:
Original post by methinks
I feel like an idiot, because I saw the answer here a while ago but now I can't seem to find it...

I'm storing data by identifier (using std::map). My problem is that the identifier comes in the form of a string, which (the way I understand it) is a slow way of doing it, because string comparisons take a while...

So what I would like to do is to change the string into a numerical value. A 64bit variable should be able to hold the equivalent of 8 chars, which should be sufficient. Now, how would I go about casting that?


One way to do it would be with bitwise operations: for each character in the string, shift the int64 left 8 bits, then copy the character's numeric value into the lowest byte, and repeat.
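A minimal sketch of the bitwise approach, assuming `std::uint64_t` from `<cstdint>` (C++11) rather than `__int64`; `packKey` is a hypothetical name. The shift happens before each byte is OR-ed in, so the first character ends up in the top byte instead of being shifted out, and strings shorter than 8 characters are padded with zero bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Pack up to the first 8 characters into a 64-bit key, first character in the
// highest byte. Shorter strings are padded with zero bytes on the right.
std::uint64_t packKey(const std::string& s)
{
    std::uint64_t key = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        key <<= 8;                                    // make room for the next byte
        if (i < s.size())
            key |= static_cast<unsigned char>(s[i]);  // avoid sign-extending chars >= 0x80
    }
    return key;
}
```

Because the first character lands in the most significant byte, comparing two keys numerically gives the same order as comparing the strings, for strings of at most 8 characters with no embedded '\0'. Longer strings are truncated, so "abcdefgh" and "abcdefghij" collide. The result also doesn't depend on the machine's byte order, unlike the reinterpret_cast copy.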

Another (easier but uglier) way would be to get a char* to the int64 (using reinterpret_cast) and copy the characters in the string into it, i.e.


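const char* mystring = "abcdefgh";
__int64 myint;   // __int64 is MSVC/MinGW-specific
char* pint = reinterpret_cast<char*>(&myint);

// copy the first 8 characters byte-by-byte into the integer's storage
// (assumes mystring has at least 8 characters)
for ( int i = 0; i < 8; ++i )
    pint[i] = mystring[i];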





Quote:
Original post by Cocalus
What your looking for is called a hash function.

Some info and links on Wikipedia.


Yeah, a hash function would be a good idea if you want to drop the 8-character limit on keys. :)

Edit: Actually, maybe not, since he's using a map. Collisions between hashes would cause problems. Might as well just use a proper hash table.
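What lancekt calls "a proper hash table" is available in standard C++ as `std::unordered_map` (added in C++11, after this thread was written). It hashes the whole key and still compares keys for equality, so collisions cost time but never return the wrong entry, and there is no 8-character limit. A minimal sketch; `Item` and `lookupOrDefault` are hypothetical names.

```cpp
#include <string>
#include <unordered_map>

// Placeholder payload type for the example.
struct Item { int value; };

// Returns the stored value for name, or fallback if name isn't in the table.
int lookupOrDefault(const std::unordered_map<std::string, Item>& table,
                    const std::string& name, int fallback)
{
    auto it = table.find(name);
    return it != table.end() ? it->second.value : fallback;
}
```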

Quote:
Original post by lancekt
Quote:
Original post by methinks
I feel like an idiot, because I saw the answer here a while ago but now I can't seem to find it...

I'm storing data by identifier (using std::map). My problem is that the identifier comes in the form of a string, which (the way I understand it) is a slow way of doing it, because string comparisons take a while...

So what I would like to do is to change the string into a numerical value. A 64bit variable should be able to hold the equivalent of 8 chars, which should be sufficient. Now, how would I go about casting that?


One way to do it would be with bitwise operations: for each character in the string, shift the int64 left 8 bits, then copy the character's numeric value into the lowest byte, and repeat.

Another (easier but uglier) way would be to get a char* to the int64 (using reinterpret_cast) and copy the characters in the string into it, i.e.

*** Source Snippet Removed ***


How can you possibly understand std::map, and know enough to use reinterpret_cast, but not be using std::string? :( Also, standard library algorithms roxor:


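#include <algorithm>
#include <string>

__int64 firstEightCharactersAsValue(const std::string& s) {
    __int64 result = 0;   // zero-fill so short strings leave no garbage bytes
    const char* c = s.c_str();
    // copy at most 8 bytes; reading past the end of a short string is undefined
    std::size_t n = std::min(s.size(), sizeof(result));
    std::copy(c, c + n, reinterpret_cast<char*>(&result));
    return result;
}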

Quote:
Original post by Zahlman

How can you possibly understand std::map, and know enough to use reinterpret_cast, but not be using std::string? :( Also, standard library algorithms roxor:


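#include <algorithm>
#include <string>

__int64 firstEightCharactersAsValue(const std::string& s) {
    __int64 result = 0;   // zero-fill so short strings leave no garbage bytes
    const char* c = s.c_str();
    // copy at most 8 bytes; reading past the end of a short string is undefined
    std::size_t n = std::min(s.size(), sizeof(result));
    std::copy(c, c + n, reinterpret_cast<char*>(&result));
    return result;
}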


Better to use trivial language features to demonstrate trivial concepts, if you ask me. I don't see what the library stuff really adds here, except complexity.

What, because manually iterating over things, dealing with memory management on strings (granted there isn't any in the sample code, but presumably the string we're interested in "comes from somewhere" at runtime), etc. is "less complex"? Is this some kind of joke?

Quote:
Original post by Zahlman
What, because manually iterating over things, dealing with memory management on strings (granted there isn't any in the sample code, but presumably the string we're interested in "comes from somewhere" at runtime), etc. is "less complex"? Is this some kind of joke?


Any programmer alive can understand iteration over an array. There may be some who are (god forbid) unfamiliar with <algorithm> or even <string>.

Additionally, you are getting a pointer to the raw character data from the std::string in your version. The only real difference is that you have added an extra layer of complexity--the string class, which has no relevance whatever to this discussion.

Quote:
Original post by Zahlman
Quote:
Original post by lancekt
Quote:
Original post by methinks
I feel like an idiot, because I saw the answer here a while ago but now I can't seem to find it...

I'm storing data by identifier (using std::map). My problem is that the identifier comes in the form of a string, which (the way I understand it) is a slow way of doing it, because string comparisons take a while...

So what I would like to do is to change the string into a numerical value. A 64bit variable should be able to hold the equivalent of 8 chars, which should be sufficient. Now, how would I go about casting that?


One way to do it would be with bitwise operations: for each character in the string, shift the int64 left 8 bits, then copy the character's numeric value into the lowest byte, and repeat.

Another (easier but uglier) way would be to get a char* to the int64 (using reinterpret_cast) and copy the characters in the string into it, i.e.

*** Source Snippet Removed ***


How can you possibly understand std::map, and know enough to use reinterpret_cast, but not be using std::string? :( Also, standard library algorithms roxor:


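#include <algorithm>
#include <string>

__int64 firstEightCharactersAsValue(const std::string& s) {
    __int64 result = 0;   // zero-fill so short strings leave no garbage bytes
    const char* c = s.c_str();
    // copy at most 8 bytes; reading past the end of a short string is undefined
    std::size_t n = std::min(s.size(), sizeof(result));
    std::copy(c, c + n, reinterpret_cast<char*>(&result));
    return result;
}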



Why get the c_str()? Couldn't you just do:

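#include <algorithm>
#include <string>

__int64 firstEightCharactersAsValue(const std::string& s)
{
    __int64 result = 0;   // zero-fill for strings shorter than 8 chars
    // never advance the iterator past s.end()
    std::string::size_type n = std::min(s.size(), sizeof(__int64));
    std::copy(s.begin(), s.begin() + n, reinterpret_cast<char*>(&result));
    return result;
}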

Quote:
Original post by lancekt
Quote:
Original post by Zahlman
What, because manually iterating over things, dealing with memory management on strings (granted there isn't any in the sample code, but presumably the string we're interested in "comes from somewhere" at runtime), etc. is "less complex"? Is this some kind of joke?


Any programmer alive can understand iteration over an array. There may be some who are (god forbid) unfamiliar with <algorithm> or even <string>.

Additionally, you are getting a pointer to the raw character data from the std::string in your version. The only real difference is that you have added an extra layer of complexity--the string class, which has no relevance whatever to this discussion.


See mikeman's post. I had a brainfart and missed the obvious thing, partly because of being distracted by these char*'s.

And lots of programmers "understand iteration over an array" in a completely different way. In Python, an explicit C++-style for loop would have to look like:


for i in range(len(container)):
    do_something_with(container[i])


New users of Python get friendly jeers for this sort of thing, as the experienced Pythonistas attempt to correct the damage that other imperative languages have done to their thought processes. Imagine writing something that strange (and it doesn't even make you handle the incrementing like C++ would!), when you could do:


for thing in container:
    do_something_with(thing)


Granted, C++ syntax can't be made quite so nice, but the library is a big step in that direction. Besides, the std::copy call reads quite naturally: "copy between str.begin() and 8 bytes after str.begin() to the reinterpreted-as-char* location of result", as opposed to "with i starting at zero until it reaches 8, incrementing each time, assign byte i of str to byte i of the reinterpreted-as-char* location of result".

Quote:
Original post by methinks
I feel like an idiot, because I saw the answer here a while ago but now I can't seem to find it...

I'm storing data by identifier (using std::map). My problem is that the identifier comes in the form of a string, which (the way I understand it) is a slow way of doing it, because string comparisons take a while...

So what I would like to do is to change the string into a numerical value. A 64bit variable should be able to hold the equivalent of 8 chars, which should be sufficient. Now, how would I go about casting that?



You mean like this?

// ID_String.h
#ifndef ID_STRING_H
#define ID_STRING_H
#include <map>
#include <string>

// A small handle standing in for a registered string; comparing two IDs
// is a single int comparison instead of a string comparison.
class ID
{
private:
    int id;
public:
    ID() : id(-1) {}              // same value as ID_String::End()
    ID(int i);
    ID(const std::string &Str);   // registers Str if it is new
    ID(const char *ch);           // registers ch if it is new
    bool operator ==(ID A) const;
    bool operator !=(ID A) const;
    bool operator < (ID A) const;
    void operator = (const std::string &Str);   // lookup only; unknown strings give End()
    void operator = (const char *ch);
};
bool operator ==(const std::string &Str, ID Id);
bool operator ==(ID Id, const std::string &Str);
bool operator !=(const std::string &Str, ID Id);
bool operator !=(ID Id, const std::string &Str);

bool operator ==(const char *ch, ID Id);
bool operator ==(ID Id, const char *ch);
bool operator !=(const char *ch, ID Id);
bool operator !=(ID Id, const char *ch);

// Singleton that assigns each distinct string a sequential ID.
class ID_String
{
private:
    ID_String();
    static ID_String* instance;
    std::map< std::string, ID, std::less<std::string> > Map;
    int Pairs;              // number of strings registered so far (next free ID)
    static const ID end;    // returned by Ident() for unregistered strings

public:

    //~ID_String();
    static ID_String* Instance();
    ID RegisterString(const std::string& name);
    ID Ident(const std::string &name);
    static ID End(){return end;}
    std::string GetString(ID);
};

#endif // ID_STRING_H





// ID_String.cpp
#include "ID_String.h"

ID::ID(int i)
{
    id = i;
}
// Constructing from a string registers it (assigning a new ID if needed).
ID::ID(const std::string &Str)
{
    *this = (ID_String::Instance()->RegisterString(Str));
}
ID::ID(const char *ch)
{
    std::string Str = ch;
    *this = (ID_String::Instance()->RegisterString(Str));
}
bool ID::operator ==(ID A) const
{
    return id == A.id;
}
bool ID::operator !=(ID A) const
{
    return id != A.id;
}
bool ID::operator <(ID A) const
{
    return id < A.id;
}
// Assigning from a string only looks it up; unregistered strings give End().
void ID::operator =(const std::string &Str)
{
    *this = (ID_String::Instance()->Ident(Str));
}
void ID::operator =(const char *ch)
{
    std::string Str = ch;
    *this = (ID_String::Instance()->Ident(Str));
}

bool operator ==(const std::string &Str, ID Id)
{
    return Id == ID_String::Instance()->Ident(Str);
}
bool operator ==(ID Id, const std::string &Str)
{
    return Id == ID_String::Instance()->Ident(Str);
}
bool operator !=(const std::string &Str, ID Id)
{
    return Id != ID_String::Instance()->Ident(Str);
}
bool operator !=(ID Id, const std::string &Str)
{
    return Id != ID_String::Instance()->Ident(Str);
}

bool operator ==(const char *ch, ID Id)
{
    std::string Str = ch;
    return Id == ID_String::Instance()->Ident(Str);
}
bool operator ==(ID Id, const char *ch)
{
    std::string Str = ch;
    return Id == ID_String::Instance()->Ident(Str);
}
bool operator !=(const char *ch, ID Id)
{
    std::string Str = ch;
    return Id != ID_String::Instance()->Ident(Str);
}
bool operator !=(ID Id, const char *ch)
{
    std::string Str = ch;
    return Id != ID_String::Instance()->Ident(Str);
}



ID_String* ID_String::instance = 0;
const ID ID_String::end = ID(-1);

ID_String::ID_String()
{
    Map.clear();
    Pairs = 0;
}
//ID_String::~ID_String()
//{
//    if(instance) delete instance;
//}
// Lazily created singleton (never deleted; it lives until the program exits).
ID_String* ID_String::Instance()
{
    if(instance == 0) instance = new ID_String();

    return instance;
}

// Returns the existing ID for name, or assigns the next free one.
ID ID_String::RegisterString(const std::string& name)
{
    std::map<std::string, ID>::iterator I = Map.find(name);
    if(I != Map.end())
    {
        return I->second;
    }
    else
    {
        ID id(Pairs);
        Map[name] = id;
        Pairs++;
        return id;
    }
}

// Looks name up without registering it; returns End() if it is unknown.
ID ID_String::Ident(const std::string &name)
{
    std::map<std::string,ID,std::less<std::string> >::iterator I = Map.find(name);
    if(I != Map.end())
    {
        return I->second;
    }
    return this->end;
}

// Reverse lookup: a linear scan over the map, O(n) in the number of strings.
std::string ID_String::GetString(ID id)
{
    std::map<std::string,ID,std::less<std::string> >::iterator I;
    for(I = Map.begin(); I != Map.end(); ++I)
    {
        if(id == I->second)
        {
            return I->first;
        }
    }
    return "";
}
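`GetString` above finds a string by scanning the whole map. A minimal sketch of the same interning idea with an O(1) reverse lookup, assuming C++11: keep a `std::vector` of names indexed by ID next to a `std::unordered_map` from name to ID. `StringTable`, `intern`, `find`, and `name` are hypothetical names, not part of the class above.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string-interning table: each distinct string gets a small
// integer id, and ids index straight into a vector for the reverse lookup.
class StringTable {
public:
    int intern(const std::string& s)
    {
        auto it = ids_.find(s);
        if (it != ids_.end())
            return it->second;                   // already registered
        int id = static_cast<int>(names_.size());
        ids_.emplace(s, id);
        names_.push_back(s);
        return id;
    }

    // Returns -1 for strings that were never interned (like ID_String::End()).
    int find(const std::string& s) const
    {
        auto it = ids_.find(s);
        return it != ids_.end() ? it->second : -1;
    }

    // O(1) reverse lookup; throws std::out_of_range for unknown ids.
    const std::string& name(int id) const
    {
        return names_.at(static_cast<std::size_t>(id));
    }

private:
    std::unordered_map<std::string, int> ids_;
    std::vector<std::string> names_;
};
```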


Quote:
Original post by lancekt
Any programmer alive can understand iteration over an array. There may be some who are (god forbid) unfamiliar with <algorithm> or even <string>.


Learning the language syntax but not the standard library seems to be something only C++ programmers do. C, Java, Python, ($LANGUAGE) programmers all learn to use their respective language's library, but apparently, that's too much to ask of C++ programmers.

Quote:
Original post by Fruny
Quote:
Original post by lancekt
Any programmer alive can understand iteration over an array. There may be some who are (god forbid) unfamiliar with <algorithm> or even <string>.


Learning the language syntax but not the standard library seems to be something only C++ programmers do. C, Java, Python, ($LANGUAGE) programmers all learn to use their respective language's library, but apparently, that's too much to ask of C++ programmers.


To be fair, the difference is that C++ provides both the C++ library and the C one, so people who come from C usually just learn the new syntax and keep using things like printf() and memcpy(), because they can. There is not a single library: memcpy() is standard, and so is std::copy(). That makes it difficult for beginners to "detach" themselves from the way they used to do things in C.

But you'll notice that lancekt's code didn't use any of the appropriate C standard library functions either.
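For comparison, here is a minimal sketch of the C-library route using `std::memcpy`, assuming `std::uint64_t` in place of `__int64`; `bytesAsKey` is a hypothetical name. It zero-fills the key and copies at most 8 bytes, so short strings are handled safely.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Copy up to 8 bytes of the string into a 64-bit integer with std::memcpy.
std::uint64_t bytesAsKey(const std::string& s)
{
    std::uint64_t key = 0;                            // zero the padding bytes
    std::size_t n = std::min(s.size(), sizeof key);   // never read past the string
    std::memcpy(&key, s.data(), n);
    return key;
}
```

The numeric value depends on the machine's byte order, so treat these keys as opaque: equality is meaningful, but ordering generally won't match string order.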

Wow, that's a lot of feedback, thanks!

A couple of questions:
What's reinterpret_cast, and more importantly, how is it different from static_cast?

How do I declare a 64bit int? (is it just _int64?)

Would I be able to just copy the string over one char at a time and then bitshift the int to the next int?

Quote:
Original post by methinks
What's reinterpret_cast, and more importantly, how is it different from static_cast?


A static_cast is used when the two types are convertible one to the other somehow; the cast performs the conversion appropriately, e.g. a float cast to int is truncated. reinterpret_cast, on the other hand, is used on pointers and references (and pointer/integer conversions): it tells the compiler to treat the object at that address as if it had been of the target type all along. Thus it reinterprets the same bundle of bits differently.
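A small illustration of the difference, assuming an IEEE-754 `float` that is 4 bytes wide (true on mainstream platforms, but not required by the standard); `truncated` and `bitsOf` are hypothetical names. `static_cast` changes the value, while `reinterpret_cast` only changes how the same storage is viewed. Reading an object through an `unsigned char*` is one of the few reinterpretations the standard allows.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 4-byte float");

// static_cast converts the value: the fractional part is dropped.
int truncated(float f) { return static_cast<int>(f); }

// reinterpret_cast views the same storage as another type. Here the float's
// bytes are read through unsigned char*, then copied into an integer so the
// unchanged bit pattern can be inspected.
std::uint32_t bitsOf(float f)
{
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&f);
    std::uint32_t bits = 0;
    std::memcpy(&bits, bytes, sizeof bits);
    return bits;
}
```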

Quote:
How do I declare a 64bit int? (is it just _int64?)


C++ does not (yet) have 64-bit integers. Compilers often support them as a language extension, but the keywords used vary. Visual Studio uses __int64, while GCC uses long long which, unless I am mistaken, is the way it is currently planned to be added.
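Since this thread was written, C++11 standardized both: `long long` is guaranteed to be at least 64 bits, and `<cstdint>` provides exact-width `std::int64_t` and `std::uint64_t` wherever the platform has such a type (every mainstream one does). A minimal sketch; the variable names are illustrative.

```cpp
#include <climits>
#include <cstdint>

// Exact-width 64-bit types from <cstdint> (C++11).
std::int64_t  signedKey   = -1;
std::uint64_t unsignedKey = 0xFFFFFFFFFFFFFFFFULL;   // all 64 bits set

static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "int64_t is exactly 64 bits");
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long is at least 64 bits");
```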

Quote:
Would I be able to just copy the string over one char at a time and then bitshift the int to the next int?


You could do that, but it's significantly more complicated. The solution you have now is the simple one: directly write the data in the right place. Bitshifting is not a good plan.

Quote:
Original post by Fruny

C++ does not (yet) have 64-bit integers. Compilers often support them as a language extension, but the keywords used vary. Visual Studio uses __int64, while GCC uses long long which, unless I am mistaken, is the way it is currently planned to be added.


With GCC compiling for an x86-64 system, longs are 64 bits (as are long longs; ints are 32 bits). I don't think the C++ standard guarantees any bit sizes, just that sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long). So I think a standards-compliant C++ compiler doesn't even need 32-bit integers (I doubt such a compiler exists).

If I remember right, __int64 works on Microsoft's compiler and GCC (MinGW), which are the two most common ones.

Quote:
Original post by Zahlman

New users of Python get friendly jeers for this sort of thing, as the experienced Pythonistas attempt to correct the damage that other imperative languages have done to their thought processes. Imagine writing something that strange (and it doesn't even make you handle the incrementing like C++ would!), when you could do:


for thing in container:
    do_something_with(thing)


Granted, C++ syntax can't be made quite so nice...


Yes it can!


#include <boost/foreach.hpp>
#include <vector>
#include <iostream>

#define foreach BOOST_FOREACH
int main()
{
    std::vector<int> integers(10);
    foreach(int& i, integers)
        std::cout << i << " ";
}





Btw, does anybody know how to change the syntax of the foreach macro so that it looks like this: foreach(int& i in integers)? I tried using

#define in ,

but that (obviously) didn't work.

Quote:
Original post by Cocalus
Quote:
Original post by Fruny

C++ does not (yet) have 64-bit integers. Compilers often support them as a language extension, but the keywords used vary. Visual Studio uses __int64, while GCC uses long long which, unless I am mistaken, is the way it is currently planned to be added.


With GCC compiling for an x86-64 system, longs are 64 bits (as are long longs; ints are 32 bits). I don't think the C++ standard guarantees any bit sizes, just that sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long). So I think a standards-compliant C++ compiler doesn't even need 32-bit integers (I doubt such a compiler exists).

If I remember right, __int64 works on Microsoft's compiler and GCC (MinGW), which are the two most common ones.

The standard specifies a minimum range for each type. All type sizes are multiples of sizeof(char), which is 1 by definition. The header <limits.h> (or <climits>) contains the actual bit counts and min/max values for your implementation.
__int64 works on Microsoft's compiler and MinGW.
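A small sketch of checking those guarantees with `<climits>`. The asserted values are the minimums the C and C++ standards require (note that `long` must cover at least 32 bits, so a conforming compiler always has a type with at least 32 bits); the printed numbers vary by platform. `printLimits` is a hypothetical name.

```cpp
#include <climits>
#include <iostream>

// Minimum ranges required by the standard, checked at compile time.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(INT_MAX >= 32767, "int covers at least 16 bits");
static_assert(LONG_MAX >= 2147483647L, "long covers at least 32 bits");

// Prints what this particular implementation actually provides.
void printLimits()
{
    std::cout << "CHAR_BIT     = " << CHAR_BIT << '\n'
              << "sizeof(int)  = " << sizeof(int) << '\n'
              << "sizeof(long) = " << sizeof(long) << '\n'
              << "INT_MAX      = " << INT_MAX << '\n'
              << "LONG_MAX     = " << LONG_MAX << '\n';
}
```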

Quote:
Original post by deathkrush
Btw, does anybody know how to change the syntax of the foreach macro so that it looks like this: foreach(int& i in integers)? I tried using

#define in ,

but that (obviously) didn't work.


Please don't try to do that. If you don't want C++ syntax, don't use C++.
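For what it's worth, C++11 (which came after this thread) added a range-based for statement that gives roughly the `for thing in container` shape without any macros. A minimal sketch; `sum` and `doubleAll` are hypothetical helpers.

```cpp
#include <vector>

// Range-based for: i takes a copy of each element in turn.
int sum(const std::vector<int>& integers)
{
    int total = 0;
    for (int i : integers)
        total += i;
    return total;
}

// Binding by reference modifies the elements in place.
void doubleAll(std::vector<int>& integers)
{
    for (int& i : integers)
        i *= 2;
}
```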

Quote:
Original post by mikeman
Quote:
Original post by Fruny
Quote:
Original post by lancekt
Any programmer alive can understand iteration over an array. There may be some who are (god forbid) unfamiliar with <algorithm> or even <string>.


Learning the language syntax but not the standard library seems to be something only C++ programmers do. C, Java, Python, ($LANGUAGE) programmers all learn to use their respective language's library, but apparently, that's too much to ask of C++ programmers.


To be fair, the difference is that C++ provides both the C++ library and the C one, so people who come from C usually just learn the new syntax and keep using things like printf() and memcpy(), because they can. There is not a single library: memcpy() is standard, and so is std::copy(). That makes it difficult for beginners to "detach" themselves from the way they used to do things in C.


Yeah. And there are just so many libraries out there for C/C++, often reimplementing the same functionality, often in very different ways. It boggles the mind how many different string classes there have been, or how many ways there are to allocate memory in Win32, etc...

The C++ standard library is of course a huge step in the right direction, but a lot of people (like me) didn't learn with it and hence don't really think to use things like std::copy instinctively. (The distrust of STL some people in this industry seem to have doesn't help, either...)

Quote:
Original post by SiCrane
But you'll notice that lancekt's code didn't use any of the appropriate C standard library functions either.


I originally had memcpy in there, for what it's worth. :)

