Sign in to follow this  
random_thinker

C++ STL container mulitple types

Recommended Posts

This issue may have come up before, but I have not found any answers to it yet. In the STL, containers the type is fixed for every element. I would like to have containers with multiple types. I have seen on Sourceforge a discussion of a class identified as 'MultiType' which is basically morphs into various primitive types and a few others, such as 'std::string'. This allows such things as 'std::map<std::string,MultiType>'. My question is, what are the opinions as to this approach (as it could be argued that it breaks the type safety of C++ and I suppose could lead to possible pointer memory allocation problems), and secondly, are there other approaches? Additionally, I have tried using 'std::string' as a base type and converting to primitives using a stringstream-type lexical_cast (with std::fixed << std::setprecision(20)), but I have discovered discrepancies as compared to the same calculations using native types (about 0.000006% which is not acceptable for my work). I have assumed these are round-off related when using this conversion for primitive types; hence the desire to keep numeric data in its primitive form. Any ideas?

Share this post


Link to post
Share on other sites
Which types are you considering storing? The best course of action will depend heavily on that.

Share this post


Link to post
Share on other sites
boost::any
cdiggins::any
boost::variant

Use google.

Keep in mind, that this is merely type-casting. It is not possible, due to language design, to morph a type into another type, unless the conversion between the two is explicitly defined.

While stringstream may seem similar to string, they are completely unrelated. The only reason they can inter-operate is because they are explicitly supported.

Having fixed type in a container is A Good Thing(tm). So is type-safety. If inheritance and polymorphism doesn't give you adequate solution, chances are you're deliberately trying to break something.

Share this post


Link to post
Share on other sites
Hi Si,

I've got std::strings, single, double and various int types that could be used. I'm implementing a simple lisp-style parser for calculated fields in a report template system, so I need to be able to use std::strings as keys for map values as well as std::strings or primitives.

I have implemented this quite well by using the STL std::vector<std::string> and STL std::map<std::string,std::string> and converting types back and forth with a lexical_cast<primitiveType> approach, but now I have discovered this numerical difference from the calculation in native form. Hence my thinking about something like MultiType.

Share this post


Link to post
Share on other sites
Hi Antheus,

I couldn't agree with you more, but I'm stumped on this problem. I'm trying to minimise the number of underlying data containers for this application (always using KISS), and I realise that one of the fundamental strengths of C and C++ is strict typing. But still I have this problem, that I would like to solve without breaking the type safety of the language.

Share this post


Link to post
Share on other sites
It sounds like you'd best benefit from using a discriminated union solution like boost::variant. However, consider also embedding an existing scripting solution and exposing your programs data to the scripting language.

Share this post


Link to post
Share on other sites
Thanks everyone, I'm checking these sources further. What I'm specifically doing is this:

"This is some text from a report with a calculated field that results in the number: ~12,L,2,calculated.value."

;;calculated.value
(* 5.67 7.45 (/ mapkey.value 9)(+ mapkey2.value 4))

The system that I have written works very well; the above string replaces the substring '~12,L,2,calculated.value' with the formatted result of ';;calculated.value', and 'calculated.value' uses the strings 'mapkey.value' and 'mapkey2.value' to access a std::map pair<std::string,std::string> and find the associated values, and convert them to primitive types. But all the data is first read as a string, then parsed, and the integers, numbers and strings (keys) are interpreted. After conversions, however, this small error occurs, which is not acceptable. My problem is that, in some cases the map values are primitives, and in some cases the map values are strings.

Another option is to create another std::map<std::string,double>, for example, and store numeric data in that, but this increases container complexity and management significantly. Under this approach I would need four or five std::maps<>, and have access routines for each.

Share this post


Link to post
Share on other sites
Quote:
Original post by random_thinker
This issue may have come up before, but I have not found any answers to it yet. In the STL, containers the type is fixed for every element. I would like to have containers with multiple types.
<...>


The reason why STL containers uses fixed type elements is that these elements must be of the same size to make access to elements easier. For example, if you put values of constant size into array, you get simple layout with elements accessible by index. To access certain element in array of variable-sized elements is possible by byte-index only which might be hard to compute.

I suggest you to either use pointers - allocate elements on heap and insert them into container (pointers are constant-sized), or create union of types you are going to use.

Share this post


Link to post
Share on other sites
Quote:
After conversions, however, this small error occurs, which is not acceptable. My problem is that, in some cases the map values are primitives, and in some cases the map values are strings.


C++ gives you guarantees on accuracy of floating point calculations. Even without conversion to string, your accuracy may suffer.

Share this post


Link to post
Share on other sites
Hi Again All,

Thanks for the excellent info...still formulating an approach to this...to keep things simple and to play on some of the ideas above, what about something like (this pseudocode has not been tested!):

// Consider each parsed element as an atom that is either number or text.
struct atom
{
double number;
std::string text;
};

// Does the string resemble a number?
bool isNumber(const std::string & strg)
{ return (strg.find_first_not_of(<number_string>) == std::string::npos); }

// Make a new atom from a string at initial parsing.
atom * makeAtom(const std::string & strg)
{
atom * pAtom;
if(isNumber(strg))
{
pAtom->number = lexical_cast<double>(strg);
pAtom->text = "";
}
else
{
pAtom->number = 0;
pAtom->text = strg;
}
return pAtom; // on exit, content is returned and pAtom is destroyed.
}

// Convenience typedefs allowing dynamic allocation.
typedef std::map<std::string,atom*> atom_map;
typedef std::vector<atom_map*> atom_map_vec;

// Create an atom_map_vec.
atom_map_vec = amvec;

// Load first atom_map.
amvec.push_back(new atom_map);

// Parse a key-value string and return them.
std::string key = "string_key";
std::string value = "123.789";

// Insert a key-value pair.
amvec[0]->insert(std::make_pair(key,(makeAtom(value))));

The above should allow some form of dynamic allocation, and when each map pair is used the 'non-null' values only would be accessed. This also preserves type safety and because all containers are iterating across pointers the efficiency should be preserved. However, it still does not seem the best solution to me.

Any suggestions or ideas?

Share this post


Link to post
Share on other sites
Quote:
Original post by random_thinker
Hi Again All,

Thanks for the excellent info...still formulating an approach to this...to keep things simple and to play on some of the ideas above, what about something like (this pseudocode has not been tested!):

// Consider each parsed element as an atom that is either number or text.
struct atom
{
double number;
std::string text;
};

// Does the string resemble a number?
bool isNumber(const std::string & strg)
{ return (strg.find_first_not_of(<number_string>) == std::string::npos); }

// Make a new atom from a string at initial parsing.
atom * makeAtom(const std::string & strg)
{
atom * pAtom;
if(isNumber(strg))
{
pAtom->number = lexical_cast<double>(strg);
pAtom->text = "";
}
else
{
pAtom->number = 0;
pAtom->text = strg;
}
return pAtom; // on exit, content is returned and pAtom is destroyed.
}

// Convenience typedefs allowing dynamic allocation.
typedef std::map<std::string,atom*> atom_map;
typedef std::vector<atom_map*> atom_map_vec;

// Create an atom_map_vec.
atom_map_vec = amvec;

// Load first atom_map.
amvec.push_back(new atom_map);

// Parse a key-value string and return them.
std::string key = "string_key";
std::string value = "123.789";

// Insert a key-value pair.
amvec[0]->insert(std::make_pair(key,(makeAtom(value))));

The above should allow some form of dynamic allocation, and when each map pair is used the 'non-null' values only would be accessed. This also preserves type safety and because all containers are iterating across pointers the efficiency should be preserved. However, it still does not seem the best solution to me.

Any suggestions or ideas?
A couple of comments:

1. In your makeAtom() function, no memory is allocated for pAtom, and the value of pAtom is never initialized. Because of this, the behavior of this function (and any code that uses its return value) will be undefined.

2. The comment 'on exit, content is returned and pAtom is destroyed' is somewhat confusing. What exactly are you expecting to be destroyed at this point? (It is true that pAtom itself is destroyed, but I'm not sure that this is what you mean.)

3. The line 'atom_map_vec = amvec;' should be 'atom_map_vec amvec;', I think.

4. What's the rationale for storing everything (maps, atoms, etc.) by reference (i.e. pointer) rather than by value? (You touched on this in your post, but I wasn't quite clear what you were getting at.)

5. Given what you posted above, it seems that the aforementioned boost::variant would be an appropriate solution to the problem (unless I'm missing something). Have you considered using variant?

[Edit: Some of the programming errors mentioned above - e.g. not allocating memory for pAtom - are probably just because, as you said, it's untested code. I'm still curious about items 4 and 5 though.]

Share this post


Link to post
Share on other sites
Have you considered using a container that is parameterized to hold pointers that are of the type of a base class?

For example:


class Base{...};

class DervA : public Base{...};

class DervB : public Base{...};


int main(){
std::vector<Base*> v;

v.push_back(new DervA);
v.push_back(new DervB);
}




This will allow you to hold "multiple types" in a single container.

[or]

I think you may have touched on this idea already, but you can keep a string representation of all your types then define conversion operators from your string type to basic types (float, int, bool etc.)

You can also provide conversion from one user defined type to another user defined type by utilizing the implicit conversion functionality of the copy constructor.

Share this post


Link to post
Share on other sites
Hi Jyk..

The code that you've commented on was very rough, I just put it down as one approach to the problem. As to your points, I agree that there is really no need to store by reference, better to store everything by value, as you suggested. This will take care of the pointer issues and pointers in containers that you mentioned. An earlier post suggested using pointers to varying data types to make iteration more efficient, but since a data struct is used (atom) as the value type then there is really no benefit of pointers, I think.

I looked into boost::variant immediately after Si and Antheus suggested it last night. One of the goals of my code is cross-platform usage under GCC compilation; when I read the boost::variant documentation I noted that there were some issues on certain platforms, which complicates things form me. I try to stick with the STL as much as possible and I've found that straight-forward STL-based C++ code compiles with GCC (or G++ actually) out of the box on everything that I use, so that is a prime consideration. For that reason I've decided to try a simpler approach without boost.

In another discussion I understood that with containers of pointers (excluding smart pointers), when the container goes out of scope, the pointers are 'destroyed', but not the memory they point to. So the comment there was just a mental note that the when out-of-scope, the pointer 'pAtom' would be 'destroyed' but the allocation would be returned to another pointer of the type 'atom'.

The code 'atom_map_vec = amvec;' should indeed be 'atom_map_vec amvec;' as you noted. However, once the containers store by value, the code gets a lot simpler, so the pointer issues can be eliminated.

I'm in the process of introducing and testing an 'atom' type in my code now; this is similar to the terminology of Lisp, in that each list is composed of atoms. I originally thought of many types, but in reality the programme really can be narrowed to just 2 (since I cannot see the point in developing rational types for now). These would be std::string and double. Hence 'struct atom' should be all that is needed for type storage at the moment.

I still have a feeling there is a more elegant approach out there, but for now I just can't see it. Again, you suggestions are welcomed.

Thx..

Share this post


Link to post
Share on other sites
fpsgamer,

Very interesting idea, goes back to the most basic aspect of C++. But how would this work in detail? For example, you could have:

class Atom
{
public:
virtual ~Atom(){ }
virtual <sometype?> value(){ }
};

class Text : public Atom
{
public:
Text(std::string& val):value_(val){ }
std::string value(){ return value_; }

private:
std::string value_;
};

class Number : public Atom
{
public:
Number(double val):value_(val){ }
std::string value(){ return value_; }

private:
double value_;
};

or a template approach:

template<typename type>
class Atom
{
public:
virtual ~Atom(){ }
virtual type value(){ }
};

template<typename derivedType>
AtomType : public Atom<derivedType>
{
AtomType(derivedType& val) : value_(val)
derivedType value(){ return value_; }

private:
derivedType value_;
};

but there would still be a problem for templates, you would have to know the type during instantiation, and what type do you use for non-templated approaches?

At the moment, I'm just using a functional approach, and the very rich techniques of std::string to do everything, but there are some problems with calculation error, that I've not been able to solve using string conversions; about 0.0000006%.

Share this post


Link to post
Share on other sites
Quote:
Original post by random_thinker
fpsgamer,

Very interesting idea, goes back to the most basic aspect of C++. But how would this work in detail? For example, you could have:

class Atom
{
public:
virtual ~Atom(){ }
virtual <sometype?> value(){ }
};

class Text : public Atom
{
public:
Text(std::string& val):value_(val){ }
std::string value(){ return value_; }

private:
std::string value_;
};

class Number : public Atom
{
public:
Number(double val):value_(val){ }
std::string value(){ return value_; }

private:
double value_;
};

or a template approach:

template<typename type>
class Atom
{
public:
virtual ~Atom(){ }
virtual type value(){ }
};

template<typename derivedType>
AtomType : public Atom<derivedType>
{
AtomType(derivedType& val) : value_(val)
derivedType value(){ return value_; }

private:
derivedType value_;
};

but there would still be a problem for templates, you would have to know the type during instantiation, and what type do you use for non-templated approaches?

At the moment, I'm just using a functional approach, and the very rich techniques of std::string to do everything, but there are some problems with calculation error, that I've not been able to solve using string conversions; about 0.0000006%.
Here's what I'd suggest (this is all of course IMHO). If the inheritance+polymorphism approach is appealing to you, don't reinvent the wheel; instead, use boost::any, which is robust, flexible, and battle-tested. If the struct/union approach is appealing to you, don't reinvent the wheel; instead, use boost::variant, which is robust, flexible, and battle-tested. If you have concerns about portability, post your target platforms/compilers so that the folks here can (potentially at least) provide some feedback on that. (Although I can't say for sure, I think it's at least a possibility that your concerns about portability with respect to these particular libraries may be unfounded.)

Share this post


Link to post
Share on other sites
In my personal opinion, reducing the number of containers is not always in line with the KISS principle. Often it complicates things.

If the data objects are related and have a similar purpose (i.e. they have similar interfaces and the differences are mostly internal implementations) then most likely polymorphism is your friend (from your explanation above this seems like it may be what you want).

If you just want to lump a bunch of random data together then you may want to rethink your design.

Share this post


Link to post
Share on other sites
It's not the number of containers that is important, it is how the data may be split up into different containers for a given data file that concerns me. I'd like to keep data organised so that data entry file organisation, searching and collection in a relational way is robust and relatively straight-forward. For this reason I've chosen maps as the basic data container, and these are assembled in vectors. By using a consistent data type 'atom' in each map, the numeric data can be stored and processed in native form, which I hope will eliminate the calculation error that I am getting. Then it is just a matter of sifting the atoms prior to use, depending upon whether there are numeric operations or other uses (such as producing reports).

My work involves stochastic analysis of engineering material data so there is a lot of data processing and simulation with random numbers. Following on earlier discussions, I've written the following code (again untested) to allow the creation, storage and filtering of atoms so that storage and calculation of numbers is done entirely in primitive form. Using an approach something like:



// The idea here is to parse Lisp-like expressions and store the 'atoms'
// as either numeric or textual information. The steps would be something
// like:
//
// std::string expr = "(* 5.6 9.32 map_key_1 (/ map_key_2 map_key_3))"
//
// initial parsing would return:
//
// step 1: "(/ map_key_2 map_key_3)"
// step 2: "/ map_key_2 map_key_3"
// step 3: "map_key_2 map_key_3" and atom "/" goes to get_accumulator<>().
// step 4: Expression is passed to the result of get_accumulator<>(),
// which returned an accumulate_quotient<>() object, and the result
// is returned and inserted; for this example, let's say that:
//
// (/ map_key_2 map_key_3) = 2.5.
//
// Then:
//
// step 5: "(* 5.6 9.32 map_key_1 2.5)" after insertion.
// step 6: return to step 1, until only a single atom remains.

// Create a new data type atom.
struct atom
{
std::string text;
double number;
};

// Typedef an atom_map
typedef std::map<std::string,atom> atom_map;

// Now a way to make an atom from a string. Returns a correctly
// typed atom, based upon the string; note that purely number string
// representations cannot be used as text.
atom makeAtom(const std::string& value)
{
atom atm;

// 'bool isNumber(const std::string&)' defined elsewhere,
// No side effects.
if(isNumber(strg);
{
atm.text = "__NIL__";
atm.number = atof(strg.c_str());
}
else
{
atm.text = strg;
atm.number = 0;
}
return atm;
}

// Now a way to get a numeric value from a key. Uses typedef 'atom_map'. No
// side effects.
double get_numeric_atom(const std::string& key,const atom_map& map)
{
atom_map::const_iterator iter;

// First see if the key exists.
if( (iter = map.find(key)) == map.end() )
{
// Does not exist, issue error and kill...
}
return (iter->second).number;
}

// Create data for this example. Note that 'value2' refers to
// contents of 'key'.
std::string key = "keystring",value = "123.789",
key1 = "keystring1",value1 = "123.789",
key2 = "keystring2",value2 = "keystring";

// Create atom_map and iterator.
atom_map amap;
atom_map::iterator iter = amap.begin(); // Consider const_iterator here.

// Load container, use makeAtom() to create correct contents.
atom_map.insert(std::make_pair(key,makeAtom(value)));
atom_map.insert(std::make_pair(key1,makeAtom(value1)));
atom_map.insert(std::make_pair(key2,makeAtom(value2)));

// A 'Lisp-style' accumulator of a sequence of values, compatable
// with STL algorithms, has the abstract base class accumulatable<>.
// accumulatable and descendants contain the method, 'double result()'.
accumulate_sum<double>accumulator;

// Would prefer std::for_each() here but each value must first
// be screened.
for ( iter = amap.begin(); iter != amap.end(); ++iter )
{
// 'bool isNil(const std::string&)' defined elsewhere;
// no side effects.
if ( isNil((iter->second).text) )
{
accumulator((iter->second).number);
}
// The value may refer to another key.
else
{
// 'bool isEmpty(const std::string&)' defined elsewhere;
// no side effects.
if( isEmpty((iter->second).text) )
{
// No nil symbol and is empty, issue error and kill.
}
else
{
// returns value associated with key, no side effects.
accumulator(get_numeric_atom((iter->second).text,amap));
}
}
}

// Should return the result of 123.789 + 123.789 + 123.789 = 371.367,
// as the last entry of atom_map refers to another key, in a sense
// the map recurses on itself. This is useful for calculated fields
// that contain mixed numbers and keys.
std::cout << "\n" << accumulator.result() << std::endl;








I know that in a more abstract sense, the use of boost::any or boost::variant would be far more powerful, but the above code approach should do just about anything that I need into the future and is fairly simple and can be robust, as it just applies STL techniques.

Any thoughts?

[Edited by - random_thinker on August 13, 2007 11:01:58 AM]

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this