C++ STL container mulitple types

Started by
15 comments, last by random_thinker 16 years, 8 months ago
Quote:Original post by random_thinker
Hi Again All,

Thanks for the excellent info...still formulating an approach to this...to keep things simple and to play on some of the ideas above, what about something like (this pseudocode has not been tested!):

// Consider each parsed element as an atom that is either number or text.
struct atom
{
double number;
std::string text;
};

// Does the string resemble a number?
bool isNumber(const std::string & strg)
{ return (strg.find_first_not_of(<number_string>) == std::string::npos); }

// Make a new atom from a string at initial parsing.
atom * makeAtom(const std::string & strg)
{
atom * pAtom;
if(isNumber(strg))
{
pAtom->number = lexical_cast<double>(strg);
pAtom->text = "";
}
else
{
pAtom->number = 0;
pAtom->text = strg;
}
return pAtom; // on exit, content is returned and pAtom is destroyed.
}

// Convenience typedefs allowing dynamic allocation.
typedef std::map<std::string,atom*> atom_map;
typedef std::vector<atom_map*> atom_map_vec;

// Create an atom_map_vec.
atom_map_vec = amvec;

// Load first atom_map.
amvec.push_back(new atom_map);

// Parse a key-value string and return them.
std::string key = "string_key";
std::string value = "123.789";

// Insert a key-value pair.
amvec[0]->insert(std::make_pair(key,(makeAtom(value))));

The above should allow some form of dynamic allocation, and when each map pair is used the 'non-null' values only would be accessed. This also preserves type safety and because all containers are iterating across pointers the efficiency should be preserved. However, it still does not seem the best solution to me.

Any suggestions or ideas?
A couple of comments:

1. In your makeAtom() function, no memory is allocated for pAtom, and the value of pAtom is never initialized. Because of this, the behavior of this function (and any code that uses its return value) will be undefined.

2. The comment 'on exit, content is returned and pAtom is destroyed' is somewhat confusing. What exactly are you expecting to be destroyed at this point? (It is true that pAtom itself is destroyed, but I'm not sure that this is what you mean.)

3. The line 'atom_map_vec = amvec;' should be 'atom_map_vec amvec;', I think.

4. What's the rationale for storing everything (maps, atoms, etc.) by reference (i.e. pointer) rather than by value? (You touched on this in your post, but I wasn't quite clear what you were getting at.)

5. Given what you posted above, it seems that the aforementioned boost::variant would be an appropriate solution to the problem (unless I'm missing something). Have you considered using variant?

[Edit: Some of the programming errors mentioned above - e.g. not allocating memory for pAtom - are probably just because, as you said, it's untested code. I'm still curious about items 4 and 5 though.]
Advertisement
Have you considered using a container that is parameterized to hold pointers that are of the type of a base class?

For example:

class Base{...};class DervA : public Base{...};class DervB : public Base{...};int main(){std::vector<Base*> v;v.push_back(new DervA);v.push_back(new DervB);}


This will allow you to hold "multiple types" in a single container.

[or]

I think you may have touched on this idea already, but you can keep a string representation of all your types then define conversion operators from your string type to basic types (float, int, bool etc.)

You can also provide conversion from one user defined type to another user defined type by utilizing the implicit conversion functionality of the copy constructor.
Hi Jyk..

The code that you've commented on was very rough, I just put it down as one approach to the problem. As to your points, I agree that there is really no need to store by reference, better to store everything by value, as you suggested. This will take care of the pointer issues and pointers in containers that you mentioned. An earlier post suggested using pointers to varying data types to make iteration more efficient, but since a data struct is used (atom) as the value type then there is really no benefit of pointers, I think.

I looked into boost::variant immediately after Si and Antheus suggested it last night. One of the goals of my code is cross-platform usage under GCC compilation; when I read the boost::variant documentation I noted that there were some issues on certain platforms, which complicates things form me. I try to stick with the STL as much as possible and I've found that straight-forward STL-based C++ code compiles with GCC (or G++ actually) out of the box on everything that I use, so that is a prime consideration. For that reason I've decided to try a simpler approach without boost.

In another discussion I understood that with containers of pointers (excluding smart pointers), when the container goes out of scope, the pointers are 'destroyed', but not the memory they point to. So the comment there was just a mental note that the when out-of-scope, the pointer 'pAtom' would be 'destroyed' but the allocation would be returned to another pointer of the type 'atom'.

The code 'atom_map_vec = amvec;' should indeed be 'atom_map_vec amvec;' as you noted. However, once the containers store by value, the code gets a lot simpler, so the pointer issues can be eliminated.

I'm in the process of introducing and testing an 'atom' type in my code now; this is similar to the terminology of Lisp, in that each list is composed of atoms. I originally thought of many types, but in reality the programme really can be narrowed to just 2 (since I cannot see the point in developing rational types for now). These would be std::string and double. Hence 'struct atom' should be all that is needed for type storage at the moment.

I still have a feeling there is a more elegant approach out there, but for now I just can't see it. Again, you suggestions are welcomed.

Thx..
--random_thinkerAs Albert Einstein said: 'Imagination is more important than knowledge'. Of course, he also said: 'If I had only known, I would have been a locksmith'.
fpsgamer,

Very interesting idea, goes back to the most basic aspect of C++. But how would this work in detail? For example, you could have:

class Atom
{
public:
virtual ~Atom(){ }
virtual <sometype?> value(){ }
};

class Text : public Atom
{
public:
Text(std::string& val):value_(val){ }
std::string value(){ return value_; }

private:
std::string value_;
};

class Number : public Atom
{
public:
Number(double val):value_(val){ }
std::string value(){ return value_; }

private:
double value_;
};

or a template approach:

template<typename type>
class Atom
{
public:
virtual ~Atom(){ }
virtual type value(){ }
};

template<typename derivedType>
AtomType : public Atom<derivedType>
{
AtomType(derivedType& val) : value_(val)
derivedType value(){ return value_; }

private:
derivedType value_;
};

but there would still be a problem for templates, you would have to know the type during instantiation, and what type do you use for non-templated approaches?

At the moment, I'm just using a functional approach, and the very rich techniques of std::string to do everything, but there are some problems with calculation error, that I've not been able to solve using string conversions; about 0.0000006%.
--random_thinkerAs Albert Einstein said: 'Imagination is more important than knowledge'. Of course, he also said: 'If I had only known, I would have been a locksmith'.
Quote:Original post by random_thinker
fpsgamer,

Very interesting idea, goes back to the most basic aspect of C++. But how would this work in detail? For example, you could have:

class Atom
{
public:
virtual ~Atom(){ }
virtual <sometype?> value(){ }
};

class Text : public Atom
{
public:
Text(std::string& val):value_(val){ }
std::string value(){ return value_; }

private:
std::string value_;
};

class Number : public Atom
{
public:
Number(double val):value_(val){ }
std::string value(){ return value_; }

private:
double value_;
};

or a template approach:

template<typename type>
class Atom
{
public:
virtual ~Atom(){ }
virtual type value(){ }
};

template<typename derivedType>
AtomType : public Atom<derivedType>
{
AtomType(derivedType& val) : value_(val)
derivedType value(){ return value_; }

private:
derivedType value_;
};

but there would still be a problem for templates, you would have to know the type during instantiation, and what type do you use for non-templated approaches?

At the moment, I'm just using a functional approach, and the very rich techniques of std::string to do everything, but there are some problems with calculation error, that I've not been able to solve using string conversions; about 0.0000006%.
Here's what I'd suggest (this is all of course IMHO). If the inheritance+polymorphism approach is appealing to you, don't reinvent the wheel; instead, use boost::any, which is robust, flexible, and battle-tested. If the struct/union approach is appealing to you, don't reinvent the wheel; instead, use boost::variant, which is robust, flexible, and battle-tested. If you have concerns about portability, post your target platforms/compilers so that the folks here can (potentially at least) provide some feedback on that. (Although I can't say for sure, I think it's at least a possibility that your concerns about portability with respect to these particular libraries may be unfounded.)
In my personal opinion, reducing the number of containers is not always in line with the KISS principle. Often it complicates things.

If the data objects are related and have a similar purpose (i.e. they have similar interfaces and the differences are mostly internal implementations) then most likely polymorphism is your friend (from your explanation above this seems like it may be what you want).

If you just want to lump a bunch of random data together then you may want to rethink your design.
It's not the number of containers that is important, it is how the data may be split up into different containers for a given data file that concerns me. I'd like to keep data organised so that data entry file organisation, searching and collection in a relational way is robust and relatively straight-forward. For this reason I've chosen maps as the basic data container, and these are assembled in vectors. By using a consistent data type 'atom' in each map, the numeric data can be stored and processed in native form, which I hope will eliminate the calculation error that I am getting. Then it is just a matter of sifting the atoms prior to use, depending upon whether there are numeric operations or other uses (such as producing reports).

My work involves stochastic analysis of engineering material data so there is a lot of data processing and simulation with random numbers. Following on earlier discussions, I've written the following code (again untested) to allow the creation, storage and filtering of atoms so that storage and calculation of numbers is done entirely in primitive form. Using an approach something like:

// The idea here is to parse Lisp-like expressions and store the 'atoms'// as either numeric or textual information.  The steps would be something// like://// std::string expr = "(* 5.6 9.32 map_key_1 (/ map_key_2 map_key_3))"// // initial parsing would return://// step 1: "(/ map_key_2 map_key_3)"// step 2: "/ map_key_2 map_key_3"// step 3: "map_key_2 map_key_3" and atom "/" goes to get_accumulator<>().// step 4: Expression is passed to the result of get_accumulator<>(),//         which returned an accumulate_quotient<>() object, and the result//         is returned and inserted; for this example, let's say that://// (/ map_key_2 map_key_3) = 2.5.  //// Then://// step 5: "(* 5.6 9.32 map_key_1 2.5)" after insertion.// step 6: return to step 1, until only a single atom remains.// Create a new data type atom.struct atom{	std::string text;	double number;};// Typedef an atom_maptypedef std::map<std::string,atom> atom_map;// Now a way to make an atom from a string. Returns a correctly// typed atom, based upon the string; note that purely number string// representations cannot be used as text.atom makeAtom(const std::string& value){	atom atm;		// 'bool isNumber(const std::string&)' defined elsewhere,	// No side effects.	if(isNumber(strg);	{		atm.text = "__NIL__";		atm.number = atof(strg.c_str());	}	else	{		atm.text = strg;		atm.number = 0;	}	return atm;}// Now a way to get a numeric value from a key. Uses typedef 'atom_map'. No// side effects.double get_numeric_atom(const std::string& key,const atom_map& map){	atom_map::const_iterator iter;		// First see if the key exists.	if( (iter = map.find(key)) == map.end() )	{		// Does not exist, issue error and kill...	}	return (iter->second).number;}// Create data for this example. Note that 'value2' refers to // contents of 'key'.std::string key = "keystring",value = "123.789",	    key1 = "keystring1",value1 = "123.789",	    key2 = "keystring2",value2 = "keystring";// Create atom_map and iterator.atom_map amap;atom_map::iterator iter = amap.begin(); // Consider const_iterator here.// Load container, use makeAtom() to create correct contents.atom_map.insert(std::make_pair(key,makeAtom(value)));atom_map.insert(std::make_pair(key1,makeAtom(value1)));atom_map.insert(std::make_pair(key2,makeAtom(value2)));// A 'Lisp-style' accumulator of  a sequence of values, compatable// with STL algorithms, has the abstract base class accumulatable<>.// accumulatable and descendants contain the method, 'double result()'.accumulate_sum<double>accumulator;		// Would prefer std::for_each() here but each value must first // be screened.for ( iter = amap.begin(); iter != amap.end(); ++iter ){	// 'bool isNil(const std::string&)' defined elsewhere;	// no side effects.	if ( isNil((iter->second).text) )	{		accumulator((iter->second).number);	}	// The value may refer to another key.	else	{		// 'bool isEmpty(const std::string&)' defined elsewhere;		// no side effects.		if( isEmpty((iter->second).text) )		{			// No nil symbol and is empty, issue error and kill.		}		else		{			// returns value associated with key, no side effects.                      accumulator(get_numeric_atom((iter->second).text,amap));		}	}}		// Should return the result of 123.789 + 123.789 + 123.789 = 371.367,// as the last entry of atom_map refers to another key, in a sense// the map recurses on itself.  This is useful for calculated fields// that contain mixed numbers and keys.std::cout << "\n" << accumulator.result() << std::endl;


I know that in a more abstract sense, the use of boost::any or boost::variant would be far more powerful, but the above code approach should do just about anything that I need into the future and is fairly simple and can be robust, as it just applies STL techniques.

Any thoughts?

[Edited by - random_thinker on August 13, 2007 11:01:58 AM]
--random_thinkerAs Albert Einstein said: 'Imagination is more important than knowledge'. Of course, he also said: 'If I had only known, I would have been a locksmith'.

This topic is closed to new replies.

Advertisement