How to Attach Streams to Memory/Buffers

12 comments, last by GenuineXP 15 years, 9 months ago
Up to this point, I've only ever used C++ streams that were either somehow mysteriously connected to some sink/source or were file streams (or were bound to iterators of some sort... or STL containers... whatever) :-) . I'm wondering, how can one create a stream object that writes and reads data in system memory? I'm hoping this is actually very simple to accomplish! What I'm looking to do is use Boost.Serialization to assist in moving C++ objects across a binary border (e.g., from one dynamic library to another, which may have been compiled separately). To do this, I'd like to create an archive (the Boost kind) using a stream that works in system memory (instead of something like a file stream, which is probably the most common usage). Is this possible? Someone mentioned the serialization library in my previous thread and I thought it sounded like a great idea to try out. Thanks!
Going out on a limb: Does a stringstream do what you want?
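For reference, here is a minimal sketch of what the stringstream suggestion could look like. The function names are made up for illustration; the point is only that std::ostringstream and std::istringstream write to and read from a std::string held in memory rather than a file.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: writes a few values into an in-memory stream
// and returns a copy of the bytes the stream holds.
std::string write_to_memory() {
    std::ostringstream out;          // backed by a std::string, not a file
    out << "pinacolada " << 123;
    return out.str();
}

// Hypothetical helper: reads the values back from a block of memory.
int read_number(const std::string& bytes) {
    std::istringstream in(bytes);    // the stream works on its own copy of bytes
    std::string word;
    int number = 0;
    in >> word >> number;
    return number;
}
```

Anything that accepts a std::ostream& or std::istream& should be able to use these streams unchanged.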
Quote:Original post by dmatter
Going out on a limb: Does a stringstream do what you want?

I'm not sure. Thanks for the idea. I'll look into it.

Should I be looking into writing my own stream buffer (like std::streambuf)? I would think one for writing/reading into memory already exists...

EDIT:

I've found this page, which illustrates a char array based stream buffer. Maybe I can just use this. (I really like this guy's site. Very cool.)
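A char array based stream buffer can be fairly small. The following is a rough sketch of the general idea, not the code from the linked page: the class and function names are invented here, and it assumes the caller owns the array and keeps it alive while the streams are in use.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical write-only buffer: output goes straight into a caller-owned array.
class ArrayOutBuf : public std::streambuf {
public:
    ArrayOutBuf(char* data, std::size_t size) { setp(data, data + size); }
    // Number of characters written so far.
    std::size_t written() const { return static_cast<std::size_t>(pptr() - pbase()); }
};

// Hypothetical read-only buffer: input is taken straight from a caller-owned array.
class ArrayInBuf : public std::streambuf {
public:
    ArrayInBuf(char* data, std::size_t size) { setg(data, data, data + size); }
};

// Writes "hello 42" into the array and returns how many characters were used.
std::size_t write_greeting(char* data, std::size_t size) {
    ArrayOutBuf buf(data, size);
    std::ostream out(&buf);
    out << "hello " << 42;
    return buf.written();
}

// Skips the first word in the array and returns the integer that follows it.
int read_second_field(char* data, std::size_t size) {
    ArrayInBuf buf(data, size);
    std::istream in(&buf);
    std::string word;
    int value = 0;
    in >> word >> value;
    return value;
}
```

Because overflow() is not overridden, writes past the end of the array simply fail and set the stream's error state instead of growing the buffer.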

[Edited by - GenuineXP on July 15, 2008 8:18:29 PM]
Boost.Iostreams is much easier than doing a streambuf thing with a custom read buffer. Its stream types model (inherit from, I think) istream and ostream, so they can be used as drop-in replacements.

#include <string>
#include <list>
#include <iostream>
#include <iterator>    // std::back_inserter, std::ostream_iterator
#include <algorithm>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/range/iterator_range.hpp>

int main()
{
    // stream into a wholly inappropriate data type (std::list)
    std::list<char> buf;
    boost::iostreams::filtering_ostream out(std::back_inserter(buf));

    // flush so the characters actually reach the list before it is read
    out << "pinacolada " << 123 << " " << 7777.f << std::flush;
    out << " another pinacolada " << std::flush;

    // print what ended up in the list
    std::copy(buf.begin(), buf.end(), std::ostream_iterator<char>(std::cout, ""));
}


Glad you like the idea, send me a PM or something when you finish, I'm curious if there are any gotchas - I'm planning to do pretty much the same thing.

It's trivial to serialize to a stringstream.

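// fragment: needs <sstream>, <string> and <boost/archive/text_oarchive.hpp>;
// my_serializable_thing is any serializable object defined elsewhere
std::stringstream out_sstream(std::stringstream::out);
{
    // the archive finishes writing when it goes out of scope
    boost::archive::text_oarchive outarchive(out_sstream);
    outarchive & my_serializable_thing;
}
const std::string outstr = out_sstream.str();
const size_t outlen = outstr.size();
const char* outbuf = outstr.c_str();   // valid only while outstr is alive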



I think boost::iostreams is overkill for your need. I actually took a look at boost::iostreams yesterday - it seems powerful and definitely makes using iostreams a lot simpler - but I couldn't even get the example sink to compile in an hour. I know I've read in some of the boost flamethreads that some respectable posters have also had trouble with boost::iostreams. YMMV.
Yes, after reading about std::streambuf I immediately pointed my browser to the Boost.Iostreams documentation. It's a cool library.

In any case, if a string stream would work then I agree about Boost being overkill. At the moment, I've been a little more concerned with refining the plugin framework and all of my cross-binary function calls use purely C interfaces.

I'm thinking if I just pass the block of memory I've serialized an object to as an opaque pointer, I can safely get it across this border and deserialize without problems.
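Here is a rough sketch of how that could look, using a plain stringstream rather than a Boost archive so it stays self-contained. Point, plugin_sum_point and call_plugin are all hypothetical names; the only thing that crosses the border is a pointer and a length.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical type that the host wants to hand to the plugin.
struct Point {
    int x;
    int y;
};

// "Plugin" side: a pure C signature that receives opaque bytes plus a length.
extern "C" int plugin_sum_point(const void* data, std::size_t size) {
    const std::string bytes(static_cast<const char*>(data), size);
    std::istringstream in(bytes);
    Point p = {0, 0};
    in >> p.x >> p.y;                 // rebuild the object from the text form
    return p.x + p.y;
}

// Host side: serialize into memory, then pass pointer + length across the border.
int call_plugin(const Point& p) {
    std::ostringstream out;
    out << p.x << ' ' << p.y;
    const std::string bytes = out.str();   // must stay alive for the whole call
    return plugin_sum_point(bytes.data(), bytes.size());
}
```

The receiving side copies the bytes before parsing them, so neither binary ever frees or resizes memory that the other one allocated.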

One question I have is, does Boost.Serialization handle STL containers, and if so, does it do so on a byte-by-byte level (i.e., it wouldn't work if the plugin used a different implementation) or at a logical level (i.e., a std::vector of ints would be serialized as a list of integers that are read again and simply inserted via push_back)? I want to know if the serialization process is unintrusive enough to allow one container implementation to be passed to another container implementation. This should be possible so long as serialization uses the container's public interface, right?
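To make the "logical level" idea concrete, here is a hand-rolled sketch. It is not how Boost.Serialization is implemented; save_ints and load_ints are invented names, and the format (a count followed by the elements) is just an assumption for illustration. Because only size(), iteration and push_back are used, the data can be saved from one container type and loaded into another.

```cpp
#include <cstddef>
#include <list>
#include <sstream>
#include <string>
#include <vector>

// Saves any container of ints as "count v1 v2 ...", using only its public interface.
template <typename Container>
std::string save_ints(const Container& c) {
    std::ostringstream out;
    out << c.size();
    for (int v : c) {
        out << ' ' << v;
    }
    return out.str();
}

// Rebuilds a container of the requested type by reading the count and
// inserting each element with push_back.
template <typename Container>
Container load_ints(const std::string& bytes) {
    std::istringstream in(bytes);
    std::size_t count = 0;
    in >> count;
    Container c;
    for (std::size_t i = 0; i < count; ++i) {
        int v = 0;
        if (!(in >> v)) {
            break;                    // stop early on truncated input
        }
        c.push_back(v);
    }
    return c;
}
```

Saving a std::vector<int> and loading it as a std::list<int> works here because nothing in the format depends on how either container lays out its memory.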

I'm also not sure which object should be serialized (some objects are easy to convert to pure C) or if there's an elegant way to provide some sort of parameter framework (so that all of my parameters don't just end up being void pointers).

Thanks for the help!
Quote:Original post by GenuineXP

One question I have is, does Boost.Serialization handle STL containers, and if so, does it do so on a byte-by-byte level (i.e., it wouldn't work if the plugin used a different implementation) or at a logical level (i.e., a std::vector of ints would be serialized as a list of integers that are read again and simply inserted via push_back)? I want to know if the serialization process is unintrusive enough to allow one container implementation to be passed to another container implementation. This should be possible so long as serialization uses the container's public interface, right?


You're passing from the realm of serialization into more advanced topics, some of which are massive overkill for *function calls* (not even IPC).

Why are you using objects, if not to maintain a consistent and simple API with no overhead? If you need implementation abstraction, then you need to look into distributed systems with language-agnostic APIs.

std::vector<int> is just that. It's not equivalent to std::list<int> or int[]. And the MSVC 2008 implementation of std::vector is in no way whatsoever interoperable with the 2005 or GCC implementations, unless you recompile the project.

Interoperability is a completely separate topic, one for which C++ is horribly ill-suited. Java and C#, however, are designed for it (with C# preferring to run on Windows only anyway).

Using serialization for the purpose of DLLs is horribly inefficient, and will result in more issues than it would solve. While the most trivial issues are indeed solved, the process is horribly slow (compared to what a function call costs).

DLLs are designed for one purpose - low overhead. Otherwise, just use a socket library, and you get the ability to distribute the application over multiple machines.

Personally, I'd go with writing templated function call wrappers, that would pass data as native types, or add some light-weight checks.
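One possible reading of that suggestion, sketched under assumptions: plugin_sum stands in for a function exported by the plugin with a pure C signature, and call_sum is an invented host-side template that flattens a contiguous container into a pointer and a length, with a compile-time check on the element type.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical plugin export: only native types appear in the signature.
extern "C" long plugin_sum(const int* values, std::size_t count) {
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

// Host-side wrapper: accepts any contiguous container of int that offers
// data() and size(), and passes it on as pointer + length.
template <typename Contiguous>
long call_sum(const Contiguous& c) {
    // Light-weight check: reject containers whose elements are not int.
    static_assert(std::is_same<typename Contiguous::value_type, int>::value,
                  "call_sum expects a container of int");
    return plugin_sum(c.empty() ? nullptr : c.data(), c.size());
}
```

No container object crosses the border this way, so the two binaries do not need to agree on the layout of std::vector.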

Otherwise, I'd use a data marshalling solution designed just for that, which allows for code isolation, and defines clear boundaries.
For C++ code I find Boost pretty amenable (what looks initially a bit formidable eventually turns out to be pleasantly simple), but I have never managed to get far with the serialisation stuff, so I can't really comment here. One thing with serialization that might be important is endianness - for example, marshalling serialized objects across a network to a cross-platform client. ASN.1 is a heavyweight response to this kind of problem - used in ssh etc.
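As a small illustration of the endianness point, here is a sketch that stores a 32-bit integer in a fixed (big-endian) byte order, so the bytes mean the same thing on any host. put_u32_be and get_u32_be are invented names, and this is a hand-written example rather than anything taken from Boost or ASN.1.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends v to out as four bytes, most significant byte first,
// regardless of the byte order of the machine running the code.
void put_u32_be(std::string& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>(static_cast<unsigned char>((v >> shift) & 0xFFu)));
    }
}

// Reads four bytes starting at offset and reassembles the value.
// The caller is assumed to have checked that offset + 4 <= in.size().
std::uint32_t get_u32_be(const std::string& in, std::size_t offset) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        v = (v << 8) | static_cast<unsigned char>(in[offset + i]);
    }
    return v;
}
```

Copying the raw bytes of the integer with memcpy instead would produce different output on little-endian and big-endian machines.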

More generally, as a form of file storage, a naive binary serialisation implementation that lacks typing information is not typesafe. If the deserialization code expects an int and it gets a double, then expect random crashes (particularly a problem if you can't trust the source). Another, perhaps more mundane, issue is the need to be able to update the internal data representation behind the back of the on-disk serialized object structure. This is required when you want to make a code change, either to fix a bug or to add a new feature. This implies that provision is needed for either 1) a version schema control that indicates how the deserialisation should be performed given a mismatch between the memory object layout and the disk object, or else 2) name and type attributes, so that on deserialization a value element will be taken from the file if the typing matches, and otherwise a default-constructed value will be used. It's still possible to get into an incongruent state with this system, however.
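Option 1 can be sketched roughly like this. Settings, its fields and the two version numbers are all made up for the example: the saved data starts with a format version, and the loader uses it to decide which fields to expect, falling back to a default for a field that older data does not contain.

```cpp
#include <sstream>
#include <string>

// Hypothetical object whose layout changed between releases.
struct Settings {
    int width = 0;
    int height = 0;
    int depth = 32;    // added in format version 2; 32 is the assumed default
};

const int kCurrentVersion = 2;

// Always writes the newest format, with the version number first.
std::string save(const Settings& s) {
    std::ostringstream out;
    out << kCurrentVersion << ' ' << s.width << ' ' << s.height << ' ' << s.depth;
    return out.str();
}

// Reads version 1 or version 2 data; returns false on unknown versions
// or malformed input and leaves s untouched in that case.
bool load(const std::string& bytes, Settings& s) {
    std::istringstream in(bytes);
    int version = 0;
    if (!(in >> version) || version < 1 || version > kCurrentVersion) {
        return false;
    }
    Settings tmp;
    if (!(in >> tmp.width >> tmp.height)) {
        return false;
    }
    if (version >= 2 && !(in >> tmp.depth)) {
        return false;
    }
    s = tmp;           // version 1 data keeps the default depth
    return true;
}
```

Loading into a temporary and assigning only on success avoids leaving the caller's object half-updated, which is one way of limiting the incongruent states mentioned above.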

This, along with complications like cyclic object graphs, combines to make serialisation non-trivial.
You're probably right. :-)

For now, I'll stick with trying to communicate in C between binaries. Sometimes this may not be possible... but that's what you get with C++. If for no other reason, this will at least keep things a bit simpler. In the end, there will probably always be some caveat that results in the need to build the entire system uniformly for it to work. Ah, well...

Thanks for the reply!
i had some unposted code written for you but my browser ate it when i went to a meeting.

short version: Yes, boost::serialization will do STL (RTFM). Yes, it should be STL implementation agnostic (I looked at the source and they are sticking to the well-defined public interface).

Quote:Using serialization for the purpose of DLLs is horribly inefficient, and will result in more issues than it would solve. While the most trivial issues are indeed solved, the process is horribly slow (compared to what a function call costs).

At the risk of sounding like I'm flaming you, I believe that exactly the problem the OP is looking to solve is turned trivial, with so little effort that if performance becomes relevant you can fix it in the next iteration and discard one day's work.

chairthrower: "if the deserialization code expects an int and it gets a double"
B::S is designed such that the saving code and the loading code are the *same* code, so it's not possible to mess it up. If your data source is guaranteed not to add errors, this will not happen. [edit2: I continued researching; types are tracked, but I don't know how errors are handled. It is definitely mapping types to strings a la RTTI. From my code, which I had forgotten about: BOOST_CLASS_EXPORT_GUID(MyClass, "MyClass")] B::S is also designed to overcome versioning and platform endianness issues. I thought it dealt with cycles transparently as well, but I couldn't confirm that with a quick Google search. This will be equally difficult to solve with or without a serialization library. edit: requirement 5, "Proper restoration of pointers to shared data"

doc index lists design requirements which y'all should read
Quote: 1. Code portability - depend only on ANSI C++ facilities.
2. Code economy - exploit features of C++ such as RTTI, templates, and multiple inheritance, etc. where appropriate to make code shorter and simpler to use.
3. Independent versioning for each class definition. That is, when a class definition changed, older files can still be imported to the new version of the class.
4. Deep pointer save and restore. That is, save and restore of pointers saves and restores the data pointed to.
5. Proper restoration of pointers to shared data.
6. Serialization of STL containers and other commonly used templates.
7. Data Portability - Streams of bytes created on one platform should be readable on any other.
8. Orthogonal specification of class serialization and archive format. That is, any file format should be able to store serialization of any arbitrary set of C++ data structures without having to alter the serialization of any class.
9. Non-intrusive. Permit serialization to be applied to unaltered classes. That is, don't require that classes to be serialized be derived from a specific base class or implement specified member functions. This is necessary to easily permit serialization to be applied to classes from class libraries that we cannot or don't want to have to alter.
10. The archive interface must be simple enough to easily permit creation of a new type of archive.
11. The archive interface must be rich enough to permit the creation of an archive that presents serialized data as XML in a useful manner.

This topic is closed to new replies.
