Extending ostream

I'm trying to write a debug-only logger that prints data to stdout, although this might change in the future (stderr, a file, sending over the network...). My choice was to use C++-style streams.
A feature I'd like to have is auto-ignoring already-printed values. Leaving aside the discussion about whether this is a good idea, and about thread safety, here is how I'd like to use my custom stream:


myns::dout << "Hello world!" << myns::endl;
myns::dout << myns::UNIQMSG_BEGIN << "Hello world!" << myns::endl;
myns::dout << myns::UNIQMSG_BEGIN << "Hello world!" << myns::endl;


I expect the code above to print "Hello world!" twice. I've been looking into the iostream architecture for a while now, but I really can't figure out how to do this exactly. Most resources explain how to write a custom buffer, but to my understanding that's not what I need. In fact, I'd like to block a given message before it reaches the underlying buffer. This means that my stream will need a second buffer in which to store the string until a myns::UNIQMSG_END is sent (hence the custom endl); it then checks whether that message has been printed before and, if not, forwards it to the underlying buffer. Is this the right approach? Any help is appreciated.
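
To make this concrete, the pieces I imagine needing would look something like this (only an interface sketch; DebugStream is a placeholder name and nothing is implemented yet):

namespace myns {
    // Tag types used as in-stream markers.
    struct UniqMsgBeginT {};
    struct UniqMsgEndT {};

    extern const UniqMsgBeginT UNIQMSG_BEGIN;
    extern const UniqMsgEndT UNIQMSG_END;

    // The custom stream: it would own the "side" buffer and the set of already-printed messages.
    class DebugStream;
    extern DebugStream dout;

    // Custom endl for the custom stream type.
    DebugStream &endl(DebugStream &stream);
}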

Just for reference, here are the links I've been looking at:

http://spec.winprog.org/streams/
http://www.horstmann.../iostreams.html (kind of outdated in my opinion)
http://www.atnf.csiro.au/computing/software/sol2docs/manuals/stdlib/user_guide/loc_io/index.htm
http://www.cplusplus.com/
Thanks in advance!

[ King_DuckZ out-- ]
One option is to handle this externally:

./myprogram | uniq
# Or
./myprogram &
tail -f log.txt | uniq

Writing everything to std::cout and std::cerr gives you *maximum* control of this kind of thing. You can pipe to any number of external processes that can do stuff like:

  • Write to a file (using >)
  • Write to a file and process further (tee)
  • Print only high priority messages (grep)
  • Strip out low priority messages (grep -v)
  • Write to a remote location (netcat, logger to syslog, etc)
  • As mentioned above, using uniq to ignore duplicate lines

Best of all, you can switch between them easily. In some cases, you can do this while the program is running!

Writing a system with a fraction of this flexibility in your actual source will be hard. For the above to work, all you need to do in your source is to output enough information to let the external tools do their work. E.g. a function like this:

enum Severity {
    VERBOSE,
    INFO,
    WARN,
    ERROR,
    FATAL,
};

// Trivial overload of operator << for Severity omitted

std::ostream & log(std::ostream &out, Severity level, const char *subsystem)
{
    return out << timestamp() << ' ' << subsystem << ' ' << level << ": ";
}

int main()
{
    log(std::cout, INFO, "main") << "Hello, world!" << std::endl;
}
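
For completeness, the two pieces not shown above might look roughly like this (just one possible version; timestamp() is not defined anywhere in the snippet, so the helper below is an assumption, and both would need to be declared before log()):

#include <ctime>
#include <ostream>
#include <string>

// One possible operator<< for Severity: print the enumerator's name.
std::ostream &operator<<(std::ostream &out, Severity level)
{
    static const char *names[] = { "VERBOSE", "INFO", "WARN", "ERROR", "FATAL" };
    return out << names[level];
}

// A minimal timestamp() helper; any format that sorts and greps well will do.
std::string timestamp()
{
    char buf[32] = "";
    std::time_t now = std::time(0);
    std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", std::localtime(&now));
    return buf;
}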

The tools mentioned above are standard on *nix; you don't get them "out of the box" on Windows, but you can download them or write them yourself if you want.
A less invasive solution in the code could look like this:

unique(std::cout) << "foo" << 13 << " ... " << std::endl;

A simple implementation that ignores thread safety:

#include <ostream>
#include <sstream>
#include <string>
#include <boost/noncopyable.hpp> // "noncopyable" is assumed to be boost::noncopyable here;
                                 // any idiom that disables copying works just as well

struct unique : boost::noncopyable
{
public:
    unique(std::ostream &out) : out(&out)
    {
    }

    ~unique()
    {
        // The last message written, shared across all unique temporaries.
        static std::string previous;
        std::string str = stream.str();
        if(str != previous)
        {
            (*out) << str;
            previous = str;
        }
    }

    // Everything streamed into the temporary is collected here first.
    template<typename T>
    std::ostream &operator<<(const T &object)
    {
        return stream << object;
    }

private:
    std::ostream *out;
    std::stringstream stream;
};

If you want a "per-stream" uniqueness check, you can replace the static string with a std::map<> keyed on the stream pointer, mapping each stream to its previously output string. Obviously, making this thread safe will take even more work again, if that is necessary.
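A destructor along those lines might look like this (still a sketch that ignores thread safety, replacing the destructor of the unique class above; it also needs #include <map>):

~unique()
{
    // One "previous message" per target stream, instead of a single shared string.
    static std::map<std::ostream *, std::string> previous;
    std::string str = stream.str();
    std::string &last = previous[out];
    if(str != last)
    {
        (*out) << str;
        last = str;
    }
}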
Well, I thought about all this, but using an external tool (I'm working on Linux btw) wouldn't give me the flexibility I need.

I think that logging should be somewhat well thought out, so I'm not in favour of the "flood the output with data and let the external tool do the work" approach. Doing deduplication on demand (i.e. UNIQMSG_BEGIN and UNIQMSG_END) would allow me, for example, to only filter duplicates in cases where conditionally sending log data would be too hard.

For example, this log is acceptable:

Inside MyFunc()
operation in progress...
[operation in progress...]
[operation in progress...]
Inside MyFunc()
operation in progress...


so I can decide I only want to filter "operation in progress..." messages within a certain scope.

What I really need is a way to get between operator<< and the call that actually sends data to the buffer. An easier exercise would be: how do I make a stream that prints "hello" no matter what the input is?

[ King_DuckZ out-- ]
This might help - the upper case buffer example in particular.

You have an interesting problem though: how are you going to deal with a mismatch? For example, what if you forget myns::UNIQMSG_END? What if the client tries to flush() the stream, or close() it, before that occurs? My "unique()" shares the problem that it does not pass flushes on to the underlying stream.
Your exercise has an easy solution:
class FooStream
{
public:
    template <typename T>
    FooStream& operator << (const T& value)
    {
        std::cout << "Hello!\n";
        return *this;
    }
};


This is trivial to extend into the kind of double-buffering you're asking for. Thread safety becomes a minor headache but isn't too terrible. You can see an example of how to do custom stream manipulators here.
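
For instance, something roughly along these lines (only a sketch; the tag objects, the std::set bookkeeping and the names are placeholders, not a finished design):

#include <iostream>
#include <set>
#include <sstream>
#include <string>

// Placeholder tags standing in for myns::UNIQMSG_BEGIN / myns::UNIQMSG_END.
struct UniqMsgBeginT {} UNIQMSG_BEGIN;
struct UniqMsgEndT {} UNIQMSG_END;

class FooStream
{
public:
    explicit FooStream(std::ostream &out) : out(out), buffering(false) { }

    // Between the tags, collect text in a side buffer instead of forwarding it.
    FooStream &operator<<(UniqMsgBeginT)
    {
        buffering = true;
        pending.str("");
        return *this;
    }

    // On the end tag, forward the collected text only if it has not been seen before.
    FooStream &operator<<(UniqMsgEndT)
    {
        buffering = false;
        std::string msg = pending.str();
        if(seen.insert(msg).second)
            out << msg;
        return *this;
    }

    // Note: manipulators such as std::endl need a dedicated overload; '\n' is used below.
    template <typename T>
    FooStream &operator<<(const T &value)
    {
        if(buffering)
            pending << value;
        else
            out << value;
        return *this;
    }

private:
    std::ostream &out;
    bool buffering;
    std::ostringstream pending;
    std::set<std::string> seen;
};

int main()
{
    FooStream dout(std::cout);
    dout << "Hello world!" << '\n';
    dout << UNIQMSG_BEGIN << "Hello world!" << '\n' << UNIQMSG_END;
    dout << UNIQMSG_BEGIN << "Hello world!" << '\n' << UNIQMSG_END;  // suppressed: already seen
}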

I maintain that it's a bad idea, because it's always better to have too much log information and filter it post-hoc than to have too little and be mystified when something goes wrong - but then again, I'm dealing with a situation where logs might be analyzed post-mortem as much as weeks after an actual problem appears, and accumulating data is vital to unraveling an issue in a system that must not be stopped.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]


This might help - the upper case buffer example in particular.

You have an interesting problem though: how are you going to deal with a mismatch? For example, what if you forget myns::UNIQMSG_END? What if the client tries to flush() the stream, or close() it, before that occurs? My "unique()" shares the problem that it does not pass flushes on to the underlying stream.

Thank you, I've been looking into it and it seems to be what I'm looking for.
You have a point about closing or flushing the stream. I'm currently following the approach of setting a buffer that inherits from basic_streambuf and that contains a real basic_streambuf, taken from cout or from somewhere else. The problem, though, is that I can't call overflow() or sync() on the aggregated buffer, as they're protected. I'll probably "horse around" with similar methods until I find a decent solution, but I'm starting to fear that I have to re-implement basic_ostream :/
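
For what it's worth, the public forwarding functions look like the way around the protected members; roughly this (an untested sketch, FilterBuf is just a placeholder name):

#include <streambuf>

// Wraps another streambuf by aggregation. The inner buffer's protected
// overflow()/sync() can't be called directly, but its public sputc()/sputn()
// and pubsync() forward to them.
class FilterBuf : public std::streambuf
{
public:
    explicit FilterBuf(std::streambuf *inner) : inner(inner) { }

protected:
    int_type overflow(int_type ch)
    {
        if(traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        // A real implementation would decide here whether to swallow the
        // character or pass it on; this sketch always forwards it.
        return inner->sputc(traits_type::to_char_type(ch));
    }

    int sync()
    {
        return inner->pubsync();
    }

private:
    std::streambuf *inner;
};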


Your exercise has an easy solution:
class FooStream
{
public:
    template <typename T>
    FooStream& operator << (const T& value)
    {
        std::cout << "Hello!\n";
        return *this;
    }
};


This is trivial to extend into the kind of double-buffering you're asking for. Thread safety becomes a minor headache but isn't too terrible. You can see an example of how to do custom stream manipulators here.

I maintain that it's a bad idea, because it's always better to have too much log information and filter it post-hoc than to have too little and be mystified when something goes wrong - but then again, I'm dealing with a situation where logs might be analyzed post-mortem as much as weeks after an actual problem appears, and accumulating data is vital to unraveling an issue in a system that must not be stopped.

I'm not sure what you wrote is 100% correct. Assume this case:

MyStream dout;
dout << UNIQMSG_BEGIN;
SomeFunctionTakingAnStdBasicStream(dout);
dout << UNIQMSG_END;

In such a case I'm not sure I can guarantee correct behaviour, whereas with a proper custom stream I'm sure it will behave as I expect. Also, I think the template approach has some issues with manipulators, as some template parameters can't be deduced, which is why the standard library does it with dedicated overloads.
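
Here is the manipulator issue I mean, as a minimal sketch (TemplStream is a made-up name):

#include <iostream>

class TemplStream
{
public:
    template <typename T>
    TemplStream &operator<<(const T &value)
    {
        std::cout << value;
        return *this;
    }

    // Without this extra overload, "stream << std::endl" does not compile:
    // std::endl is a function template, so T cannot be deduced from it.
    TemplStream &operator<<(std::ostream &(*manip)(std::ostream &))
    {
        manip(std::cout);
        return *this;
    }
};

int main()
{
    TemplStream s;
    s << "value: " << 42 << std::endl;  // works only thanks to the extra overload
}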
As I said, I agree that logging should be left alone, but I'm trying to implement this feature anyway, so now it's more of a challenge than anything else :)


Edit:

OK, I see what you mean: you're aggregating instead of inheriting. That would make the code I posted invalid, however. Also, the manipulator problem would still be there.

[ King_DuckZ out-- ]
I already linked to an example of how to handle manipulators.

FooStream should not derive from ostream, because it changes the fundamental contract of ostream. You are violating the expectation of the user and introducing features that the user may not account for properly if they assume you have given them a true ostream. Don't lie about your object's types!

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

You do not need to extend the standard streams. It is not what you want to do. Do not do this. What I tell you three times is true.

What you need to do is replace the streambuf of std::cout, std::cerr, or (preferably) std::clog.

The streams are formatting objects. The streambuf is a transport object.
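
For example, a duplicate-filtering streambuf could be swapped in like this (a sketch only; filtering whole lines is just one possible policy):

#include <iostream>
#include <set>
#include <streambuf>
#include <string>

// Forwards complete lines to another streambuf, dropping lines already seen.
class DedupBuf : public std::streambuf
{
public:
    explicit DedupBuf(std::streambuf *dest) : dest(dest) { }

protected:
    int_type overflow(int_type ch)
    {
        if(traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        char c = traits_type::to_char_type(ch);
        line += c;
        if(c == '\n')
        {
            if(seen.insert(line).second)
                dest->sputn(line.data(), static_cast<std::streamsize>(line.size()));
            line.clear();
        }
        return ch;
    }

    int sync()
    {
        return dest->pubsync();
    }

private:
    std::streambuf *dest;
    std::string line;
    std::set<std::string> seen;
};

int main()
{
    DedupBuf buf(std::clog.rdbuf());              // wrap the original transport
    std::streambuf *old = std::clog.rdbuf(&buf);  // install the filter

    std::clog << "Hello world!\n";
    std::clog << "Hello world!\n";                // dropped: already seen
    std::clog << "Something else\n";

    std::clog.rdbuf(old);                         // restore before buf is destroyed
}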

Stephen M. Webb
Professional Free Software Developer

All right, I guess you're right, so I'll try to do it that way. Thank you all for your suggestions. I'll still be following this thread for a while, so if anyone comes up with any other suggestion I'm happy to hear it!
[ King_DuckZ out-- ]

This topic is closed to new replies.
