Hendrik Schober
2008-07-14 20:27:38 UTC
In some way efficient logging is a hobby of mine. I've been writing
logging libraries for almost 15 years.
Logging is a very good way of finding errors in your application if
those errors only appear at a client's site on the other side of the
planet. It also helps tremendously where errors only appear at some
special timing patterns (like when you're multi-threading) which will
never occur during debugging.
But logging has very conflicting goals: It should be able to produce
a lot of information in case of error without drowning you in a sea
of useless data and, even more important, without slowing down your
application. While this combination certainly cannot be achieved, a
sophisticated logging system can give you the necessary tools and
levers to balance between these according to your needs. (But then
sophistication usually conflicts with another goal: It should be
easy to use.)
That's what makes logging so interesting a problem.
In order to achieve maximum performance while not losing important
information (which sat in some buffer when the application crashed),
IME the following must hold:
- for each log statement, the decision whether its data is used must
be the first thing done -- at least in the code generated by the
compiler
- only if the data is to be used should the data be evaluated and
forwarded
- each log statement must be treated en-bloc, the resulting string
entirely buffered in memory while it's constructed, and flushed
to the target device immediately when done
- it must be possible to enable/disable logging at compile-time
using different parameters (severity, parts of the code etc.);
if disabled, log statement should be discarded by the optimizer
- if logging is enabled at compile-time, it should be possible to
enable/disable it at run-time; when disabled at run-time, it
should be optimized to a single 'if' statement
Many people preferred the stream idiom for logging:
some_logger << "x=" << x << ", y=" << y;
I suppose this stems from the fact that, at some point, everybody
used 'std::cerr' for logging and from the general familiarity of the
idiom. Also, everybody is familiar with streaming. So recently I have
searched for ways to make the syntax for logging stream-like while
still upholding the above goals.
The problem with the stream idiom is that it isn't easy to chunk
information in a deterministic way. You just shove everything down
'operator<<' into some buffer, and when that buffer is full it will
get flushed to the target device. The only way to control flushing
is by explicitly and manually inserting flushing tokens into the
stream. For logging, this isn't acceptable, because flushing is the
default and must not be forgetten.
But given code like the above, when you are inside 'operator<<' you
cannot tell whether it is its first invocation in that line, its
last, or one in between. So you don't know whether to flush the
information given of not.
Add to this parametrization (log messages have severity and you can
decide which severities you want to log), the ability to group code
in hierarchies so you can turn logging on/off for sub-hierarchies,
and the desire to be very flexible in formatting and outputting the
data.
I've used templates to solve this puzzle.
Over the weekend, I finally went to sourceforge and created a project
so you can go and have a look at it. But beware! That's rather rough
code, there's no documentation at all, and so far I've only compiled
it using VC9. If all this doesn't scare you away, have a look at
http://sourceforge.net/projects/templog/
(There's a cpp file with a small test application. If you debug
through it, this might help you to understand what's going on.)
I'd very much appreciate if you would find the time to have a look
at this lib and tell me what you think. (I'm very interested in what
other compilers have to say about the code.)
TIA,
Schobi
logging libraries for almost 15 years.
Logging is a very good way of finding errors in your application if
those errors only appear at a client's site on the other side of the
planet. It also helps tremendously where errors only appear at some
special timing patterns (like when you're multi-threading) which will
never occur during debugging.
But logging has very conflicting goals: It should be able to produce
a lot of information in case of error without drowning you in a sea
of useless data and, even more important, without slowing down your
application. While this combination certainly cannot be achieved, a
sophisticated logging system can give you the necessary tools and
levers to balance between these according to your needs. (But then
sophistication usually conflicts with another goal: It should be
easy to use.)
That's what makes logging so interesting a problem.
In order to achieve maximum performance while not losing important
information (which sat in some buffer when the application crashed),
IME the following must hold:
- for each log statement, the decision whether its data is used must
be the first thing done -- at least in the code generated by the
compiler
- only if the data is to be used should the data be evaluated and
forwarded
- each log statement must be treated en-bloc, the resulting string
entirely buffered in memory while it's constructed, and flushed
to the target device immediately when done
- it must be possible to enable/disable logging at compile-time
using different parameters (severity, parts of the code etc.);
if disabled, log statement should be discarded by the optimizer
- if logging is enabled at compile-time, it should be possible to
enable/disable it at run-time; when disabled at run-time, it
should be optimized to a single 'if' statement
Many people preferred the stream idiom for logging:
some_logger << "x=" << x << ", y=" << y;
I suppose this stems from the fact that, at some point, everybody
used 'std::cerr' for logging and from the general familiarity of the
idiom. Also, everybody is familiar with streaming. So recently I have
searched for ways to make the syntax for logging stream-like while
still upholding the above goals.
The problem with the stream idiom is that it isn't easy to chunk
information in a deterministic way. You just shove everything down
'operator<<' into some buffer, and when that buffer is full it will
get flushed to the target device. The only way to control flushing
is by explicitly and manually inserting flushing tokens into the
stream. For logging, this isn't acceptable, because flushing is the
default and must not be forgetten.
But given code like the above, when you are inside 'operator<<' you
cannot tell whether it is its first invocation in that line, its
last, or one in between. So you don't know whether to flush the
information given of not.
Add to this parametrization (log messages have severity and you can
decide which severities you want to log), the ability to group code
in hierarchies so you can turn logging on/off for sub-hierarchies,
and the desire to be very flexible in formatting and outputting the
data.
I've used templates to solve this puzzle.
Over the weekend, I finally went to sourceforge and created a project
so you can go and have a look at it. But beware! That's rather rough
code, there's no documentation at all, and so far I've only compiled
it using VC9. If all this doesn't scare you away, have a look at
http://sourceforge.net/projects/templog/
(There's a cpp file with a small test application. If you debug
through it, this might help you to understand what's going on.)
I'd very much appreciate if you would find the time to have a look
at this lib and tell me what you think. (I'm very interested in what
other compilers have to say about the code.)
TIA,
Schobi
--
***@gmx.de is never read
I'm HSchober at gmx dot de
"I guess at some point idealism meets human nature and
explodes." Daniel Orner
***@gmx.de is never read
I'm HSchober at gmx dot de
"I guess at some point idealism meets human nature and
explodes." Daniel Orner