# $Id: README,v 1.39 2002/05/01 21:06:32 jerome Exp $
ScanErrLog v2.01 - May 1st 2002
(C) Jerome Alet <firstname.lastname@example.org> 2000-2002
You're welcome to redistribute this software under the terms of the GNU General Public Licence version 2.0 or, at your option, any later version.
You can read the complete GNU GPL in the file COPYING which should come along with this software, or visit the Free Software Foundation's web site http://www.fsf.org
Sample reports are no longer distributed, because it's easy to test ScanErrLog online at:
Since version 1.5, you also need the jaxml XML generation Python module. You can download its latest version freely from:
You need at least jaxml-2.22.
To produce the report in PDF format, you have to install the ReportLab Python module.
You can download it freely from:
The latest official release of ReportLab, 1.13 at this time, works just fine, but older versions may work only partially or not at all.
Nota Bene: Since 2.00 you don't need the jahtml module anymore.
WARNING: steps 1, 2 and 4 are now mandatory; step 3 is optional.
1 - If you don't have Distutils installed (e.g. python version <= 1.5.2) then first download it from:
then follow the installation instructions for Distutils and install it on your system.
2 - If you don't have the jaxml module installed, then download its latest version from:
then follow the installation instructions for jaxml and install it on your system.
3 - If you don't have the ReportLab module installed, and you want to produce reports in PDF format, then download its latest version from:
then follow the installation instructions for ReportLab and install it on your system.
4 - Download the latest ScanErrLog version from:
gzip -dc scanerrlog-x.xx.tar.gz | tar -xf -
where x.xx is scanerrlog's latest version number.
Go to scanerrlog's directory and type:
python setup.py install
You may need to be logged in with sufficient privileges (e.g. as root) to do this.
This will generally install scanerrlog.py in /usr/local/bin or an equivalent path depending on your system.
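If the gzip/tar pipeline is not available on your system, the archive can also be unpacked with Python's standard tarfile module. The following is only a minimal sketch; the helper name unpack is hypothetical and not part of ScanErrLog. It uses the safer "data" extraction filter when the running Python provides it.

```python
import os
import tarfile


def unpack(archive, dest="."):
    """Extract a .tar.gz archive into dest, like `gzip -dc archive | tar -xf -`."""
    with tarfile.open(archive, "r:gz") as tar:
        # Newer Pythons offer extraction filters; "data" refuses unsafe members.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return os.path.abspath(dest)
```

For example, unpack("scanerrlog-x.xx.tar.gz") extracts into the current directory, after which you continue with python setup.py install as described above.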
If you want to launch ScanErrLog as a CGI script, look at the ScanErrLog.html file included in this package for a sample HTML form that does it. Then copy scanerrlog.py to your web server's cgi-bin directory and allow the execution of python CGI scripts. Refer to your web server's documentation for details.
You can launch scanerrlog.py directly from the command line, as a CGI script, or import it into your own python program and use (or subclass) the ApacheErrorLog class it defines. In the latter case, make sure that scanerrlog.py is in your python path before importing it (e.g. do a sys.path.append('/usr/local/bin') before import scanerrlog).
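The sys.path tip can be wrapped in a small helper. This is a sketch only: the function name import_scanerrlog is hypothetical, and /usr/local/bin is just the typical install location mentioned above. Any ImportError (for example from a missing dependency) is left to propagate rather than hidden.

```python
import importlib
import sys


def import_scanerrlog(directory="/usr/local/bin", module_name="scanerrlog"):
    """Make `directory` importable (once), then import `module_name`."""
    if directory not in sys.path:
        sys.path.append(directory)  # avoid adding the same entry twice
    return importlib.import_module(module_name)
```

After scanerrlog = import_scanerrlog(), the ApacheErrorLog class is available as scanerrlog.ApacheErrorLog; its constructor arguments are not documented in this README, so check the module itself before using or subclassing it.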
You can test ScanErrLog online at:
Producing the same report in different formats is now quicker than before, thanks to the --continue option:
- launch ScanErrLog on your error_log file with the --continue option.
- then for each new format you want of the same report, just launch ScanErrLog with the --continue option on an empty file in the same directory as the error_log file.
This makes ScanErrLog parse the error_log file only once while producing the same report as many times as you want, saving processing time. Note however that because of the QuickSort algorithm used, messages with the same number of occurrences may be ordered differently from one pass to another.
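The two-step --continue workflow can be scripted. The sketch below only builds the command lines; the helper name continue_commands, the empty file name empty_log and the report.<format> output names are all hypothetical. The empty file must already exist next to the error_log, because that is the directory where ScanErrLog keeps its ScanErrLog.stats file.

```python
import os

STATS_FILENAME = "ScanErrLog.stats"  # file name given in the --continue description


def continue_commands(error_log, formats, empty_name="empty_log"):
    """Build one scanerrlog.py command line per report format (sketch).

    The first command parses error_log and saves ScanErrLog.stats next to it;
    every later command points at an empty file in the same directory, so
    the saved statistics are reused instead of parsing the log again.
    """
    directory = os.path.dirname(os.path.abspath(error_log))
    empty_file = os.path.join(directory, empty_name)  # hypothetical; must exist
    commands = []
    for i, fmt in enumerate(formats):
        source = error_log if i == 0 else empty_file
        output = "report." + fmt  # hypothetical output file name
        commands.append(["scanerrlog.py", "--continue", "--format", fmt,
                         "--outputfile", output, source])
    return commands, os.path.join(directory, STATS_FILENAME)
```

Each returned list can then be run in order, for instance with subprocess.run(cmd, check=True). Keeping one input file per command respects the warning that --continue is incompatible with parsing multiple files.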
ScanErrLog v2.01 (C) 2000-2002 Jerome Alet & Free Software Foundation
This Python module allows you to parse Apache error_log files from several possible sources (filename, stdin, python file object), and presents the error messages sorted by decreasing number of occurrences.
This is particularly useful if you want to quickly solve the most annoying problems web surfers encounter visiting your site.
If you run this module directly, it will parse each file whose name was passed on the command line.
If you don't pass any argument on the command line, scanerrlog will read an error_log from stdin if you've piped some file or command into its standard input, or it will print its documentation if you haven't.
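The input-selection rule above amounts to a three-way decision. Here is a minimal sketch; the function name choose_input is hypothetical, and detecting a pipe with sys.stdin.isatty() is an assumption about how "piped" can be checked, not a description of ScanErrLog's code.

```python
def choose_input(filenames, stdin_is_tty):
    """Decide what scanerrlog should read, per the rules above (sketch).

    filenames    -- file names given on the command line
    stdin_is_tty -- True when nothing is piped into standard input,
                    e.g. the value of sys.stdin.isatty()
    """
    if filenames:
        return "files"  # parse each named file
    if not stdin_is_tty:
        return "stdin"  # a file or command was piped in
    return "help"  # nothing to read: print the documentation
```

A script would typically call it as choose_input(sys.argv[1:], sys.stdin.isatty()) after removing option arguments.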
You can also use it as a CGI script, but you won't be able to
modify the pattern and outputfile used, and the input filename
should not begin with / or contain .. in its name, all for
security reasons. The names you may use for your CGI variables
are: continue, date, withoutheader, title, limit, exclude, format and
If continue, date or withoutheader are present in your form, these options are set to TRUE whatever their value. See ScanErrLog.html for a sample form to launch ScanErrLog as a CGI script.
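The CGI rules can be illustrated with two small functions. This is a sketch under stated assumptions: the submitted form is given as a plain dict, the field names pattern and outputfile are assumed to be the names of the two options a CGI user cannot change, and is_safe_cgi_filename and options_from_form are hypothetical names, not ScanErrLog's own functions.

```python
# Field names taken from the CGI section above; "pattern" and "outputfile"
# are assumed names for the two options a CGI user cannot change.
BOOLEAN_FIELDS = ("continue", "date", "withoutheader")
FORBIDDEN_FIELDS = ("pattern", "outputfile")


def is_safe_cgi_filename(name):
    """Apply the documented rule: no leading '/' and no '..' anywhere."""
    return not name.startswith("/") and ".." not in name


def options_from_form(form):
    """Turn a dict of submitted CGI fields into report options (sketch)."""
    options = {}
    for key, value in form.items():
        if key in FORBIDDEN_FIELDS:
            continue  # ignored for security reasons
        if key in BOOLEAN_FIELDS:
            options[key] = True  # presence alone switches the option on
        else:
            options[key] = value
    return options
```

Note that a checkbox submitted with the value "no" still turns the option on, exactly as the text says: only the presence of the field matters.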
prints scanerrlog's documentation (what you are reading now)
./scanerrlog.py /var/log/httpd/error_log /var/log/httpd/error_log.1
will read data from the specified files.
will read data from standard input.
You can pass some options on the command line:
-c | --continue
    Useful if you want to parse the same file many times (e.g. every week): the current state and statistics of the file are saved in a file named ScanErrLog.stats in the same directory, so you don't have to reparse the beginning of the file each time. Use this option both to tell ScanErrLog to save the statistics and to reuse the saved ones. Without this option the file is completely parsed again, even if an old statistics file is present in the same directory. WARNING: this option is incompatible with the parsing of multiple files.

-d | --date
    Include in the final report the date when each message appeared for the last time. This option is mutually exclusive with the --pattern option.

-e | --exclude e
    e is a slash-separated list of message severities. All messages with a severity listed in e are excluded from the final report. By default all messages are included. For example, e can be info/debug to exclude all messages whose severity is info or debug.

-f | --format f
    Output format for the report; f can be 'html', 'pdf', 'text' or 'xml'. The default format is 'html'.

-h | --help
    Displays this help screen.

-l | --limit lim
    Selects messages only if their number of occurrences equals or exceeds lim. The default value of lim is 1, meaning all messages are included in the final report.

-n | --nocumulate
    Don't cumulate counts for all the files passed on the command line. Cumulating, which was the old -c | --cumulate option, is now the default. If the -o option below is not used, then -n implies -w because all reports will be in the same file (stdout).

-o | --outputfile f
    Save the report in the file f. If -n is used, the filename will be n.f, where n is an integer starting at 1 and incremented for each new file.

-p | --pattern regexp
    Select only the lines which match regexp. The default regexp is:
        ^(httpd: |\B)\[([^\[\]]+)\] \[([^\[\]]+)\] (?:\[([^\[\]]+)\] )?
    which selects all messages logged by Apache, but not errors from CGI scripts for example. To work correctly, your regexp should consume all characters from the beginning of the error line up to the beginning of the real error message. This option is mutually exclusive with the --date option.

-t | --title t
    Sets the report title.

-v | --version
    Displays ScanErrLog's version number.

-w | --withoutheader
    Suppresses the header of the HTML report. Useful if you want to include the report directly into another HTML document.
Warning: some options may not work with all report formats.
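To see how the default --pattern regexp splits a line, and how --exclude, --limit and --date could combine, here is a minimal Python sketch. It is not ScanErrLog's actual code, and summarize is a hypothetical name. Reading the date from group 2 and the severity from group 3 follows from the regexp itself; everything after the match is taken as the real error message. Using the date of the last matching line as "last appearance" assumes the log is in chronological order.

```python
import re
from collections import Counter

# The default --pattern regexp, copied from the option list above.
DEFAULT_PATTERN = re.compile(
    r"^(httpd: |\B)\[([^\[\]]+)\] \[([^\[\]]+)\] (?:\[([^\[\]]+)\] )?"
)


def summarize(lines, exclude="", limit=1):
    """Return (message, count, last_date) tuples, most frequent first.

    Hypothetical helper mimicking --exclude, --limit and --date; it is not
    ScanErrLog's own implementation.
    """
    excluded = set(filter(None, exclude.split("/")))  # e.g. "info/debug"
    counts = Counter()
    last_seen = {}
    for line in lines:
        match = DEFAULT_PATTERN.match(line)
        if match is None:
            continue  # not an Apache-logged message (e.g. CGI script output)
        date, severity = match.group(2), match.group(3)
        if severity in excluded:
            continue
        # Everything after the consumed prefix is the real error message.
        message = line[match.end():].rstrip("\n")
        counts[message] += 1
        last_seen[message] = date  # assumes the log is in chronological order
    kept = [(msg, n) for msg, n in counts.items() if n >= limit]
    # sort() is stable, so equal counts keep their first-seen order.
    kept.sort(key=lambda item: item[1], reverse=True)
    return [(msg, n, last_seen[msg]) for msg, n in kept]
```

For instance, with open('/var/log/httpd/error_log') as log: report = summarize(log, exclude='info/debug', limit=2). Because Python's sort is stable, messages with equal counts come out in the same order on every run, unlike the QuickSort-based ordering mentioned for --continue.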
A fifth possibility is to import this module into another python program and use the ApacheErrorLog class it defines.
ScanErrLog comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to redistribute it under certain conditions; refer to the GNU General Public License for details. You'll find the GNU GPL in the file COPYING which should come along with this software, or at http://www.gnu.org
Please e-mail bugs to: email@example.com (Jerome Alet)