
<!doctype HTML public "-//W3C//DTD HTML 3.2//EN"> <html><head>
<title>FTPWebLog 1.0.2a</title>
</head><body>
<h1 align=center>
FTPWebLog 1.0.2a (Last updated 5 Feb 2001) </h1>
<h2>What is FTPWebLog?</h2>
FTPWebLog 1.0.2a is a freeware integrated WWW and FTP log reporting tool. Its primary inspiration was the <a
href="">wwwstat</a> program written by <a
href="">Roy Fielding</a>. <p> While wwwstat is a good program, it has some design flaws that
make it unsuited, as released, for use by large sites - notably: reports that are difficult to reconfigure, bad handling of characters that should be escaped, difficulty in adding support for additional log formats, poor support for multiple servers, and the rather 'after the fact' retro-fitting of graphic reports onto it. </p> <p>
My experience using and heavily customizing wwwstat led me to conclude that I needed a new program written from the ground up for flexibility: FTPWebLog was the result.
wwwstat still does some things that FTPWebLog does not - most notably filtering of reports by date. On the flip side, FTPWebLog does several things that wwwstat does not and is <strong>much</strong> easier to customize to match a site's particular needs. </p>
<h2>Differences between 1.0.2a and 1.0.2</h2> <p>
The 'graphftpweblog' program has been modified to generate 'png' files instead of 'gif' files because the GD library has removed all 'gif' support in recent version. No other changes. </p>
<h2>Differences between 1.0.2 and 1.0.1</h2> <p>
I have added 'archive section' reporting to the main text report, a CGI script that allows getting 'extract' reports on the fly, and a hostname lookup function that can convert raw IP addresses to hostnames as a log is being scanned. The documentation on the changes from 1.0.1 is still quite thin.
My &quot;to do&quot; list for 1.0.3 includes adding command line support for archive sections, adding the archive sections to the graphical report, 'local' machine name handling, date filtering, re-laying out of the daily graph to allow more than one month of data, addition of a &quot;by the month&quot; summary report, the ability to include more than one old report at a time, and some speed enhancements. </p>
<p> IOW: Check back often. Things will be a-changin'. </p>
<h2>What does a FTPWebLog report look like?</h2> <p> I have an <a
href="">example</a> of a report online. It is a <em>full</em> report with all report sections and graphs activated. The text section is about 230 Kbytes. Each major section can be selectively disabled, and re-ordering the sections is simply a matter of changing the order of a half dozen calling lines. </p><p>
For example, a <a
href="">'stats lite'</a>
version of the same report is easily generated by extracting just the needed information from the <a
href="">full report</a>. It is only 24 Kbytes. </p>

<h2>How much does it cost?</h2>
Absolutely nothing.
If you like it - just download it and set it up.

<h2>Setting it up</h2>
First - <a
href="">download</a> the current alpha distribution.
If you want to do graphical reports, you will also need some additional support:
<ul>
<li>The <a
href=""></a> Perl library. It is available at <a
href=""></a>.
<li><a href="">The gd graphics library</a>. It can be found at <a href=""></a>.
<li>You also MUST have <a
href="">Perl 5.001</a> or later to use the graph generating Perl script. Perl 4.036 is sufficient for the text based report, but Perl 5.001 is necessary for the graphic based report. Perl 5 is available at <a href=""></a> and many other places.
</ul>
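A quick way to check what you have, assuming a Bourne-style shell with perl somewhere on the PATH - this is only a convenience sketch, not part of FTPWebLog itself:

```shell
#!/bin/sh
# Check that Perl is installed and new enough for the graphical report.
# Perl 4.036 is enough for the text report; 5.001+ is needed for graphs.
if command -v perl >/dev/null 2>&1; then
    perl -e 'print "Perl version: $]\n"'
    if perl -e 'exit($] >= 5.001 ? 0 : 1)'; then
        echo "OK for graphical reports"
    else
        echo "Text reports only - upgrade Perl for the graphs"
    fi
else
    echo "Perl not found on PATH"
fi
```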
Follow the directions given with each of those packages to install them. Once the required graphics support is in place, configuration of 'ftpweblog' is easy.
Almost all the options are explained directly in the source for 'ftpweblog' and 'graphftpweblog'. Here is a short general guide that should get you up and running.

<ol>
<li>Identify where your access_log is
stored. Change $LogFile in the 'ftpweblog' program to point to it.
<li>If using 'graphftpweblog', set $GraphFTPWebLogURL in the 'ftpweblog' program to point to the URL where you intend to put the graphic report html file generated by 'graphftpweblog'.
<li>Make any directories that will be used by 'graphftpweblog' to store the image files it generates.
<li>Run 'ftpweblog', directing its output to a file: <p>
<strong>ftpweblog &gt; stats.html</strong> </p>
<li>If using 'graphftpweblog', run it, also directing its output to a file: <p>
<strong>graphftpweblog &gt; graphs.html</strong> </p>
</ol>
<p>
You should now have a report. It's that easy. By fine tuning the report options, you can make it as short or as in-depth as you like. </p>
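Once this works by hand, the two commands are natural candidates for a nightly cron job. A minimal wrapper sketch follows; both directory names are assumptions to be replaced with your own paths:

```shell
#!/bin/sh
# Nightly stats wrapper - a minimal sketch. The directory names are
# assumptions; substitute your own script and web server paths.
TOOLDIR=/home/users/snowhare/bin/stats         # where ftpweblog lives (assumed)
WEBDIR=/usr/local/lib/httpd/htdocs/statistics  # web-visible output dir (assumed)

if cd "$TOOLDIR" 2>/dev/null; then
    ./ftpweblog      > "$WEBDIR/stats.html"
    ./graphftpweblog > "$WEBDIR/graphs.html"
else
    echo "FTPWebLog directory $TOOLDIR not found" >&2
fi
```

A crontab entry such as <strong>5 0 * * * /home/users/snowhare/bin/</strong> (the path again being an assumption) would then regenerate the report shortly after midnight each day.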

<h2>The Command Line Options for FTPWebLog</h2>

Nearly every report option that can be set from inside the script can be set using command line options:
<p><strong>ftpweblog [-h] [-i pathname] [-t www|ftp] [-g URL]
               [-x perlregex] [-X perlregex] [-r perlregex]
               [-R perlregex] [-A 0|1] [-H 0|1] [-f N]
               [-d N] [-S 0|1] [-D 0|1] [-F 0|1] [-L 0|1]
               [-N systemname] [-T perlregex] [-B perlregex]
               [-Q quota] [-q quotarate]
               [logfile ...] [logfile.gz ...] [logfile.Z ...]</strong></p>

<h3>Display Options</h3>
<dl>
<dt>-h</dt>
<dd>Just display the usage help message and quit.</dd> </dl>
<h3>Input Options</h3>
<dl>
<dt>-i pathname</dt>
<dd>Include the 'pathname' file (assumed to be a prior ftpweblog
           output) in the report. Only one preexisting report can
           be included per run right now.</dd>
<dt>[logfile ...] [logfile.gz ...] [logfile.Z ...]</dt> <dd>Process the listed sequence of logfiles.</dd> <dt>-t www|ftp</dt>
<dd> Select whether the log files to be processed are in
FTP log or NCSA Common Log format.</dd> <dt>-g URL</dt>
<dd>The URL of the GraphFTPWebLog output html file (if using GraphFTPWebLog).</dd>
</dl>
<h3>Log Search Options</h3>
<dl>
<dt>     -x regex</dt>
<dd>       Only include domain names matching the perl regex in the report</dd>
<dt>     -X regex</dt>
<dd>       Do not include any domain name matching the perl regex</dd>
<dt>     -r regex</dt>
<dd>       Only include refs to files matching the perl regex</dd>
<dt>     -R regex</dt>
<dd>       Do not include refs to files matching the perl regex</dd>
<dt>     -A 0|1</dt>
<dd>         Print Daily stats (0=do not, 1=do)</dd>
<dt>     -H 0|1</dt>
<dd>         Print Hourly stats (0=do not, 1=do)</dd>
<dt>     -f N</dt>
<dd>           Print Top N Files (0=do not)</dd>
<dt>     -d N</dt>
<dd>           Print Top N Domains (0=do not)</dd>
<dt>     -S 0|1</dt>
<dd>         Print summary report (0=do not, 1=do)</dd>
<dt>     -F 0|1</dt>
<dd>         Print full file listing (0=do not, 1=do)</dd>
<dt>     -D 0|1</dt>
<dd>         Print full domain listing (0=do not, 1=do)</dd>
<dt>     -L 0|1</dt>
<dd>         Print top level domain report (0=do not, 1=do)</dd>
<dt>     -N name</dt>
<dd>        Name for report</dd>
<dt>     -T regex</dt>
<dd>       Filter top N file list to exclude files matching the regex</dd>
<dt>     -B regex</dt>
<dd>       Blank this pattern in filenames. Useful for stripping extra
path from cache defeating CGI scripts.</dd>
<dt>     -Q quota</dt>
<dd>       Volume Quota in bytes (0=no quota). An extremely basic
accounting feature. Lets you automatically charge for excessive volume.</dd>
<dt>     -q quotarate</dt>
<dd> Quota Rate in meg/day over volume quota. Assumed to be in dollars.</dd>
</dl>
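Putting a few of these options together: the sketch below is a hypothetical invocation (the domain pattern '' and the log file names are illustrations, not real values) that excludes hits from the site's own machines and trims both top-N lists to 20 entries:

```shell
#!/bin/sh
# Hypothetical example: exclude hits from the site's own domain (-X)
# and report only the top 20 files and domains. The guard keeps the
# script harmless when run somewhere ftpweblog is not installed.
if test -x ./ftpweblog; then
    ./ftpweblog -t www -N 'Example Web Log Report' \
        -X 'example\.com$' \
        -f 20 -d 20 \
        access_log.1.gz access_log.2.gz > stats.html
fi
```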
<h2>The Command Line Options for GraphFTPWebLog</h2>

<strong>graphftpweblog [-h] [-A 0|1] [-B regex] [-D 0|1] [-d N]

                      [-f N] [-H 0|1] [-N name] [-P directory]
                      [-U URL] [-R regex] [-r regex] [-X regex]
                      [-x regex] [filename]</strong>

GraphFTPWebLog processes a FTPWebLog report and produces graphs of the information in it. An HTML web page connecting them together is sent to STDOUT.
<h3>Display Options</h3>
<dl>
<dt> -h</dt>
<dd> Just display the usage help message and quit.</dd> </dl>

<h3>Common Options</h3>
<dl>
<dt> -P directory</dt>
<dd> Directory where the graph files are to be stored.</dd>

<dt>     -U URL</dt>
<dd>         Base URL where the graph files can be accessed</dd>
<dt>     -A 0|1</dt>
<dd>    Graph Daily stats (0=do not, 1=do)</dd>
<dt>     -B regex</dt>

<dd> Blank out partial URLs matching the regex. This can be used to 'defragment' URLs that use extended paths (such as cache defeating CGI programs).</dd>
<dt> -D 0|1</dt>
<dd> Graph top level domains (0=do not, 1=do)</dd>

<dt>     -d N</dt>
<dd>      Graph Top N Domains (0=do not)</dd>
<dt>     -f N</dt>
<dd>      Graph Top N Files (0=do not)</dd>
<dt>     -H 0|1</dt>
<dd>    Graph Hourly stats (0=do not, 1=do)</dd>
<dt>     -N name</dt>

<dd> System name for report. It is inserted into the title and an h1 header for the report.</dd>
<dt> -R regex</dt>
<dd> Filter out URLs matching regex from the top N files graph</dd> <dt> -r regex</dt>
<dd> Include only files matching the perl regex</dd> <dt> -X regex</dt>
<dd> Filter out domains matching the regex from the top N domains graph</dd> <dt> -x regex</dt>
<dd> Include only domains matching the perl regex</dd> <dt>filename</dt>
<dd>The file where an already generated

FTPWebLog report has been stored.</dd> </dl>
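As a sketch of a typical run, the following assumes an existing FTPWebLog report in 'stats.html'; the output directory, URL prefix, and report name are all assumptions for illustration:

```shell
#!/bin/sh
# Graph the daily and hourly stats plus the top 20 files from an
# existing report. Paths and the URL prefix are assumptions; the guard
# keeps the sketch harmless where graphftpweblog is not installed.
if test -x ./graphftpweblog; then
    ./graphftpweblog -N 'Example Graphical Report' \
        -P /usr/local/lib/httpd/htdocs/statistics \
        -U /statistics \
        -A 1 -H 1 -f 20 -d 0 -D 0 \
        stats.html > /usr/local/lib/httpd/htdocs/statistics/graph.html
fi
```

Note that -P is the filesystem directory and -U is the same location as the web server sees it; the two must refer to the same place for the generated page to find its images.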
<h2>Putting it all together</h2>

Here is an example of a script to analyze a log and generate both a full report, and a 'lite' report - both linked to a graphic report. <pre>

cd /home/users/snowhare/bin/stats # Where I keep the FTPWebLog scripts

# Directory where I am going to keep all my stats
basestatsdir=&quot;/usr/local/lib/httpd/htdocs/statistics&quot;

# Location of my access_log

# Name of my server

# Type of log I am processing (www or ftp)
type=&quot;www&quot;

# Name of the full stats report

# Generate a FULL stats report, all reports.
./ftpweblog -t &quot;$type&quot; -N &quot;Web Log Report for $name&quot; \
        -d 40 -D 1 -L 1 -f 40 -F 1 -S 1 -A 1 -H 1 \
        -g &quot;/statistics/graph.html&quot; \
        $sourcelog > ${statsfile}.$$
mv ${statsfile}.$$ ${statsfile}  # Doing the two step to keep the time
                                 # when there are NO stats to a minimum

# Generate a stats lite report.
# Only the Summary, Daily, Hourly and Top Level domain reports.
litestatsfile=&quot;$basestatsdir/httpstats-lite.html&quot;
./ftpweblog -t &quot;$type&quot; \
        -N &quot;Lite Web Log Report for $name&quot; -i $statsfile \
        -d 0 -D 0 -L 1 -F 0 -f 0 -S 1 -A 1 -H 1 \
        -g &quot;/statistics/graph.html&quot; \
        /dev/null > ${litestatsfile}.$$
mv ${litestatsfile}.$$ ${litestatsfile} # Doing the two step to keep the time
                                        # when there are NO stats to a minimum

# Make the graphical log report.
./graphftpweblog -N &quot;Graphical Web Log Report for $name&quot; \
                -U &quot;/statistics&quot; \
                -P &quot;$basestatsdir&quot; \
                -A 1 -D 1 -d 40 -f 40 -H 1 \
                $statsfile > $basestatsdir/graph.html

# Just to be sure file permissions are correct
chmod 644 $litestatsfile $statsfile $basestatsdir/graph.html
chmod 644 $basestatsdir/*Stats.png
</pre>
<h2>Getting sophisticated</h2>
A number of sites now run multiple servers. By taking advantage of the command line options you can tailor the reports for each server - in fact you can even make separate reports for different sections of a single server. When doing that, I recommend making one 'with the works' report with all reports turned on, and then using the ability to read old reports to efficiently extract special interest reports. This is <strong>much</strong> faster than generating new reports from the original access_log.<p>
Note: You can't extract domains <em>and</em> meaningfully associate them with file sections from an old log report. You have to do that particular trick using the original access_log. You can extract domains from an old report for analysis <strong>OR</strong> extract file names from an old report and have it mean something. But not both. </p>
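In practice that means two separate extraction passes over the same saved report, one per axis; the file names below are assumptions:

```shell
#!/bin/sh
# Two independent extractions from one saved full report. Each pass is
# meaningful on its own; combining -r with the domain reports is not.
# The guard keeps the sketch harmless where ftpweblog is not installed.
if test -x ./ftpweblog; then
    # Pass 1: domain-oriented extract (file reports suppressed).
    ./ftpweblog -i fullreport.html -F 0 -f 0 -D 1 -d 40 -L 1 \
        /dev/null > domains.html
    # Pass 2: file-oriented extract (domain reports suppressed).
    ./ftpweblog -i fullreport.html -D 0 -d 0 -L 0 -F 1 -f 40 \
        /dev/null > files.html
fi
```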

<h4>An example</h4>

Let's say you have a user named 'johndoe' on your server. You could get a report on just his pages by using: <p>
<strong>ftpweblog -t&#160;www -N&#160;'Web Pages for John Doe' -D&#160;0 -d&#160;0
-L&#160;0 -i&#160;fullreport.html -r&#160;'^/~johndoe' /dev/null &gt; johndoe.html</strong></p> <p>
Breaking it down: </p>
<dl>
<dt>-t www</dt>
<dd>Specifies this report as being about a WWW server. Not strictly needed since we aren't actually reading a log file.</dd> <dt>-N 'Web Pages for John Doe'</dt>
<dd>This sets the title of the report to 'Web Pages for John Doe'</dd> <dt>-D 0</dt>
<dd>Suppress the full domain report because it would be meaningless</dd> <dt>-d 0</dt>
<dd>Suppress the top 40 domains report because it would be meaningless</dd> <dt>-L 0</dt>
<dd>Suppress the top level domains report, again because it would be meaningless</dd>
<dt>-i fullreport.html</dt>
<dd>Specifies to read the file 'fullreport.html' for an already created FTPWebLog report</dd>
<dt>-r '^/~johndoe'</dt>
<dd>Only include files that have paths that start with /~johndoe<p> This is an extremely powerful feature - you can use it to extract reports on graphic files, individual users, and archive sections. </p></dd>
<dt>/dev/null</dt>
<dd>Read the current 'log' from '/dev/null'. Just an easy trick to let you focus on the preprocessed report you already made without having to process a real access_log.</dd>
<dt>&gt; johndoe.html</dt>
<dd>Put this extracted report in the file 'johndoe.html'</dd>
</dl>



<p> You will also find in this distribution a 'ftpweblog-103a1' file - this is an experimental version of FTPWebLog that supports Apache's mod_log_config module and improves FTPWebLog's memory management (you should save TONS of memory now if you turn off the domain related reports). You should be able to directly copy your 'LogFormat' directive value into the appropriate line and have the program parse your custom log format. It is nowhere near complete, but it does work. </p>
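For reference, the stock NCSA Common Log Format is expressed by mod_log_config as the following 'LogFormat' value; copy your own directive's value rather than this example if you have customized it:

```apache
# NCSA Common Log Format, as Apache's mod_log_config expresses it
LogFormat "%h %l %u %t \"%r\" %>s %b" common
```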

<address>Benjamin &quot;Snowhare&quot; Franz / <a href="">snowhare&#64;</a></address>
</body></html>
