LogJack 1.00 (c) 2001 GT [email: "gt" on dreamsmith.org]
========================================================
This program was written shortly after I downloaded LiveWebStats (which can
be found at http://www.chaosreigns.com/stats) and discovered it didn't quite
do what I needed, but had a number of good ideas on how I could write
something that did.
What did I need? Well, first of all, I wanted to be able to add hit
counters to pages without relying on additional CGI, cute odometer images,
etc. All the hits are being logged into Apache's access_log, so why add
additional mechanisms to count hits already being counted? All that I
really needed was something to scan my log files and tell me how many times
a particular page had been hit. If it produced some nice reports along the
way, that would be nice too.
LiveWebStats generates reports. Unfortunately, it writes them to HTML
files. What I really wanted was something that wrote out snippets that I
could then <!--#include virtual="..." --> into my pages. I hacked on
LiveWebStats for a while, changing the output to .shtml, trying to fix the
tables (it generates "sloppy" tables, with opening tags that have no
matching closing tags, for example,
which is fine for top level tables but fails miserably if the table is
within another table, which unfortunately is how my entire website is set
up), then decided to just start from scratch. I did, but as I went along I
frequently found myself saying "Now how did LiveWebStats do that?" and
pulling up that code to study it while working on my own. Thus,
although technically written from scratch, my code borrows quite a bit from
LiveWebStats. My thanks to Darxus for the excellent ideas and code...
Enough History Already, What Does It Do?
========================================
Alright, here's what it does. It starts up and scans your httpd logs
(common or combined format), compiles a bunch of statistics, then writes out
a bunch of files suitable for being included in your web pages. The files
come in two varieties, tables and snips.
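
To make the scanning step concrete, here is a minimal sketch of parsing
common or combined log lines and counting hits per page. It is written in
Python for illustration and is not LogJack's own code; the names LOG_LINE
and count_hits are made up for this example, and counting every parsed
request regardless of status code is an assumption.

```python
import re
from collections import Counter

# Matches the fields shared by the common and combined log formats.
# The extra referer and user-agent fields of "combined" are left unparsed.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<uri>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def count_hits(lines):
    """Return a Counter mapping each requested URI to its hit count."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines that do not parse are skipped
            hits[match.group("uri")] += 1
    return hits
```

Calling count_hits on an open log file (or any iterable of lines) gives
back a mapping from URI to hit count.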
The tables are your basic reports on all the various statistics; see my
website for examples. Note that they aren't complete tables: they contain
only rows. This is because how you format your tables is likely to differ
from how I format mine, depending on the look and feel of your website.
Thus, you create your own table and header row, then just include the
generated file as the body, something like this:
<table>
<tr><th>Cool web surfing program</th><th>Hits</th></tr>
<!--#include virtual="..." -->
</table>
So YOU create the table, with whatever formatting, colors, etc. you want,
and just use SSI to include the content.
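
As a rough illustration of what a rows-only table file might hold, the
sketch below renders a set of counts as bare table rows. It is Python
written for this example, not LogJack's code; the function name table_rows,
the two-column layout, and the sort order are assumptions. Every tag is
closed, so the rows stay valid inside a nested table.

```python
from html import escape

def table_rows(counts):
    """Render {label: hits} as bare <tr> rows, most hits first."""
    # Sort by descending hit count, then by label to break ties.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "".join(
        f"<tr><td>{escape(label)}</td><td>{hits}</td></tr>\n"
        for label, hits in ordered
    )
```

The returned string is what would be written to the file that your own
table pulls in with SSI.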
Of course, these tables are a side benefit; what I really wanted was access
counts. These are generated and stored in a fileinfo directory, where they
can be included like this:
Qtarot: download the source here!
[Downloaded <!--#include virtual="..." -->.]
Which looks ugly, but the viewer sees something like this:
Qtarot: download the source here!
[Downloaded 58 times since 2001-04-19 14:08:22.]
The fileinfo directory has a snip like that for every file you haven't
excluded from the statistics counting. Lines from the log can be excluded
based on IP address, file, or anything you can parse out with a regex. On
my site I exclude all image files from the statistics, as I don't really
care how many times people have downloaded my navbar (well, it's not really
a bar, let's just call it a navigation gadget).
Okay, So How Do I Make It Work?
===============================
Somewhere in your Apache config, you'll have a line specifying where to
record each access (usually in a file creatively called "access.log").
LEAVE IT ALONE! You don't want to replace it; you want to add another one
alongside it. Apache lets you log to as many places as you like. Nifty,
eh? So add something like this:
CustomLog "|exec /usr/local/bin/logjack.pl /usr/local/etc/logjack" combined
The first parameter to CustomLog tells Apache where to send its log
information. By starting it with a pipe, we say we want the following
program executed and the data piped to its STDIN. The "exec" is optional
but prevents an extra copy of /bin/sh from hanging around in memory all the
time. Next is the path to where you placed logjack.pl (could be anywhere
you like), and finally, logjack.pl takes as its one and only parameter the
location of its configuration directory (which again can be anywhere). The
last CustomLog parameter is the log file format. Just say "combined".
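
A piped logger is simply a long-running program that reads log lines from
its STDIN until Apache closes the pipe. The sketch below shows that shape
in Python; it is not how logjack.pl is actually written. The names run,
handle_line, write_reports, interval, and clock are hypothetical, and the
300-second default mirrors the five-minute default wait mentioned in this
README.

```python
import time

def run(stream, handle_line, write_reports, interval=300, clock=time.monotonic):
    """Feed each log line to handle_line and write reports periodically.

    write_reports is called at most once per `interval` seconds, and once
    more when the stream ends.
    """
    last_write = clock()
    for line in stream:
        handle_line(line.rstrip("\n"))
        # The clock is only checked when a new line arrives, so a quiet
        # server will not trigger a write until its next hit.
        if clock() - last_write >= interval:
            write_reports()
            last_write = clock()
    write_reports()  # final write when the pipe is closed
```

In real use the stream would be sys.stdin; the clock parameter exists only
so the timing can be exercised without waiting.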
Now, the configuration directory should contain three files: "config.pl",
"files.ignore", and "log.ignore". The first contains variables you can set
to control where the output is written, what format the snips should be in,
what reports to generate, etc. "files.ignore" specifies which files you
don't want statistics on: usually image files, but really anything you don't
want hit counts for or don't want to see in your reports. "log.ignore"
specifies which log lines you don't want counted. Usually you just want to
put in regular expressions that match the IP addresses of your own machines
and perhaps of robots that visit you, although potentially you could filter
just about anything with this file.
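
The sketch below illustrates the kind of regex filtering the two ignore
files perform. It is Python written for this example, not LogJack's code.
The file layout it assumes (one regular expression per line, with blank
lines and "#" comment lines skipped) is a guess, and load_patterns and
is_ignored are hypothetical names.

```python
import re

def load_patterns(text):
    """Compile one regex per non-blank line, skipping '#' comment lines.

    This layout is an assumption, not LogJack's documented file format.
    """
    patterns = []
    for raw in text.splitlines():
        entry = raw.strip()
        if entry and not entry.startswith("#"):
            patterns.append(re.compile(entry))
    return patterns

def is_ignored(value, patterns):
    """Return True if any pattern matches anywhere in value."""
    return any(p.search(value) for p in patterns)
```

Under these assumptions, a whole log line would be tested against the
"log.ignore" patterns, and the requested file against the "files.ignore"
patterns, for instance \.(gif|jpe?g|png)$ to drop image files.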
One of the things config.pl specifies is the output directory. This
directory should exist, and it should be somewhere accessible from the web
so that its tables and snips can be read by your SSI. Inside that directory
there must also exist a directory called "fileinfo" where individual file
hit counts will be stored.
A note on the individual file hit counts. They're stored in files whose
names are based on, but not identical to, the original file's name.
Basically, it's the file's URI, with all characters other than
alphanumerics, periods, hyphens, and underscores converted to =XX (equals
followed by two hex digits). If that doesn't explain it, just run the darn
thing and look at the files in the "fileinfo" dir. You'll see. These are
the files you want to include for page hit counts, download counts, etc.
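
The naming rule above can be sketched as follows. This is Python written
for illustration, not LogJack's code; the function name fileinfo_name is
hypothetical, and the use of uppercase hex digits and of UTF-8 bytes for
non-ASCII characters are assumptions (check the names in your own
"fileinfo" dir to see what the program really produces).

```python
import re

def fileinfo_name(uri):
    """Encode a URI as a flat filename.

    Alphanumerics, '.', '-', and '_' are kept; every other character is
    replaced by '=' plus two hex digits per byte.
    """
    def encode(match):
        # Assumption: uppercase hex, one =XX group per UTF-8 byte.
        return "".join(f"={byte:02X}" for byte in match.group(0).encode("utf-8"))
    return re.sub(r"[^A-Za-z0-9._-]", encode, uri)
```

Under these assumptions, /qtarot/index.shtml would map to
=2Fqtarot=2Findex.shtml.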
Questions?
==========
Q: How "live" are the stats?
A: You specify that in config.pl. By default, it'll wait up to five minutes
before writing new reports, but if your machine is relatively quick or your
statistics aren't terribly big, you might want to shorten that wait. On my
own site, which doesn't have a great many pages to keep track of, I'm never
more than two minutes behind.
Q: Can you run it as a CRON job?
A: Theoretically. In fact, add "exit 0;" after the first "writestats" call
in logjack.pl and it'll be perfectly suitable for that. But why would you
want to? Each time it runs it would have to reanalyze all your logfiles,
whereas if you run it through CustomLog it keeps that information up to date
all the time at virtually no cost CPU-wise. Thus, there will never be a
"CRON job" option in the program -- if you want that, hack it into the code
yourself.
Q: Why doesn't it generate pretty bar graphs like LiveWebStats?
A: Just 'cuz. :) I must admit I didn't pay too much attention to
generating reports; they're there because it's easy to do once you've
parsed the logs, but my main objective was to get automatic hit counts for
all my pages. I may add a feature like that in the future, provided it
doesn't chew up too many cycles (my webserver is a SPARCstation IPC, so I'm
not big on "heavy" scripts). One thing you won't see is image generation
on the fly...
Q: You really need to be using SSI or JSP or something like that to take
advantage of this program. Doesn't that stress the webserver? Wouldn't you
rather use static HTML pages?
A: There are, in fact, no straight .html pages on my website; everything is
done with .shtml, so this program is designed to work with that. If you
want straight .html pages generated, LiveWebStats already does an excellent
job of this, so use it! As far as stressing the webserver goes, my webserver
is a ten-year-old, 25 MHz computer (SPARCstation IPC) and it doesn't seem to
have any problems with my ludicrous overuse of SSI. It sure takes the work
out of making all my pages match the visual theme of the site.
Q: Does it work under non-Unix operating systems?
A: I have no idea...