Process Apache Logs into Sqlite DB Using Perl

Here’s a simple Perl script, using the SQLite perl module (CPAN::SQLite::DBI), that works nicely to read your Apache log file to create a Sqlite DB which you can use to do ad-hoc analysis using SQL of the logs created by your Apache server.

Simply run the script by passing in the input log file path:

$ apache_log_summary.pl /var/log/apache2/access.log

This will result in the creation of the file named sqlite.db that is a database with a single table named data.

I use the sqlite3 command on my Ubuntu system to connect to the DB:

$ sqlite3 sqlite.db

Then the simple SQL command below will show the data:

select * from data;

The magic is in the Perl regular expression parser.  Note that line 19 expects the log entries in the one of the two formats:


127.0.0.1 – – [08/Jun/2011:08:49:09 -0700] “GET / HTTP/1.1” 200 439 “-” “Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.17) Gecko/20110422 Ubuntu/10.04 (lucid) Firefox/3.6.17 (.NET CLR 3.5.30729)”

127.0.0.1:80 123.123.123.123 – – [08/Jun/2011:08:49:09 -0700] “GET / HTTP/1.1” 200 439 “-” “Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.17) Gecko/20110422 Ubuntu/10.04 (lucid) Firefox/3.6.17 (.NET CLR 3.5.30729)”

Feel free to modify to fit your needs.

     1  #!/usr/bin/perl -w
     2
     3  use DBI;
     4  use strict;
     5
     6  my $dbh = DBI->connect("dbi:SQLite:dbname=sqlite.db",'','');
     7
     8  $dbh->do("CREATE TABLE data (
     9     localsiteref,localsiteport,visitingip,datetimeofvisit,
    10     dateofmonth,monthabbr,year,hour,minute,methodofvisit,
    11     targetfile,httpcode,bytesintransfer,referrer,browserident
    12  )");
    13
    14  open FILE, $ARGV[0] or die $!;
    15
    16  my $count=0;
    17  while () {
    18
    19    if( $_ =~ m/^([\w\.]+):{0,1}([0-9]*)\s([0-9\.]*)\s*\-\s\-\s\[(.+)\]\s"(\w+)\s([^\s]+)\s.*"\s([0-9]+)\s([0-9]+)\s"(.*)"\s"(.*)"$/ ) {
    20      my $localsiteref    = $1;
    21      my $localsiteport   = $2; # int.
    22      my $visitingip      = $3;
    23      my $datetimeofvisit = $4;
    24      my $methodofvisit   = $5;
    25      my $targetfile      = $6;
    26      my $httpcode        = $7; # int.
    27      my $bytesintransfer = $8; # int.
    28      my $referrer        = $9;
    29      my $browserident    = $10;
    30
    31      $targetfile =~ s/'//g;
    32      $referrer =~ s/'//g;
    33      $browserident =~ s/'//g;
    34
    35      my $dateofmonth = 0;my $monthabbr = '';my $year = 0;my $hour = 0;my $minute = 0;
    36      if ( $datetimeofvisit =~ m/^([0-9][0-9])\/(...)\/([0-9][0-9][0-9][0-9]):([0-9][0-9]):([0-9][0-9]):.*$/ ) {
    37        $dateofmonth = $1; # int.
    38        $monthabbr   = $2;
    39        $year        = $3; # int.
    40        $hour        = $4; # int.
    41        $minute      = $5; # int.
    42      }
    43
    44      $dbh->do("INSERT INTO data
    45        (localsiteref,localsiteport,visitingip,datetimeofvisit,
    46         dateofmonth,monthabbr,year,hour,minute,methodofvisit,
    47         targetfile,httpcode,bytesintransfer,referrer,browserident)
    48        VALUES
    49        ('$localsiteref',$localsiteport,'$visitingip','$datetimeofvisit',
    50          $dateofmonth,'$monthabbr',$year,$hour,$minute,'$methodofvisit',
    51          '$targetfile',$httpcode,$bytesintransfer,'$referrer','$browserident')");
    52    }
    53
    54    $count++; last if ( $count > 250000000 );
    55  }
    56
    57  close FILE;
    58
    59  undef($dbh);
    60
    61  exit 0;

Here are some sample interesting SQL queries to run on the database:

echo 'SELECT COUNT(*),visitingip FROM data WHERE 1 = 1 AND visitingip GROUP BY visitingip ORDER BY COUNT(*) ASC;' | sqlite3 sqlite.db 
echo 'SELECT COUNT(*),localsiteref FROM data GROUP BY localsiteref ORDER BY COUNT(*) ASC;' | sqlite3 sqlite.db | sort -n 
echo 'SELECT SUM(bytesintransfer) AS sum,COUNT(*) AS count,localsiteref,targetfile FROM data GROUP BY targetfile ORDER BY COUNT(*) ASC;' | sqlite3 sqlite.db 
echo 'SELECT SUM(bytesintransfer) AS sum,COUNT(*) AS count,localsiteref FROM data GROUP BY localsiteref ORDER BY COUNT(*) ASC;' | sqlite3 sqlite.db 
echo 'SELECT SUM(bytesintransfer) AS sum,COUNT(*) AS count,localsiteref FROM data WHERE hour = 16 AND year = 2011 AND minute >= 0 AND minute < 5 GROUP BY localsiteref;' | sqlite3 sqlite.db