Class: RequestLogAnalyzer::FileFormat::Apache

Inherits:
Base
  • Object
Extended by:
CommonRegularExpressions
Defined in:
lib/request_log_analyzer/file_format/apache.rb

Overview

The Apache file format parses Apache access.log files.

Apache can be configured to write the access.log in many different formats. In theory, this FileFormat can handle any of them, but it must know which log format is in use. Pass the format string as a parameter to the create method, e.g.:

RequestLogAnalyzer::FileFormat::Apache.create('%h %l %u %t "%r" %>s %b')

It also supports the predefined Apache log formats “common” and “combined”. The line definition and the report definition will be constructed using this file format string. From the command line, you can provide the format string using the --apache-format command line option.

Direct Known Subclasses

Nginx, Rack

Defined Under Namespace

Classes: Request

Constant Summary

LOG_FORMAT_DEFAULTS =

A hash of predefined Apache log formats

{
  :common   => '%h %l %u %t "%r" %>s %b',
  :combined => '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"',
  :vhost_combined => '%h %l %v %t "%r" %>s %b "%{Referer}i" "%{User-agent}i" %T/%D',
  :nginx    => '%a %t %h %u "%r" %>s %b',
  :rack     => '%h %l %u %t "%r" %>s %b %T',
  :referer  => '%{Referer}i -> %U',
  :agent    => '%{User-agent}i'
}
APACHE_TIMESTAMP =

I have encountered two timestamp types, with timezone and without. Parse both.

Regexp.union(timestamp('%d/%b/%Y:%H:%M:%S %z'), timestamp('%d/%b/%Y %H:%M:%S'))
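
The two timestamp layouts can be illustrated with Ruby's standard library. This is a minimal sketch, not the library's implementation: the constant above combines regular expressions, whereas the hypothetical helper parse_apache_timestamp below parses the text with DateTime.strptime, trying the layout with a timezone first.

```ruby
require 'date'

# The two strftime-style layouts named in APACHE_TIMESTAMP.
TIMESTAMP_LAYOUTS = ['%d/%b/%Y:%H:%M:%S %z', '%d/%b/%Y %H:%M:%S'].freeze

# Hypothetical helper (not part of RequestLogAnalyzer): try each layout in
# turn and return the first DateTime that parses, or nil if none does.
def parse_apache_timestamp(text)
  TIMESTAMP_LAYOUTS.each do |layout|
    begin
      return DateTime.strptime(text, layout)
    rescue ArgumentError
      next # this layout does not fit, try the next one
    end
  end
  nil
end

with_zone    = parse_apache_timestamp('10/Oct/2000:13:55:36 -0700')
without_zone = parse_apache_timestamp('10/Oct/2000 13:55:36')
```

Both calls return a DateTime; input that fits neither layout returns nil.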
LOG_DIRECTIVES =

A hash that defines how the log format directives should be parsed.

{
  '%' => { nil => { :regexp => '%', :captures => [] } },
  'v' => { nil => { :regexp => "(#{hostname_or_ip_address})",  :captures => [{:name => :vhost, :type => :string}] } },
  'h' => { nil => { :regexp => "(#{hostname_or_ip_address})",  :captures => [{:name => :remote_host, :type => :string}] } },
  'a' => { nil => { :regexp => "(#{ip_address})", :captures => [{:name => :remote_ip, :type => :string}] } },
  'b' => { nil => { :regexp => '(\d+|-)', :captures => [{:name => :bytes_sent, :type => :traffic}] } },
  'c' => { nil => { :regexp => '(\+|\-|\X)', :captures => [{:name => :connection_status, :type => :integer}] } },
  'D' => { nil     => { :regexp => '(\d+|-)', :captures => [ {:name => :duration, :type => :duration, :unit => :musec }] },
           'micro' => { :regexp => '(\d+|-)', :captures => [ {:name => :duration, :type => :duration, :unit => :musec }] },
           'milli' => { :regexp => '(\d+|-)', :captures => [ {:name => :duration, :type => :duration, :unit => :msec }] }
         },
  'l' => { nil => { :regexp => '([\w-]+)', :captures => [{:name => :remote_logname, :type => :nillable_string}] } },
  'T' => { nil => { :regexp => '(\d+(?:\.\d+)?|-)', :captures => [{:name => :duration, :type => :duration, :unit => :sec}] } },
  't' => { nil => { :regexp => "\\[(#{APACHE_TIMESTAMP})?\\]", :captures => [{:name => :timestamp, :type => :timestamp}] } },
  's' => { nil => { :regexp => '(\d{3})', :captures => [{:name => :http_status, :type => :integer}] } },
  'u' => { nil => { :regexp => '(\w+|-)', :captures => [{:name => :user, :type => :nillable_string}] } },
  'U' => { nil => { :regexp => '(\/\S*)', :captures => [{:name => :path, :type => :string}] } },
  'r' => { nil => { :regexp => '([A-Z]+) (\S+) HTTP\/(\d+(?:\.\d+)*)', :captures => [{:name => :http_method, :type => :string},
                   {:name => :path, :type => :path}, {:name => :http_version, :type => :string}]} },
  'i' => { 'Referer'    => { :regexp => '(\S+)', :captures => [{:name => :referer, :type => :nillable_string}] },
           'User-agent' => { :regexp => '(.*)',  :captures => [{:name => :user_agent, :type => :user_agent}] }
         }
}
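
The table is keyed first by the directive letter and then by the optional argument in braces, with nil standing for "no argument". The sketch below copies two entries into a reduced table to show how %D and %{milli}D resolve to different units. The names SKETCH_TABLE and lookup_directive are assumptions made for this illustration only.

```ruby
# Reduced copy of two LOG_DIRECTIVES entries, for illustration only.
SKETCH_TABLE = {
  'D' => {
    nil     => { :regexp => '(\d+|-)', :captures => [{ :name => :duration, :type => :duration, :unit => :musec }] },
    'milli' => { :regexp => '(\d+|-)', :captures => [{ :name => :duration, :type => :duration, :unit => :msec }] }
  },
  's' => { nil => { :regexp => '(\d{3})', :captures => [{ :name => :http_status, :type => :integer }] } }
}.freeze

# Hypothetical helper: returns the directive definition, or nil when the
# letter or the argument is not in the table.
def lookup_directive(variable, arg = nil)
  SKETCH_TABLE.dig(variable, arg)
end

plain_unit = lookup_directive('D')[:captures].first[:unit]          # %D
milli_unit = lookup_directive('D', 'milli')[:captures].first[:unit] # %{milli}D
unknown    = lookup_directive('q')                                  # not in the table
```

Hash#dig returns nil for a missing letter instead of raising, which avoids the need for a rescue around the lookup.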

Constants included from CommonRegularExpressions

CommonRegularExpressions::TIMESTAMP_PARTS

Constants inherited from Base

Base::Request

Instance Attribute Summary

Attributes inherited from Base

#line_definitions, #report_trackers

Class Method Summary

Methods included from CommonRegularExpressions

anchored, hostname, hostname_or_ip_address, ip_address, timestamp

Methods inherited from Base

#captures?, format_definition, #initialize, line_definer, line_definition, #line_divider, #max_line_length, #parse_line, report, report_definer, #request, #request_class, #setup_environment, #valid_line_definitions?, #valid_request_class?, #well_formed?

Constructor Details

This class inherits a constructor from RequestLogAnalyzer::FileFormat::Base

Class Method Details

+ (Object) access_line_definition(format_string)

Creates the access log line definition based on the Apache log format string

# File 'lib/request_log_analyzer/file_format/apache.rb', line 68

def self.access_line_definition(format_string)
  format_string ||= :common
  format_string   = LOG_FORMAT_DEFAULTS[format_string.to_sym] || format_string

  line_regexp = ''
  captures    = []
  format_string.scan(/([^%]*)(?:%(?:\{([^\}]+)\})?>?([A-Za-z%]))?/) do |literal, arg, variable|

    line_regexp << Regexp.quote(literal) # Make sure to parse the literal before the directive

    if variable
      # Check if we recognize the log directive
      directive = LOG_DIRECTIVES[variable][arg] rescue nil

      if directive
        line_regexp << directive[:regexp]   # Parse the value of the directive
        captures    += directive[:captures] # Add the directive's information to the captures
      else
        puts "Apache log directive %#{arg}#{variable} is not yet supported by RLA, the field will be ignored."
        line_regexp << '.*' # Just accept any input for this literal
      end
    end
  end

  # Return a new line definition object
  return RequestLogAnalyzer::LineDefinition.new(:access, :regexp => Regexp.new(line_regexp),
                                    :captures => captures, :header => true, :footer => true)
end
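
The scanning step can be tried in isolation. The following is a simplified sketch under stated assumptions, not the library code: SKETCH_DIRECTIVES is a three-entry table made up for this example, '(\S+)' stands in for the hostname_or_ip_address pattern, and the hypothetical build_line_regexp returns a plain array instead of a LineDefinition. The scanner regexp is the one used in the method above.

```ruby
# Tiny directive table assumed for this sketch.
SKETCH_DIRECTIVES = {
  'h' => { nil => { :regexp => '(\S+)',   :names => [:remote_host] } },
  's' => { nil => { :regexp => '(\d{3})', :names => [:http_status] } },
  'b' => { nil => { :regexp => '(\d+|-)', :names => [:bytes_sent] } }
}.freeze

# Splits a format string into (literal, argument, directive letter) triples.
FORMAT_SCANNER = /([^%]*)(?:%(?:\{([^\}]+)\})?>?([A-Za-z%]))?/

# Hypothetical helper: builds a regexp and the list of captured field names.
def build_line_regexp(format_string)
  pattern = +''
  names   = []
  format_string.scan(FORMAT_SCANNER) do |literal, arg, variable|
    pattern << Regexp.quote(literal) # literal text before the directive
    next unless variable

    directive = SKETCH_DIRECTIVES.dig(variable, arg)
    if directive
      pattern << directive[:regexp]
      names.concat(directive[:names])
    else
      pattern << '.*' # unknown directive: accept anything, capture nothing
    end
  end
  [Regexp.new(pattern), names]
end

regexp, names = build_line_regexp('%h %>s %b')
match  = regexp.match('127.0.0.1 200 2326')
fields = names.zip(match.captures).to_h
```

Here fields maps :remote_host, :http_status and :bytes_sent to the matched substrings. The values are still strings at this point; type conversion is a separate step that this sketch does not cover.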

+ (Object) create(*args)

Creates the Apache log format language based on an Apache log format string. It sets up the line definition and the report trackers according to the Apache access log format, which should be passed as the first argument. If no format is given, access_line_definition falls back to the 'common' log format.

# File 'lib/request_log_analyzer/file_format/apache.rb', line 61

def self.create(*args)
  access_line = access_line_definition(args.first)
  trackers = report_trackers(access_line) + report_definer.trackers
  self.new(line_definer.line_definitions.merge(:access => access_line), trackers)
end

+ (Object) report_trackers(line_definition)

Sets up the report trackers according to the fields captured by the access line definition.

# File 'lib/request_log_analyzer/file_format/apache.rb', line 98

def self.report_trackers(line_definition)
  analyze = RequestLogAnalyzer::Aggregator::Summarizer::Definer.new

  analyze.timespan      if line_definition.captures?(:timestamp)
  analyze.hourly_spread if line_definition.captures?(:timestamp)

  analyze.frequency :category => :http_method, :title => "HTTP methods"  if line_definition.captures?(:http_method)
  analyze.frequency :category => :http_status, :title => "HTTP statuses" if line_definition.captures?(:http_status)
  analyze.frequency :category => lambda { |r| r.category }, :title => "Most popular URIs"    if line_definition.captures?(:path)

  analyze.frequency :category => :user_agent, :title => "User agents"    if line_definition.captures?(:user_agent)
  analyze.frequency :category => :referer,    :title => "Referers"       if line_definition.captures?(:referer)

  analyze.duration :duration => :duration,  :category => lambda { |r| r.category }, :title => 'Request duration' if line_definition.captures?(:duration)
  analyze.traffic  :traffic => :bytes_sent, :category => lambda { |r| r.category }, :title => 'Traffic'          if line_definition.captures?(:bytes_sent)

  return analyze.trackers
end
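
The pattern in this method is that a report is enabled only when the line definition captures the field it needs. A minimal standalone sketch of that idea follows. SketchLine, REPORTS_BY_FIELD and enabled_reports are hypothetical names for this illustration, and the sketch returns report labels rather than real tracker objects.

```ruby
# Hypothetical stand-in for a line definition: it only knows its capture names.
SketchLine = Struct.new(:capture_names) do
  def captures?(name)
    capture_names.include?(name)
  end
end

# Which reports depend on which captured field, mirroring the method above.
REPORTS_BY_FIELD = {
  :timestamp   => ['timespan', 'hourly_spread'],
  :http_method => ['HTTP methods'],
  :http_status => ['HTTP statuses'],
  :path        => ['Most popular URIs'],
  :user_agent  => ['User agents'],
  :referer     => ['Referers'],
  :duration    => ['Request duration'],
  :bytes_sent  => ['Traffic']
}.freeze

# Returns the labels of all reports whose required field is captured.
def enabled_reports(line)
  REPORTS_BY_FIELD.select { |field, _| line.captures?(field) }.values.flatten
end

line    = SketchLine.new([:timestamp, :http_status, :bytes_sent])
reports = enabled_reports(line)
```

With this input, reports contains the two timestamp reports plus 'HTTP statuses' and 'Traffic'. A format string without %t would drop the first two.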