Module: RequestLogAnalyzer::FileFormat::CommonRegularExpressions

Included in:
AmazonS3, Apache, DelayedJob2, DelayedJob21, DelayedJob3, Haproxy, Merb, Mysql, Postgresql, Rails, Rails3, W3c
Defined in:
lib/request_log_analyzer/file_format.rb

Overview

This module contains methods that construct regular expressions for commonly used log fragments, such as IP addresses and timestamps.

To use these regular expression constructors, extend this module in your file format (or, in the unlikely case that you need them as instance methods, include it).
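As a minimal sketch of the difference between extending and including, the following uses a hypothetical stand-in module, RegexpHelpers, which is not part of request-log-analyzer and only copies the anchored method listed further down this page. MyFormat and MyParser are likewise invented names for illustration.

```ruby
# Hypothetical stand-in for CommonRegularExpressions; only `anchored`
# is copied from the source listing on this page.
module RegexpHelpers
  def anchored(regexp)
    /^#{regexp}$/
  end
end

# Extending makes the constructors class-level methods, so they can be
# called directly in the class body while patterns are being defined.
class MyFormat
  extend RegexpHelpers

  STATUS_LINE = anchored(/\d{3}/)
end

# Including makes them instance methods instead.
class MyParser
  include RegexpHelpers
end

MyFormat::STATUS_LINE.match?('200')           # class-level use via extend
MyParser.new.anchored(/\d{3}/).match?('200')  # instance-level use via include
```

Extending is presumably the common case because a format usually builds its patterns once, at class definition time.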

Constant Summary

TIMESTAMP_PARTS =
{
  'a' => '(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)',
  'b' => '(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)',
  'y' => '\d{2}', 'Y' => '\d{4}', 'm' => '\d{2}', 'd' => '\d{2}',
  'H' => '\d{2}', 'M' => '\d{2}', 'S' => '\d{2}', 'k' => '(?:\d| )\d',
  'z' => '(?:[+-]\d{4}|[A-Z]{3,4})',
  'Z' => '(?:[+-]\d{4}|[A-Z]{3,4})',
  '%' => '%'
}
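Each value in this hash is a regular expression source string for one strftime directive. The sketch below checks two of them in isolation, using a local copy of the entries (the variable names are assumptions made for this example).

```ruby
# Local copy of two entries from TIMESTAMP_PARTS, for illustration only.
parts = {
  'k' => '(?:\d| )\d',
  'z' => '(?:[+-]\d{4}|[A-Z]{3,4})'
}

# Anchor each fragment so that it has to match the whole string.
hour = Regexp.new("\\A#{parts['k']}\\z")
zone = Regexp.new("\\A#{parts['z']}\\z")

hour.match?(' 9')      # => true, space-padded hour
hour.match?('19')      # => true
zone.match?('+0100')   # => true, numeric offset
zone.match?('CEST')    # => true, zone abbreviation
zone.match?('+01:00')  # => false, offsets with a colon are not covered
```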

Instance Method Details

- (Object) anchored(regexp)

Wraps the given regular expression in ^ and $ so that it has to match a complete line.

# File 'lib/request_log_analyzer/file_format.rb', line 179

def anchored(regexp)
  /^#{regexp}$/
end
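A short sketch of how the result behaves. The lambda only repeats the method body so that the snippet runs without the library; the variable names are assumptions.

```ruby
# Same body as the anchored method above, written as a lambda so the
# snippet runs on its own.
anchored = ->(regexp) { /^#{regexp}$/ }

status = anchored.call(/\d{3}/)

status.match?('200')        # => true
status.match?('x200')       # => false
# ^ and $ are line anchors in Ruby, so a match on any single line of a
# multi-line string is enough.
status.match?("junk\n200")  # => true
```

If a whole-string match is needed rather than a whole-line match, \A and \z would be the anchors to use instead.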

- (Object) hostname(blank = false)

Creates a regular expression that matches a hostname.

# File 'lib/request_log_analyzer/file_format.rb', line 130

def hostname(blank = false)
  regexp = /(?:(?:[a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*(?:[A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])/
  add_blank_option(regexp, blank)
end

- (Object) hostname_or_ip_address(blank = false)

Creates a regular expression that matches a hostname or an IP address.

# File 'lib/request_log_analyzer/file_format.rb', line 136

def hostname_or_ip_address(blank = false)
  regexp = Regexp.union(hostname, ip_address)
  add_blank_option(regexp, blank)
end
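The following sketch shows how Regexp.union combines the two patterns. The hostname pattern is copied from the listing above. The IP pattern is reduced to the IPv4 part to keep the example short, which is a simplification and not what the library does.

```ruby
# Hostname pattern copied from the hostname method above.
hostname = /(?:(?:[a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*(?:[A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])/
# Simplified stand-in for ip_address: IPv4 only (assumption for brevity).
ipv4 = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/

host_or_ip = Regexp.union(hostname, ipv4)

# The patterns are unanchored, so wrap them to test whole strings.
whole = /\A(?:#{host_or_ip})\z/

whole.match?('www.example.com')  # => true
whole.match?('192.168.0.1')      # => true
whole.match?('-bad-')            # => false
```

Because every hostname label has to start with a letter, a dotted IPv4 address is not matched by the hostname alternative and falls through to the IP alternative.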

- (Object) ip_address(blank = false)

Constructs a regular expression that matches IPv4 and IPv6 addresses.

Nil values are allowed if the blank option is given. Set blank to true to allow an empty string, or to a string that serves as a substitute for the nil value.

# File 'lib/request_log_analyzer/file_format.rb', line 165

def ip_address(blank = false)

  # IP address regexp copied from Resolv::IPv4 and Resolv::IPv6, 
  # but adjusted to work for the purpose of request-log-analyzer.
  ipv4_regexp                     = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/
  ipv6_regex_8_hex                = /(?:[0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}/
  ipv6_regex_compressed_hex       = /(?:(?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::(?:(?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)/
  ipv6_regex_6_hex_4_dec          = /(?:(?:[0-9A-Fa-f]{1,4}:){6})#{ipv4_regexp}/
  ipv6_regex_compressed_hex_4_dec = /(?:(?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::(?:(?:[0-9A-Fa-f]{1,4}:)*)#{ipv4_regexp}/
  ipv6_regexp                     = Regexp.union(ipv6_regex_8_hex, ipv6_regex_compressed_hex, ipv6_regex_6_hex_4_dec, ipv6_regex_compressed_hex_4_dec)

  add_blank_option(Regexp.union(ipv4_regexp, ipv6_regexp), blank)
end
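To illustrate what "adjusted" means in practice, the sketch below copies two of the patterns above and compares the IPv4 one with the anchored, range-checking Resolv::IPv4::Regex from Ruby's standard library. The variable names are assumptions made for this example.

```ruby
require 'resolv'

# Patterns copied from the ip_address method above.
ipv4       = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/
ipv6_8_hex = /(?:[0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}/

# Anchor them to test whole strings.
ipv4_whole = /\A#{ipv4}\z/
ipv6_whole = /\A#{ipv6_8_hex}\z/

ipv4_whole.match?('10.0.0.1')              # => true
ipv6_whole.match?('2001:db8:0:0:0:0:0:1')  # => true

# The IPv4 pattern checks only the shape, not the 0..255 range,
# whereas Resolv::IPv4::Regex also checks the range.
ipv4_whole.match?('999.999.999.999')           # => true
Resolv::IPv4::Regex.match?('999.999.999.999')  # => false
```

The looser pattern is a reasonable trade-off for log parsing, where the goal is to pick out an address-shaped field rather than to validate it.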

- (Object) timestamp(format_string, blank = false)

Creates a regular expression for a timestamp generated by a strftime call. Provide the format string to construct the matching regular expression. Set blank to true to allow an empty string, or set blank to a string to use as a substitute for the nil value. Only the directives listed in TIMESTAMP_PARTS are supported; any other directive raises an error.

# File 'lib/request_log_analyzer/file_format.rb', line 145

def timestamp(format_string, blank = false)
  regexp = ''
  format_string.scan(/([^%]*)(?:%([A-Za-z%]))?/) do |literal, variable|
    regexp << Regexp.quote(literal)
    if variable
      if TIMESTAMP_PARTS.has_key?(variable)
        regexp << TIMESTAMP_PARTS[variable]
      else
        raise "Unknown variable: %#{variable}"
      end
    end
  end

  add_blank_option(Regexp.new(regexp), blank)
end
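The translation from format string to regular expression can be tried out with the sketch below. It uses a local copy of the needed TIMESTAMP_PARTS entries and a hypothetical timestamp_regexp lambda that mirrors the loop above but leaves out the blank handling, since add_blank_option is not shown on this page.

```ruby
# Local copy of the TIMESTAMP_PARTS entries this example needs.
parts = {
  'b' => '(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)',
  'Y' => '\d{4}', 'd' => '\d{2}',
  'H' => '\d{2}', 'M' => '\d{2}', 'S' => '\d{2}',
  'z' => '(?:[+-]\d{4}|[A-Z]{3,4})'
}

# Hypothetical helper mirroring the loop in timestamp, without the
# blank handling.
timestamp_regexp = lambda do |format_string|
  source = +''
  format_string.scan(/([^%]*)(?:%([A-Za-z%]))?/) do |literal, variable|
    # Literal text between directives is escaped and matched verbatim.
    source << Regexp.quote(literal)
    next unless variable
    # Each directive is replaced by its regular expression fragment.
    source << parts.fetch(variable) { raise "Unknown variable: %#{variable}" }
  end
  Regexp.new(source)
end

apache = timestamp_regexp.call('%d/%b/%Y:%H:%M:%S %z')
'[10/Oct/2000:13:55:36 -0700]'[apache]  # => "10/Oct/2000:13:55:36 -0700"
```

The resulting expression is unanchored, so it can be embedded in a larger line pattern or located inside a longer string, as in the last line.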