Module: ScopedSearch::QueryLanguage::Tokenizer

Included in:
Compiler
Defined in:
lib/scoped_search/query_language/tokenizer.rb

Overview

The Tokenizer module adds methods to the query language compiler that transform a query string into a stream of tokens, a form better suited to parsing than the raw string.

Constant Summary

KEYWORDS =

All keywords that the language supports.

{ 'and' => :and, 'or' => :or, 'not' => :not, 'set?' => :notnull, 'has' => :notnull, 'null?' => :null, 'before' => :lt, 'after' => :gt, 'at' => :eq }
OPERATORS =

Every operator the language supports.

{ '&' => :and, '|' => :or, '&&' => :and, '||' => :or, '-' => :not, '!' => :not, '~' => :like, '!~' => :unlike,
'=' => :eq, '==' => :eq, '!=' => :ne, '<>' => :ne, '>' => :gt, '<' => :lt, '>=' => :gte, '<=' => :lte, '^' => :in, '!^' => :notin }
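
As a rough illustration of how these two tables are consulted, the sketch below copies both hashes from the listing above and looks lexemes up in them. The classify helper is hypothetical and not part of ScopedSearch; it only shows that several spellings map to the same token symbol, and that keywords are matched on their downcased text, as tokenize_keyword does.

```ruby
# Constants copied from the listing above.
KEYWORDS = { 'and' => :and, 'or' => :or, 'not' => :not, 'set?' => :notnull, 'has' => :notnull,
             'null?' => :null, 'before' => :lt, 'after' => :gt, 'at' => :eq }
OPERATORS = { '&' => :and, '|' => :or, '&&' => :and, '||' => :or, '-' => :not, '!' => :not,
              '~' => :like, '!~' => :unlike, '=' => :eq, '==' => :eq, '!=' => :ne, '<>' => :ne,
              '>' => :gt, '<' => :lt, '>=' => :gte, '<=' => :lte, '^' => :in, '!^' => :notin }

# Hypothetical helper (not part of the library): returns the token symbol for a
# keyword or operator, or the lexeme itself when it is neither.
def classify(lexeme)
  KEYWORDS[lexeme.downcase] || OPERATORS[lexeme] || lexeme
end

p classify('AND')    # keyword, matched case-insensitively
p classify('&&')     # operator spelling of the same token
p classify('before') # keyword mapped to a comparison symbol
p classify('name')   # neither keyword nor operator, returned unchanged
```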

Instance Method Details

- (Object) current_char

Returns the current character of the string.



# File 'lib/scoped_search/query_language/tokenizer.rb', line 19

def current_char
  @current_char
end

- (Object) each_token(&block) Also known as: each

Tokenizes the string by iterating over the characters.



# File 'lib/scoped_search/query_language/tokenizer.rb', line 37

def each_token(&block)
  while next_char
    case current_char
    when /^\s?$/; # ignore
    when '(';  yield(:lparen)
    when ')';  yield(:rparen)
    when ',';  yield(:comma)
    when /\&|\||=|<|>|\^|!|~|-/;  tokenize_operator(&block)
    when '"';                  tokenize_quoted_keyword(&block)
    else;                      tokenize_keyword(&block)
    end
  end
end
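
The dispatch in each_token depends only on the current character. The sketch below isolates that decision in a hypothetical branch_for helper (an illustration, not a library method) that uses the same patterns in the same order and returns the name of the branch that would run.

```ruby
# Hypothetical helper: reports which branch of each_token a character selects.
# The patterns and their order are taken from the listing above.
def branch_for(char)
  case char
  when /^\s?$/                 then :ignore          # whitespace or empty string
  when '('                     then :lparen
  when ')'                     then :rparen
  when ','                     then :comma
  when /\&|\||=|<|>|\^|!|~|-/  then :operator
  when '"'                     then :quoted_keyword
  else                              :keyword
  end
end

[' ', '', '(', ',', '-', '!', '"', 'a'].each do |char|
  puts "#{char.inspect} selects #{branch_for(char)}"
end
```

The first pattern also matches the empty string. This appears to matter at the end of the input, where String#[] returns an empty string for the position equal to the string's length before returning nil for later positions.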

- (Object) next_char

Moves the position pointer one step forward and returns the character at the new position, which then becomes the current character.



# File 'lib/scoped_search/query_language/tokenizer.rb', line 31

def next_char
  @current_char_pos += 1
  @current_char = @str[@current_char_pos, 1]
end

- (Object) peek_char(amount = 1)

Returns a following character of the string (by default, the next character), without updating the position pointer.



# File 'lib/scoped_search/query_language/tokenizer.rb', line 25

def peek_char(amount = 1)
  @str[@current_char_pos + amount, 1]
end
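
To show how the position pointer behaves, here is a minimal sketch that places current_char, next_char and peek_char in a stand-alone class. The Cursor class and its constructor are assumptions made for this example; in the library these methods live in the module and operate on the compiler's @str.

```ruby
# Hypothetical stand-alone host for the three cursor methods shown above.
class Cursor
  attr_reader :current_char

  def initialize(str)
    @str = str
    @current_char_pos = -1 # same starting value that tokenize sets
  end

  # Advances the pointer, then returns the character at the new position.
  def next_char
    @current_char_pos += 1
    @current_char = @str[@current_char_pos, 1]
  end

  # Looks ahead without moving the pointer.
  def peek_char(amount = 1)
    @str[@current_char_pos + amount, 1]
  end
end

cursor = Cursor.new('a>=')
p cursor.next_char    # "a"
p cursor.peek_char    # ">"  (pointer still on "a")
p cursor.peek_char(2) # "="
p cursor.current_char # "a"
cursor.next_char      # now on ">"
cursor.next_char      # now on "="
p cursor.next_char    # ""   (position equals the string length)
p cursor.next_char    # nil  (position is past the end)
```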

- (Object) tokenize

Tokenizes the string and returns the result as an array of tokens.



# File 'lib/scoped_search/query_language/tokenizer.rb', line 13

def tokenize
  @current_char_pos = -1
  to_a
end
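
An end-to-end sketch may help here. It condenses the methods documented on this page into a local Tokenizer module and mixes it into a hypothetical DemoCompiler class. The assumptions are that the host class stores the query in @str and includes Enumerable, so that to_a can call each (the alias of each_token). DemoCompiler is not the library's Compiler, and String.new replaces the "" literal only to keep the buffer mutable.

```ruby
# Condensed copy of the methods documented on this page.
module Tokenizer
  KEYWORDS = { 'and' => :and, 'or' => :or, 'not' => :not, 'set?' => :notnull, 'has' => :notnull,
               'null?' => :null, 'before' => :lt, 'after' => :gt, 'at' => :eq }
  OPERATORS = { '&' => :and, '|' => :or, '&&' => :and, '||' => :or, '-' => :not, '!' => :not,
                '~' => :like, '!~' => :unlike, '=' => :eq, '==' => :eq, '!=' => :ne, '<>' => :ne,
                '>' => :gt, '<' => :lt, '>=' => :gte, '<=' => :lte, '^' => :in, '!^' => :notin }

  def tokenize
    @current_char_pos = -1
    to_a
  end

  def current_char
    @current_char
  end

  def peek_char(amount = 1)
    @str[@current_char_pos + amount, 1]
  end

  def next_char
    @current_char_pos += 1
    @current_char = @str[@current_char_pos, 1]
  end

  def each_token(&block)
    while next_char
      case current_char
      when /^\s?$/ # ignore whitespace and the empty string at the end
      when '(' then yield(:lparen)
      when ')' then yield(:rparen)
      when ',' then yield(:comma)
      when /\&|\||=|<|>|\^|!|~|-/ then tokenize_operator(&block)
      when '"' then tokenize_quoted_keyword(&block)
      else tokenize_keyword(&block)
      end
    end
  end
  alias each each_token

  def tokenize_operator(&block)
    operator = current_char
    operator << next_char.to_s if OPERATORS.has_key?(operator + peek_char.to_s)
    yield(OPERATORS[operator])
  end

  def tokenize_keyword(&block)
    keyword = current_char
    keyword << next_char while /[^=~<>\s\&\|\)\(,]/ =~ peek_char
    KEYWORDS.has_key?(keyword.downcase) ? yield(KEYWORDS[keyword.downcase]) : yield(keyword)
  end

  def tokenize_quoted_keyword(&block)
    keyword = String.new
    until next_char.nil? || current_char == '"'
      keyword << (current_char == "\\" ? next_char : current_char)
    end
    yield(keyword)
  end
end

# Hypothetical host class standing in for the real Compiler.
class DemoCompiler
  include Enumerable
  include Tokenizer

  def initialize(str)
    @str = str
  end
end

tokens = DemoCompiler.new('name = "John Doe" and not (age >= 21)').tokenize
p tokens
# ["name", :eq, "John Doe", :and, :not, :lparen, "age", :gte, "21", :rparen]
```

Keywords and operators come back as Symbols, while field names and values stay Strings.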

- (Object) tokenize_keyword(&block)

Tokenizes a keyword, and converts it to a Symbol if it is recognized as a reserved language keyword (a key of the KEYWORDS hash). The lookup is case-insensitive.



# File 'lib/scoped_search/query_language/tokenizer.rb', line 63

def tokenize_keyword(&block)
  keyword = current_char
  keyword << next_char while /[^=~<>\s\&\|\)\(,]/ =~ peek_char
  KEYWORDS.has_key?(keyword.downcase) ? yield(KEYWORDS[keyword.downcase]) : yield(keyword)
end
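
The scanning rule can be tried in isolation. The sketch below uses a hypothetical read_keyword helper that applies the same character class and the same KEYWORDS lookup to the start of a plain string. Unlike tokenize_keyword, it also tests the first character against the class, which makes no difference for the inputs shown.

```ruby
# Copied from the constant listing above.
KEYWORDS = { 'and' => :and, 'or' => :or, 'not' => :not, 'set?' => :notnull, 'has' => :notnull,
             'null?' => :null, 'before' => :lt, 'after' => :gt, 'at' => :eq }

# Hypothetical helper: reads one keyword from the start of str, stopping at the
# first character outside the class used by tokenize_keyword.
def read_keyword(str)
  keyword = String.new
  str.each_char do |char|
    break unless /[^=~<>\s\&\|\)\(,]/ =~ char
    keyword << char
  end
  KEYWORDS.fetch(keyword.downcase, keyword)
end

p read_keyword('Before 2020')  # :lt, reserved keyword matched after downcasing
p read_keyword('age>=21')      # "age", scanning stops at ">"
p read_keyword('first-name=x') # "first-name", "-" is not in the stop set
```

Because `-` is absent from the stop set, a hyphen inside a word stays part of the keyword. It only acts as an operator when it is the first character that each_token sees.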

- (Object) tokenize_operator {|OPERATORS[operator]| ... }

Tokenizes an operator that occurs in the OPERATORS hash. The .to_s on [peek|next]_char prevents a Ruby bug when nil values are returned from strings that have a forced encoding. See github.com/wvanbergen/scoped_search/issues/33 for details.

Yields:

  • (OPERATORS[operator])



# File 'lib/scoped_search/query_language/tokenizer.rb', line 55

def tokenize_operator(&block)
  operator = current_char
  operator << next_char.to_s if OPERATORS.has_key?(operator + peek_char.to_s)
  yield(OPERATORS[operator])
end
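
The one-character lookahead means a two-character operator wins whenever the pair is a key of OPERATORS. A minimal sketch of that rule follows, using a hypothetical read_operator helper that works on a plain string instead of the compiler's cursor.

```ruby
# Copied from the constant listing above.
OPERATORS = { '&' => :and, '|' => :or, '&&' => :and, '||' => :or, '-' => :not, '!' => :not,
              '~' => :like, '!~' => :unlike, '=' => :eq, '==' => :eq, '!=' => :ne, '<>' => :ne,
              '>' => :gt, '<' => :lt, '>=' => :gte, '<=' => :lte, '^' => :in, '!^' => :notin }

# Hypothetical helper: reads the operator at the start of str, preferring the
# two-character form when it exists in OPERATORS.
def read_operator(str)
  operator = str[0, 1]
  lookahead = str[1, 1].to_s # to_s guards against nil, as in tokenize_operator
  operator << lookahead if OPERATORS.has_key?(operator + lookahead)
  OPERATORS[operator]
end

p read_operator('>=21') # :gte
p read_operator('> 21') # :gt
p read_operator('!~x')  # :unlike
p read_operator('=>')   # :eq, because "=>" is not an operator
```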

- (Object) tokenize_quoted_keyword {|keyword| ... }

Tokenizes a keyword that is quoted using double quotes. Allows escaping of double quote characters by backslashes.

Yields:

  • (keyword)


# File 'lib/scoped_search/query_language/tokenizer.rb', line 71

def tokenize_quoted_keyword(&block)
  keyword = ""
  until next_char.nil? || current_char == '"'
    keyword << (current_char == "\\" ? next_char : current_char)
  end
  yield(keyword)
end
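
The escaping rule can be sketched on its own. The read_quoted helper below is hypothetical: it takes the text that follows the opening double quote and mirrors the loop above with an Enumerator in place of the compiler's cursor. A backslash causes the following character to be taken literally, and a missing closing quote simply ends the keyword at the end of the input.

```ruby
# Hypothetical helper: body is the input that follows the opening double quote.
def read_quoted(body)
  chars = body.each_char # Enumerator; next raises StopIteration at the end
  keyword = String.new
  loop do                # loop rescues StopIteration and exits quietly
    char = chars.next
    break if char == '"'
    # After a backslash, take the following character literally.
    keyword << (char == "\\" ? chars.next : char)
  end
  keyword
end

puts read_quoted('John \"JD\" Doe" and more') # John "JD" Doe
puts read_quoted('no closing quote')          # no closing quote
```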