Module: ScopedSearch::QueryLanguage::Tokenizer
- Included in:
- Compiler
- Defined in:
- lib/scoped_search/query_language/tokenizer.rb
Overview
The Tokenizer module adds methods to the query language compiler that transform a query string into a stream of tokens, which are easier to parse than the raw string.
Constant Summary
- KEYWORDS =
All keywords that the language supports
{ 'and' => :and, 'or' => :or, 'not' => :not, 'set?' => :notnull, 'has' => :notnull, 'null?' => :null, 'before' => :lt, 'after' => :gt, 'at' => :eq }
- OPERATORS =
Every operator the language supports.
{ '&' => :and, '|' => :or, '&&' => :and, '||' => :or, '-'=> :not, '!' => :not, '~' => :like, '!~' => :unlike, '=' => :eq, '==' => :eq, '!=' => :ne, '<>' => :ne, '>' => :gt, '<' => :lt, '>=' => :gte, '<=' => :lte, '^' => :in, '!^' => :notin }
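Both tables can be read as plain lookups from query text to a token symbol. The following is a minimal sketch, not code from the library: the hashes are abbreviated copies of the ones above, and LookupSketch and symbol_for are hypothetical names introduced only for illustration. It shows that several spellings resolve to the same symbol and that keywords are matched after downcasing, as tokenize_keyword does.

```ruby
module LookupSketch
  # Abbreviated copies of the tables above; not the full hashes.
  KEYWORDS  = { 'and' => :and, 'or' => :or, 'not' => :not,
                'before' => :lt, 'after' => :gt }.freeze
  OPERATORS = { '&' => :and, '&&' => :and, '|' => :or, '||' => :or,
                '-' => :not, '!' => :not, '<' => :lt, '>' => :gt }.freeze

  # Hypothetical helper: resolve a piece of query text to its token symbol,
  # or return the text unchanged when it is not reserved.
  def self.symbol_for(text)
    OPERATORS[text] || KEYWORDS[text.downcase] || text
  end
end

p LookupSketch.symbol_for('AND')     # => :and
p LookupSketch.symbol_for('&&')      # => :and
p LookupSketch.symbol_for('before')  # => :lt
p LookupSketch.symbol_for('<')       # => :lt
p LookupSketch.symbol_for('ruby')    # => "ruby"
```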
Instance Method Summary
- (Object) current_char
Returns the current character of the string.
- (Object) each_token(&block)
(also: #each)
Tokenizes the string by iterating over the characters.
- (Object) next_char
Returns the next character of the string, and moves the position pointer one step forward.
- (Object) peek_char(amount = 1)
Returns a following character of the string (by default, the next character), without updating the position pointer.
- (Object) tokenize
Tokenizes the string and returns the result as an array of tokens.
- (Object) tokenize_keyword(&block)
Tokenizes a keyword, and converts it to a Symbol if it is recognized as a reserved language keyword (a key of the KEYWORDS hash).
- (Object) tokenize_operator {|OPERATORS[operator]| ... }
Tokenizes an operator that occurs in the OPERATORS hash. The .to_s on [peek|next]_char works around a Ruby bug in which nil values are returned from strings that have a forced encoding.
- (Object) tokenize_quoted_keyword {|keyword| ... }
Tokenizes a keyword that is quoted using double quotes.
Instance Method Details
- (Object) current_char
Returns the current character of the string.

    # File 'lib/scoped_search/query_language/tokenizer.rb', line 19
    def current_char
      @current_char
    end
- (Object) each_token(&block) Also known as: each
Tokenizes the string by iterating over the characters.

    # File 'lib/scoped_search/query_language/tokenizer.rb', line 37
    def each_token(&block)
      while next_char
        case current_char
        when /^\s?$/; # ignore
        when '(';  yield(:lparen)
        when ')';  yield(:rparen)
        when ',';  yield(:comma)
        when /\&|\||=|<|>|\^|!|~|-/;  tokenize_operator(&block)
        when '"';  tokenize_quoted_keyword(&block)
        else;      tokenize_keyword(&block)
        end
      end
    end
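To see the dispatch in each_token end to end, here is a standalone sketch modelled on the methods documented on this page. It is an illustration under stated assumptions, not the library's code: SketchTokenizer, advance, peek, operator and keyword are hypothetical names, the tables are abbreviated, and quoted keywords are left out for brevity.

```ruby
# Standalone sketch; the real module is mixed into Compiler instead.
class SketchTokenizer
  include Enumerable

  # Abbreviated tables; the full hashes are listed in the Constant Summary.
  KEYWORDS  = { 'and' => :and, 'or' => :or, 'not' => :not }.freeze
  OPERATORS = { '=' => :eq, '!=' => :ne, '>' => :gt, '>=' => :gte,
                '<' => :lt, '<=' => :lte, '-' => :not, '!' => :not }.freeze

  def initialize(str)
    @str = str
    @pos = -1
  end

  # Walks the string one character at a time and yields one token per match.
  def each
    while (char = advance)
      case char
      when /^\s?$/ then next # whitespace, or "" at the end of the string
      when '(' then yield :lparen
      when ')' then yield :rparen
      when ',' then yield :comma
      when /\&|\||=|<|>|\^|!|~|-/ then yield operator(char)
      else yield keyword(char)
      end
    end
  end

  private

  # Moves the cursor forward and returns the character at the new position.
  def advance
    @pos += 1
    @str[@pos, 1]
  end

  # Returns the following character without moving the cursor.
  def peek
    @str[@pos + 1, 1]
  end

  # Prefers a two-character operator when the lookahead completes one.
  def operator(char)
    char += advance.to_s if OPERATORS.key?(char + peek.to_s)
    OPERATORS[char]
  end

  # Reads up to the next delimiter; reserved words become symbols.
  def keyword(char)
    word = char.dup
    word << advance while /[^=~<>\s\&\|\)\(,]/ =~ peek
    KEYWORDS.fetch(word.downcase, word)
  end
end

p SketchTokenizer.new('(name = ruby) AND -age >= 21').to_a
# => [:lparen, "name", :eq, "ruby", :rparen, :and, :not, "age", :gte, "21"]
```

Reserved words and operators come out as symbols, while everything else stays a String, so a parser can tell the two apart by class.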
- (Object) next_char
Returns the next character of the string, and moves the position pointer one step forward.

    # File 'lib/scoped_search/query_language/tokenizer.rb', line 31
    def next_char
      @current_char_pos += 1
      @current_char = @str[@current_char_pos, 1]
    end
- (Object) peek_char(amount = 1)
Returns a following character of the string (by default, the next character), without updating the position pointer.

    # File 'lib/scoped_search/query_language/tokenizer.rb', line 25
    def peek_char(amount = 1)
      @str[@current_char_pos + amount, 1]
    end
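The difference between next_char and peek_char, and what both return near the end of the string, can be sketched with a small cursor class. CursorSketch is a hypothetical name used only here; the two methods mirror the bodies shown above. Note that String#[start, length] returns an empty string when start equals the string length and nil only beyond it, which may be why the whitespace pattern in each_token also accepts an empty string.

```ruby
# Hypothetical stand-in for the cursor state kept by the tokenizer.
class CursorSketch
  def initialize(str)
    @str = str
    @pos = -1 # same starting position that tokenize sets
  end

  # Advances the cursor, then returns the character under it.
  def next_char
    @pos += 1
    @str[@pos, 1]
  end

  # Looks ahead without changing the cursor.
  def peek_char(amount = 1)
    @str[@pos + amount, 1]
  end
end

cursor = CursorSketch.new('ab')
p cursor.peek_char     # => "a"  (cursor has not moved yet)
p cursor.next_char     # => "a"
p cursor.peek_char     # => "b"
p cursor.peek_char(2)  # => ""   (index 2 equals the string length)
p cursor.peek_char(3)  # => nil  (index 3 is past the end)
```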
- (Object) tokenize
Tokenizes the string and returns the result as an array of tokens.

    # File 'lib/scoped_search/query_language/tokenizer.rb', line 13
    def tokenize
      @current_char_pos = -1
      to_a
    end
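tokenize calls to_a, which this module does not define. Because each_token is aliased as each, the including class presumably mixes in Enumerable, whose to_a collects everything that each yields; that is an assumption, as the including class is not shown on this page. A minimal sketch of the pattern, with the hypothetical names WordStream, each_word and words:

```ruby
# Hypothetical class showing how an aliased each plus Enumerable provides to_a.
class WordStream
  include Enumerable

  def initialize(str)
    @str = str
  end

  # Yields one word at a time, much as each_token yields one token at a time.
  def each_word
    @str.split.each { |word| yield word }
  end
  alias each each_word

  # Counterpart of tokenize: collect everything each yields into an array.
  def words
    to_a
  end
end

p WordStream.new('a b c').words  # => ["a", "b", "c"]
```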
- (Object) tokenize_keyword(&block)
Tokenizes a keyword, and converts it to a Symbol if it is recognized as a reserved language keyword (a key of the KEYWORDS hash).

    # File 'lib/scoped_search/query_language/tokenizer.rb', line 63
    def tokenize_keyword(&block)
      keyword = current_char
      keyword << next_char while /[^=~<>\s\&\|\)\(,]/ =~ peek_char
      KEYWORDS.has_key?(keyword.downcase) ? yield(KEYWORDS[keyword.downcase]) : yield(keyword)
    end
- (Object) tokenize_operator {|OPERATORS[operator]| ... }
Tokenizes an operator that occurs in the OPERATORS hash. The .to_s on [peek|next]_char works around a Ruby bug in which nil values are returned from strings that have a forced encoding. See github.com/wvanbergen/scoped_search/issues/33 for details.

    # File 'lib/scoped_search/query_language/tokenizer.rb', line 55
    def tokenize_operator(&block)
      operator = current_char
      operator << next_char.to_s if OPERATORS.has_key?(operator + peek_char.to_s)
      yield(OPERATORS[operator])
    end
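The one-character lookahead means a two-character operator wins whenever the current character plus the following one is itself a key of OPERATORS. A minimal sketch of that rule, using an abbreviated table and a hypothetical helper read_operator that works on a plain string and index rather than on the tokenizer's cursor:

```ruby
module OperatorSketch
  # Abbreviated copy of the OPERATORS table.
  OPERATORS = { '!' => :not, '!=' => :ne, '!~' => :unlike,
                '>' => :gt, '>=' => :gte }.freeze

  # Hypothetical helper: read one operator starting at index pos and
  # return [symbol, number of characters consumed].
  def self.read_operator(str, pos)
    operator  = str[pos, 1]
    lookahead = str[pos + 1, 1].to_s # to_s guards against nil past the end
    operator += lookahead if OPERATORS.key?(operator + lookahead)
    [OPERATORS[operator], operator.length]
  end
end

p OperatorSketch.read_operator('>= 5', 0)  # => [:gte, 2]
p OperatorSketch.read_operator('> 5', 0)   # => [:gt, 1]
p OperatorSketch.read_operator('!~x', 0)   # => [:unlike, 2]
p OperatorSketch.read_operator('!', 0)     # => [:not, 1]
```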
- (Object) tokenize_quoted_keyword {|keyword| ... }
Tokenizes a keyword that is quoted using double quotes. Allows escaping of double quote characters by backslashes.

    # File 'lib/scoped_search/query_language/tokenizer.rb', line 71
    def tokenize_quoted_keyword(&block)
      keyword = ""
      until next_char.nil? || current_char == '"'
        keyword << (current_char == "\\" ? next_char : current_char)
      end
      yield(keyword)
    end
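The escaping rule can be tried in isolation with a small helper. This is a sketch under stated assumptions: read_quoted is a hypothetical function that takes a plain string whose opening quote sits at index pos, rather than using the tokenizer's cursor. A backslash causes the following character to be copied literally, and reading stops at the first unescaped double quote or at the end of the input.

```ruby
# Hypothetical helper: read a double-quoted keyword whose opening quote
# is at index pos, and return the unescaped text.
def read_quoted(str, pos = 0)
  keyword = +''
  loop do
    pos += 1
    char = str[pos, 1]
    break if char.nil? || char == '"'

    if char == '\\'
      # Skip the backslash and take the next character as-is.
      pos += 1
      char = str[pos, 1].to_s
    end
    keyword << char
  end
  keyword
end

p read_quoted('"hello world"')       # => "hello world"
p read_quoted('"say \"hi\" twice"')  # => "say \"hi\" twice"
p read_quoted('"unterminated')       # => "unterminated"
```

The second result contains real double quotes; p prints them escaped because it shows the string in its inspect form.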