Class: RLTK::Lexer::LexerCore

Inherits:
Object
Defined in:
lib/rltk/lexer.rb

Overview

The LexerCore class provides most of the functionality of the Lexer class. A LexerCore is instantiated for each subclass of Lexer, thereby allowing multiple lexers to be defined inside a single Ruby program.
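The one-core-per-subclass arrangement described above can be sketched with Ruby's `inherited` hook. This is only an illustration: `SketchCore` and `SketchLexer` are hypothetical stand-ins, and the source shown on this page does not confirm that Lexer wires up its cores this way.

```ruby
# Hypothetical stand-ins; these are not the RLTK classes themselves.
class SketchCore
  attr_reader :rules

  def initialize
    @rules = []
  end
end

class SketchLexer
  class << self
    # Each subclass reads its own class-level @core.
    attr_reader :core

    # Ruby calls this hook every time a subclass is created.
    def inherited(subclass)
      super
      subclass.instance_variable_set(:@core, SketchCore.new)
    end

    # Rules land on the core that belongs to the receiving subclass.
    def rule(pattern)
      core.rules << pattern
    end
  end
end

class NumberLexer < SketchLexer
  rule(/\d+/)
end

class WordLexer < SketchLexer
  rule(/[a-z]+/)
  rule(/\s+/)
end

p NumberLexer.core.rules.length  # => 1
p WordLexer.core.rules.length    # => 2
```

Because every subclass holds a separate core object, rules defined in one lexer never leak into another.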

Constructor Details

- (LexerCore) initialize

Instantiate a new LexerCore object.



# File 'lib/rltk/lexer.rb', line 106

def initialize
	@match_type	= :longest
	@rules		= Hash.new {|h,k| h[k] = Array.new}
	@start_state	= :default
end
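The `@rules` table relies on the block form of `Hash.new`, which stores a fresh array under any key the first time that key is accessed. A minimal sketch of that behavior, using a hypothetical helper `new_rule_table` and placeholder symbols in place of real rule objects:

```ruby
# Hypothetical helper that builds a table the same way initialize does.
def new_rule_table
  Hash.new { |hash, key| hash[key] = Array.new }
end

table = new_rule_table

table[:default] << :rule_a  # first access creates the :default entry
table[:comment] << :rule_b
table[:default] << :rule_c  # later accesses reuse the stored array
table[:unused]              # a plain read also creates (and stores) the key

p table.keys       # => [:default, :comment, :unused]
p table[:default]  # => [:rule_a, :rule_c]
```

One consequence is that merely looking up a state, as lex does with `@rules[env.state]`, adds that state to the table with an empty rule list.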

Instance Attribute Details

- (Object) start_state (readonly)

Returns the starting state of the lexer. This is :default unless it has been changed with start.



# File 'lib/rltk/lexer.rb', line 103

def start_state
  @start_state
end

Instance Method Details

- (Object) lex(string, env, file_name = nil)

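Lexes string, evaluating each matching rule's action in the context of env. Returns the array of tokens generated by the lexer, with a token of type EOS (End of Stream) appended to the end. Raises a LexingError if none of the rules for the current state match the remaining input.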



# File 'lib/rltk/lexer.rb', line 115

def lex(string, env, file_name = nil)
	# Offset from start of stream.
	stream_offset = 0

	# Offset from the start of the line.
	line_offset = 0
	line_number = 1
	
	# Empty token list.
	tokens = Array.new
	
	# The scanner.
	scanner = StringScanner.new(string)
	
	# Start scanning the input string.
	until scanner.eos?
		match = nil
		
		# If the match_type is set to :longest all of the
		# rules for the current state need to be scanned
		# and the longest match returned.  If the
		# match_type is :first, we only need to scan until
		# we find a match.
		@rules[env.state].each do |rule|
			if (rule.flags - env.flags).empty?
				if txt = scanner.check(rule.pattern)
					if not match or match.first.length < txt.length
						match = [txt, rule]
						
						break if @match_type == :first
					end
				end
			end
		end
		
		if match
			rule = match.last
			
			txt = scanner.scan(rule.pattern)
			type, value = env.instance_exec(txt, &rule.action)
			
			if type
				pos = StreamPosition.new(stream_offset, line_number, line_offset, txt.length, file_name)
				tokens << Token.new(type, value, pos) 
			end
			
			# Advance our stat counters.
			stream_offset += txt.length
			
			if (newlines = txt.count("\n")) > 0
				line_number += newlines
				line_offset  = 0
			else
				line_offset += txt.length()
			end
		else
			error = LexingError.new(stream_offset, line_number, line_offset, scanner.post_match)
			raise(error, 'Unable to match string with any of the given rules')
		end
	end
	
	return tokens << Token.new(:EOS)
end
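The difference between the two match types can be seen in isolation. The sketch below copies only the rule-selection loop from lex into a hypothetical helper, `pick_match`, and uses a plain hash of named patterns instead of Rule objects; flags, actions, and position tracking are left out.

```ruby
require 'strscan'

# Hypothetical helper mirroring the selection loop in lex.
# patterns is a hash of name => Regexp, tried in insertion order.
def pick_match(scanner, patterns, match_type)
  match = nil

  patterns.each do |name, pattern|
    # check matches at the scan pointer without advancing it.
    txt = scanner.check(pattern)
    next unless txt

    if match.nil? || match.first.length < txt.length
      match = [txt, name]

      # With :first, the earliest matching rule wins outright.
      break if match_type == :first
    end
  end

  match
end

patterns = { keyword: /if/, ident: /[a-z]+/ }

p pick_match(StringScanner.new('iffy'), patterns, :longest)  # => ["iffy", :ident]
p pick_match(StringScanner.new('iffy'), patterns, :first)    # => ["if", :keyword]
```

With :longest, a later rule only replaces an earlier one when its match is strictly longer, so ties go to the rule defined first.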

- (Object) lex_file(file_name, env)

A wrapper function that calls LexerCore#lex on the contents of a file.



# File 'lib/rltk/lexer.rb', line 181

def lex_file(file_name, env)
	File.open(file_name, 'r') { |f| lex(f.read, env, file_name) }
end

- (Object) match_first

Used to tell a lexer to use the first match found instead of the longest match found.



# File 'lib/rltk/lexer.rb', line 187

def match_first
	@match_type = :first
end

- (Object) rule(pattern, state = :default, flags = [], &action) Also known as: r

Defines a new lexing rule. The first argument is the regular expression used to match substrings of the input. The second argument is the state to which the rule belongs; passing :ALL adds the rule to every state that already has at least one rule. The third argument lists the flags that must all be set in the environment for the rule to be considered. The last argument is a block that returns a type and value to be used in constructing a Token. If no block is given, the matched substring is discarded and lexing continues.



# File 'lib/rltk/lexer.rb', line 200

def rule(pattern, state = :default, flags = [], &action)
	# If no action is given we will set it to an empty
	# action.
	action ||= Proc.new() {}
	
	r = Rule.new(pattern, action, state, flags)
	
	if state == :ALL
		@rules.each_key { |k| @rules[k] << r }
	else
		@rules[state] << r
	end
end
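Two details of rule are easy to miss: :ALL only reaches states that exist in the table at the moment the rule is defined, and a flagged rule is considered only when every one of its flags is active. The sketch below reproduces both with hypothetical stand-ins (`SketchRule`, `SketchRuleTable`); the flag test is the same array-difference check that lex applies.

```ruby
# Hypothetical stand-in for the Rule objects created by rule.
SketchRule = Struct.new(:pattern, :state, :flags)

class SketchRuleTable
  attr_reader :rules

  def initialize
    @rules = Hash.new { |h, k| h[k] = [] }
  end

  # Same registration logic as rule, minus the action block.
  def rule(pattern, state = :default, flags = [])
    r = SketchRule.new(pattern, state, flags)

    if state == :ALL
      @rules.each_key { |k| @rules[k] << r }
    else
      @rules[state] << r
    end

    r
  end

  # A rule is usable when none of its flags is missing from active_flags.
  def usable?(rule, active_flags)
    (rule.flags - active_flags).empty?
  end
end

table = SketchRuleTable.new
table.rule(/\d+/)             # stored under :default
table.rule(/\*\//, :comment)  # stored under :comment
table.rule(/\s+/, :ALL)       # appended to :default and :comment
table.rule(/"/, :string)      # :string is created after the :ALL rule

p table.rules[:default].length  # => 2
p table.rules[:string].length   # => 1

flagged = table.rule(/x/, :default, [:in_block])
p table.usable?(flagged, [])                  # => false
p table.usable?(flagged, [:in_block, :other]) # => true
```

Under this logic, rules meant for every state should be defined after all states have received at least one rule of their own.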

- (Object) start(state)

Changes the starting state of the lexer.



# File 'lib/rltk/lexer.rb', line 213

def start(state)
	@start_state = state
end