Class: RCite::TextProcessor
- Inherits: Object
  - Object
    - RCite::TextProcessor
- Defined in: lib/rcite/text_processor.rb
Overview
Processes a file, replacing certain preprocessor commands with citations or bibliography entries.
The TextProcessor extracts RCite commands from a text by matching it against
#preprocessing_regexp and processes each command according to the
rules described in #process_command. Its main method is #process_text.
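The extraction step can be illustrated with the two default patterns, copied verbatim from DEFAULT_PREPROCESSING_REGEXP and DEFAULT_BIBLIOGRAPHY_REGEXP. This is a standalone sketch that does not load RCite; the sample text and the BibTeX key knuth84 are made up.

```ruby
# Verbatim copies of DEFAULT_PREPROCESSING_REGEXP and DEFAULT_BIBLIOGRAPHY_REGEXP.
PREPROCESSING = /%%\s*(?<command>(cite|bib)\s+[^%]*)%%/m
BIBLIOGRAPHY  = /%%\s*(?<command>bibliography[^%]*)%%/m

# Made-up input with one citation directive and one bibliography directive.
TEXT = "TeX is stable %%cite knuth84 42%%.\n\n%%bibliography%%"

# The named group "command" holds the directive without the %% delimiters.
puts TEXT.match(PREPROCESSING)[:command]    # prints: cite knuth84 42

# The bibliography directive is matched only by its own pattern.
p TEXT.match?(BIBLIOGRAPHY)                 # => true
p "%%bibliography%%".match?(PREPROCESSING)  # => false
```

Note that "bib" in "bibliography" does not trigger the first pattern, because `(cite|bib)` must be followed by whitespace.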
Constant Summary
- DEFAULT_PREPROCESSING_REGEXP = /%%\s*(?<command>(cite|bib)\s+[^%]*)%%/m.freeze
  Default value for #preprocessing_regexp.
- DEFAULT_BIBLIOGRAPHY_REGEXP = /%%\s*(?<command>bibliography[^%]*)%%/m.freeze
  Default value for #bibliography_regexp.
- COMMAND_SYNTAX_REGEXP = self.command_syntax_regexp.freeze
  Describes the syntax of a command, i.e. the string captured by the named group command of #preprocessing_regexp. A command contains the following information:
  - which command to call (cite or bib). Required for the first command in a preprocessing directive, optional for any commands that follow.
  - the BibTeX key of the text to be cited/bib'd
  - the page that should be cited (optional)
  - additional fields as a YAML inline hash (optional)
  In other words, the syntax is:

      command   ::== cite|bib key [page] [hash[, hash]*]
      key       ::== anything_but_whitespace+
      page      ::== anything_but_whitespace+|"anything_but_quote+"
      hash      ::== hash_key: hash_val
      hash_key  ::== hash_char+
      hash_char ::== letter|number|_|-
      hash_val  ::== anything_but_comma+|"anything_but_quote+"

  Spaces stand for any whitespace. Inside strings enclosed in quotation marks, a literal quotation mark can be escaped as \".
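The grammar can be checked against the regexp itself. The sketch below rebuilds the pattern from the source of TextProcessor.command_syntax_regexp (shown under Class Method Details) and parses a few made-up commands; the parse helper is hypothetical and only returns the four named captures in order.

```ruby
# Rebuilt from the source of TextProcessor.command_syntax_regexp.
def command_syntax_regexp
  cmd        = /(?<command>bib|cite)/
  key        = /(?<key>[^\s]+)/
  lit_string = /(?:"(?:[^"]|(?:(?<=\\)"))+")/
  page       = /(?<page>(?:[^\s]+)|#{lit_string})/
  hash_key   = /[a-zA-Z0-9_\-]+/
  hash_val   = /(?:(?:[^,]+)|#{lit_string})/
  hash_elem  = /#{hash_key}:\s*#{hash_val}/
  hash       = /(?<fields>#{hash_elem}(?:\s*,\s*(?:#{hash_elem}))*)/
  /^\s* (?:#{cmd}\s+)? #{key} (?:\s+#{page})? (?:\s+#{hash})? \s*$/mx
end

SYNTAX = command_syntax_regexp

# Hypothetical helper: [command, key, page, fields], or nil on a syntax error.
def parse(command)
  m = command.match(SYNTAX)
  m && %i[command key page fields].map { |name| m[name] }
end

p parse("cite knuth84 42")          # => ["cite", "knuth84", "42", nil]
p parse('bib knuth84 "pp. 12-14"')  # => ["bib", "knuth84", "\"pp. 12-14\"", nil]
p parse("knuth84 note: see also")   # => [nil, "knuth84", nil, "note: see also"]
p parse("   ")                      # => nil
```

A quoted page keeps its quotation marks at this stage; #process_command strips them afterwards. The third example has no command, which is only valid for subcommands after the first.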
Instance Attribute Summary
- (Regexp) bibliography_regexp
  Regular expression that describes the special bibliography preprocessing command.
- (Processor) command_processor
  The processor that creates the actual citations or bibliography entries.
- (Regexp) preprocessing_regexp
  Regular expression that describes a preprocessing command in the text.
Class Method Summary
+ (Regexp) command_syntax_regexp
  Generates the regular expression that is used to parse commands.
Instance Method Summary
- (TextProcessor) initialize (constructor)
  Creates a new TextProcessor.
- (String) process_command(command)
  Replaces a preprocessing command with the corresponding citation or bibliography entry.
- (Object) process_file(file)
  Reads the contents of file and processes them, returning the processed text.
- (String) process_text(text)
  Replaces every occurrence of #preprocessing_regexp in text with the output of #process_command.
Constructor Details
- (TextProcessor) initialize
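Creates a new RCite::TextProcessor. #command_processor is initialised with a new Processor, #preprocessing_regexp and #bibliography_regexp with DEFAULT_PREPROCESSING_REGEXP and DEFAULT_BIBLIOGRAPHY_REGEXP, and the list of cited texts with an empty array.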
    # File 'lib/rcite/text_processor.rb', line 116
    def initialize
      @command_processor = Processor.new
      @preprocessing_regexp = DEFAULT_PREPROCESSING_REGEXP
      @bibliography_regexp = DEFAULT_BIBLIOGRAPHY_REGEXP
      @cited_texts = []
    end
Instance Attribute Details
- (Regexp) bibliography_regexp
Regular expression that describes the special bibliography preprocessing
command. This regexp should not attempt to determine whether the command
syntax is correct.
    # File 'lib/rcite/text_processor.rb', line 39
    def bibliography_regexp
      @bibliography_regexp
    end
- (Processor) command_processor
The processor that creates the actual citations or bibliography entries. RCite::TextProcessor basically just parses the preprocessing commands and then calls Processor#cite or Processor#bib with the extracted parameters.
    # File 'lib/rcite/text_processor.rb', line 21
    def command_processor
      @command_processor
    end
- (Regexp) preprocessing_regexp
Regular expression that describes a preprocessing command in the text. This regexp should not attempt to determine whether the command syntax is correct. It must not match text that is matched by #bibliography_regexp.
Must have a group named command that contains a command which
#process_command can use.
    # File 'lib/rcite/text_processor.rb', line 32
    def preprocessing_regexp
      @preprocessing_regexp
    end
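The two requirements above (a named group command, and no overlap with #bibliography_regexp) can be checked mechanically for any candidate pattern. The [[cite key page]] syntax and the name CUSTOM_PREPROCESSING below are hypothetical, and whether #preprocessing_regexp can be reassigned is not shown on this page; the sketch only tests the pattern itself.

```ruby
# Hypothetical alternative syntax: [[cite key page]] instead of %%cite key page%%.
CUSTOM_PREPROCESSING = /\[\[\s*(?<command>(?:cite|bib)\s+[^\]]*)\]\]/

# Verbatim copy of DEFAULT_BIBLIOGRAPHY_REGEXP.
BIBLIOGRAPHY = /%%\s*(?<command>bibliography[^%]*)%%/m

# Requirement 1: the pattern exposes a named group called "command".
raise "no command group" unless CUSTOM_PREPROCESSING.names.include?("command")

# Requirement 2: it must not match what the bibliography pattern matches.
directive = "%%bibliography%%"
unless directive.match?(BIBLIOGRAPHY) && !directive.match?(CUSTOM_PREPROCESSING)
  raise "overlaps the bibliography pattern"
end

puts "See [[cite knuth84 42]].".match(CUSTOM_PREPROCESSING)[:command]
# prints: cite knuth84 42
```

Like the default pattern, this one does no syntax checking of its own; it only delimits the command, which cannot contain "]" in this variant.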
Class Method Details
+ (Regexp) command_syntax_regexp
Generates the regular expression that is used to parse commands. This method always returns the exact same result and therefore behaves much like a constant. It is only there to make the regexp more readable and its various parts more transparent.
For the command syntax, see COMMAND_SYNTAX_REGEXP.
    # File 'lib/rcite/text_processor.rb', line 62
    def self.command_syntax_regexp
      cmd = /(?<command>bib|cite)/
      key = /(?<key>[^\s]+)/
      lit_string = /(?:"(?:[^"]|(?:(?<=\\)"))+")/ # matches '"s"' and '"s\""'
      page = /(?<page>(?:[^\s]+)|#{lit_string})/
      hash_key = /[a-zA-Z0-9_\-]+/
      hash_val = /(?:(?:[^,]+)|#{lit_string})/
      hash_elem = /#{hash_key}:\s*#{hash_val}/
      hash = /(?<fields>#{hash_elem}(?:\s*,\s*(?:#{hash_elem}))*)/
      result = /^\s*
        (?:#{cmd}\s+)?
        #{key}
        (?:\s+#{page})?
        (?:\s+#{hash})?
        \s*$/mx
    end
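The comment on lit_string says it matches '"s"' and '"s\""'. The sketch below checks those two cases plus two edge cases, using a verbatim copy of the fragment anchored so the whole string must be one quoted literal (the WHOLE anchor is added for illustration only).

```ruby
# Verbatim copy of the lit_string fragment of command_syntax_regexp.
LIT_STRING = /(?:"(?:[^"]|(?:(?<=\\)"))+")/
# Anchored variant: the entire string must be a single quoted literal.
WHOLE = /\A#{LIT_STRING}\z/

p '"s"'.match?(WHOLE)    # => true
p '"s\""'.match?(WHOLE)  # => true, the inner \" is accepted as an escaped quote
p '""'.match?(WHOLE)     # => false, at least one character is required
p '"s\"'.match?(WHOLE)   # => true, backtracking lets \" act as the closing quote
```

Because the escape is recognised with a lookbehind rather than consumed as a unit, the last case shows that a trailing \" can also close the literal.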
Instance Method Details
- (String) process_command(command)
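Replaces a preprocessing command with the corresponding citation or
bibliography entry. A command may consist of several subcommands separated
by the pipe symbol (|). Each subcommand is parsed with COMMAND_SYNTAX_REGEXP,
and the corresponding command of the #command_processor is executed with the
extracted parameters. Only the first subcommand has to name the command (cite
or bib); the following ones reuse it. The individual results are joined with
the style's _between_cites/_between_bibs value and wrapped in its
_around_all_cites/_around_all_bibs strings.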
If any errors occur while parsing the command or generating the
citation/bibliography entry, this method returns a string indicating that
there were errors.
    # File 'lib/rcite/text_processor.rb', line 172
    def process_command(command)
      style = @command_processor.style
      cmd = nil
      result = []

      # subcommands are separated by the pipe symbol
      command.split("|").each do |subcommand|
        m = subcommand.match(COMMAND_SYNTAX_REGEXP)
        return "%%SYNTAX ERROR%%" unless m

        cmd ||= m[:command] # the command is parsed only for the first subcmd
        return '%%SYNTAX ERROR: no command specified%%' unless cmd
        cmd = cmd.to_sym

        key, page, fields = m[:key], m[:page], m[:fields]
        page.gsub!(/^"(.*?)"$/m, '\1') if page

        result << generate_cite_bib(cmd, key, page, fields)
        @cited_texts << @command_processor.bibliography[key] if cmd == :cite
      end # each subcommand

      a_all = style.public_send("_around_all_#{cmd}s")
      between = style.public_send("_between_#{cmd}s").to_s
      a_all[0].to_s + result.join(between) + a_all[1].to_s
    end
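The control flow of process_command can be traced with a small standalone sketch. StubStyle, the simplified SUBCOMMAND pattern (which, unlike COMMAND_SYNTAX_REGEXP, ignores quoted pages and fields) and the "key, page" output format are hypothetical stand-ins for the style, the real regexp and generate_cite_bib. Only the split on |, the reuse of the first command and the _around_all_/_between_ lookups follow the source above.

```ruby
# Hypothetical style object; method names mirror the lookups in process_command.
class StubStyle
  def _around_all_cites
    ["(", ")"]
  end

  def _between_cites
    "; "
  end
end

# Simplified subcommand pattern: optional command, key, optional page.
SUBCOMMAND = /\A\s*(?:(?<command>bib|cite)\s+)?(?<key>\S+)(?:\s+(?<page>\S+))?\s*\z/

def process_command(command, style)
  cmd = nil
  result = []
  command.split("|").each do |subcommand|
    m = subcommand.match(SUBCOMMAND)
    return "%%SYNTAX ERROR%%" unless m
    cmd ||= m[:command] # only the first subcommand sets the command
    return "%%SYNTAX ERROR: no command specified%%" unless cmd
    cmd = cmd.to_sym
    # Hypothetical stand-in for generate_cite_bib.
    result << [m[:key], m[:page]].compact.join(", ")
  end
  around = style.public_send("_around_all_#{cmd}s")
  between = style.public_send("_between_#{cmd}s").to_s
  around[0].to_s + result.join(between) + around[1].to_s
end

puts process_command("cite knuth84 42 | lamport94", StubStyle.new)
# prints: (knuth84, 42; lamport94)
```

Because the command is remembered across iterations, the second subcommand is rendered as a citation even though it names no command.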
- (Object) process_file(file)
Reads the contents of file and processes them, returning the processed
text. Convenience wrapper for #process_text.
    # File 'lib/rcite/text_processor.rb', line 131
    def process_file(file)
      process_text(File.read(file))
    end
- (String) process_text(text)
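Replaces every occurrence of #preprocessing_regexp in text with the
output of #process_command. Only the group named command from
#preprocessing_regexp is passed to #process_command as a parameter.
Afterwards, every occurrence of #bibliography_regexp is replaced with the
generated bibliography.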
    # File 'lib/rcite/text_processor.rb', line 146
    def process_text(text)
      @cited_texts = []
      result = text.dup
      result.gsub!(@preprocessing_regexp) do |m|
        process_command($~[:command])
      end
      result.gsub!(@bibliography_regexp) do |m|
        generate_bibliography
      end
      @cited_texts = nil
      result
    end
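The two passes of process_text can be reproduced with plain String#gsub! and a block, where $~ holds the MatchData of the current match. Citations are replaced before the bibliography directive, presumably so that generate_bibliography (not shown on this page) sees every text collected in @cited_texts, even when the directive appears before the citations. In this sketch, render_citation and the "References:" output are hypothetical stand-ins.

```ruby
# Verbatim copies of the two default patterns.
PREPROCESSING = /%%\s*(?<command>(cite|bib)\s+[^%]*)%%/m
BIBLIOGRAPHY  = /%%\s*(?<command>bibliography[^%]*)%%/m

# Hypothetical stand-in for TextProcessor#process_command.
def render_citation(command)
  "[#{command.split.drop(1).join(' ')}]"
end

def process_text(text)
  cited_keys = []
  result = text.dup
  # Pass 1: replace citation directives, recording the cited keys.
  result.gsub!(PREPROCESSING) do
    command = $~[:command]
    cited_keys << command.split[1]
    render_citation(command)
  end
  # Pass 2: replace the bibliography directive once all citations are known.
  result.gsub!(BIBLIOGRAPHY) { "References: #{cited_keys.join(', ')}" }
  result
end

puts process_text("%%bibliography%%\nSee %%cite knuth84 42%%.")
# prints:
# References: knuth84
# See [knuth84 42].
```

The bibliography lists knuth84 although its directive precedes the citation in the input.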