Class: RCite::TextProcessor

Inherits:
Object
Defined in:
lib/rcite/text_processor.rb

Overview

Processes a text or file, replacing preprocessing commands with citations or bibliography entries.

The TextProcessor extracts RCite commands from a text by searching for matches of #preprocessing_regexp and processes each command according to the rules described in #process_command. Bibliography directives, which match #bibliography_regexp, are replaced separately. Its main method is #process_text.

Constant Summary

DEFAULT_PREPROCESSING_REGEXP =

Default value for #preprocessing_regexp.

/%%\s*(?<command>(cite|bib)\s+[^%]*)%%/m.freeze
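As a quick check of what the default regexp captures, the standalone Ruby snippet below copies the pattern (it does not load rcite) and collects the command group of every directive in a made-up sample text.

```ruby
# Copy of DEFAULT_PREPROCESSING_REGEXP (rcite itself is not loaded here).
preprocessing = /%%\s*(?<command>(cite|bib)\s+[^%]*)%%/m

text = "As argued %%cite rauber2008 25%% before.\n%%bibliography%%"

# String#scan sets $~ for every match, so the named group can be read in the block.
commands = []
text.scan(preprocessing) { commands << $~[:command] }

p commands # => ["cite rauber2008 25"]
```

The %%bibliography%% directive is not captured: bib must be followed by whitespace, so that directive is left for #bibliography_regexp.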
DEFAULT_BIBLIOGRAPHY_REGEXP =

Default value for #bibliography_regexp.

/%%\s*(?<command>bibliography[^%]*)%%/m.freeze
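The two default regexps are meant not to overlap, as #preprocessing_regexp requires. A standalone sketch, using copies of both patterns and a made-up directive, shows the bibliography directive being captured by one regexp and ignored by the other:

```ruby
preprocessing = /%%\s*(?<command>(cite|bib)\s+[^%]*)%%/m
bibliography  = /%%\s*(?<command>bibliography[^%]*)%%/m

directive = "%% bibliography %%"

# The bibliography regexp captures the directive, including trailing whitespace.
bib_match = bibliography.match(directive)
p bib_match[:command]                          # => "bibliography "

# "bib" must be followed by whitespace, so the preprocessing regexp skips it.
p preprocessing.match?(directive)              # => false

# Conversely, an ordinary citation is not a bibliography directive.
p bibliography.match?("%% cite rauber2008 %%") # => false
```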
COMMAND_SYNTAX_REGEXP =

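Describes the syntax for commands. A command is the string captured by the command group of #preprocessing_regexp; it may consist of several subcommands separated by the pipe symbol (|). Each subcommand includes the following information: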

  1. which command to call: cite or bib. Required for the first subcommand in a preprocessing directive, optional for any subcommands that follow.
  2. the BibTeX key of the text to cite (or to generate a bibliography entry for)
  3. the page that should be cited (optional)
  4. additional fields as a YAML inline hash (optional)

In other words, the syntax is:

command   ::= [cite|bib] key [page] [hash[, hash]*]
key       ::= anything_but_whitespace+
page      ::= anything_but_whitespace+|"anything_but_quote+"
hash      ::= hash_key: hash_val
hash_key  ::= hash_char+
hash_char ::= letter|number|_|-
hash_val  ::= anything_but_comma+|"anything_but_quote+"

Spaces stand for any whitespace. A string enclosed in double quotation marks may itself contain double quotes if they are escaped with a backslash (\").
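The quoted-string rule can be checked in isolation. The snippet below copies the lit_string pattern from command_syntax_regexp and anchors it with \A and \z (an addition made only for this test) so that only whole strings count as matches:

```ruby
# lit_string as defined in TextProcessor.command_syntax_regexp.
lit_string = /(?:"(?:[^"]|(?:(?<=\\)"))+")/

# Anchors added here so that only complete quoted strings match.
whole = /\A#{lit_string}\z/

p whole.match?('"new, title"')  # => true  (commas are fine inside quotes)
p whole.match?('"say \"hi\""')  # => true  (backslash-escaped quotes are allowed)
p whole.match?('"unterminated') # => false (no closing quote)
```

Judging from #process_command, only the surrounding quotes of a page value are stripped afterwards; the backslashes in front of inner quotes appear to be kept.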

Examples:

Valid commands

'cite rauber2008 25        title: "new, title", author: new author'
'bib  rauber_08  25--37                                           '
'cite rauber-08            shorttitle: short title                '
'cite rauber-08  "§2 Rn.2" shorttitle: short title                '

Returns:

  • (Regexp)

    the command regexp.

self.command_syntax_regexp.freeze


Constructor Details

- (TextProcessor) initialize

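Creates a new RCite::TextProcessor. #command_processor is initialised with a new Processor, and #preprocessing_regexp and #bibliography_regexp are set to DEFAULT_PREPROCESSING_REGEXP and DEFAULT_BIBLIOGRAPHY_REGEXP.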



# File 'lib/rcite/text_processor.rb', line 116

def initialize
  @command_processor    = Processor.new
  @preprocessing_regexp = DEFAULT_PREPROCESSING_REGEXP
  @bibliography_regexp  = DEFAULT_BIBLIOGRAPHY_REGEXP
  @cited_texts = []
end

Instance Attribute Details

- (Regexp) bibliography_regexp

Regular expression that describes the special bibliography preprocessing command. This regexp should not attempt to determine whether the command syntax is correct.

Returns:

  • (Regexp)

    the bibliography regexp.



# File 'lib/rcite/text_processor.rb', line 39

def bibliography_regexp
  @bibliography_regexp
end

- (Processor) command_processor

The processor that creates the actual citations or bibliography entries. RCite::TextProcessor basically just parses the preprocessing commands and then calls Processor#cite or Processor#bib with the extracted parameters.


Returns:

  • (Processor)

    the command processor.

# File 'lib/rcite/text_processor.rb', line 21

def command_processor
  @command_processor
end

- (Regexp) preprocessing_regexp

Regular expression that describes a preprocessing command in the text. This regexp should not attempt to determine whether the command syntax is correct. It must not match text that is matched by #bibliography_regexp.

Must have a group named command that contains a command which #process_command can use.

Returns:

  • (Regexp)

    the preprocessing regexp.



# File 'lib/rcite/text_processor.rb', line 32

def preprocessing_regexp
  @preprocessing_regexp
end
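The two documented requirements on a preprocessing regexp (a group named command, and no overlap with the bibliography directive) can be checked mechanically. In this sketch, usable_preprocessing_regexp? and the [[cite ...]] syntax are hypothetical and not part of rcite. This page only lists a reader for preprocessing_regexp, so how a custom regexp would be installed is not covered here.

```ruby
BIBLIOGRAPHY_REGEXP = /%%\s*(?<command>bibliography[^%]*)%%/m

# Hypothetical helper: checks both documented requirements against a
# sample bibliography directive.
def usable_preprocessing_regexp?(regexp, sample = "%% bibliography %%")
  regexp.names.include?("command") &&   # needs a group named "command"
    BIBLIOGRAPHY_REGEXP.match?(sample) &&
    !regexp.match?(sample)               # must not match the bibliography directive
end

# Hypothetical alternative syntax: [[cite key page]]
square = /\[\[\s*(?<command>(cite|bib)\s+[^\]]*)\]\]/m
# Too permissive: would also capture "%% bibliography %%"
loose  = /%%\s*(?<command>[^%]*)%%/m

p usable_preprocessing_regexp?(square)            # => true
p usable_preprocessing_regexp?(loose)             # => false
p "see [[cite rauber2008 25]]"[square, "command"] # => "cite rauber2008 25"
```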

Class Method Details

+ (Regexp) command_syntax_regexp

Generates the regular expression that is used to parse commands. Every call returns an equivalent regexp, so the method behaves much like a constant; it exists only to make the regexp more readable and its various parts more transparent.

For the command syntax, see COMMAND_SYNTAX_REGEXP.

Returns:

  • (Regexp)

    A regular expression that describes the syntax of preprocessing commands.



# File 'lib/rcite/text_processor.rb', line 62

def self.command_syntax_regexp
  cmd           = /(?<command>bib|cite)/
  key           = /(?<key>[^\s]+)/
  lit_string    = /(?:"(?:[^"]|(?:(?<=\\)"))+")/ # matches '"s"' and '"s\""'
  page          = /(?<page>(?:[^\s]+)|#{lit_string})/
  hash_key      = /[a-zA-Z0-9_\-]+/
  hash_val      = /(?:(?:[^,]+)|#{lit_string})/
  hash_elem     = /#{hash_key}:\s*#{hash_val}/
  hash          = /(?<fields>#{hash_elem}(?:\s*,\s*(?:#{hash_elem}))*)/
  result        = /^\s*
                   (?:#{cmd}\s+)?
                   #{key}
                   (?:\s+#{page})?
                   (?:\s+#{hash})?
                   \s*$/mx
end
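To see how the valid examples from COMMAND_SYNTAX_REGEXP are split into named groups, the standalone sketch below copies the method body above into a top-level method (so the rcite gem is not needed) and prints the captures for three of the examples, with their alignment padding shortened.

```ruby
# Copy of TextProcessor.command_syntax_regexp from the listing above.
def command_syntax_regexp
  cmd        = /(?<command>bib|cite)/
  key        = /(?<key>[^\s]+)/
  lit_string = /(?:"(?:[^"]|(?:(?<=\\)"))+")/ # matches '"s"' and '"s\""'
  page       = /(?<page>(?:[^\s]+)|#{lit_string})/
  hash_key   = /[a-zA-Z0-9_\-]+/
  hash_val   = /(?:(?:[^,]+)|#{lit_string})/
  hash_elem  = /#{hash_key}:\s*#{hash_val}/
  hash       = /(?<fields>#{hash_elem}(?:\s*,\s*(?:#{hash_elem}))*)/
  /^\s*
   (?:#{cmd}\s+)?
   #{key}
   (?:\s+#{page})?
   (?:\s+#{hash})?
   \s*$/mx
end

regexp = command_syntax_regexp

examples = [
  'cite rauber2008 25        title: "new, title", author: new author',
  'bib  rauber_08  25--37   ',
  'cite rauber-08  "§2 Rn.2" shorttitle: short title'
]

# Collect the named captures of each example.
parsed = examples.map do |command|
  m = command.match(regexp)
  { command: m[:command], key: m[:key], page: m[:page], fields: m[:fields] }
end

parsed.each { |captures| p captures }
```

The page capture still contains its quotes, which #process_command strips afterwards, and fields is the raw text of the inline hash; turning it into an actual hash presumably happens in generate_cite_bib, which is not shown on this page.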

Instance Method Details

- (String) process_command(command)

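Replaces a preprocessing command with the corresponding citation or bibliography entry. The command is split into subcommands at the pipe symbol (|); the parameters of each subcommand are extracted using COMMAND_SYNTAX_REGEXP and passed to the corresponding command of #command_processor. Only the first subcommand determines whether cite or bib is executed. The individual results are joined with the style's _between_cites or _between_bibs string and wrapped in its _around_all_cites or _around_all_bibs pair.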

If any errors occur while parsing the command or generating the citation/bibliography entry, this method returns a string indicating that there were errors.

Parameters:

  • command (String)

    The command that should be parsed.

Returns:

  • (String)

    The citation/bibliography entry, or a string indicating that an error occurred.



# File 'lib/rcite/text_processor.rb', line 172

def process_command(command)
  style = @command_processor.style
  cmd = nil
  result = []

  # subcommands are separated by the pipe symbol
  command.split("|").each do |subcommand|
    m = subcommand.match(COMMAND_SYNTAX_REGEXP)
    return "%%SYNTAX ERROR%%" unless m

    cmd ||= m[:command] # the command is parsed only for the first subcmd
    return '%%SYNTAX ERROR: no command specified%%' unless cmd
    cmd = cmd.to_sym

    key, page, fields = m[:key], m[:page], m[:fields]
    page.gsub!(/^"(.*?)"$/m, '\1') if page

    result << generate_cite_bib(cmd, key, page, fields)

    @cited_texts << @command_processor.bibliography[key] if cmd == :cite
  end # each subcommand

  a_all = style.public_send("_around_all_#{cmd}s")
  between = style.public_send("_between_#{cmd}s").to_s

  a_all[0].to_s + result.join(between) + a_all[1].to_s
end
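The subcommand handling can be illustrated with a self-contained sketch under stated assumptions: the Style struct and the "key, page" formatting are hypothetical stand-ins for the real style object and generate_cite_bib (neither is shown on this page), and SUBCOMMAND_REGEXP is a reduced, hypothetical version of COMMAND_SYNTAX_REGEXP without quoted pages or field hashes.

```ruby
# Hypothetical stand-in for the style object: only the two methods used
# for :cite are provided.
Style = Struct.new(:_around_all_cites, :_between_cites)

# Reduced, hypothetical subcommand syntax: optional command, key, optional page.
SUBCOMMAND_REGEXP = /\A\s*(?:(?<command>bib|cite)\s+)?(?<key>\S+)(?:\s+(?<page>\S+))?\s*\z/

def join_subcommands(command, style)
  cmd = nil
  results = command.split("|").map do |subcommand|
    m = subcommand.match(SUBCOMMAND_REGEXP)
    return "%%SYNTAX ERROR%%" unless m

    cmd ||= m[:command] # only the first subcommand's command is used
    return "%%SYNTAX ERROR: no command specified%%" unless cmd

    # Stand-in for generate_cite_bib: "key, page" or just "key".
    m[:page] ? "#{m[:key]}, #{m[:page]}" : m[:key]
  end

  around  = style.public_send("_around_all_#{cmd}s")
  between = style.public_send("_between_#{cmd}s").to_s
  around[0].to_s + results.join(between) + around[1].to_s
end

style = Style.new(["(", ")"], "; ")
puts join_subcommands("cite rauber2008 25 | smith2010", style) # => (rauber2008, 25; smith2010)
puts join_subcommands("rauber2008 25", style)                  # => %%SYNTAX ERROR: no command specified%%
```

As in the listing above, a cite or bib token in a later subcommand is consumed by the regexp but does not change the command chosen by the first subcommand.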

- (String) process_file(file)

Reads the contents of file and processes them, returning the processed text. Convenience wrapper for #process_text.

Parameters:

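  • file (String, File)

    Path to the file. Because the contents are read with File.read, an IO object is only accepted if File.read can treat it as a path (a File object can, via #to_path); a StringIO, for example, cannot.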

Returns:

  • (String)

    The processed text.

# File 'lib/rcite/text_processor.rb', line 131

def process_file(file)
  process_text(File.read(file))
end

- (String) process_text(text)

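Replaces every occurrence of #preprocessing_regexp in text with the output of #process_command; only the group named command from preprocessing_regexp is passed to process_command as a parameter. Afterwards, every occurrence of #bibliography_regexp is replaced with the output of #generate_bibliography.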

Parameters:

  • text (String)

    Any string.

Returns:

  • (String)

    The original text with all preprocessing commands replaced by citations or bibliography entries. If no preprocessing commands were found, this is the unchanged text.



# File 'lib/rcite/text_processor.rb', line 146

def process_text(text)
  @cited_texts = []
  result = text.dup
  result.gsub!(@preprocessing_regexp) do |m|
    process_command($~[:command])
  end
  result.gsub!(@bibliography_regexp) do |m|
    generate_bibliography
  end
  @cited_texts = nil
  result
end
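The two-pass structure of process_text can be sketched without rcite. Here the citation and bibliography renderers are hypothetical stand-ins for process_command and generate_bibliography; the point is the order: citations are replaced first, so the list of cited keys is complete by the time the bibliography directive is replaced.

```ruby
CITE_DIRECTIVE = /%%\s*(?<command>(cite|bib)\s+[^%]*)%%/m
BIB_DIRECTIVE  = /%%\s*(?<command>bibliography[^%]*)%%/m

# Sketch of the two-pass replacement with hypothetical renderers.
def sketch_process_text(text)
  cited = []

  # Pass 1: replace citations and remember which keys were cited.
  result = text.gsub(CITE_DIRECTIVE) do
    command, key = $~[:command].split # simplified: first token command, second token key
    cited << key if command == "cite"
    "[#{key}]"
  end

  # Pass 2: replace the bibliography directive using the collected keys.
  result.gsub(BIB_DIRECTIVE) { "Bibliography: #{cited.uniq.join(', ')}" }
end

puts sketch_process_text("See %%cite a2001 3%% and %%cite b2002%%.\n%%bibliography%%")
# See [a2001] and [b2002].
# Bibliography: a2001, b2002
```

This mirrors how process_text fills @cited_texts during the first pass; that generate_bibliography actually reads @cited_texts is an assumption, since its source is not shown here.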