Class: DocParser::Parser

Inherits:
Object
Defined in:
lib/docparser/parser.rb

Overview

The main parser class. This is the class you'll use to create your parser; the real work happens in the Document class.


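Instance Method Summary

#initialize(files: [], quiet: false, encoding: 'utf-8', parallel: true, output: nil, range: nil, num_processes: Parallel.processor_count + 1) ⇒ Parser
Creates a new Parser instance.

#parse!(&block) ⇒ Object
Parses the `files`.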

Constructor Details

#initialize(files: [], quiet: false, encoding: 'utf-8', parallel: true, output: nil, range: nil, num_processes: Parallel.processor_count + 1) ⇒ Parser

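Creates a new Parser instance.

When `parallel` is false, `num_processes` is ignored and the files are processed in a single process. If `range` is given, only `files[range]` is parsed. Passing `quiet: true` restricts logging to errors.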


# File 'lib/docparser/parser.rb', line 44

def initialize(files: [], quiet: false, encoding: 'utf-8', parallel: true,
               output: nil, range: nil,
               num_processes: Parallel.processor_count + 1)
  @num_processes = parallel ? num_processes : 1
  @files = range ? files[range] : files
  @encoding = encoding

  Log4r::Logger['docparser'].level = quiet ? Log4r::ERROR : Log4r::INFO

  initialize_outputs output

  @logger = Log4r::Logger.new('docparser::parser')
  @logger.info "DocParser v#{VERSION} loaded"
end
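The option handling in the constructor boils down to a few conditional assignments. The following is a minimal, self-contained sketch of that logic in plain Ruby, not DocParser's code: `resolve_options` is a hypothetical helper, `Etc.nprocessors` stands in for `Parallel.processor_count` so the example runs without the `parallel` gem, and the `:error`/`:info` symbols stand in for the Log4r levels.

```ruby
require 'etc'

# Hypothetical helper mirroring the constructor's option resolution.
# Etc.nprocessors is an assumed stand-in for Parallel.processor_count.
def resolve_options(files: [], quiet: false, parallel: true, range: nil,
                    num_processes: Etc.nprocessors + 1)
  {
    # Disabling parallelism forces a single process regardless of num_processes
    num_processes: parallel ? num_processes : 1,
    # A range selects a subset of the input files
    files: range ? files[range] : files,
    # quiet keeps only error messages (symbols stand in for Log4r levels)
    log_level: quiet ? :error : :info
  }
end

files = %w[a.html b.html c.html d.html]
opts = resolve_options(files: files, parallel: false, range: 1..2, quiet: true)
p opts
```

With these arguments the sketch selects one process, only `b.html` and `c.html`, and error-level logging.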

Instance Method Details

#parse!(&block) ⇒ Object

Parses the `files`.

Accepts a block that is executed for each document in the context of a Document instance, where the content can be accessed using Nokogiri.
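Running a caller's block "in the Document context" is commonly done in Ruby with `instance_exec`, which evaluates the block with `self` set to another object so that its methods can be called without a receiver. The sketch below assumes that technique; it is not DocParser's actual Document class. `MiniDocument`, `add_result`, and `results` are hypothetical, and plain regex matching stands in for Nokogiri so the example needs no gems.

```ruby
# Hypothetical stand-in for a per-file document object.
class MiniDocument
  attr_reader :filename, :content, :results

  def initialize(filename, content)
    @filename = filename
    @content = content
    @results = []
  end

  # Hypothetical helper the block can call without a receiver
  def add_result(value)
    @results << value
  end

  # Evaluates the caller's block with self set to this document
  def process(&block)
    instance_exec(&block)
    self
  end
end

docs = {
  'a.html' => '<title>Alpha</title>',
  'b.html' => '<title>Beta</title>'
}.map { |name, html| MiniDocument.new(name, html) }

# Inside the block, filename, content and add_result resolve against each document
processed = docs.map do |doc|
  doc.process do
    title = content[%r{<title>(.*?)</title>}, 1]
    add_result([filename, title])
  end
end

p processed.flat_map(&:results)
```

The block itself never names the document; it only calls methods that exist on whatever object `instance_exec` binds to `self`.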



# File 'lib/docparser/parser.rb', line 65

def parse!(&block)
  @logger.info "Parsing #{@files.length} files (encoding: #{@encoding})."
  start_time = Time.now

  if @num_processes > 1
    parallel_process(&block)
  else
    serial_process(&block)
  end

  @logger.info 'Processing finished'

  write_to_outputs

  @logger.info format('Done processing in %.2fs.', Time.now - start_time)
end
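`parse!` chooses a processing strategy from `@num_processes`, runs it, and reports the elapsed time. The sketch below reproduces that control flow using only the standard library. `SketchParser` is hypothetical, and so is its use of threads for the parallel branch: DocParser's own `parallel_process` is not shown here, though its `Parallel.processor_count` default suggests it relies on the `parallel` gem. In MRI, threads do not provide CPU parallelism, so this only illustrates the dispatch and ordering logic.

```ruby
# Hypothetical parser that mirrors parse!'s control flow.
class SketchParser
  def initialize(files, num_processes: 1)
    @files = files
    @num_processes = num_processes
  end

  def parse!(&block)
    start_time = Time.now
    results = @num_processes > 1 ? parallel_process(&block) : serial_process(&block)
    puts format('Done processing in %.2fs.', Time.now - start_time)
    results
  end

  private

  # Applies the block to each file in order
  def serial_process(&block)
    @files.map(&block)
  end

  # Splits the files into roughly equal slices, one thread per slice,
  # then joins the threads in order so results keep the input order
  def parallel_process(&block)
    @files.each_slice(slice_size)
          .map { |slice| Thread.new { slice.map(&block) } }
          .flat_map(&:value)
  end

  # each_slice rejects 0, so never return less than 1
  def slice_size
    [(@files.size.to_f / @num_processes).ceil, 1].max
  end
end

files = %w[a.html b.html c.html d.html e.html]
serial = SketchParser.new(files).parse!(&:upcase)
parallel = SketchParser.new(files, num_processes: 3).parse!(&:upcase)
p serial == parallel
```

Both branches return results in input order, so callers do not need to know which strategy ran.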