Method: Traject::Macros::Marc21#extract_marc

Defined in:
lib/traject/macros/marc21.rb

#extract_marc(spec, options = {}) ⇒ Object (private)

A macro that extracts data from a MARC record according to a string field/substring specification.

First argument is a string spec suitable for the MarcExtractor, see Traject::MarcExtractor::Spec.

The second argument is an optional hash of options, including any options valid for MarcExtractor.new as well as others. By default, results are de-duplicated; see :allow_duplicates.

  • :allow_duplicates => boolean, default false. If set to true, the result array is not de-duplicated (array.uniq! is skipped).

  • :separator: (default ' ' (space)), what to use when joining multiple subfield matches from same field. Set to nil to leave them as separate values (which is actually default if only one subfield is given in spec, like 100a). See MarcExtractor docs for more info.

  • :alternate_script: (default true). If true, automatically include 'alternate script' MARC 880 linked fields corresponding to matched specifications. If false, do not include them. If :only, include only the linked 880s corresponding to the spec, not the base tags.
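
To make the interaction of :separator and :allow_duplicates concrete, here is a simplified plain-Ruby model. It is not Traject's implementation: model_extract is a hypothetical helper, and each inner array stands in for the subfield values matched within one MARC field.

```ruby
# Simplified model only: this is NOT Traject's implementation.
# `fields` stands in for matched MARC fields; each inner array holds the
# subfield values matched within one field. `model_extract` is hypothetical.
def model_extract(fields, separator: " ", allow_duplicates: false)
  values = fields.flat_map do |subfield_values|
    if separator.nil?
      subfield_values                     # keep each subfield as its own value
    else
      [subfield_values.join(separator)]   # one joined value per field
    end
  end
  # De-duplicate unless the caller asked to keep duplicates
  allow_duplicates ? values : values.uniq
end

# Two fields that happen to carry identical subfield values
fields = [["Smith, John", "1950-"], ["Smith, John", "1950-"]]

joined    = model_extract(fields)
separate  = model_extract(fields, separator: nil)
with_dups = model_extract(fields, allow_duplicates: true)
```

In this model, the defaults give one joined, de-duplicated value; a nil separator gives the individual subfield values; and allowing duplicates keeps one joined value per matched field.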

Soft-Deprecated options: post-processing transformations

These don't produce a deprecation warning and there is no planned horizon for them to go away, but the alternative of using additional transformation macros (from Traject::Macros::Transformation) composed with extract_marc is recommended.

  • :first => true: take only first value. Instead, use extract_marc(whatever), first_only

  • :translation_map => String: translate with the named translation map looked up in the load path, using Traject::TranslationMap.new(translation_map_arg). Instead, use extract_marc(whatever), translation_map(translation_map_arg)

  • :trim_punctuation => true: trims leading/trailing punctuation using standard algorithms that have shown themselves useful with MARC, using Marc21.trim_punctuation. Instead, use extract_marc(whatever), trim_punctuation

  • :default => String: if otherwise empty, add default value. Instead, use extract_marc(whatever), default("default value")
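
The recommended composition works because each macro returns a callable taking (record, accumulator, context), and the steps are applied in order to one shared accumulator. The sketch below models that idea in plain Ruby. The names fake_extract, fake_first_only, fake_default and run_steps are hypothetical stand-ins, not Traject's API, and the assumption that steps run sequentially on a shared accumulator is stated here rather than taken from this page.

```ruby
# Hypothetical stand-ins for illustration; the real macros live in
# Traject::Macros::Marc21 and Traject::Macros::Transformation.
def fake_extract(values)
  lambda { |_record, accumulator, _context| accumulator.concat(values) }
end

def fake_first_only
  # Keep at most the first value already in the accumulator
  lambda { |_record, accumulator, _context| accumulator.replace(accumulator.first(1)) }
end

def fake_default(value)
  # Add a fallback value only when nothing was extracted
  lambda { |_record, accumulator, _context| accumulator << value if accumulator.empty? }
end

# Assumed model of a field step chain: run each step in order
# against a single shared accumulator.
def run_steps(record, *steps)
  accumulator = []
  steps.each { |step| step.call(record, accumulator, nil) }
  accumulator
end

first_title = run_steps(:record, fake_extract(["A", "B"]), fake_first_only)
fallback    = run_steps(:record, fake_extract([]), fake_first_only, fake_default("Unknown"))
```

Because each step only sees and mutates the accumulator, transformations can be reordered or dropped without touching the extraction step, which is the main argument for composition over the option flags.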

Examples:

to_field "title", extract_marc("245abcd"), trim_punctuation
to_field "id",    extract_marc("001"), first_only
to_field "geo",   extract_marc("040a", :separator => nil), translation_map("marc040")

If you'd like extract_marc functionality but you're not creating an indexer step, see the Traject::Macros::Marc21.extract_marc_from module method.


# File 'lib/traject/macros/marc21.rb', line 62

def extract_marc(spec, options = {})

  # Raise an error if there are any invalid options, indicating a
  # misspelled or illegal option, using a string instead of a symbol, etc.

  unless (options.keys - EXTRACT_MARC_VALID_OPTIONS).empty?
    raise RuntimeError.new("Illegal/Unknown argument '#{(options.keys - EXTRACT_MARC_VALID_OPTIONS).join(', ')}' in extract_marc at #{Traject::Util.extract_caller_location(caller.first)}")
  end


  # We create the TranslationMap and the MarcExtractor here
  # on load, so the lambda can just refer to already created
  # ones, and not have to create a new one per-execution.
  #
  # Benchmarking shows for MarcExtractor at least, there is
  # significant performance advantage.

  if translation_map_arg  = options.delete(:translation_map)
    translation_map = Traject::TranslationMap.new(translation_map_arg)
  else
    translation_map = nil
  end


  extractor = Traject::MarcExtractor.new(spec, options)

  lambda do |record, accumulator, context|
    accumulator.concat extractor.extract(record)
    Marc21.apply_extraction_options(accumulator, options, translation_map)
  end
end
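
The method above combines two patterns: rejecting unknown option keys up front, and building expensive objects once so the returned lambda only reuses them. The following plain-Ruby sketch illustrates both in isolation. All names (VALID_OPTIONS, CountingExtractor, make_step) are hypothetical, and it raises ArgumentError where the original raises RuntimeError.

```ruby
# Hypothetical names throughout; a sketch of the pattern, not Traject code.
VALID_OPTIONS = [:separator, :allow_duplicates].freeze

# Stand-in extractor that counts how many times it is constructed
class CountingExtractor
  @built = 0
  class << self
    attr_accessor :built
  end

  def initialize(spec)
    self.class.built += 1
    @spec = spec
  end

  # Here a "record" is just a Hash from spec string to values
  def extract(record)
    record.fetch(@spec, [])
  end
end

def make_step(spec, options = {})
  # Fail fast on misspelled keys or string keys used instead of symbols
  unknown = options.keys - VALID_OPTIONS
  raise ArgumentError, "Unknown option(s): #{unknown.join(', ')}" unless unknown.empty?

  extractor = CountingExtractor.new(spec) # built once, at configuration time
  lambda do |record, accumulator, _context|
    accumulator.concat(extractor.extract(record)) # reused for every record
  end
end

step = make_step("245a")
records = [{ "245a" => ["First title"] }, { "245a" => ["Second title"] }]
results = records.map { |record| [].tap { |acc| step.call(record, acc, nil) } }

string_key_rejected =
  begin
    make_step("245a", "separator" => nil)
    false
  rescue ArgumentError
    true
  end
```

The extractor is constructed a single time no matter how many records pass through the step, and a string key such as "separator" is rejected before any extractor is built.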