Module: Ratistics::Load::Csv

Extended by:
Csv
Included in:
Csv
Defined in:
lib/ratistics/load/csv.rb

Instance Method Details

#data(contents, opts = {}) ⇒ Array, Hamster

Convert a string representing multiple CSV records into an array of Ruby data structures suitable for further processing. Leading and trailing whitespace is trimmed from all values.

The second parameter is an options hash that may carry a record definition in the :def (or :definition) option. The definition describes the individual fields in a CSV record, with one element for each field. When the definition is omitted, each record is returned as an array with one element for every field in the CSV, ordered as in the original record. When a definition is given, each record is returned as a hash with one key for every field in the definition. If the definition has fewer fields than the record, only the first n fields are returned, where n is the number of fields in the definition. Fields defined as nil are also skipped.
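The mapping rules above can be sketched in plain Ruby. This is only an illustration of the documented behavior, not the library's implementation: `record_from_fields` is a hypothetical helper, the sample line is made up, and the naive comma split ignores quoting.

```ruby
# Hypothetical helper illustrating the documented rules; not part of Ratistics.
def record_from_fields(fields, definition = nil)
  # Trim leading and trailing whitespace from every value.
  values = fields.map(&:strip)

  # No definition: return the fields as an array in their original order.
  return values if definition.nil?

  record = {}
  # Only the first n fields are considered, where n is the definition length.
  definition.each_with_index do |field_def, index|
    next if field_def.nil? # fields defined as nil are skipped
    record[field_def] = values[index]
  end
  record
end

line   = ' 1, 1/75 , M2039, 30:43 ' # made-up sample record
fields = line.split(',')            # naive split, assumes no quoted commas

as_array = record_from_fields(fields)
as_hash  = record_from_fields(fields, [:place, nil, :div])
```

Here `as_array` keeps all four trimmed values, while `as_hash` keeps only `:place` and `:div`: the second field is defined as nil and the fourth lies beyond the end of the definition.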

Each field in the definition can consist of up to two values. When two values are given, the field definition must be an array. When only one value is given, the field can be a single-element array or just the value. The first value is the field name (the key in the returned hash) and can be any data type. The second (optional) value must be either a symbol or a lambda. When a symbol is given, the corresponding method is called on the data value before it is returned. For example, to convert the data to an integer, pass :to_i as the second element and the #to_i method will be called. When the second element is a lambda, it must accept exactly one parameter; it is called with the string field value as its argument. Using lambdas this way allows for complex field processing.
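As a rough sketch of how a single field definition could be applied, assuming the rules described above: `convert_field` is a hypothetical name used for illustration only and is not a Ratistics method.

```ruby
# Hypothetical helper illustrating the documented rules; not part of Ratistics.
def convert_field(field_def, value)
  # Accept a bare name, [name], or [name, converter].
  name, converter = field_def.is_a?(Array) ? field_def : [field_def]

  converted =
    case converter
    when nil    then value                        # no conversion requested
    when Symbol then value.public_send(converter) # e.g. :to_i calls String#to_i
    else             converter.call(value)        # lambda taking one argument
    end

  [name, converted]
end

plain   = convert_field(:name, 'Jane')
wrapped = convert_field([:gender], 'F')
by_sym  = convert_field([:age, :to_i], '34')
by_proc = convert_field([:pace, lambda { |s| s.split(':').map(&:to_i) }], '5:12')
```

The lambda form is the one to reach for when a single method call is not enough, such as splitting a pace of minutes and seconds into two integers.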

By default the return value is a Ruby Array. If the Hamster gem is installed, a Hamster collection can be returned instead. To return a Hamster collection, set the :hamster option to true. Optionally, a specific Hamster class can be requested by setting the :hamster option to a symbol naming the type to return. For example, :hamster => :set sets the return type to Hamster::Set. The default Hamster return type is Hamster::Vector.

Examples:

Simple field definition

definition = [
  :place,
  :div_tot,
  :div,
  :guntime,
  :nettime,
  :pace,
  :name,
  :age,
  :gender,
  :race_num,
  :city_state
]

Complex field definition

definition = [
  [:place, lambda {|i| i.to_i}],
  nil,
  :div,
  :guntime,
  :nettime,
  :pace,
  [:name],
  [:age, :to_i],
  [:gender],
  [:race_num, :to_i],
  [:city_state]
]
data = Ratistics::Load.csv_data(path)
data = Ratistics::Load.csv_data(path, :def => definition)
data = Ratistics::Load.csv_data(path, :hamster => true)
data = Ratistics::Load.csv_data(path, :def => definition, :hamster => true)
data = Ratistics::Load.csv_data(path, :hamster => :set)
data = Ratistics::Load.csv_data(path, :def => definition, :hamster => :set)

Options Hash (opts):

  • :definition (Array)

    the record definition for processing individual fields (see above)

  • :hamster (Symbol) — default: false

    set to true to return a Hamster collection, or indicate a specific Hamster return type

  • :col_sep (Character)

    column separator (default: ',')

  • :row_sep (Character)

    row separator (default: $/)

  • :quote_char (Character)

    quote character (default: '"')

  • :headers (true, false)

    the first row of the data/file contains field name headers (default: false)

  • :as (Symbol)

    the data type/structure of the individual records, :hash/:map (default), :array/:catalog/:catalogue


# File 'lib/ratistics/load/csv.rb', line 144

def data(contents, opts = {})
  if opts[:encoding] == :force
    contents = contents.force_encoding('ISO-8859-1').encode('utf-8', :replace => nil)
  end
  definition = opts[:def] || opts[:definition]

  if opts[:as] == :array || opts[:as] == :catalog || opts[:as] == :catalogue
    if definition.nil?
      return catalog_from_data_using_headers(contents, opts)
    else
      return catalog_from_data_using_definition(contents, opts)
    end
  else
    if definition.nil?
      return hash_from_data_using_headers(contents, opts)
    else
      return hash_from_data_using_definition(contents, opts)
    end
  end
end
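The source above also honors an :encoding option that is not listed in the options hash: when it is set to :force, the contents are reinterpreted as ISO-8859-1 and transcoded to UTF-8 before parsing. A minimal sketch of that one step, using a made-up sample string:

```ruby
# Bytes for "café" in ISO-8859-1 (0xE9 is "é"); not valid UTF-8 as-is.
raw = "caf\xE9".b

# Same call chain as in #data: relabel the bytes as ISO-8859-1, then transcode.
utf8 = raw.force_encoding('ISO-8859-1').encode('utf-8', :replace => nil)

encoding_name = utf8.encoding.name
is_valid      = utf8.valid_encoding?
```

Because every byte value maps to a character in ISO-8859-1, this conversion does not raise on arbitrary input, though text that was really in another encoding will come out as the wrong characters.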

#file(path, opts = {}) ⇒ Object

Convert a CSV file into an array of Ruby data structures suitable for further processing.

See Also:

  • #csv_data

# File 'lib/ratistics/load/csv.rb', line 25

def file(path, opts = {})
  contents = Ratistics::Load.file_contents(path)
  return data(contents, opts)
end

#gz_file(path, opts = {}) ⇒ Object

Convert a gzipped CSV file into an array of Ruby data structures suitable for further processing.

See Also:

  • #csv_data

# File 'lib/ratistics/load/csv.rb', line 43

def gz_file(path, opts = {})
  contents = Ratistics::Load.gz_contents(path)
  return data(contents, opts)
end