RStore

A library for easy batch storage of csv data into a database

Uses the CSV standard library for parsing, Nokogiri for URL handling, and Sequel ORM for database management.

Special Features

Database Requirements

  1. Expects the database table to have an addition column storing an auto-incrementing primary key.
  2. Requires the database to support transactions: Most other database platforms support transactions natively. In MySQL, you'll need to be running InnoDB or BDB table types rather than the more common MyISAM. If you are using MySQL and the table has not been created yet, RStore::CSV will take care of using the correct table type upon creation.

Installation

$ gem install rstore

Note: As RStore depends on Nokogiri for fetching data from URLs, you need to install Nokogiri first to use this feature. However, on some operating systems there can be problems due to missing libraries, so you might want to take a look at the following installation instructions:

Debian Users of Debian Linux (e.g. Ubuntu) need to run:

$ sudo apt-get install libxslt1-dev libxml2-dev

$ gem install nokogiri

Mac OS X The following instruction should work, but I haven't tested them personally

$ sudo port install libxml2 libxslt

$ gem install nokogiri

Source: Installing Nokogiri

If you have any difficulties installing Nokogiri, please let me know, so that I can help you.

Public API Documentation

The documentation is hosted on RubyDoc.info: RStore Public API documentation.

Sample Usage

Sample csv file

"product","quantity","price","created_at","min_demand","max_demand","on_stock" "toy1","1","1.12","2011-2-4","1:30","1:30am","true" "toy2","2","2.22","2012/2/4","2:30","2:30pm","false "toy3","3","3.33","2013/2/4","3:30","3:30 a.m.","True "toy4","4",,,"4:30","4:30 p.m.","False" "toy4","5","5.55","2015-2-4","5:30","5:30AM","1" "toy5","6","6.66","2016/2/4","6:30","6:30 P.M.","0" "toy6","7","7.77",,,,"false"

1) Load gem


require 'rstore/csv'

2) Store database information in a subclass of RStore::BaseDB Naming convention: name => NameDB


class CompanyDB < RStore::BaseDB

  # Same as Sequel.connect, except that you don't need to
  # provide the :database key.
  info(:adapter  => 'mysql',
       :host     => 'localhost',
       :user     => 'root',
       :password => 'xxx')

end

3) Store table information in a subclass of RStore::BaseTable Naming convention: name => NameTable


class ProductsTable < RStore::BaseTable

  # Specify the database table the same way
  # you do in Sequel
  create do
    primary_key :id, :allow_null => false
    String      :product
    Integer     :quantity
    Float       :price
    Date        :created_at
    DateTime    :min_demand
    Time        :max_demand
    Boolean     :on_stock, :allow_null => false, :default => false
  end

end

Note: You can put the database and table class definitions in separate files and require them when needed.

4) Enter csv data into the database The from method accepts a path to a file or directory as well as an URL. The to metthod accepts a string of the form db_name.table_name

RStore::CSV.new do
  from '../easter/children', :recursive => true                   # select a directory or
  from '../christmas/children/toys.csv'                           # file, or
  from 'www.example.com/sweets.csv', :selector => 'pre div.line'  # URL
  to   'company.products'                                         # provide database and table name
  run                                                             # run the program
end

Additional Features


You can change and reset the default options (see section Available Options below for details)

# Search directories recursively and handle the first row of a file as data by default
RStore::CSV.change_default_options(:recursive => true, :has_headers => false)

RStore::CSV.new do
  from 'dir1'
  from 'dir2'
  from 'dir3'
  to   'company.products'
  run
end

# Restore default options
RStore::CSV.reset_default_options

There is also a convenience method enabling you to use all of Sequels query methods.

RStore::CSV.query('company.products') do |table|    # table = Sequel::Dataset object
  table.all                                         # fetch everything
  table.all[3]                                      # fetch row number 4 (see output below)
  table.filter(:id => 2).update(:on_stock => true)  # update entry
  table.filter(:id => 3).delete                     # delete entry
end

*) Output of db[table.name].all[3]

# {:product     => "toy4",
#  :quantity    => 4,
#  :price       => nil,
#  :created_at  => nil,
#  :min_demand  => <DateTime: 2011-10-25T04:30:00+00:00 (39293755/16,0/1,2299161)>,
#  :max_demand  => <DateTime: 2011-10-25T16:30:00+00:00 (39293763/16,0/1,2299161)>,
#  :on_stock    => false}

Access all of Sequels functionality by using the convenience methods BaseDB.connect, BaseTable.name, and BaseTable.table_info:


DB     = CompanyDB.connect           # Open connection to 'company' database
name   = ProductTable.name           # Table name, :products, used as an argument to the following methods.
layout = ProductsTable.table_info    # The Proc that was passed to ProductsTable.create

DB.create_table(name, &layout)       # Create table

DB.alter_table name do               # Alter table
  drop_column :created_at
  add_column  :entry_date, :date
end

DB.drop_table(name)                  # Drop table

Available Options

The method from accepts two kinds of options, file options and parse options:

File Options

File options are used for fetching csv data from a source. The following options are recognized:

Parse Options

Parse options are arguments to CSV::parse. The following options are recognized:

For more information on the parse options, please see section Further Reading below.

Further Reading

Feedback

Any suggestions or criticism are highly welcome! Whatever your feedback, it will help me make this gem better!