Class: Chinese::Vocab

Inherits:
Object
  • Object
Includes:
HelperMethods, WithValidations
Defined in:
lib/chinese/vocab.rb

Constant Summary

OPTIONS =

Mandatory constant for the Options module. Each key-value pair is of the following type: option_key => [default_value, validation]

{:compact      => [false, lambda {|value| is_boolean?(value) }],
:with_pinyin  => [true,  lambda {|value| is_boolean?(value) }],
:thread_count => [8,     lambda {|value| value.kind_of?(Integer) }]}
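
The [default_value, validation] pairs above can be tried out with a small stand-alone sketch. The OptionSketch module and its resolve method are hypothetical names written for illustration only; the library itself resolves options through the validate helper from WithValidations, whose internals are not shown on this page.

```ruby
# Hypothetical, stand-alone illustration of an option table of the form
# key => [default_value, validation]. Not the library's implementation.
module OptionSketch
  IS_BOOLEAN = lambda { |value| value == true || value == false }

  OPTIONS = {
    :compact      => [false, IS_BOOLEAN],
    :with_pinyin  => [true,  IS_BOOLEAN],
    :thread_count => [8,     lambda { |value| value.kind_of?(Integer) }]
  }

  # Returns the default if the key is absent, the user's value if it passes
  # the validation, and raises ArgumentError otherwise (assumed behaviour).
  def self.resolve(key, options)
    default, validation = OPTIONS.fetch(key)
    return default unless options.key?(key)

    value = options[key]
    unless validation.call(value)
      raise ArgumentError, "Value of :#{key} is invalid: #{value.inspect}"
    end
    value
  end
end

p OptionSketch.resolve(:compact, {})                         # => false
p OptionSketch.resolve(:thread_count, { :thread_count => 4 }) # => 4
```

Keeping the default and its validation side by side means a new option only needs one new entry in the table.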

Methods included from HelperMethods

#distinct_words, #include_every_char?, #is_unicode?

Constructor Details

- (Vocab) initialize(word_array, options) - (Vocab) initialize(word_array)

Initializes an object.

Examples:

require 'chinese_vocab'

# Extract the Chinese words from a CSV file.
words = Chinese::Vocab.parse_words('path/to/file/hsk.csv', 4)

# Initialize Chinese::Vocab with the word array.
# :compact => true means that single-character words that also appear in
# multi-character words are removed from the word array (["看", "看书"] => ["看书"]).
vocabulary = Chinese::Vocab.new(words, :compact => true)

# Return the minimum necessary sentences.
sentences = vocabulary.min_sentences(:size => :short)

# See which unique characters appear in all these sentences.
vocabulary.sentences_unique_chars(sentences)
# => ["我", "们", "跟", "他", "是", "好", "朋", "友", ...]

# Save to file
vocabulary.to_csv('path/to_file/vocab_sentences.csv')

Overloads:

  • - (Vocab) initialize(word_array, options)

    Parameters:

    • word_array (Array<String>)

      An array of Chinese words that is stored in #words after all non-ascii, non-unicode characters have been stripped and double entries removed.

    • options (Hash)

      The options to customize the following feature.

    Options Hash (options):

    • :compact (Boolean)

      Whether or not to remove all single-character words that also appear in at least one multi-character word, for example ["看", "看书"] => ["看书"]. The reason behind this option is to remove redundancy by focusing on learning distinct characters. Defaults to false.

  • - (Vocab) initialize(word_array)

    Parameters:

    • word_array (Array<String>)

      An array of Chinese words that is stored in #words after all non-ascii, non-unicode characters have been stripped and double entries removed.



# File 'lib/chinese/vocab.rb', line 52

def initialize(word_array, options={})
  @compact = validate { :compact }
  @words    = edit_vocab(word_array)
  @words    = remove_redundant_single_char_words(@words)  if @compact
  @chinese  = is_unicode?(@words[0])
  @not_found        = []
  @stored_sentences = []
end

Instance Attribute Details

- (Boolean) compact (readonly)

The value of the :compact options key.

Returns:

  • (Boolean)

    the value of the :compact options key.



# File 'lib/chinese/vocab.rb', line 21

def compact
  @compact
end

- (Array<String>) not_found (readonly)

Holds those Chinese words from #words that could not be found in any of the supported online dictionaries during a call to either #sentences or #min_sentences. Defaults to [].

Returns:

  • (Array<String>)

    the Chinese words from #words that could not be found in any of the supported online dictionaries.



# File 'lib/chinese/vocab.rb', line 25

def not_found
  @not_found
end

- (Array<Hash>) stored_sentences (readonly)

Holds the return value of either #sentences or #min_sentences, whichever was called last. Defaults to [].

Returns:

  • (Array<Hash>)



# File 'lib/chinese/vocab.rb', line 30

def stored_sentences
  @stored_sentences
end

- (Boolean) with_pinyin (readonly)

The value of the :with_pinyin option key.

Returns:

  • (Boolean)

    the value of the :with_pinyin option key.



# File 'lib/chinese/vocab.rb', line 27

def with_pinyin
  @with_pinyin
end

- (Object) words (readonly)

Returns the value of attribute words



# File 'lib/chinese/vocab.rb', line 19

def words
  @words
end

Class Method Details

+ (Array<String>) parse_words(path_to_csv, word_col, options) + (Array<String>) parse_words(path_to_csv, word_col)

Extracts the vocabulary column from a CSV file as an array of strings. The array is normally provided as an argument to #initialize.

Examples:

require 'chinese_vocab'

# Extract the Chinese words from a CSV file.
words = Chinese::Vocab.parse_words('path/to/file/hsk.csv', 4)

# Initialize Chinese::Vocab with the word array.
# :compact => true means that single-character words that also appear in
# multi-character words are removed from the word array (["看", "看书"] => ["看书"]).
vocabulary = Chinese::Vocab.new(words, :compact => true)

# Return the minimum necessary sentences.
sentences = vocabulary.min_sentences(:size => :short)

# See which unique characters appear in all these sentences.
vocabulary.sentences_unique_chars(sentences)
# => ["我", "们", "跟", "他", "是", "好", "朋", "友", ...]

# Save to file
vocabulary.to_csv('path/to_file/vocab_sentences.csv')

Overloads:

  • + (Array<String>) parse_words(path_to_csv, word_col, options)

    Parameters:

    • path_to_csv (String)

      The relative or full path to the CSV file.

    • word_col (Integer)

      The column number of the vocabulary column (counting starts at 1).

    • options (Hash)

      The supported options of Ruby's CSV library as well as the :encoding parameter. Exceptions: :encoding is always set to utf-8 and :skip_blanks to true.

  • + (Array<String>) parse_words(path_to_csv, word_col)

    Parameters:

    • path_to_csv (String)

      The relative or full path to the CSV file.

    • word_col (Integer)

      The column number of the vocabulary column (counting starts at 1).

Returns:

  • (Array<String>)

    The vocabulary (Chinese words).

Raises:

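  • (ArgumentError)

    if word_col is smaller than 1 or larger than the number of columns in the first row of the CSV file.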


# File 'lib/chinese/vocab.rb', line 74

def self.parse_words(path_to_csv, word_col, options={})
  # Enforced options:
  # encoding: utf-8 (necessary for parsing Chinese characters)
  # skip_blanks: true
  options.merge!({:encoding => 'utf-8', :skip_blanks => true})
  csv = CSV.read(path_to_csv, options)

  raise ArgumentError, "Column number (#{word_col}) out of range."  unless within_range?(word_col, csv[0])
  # word_col counting starts at 1, but CSV.read returns an array,
  # where counting starts at 0.
  col = word_col-1
  csv.reduce([]) {|words, row|
    word = row[col]
    # If word_col contains no data, CSV::read returns nil.
    # We also want to skip empty strings or strings that only contain whitespace.
    words << word  unless word.nil? || word.strip.empty?
    words
  }
end
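
The column handling in the listing above can be sketched without touching the file system. The extract_column helper below is a hypothetical name, and the rows array is an assumed stand-in for what CSV.read returns (an array of row arrays); the range check mirrors within_range?.

```ruby
# Hypothetical helper: picks a 1-based column out of already-parsed rows,
# skipping cells that are nil, empty, or whitespace only.
# Assumes rows is non-empty, as the range check inspects the first row.
def extract_column(rows, word_col)
  unless word_col >= 1 && word_col <= rows[0].size
    raise ArgumentError, "Column number (#{word_col}) out of range."
  end

  col = word_col - 1 # callers count from 1, arrays count from 0
  rows.each_with_object([]) do |row, words|
    word = row[col]
    words << word unless word.nil? || word.strip.empty?
  end
end

rows = [
  ["1", "我",   "wo3"],
  ["2", nil,    "(missing)"],
  ["3", "  ",   "(blank)"],
  ["4", "看书", "kan4 shu1"]
]
p extract_column(rows, 2) # => ["我", "看书"]
```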

+ (Boolean) within_range?(column, row)

Input:

  • column: word column number (counting from 1)
  • row: array of the processed CSV data that contains our word column

Returns:

  • (Boolean)


# File 'lib/chinese/vocab.rb', line 586

def self.within_range?(column, row)
  no_of_cols = row.size
  column >= 1 && column <= no_of_cols
end

Instance Method Details

- (Object) add_key(hash_array, key, &block)



# File 'lib/chinese/vocab.rb', line 530

def add_key(hash_array, key, &block)
  hash_array.map do |row|
    if block
      row.merge({key => block.call(row)})
    else
      row
    end
  end
end

- (Object) add_target_words(hash_array)



# File 'lib/chinese/vocab.rb', line 452

def add_target_words(hash_array)
  puts "Internal: Adding target words..."
  from_queue  = Queue.new
  to_queue    = Queue.new
  # semaphore = Mutex.new
  result      = []
  words       = @words
  puts "add_target_words, words.size = #{words.size}"
  hash_array.each {|hash| from_queue << hash}

  10.times.map {
    Thread.new(words) do

      while(row = from_queue.pop!)
        sentence     = row[:chinese]
        target_words = target_words_per_sentence(sentence, words)

        to_queue << row.merge(:target_words => target_words)

      end
    end
  }.map {|thread| thread.join}

  to_queue.to_a

end

- (Object) alternate_source(sources, selection)



# File 'lib/chinese/vocab.rb', line 592

def alternate_source(sources, selection)
  sources = sources.dup
  sources.delete(selection)
  sources.pop
end

- (Boolean) contains_all_target_words?(selected_rows, sentence_key)

Returns:

  • (Boolean)


# File 'lib/chinese/vocab.rb', line 552

def contains_all_target_words?(selected_rows, sentence_key)

  matched_words = @words.reduce([]) do |acc, word|

    result = selected_rows.find do |row|
      sentence = row[sentence_key]
      include_every_char?(word, sentence)
    end

    if result
      acc << word
    end

    acc
  end

  if matched_words.size == @words.size
    true
  else
    puts "-----------------------------"
    puts "#contains_all_target_words?"
    puts "Words not found:"
    p @words - matched_words
    puts "-----------------------------"
    false
  end

  #matched_words.size == @words.size
end

- (Object) convert(text)



# File 'lib/chinese/vocab.rb', line 447

def convert(text)
  eval(text.chomp)
end

- (Object) edit_vocab(word_array)

Remove all non-word characters



# File 'lib/chinese/vocab.rb', line 347

def edit_vocab(word_array)
  puts "Editing vocabulary..."

  word_array.map {|word|
    edited = remove_parens(word)
    edited = remove_slash(edited)
    edited = remove_er_character_from_end(edited)
    distinct_words(edited).join(' ')
  }.uniq
end

- (Boolean) is_boolean?(value)

Returns:

  • (Boolean)


# File 'lib/chinese/vocab.rb', line 340

def is_boolean?(value)
  # Only true for either 'false' or 'true'
  !!value == value
end
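
The double negation in the listing above is easy to misread, so here is the same one-liner exercised on a few values. `!!value` coerces any object to true or false; the comparison then only succeeds when the original value already was that boolean.

```ruby
# Same check as in the listing above, shown stand-alone.
def is_boolean?(value)
  !!value == value
end

p is_boolean?(true)   # => true
p is_boolean?(false)  # => true
p is_boolean?(nil)    # => false
p is_boolean?("true") # => false
p is_boolean?(1)      # => false
```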

- (Object) make_hash(*data)



# File 'lib/chinese/vocab.rb', line 377

def make_hash(*data)
  require 'digest'
  data = data.reduce("") { |acc, item| acc << item.to_s }
  Digest::SHA2.hexdigest(data)[0..6]
end
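
As a rough illustration of what the listing above produces: the arguments are stringified, concatenated, and hashed, and the first seven hex digits serve as a short id. In #sentences this id doubles as the name of the recovery file. The short_id name below is hypothetical; the logic follows the listing.

```ruby
require 'digest'

# Hypothetical stand-alone copy of the id calculation shown above.
def short_id(*data)
  joined = data.reduce("") { |acc, item| acc + item.to_s }
  Digest::SHA2.hexdigest(joined)[0..6]
end

id_a = short_id(["看", "看书"], [[:size, :short]])
id_b = short_id(["看", "看书"], [[:size, :short]])

puts id_a.length  # => 7
puts id_a == id_b # => true
```

The same words and options always map to the same id, which is what lets a later run find the file written by an interrupted one.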

- (Array<Hash>, []) min_sentences(options)

Note:

In case of a network error while downloading the sentences, the data fetched so far is automatically written to a file after several retries. This data is read and processed on the next run to reduce the time spent downloading the sentences (which is by far the most time-consuming part).

For every Chinese word in #words, fetches a Chinese sentence and its English translation from an online dictionary, then calculates the minimum number of sentences necessary to cover every word in #words at least once. The calculation is based on the fact that many words occur in more than one sentence.

The return value is also stored in #stored_sentences.

Examples:

require 'chinese_vocab'

# Extract the Chinese words from a CSV file.
words = Chinese::Vocab.parse_words('path/to/file/hsk.csv', 4)

# Initialize Chinese::Vocab with the word array.
# :compact => true means that single-character words that also appear in
# multi-character words are removed from the word array (["看", "看书"] => ["看书"]).
vocabulary = Chinese::Vocab.new(words, :compact => true)

# Return the minimum necessary sentences.
sentences = vocabulary.min_sentences(:size => :short)

# See which unique characters appear in all these sentences.
vocabulary.sentences_unique_chars(sentences)
# => ["我", "们", "跟", "他", "是", "好", "朋", "友", ...]

# Save to file
vocabulary.to_csv('path/to_file/vocab_sentences.csv')

Parameters:

  • options (Hash)

    The options to customize the following features.

Options Hash (options):

  • :source (Symbol)

    The online dictionary to download the sentences from, either :nciku or :jukuu. Defaults to :nciku.

  • :size (Symbol)

    The size of the sentence to return from a possible set of several sentences. Supports the values :short, :average, and :long. Defaults to :short.

  • :with_pinyin (Boolean)

    Whether or not to return the pinyin representation of a sentence. Defaults to true.

  • :thread_count (Integer)

    The number of threads used to download the sentences. Defaults to 8.

Returns:

  • (Array<Hash>, [])

    By default each hash holds the following key-value pairs (the return value is also stored in #stored_sentences):

    • :chinese => Chinese sentence
    • :english => English translation
    • :pinyin => Pinyin
    • :uwc => Unique words count tag (String) of the form "x_words", where x denotes the number of unique words from #words found in the sentence.
    • :uws => Unique words string tag (String) of the form "[词语1,词语2,...]", where 词语 denotes the actual word(s) from #words found in the sentence.


# File 'lib/chinese/vocab.rb', line 254

def min_sentences(options = {})
  @with_pinyin = validate { :with_pinyin }
  # Always run this method.
  thread_count = validate { :thread_count }
  sentences    = sentences(options)

  puts "Calculating the minimum necessary sentences..."
  minimum_sentences = select_minimum_necessary_sentences(sentences)
  # :uwc = 'unique words count'
  with_uwc_tag      = add_key(minimum_sentences, :uwc) {|row| uwc_tag(row[:target_words]) }
  # :uws = 'unique words string'
  with_uwc_uws_tags = add_key(with_uwc_tag, :uws) do |row|
    words = row[:target_words].sort.join(', ')
    "[" + words + "]"
  end
  # Remove those keys we don't need anymore
  result            = remove_keys(with_uwc_uws_tags, :target_words, :word)
  @stored_sentences = result
  @stored_sentences
end
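
The two tags added at the end of the listing above can be illustrated on a single row. This is a sketch only: uws_tag is a hypothetical name for the inline block in the listing, and the sample row is made up.

```ruby
# :uwc tag, following #uwc_tag: "1_word" or "<n>_words".
def uwc_tag(words)
  words.size == 1 ? "1_word" : "#{words.size}_words"
end

# :uws tag (hypothetical helper name): the sorted words in brackets.
def uws_tag(words)
  "[" + words.sort.join(', ') + "]"
end

row    = { :chinese => "我们是好朋友。", :target_words => ["朋友", "我们"] }
tagged = row.merge(:uwc => uwc_tag(row[:target_words]),
                   :uws => uws_tag(row[:target_words]))

p tagged[:uwc] # => "2_words"
p tagged[:uws] # => "[我们, 朋友]"
```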

- (Object) remove_er_character_from_end(word)



# File 'lib/chinese/vocab.rb', line 359

def remove_er_character_from_end(word)
  if word.size > 2
    word.gsub(/儿$/, '')
  else # Don't remove "儿" from words like 女儿
    word
  end
end
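
A short sketch of the rule in the listing above: a trailing 儿 is dropped only from words longer than two characters, so two-character words such as 女儿 are left untouched. The strip_trailing_er name is hypothetical, and the sketch uses `sub` with `\z` instead of `gsub` with `$`, which behaves the same for single-line words.

```ruby
# Hypothetical stand-alone variant of the rule shown above.
def strip_trailing_er(word)
  word.size > 2 ? word.sub(/儿\z/, '') : word
end

p strip_trailing_er("小孩儿") # => "小孩"
p strip_trailing_er("女儿")   # => "女儿"
p strip_trailing_er("儿童节") # => "儿童节"
```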

- (Object) remove_keys(hash_array, *keys)



# File 'lib/chinese/vocab.rb', line 525

def remove_keys(hash_array, *keys)
  hash_array.map { |row| row.delete_keys(*keys) }
end

- (Object) remove_parens(word)

Helper functions



# File 'lib/chinese/vocab.rb', line 333

def remove_parens(word)
  # 1) Remove all ASCII parens and all data in between.
  # 2) Remove all Chinese parens and all data in between.
  word.gsub(/\(.*?\)/, '').gsub(/（.*?）/, '')
end

- (Object) remove_redundant_single_char_words(words)

Input: ["看", "书", "看书"]
Output: ["看书"]



# File 'lib/chinese/vocab.rb', line 386

def remove_redundant_single_char_words(words)
  puts "Removing redundant single character words from the vocabulary..."

  single_char_words, multi_char_words = words.partition {|word| word.length == 1 }
  return single_char_words  if multi_char_words.empty?

  non_redundant_single_char_words = single_char_words.reduce([]) do |acc, single_c|

    already_found = multi_char_words.find do |multi_c|
      multi_c.include?(single_c)
    end
    # Add single char word to array if it is not part of any of the multi char words.
    acc << single_c  unless already_found
    acc
  end

  non_redundant_single_char_words + multi_char_words
end
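
The effect of the listing above (and of the :compact option) can be reproduced in a few lines. The compact_words name is hypothetical; the logic follows the listing, including the early return when there are no multi-character words.

```ruby
# Hypothetical condensed version of the filtering shown above.
def compact_words(words)
  single, multi = words.partition { |word| word.length == 1 }
  return single if multi.empty?

  # Keep a single-character word only if no multi-character word contains it.
  kept = single.reject { |char| multi.any? { |word| word.include?(char) } }
  kept + multi
end

p compact_words(["看", "书", "看书", "好"]) # => ["好", "看书"]
p compact_words(["看", "书"])               # => ["看", "书"]
```

Note that the surviving single-character words come first in the result, followed by the multi-character words.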

- (Object) remove_slash(word)



# File 'lib/chinese/vocab.rb', line 368

def remove_slash(word)
  if word.match(/\//)
    word.split(/\//).sort_by { |w| w.size }.last
  else
    word
  end
end
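
To illustrate the listing above: when an entry lists alternatives separated by a slash, only the longest alternative is kept. The longest_alternative name and the sample entries are made up for this sketch.

```ruby
# Hypothetical stand-alone copy of the slash handling shown above.
def longest_alternative(word)
  return word unless word.include?("/")
  word.split("/").sort_by { |w| w.size }.last
end

p longest_alternative("有点/有点儿") # => "有点儿"
p longest_alternative("朋友")         # => "朋友"
```

If two alternatives have the same length, which one wins depends on the sort and should not be relied upon.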

- (Object) select_minimum_necessary_sentences(sentences)



# File 'lib/chinese/vocab.rb', line 499

def select_minimum_necessary_sentences(sentences)
  with_target_words = add_target_words(sentences)
  rows              = sort_by_target_word_count(with_target_words)

  selected_rows   = []
  unmatched_words = @words.dup
  matched_words   = []

  rows.each do |row|
    words = row[:target_words].dup
    # Delete all words from 'words' that have already been encountered
    # (and are included in 'matched_words').
    words = words - matched_words

    if words.size > 0  # Words that were not deleted above have to be part of 'unmatched_words'.
      selected_rows << row  # Select this row.

      # When a row is selected, its 'words' are no longer unmatched but matched.
      unmatched_words = unmatched_words - words
      matched_words   = matched_words + words
    end
  end
  selected_rows
end
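
The selection in the listing above is a greedy pass: rows are visited from the most target words to the fewest (see #sort_by_target_word_count), and a row is kept only if it contributes at least one word not yet covered. The sketch below condenses this into one hypothetical method, select_minimum, and runs it on made-up rows that already carry :target_words.

```ruby
# Hypothetical condensed version of the greedy selection shown above.
def select_minimum(rows)
  # Most target words first; on equal counts, shorter sentences first.
  sorted = rows.sort_by { |row| [-row[:target_words].size, row[:chinese].size] }

  matched = []
  sorted.each_with_object([]) do |row, selected|
    new_words = row[:target_words] - matched
    next if new_words.empty? # everything in this row is already covered

    selected << row
    matched += new_words
  end
end

rows = [
  { :chinese => "我看书。",       :target_words => ["我", "看书"] },
  { :chinese => "我们是好朋友。", :target_words => ["我们", "朋友", "好"] },
  { :chinese => "你好。",         :target_words => ["好"] },
  { :chinese => "他也看书。",     :target_words => ["他", "看书"] }
]

p select_minimum(rows).map { |row| row[:chinese] }
# => ["我们是好朋友。", "我看书。", "他也看书。"]
```

A greedy pass like this yields a small covering set, though not necessarily the smallest possible one.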

- (Object) select_sentence(word, options)

Uses options passed from #sentences



# File 'lib/chinese/vocab.rb', line 407

def select_sentence(word, options)
  sentence_pair = Scraper.sentence(word, options)

  sources = Scraper::Sources.keys
  sentence_pair = try_alternate_download_sources(sources, word, options)  if sentence_pair.empty?

  if sentence_pair.empty?
    @not_found << word
    return nil
  else
    chinese, english = sentence_pair

    result = Hash.new
    result.merge!(word:    word)
    result.merge!(chinese: chinese)
    result.merge!(pinyin:  chinese.to_pinyin)  if @with_pinyin
    result.merge!(english: english)
  end
end

- (Array<Hash>) sentences(options)

Note:

Normally you only call this method directly if you really need one sentence per Chinese word (even if these words might appear in more than one of the sentences).

Note:

In case of a network error while downloading the sentences, the data fetched so far is automatically written to a file after several retries. This data is read and processed on the next run to reduce the time spent downloading the sentences (which is by far the most time-consuming part).

For every Chinese word in #words, fetches a Chinese sentence and its English translation from an online dictionary. The return value is also stored in #stored_sentences.

Examples:

require 'chinese_vocab'

# Extract the Chinese words from a CSV file.
words = Chinese::Vocab.parse_words('path/to/file/hsk.csv', 4)

# Initialize Chinese::Vocab with the word array.
# :compact => true means that single-character words that also appear in
# multi-character words are removed from the word array (["看", "看书"] => ["看书"]).
vocabulary = Chinese::Vocab.new(words, :compact => true)

# Return a sentence for each word.
vocabulary.sentences(:size => :short)

Parameters:

  • options (Hash)

    The options to customize the following features.

Options Hash (options):

  • :source (Symbol)

    The online dictionary to download the sentences from, either :nciku or :jukuu. Defaults to :nciku.

  • :size (Symbol)

    The size of the sentence to return from a possible set of several sentences. Supports the values :short, :average, and :long. Defaults to :short.

  • :with_pinyin (Boolean)

    Whether or not to return the pinyin representation of a sentence. Defaults to true.

  • :thread_count (Integer)

    The number of threads used to download the sentences. Defaults to 8.

Returns:

  • (Array<Hash>)

    By default each hash holds the following key-value pairs (the return value is also stored in #stored_sentences):

    • :chinese => Chinese sentence
    • :english => English translation
    • :pinyin => Pinyin


# File 'lib/chinese/vocab.rb', line 132

def sentences(options={})
  puts "Fetching sentences..."
  # Always run this method.

  # We assign all options to a variable here (also those that are passed on)
  # as we need them in order to calculate the id.
  @with_pinyin = validate { :with_pinyin }
  thread_count = validate { :thread_count }
  id           = make_hash(@words, options.to_a.sort)
  words        = @words

  from_queue  = Queue.new
  to_queue    = Queue.new
  file_name   = id

  if File.exist?(file_name)
    puts "examining file"
    words, sentences, not_found = File.open(file_name) { |f| f.readlines }
    words = convert(words)
    convert(sentences).each { |s| to_queue << s }
    @not_found = convert(not_found)
    puts "Size(@not_found)  = #{@not_found.size}"
    size_a = words.size
    size_b = to_queue.size
    puts "Size(words)       = #{size_a}"
    puts "Size(to_queue)    = #{size_b}"
    puts "Size(words+queue) = #{size_a+size_b}"

    # Remove file
    File.unlink(file_name)
  end

  words.each {|word| from_queue << word }
  result = []

  Thread.abort_on_exception = false

  1.upto(thread_count).map {
    Thread.new do

      while(word = from_queue.pop!) do

        begin
          local_result = select_sentence(word, options)
          puts "word: #{word}"
          # rescue SocketError, Timeout::Error, Errno::ETIMEDOUT,
          # Errno::ECONNREFUSED, Errno::ECONNRESET, EOFError => e
        rescue Exception => e
          puts " #{e.message}."
          puts "Please DO NOT abort the program but wait for all threads to terminate."
          puts "Number of running threads: #{Thread.list.size - 1}."
          puts "On termination of all threads, the data will be saved to disk for fast retrieval on the next run of the program."
          raise

        ensure
          from_queue << word  if $!
          puts "Wrote '#{word}' to 'from_queue'"  if $!
        end

        to_queue << local_result  unless local_result.nil?

      end
    end
  }.each {|thread| thread.join }

  @stored_sentences = to_queue.to_a
  @stored_sentences

ensure
  if $!
    while(Thread.list.size > 1) do # Wait for all child threads to terminate.
      sleep 5
    end

    File.open(file_name, 'w') do |f|
      p "============================="
      p "Writing data to file..."
      f.write from_queue.to_a
      f.puts
      f.write to_queue.to_a
      f.puts
      f.write @not_found
      puts "Finished writing data."
      puts "Please run the program again after solving the (connection) problem."
    end
  end
end
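
The core of the listing above is a pool of worker threads that drain one queue and fill another. The sketch below shows that pattern with Ruby's core Queue only. It is not the library's code: `pop!` and `to_a` in the listing are not core Queue methods and are presumably supplied by the library's own extensions, so the sketch uses `Queue#close` and `Queue#pop` instead. The fetch_all name and the lookup block (a stand-in for the network download) are hypothetical, and the retry and save-to-file logic is left out.

```ruby
# Hypothetical worker-pool sketch: each thread takes words from from_queue
# and pushes the non-nil lookup results to to_queue.
def fetch_all(words, thread_count, &lookup)
  from_queue = Queue.new
  to_queue   = Queue.new
  words.each { |word| from_queue << word }
  from_queue.close # once closed and empty, pop returns nil and workers stop

  workers = Array.new(thread_count) do
    Thread.new do
      while (word = from_queue.pop)
        result = lookup.call(word)
        to_queue << result unless result.nil?
      end
    end
  end
  workers.each(&:join)

  # Drain the result queue into a plain array.
  results = []
  results << to_queue.pop until to_queue.empty?
  results
end

sentences = fetch_all(["我", "你", "他"], 2) do |word|
  { :word => word, :chinese => "#{word}很好。" }
end
p sentences.size # => 3
```

Because the threads finish in no fixed order, the order of the results is not guaranteed.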

- (Array<String>) sentences_unique_chars(sentences)

Note:

If no argument is passed, the data from #stored_sentences is used as input.

Finds the unique Chinese characters from either the data in #stored_sentences or an array of Chinese sentences passed as an argument.

Examples:

require 'chinese_vocab'

# Extract the Chinese words from a CSV file.
words = Chinese::Vocab.parse_words('path/to/file/hsk.csv', 4)

# Initialize Chinese::Vocab with the word array.
# :compact => true means that single-character words that also appear in
# multi-character words are removed from the word array (["看", "看书"] => ["看书"]).
vocabulary = Chinese::Vocab.new(words, :compact => true)

# Return the minimum necessary sentences.
sentences = vocabulary.min_sentences(:size => :short)

# See which unique characters appear in all these sentences.
vocabulary.sentences_unique_chars(sentences)
# => ["我", "们", "跟", "他", "是", "好", "朋", "友", ...]

# Save to file
vocabulary.to_csv('path/to_file/vocab_sentences.csv')

Parameters:

  • sentences (Array<String>, Array<Hash>)

    An array of Chinese sentences or an array of hashes with the key :chinese.

Returns:

  • (Array<String>)

    The unique Chinese characters



# File 'lib/chinese/vocab.rb', line 302

def sentences_unique_chars(sentences = stored_sentences)
  # If the argument is an array of hashes, then it must be the data from @stored_sentences
  sentences = sentences.map { |hash| hash[:chinese] }  if sentences[0].kind_of?(Hash)

  sentences.reduce([]) do |acc, row|
    acc = acc | row.scan(/\p{Word}/) # only return characters, skip punctuation marks
    acc
  end
end
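
The listing above relies on two small tricks: `scan(/\p{Word}/)` keeps word characters and skips punctuation, and the array union operator `|` appends only characters not seen before. A stand-alone sketch, with unique_chars as a hypothetical name and made-up sentences:

```ruby
# Hypothetical stand-alone copy of the character collection shown above.
def unique_chars(sentences)
  # Accept either plain strings or hashes that carry a :chinese key.
  sentences = sentences.map { |row| row[:chinese] } if sentences[0].kind_of?(Hash)
  sentences.reduce([]) { |acc, sentence| acc | sentence.scan(/\p{Word}/) }
end

p unique_chars(["我们好。", "你好！"])       # => ["我", "们", "好", "你"]
p unique_chars([{ :chinese => "好朋友" }]) # => ["好", "朋", "友"]
```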

- (Object) sort_by_target_word_count(with_target_words)



# File 'lib/chinese/vocab.rb', line 485

def sort_by_target_word_count(with_target_words)

  # First sort by size of unique word array (from large to short)
  # If the unique word count is equal, sort by the length of the sentence (from small to large)
  with_target_words.sort_by {|row|
    [-row[:target_words].size, row[:chinese].size] }

    #  The above is the same as:
    #   with_target_words.sort {|a,b|
    #     first = -(a[:target_words].size <=> b[:target_words].size)
    #     first.nonzero? || (a[:chinese].size <=> b[:chinese].size) }
end

- (Object) target_words_per_sentence(sentence, words)



# File 'lib/chinese/vocab.rb', line 480

def target_words_per_sentence(sentence, words)
   words.select {|w| include_every_char?(w, sentence) }
end

- to_csv(path_to_file, options) - to_csv(path_to_file)

This method returns an undefined value.

Saves the data stored in #stored_sentences to disk.

Examples:

require 'chinese_vocab'

# Extract the Chinese words from a CSV file.
words = Chinese::Vocab.parse_words('path/to/file/hsk.csv', 4)

# Initialize Chinese::Vocab with the word array.
# :compact => true means that single-character words that also appear in
# multi-character words are removed from the word array (["看", "看书"] => ["看书"]).
vocabulary = Chinese::Vocab.new(words, :compact => true)

# Return the minimum necessary sentences.
sentences = vocabulary.min_sentences(:size => :short)

# See which unique characters appear in all these sentences.
vocabulary.sentences_unique_chars(sentences)
# => ["我", "们", "跟", "他", "是", "好", "朋", "友", ...]

# Save to file
vocabulary.to_csv('path/to_file/vocab_sentences.csv')

Overloads:

  • - to_csv(path_to_file, options)

    Parameters:

    • path_to_file (String)

      The file name and path where the file is saved.

    • options (Hash)

      The supported options of Ruby's CSV library.

  • - to_csv(path_to_file)

    Parameters:

    • path_to_file (String)

      The file name and path where the file is saved.



# File 'lib/chinese/vocab.rb', line 321

def to_csv(path_to_file, options = {})

  CSV.open(path_to_file, "w", options) do |csv|
    @stored_sentences.each do |row|
      csv << row.values
    end
  end
end

- (Object) try_alternate_download_sources(alternate_sources, word, options)



# File 'lib/chinese/vocab.rb', line 428

def try_alternate_download_sources(alternate_sources, word, options)
  sources = alternate_sources.dup
  sources.delete(options[:source])

  result = sources.find do |s|
    options  = options.merge(:source => s)
    sentence = Scraper.sentence(word, options)
    sentence.empty? ? nil : sentence
  end

  if result
    options = options.merge(:source => result)
    Scraper.sentence(word, options)
  else
    []
  end
end
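
The fallback idea in the listing above, reduced to its essentials: skip the source that has already failed and return the first non-empty result from the remaining ones. The try_alternates name and the fake_lookup table are hypothetical stand-ins for the method above and for Scraper.sentence; unlike the listing, the sketch returns the first hit directly instead of fetching it a second time.

```ruby
# Hypothetical sketch: the block stands in for the download from one source
# and is expected to return [] when nothing was found.
def try_alternates(sources, failed_source)
  (sources - [failed_source]).each do |source|
    sentence_pair = yield(source)
    return sentence_pair unless sentence_pair.empty?
  end
  []
end

fake_lookup = { :nciku => [], :jukuu => ["我看书。", "I read a book."] }
p try_alternates([:nciku, :jukuu], :nciku) { |source| fake_lookup[source] }
# => ["我看书。", "I read a book."]
```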

- (Object) uwc_tag(string)



# File 'lib/chinese/vocab.rb', line 541

def uwc_tag(string)
  size = string.length
  case size
  when 1
    "1_word"
  else
    "#{size}_words"
  end
end