Class: Ebooks::SuffixGenerator

Inherits:
Object
Defined in:
lib/moo_ebooks/suffix.rb

Overview

This generator uses data similar to a Markov model, but instead of building a chain by looking up bigrams, it uses token positions to randomly replace the suffix of one sentence's token array with a matching suffix from another sentence.
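
As a rough illustration of the suffix replacement described above, the following minimal sketch splices the tail of one token array onto the head of another at the first token they share. The helper name splice_suffix is hypothetical and is not part of moo_ebooks; the real generator picks the splice site at random from its position tables.

```ruby
# Hypothetical helper (not part of moo_ebooks): keep `base` up to and
# including the first token that also occurs in `donor`, then continue
# with whatever follows that token in `donor`.
def splice_suffix(base, donor)
  base.each_with_index do |tiki, i|
    j = donor.index(tiki)
    next if j.nil?

    return base[0..i] + donor[(j + 1)..-1]
  end
  nil # the two sentences share no token
end

a = [1, 2, 3, 4]
b = [7, 3, 8, 9]
spliced = splice_suffix(a, b) # shared token is 3
```

Here the two arrays meet at token 3, so the result keeps the prefix of a and the suffix of b.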

Constructor Details

#initialize(sentences) ⇒ SuffixGenerator

Returns a new instance of SuffixGenerator.


# File 'lib/moo_ebooks/suffix.rb', line 20

def initialize(sentences)
  @sentences = sentences.reject(&:empty?)
  @unigrams = {}
  @bigrams = {}

  @sentences.each_with_index do |tikis, i|
    last_tiki = INTERIM
    tikis.each_with_index do |tiki, j|
      @unigrams[last_tiki] ||= []
      @unigrams[last_tiki] << [i, j]

      @bigrams[last_tiki] ||= {}
      @bigrams[last_tiki][tiki] ||= []

      if j == tikis.length - 1 # Mark sentence endings
        @unigrams[tiki] ||= []
        @unigrams[tiki] << [i, INTERIM]
        @bigrams[last_tiki][tiki] << [i, INTERIM]
      else
        @bigrams[last_tiki][tiki] << [i, j + 1]
      end

      last_tiki = tiki
    end
  end
end
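
To make the shape of the two lookup tables concrete, here is a standalone sketch that mirrors the constructor's loop on a two-sentence corpus. The function name build_tables is hypothetical, and the value :interim for INTERIM is an assumption, since the constant's definition is not shown on this page.

```ruby
INTERIM = :interim # assumed sentinel value; the real constant is defined elsewhere

# Hypothetical standalone version of the constructor's indexing loop.
# unigrams[t]    => [sentence index, position of the token that follows t]
# bigrams[t][u]  => [sentence index, position of the token that follows "t u"]
def build_tables(sentences)
  unigrams = {}
  bigrams = {}
  sentences.each_with_index do |tikis, i|
    last = INTERIM
    tikis.each_with_index do |tiki, j|
      (unigrams[last] ||= []) << [i, j]
      sites = ((bigrams[last] ||= {})[tiki] ||= [])
      if j == tikis.length - 1 # sentence ending
        (unigrams[tiki] ||= []) << [i, INTERIM]
        sites << [i, INTERIM]
      else
        sites << [i, j + 1]
      end
      last = tiki
    end
  end
  [unigrams, bigrams]
end

unigrams, bigrams = build_tables([[10, 11], [10, 12]])
# unigrams[10]      is [[0, 1], [1, 1]]: token 10 is followed by position 1 in both sentences
# unigrams[INTERIM] is [[0, 0], [1, 0]]: both sentences start at position 0
# bigrams[10][12]   is [[1, INTERIM]]:   the pair "10 12" ends sentence 1
```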

Class Method Details

.build(sentences) ⇒ SuffixGenerator

Build a generator from a corpus of tikified sentences. “Tikis” are token indexes: a way of representing words and punctuation as their integer positions in a big array of such tokens.

Parameters:

  • sentences (Array<Array<Integer>>)

Returns:

  • (SuffixGenerator)


# File 'lib/moo_ebooks/suffix.rb', line 16

def self.build(sentences)
  SuffixGenerator.new(sentences)
end
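
For readers unfamiliar with the tiki representation, the sketch below shows one possible way to turn tokenized sentences into the Array<Array<Integer>> shape that build expects. The names tokens, lookup, and tikify are illustrative assumptions; the library's own tokenization and token table are not shown on this page and may differ.

```ruby
tokens = [] # the "big array" of distinct tokens
lookup = {} # token => its index in `tokens`

# Hypothetical tikify step: return the token's index, adding it on first sight.
tikify = lambda do |word|
  lookup[word] ||= begin
    tokens << word
    tokens.length - 1
  end
end

sentences = [%w[the cat sat .], %w[the dog sat .]].map { |s| s.map(&tikify) }
# sentences is now [[0, 1, 2, 3], [0, 4, 2, 3]]

# Decoding reverses the lookup.
decoded = sentences[1].map { |t| tokens[t] }.join(" ")
```

An array like sentences above is what would be passed to SuffixGenerator.build.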

Instance Method Details

#generate(passes = 5, gram = :unigrams) ⇒ Array<Integer>

Generate a recombined sequence of tikis.

Parameters:

  • passes (Integer) (defaults to: 5)

    number of times to recombine

  • gram (Symbol) (defaults to: :unigrams)

    :unigrams or :bigrams (affects how conservative the model is)

Returns:

  • (Array<Integer>)

# File 'lib/moo_ebooks/suffix.rb', line 52

def generate(passes = 5, gram = :unigrams)
  index = rand(@sentences.length)
  tikis = @sentences[index]
  used = [index] # Sentences we've already used
  verbatim = [tikis] # Verbatim sentences to avoid reproducing

  passes.times do
    # Map bigram start site => next tiki alternatives
    varsites = make_varsites(tikis, gram, used)

    variant, verbatim, used = make_variant(tikis, varsites, verbatim, used)

    # If we failed to produce a variation from any alternative, there
    # is no use running additional passes-- they'll have the same result.
    break if variant.nil?

    tikis = variant
  end

  tikis
end
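
The private helpers make_varsites and make_variant are not shown on this page. The sketch below is a simplified stand-in, not the library's implementation: it scans sentences directly instead of using the position tables, ignores the gram option, and treats "verbatim" as exact equality. It is meant only to show the control flow of the pass loop, including the early exit when no new variant can be produced. The names recombine_once and generate_sketch are hypothetical.

```ruby
# Hypothetical stand-in for make_varsites/make_variant: collect every
# variant formed by keeping a prefix of `tikis` and appending the tail of
# an unused sentence that shares a token, then pick one at random.
def recombine_once(tikis, sentences, used, verbatim, rng)
  candidates = []
  tikis.each_with_index do |tiki, i|
    sentences.each_with_index do |other, k|
      next if used.include?(k)

      j = other.index(tiki)
      next if j.nil?

      variant = tikis[0..i] + other[(j + 1)..-1]
      candidates << [variant, k] unless verbatim.include?(variant)
    end
  end
  candidates.sample(random: rng) # nil when there are no candidates
end

def generate_sketch(sentences, passes = 5, rng = Random.new)
  index = rng.rand(sentences.length)
  tikis = sentences[index]
  used = [index]     # sentences already drawn from
  verbatim = [tikis] # sentences we must not reproduce exactly

  passes.times do
    variant, source = recombine_once(tikis, sentences, used, verbatim, rng)
    break if variant.nil? # further passes would fail the same way

    used << source
    verbatim << sentences[source]
    tikis = variant
  end

  tikis
end

corpus = [[0, 1, 2, 3], [4, 2, 5, 6], [7, 5, 8]]
result = generate_sketch(corpus, 5, Random.new(42))
```

Because each pass keeps at least the first token of the current sequence, the result always begins with the first token of the randomly chosen starting sentence.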