Class: Myaso::Tagger::Model

Inherits:
Object
Defined in:
lib/myaso/tagger/model.rb

Overview

Any HMM tagger requires a trained model that can perform tasks such as producing smoothed q() and e() values and replacing unknown words with special symbols.

Direct Known Subclasses

TnT

Constructor Details

#initialize(interpolations = nil) ⇒ Model

A tagging model requires n-grams and a lexicon.

The interpolations vector can be passed in when its values are known. If the interpolations need to be recomputed, pass nil (the default behavior). If there should be no interpolations, pass false; in the source shown here this selects the fixed weights [0.33, 0.33, 0.33]. Any other value is stored as given, so the weights can be set explicitly.


# File 'lib/myaso/tagger/model.rb', line 18

def initialize(interpolations = nil)
  @ngrams, @lexicon = Myaso::Ngrams.new, Myaso::Lexicon.new
  @interpolations = if interpolations == false
    [0.33, 0.33, 0.33]
  elsif interpolations.nil?
    nil
  else
    interpolations
  end
  learn!
end
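
The three ways of passing interpolations can be illustrated without loading Myaso. The sketch below is a minimal stand-in: resolve_interpolations is a hypothetical helper that only mirrors the branch in the constructor, and the comment about nil being filled in later is an assumption based on the description above, not on the training code.

```ruby
# Hypothetical helper mirroring the branch in Model#initialize.
# It is not part of Myaso itself.
def resolve_interpolations(interpolations = nil)
  case interpolations
  when false then [0.33, 0.33, 0.33] # fixed, equal weights
  when nil   then nil                # assumed: left for the model to compute
  else interpolations                # caller-supplied weights, stored as-is
  end
end

resolve_interpolations(false)           # => [0.33, 0.33, 0.33]
resolve_interpolations                  # => nil
resolve_interpolations([0.1, 0.3, 0.6]) # => [0.1, 0.3, 0.6]
```

Note that false and nil are handled separately, so a caller cannot use nil to mean "no interpolations".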

Instance Attribute Details

#interpolations ⇒ Object (readonly)

Returns the value of attribute interpolations


# File 'lib/myaso/tagger/model.rb', line 8

def interpolations
  @interpolations
end

#lexicon ⇒ Object (readonly)

Returns the value of attribute lexicon


# File 'lib/myaso/tagger/model.rb', line 8

def lexicon
  @lexicon
end

#ngrams ⇒ Object (readonly)

Returns the value of attribute ngrams


# File 'lib/myaso/tagger/model.rb', line 8

def ngrams
  @ngrams
end

Instance Method Details

#conditional(ab, b) ⇒ Object

Conditional probability p(A|B) = p(A, B) / p(B), computed from the counts ab and b. Returns 0.0 when the denominator is zero.


# File 'lib/myaso/tagger/model.rb', line 64

def conditional(ab, b)
  return 0.0 if b.zero?
  ab / b.to_f
end
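
A quick way to see both branches is to call the method with small integer counts. The block below is a standalone copy of the method so that it runs without Myaso; the sample counts are made up for illustration.

```ruby
# Standalone copy of the method shown above, so it runs without Myaso.
def conditional(ab, b)
  return 0.0 if b.zero?
  ab / b.to_f
end

conditional(3, 4) # => 0.75
conditional(3, 0) # => 0.0 (guarded, no division by zero)
```

The to_f call matters: with two Integer counts, 3 / 4 would otherwise be integer division and yield 0.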

#e(word, tag) ⇒ Object

The emission function e in the Viterbi algorithm. It computes the probability of generating the given word with the given tag, relative to all words with that tag.


# File 'lib/myaso/tagger/model.rb', line 50

def e(word, tag)
  conditional(lexicon[word, tag], ngrams[tag])
end
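
As a rough sketch of the same ratio, the example below replaces the lexicon and the n-grams with plain hashes. The hash layouts, the sample counts, and the name emission are assumptions made for this illustration; they are not Myaso's data structures.

```ruby
# Toy counts standing in for Myaso::Lexicon and Myaso::Ngrams (assumed shapes).
WORD_TAG_COUNTS = { ['cat', 'NOUN'] => 2, ['runs', 'VERB'] => 1 }
TAG_COUNTS = { 'NOUN' => 8, 'VERB' => 4 }

def conditional(ab, b)
  return 0.0 if b.zero?
  ab / b.to_f
end

# Hypothetical equivalent of e(word, tag): count(word, tag) / count(tag).
def emission(word, tag)
  conditional(WORD_TAG_COUNTS.fetch([word, tag], 0), TAG_COUNTS.fetch(tag, 0))
end

emission('cat', 'NOUN') # => 0.25
emission('cat', 'VERB') # => 0.0 (pair never seen)
emission('cat', 'ADJ')  # => 0.0 (unseen tag hits the zero-denominator guard)
```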

#q(first, second, third) ⇒ Object

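Linear interpolation model for the probability of occurrence of the trigram (first, second, third). It is a weighted sum of three terms: the probability that the current tag is third given that the previous two tags are (first, second), the probability that the current tag is third given that the previous tag is second, and the unconditional probability that the current tag is third. In the code, the weights are applied in the opposite order: interpolations[0] scales the unigram term, interpolations[1] the bigram term, and interpolations[2] the trigram term.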


# File 'lib/myaso/tagger/model.rb', line 38

def q(first, second, third)
  q1 = conditional(ngrams[third], ngrams.unigrams_count)
  q2 = conditional(ngrams[second, third], ngrams[second])
  q3 = conditional(ngrams[first, second, third], ngrams[first, second])

  q1 * interpolations[0] + q2 * interpolations[1] + q3 * interpolations[2]
end
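
To make the weighted sum concrete, here is a small self-contained sketch with invented tag counts. The COUNTS layout, UNIGRAMS_TOTAL (standing in for ngrams.unigrams_count, assumed to be the total number of tag unigrams), the WEIGHTS values, and the ngram_count helper are all assumptions for this example.

```ruby
# Toy tag n-gram counts; keys are arrays of tags (an assumed layout).
COUNTS = {
  ['D'] => 4, ['N'] => 4, ['V'] => 2,
  ['D', 'N'] => 4, ['N', 'V'] => 2,
  ['D', 'N', 'V'] => 2
}
UNIGRAMS_TOTAL = 10        # stands in for ngrams.unigrams_count
WEIGHTS = [0.2, 0.3, 0.5]  # unigram, bigram, trigram weights (illustrative)

def conditional(ab, b)
  return 0.0 if b.zero?
  ab / b.to_f
end

# Hypothetical lookup helper: unseen n-grams count as zero.
def ngram_count(*tags)
  COUNTS.fetch(tags, 0)
end

def q(first, second, third)
  q1 = conditional(ngram_count(third), UNIGRAMS_TOTAL)
  q2 = conditional(ngram_count(second, third), ngram_count(second))
  q3 = conditional(ngram_count(first, second, third), ngram_count(first, second))

  q1 * WEIGHTS[0] + q2 * WEIGHTS[1] + q3 * WEIGHTS[2]
end

# q1 = 2/10, q2 = 2/4, q3 = 2/4, so the result is about
# 0.2 * 0.2 + 0.5 * 0.3 + 0.5 * 0.5 = 0.44
puts q('D', 'N', 'V')
```

Because every term goes through conditional, a trigram made of unseen tags evaluates to 0.0 instead of raising on a zero denominator.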

#rare?(word) ⇒ Boolean

If a word is rare, then it should be replaced while preparing the training set, so it cannot appear in the training set itself. A word counts as rare when its lexicon count is at most 1.

Returns:

  • (Boolean)

# File 'lib/myaso/tagger/model.rb', line 57

def rare?(word)
  lexicon[word] <= 1
end
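
The replacement step mentioned above could look roughly like the following. This is only a sketch: WORD_COUNTS stands in for the lexicon, replace_rare is a hypothetical helper, and '_RARE_' is a placeholder symbol chosen for this example rather than the symbol Myaso uses.

```ruby
# Toy word counts standing in for Myaso::Lexicon (assumed shape).
WORD_COUNTS = { 'the' => 5, 'cat' => 2, 'zyzzyva' => 1 }

# Same threshold as the method above: seen at most once.
def rare?(word)
  WORD_COUNTS.fetch(word, 0) <= 1
end

# Hypothetical helper; '_RARE_' is a placeholder chosen for this sketch only.
def replace_rare(words)
  words.map { |word| rare?(word) ? '_RARE_' : word }
end

replace_rare(%w[the cat saw zyzzyva])
# => ["the", "cat", "_RARE_", "_RARE_"]
```

Words that were never seen have a count of zero here, so they are treated the same way as words seen exactly once.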