Module: Entropy

Included in:
Discretizer, FSelector::CFS_d, FSelector::FastCorrelationBasedFilter, FSelector::INTERACT, FSelector::InformationGain, FSelector::KS_CCBF, FSelector::SymmetricalUncertainty
Defined in:
lib/fselector/entropy.rb

Overview

entropy-related functions for discrete data

ref: Wikipedia

Instance Method Summary

Instance Method Details

- (Float) get_conditional_entropy(vecX, vecY)

Note:

vecX and vecY must be of the same length

get the conditional entropy of vector (X) given another vector (Y)

H(X|Y) = sigma_j (P(y_j) * H(X|y_j))

where H(X|y_j) = -1 * sigma_i (P(x_i|y_j) log2 P(x_i|y_j))

Parameters:

  • vecX (Array)

    the first vector

  • vecY (Array)

    the second vector

Returns:

  • (Float)

    H(X|Y)



# File 'lib/fselector/entropy.rb', line 38

def get_conditional_entropy(vecX, vecY)
  abort "[#{__FILE__}@#{__LINE__}]: \n"+
        "  two vectors must be of the same length" unless vecX.size == vecY.size

  hxy = 0.0
  n = vecX.size.to_f

  vecY.uniq.each do |y_j|
    p1 = vecY.count(y_j)/n # P(y_j)

    # values of X at the positions where Y == y_j
    indices = (0...vecX.size).to_a.select { |k| vecY[k] == y_j }
    xvs = vecX.values_at(*indices)
    m = xvs.size.to_f

    xvs.uniq.each do |x_i|
      p2 = xvs.count(x_i)/m # P(x_i|y_j)

      hxy += -1.0 * p1 * (p2 * Math.log2(p2))
    end
  end

  hxy
end
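As a sanity check on the formula above, here is a minimal standalone sketch (the helper name cond_entropy is illustrative, not the library API) that applies H(X|Y) = sigma_j (P(y_j) * H(X|y_j)) to two small vectors:

```ruby
# standalone sketch of H(X|Y); mirrors the formula, not the library code
def cond_entropy(vecX, vecY)
  n = vecX.size.to_f
  vecY.uniq.sum do |y_j|
    p1 = vecY.count(y_j) / n                       # P(y_j)
    xvs = vecX.each_index.select { |k| vecY[k] == y_j }
              .map { |k| vecX[k] }                 # X values where Y == y_j
    m = xvs.size.to_f
    p1 * xvs.uniq.sum { |x_i| p2 = xvs.count(x_i) / m; -p2 * Math.log2(p2) }
  end
end

cond_entropy([0, 0, 1, 1], [0, 0, 1, 1])  # X fully determined by Y => 0.0
cond_entropy([0, 1, 0, 1], [0, 0, 1, 1])  # X independent of Y => H(X) = 1.0
```

When X is fully determined by Y, knowing Y leaves no uncertainty about X, so H(X|Y) drops to zero; when the two are independent, conditioning on Y changes nothing and H(X|Y) equals H(X).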

- (Float) get_information_gain(vecX, vecY)

Note:

vecX and vecY must be of the same length

get the information gain of vector (X) given another vector (Y)

IG(X;Y) = H(X) - H(X|Y)
        = H(Y) - H(Y|X) = IG(Y;X)

Parameters:

  • vecX (Array)

    the first vector

  • vecY (Array)

    the second vector

Returns:

  • (Float)

    IG(X;Y)



# File 'lib/fselector/entropy.rb', line 92

def get_information_gain(vecX, vecY)
  get_marginal_entropy(vecX) - get_conditional_entropy(vecX, vecY)
end
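The symmetry IG(X;Y) = IG(Y;X) claimed above can be checked numerically. In this sketch, h and h_cond are illustrative stand-ins that mirror the marginal and conditional entropy formulas from this module, not the library's methods:

```ruby
# illustrative stand-ins mirroring this module's formulas
def h(v)  # marginal entropy H(V)
  n = v.size.to_f
  v.uniq.sum { |x| p = v.count(x) / n; -p * Math.log2(p) }
end

def h_cond(x, y)  # conditional entropy H(X|Y)
  n = x.size.to_f
  y.uniq.sum do |y_j|
    idx = y.each_index.select { |k| y[k] == y_j }
    (idx.size / n) * h(x.values_at(*idx))  # P(y_j) * H(X|y_j)
  end
end

ig = ->(x, y) { h(x) - h_cond(x, y) }  # IG(X;Y) = H(X) - H(X|Y)

x = [0, 0, 1, 1]
y = [0, 1, 1, 1]
ig.call(x, y)  # ~0.3113
ig.call(y, x)  # same value: IG is symmetric in its arguments
```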

- (Float) get_joint_entropy(vecX, vecY)

Note:

vecX and vecY must be of the same length

get the joint entropy of vector (X) and vector (Y)

H(X,Y) = H(Y) + H(X|Y)
       = H(X) + H(Y|X)

i.e. H(X,Y) == H(Y,X)

Parameters:

  • vecX (Array)

    the first vector

  • vecY (Array)

    the second vector

Returns:

  • (Float)

    H(X,Y)



# File 'lib/fselector/entropy.rb', line 76

def get_joint_entropy(vecX, vecY)
  get_marginal_entropy(vecY) + get_conditional_entropy(vecX, vecY)
end
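The joint entropy can also be computed directly as the marginal entropy of the (x_i, y_i) value pairs, which gives an independent check on the H(X,Y) = H(Y) + H(X|Y) identity. A minimal sketch (the helper h is illustrative, not the library API):

```ruby
# marginal entropy, mirroring this module's formula
def h(v)
  n = v.size.to_f
  v.uniq.sum { |x| p = v.count(x) / n; -p * Math.log2(p) }
end

x = [0, 0, 1, 1]
y = [0, 1, 1, 1]
# zipping gives the pairs (0,0), (0,1), (1,1), (1,1),
# with probabilities 0.25, 0.25, 0.5 => H(X,Y) = 1.5 bits
h(x.zip(y))
```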

- (Float) get_marginal_entropy(vecX)

get the marginal entropy of vector (X)

H(X) = -1 * sigma_i (P(x_i) log2 P(x_i))

Parameters:

  • vecX (Array)

    vector of interest

Returns:

  • (Float)

    H(X)



# File 'lib/fselector/entropy.rb', line 14

def get_marginal_entropy(vecX)
  h = 0.0
  n = vecX.size.to_f

  vecX.uniq.each do |x_i|
    p = vecX.count(x_i)/n
    h += -1.0 * (p * Math.log2(p))
  end

  h
end
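A worked example of the H(X) formula above: a vector with two equiprobable values carries exactly one bit of entropy. This is a standalone inline computation, not a call into the library:

```ruby
# H(X) = -sigma_i (P(x_i) log2 P(x_i)), computed inline
vecX = ['a', 'a', 'b', 'b']
n = vecX.size.to_f
h = vecX.uniq.sum { |x_i| p = vecX.count(x_i) / n; -p * Math.log2(p) }
# P('a') = P('b') = 0.5 => h == 1.0 bit
```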

- (Float) get_symmetrical_uncertainty(vecX, vecY)

Note:

vecX and vecY must be of the same length

get the symmetrical uncertainty of vector (X) and vector (Y)

                 IG(X;Y)
SU(X;Y) = 2 * -------------
                H(X) + H(Y)

               H(X) - H(X|Y)         H(Y) - H(Y|X)
        = 2 * --------------- = 2 * --------------- = SU(Y;X)
                H(X) + H(Y)           H(X) + H(Y)

Parameters:

  • vecX (Array)

    the first vector

  • vecY (Array)

    the second vector

Returns:

  • (Float)

    SU(X;Y)



# File 'lib/fselector/entropy.rb', line 113

def get_symmetrical_uncertainty(vecX, vecY)
  hx = get_marginal_entropy(vecX)
  hxy = get_conditional_entropy(vecX, vecY)
  hy = get_marginal_entropy(vecY)

  su = 0.0
  su = 2*(hx-hxy)/(hx+hy) unless (hx+hy).zero?

  su
end
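SU normalizes the information gain into [0, 1]: identical vectors give 1, independent vectors give 0. A standalone sketch (helper names h and su are illustrative) that computes IG via the equivalent identity IG(X;Y) = H(X) + H(Y) - H(X,Y), which follows from the joint-entropy relation above:

```ruby
# marginal entropy, mirroring this module's formula
def h(v)
  n = v.size.to_f
  v.uniq.sum { |x| p = v.count(x) / n; -p * Math.log2(p) }
end

# symmetrical uncertainty; IG computed from H(X) + H(Y) - H(X,Y)
def su(x, y)
  ig = h(x) + h(y) - h(x.zip(y))
  denom = h(x) + h(y)
  denom.zero? ? 0.0 : 2 * ig / denom
end

su([0, 0, 1, 1], [0, 0, 1, 1])  # identical vectors => 1.0
su([0, 1, 0, 1], [0, 0, 1, 1])  # independent vectors => 0.0
```

The guard for a zero denominator matches the library method above, which returns 0.0 when both marginal entropies vanish (i.e. both vectors are constant).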