# Module: Entropy

Included in:
Discretizer, FSelector::CFS_d, FSelector::FastCorrelationBasedFilter, FSelector::INTERACT, FSelector::InformationGain, FSelector::KS_CCBF, FSelector::SymmetricalUncertainty
Defined in:
lib/fselector/entropy.rb

## Overview

Entropy-related functions for discrete data.

ref: Wikipedia

## Instance Method Summary

• Get the conditional entropy of vector (X) given another vector (Y).

• Get the information gain of vector (X) given another vector (Y).

• Get the joint entropy of vector (X) and vector (Y).

• Get the marginal entropy of vector (X).

• Get the symmetrical uncertainty of vector (X) and vector (Y).

## Instance Method Details

### - (Float) get_conditional_entropy(vecX, vecY)

Note:

vecX and vecY must be of the same length.

Get the conditional entropy of vector (X) given another vector (Y).

```
H(X|Y) = sigma_j (P(y_j) * H(X|y_j))

where H(X|y_j) = -1 * sigma_i (P(x_i|y_j) log2 P(x_i|y_j))
```

Parameters:

• vecX (Array)

the first vector

• vecY (Array)

the second vector

Returns:

• (Float)

H(X|Y)

```
# File 'lib/fselector/entropy.rb', line 38

def get_conditional_entropy(vecX, vecY)
  abort "[#{__FILE__}@#{__LINE__}]: \n"+
        " two vectors must be of same length" if not vecX.size == vecY.size

  hxy = 0.0
  n = vecX.size.to_f

  vecY.uniq.each do |y_j|
    p1 = vecY.count(y_j)/n

    indices = (0...n).to_a.select { |k| vecY[k] == y_j }
    xvs = vecX.values_at(*indices)
    m = xvs.size.to_f

    xvs.uniq.each do |x_i|
      p2 = xvs.count(x_i)/m

      hxy += -1.0 * p1 * (p2 * Math.log2(p2))
    end
  end

  hxy
end
```
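As a sanity check on the H(X|Y) formula, here is a minimal standalone sketch of the same computation. The method name `conditional_entropy` and the toy vectors are illustrative, not part of the gem:

```ruby
# Standalone sketch of H(X|Y) = sigma_j P(y_j) * H(X|y_j);
# mirrors the formula above but is not FSelector's code.
def conditional_entropy(vec_x, vec_y)
  n = vec_x.size.to_f
  vec_y.uniq.sum do |y_j|
    p_y = vec_y.count(y_j) / n
    # the X values observed where Y == y_j
    xs = vec_x.each_index.select { |k| vec_y[k] == y_j }.map { |k| vec_x[k] }
    m  = xs.size.to_f
    # -P(y_j) * sigma_i P(x_i|y_j) * log2 P(x_i|y_j)
    -p_y * xs.uniq.sum { |x_i| (xs.count(x_i) / m) * Math.log2(xs.count(x_i) / m) }
  end
end

# Y fully determines X, so no uncertainty about X remains:
conditional_entropy([1, 1, 0, 0], [0, 0, 1, 1])  # => 0.0
# Y says nothing about X, so H(X|Y) stays at H(X) = 1 bit:
conditional_entropy([1, 0, 1, 0], [0, 0, 1, 1])  # => 1.0
```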

### - (Float) get_information_gain(vecX, vecY)

Note:

vecX and vecY must be of the same length.

Get the information gain of vector (X) given another vector (Y).

```
IG(X;Y) = H(X) - H(X|Y)
        = H(Y) - H(Y|X) = IG(Y;X)
```

Parameters:

• vecX (Array)

the first vector

• vecY (Array)

the second vector

Returns:

• (Float)

IG(X;Y)

```
# File 'lib/fselector/entropy.rb', line 92

def get_information_gain(vecX, vecY)
  get_marginal_entropy(vecX) - get_conditional_entropy(vecX, vecY)
end
```
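Since IG(X;Y) = H(X) - H(X|Y), the gain measures how many bits of uncertainty about X are removed by observing Y. A minimal standalone sketch (the helper names `entropy` and `cond_entropy` are illustrative, not the gem's API):

```ruby
# Standalone sketch of IG(X;Y) = H(X) - H(X|Y); not FSelector's code.
def entropy(v)
  n = v.size.to_f
  -v.uniq.sum { |x| (v.count(x) / n) * Math.log2(v.count(x) / n) }
end

def cond_entropy(vec_x, vec_y)
  n = vec_x.size.to_f
  vec_y.uniq.sum do |y|
    xs = vec_x.each_index.select { |k| vec_y[k] == y }.map { |k| vec_x[k] }
    (xs.size / n) * entropy(xs)  # P(y) * H(X|y)
  end
end

def information_gain(vec_x, vec_y)
  entropy(vec_x) - cond_entropy(vec_x, vec_y)
end

# Y is a copy of X, so observing Y removes the full 1 bit of uncertainty:
information_gain([1, 1, 0, 0], [1, 1, 0, 0])  # => 1.0
# Y is independent of X, so it removes none:
information_gain([1, 0, 1, 0], [0, 0, 1, 1])  # => 0.0
```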

### - (Float) get_joint_entropy(vecX, vecY)

Note:

vecX and vecY must be of the same length.

Get the joint entropy of vector (X) and vector (Y).

```
H(X,Y) = H(Y) + H(X|Y)
       = H(X) + H(Y|X)

i.e. H(X,Y) == H(Y,X)
```

Parameters:

• vecX (Array)

the first vector

• vecY (Array)

the second vector

Returns:

• (Float)

H(X,Y)

```
# File 'lib/fselector/entropy.rb', line 76

def get_joint_entropy(vecX, vecY)
  get_marginal_entropy(vecY) + get_conditional_entropy(vecX, vecY)
end
```
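An equivalent way to read H(X,Y) is as the marginal entropy of the paired outcomes (x_i, y_i). The sketch below uses that reformulation rather than the gem's H(Y) + H(X|Y) decomposition; names and data are illustrative:

```ruby
# Standalone sketch: H(X,Y) computed directly as the entropy of the
# joint outcomes (x_i, y_i); equivalent to H(Y) + H(X|Y) above.
def entropy(v)
  n = v.size.to_f
  -v.uniq.sum { |x| (v.count(x) / n) * Math.log2(v.count(x) / n) }
end

def joint_entropy(vec_x, vec_y)
  entropy(vec_x.zip(vec_y))  # each [x_i, y_i] pair is one joint symbol
end

x = [1, 0, 1, 0]
y = [0, 0, 1, 1]
joint_entropy(x, y)  # => 2.0  (four distinct pairs, log2(4) bits)
joint_entropy(x, y) == joint_entropy(y, x)  # => true, H(X,Y) == H(Y,X)
```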

### - (Float) get_marginal_entropy(vecX)

Get the marginal entropy of vector (X).

```
H(X) = -1 * sigma_i (P(x_i) log2 P(x_i))
```

Parameters:

• vecX (Array)

vector of interest

Returns:

• (Float)

H(X)

```
# File 'lib/fselector/entropy.rb', line 14

def get_marginal_entropy(vecX)
  h = 0.0
  n = vecX.size.to_f

  vecX.uniq.each do |x_i|
    p = vecX.count(x_i)/n
    h += -1.0 * (p * Math.log2(p))
  end

  h
end
```
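The formula reduces to counting value frequencies. A minimal standalone version (the name `marginal_entropy` and the toy data are illustrative, not the gem's) behaves like this:

```ruby
# Standalone sketch of H(X) = -sigma_i P(x_i) * log2 P(x_i);
# mirrors the formula above, not FSelector's code.
def marginal_entropy(vec_x)
  n = vec_x.size.to_f
  -vec_x.uniq.sum { |x_i| (vec_x.count(x_i) / n) * Math.log2(vec_x.count(x_i) / n) }
end

marginal_entropy([0, 0, 1, 1])  # => 1.0    (a fair coin carries one bit)
marginal_entropy([0, 1, 1, 1])  # => ~0.811 (a biased coin carries less)
```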

### - (Float) get_symmetrical_uncertainty(vecX, vecY)

Note:

vecX and vecY must be of the same length.

Get the symmetrical uncertainty of vector (X) and vector (Y).

```
               IG(X;Y)
SU(X;Y) = 2 * -------------
               H(X) + H(Y)

               H(X) - H(X|Y)         H(Y) - H(Y|X)
        = 2 * --------------- = 2 * --------------- = SU(Y;X)
               H(X) + H(Y)           H(X) + H(Y)
```

Parameters:

• vecX (Array)

the first vector

• vecY (Array)

the second vector

Returns:

• (Float)

SU(X;Y)

```
# File 'lib/fselector/entropy.rb', line 113

def get_symmetrical_uncertainty(vecX, vecY)
  hx  = get_marginal_entropy(vecX)
  hxy = get_conditional_entropy(vecX, vecY)
  hy  = get_marginal_entropy(vecY)

  su = 0.0
  su = 2*(hx-hxy)/(hx+hy) if not (hx+hy).zero?

  su
end
```
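SU normalizes information gain into [0, 1], with the same guard as above against a zero denominator (both vectors constant). A standalone sketch under illustrative names:

```ruby
# Standalone sketch of SU(X;Y) = 2 * (H(X) - H(X|Y)) / (H(X) + H(Y)),
# with a zero-denominator guard; not FSelector's code.
def entropy(v)
  n = v.size.to_f
  -v.uniq.sum { |x| (v.count(x) / n) * Math.log2(v.count(x) / n) }
end

def cond_entropy(vec_x, vec_y)
  n = vec_x.size.to_f
  vec_y.uniq.sum do |y|
    xs = vec_x.each_index.select { |k| vec_y[k] == y }.map { |k| vec_x[k] }
    (xs.size / n) * entropy(xs)
  end
end

def symmetrical_uncertainty(vec_x, vec_y)
  hx, hy = entropy(vec_x), entropy(vec_y)
  return 0.0 if (hx + hy).zero?  # both vectors constant
  2 * (hx - cond_entropy(vec_x, vec_y)) / (hx + hy)
end

symmetrical_uncertainty([1, 1, 0, 0], [1, 1, 0, 0])  # => 1.0 (fully dependent)
symmetrical_uncertainty([1, 0, 1, 0], [0, 0, 1, 1])  # => 0.0 (independent)
```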