# Module: Ratistics::Probability

Extended by:
Probability
Included in:
Ratistics, Probability
Defined in:
lib/ratistics/probability.rb

## Overview

Various probability computation functions.

## Instance Method Details

### #cumulative_distribution_function(data, value, opts = {}) {|item| ... } ⇒ 0, ... Also known as: cdf, cumulative_distribution

Calculate the probability that a random variable will be at or below a given value based on the given sample (aka cumulative distribution function, CDF).

0 <= P <= 1

Accepts a block for processing individual items in a raw data sample (:from => :sample).

When a block is given the block will be applied to every element in the data set. Using a block in this way allows probability to be computed against a specific field in a data set of hashes or objects.
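
To make the block behavior concrete, here is a minimal standalone sketch in plain Ruby. It does not call the library: `empirical_cdf` and the `people` records are hypothetical names used only for illustration, and the logic follows the raw-sample case described above.

```ruby
# Hypothetical helper, not part of Ratistics: fraction of items at or below `value`.
# The optional block extracts the comparable field from each item.
def empirical_cdf(data, value)
  return 0 if data.nil? || data.empty?
  count = data.count do |item|
    item = yield(item) if block_given?
    item <= value
  end
  count / data.size.to_f
end

people = [{ age: 21 }, { age: 34 }, { age: 34 }, { age: 50 }]
p_at_or_below_34 = empirical_cdf(people, 34) { |person| person[:age] }
puts p_at_or_below_34 # 3 of the 4 ages are <= 34
```

Without the block the hashes themselves would be compared against 34, which is why the field extraction is needed for record-style data.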

Parameters:

• data (Enumerable)

the data to perform the calculation against

• opts (Hash) (defaults to: {})

processing options

Options Hash (opts):

• :from (Symbol)

describes the nature of the data. :sample (the default) indicates data is a raw data sample, :frequency (or :freq) indicates data is a frequency distribution created from the #frequency function.

Yields:

• iterates over each element in the data set

Yield Parameters:

• item

each element in the data set

Returns:

• (0, Float, 1)

the probability of a random variable being at or below the given value. Returns zero if the value is lower than the lowest value in the sample and one if the value is at or above the highest value in the sample. Returns zero for a nil or empty sample.

```ruby
# File 'lib/ratistics/probability.rb', line 288

def cumulative_distribution_function(data, value, opts={})
  return 0 if data.nil? || data.empty?

  count = 0

  if opts[:from] == :frequency || opts[:from] == :freq
    size = 0
    data.each do |datum, freq|
      datum = yield(datum) if block_given?
      count = count + freq if datum <= value
      size = size + freq
    end
  else
    data.each do |datum|
      datum = yield(datum) if block_given?
      count = count + 1 if datum <= value
    end
    size = data.size
  end

  return 0 if count == 0
  return 1 if count == size
  return count / size.to_f
end
```

### #cumulative_distribution_function_value(data, prob, opts = {}) {|item| ... } ⇒ Object Also known as: cdf_value, cumulative_distribution_value

Inverse of the #cumulative_distribution_function function. For the given data sample, return the highest value for a given probability.

Accepts a block for processing individual items in a raw data sample (:from => :sample).

When a block is given the block will be applied to every element in the data set. Using a block in this way allows probability to be computed against a specific field in a data set of hashes or objects.

Will sort the data set using natural sort order unless the :sorted option is true or a block is given.
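
As a rough illustration of what an inverse CDF lookup does, the sketch below uses one common definition: the smallest sample value whose cumulative probability is at least `prob`. This is an assumption made for illustration, not the library's code, and the library's result may differ for probabilities that fall between steps of the distribution. The name `empirical_quantile` is hypothetical.

```ruby
# Hypothetical helper, not part of Ratistics: smallest sample value whose
# cumulative probability is at least `prob` (assumed definition).
def empirical_quantile(data, prob)
  return nil if data.nil? || data.empty? || prob < 0 || prob > 1
  sorted = data.sort
  threshold = prob * sorted.size
  sorted.each_with_index do |value, index|
    # index + 1 items are at or below `value` at this point
    return value if index + 1 >= threshold
  end
  sorted.last
end

sample = [13, 18, 13, 14, 13, 16, 14, 21, 13]
median_like = empirical_quantile(sample, 0.5)
```

As in the method above, a probability of 0 yields the lowest value and a probability of 1 yields the highest.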

Parameters:

• data (Enumerable)

the data to perform the calculation against

• opts (Hash) (defaults to: {})

processing options

Options Hash (opts):

• :sorted (true, false)

indicates if the data is already sorted

• :from (Symbol)

describes the nature of the data. :sample (the default) indicates data is a raw data sample, :frequency (or :freq) indicates data is a frequency distribution created from the #frequency function.

Yields:

• iterates over each element in the data set

Yield Parameters:

• item

each element in the data set

Returns:

• (Object)

the highest value in the sample for the given probability

```ruby
# File 'lib/ratistics/probability.rb', line 349

def cumulative_distribution_function_value(data, prob, opts={}, &block)
  return nil if data.nil? || data.empty? || prob < 0 || prob > 1

  if (opts[:from].nil? || opts[:from] == :sample) && !(block_given? || opts[:sorted] == true)
    data = data.sort
  end

  if opts[:from].nil? || opts[:from] == :sample
    return (block_given? ? yield(data[0]) : data[0]) if prob == 0
    return (block_given? ? yield(data[-1]) : data[-1]) if prob == 1
  else
    return Math.min(data.keys, &block) if prob == 0
    return Math.max(data.keys, &block) if prob == 1
  end

  if opts[:from] == :freq || opts[:from] == :frequency
    ps = probability(data, :as => :array, :inc => true, :from => :freq, &block)
  else
    ps = probability(data, :as => :array, :inc => true, &block)
  end

  ps = Sort.insertion_sort!(ps){|item| item.first}

  index = Collection.bisect_left(ps, prob){|item| item.last}
  index = index-1 if prob == ps[index-1].first

  return ps[index].first
end
```

### #frequency(data, opts = {}) {|item| ... } ⇒ Hash, ...

Calculates the statistical frequency: the number of times each distinct value occurs in the data set.

When a block is given the block will be applied to every element in the data set. Using a block in this way allows computation against a specific field in a data set of hashes or objects.

The return value is a hash where the keys are the data elements from the sample and the values are the corresponding frequencies. When the :as option is set to :array the return value will be an array of arrays. Each element of the outer array will be a two-element array with the sample value at index 0 and the corresponding frequency at index 1.

Examples:

```ruby
sample = [13, 18, 13, 14, 13, 16, 14, 21, 13]
Ratistics.frequency(sample) #=> {13=>4, 18=>1, 14=>2, 16=>1, 21=>1}
Ratistics.frequency(sample, :as => :array) #=> [[13, 4], [18, 1], [14, 2], [16, 1], [21, 1]]
```
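
The examples above use plain numbers. For the block form, the following standalone sketch (plain Ruby, not the library's code) shows the idea of counting a single field of record-style data. `count_frequencies` and `orders` are hypothetical names.

```ruby
# Hypothetical helper, not part of Ratistics: count occurrences of each value,
# optionally extracting the value from each item with a block.
def count_frequencies(data)
  return nil if data.nil? || data.empty?
  data.each_with_object(Hash.new(0)) do |item, counts|
    item = yield(item) if block_given?
    counts[item] += 1
  end
end

orders = [{ sku: "A" }, { sku: "B" }, { sku: "A" }]
counts = count_frequencies(orders) { |order| order[:sku] }
pairs = counts.to_a # two-element [value, frequency] arrays
```

Here `counts` maps each SKU to its count, and `pairs` has the same value-then-frequency layout described for the array form.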

Parameters:

• data (Enumerable)

the data to perform the calculation against

• opts (Hash) (defaults to: {})

processing options

Options Hash (opts):

• :as (Symbol)

sets the output to :hash/:map or :array/:catalog/:catalogue (default :hash)

Yields:

• iterates over each element in the data set

Yield Parameters:

• item

each element in the data set

Returns:

• (Hash, Array, nil)

the statistical frequency of the given data set or nil if the data set is empty

```ruby
# File 'lib/ratistics/probability.rb', line 40

def frequency(data, opts={})
  return nil if data.nil? || data.empty?

  freq = data.reduce({}) do |memo, datum|
    datum = yield(datum) if block_given?
    memo[datum] = memo[datum].to_i + 1
    memo
  end

  if (opts[:as] == :array || opts[:as] == :catalog || opts[:as] == :catalogue)
    freq = Collection.catalog_hash(freq)
  end

  return freq
end
```

### #normalize_probability(pmf, opts = {}) ⇒ Hash Also known as: normalize_pmf

Normalize a probability distribution sample.

The data set must be formatted as output by the #probability method. Specifically, a hash where each hash key is a datum from the original data set and each hash value is the probability associated with that datum. A probability hash may become denormalized when performing conditional probability.
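
The core of normalization is dividing every probability by the current total so that the values sum to one again. The sketch below is a minimal standalone illustration in plain Ruby, not the library's code. `rescale_pmf` and the `conditional` hash are hypothetical, and the sketch omits the special cases handled in the source listing for this method.

```ruby
# Hypothetical helper, not part of Ratistics: rescale probabilities to sum to 1.
def rescale_pmf(pmf)
  total = pmf.values.sum.to_f
  pmf.transform_values { |prob| prob / total }
end

# Suppose conditioning removed some outcomes, leaving about 0.8 of the mass.
conditional = { 13 => 0.4, 14 => 0.2, 16 => 0.2 }
normalized = rescale_pmf(conditional)
# normalized[13] is about 0.5; the other two values are about 0.25 each
```

The relative weights of the outcomes are unchanged; only the scale is adjusted.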

Parameters:

• pmf (Enumerable)

the data to perform the calculation against

Returns:

• (Hash)

a new, normalized probability distribution.

```ruby
# File 'lib/ratistics/probability.rb', line 138

def normalize_probability(pmf, opts={})
  total = pmf.values.reduce(0.0){|n, value| n + value}

  return { pmf.keys.first => 1 } if pmf.count == 1
  return pmf if Math.delta(total, 1.0) < 0.01

  factor = 1.0 / total.to_f
  normalized = pmf.reduce({}) do |memo, pair|
    memo[pair[0]] = pair[1] * factor
    memo
  end
  return normalized
end
```

### #probability(data, opts = {}, &block) {|item| ... } ⇒ Array, ... Also known as: pmf

Calculates the statistical probability: the relative frequency of each distinct value in the data set.

When a block is given the block will be applied to every element in the data set. Using a block in this way allows computation against a specific field in a data set of hashes or objects.

Examples:

```ruby
sample = [13, 18, 13, 14, 13, 16, 14, 21, 13]
Ratistics.probability(sample) #=> {13=>0.4444444444444444, 18=>0.1111111111111111, 14=>0.2222222222222222, 16=>0.1111111111111111, 21=>0.1111111111111111}
Ratistics.probability(sample, :as => :array) #=> [[13, 0.4444444444444444], [18, 0.1111111111111111], [14, 0.2222222222222222], [16, 0.1111111111111111], [21, 0.1111111111111111]]
Ratistics.probability(sample, :as => :catalog) #=> [[13, 0.4444444444444444], [18, 0.1111111111111111], [14, 0.2222222222222222], [16, 0.1111111111111111], [21, 0.1111111111111111]]
```
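
The values in the examples above are each frequency divided by the sample size. A standalone sketch of that step in plain Ruby, starting from a frequency hash, might look like this. It is not the library's code, and `pmf_from_counts` is a hypothetical name.

```ruby
# Hypothetical helper, not part of Ratistics: convert counts to probabilities.
def pmf_from_counts(counts)
  total = counts.values.sum.to_f
  counts.transform_values { |count| count / total }
end

counts = { 13 => 4, 18 => 1, 14 => 2, 16 => 1, 21 => 1 }
pmf = pmf_from_counts(counts)
# pmf[13] is 4/9, and the values sum to 1 up to floating-point rounding
```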

Parameters:

• data (Enumerable)

the data to perform the calculation against

• opts (Hash) (defaults to: {})

processing options

• block (Block)

optional block for per-item processing

Options Hash (opts):

• :from (Symbol)

describes the nature of the data. :sample indicates data is a raw data sample, :frequency (or :freq) indicates data is a frequency distribution created from the #frequency function. (default :sample)

• :as (Symbol)

sets the output to :hash or :array/:catalog/:catalogue (default :hash)

Yields:

• iterates over each element in the data set

Yield Parameters:

• item

each element in the data set

Returns:

• (Array, Hash, nil)

the statistical probability of the given data set or nil if the data set is empty

```ruby
# File 'lib/ratistics/probability.rb', line 87

def probability(data, opts={}, &block)
  return nil if data.nil? || data.empty?
  from_frequency = (opts[:from] == :frequency || opts[:from] == :freq)

  if from_frequency
    count = data.reduce(0) do |n, key, value|
      key, value = key if key.is_a? Array
      key = yield(key) if block_given?
      n + value
    end
  else
    count = data.count
    data = frequency(data, &block)
  end

  prob = data.reduce({}) do |memo, key, value|
    key, value = key if key.is_a? Array
    key = yield(key) if from_frequency && block_given?
    memo[key] = value.to_f / count.to_f
    memo
  end

  if opts[:inc] || opts[:increment] || opts[:incremental]
    base = 0
    prob.keys.sort.each do |key|
      prob[key] = base = prob[key] + base
    end
  end

  if (opts[:as] == :array || opts[:as] == :catalog || opts[:as] == :catalogue)
    prob = Collection.catalog_hash(prob)
  end

  return prob
end
```

### #probability_mean(data, opts = {}, &block) {|item| ... } ⇒ Float, 0 Also known as: pmf_mean, frequency_mean

Calculates the statistical mean of a probability distribution. Accepts a block for processing individual items.

When a block is given the block will be applied to every element in the data set. Using a block in this way allows computation against a specific field in a data set of hashes or objects.
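
The mean of a probability distribution is the sum of each value multiplied by its probability. The following is a minimal standalone sketch in plain Ruby, not the library's code. `mean_of_pmf` and the `pmf` hash are hypothetical and assume numeric keys mapped to probabilities.

```ruby
# Hypothetical helper, not part of Ratistics: expected value of a PMF hash
# whose keys are numeric values and whose values are probabilities.
def mean_of_pmf(pmf)
  pmf.sum { |value, prob| value * prob }
end

pmf = { 1 => 0.5, 2 => 0.25, 4 => 0.25 }
mean = mean_of_pmf(pmf)
# 1 * 0.5 + 2 * 0.25 + 4 * 0.25
```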

Parameters:

• data (Enumerable)

the data to perform the calculation against

• opts (Hash) (defaults to: {})

processing options

• block (Block)

optional block for per-item processing

Options Hash (opts):

• :from (Symbol)

describes the nature of the data. :sample indicates data is a raw data sample, :frequency (or :freq) indicates data is a frequency distribution created from the #frequency function, and :probability (or :prob) indicates the data is a probability distribution created by the #probability function. (default :sample)

Yields:

• iterates over each element in the data set

Yield Parameters:

• item

each element in the data set

Returns:

• (Float, 0)

the statistical mean of the given data set or zero if the data set is empty

```ruby
# File 'lib/ratistics/probability.rb', line 180

def probability_mean(data, opts={}, &block)
  return 0 if data.nil? || data.empty?

  from_probability = (opts[:from] == :probability || opts[:from] == :prob)

  unless from_probability
    data = probability(data, :from => opts[:from], &block)
  end

  mean = data.reduce(0.0) do |n, key, value|
    key, value = key if key.is_a? Array
    key = yield(key) if from_probability and block_given?
    n + (key * value)
  end

  return mean
end
```

### #probability_variance(data, opts = {}, &block) {|item| ... } ⇒ Float, 0 Also known as: pmf_variance

Calculates the statistical variance of a probability distribution. Accepts a block for processing individual items in a raw data sample (:from => :sample).

When a block is given the block will be applied to every element in the data set. Using a block in this way allows computation against a specific field in a data set of hashes or objects.
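
The variance of a probability distribution is the probability-weighted sum of squared deviations from the mean. The following is a minimal standalone sketch in plain Ruby, not the library's code. `variance_of_pmf` and the `pmf` hash are hypothetical and assume numeric keys mapped to probabilities.

```ruby
# Hypothetical helper, not part of Ratistics: variance of a PMF hash
# whose keys are numeric values and whose values are probabilities.
def variance_of_pmf(pmf)
  mean = pmf.sum { |value, prob| value * prob }
  pmf.sum { |value, prob| prob * (value - mean)**2 }
end

pmf = { 1 => 0.5, 2 => 0.25, 4 => 0.25 }
variance = variance_of_pmf(pmf)
# mean is 2.0, so this is 0.5 * 1 + 0.25 * 0 + 0.25 * 4
```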

Parameters:

• data (Enumerable)

the data to perform the calculation against

• opts (Hash) (defaults to: {})

processing options

• block (Block)

optional block for per-item processing

Options Hash (opts):

• :from (Symbol)

describes the nature of the data. :sample indicates data is a raw data sample, :frequency (or :freq) indicates data is a frequency distribution created from the #frequency function, and :probability (or :prob) indicates the data is a probability distribution created by the #probability function. (default :sample)

Yields:

• iterates over each element in the data set

Yield Parameters:

• item

each element in the data set

Returns:

• (Float, 0)

the statistical variance of the given data set or zero if the data set is empty

```ruby
# File 'lib/ratistics/probability.rb', line 226

def probability_variance(data, opts={}, &block)
  return 0 if data.nil? || data.empty?

  if opts[:from] == :probability || opts[:from] == :prob
    from_probability = true
  else
    data = probability(data, :from => opts[:from], &block)
    from_probability = false
  end

  mean = data.reduce(0.0) do |n, key, value|
    key, value = key if key.is_a? Array
    key = yield(key) if from_probability && block_given?
    n + (key * value)
  end

  variance = data.reduce(0.0) do |n, key, value|
    key, value = key if key.is_a? Array
    key = yield(key) if from_probability && block_given?
    n + (value * ((key - mean) ** 2))
  end

  return variance
end
```

### #sample_with_replacement(data, opts = {}) {|item| ... } ⇒ Array Also known as: resample_with_replacement, bootstrap

Resamples the given sample with replacement (aka bootstrap). The resample will have the same number of elements as the original sample unless the :size (or :length, :count) option is given.

Will sort the data set using natural sort order unless the :sorted option is true or a block is given.
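
For readers unfamiliar with bootstrapping, the sketch below shows the basic idea in plain Ruby: each draw picks from the full sample, so the same item can appear more than once. It is not the library's code and uses `Array#sample` for the draws rather than the library's own approach. `bootstrap_resample` and its keyword arguments are hypothetical.

```ruby
# Hypothetical helper, not part of Ratistics: draw `size` items with replacement.
# Every draw samples from the whole array, so repeats are possible.
def bootstrap_resample(data, size: data.length, random: Random.new)
  Array.new(size) { data.sample(random: random) }
end

sample = [13, 18, 13, 14, 13, 16, 14, 21, 13]
resample = bootstrap_resample(sample, random: Random.new(42))
smaller = bootstrap_resample(sample, size: 3)
```

Passing a seeded `Random` makes a run repeatable, which is useful when comparing statistics across resamples.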

Parameters:

• data (Enumerable)

the data to perform the calculation against

• opts (Hash) (defaults to: {})

processing options

Options Hash (opts):

• :size (Integer)

the size of the resample

• :sorted (true, false)

indicates if the data is already sorted

• :from (Symbol)

describes the nature of the data. :sample (the default) indicates data is a raw data sample, :frequency (or :freq) indicates data is a frequency distribution created from the #frequency function.

Yields:

• iterates over each element in the data set

Yield Parameters:

• item

each element in the data set

Returns:

• (Array)

the resample, or an empty array if the data set is nil or empty

```ruby
# File 'lib/ratistics/probability.rb', line 408

def sample_with_replacement(data, opts={}, &block)
  return [] if data.nil? || data.empty?

  if (opts[:from].nil? || opts[:from] == :sample) && !(block_given? || opts[:sorted] == true)
    data = data.sort
  end

  if opts[:from] == :freq || opts[:from] == :frequency
    ps = probability(data, :as => :array, :inc => true, :from => :freq, &block)
    length = data.reduce(0){|length, item| length += item.last }
  else
    ps = probability(data, :as => :array, :inc => true, &block)
    length = opts[:length] || opts[:size] || opts[:count] || data.length
  end

  ps = Sort.insertion_sort!(ps){|item| item.first}

  resample = []
  length.times do
    prob = rand()
    index = Collection.bisect_left(ps, prob){|item| item.last}
    index = index-1 if prob == ps[index-1].first
    resample << ps[index].first
  end

  return resample
end
```

### #sample_without_replacement(data, opts = {}) {|item| ... } ⇒ Array Also known as: resample_without_replacement, jackknife

Resamples the given sample without replacement (aka jackknife). The resample will have half as many elements as the original sample (rounded down) unless the :size (or :length, :count) option is given.
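
Sampling without replacement can be pictured as shuffling the sample and keeping the first few items, so no position in the original sample is used twice. The sketch below is a minimal standalone illustration in plain Ruby, not the library's code. `half_sample` and its keyword arguments are hypothetical.

```ruby
# Hypothetical helper, not part of Ratistics: shuffle a copy of the data and
# keep the first `size` items, so each original element is used at most once.
def half_sample(data, size: data.length / 2, random: Random.new)
  data.shuffle(random: random).first(size)
end

sample = [13, 18, 13, 14, 13, 16, 14, 21, 13]
subset = half_sample(sample, random: Random.new(7))
```

A value can still repeat in the result when it occurs several times in the original sample, but never more often than it occurs there.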

Parameters:

• data (Enumerable)

the data to perform the calculation against

• opts (Hash) (defaults to: {})

processing options

Options Hash (opts):

• :size (Integer)

the size of the resample

• :from (Symbol)

describes the nature of the data. :sample (the default) indicates data is a raw data sample, :frequency (or :freq) indicates data is a frequency distribution created from the #frequency function.

Yields:

• iterates over each element in the data set

Yield Parameters:

• item

each element in the data set

Returns:

• (Array)

the resample, or an empty array if the data set is nil or empty

```ruby
# File 'lib/ratistics/probability.rb', line 461

def sample_without_replacement(data, opts={}, &block)
  return [] if data.nil? || data.empty?

  if opts[:from] == :freq || opts[:from] == :frequency
    data = data.inject([]) do |memo, item|
      item.last.times{ memo << item.first }
      memo
    end
  else
    data = Collection.collect(data, &block).shuffle!
  end

  length = opts[:length] || opts[:size] || opts[:count] || (data.length / 2)

  return data.slice!(0, length)
end
```