Module: Ratistics::CentralTendency

Extended by:
CentralTendency
Included in:
Ratistics, CentralTendency
Defined in:
lib/ratistics/central_tendency.rb

Overview

Various average (central tendency) computation functions.

Instance Method Summary

Instance Method Details

#first_quartile(data, opts = {}, &block) {|item| ... } ⇒ Numeric Also known as: lower_quartile

Calculate the value representing the upper-bound of the first quartile (percentile) of a data sample. This is the equivalent of the median of the subset of the sample from the lower bound to the sample-median.

Sorts the data set using natural sort order unless the :sorted option is true or a block is given. When a block is given the data set is not sorted, so it should already be ordered by the value the block returns.

When a block is given, it is applied to elements of the data set to obtain the value used in the calculation. This allows the quartile to be computed against a specific field in a data set of hashes or objects.

Parameters:

  • data (Enumerable)

    the data set against which percentile is computed

  • block (Block)

    optional block for per-item processing

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :sorted (true, false)

    indicates whether the data is already sorted

Yields:

  • iterates over each element in the data set

Yield Parameters:

  • item

    each element in the data set

Returns:

  • (Numeric)

    the first quartile of the data set, or nil if the data set is nil or empty

# File 'lib/ratistics/central_tendency.rb', line 280

def first_quartile(data, opts={}, &block)
  return nil if data.nil? || data.empty?
  midpoint = (data.size / 2.0).floor - 1
  return CentralTendency.median(Collection.slice(data, (0..midpoint)), opts, &block)
end
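
The slicing step above can be traced with a small standalone sketch. The helpers `median_of` and `lower_half` are hypothetical stand-ins and not part of Ratistics; the sketch assumes the data is an Array that is already sorted.

```ruby
# Hypothetical stand-in for CentralTendency.median on a sorted Array.
def median_of(sorted)
  mid = sorted.size / 2
  sorted.size.even? ? (sorted[mid - 1] + sorted[mid]) / 2.0 : sorted[mid]
end

# Lower half as selected by first_quartile: indexes 0..midpoint,
# where midpoint = floor(size / 2.0) - 1.
def lower_half(sorted)
  midpoint = (sorted.size / 2.0).floor - 1
  sorted[0..midpoint]
end

sample = [1, 3, 5, 7, 9, 11, 13]
half = lower_half(sample) # => [1, 3, 5]
q1 = median_of(half)      # => 3
```

With an odd sample size the sample median itself (7 here) is excluded from the lower half.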

#five_number_summary ⇒ Object


# File 'lib/ratistics/central_tendency.rb', line 377

def five_number_summary
  #the sample minimum (smallest observation)
  #the lower quartile or first quartile
  #the median (middle value)
  #the upper quartile or third quartile
  #the sample maximum (largest observation)
end
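
The method body above is a stub that only lists the five values. One way they could be assembled is sketched below, assuming the same half-splitting used by the quartile methods. The names `five_numbers` and `median_of` are hypothetical, and falling back to the whole sample when a half is empty is an assumption of this sketch, not documented behavior.

```ruby
# Hypothetical median helper for a sorted Array.
def median_of(sorted)
  mid = sorted.size / 2
  sorted.size.even? ? (sorted[mid - 1] + sorted[mid]) / 2.0 : sorted[mid]
end

# Hypothetical five-number summary: min, Q1, median, Q3, max.
def five_numbers(data)
  return nil if data.nil? || data.empty?
  sorted = data.sort
  n = sorted.size
  lower = sorted[0...(n / 2)]        # elements before the median position
  upper = sorted[((n + 1) / 2)..-1]  # elements after the median position
  lower = sorted if lower.empty?     # assumption: single-element sample
  upper = sorted if upper.empty?
  { min: sorted.first, q1: median_of(lower), median: median_of(sorted),
    q3: median_of(upper), max: sorted.last }
end

summary = five_numbers([7, 1, 5, 3, 9, 13, 11])
# summary[:min] => 1, summary[:q1] => 3, summary[:median] => 7,
# summary[:q3] => 11, summary[:max] => 13
```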

#interquartile_range ⇒ Object Also known as: iqg


# File 'lib/ratistics/central_tendency.rb', line 351

def interquartile_range
end
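
The method above is an empty stub. The interquartile range is conventionally the distance between the third and first quartiles; a minimal sketch, with the hypothetical helper `iqr` taking quartiles that were computed elsewhere:

```ruby
# Hypothetical helper: interquartile range from known quartiles.
def iqr(q1, q3)
  q3 - q1
end

range = iqr(3, 11) # => 8
```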

#lower_inner_fence ⇒ Object


# File 'lib/ratistics/central_tendency.rb', line 357

def lower_inner_fence
  # Q1 - 1.5*IQ
end

#lower_outer_fence ⇒ Object


# File 'lib/ratistics/central_tendency.rb', line 367

def lower_outer_fence
  # Q1 - 3*IQ
end
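
Both lower fence methods are stubs whose comments give the formulas. A minimal sketch of those formulas, assuming IQ in the comments means the interquartile range (Q3 - Q1); the helper name `lower_fences` is hypothetical.

```ruby
# Hypothetical helper applying the formulas from the stub comments.
def lower_fences(q1, q3)
  iq = q3 - q1
  { inner: q1 - 1.5 * iq, # Q1 - 1.5*IQ
    outer: q1 - 3 * iq }  # Q1 - 3*IQ
end

fences = lower_fences(3, 11)
# fences[:inner] => -9.0
# fences[:outer] => -21
```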

#mean(data, opts = {}) {|item| ... } ⇒ Float, 0 Also known as: avg, average

Calculates the statistical mean.

When a block is given, it is applied to every element in the data set and the mean is computed from the returned values. This allows the mean to be computed against a specific field in a data set of hashes or objects.

Parameters:

  • data (Enumerable)

    the data set to compute the mean of

Yields:

  • iterates over each element in the data set

Yield Parameters:

  • item

    each element in the data set

Returns:

  • (Float, 0)

    the statistical mean of the given data set or zero if the data set is empty


# File 'lib/ratistics/central_tendency.rb', line 24

def mean(data, opts={})
  return 0 if data.nil? || data.empty?
  total = 0.0

  data.each do |item|
    item = yield(item) if block_given?
    total += item.to_f
  end

  return total / data.size.to_f
end
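
The block form is easiest to see with a data set of hashes. The sketch below copies the logic of the listing into a standalone method named `mean_of` (a hypothetical name) so it runs without the library; the sample records are made up for illustration.

```ruby
# Standalone copy of the mean logic shown above.
def mean_of(data)
  return 0 if data.nil? || data.empty?
  total = 0.0
  data.each do |item|
    item = yield(item) if block_given? # extract the field to average
    total += item.to_f
  end
  total / data.size.to_f
end

people = [{ name: 'a', age: 20 }, { name: 'b', age: 30 }, { name: 'c', age: 40 }]
by_age = mean_of(people) { |person| person[:age] } # => 30.0
plain  = mean_of([1, 2, 3, 4])                     # => 2.5
empty  = mean_of([])                               # => 0
```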

#median(data, opts = {}) {|item| ... } ⇒ Float, 0

Calculates the statistical median.

Sorts the data set using natural sort order unless the :sorted option is true or a block is given. When a block is given the data set is not sorted, so it should already be ordered by the value the block returns.

When a block is given, it is applied to elements of the data set to obtain the value used in the calculation. This allows the median to be computed against a specific field in a data set of hashes or objects.

Parameters:

  • data (Enumerable)

    the data set to compute the median of

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :sorted (true, false)

    indicates whether the data is already sorted

Yields:

  • iterates over each element in the data set

Yield Parameters:

  • item

    each element in the data set

Returns:

  • (Float, 0)

    the statistical median of the given data set or zero if the data set is empty


# File 'lib/ratistics/central_tendency.rb', line 190

def median(data, opts={})
  return 0 if data.nil? || data.empty?
  data = data.sort unless block_given? || opts[:sorted] == true

  index = data.size / 2
  if data.size % 2 == 0 #even

    if block_given?
      median = (yield(data[index-1]) + yield(data[index])) / 2.0
    else
      median = (data[index-1] + data[index]) / 2.0
    end

  else #odd

    if block_given?
      median = yield(data[index])
    else
      median = data[index]
    end
  end

  return median
end
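
The odd and even branches, and the requirement that data be pre-sorted when a block is used, can be illustrated with a compact sketch. `median_of` is a hypothetical standalone helper; unlike the listing it maps the block over every element rather than only the middle ones, which gives the same result.

```ruby
# Hypothetical standalone median with the same sorting rule as above.
def median_of(data, sorted: false, &block)
  return 0 if data.nil? || data.empty?
  data = data.sort unless block || sorted # no sort when a block is given
  values = block ? data.map(&block) : data
  index = values.size / 2
  values.size.even? ? (values[index - 1] + values[index]) / 2.0 : values[index]
end

odd   = median_of([5, 1, 3])    # => 3
even  = median_of([4, 1, 3, 2]) # => 2.5
rows  = [{ v: 1 }, { v: 2 }, { v: 10 }] # already ordered by :v
field = median_of(rows) { |row| row[:v] } # => 2
```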

#midrange(data, opts = {}, &block) {|item| ... } ⇒ Float, 0 Also known as: midextreme

Note:

Unlike other functions with a :sorted option, #midrange does not sort the data set. Instead it scans the data set for the minimum and maximum elements, so it works on an unsorted collection even when a block is given. When the :sorted option is true the scan is skipped and the first and last elements are used.

Calculates the statistical midrange.

When a block is given, it is applied to elements of the data set to obtain the value used in the calculation. This allows the midrange to be computed against a specific field in a data set of hashes or objects.

Parameters:

  • data (Enumerable)

    the data set to compute the midrange of

  • block (Block)

    optional block for per-item processing

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :sorted (true, false)

    indicates whether the data is already sorted

Yields:

  • iterates over each element in the data set

Yield Parameters:

  • item

    each element in the data set

Returns:

  • (Float, 0)

    the statistical midrange of the given data set or zero if the data set is empty


# File 'lib/ratistics/central_tendency.rb', line 156

def midrange(data, opts={}, &block)
  return 0 if data.nil? || data.empty?

  if opts[:sorted] == true
    min = block_given? ? yield(data.first) : data.first
    max = block_given? ? yield(data.last) : data.last
  else
    min, max = Math.minmax(data, &block)
  end

  return CentralTendency.mean([min, max])
end
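
A standalone sketch of the two paths follows. It assumes that the library's `Math.minmax(data, &block)` returns the smallest and largest block values, and substitutes Ruby's built-in `Array#minmax` for it; `midrange_of` is a hypothetical name.

```ruby
# Hypothetical standalone midrange using Array#minmax in place of
# the library's Math.minmax (assumed to behave the same way).
def midrange_of(data, sorted: false, &block)
  return 0 if data.nil? || data.empty?
  if sorted
    # Sorted input: the extremes are the first and last elements.
    min, max = data.first, data.last
    min, max = block.call(min), block.call(max) if block
  else
    # Unsorted input: scan for the extremes.
    values = block ? data.map(&block) : data
    min, max = values.minmax
  end
  (min + max) / 2.0
end

plain  = midrange_of([7, 2, 9, 4])                  # => 5.5
presorted = midrange_of([2, 4, 7, 9], sorted: true) # => 5.5
temps  = [{ t: 10 }, { t: 4 }, { t: 6 }]
field  = midrange_of(temps) { |row| row[:t] }       # => 7.0
```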

#mode(data, opts = {}) {|item| ... } ⇒ Array

Calculates the statistical modes.

When a block is given, it is applied to every element in the data set and the modes are computed from the returned values. This allows the modes to be computed against a specific field in a data set of hashes or objects.

Parameters:

  • data (Enumerable)

    the data set to compute the modes of

Yields:

  • iterates over each element in the data set

Yield Parameters:

  • item

    each element in the data set

Returns:

  • (Array)

    An array of zero or more values (in no particular order) indicating the modes of the data set


# File 'lib/ratistics/central_tendency.rb', line 229

def mode(data, opts={})
  return [] if data.nil? || data.empty?

  modes = {}

  data.each do |item|

    item = yield(item) if block_given?

    if modes.has_key? item
      modes[item] = modes[item]+1
    else
      modes[item] = 1
    end
  end

  modes = modes.sort_by{|key, value| value * -1 }

  modes = modes.reduce([]) do |memo, mode|
    break(memo) if mode[1] < modes[0][1]
    memo << mode[0]
  end

  return modes
end
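
The counting approach above can be sketched more compactly. This is an alternative formulation rather than the library's code: it tallies occurrences with `Hash.new(0)` and keeps every value whose count equals the maximum. `modes_of` is a hypothetical name.

```ruby
# Hypothetical standalone mode calculation.
def modes_of(data)
  return [] if data.nil? || data.empty?
  counts = Hash.new(0)
  data.each do |item|
    item = yield(item) if block_given?
    counts[item] += 1
  end
  top = counts.values.max
  counts.select { |_value, count| count == top }.keys
end

two_modes = modes_of([1, 2, 2, 3, 3]) # 2 and 3 both occur twice
one_mode  = modes_of(%w[a b a])
pets = [{ kind: 'cat' }, { kind: 'dog' }, { kind: 'cat' }]
by_kind = modes_of(pets) { |pet| pet[:kind] }
```

Because the result carries no ordering guarantee, callers comparing results may want to sort them first.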

#second_quartile(data, opts = {}, &block) {|item| ... } ⇒ Numeric

Calculate the value representing the upper-bound of the second quartile (percentile) of a data sample. This is the equivalent of the sample median.

Sorts the data set using natural sort order unless the :sorted option is true or a block is given. When a block is given the data set is not sorted, so it should already be ordered by the value the block returns.

When a block is given, it is applied to elements of the data set to obtain the value used in the calculation. This allows the quartile to be computed against a specific field in a data set of hashes or objects.

Parameters:

  • data (Enumerable)

    the data set against which percentile is computed

  • block (Block)

    optional block for per-item processing

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :sorted (true, false)

    indicates whether the data is already sorted

Yields:

  • iterates over each element in the data set

Yield Parameters:

  • item

    each element in the data set

Returns:

  • (Numeric)

    the second quartile (median) of the data set, or nil if the data set is nil or empty

# File 'lib/ratistics/central_tendency.rb', line 312

def second_quartile(data, opts={}, &block)
  return nil if data.nil? || data.empty?
  return CentralTendency.median(data, opts, &block)
end

#third_quartile(data, opts = {}, &block) {|item| ... } ⇒ Numeric Also known as: upper_quartile

Calculate the value representing the upper-bound of the third quartile (percentile) of a data sample. This is the equivalent of the median of the subset of the sample from the sample-median to the upper bound.

Sorts the data set using natural sort order unless the :sorted option is true or a block is given. When a block is given the data set is not sorted, so it should already be ordered by the value the block returns.

When a block is given, it is applied to elements of the data set to obtain the value used in the calculation. This allows the quartile to be computed against a specific field in a data set of hashes or objects.

Parameters:

  • data (Enumerable)

    the data set against which percentile is computed

  • block (Block)

    optional block for per-item processing

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :sorted (true, false)

    indicates whether the data is already sorted

Yields:

  • iterates over each element in the data set

Yield Parameters:

  • item

    each element in the data set

Returns:

  • (Numeric)

    the third quartile of the data set, or nil if the data set is nil or empty

# File 'lib/ratistics/central_tendency.rb', line 342

def third_quartile(data, opts={}, &block)
  return nil if data.nil? || data.empty?
  midpoint = (data.size / 2.0).ceil
  high = data.size - 1
  return CentralTendency.median(Collection.slice(data, (midpoint..high)), opts, &block)
end
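
The index arithmetic for the upper half differs slightly between even and odd sample sizes. The sketch below traces it on sorted Arrays; `upper_half` and `median_of` are hypothetical helpers, not library methods.

```ruby
# Hypothetical median helper for a sorted Array.
def median_of(sorted)
  mid = sorted.size / 2
  sorted.size.even? ? (sorted[mid - 1] + sorted[mid]) / 2.0 : sorted[mid]
end

# Upper half as selected by third_quartile: indexes midpoint..high,
# where midpoint = ceil(size / 2.0) and high = size - 1.
def upper_half(sorted)
  midpoint = (sorted.size / 2.0).ceil
  sorted[midpoint..(sorted.size - 1)]
end

even_half = upper_half([1, 3, 5, 7, 9, 11])     # => [7, 9, 11]
odd_half  = upper_half([1, 3, 5, 7, 9, 11, 13]) # => [9, 11, 13]
q3_even = median_of(even_half) # => 9
q3_odd  = median_of(odd_half)  # => 11
```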

#truncated_mean(data, truncation = nil, opts = {}, &block) {|item| ... } ⇒ Float, 0 Also known as: trimmed_mean

Calculates a truncated statistical mean.

The truncation value represents the proportion of high and low outliers to remove from the sample before calculating the mean. It is a percentage of the sample size, and that percentage is removed from both the high end and the low end of the sample. The total sample size is therefore reduced by double the truncation value. A truncation value of 50% or greater raises an ArgumentError. The truncation value can be expressed as a percentage (10.0) or a decimal (0.10). When an exact truncation is not possible (with one-tenth of one percent precision) the mean is interpolated by averaging the means of the two nearest exact truncations.

If the truncation value is nil then only the single highest and single lowest values are dropped. A sample size of fewer than three with a nil truncation value always returns zero.

When a block is given, it is applied to every element of the truncated data set and the mean is computed from the returned values. This allows the mean to be computed against a specific field in a data set of hashes or objects. Because the data set is not sorted when a block is given, it should already be ordered by the value the block returns.

Parameters:

  • data (Enumerable)

    the data set to compute the mean of

  • truncation (Float) (defaults to: nil)

    the percentage value of truncation of both high and low outliers

  • block (Block)

    optional block for per-item processing

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :sorted (true, false)

    indicates whether the data is already sorted

Yields:

  • iterates over each element in the data set

Yield Parameters:

  • item

    each element in the data set

Returns:

  • (Float, 0)

    the truncated mean of the given data set or zero if the data set is empty


# File 'lib/ratistics/central_tendency.rb', line 75

def truncated_mean(data, truncation=nil, opts={}, &block)
  return 0 if data.nil? || data.empty?
  data = data.sort unless block_given? || opts[:sorted] == true

  if truncation.nil?
    if data.size >= 3
      mean = CentralTendency.mean(data.slice(1..data.size-2), &block)
    else
      mean = 0
    end
  else
    truncation *= 100.0 if truncation < 1.0
    raise ArgumentError if truncation >= 50.0

    interval = 100.0 / data.size
    steps = truncation / interval

    if Math.delta(steps, steps.to_i) < 0.1
      
      # exact truncation
      index, length = steps.floor, data.size-(steps.floor * 2)
      if data.respond_to? :slice
        slice = data.slice(index, length)
      else
        slice = Collection.slice(data, index, length)
      end
      mean = CentralTendency.mean(slice, &block)

    else

      # interpolation truncation
      index1, length1 = steps.floor, data.size-(steps.floor * 2)
      index2, length2 = steps.ceil, data.size-(steps.ceil * 2)

      if data.respond_to? :slice
        slice1 = data.slice(index1, length1)
        slice2 = data.slice(index2, length2)
      else
        slice1 = Collection.slice(data, index1, length1)
        slice2 = Collection.slice(data, index2, length2)
      end

      m1 = CentralTendency.mean(slice1, &block)
      m2 = CentralTendency.mean(slice2, &block)
      mean = CentralTendency.mean([m1, m2])
    end
  end

  return mean
end
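
The step arithmetic is the least obvious part of this method. The sketch below reproduces it for a sorted Array without a block. It assumes the library's `Math.delta` is an absolute difference; `truncated_mean_of` is a hypothetical name and the sample data is made up.

```ruby
# Hypothetical standalone version of the percentage-truncation path.
def truncated_mean_of(sorted, truncation)
  truncation *= 100.0 if truncation < 1.0 # accept 0.10 as 10%
  raise ArgumentError, 'truncation must be below 50%' if truncation >= 50.0

  interval = 100.0 / sorted.size # percent of the sample per element
  steps = truncation / interval  # elements to drop from each end
  mean = ->(values) { values.sum / values.size.to_f }
  low, high = steps.floor, steps.ceil

  if (steps - low).abs < 0.1
    # Exact truncation: drop `low` elements from each end.
    mean.call(sorted[low, sorted.size - low * 2])
  else
    # Interpolation: average the two nearest exact truncations.
    m1 = mean.call(sorted[low, sorted.size - low * 2])
    m2 = mean.call(sorted[high, sorted.size - high * 2])
    (m1 + m2) / 2.0
  end
end

data = [1, 2, 3, 4, 5, 6, 7, 8, 20, 100]
exact        = truncated_mean_of(data, 10.0) # drops 1 and 100 => 6.875
interpolated = truncated_mean_of(data, 15.0) # between 1 and 2 dropped => 6.1875
```

With ten elements each element is 10% of the sample, so 15% falls halfway between dropping one and dropping two elements per end.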

#upper_inner_fence ⇒ Object


# File 'lib/ratistics/central_tendency.rb', line 362

def upper_inner_fence
  # Q3 + 1.5*IQ
end

#upper_outer_fence ⇒ Object


# File 'lib/ratistics/central_tendency.rb', line 372

def upper_outer_fence
  # Q3 + 3*IQ
end
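
The upper fence stubs mirror the lower ones. A minimal sketch of the formulas in their comments, again assuming IQ means the interquartile range (Q3 - Q1). The helper names `upper_fences` and `beyond_upper_inner` are hypothetical, and using the inner fence to flag high values is shown only as one possible application.

```ruby
# Hypothetical helper applying the formulas from the stub comments.
def upper_fences(q1, q3)
  iq = q3 - q1
  { inner: q3 + 1.5 * iq, # Q3 + 1.5*IQ
    outer: q3 + 3 * iq }  # Q3 + 3*IQ
end

# Hypothetical use: select values above the upper inner fence.
def beyond_upper_inner(data, q1, q3)
  limit = upper_fences(q1, q3)[:inner]
  data.select { |value| value > limit }
end

fences  = upper_fences(3, 11)
# fences[:inner] => 23.0
# fences[:outer] => 35
flagged = beyond_upper_inner([1, 3, 7, 11, 13, 30], 3, 11) # => [30]
```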