tech.v3.datatype.statistics

NaN-aware, high-quality, and reasonably efficient summation and descriptive statistics.

all-descriptive-stats-names

define-descriptive-stats

macro

(define-descriptive-stats)

descriptive-statistics

(descriptive-statistics x stats-names stats-data {:keys [nan-strategy], :or {nan-strategy :remove}, :as options})
(descriptive-statistics x stats-names options)
(descriptive-statistics x stats-names)
(descriptive-statistics x)

Calculate a set of descriptive statistics on a single reader.

Available stats: #{:min :quartile-1 :sum :mean :mode :median :quartile-3 :max :variance :standard-deviation :skew :n-elems :kurtosis}

options

  • :nan-strategy - defaults to :remove; one of :keep, :remove, or :exception. :keep is the fastest option, but the results may then contain NaNs. You can also pass a double predicate to filter custom double values.
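The three keyword strategies can be illustrated with a minimal sketch in plain Clojure. The helpers `apply-nan-strategy` and `mean-with-nan-strategy` are hypothetical names introduced here for illustration; they are not part of this namespace and say nothing about how the library implements the option.

```clojure
;; Hypothetical helper, not part of tech.v3.datatype.statistics:
;; applies one of the three keyword strategies to a sequence of numbers.
(defn apply-nan-strategy
  [strategy xs]
  (let [nan? (fn [x] (Double/isNaN (double x)))]
    (case strategy
      ;; :keep does no work, so NaN values flow into the result.
      :keep xs
      ;; :remove drops NaN values before any statistic is computed.
      :remove (remove nan? xs)
      ;; :exception refuses to proceed if any NaN is present.
      :exception (if (some nan? xs)
                   (throw (ex-info "NaN detected in data" {:nan-strategy strategy}))
                   xs))))

;; Hypothetical mean built on the helper above.
(defn mean-with-nan-strategy
  [strategy xs]
  (let [xs (apply-nan-strategy strategy xs)]
    (/ (reduce + 0.0 xs) (count xs))))

(mean-with-nan-strategy :remove [1.0 ##NaN 3.0])               ;; => 2.0
(Double/isNaN (mean-with-nan-strategy :keep [1.0 ##NaN 3.0]))  ;; => true
```

With :exception, the same input would throw instead of returning a value.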

kendalls-correlation

(kendalls-correlation x y options)
(kendalls-correlation x y)

kurtosis

(kurtosis x options)
(kurtosis x)

max

(max x options)
(max x)

mean

(mean x options)
(mean x)

Double mean of x.

median

(median x options)
(median x)

min

(min x options)
(min x)

mode

(mode data)

Return the most frequently occurring value in the data.
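As a rough illustration of what "most frequently occurring" means, the following plain-Clojure sketch counts occurrences and picks the value with the highest count. The name `mode-sketch` is hypothetical, and how the library itself breaks ties is not stated in this documentation, so the example avoids ties.

```clojure
;; Hypothetical sketch, not the library's implementation.
(defn mode-sketch
  [data]
  (when (seq data)
    ;; frequencies builds a map of value -> count;
    ;; max-key selects the map entry with the largest count.
    (key (apply max-key val (frequencies data)))))

(mode-sketch [1 2 2 3 2 3]) ;; => 2
```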

moment-2

(moment-2 x options)
(moment-2 x)

moment-3

(moment-3 x options)
(moment-3 x)

moment-4

(moment-4 x options)
(moment-4 x)

pearsons-correlation

(pearsons-correlation x y options)
(pearsons-correlation x y)

percentiles

(percentiles x percentages options)
(percentiles x percentages)

Create a reader of percentile values, one for each percentage passed in. Estimation types are in the set of #{:r1,r2...legacy} and are described here: https://commons.apache.org/proper/commons-math/javadocs/api-3.3/index.html.

nan-strategy can be one of :keep :remove :exception and defaults to :exception.

quartile-1

(quartile-1 x options)
(quartile-1 x)

quartile-3

(quartile-3 x options)
(quartile-3 x)

quartile-outlier-fn

(quartile-outlier-fn x & {:keys [range-mult]})

Create a function that, given a floating point value, returns true if that value is an outlier and false otherwise. The default range-mult is 1.5:

  (or (< val (- q1 (* range-mult iqr)))
      (> val (+ q3 (* range-mult iqr))))

Options:

  • :range-mult - the multiplier applied to the interquartile range (iqr); defaults to 1.5.
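The outlier test above can be sketched in plain Clojure. This is a hypothetical stand-in, not the library function: `make-outlier-fn` is an invented name, and it takes precomputed q1 and q3 values directly rather than computing them from the data as quartile-outlier-fn does.

```clojure
;; Hypothetical stand-in that takes the quartiles as arguments.
(defn make-outlier-fn
  [q1 q3 & {:keys [range-mult] :or {range-mult 1.5}}]
  (let [q1  (double q1)
        q3  (double q3)
        iqr (- q3 q1)                       ;; interquartile range
        lo  (- q1 (* range-mult iqr))       ;; lower fence
        hi  (+ q3 (* range-mult iqr))]      ;; upper fence
    (fn [v]
      (let [v (double v)]
        (or (< v lo) (> v hi))))))

;; With q1 = 10 and q3 = 20, iqr = 10, so the fences are -5 and 35.
(def outlier? (make-outlier-fn 10.0 20.0))

(outlier? 36.0) ;; => true
(outlier? 35.0) ;; => false
(outlier? -6.0) ;; => true
```

Values exactly on a fence are not flagged, because the comparisons are strict.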

quartiles

(quartiles x)
(quartiles x options)

Return the minimum, the 25th, 50th, and 75th percentiles, and the maximum of x.

skew

(skew x options)
(skew x)

spearmans-correlation

(spearmans-correlation x y options)
(spearmans-correlation x y)

standard-deviation

(standard-deviation x options)
(standard-deviation x)

sum

(sum x options)
(sum x)

Double sum of data using Kahan compensated summation.
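For readers unfamiliar with the technique, here is a minimal plain-Clojure sketch of Kahan compensated summation. The name `kahan-sum` is hypothetical and the code is a textbook rendering of the algorithm, not the library's implementation; NaN handling is left out.

```clojure
;; Textbook Kahan summation; a sketch, not the library's code.
(defn kahan-sum
  [xs]
  (loop [xs  (seq xs)
         sum 0.0   ;; running total
         c   0.0]  ;; running compensation for lost low-order bits
    (if xs
      (let [y (- (double (first xs)) c) ;; correct the next term
            t (+ sum y)]                ;; low-order bits of y may be lost here
        ;; (t - sum) recovers the part of y that was actually added;
        ;; subtracting y leaves the rounding error, carried forward in c.
        (recur (next xs) t (- (- t sum) y)))
      sum)))

(kahan-sum [1.0 2.0 3.0]) ;; => 6.0
```

The extra variable `c` is what distinguishes this from a plain `(reduce + 0.0 xs)`: it feeds each step's rounding error back into the next addition, which keeps the accumulated error small on long inputs.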

variance

(variance x options)
(variance x)