tech.v3.datatype.statistics

NaN-aware, high-quality, and reasonably efficient summation and descriptive statistics.

all-descriptive-stats-names

define-descriptive-stats

macro

(define-descriptive-stats)

descriptive-statistics

(descriptive-statistics stats-names stats-data {:keys [nan-strategy], :or {nan-strategy :remove}, :as options} src-rdr)
(descriptive-statistics stats-names options rdr)
(descriptive-statistics stats-names rdr)
(descriptive-statistics rdr)

Calculate a set of descriptive statistics on a single reader.

Available stats: #{:min :quartile-1 :sum :mean :mode :median :quartile-3 :max :variance :standard-deviation :skew :n-elems :kurtosis}

options

  • :nan-strategy - one of :keep, :remove, or :exception; defaults to :remove. :keep is the fastest option, but it may leave NaN values in your results. You can also pass in a double predicate to filter custom double values.
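The :nan-strategy choices above can be illustrated in plain Clojure. This is a minimal sketch, not the library's implementation: apply-nan-strategy is a hypothetical name, and treating a custom predicate as selecting the values to keep is an assumption.

```clojure
(defn apply-nan-strategy
  "Hypothetical helper, not part of tech.v3.datatype.statistics: applies a
  :nan-strategy-style option to a seq of numbers and returns a vector."
  [nan-strategy data]
  (let [nan? (fn [x] (Double/isNaN (double x)))]
    (cond
      ;; :keep leaves NaNs in place, so downstream results may be NaN.
      (= nan-strategy :keep) (vec data)
      ;; :remove drops NaNs before any statistic is computed.
      (= nan-strategy :remove) (vec (remove nan? data))
      ;; :exception fails fast if any NaN is present.
      (= nan-strategy :exception)
      (if (some nan? data)
        (throw (ex-info "NaN found in data" {:nan-strategy nan-strategy}))
        (vec data))
      ;; ASSUMPTION: a custom double predicate selects the values to keep.
      (fn? nan-strategy) (vec (filter nan-strategy data))
      :else (throw (ex-info "Unknown :nan-strategy" {:nan-strategy nan-strategy})))))
```

For example, (apply-nan-strategy :remove [1.0 Double/NaN 3.0]) yields [1.0 3.0], while :keep passes the NaN through untouched.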

kendalls-correlation

(kendalls-correlation options lhs rhs)
(kendalls-correlation lhs rhs)
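kendalls-correlation has no docstring here. As a hedged illustration, the O(n^2) sketch below computes Kendall's tau-b, the tie-corrected variant, over all pairs. Whether the library computes tau-b or another variant is an assumption, and kendalls-tau-b is a hypothetical name.

```clojure
(defn kendalls-tau-b
  "Hypothetical helper: Kendall's tau-b of two equal-length numeric seqs.
  O(n^2) over all pairs; no NaN handling."
  [lhs rhs]
  {:pre [(= (count lhs) (count rhs))]}
  (let [xs (mapv double lhs)
        ys (mapv double rhs)
        n  (count xs)
        ;; For every pair i < j, the signs of the x and y differences.
        pairs (for [i (range n)
                    j (range (inc i) n)]
                [(Math/signum (double (- (xs i) (xs j))))
                 (Math/signum (double (- (ys i) (ys j))))])
        ;; +1 per concordant pair, -1 per discordant, 0 for ties: n_c - n_d.
        s  (reduce + 0.0 (map (fn [[a b]] (* a b)) pairs))
        ;; Pairs not tied in x (resp. y), used by the tau-b denominator.
        nx (count (remove (fn [[a _]] (zero? a)) pairs))
        ny (count (remove (fn [[_ b]] (zero? b)) pairs))]
    (/ s (Math/sqrt (* (double nx) (double ny))))))
```

Without ties, nx and ny both equal n(n-1)/2 and tau-b reduces to the plain (concordant - discordant) / total-pairs ratio.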

kurtosis

(kurtosis data options)
(kurtosis data)

max

(max data options)
(max data)

mean

(mean data options)
(mean data)

Double mean of data.

median

(median data options)
(median data)

min

(min data options)
(min data)

mode

(mode data)

Return the most frequently occurring value in the data.
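The same idea can be sketched with clojure.core/frequencies. mode-sketch is a hypothetical name, and because the library's tie-breaking between equally common values is not stated, the tie behavior in this sketch is arbitrary.

```clojure
(defn mode-sketch
  "Hypothetical helper: the most frequent value in data, or nil when data
  is empty. Ties are broken arbitrarily (by map iteration order)."
  [data]
  (when (seq data)
    ;; frequencies builds {value count}; max-key picks the entry with the
    ;; largest count.
    (key (apply max-key val (frequencies data)))))
```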

moment-2

(moment-2 data options)
(moment-2 data)

moment-3

(moment-3 data options)
(moment-3 data)

moment-4

(moment-4 data options)
(moment-4 data)

pearsons-correlation

(pearsons-correlation options lhs rhs)
(pearsons-correlation lhs rhs)
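Pearson's r is the covariance of lhs and rhs divided by the product of their standard deviations, r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)). The sketch below computes it from centered sums; pearsons-sketch is a hypothetical name and, unlike the library functions, it does no NaN handling.

```clojure
(defn pearsons-sketch
  "Hypothetical helper: Pearson's r of two equal-length numeric seqs.
  No NaN handling."
  [lhs rhs]
  {:pre [(= (count lhs) (count rhs))]}
  (let [xs  (mapv double lhs)
        ys  (mapv double rhs)
        n   (count xs)
        mx  (/ (reduce + xs) n)          ; mean of lhs
        my  (/ (reduce + ys) n)          ; mean of rhs
        dx  (map #(- % mx) xs)           ; centered lhs
        dy  (map #(- % my) ys)           ; centered rhs
        sxy (reduce + (map * dx dy))     ; sum of cross products
        sxx (reduce + (map * dx dx))     ; sum of squared lhs deviations
        syy (reduce + (map * dy dy))]    ; sum of squared rhs deviations
    (/ sxy (Math/sqrt (double (* sxx syy))))))
```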

percentiles

(percentiles percentages options data)
(percentiles percentages data)

Create a reader of percentile values, one for each percentage passed in. Estimation types are in the set #{:r1 :r2 ... :legacy} and are described here: https://commons.apache.org/proper/commons-math/javadocs/api-3.3/index.html.

The :nan-strategy option is one of :keep, :remove, or :exception and defaults to :exception.
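To illustrate one estimation type, the sketch below implements R-7 (linear interpolation between adjacent order statistics), one of the R_1 ... R_9 types listed in Commons Math. It is not claimed to be the library's default; percentile-r7 is a hypothetical name, and NaNs are not handled.

```clojure
(defn percentile-r7
  "Hypothetical helper: the p-th percentile (0 <= p <= 100) of data using
  the R-7 estimator (linear interpolation between order statistics).
  No NaN handling."
  [p data]
  {:pre [(seq data) (<= 0 p 100)]}
  (let [xs   (vec (sort (map double data)))
        n    (count xs)
        h    (* (dec n) (/ p 100.0))      ; 0-based fractional position
        lo   (long (Math/floor h))        ; order statistic at or below h
        hi   (min (inc lo) (dec n))       ; next order statistic, clamped
        frac (- h lo)]                    ; interpolation weight
    (+ (xs lo) (* frac (- (xs hi) (xs lo))))))
```

Mapping it over several percentages mirrors the "one value per percentage" shape: (mapv #(percentile-r7 % [1 2 3 4]) [25 50 75]) gives [1.75 2.5 3.25].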

quartile-1

(quartile-1 data options)
(quartile-1 data)

quartile-3

(quartile-3 data options)
(quartile-3 data)

quartile-outlier-fn

(quartile-outlier-fn item & [range-mult])

Create a function that, given a floating-point value, returns true if that value is an outlier relative to the quartiles of item and false otherwise. The default range-mult is 1.5. With iqr = (- q3 q1), a value is an outlier when: (or (< val (- q1 (* range-mult iqr))) (> val (+ q3 (* range-mult iqr))))
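The outlier rule above can be sketched as a closure over precomputed bounds. This sketch assumes R-7 quartiles, which may differ from the library's quartile estimator; quartile-outlier-sketch and percentile-r7 are hypothetical names.

```clojure
(defn percentile-r7
  "Hypothetical helper: R-7 percentile, used here as a stand-in quartile
  estimator (an assumption). No NaN handling."
  [p data]
  (let [xs (vec (sort (map double data)))
        n  (count xs)
        h  (* (dec n) (/ p 100.0))        ; 0-based fractional position
        lo (long (Math/floor h))
        hi (min (inc lo) (dec n))]
    (+ (xs lo) (* (- h lo) (- (xs hi) (xs lo))))))

(defn quartile-outlier-sketch
  "Hypothetical helper: returns a predicate that is true for values outside
  [q1 - range-mult*iqr, q3 + range-mult*iqr], where iqr = q3 - q1."
  ([data] (quartile-outlier-sketch data 1.5))
  ([data range-mult]
   (let [q1    (percentile-r7 25 data)
         q3    (percentile-r7 75 data)
         iqr   (- q3 q1)
         lower (- q1 (* range-mult iqr))
         upper (+ q3 (* range-mult iqr))]
     ;; Bounds are computed once; the returned fn only compares.
     (fn [x]
       (or (< x lower) (> x upper))))))
```

Computing the bounds once when the predicate is created means the returned function is cheap to apply across many values.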

quartiles

(quartiles item)
(quartiles options item)

Return the min, 25th, 50th, and 75th percentiles, and max of item.

skew

(skew data options)
(skew data)

spearmans-correlation

(spearmans-correlation options lhs rhs)
(spearmans-correlation lhs rhs)
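Spearman's rank correlation is Pearson's correlation applied to the ranks of each input. The sketch below gives tied values the average of the ranks they occupy; the helper names are hypothetical, and the library's tie handling is assumed rather than confirmed.

```clojure
(defn average-ranks
  "Hypothetical helper: 1-based ranks of data; tied values share the mean
  of the ranks they occupy."
  [data]
  (let [xs     (mapv double data)
        ;; Indices sorted by value, then grouped into runs of equal values.
        groups (partition-by xs (sort-by xs (range (count xs))))]
    (loop [groups groups
           start  1                              ; rank of the group's first element
           ranks  (vec (repeat (count xs) 0.0))]
      (if-let [g (first groups)]
        (let [k (count g)
              r (+ start (/ (dec k) 2.0))]       ; mean of start .. start+k-1
          (recur (rest groups)
                 (+ start k)
                 (reduce #(assoc %1 %2 r) ranks g)))
        ranks))))

(defn pearson
  "Hypothetical helper: Pearson's r of two equal-length numeric seqs."
  [xs ys]
  (let [n  (count xs)
        mx (/ (reduce + xs) n)
        my (/ (reduce + ys) n)
        dx (map #(- % mx) xs)
        dy (map #(- % my) ys)]
    (/ (reduce + (map * dx dy))
       (Math/sqrt (double (* (reduce + (map * dx dx))
                             (reduce + (map * dy dy))))))))

(defn spearmans-sketch
  "Hypothetical helper: Spearman's rho as Pearson's r on average ranks."
  [lhs rhs]
  {:pre [(= (count lhs) (count rhs))]}
  (pearson (average-ranks lhs) (average-ranks rhs)))
```

Because only ranks matter, any strictly increasing relationship gives rho = 1.0, e.g. (spearmans-sketch [1 2 3 4] [1 4 9 16]).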

standard-deviation

(standard-deviation data options)
(standard-deviation data)

sum

(sum data options)
(sum data)

Double sum of data using Kahan compensated summation.
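Kahan summation carries a running compensation term that recaptures the low-order bits lost when a small term is added to a large running sum. The sketch below is a minimal, hypothetical kahan-sum over a seq of numbers, without the NaN handling the library adds.

```clojure
(defn kahan-sum
  "Hypothetical helper: Kahan compensated sum of a seq of numbers.
  No NaN handling."
  [data]
  (loop [xs  (seq data)
         sum 0.0
         c   0.0]                        ; compensation: negated lost low-order bits
    (if xs
      (let [y (- (double (first xs)) c)  ; next term, corrected by the compensation
            t (+ sum y)]                 ; rounded new sum
        ;; (- t sum) is the part of y that actually made it into t;
        ;; subtracting y leaves the (negated) part that was rounded away.
        (recur (next xs) t (- (- t sum) y)))
      sum)))
```

With naive summation, (reduce + (cons 1.0 (repeat 10 1e-16))) stays at 1.0 because each 1e-16 is below half an ulp of 1.0; kahan-sum accumulates those small terms through the compensation and returns a value greater than 1.0.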

variance

(variance data options)
(variance data)
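A two-pass sketch of variance and standard deviation: the first pass computes the mean, the second sums squared deviations from it. It assumes the sample (n - 1) denominator; whether the library divides by n or n - 1 is not stated here, and both function names are hypothetical.

```clojure
(defn variance-sketch
  "Hypothetical helper: two-pass sample variance (divides by n - 1).
  ASSUMPTION: the library's variance may use a different denominator."
  [data]
  {:pre [(> (count data) 1)]}
  (let [xs   (mapv double data)
        n    (count xs)
        mean (/ (reduce + xs) n)]           ; first pass: mean
    ;; second pass: sum of squared deviations from the mean
    (/ (reduce + (map #(let [d (- % mean)] (* d d)) xs))
       (dec n))))

(defn standard-deviation-sketch
  "Hypothetical helper: square root of variance-sketch."
  [data]
  (Math/sqrt (double (variance-sketch data))))
```

For [1 2 3 4 5] the squared deviations sum to 10.0, so variance-sketch returns 2.5.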