tech.v3.datatype.statistics
NaN-aware, high-quality, and reasonably efficient summation and descriptive statistics.
descriptive-statistics
(descriptive-statistics stats-names stats-data {:keys [nan-strategy], :or {nan-strategy :remove}, :as options} src-rdr)
(descriptive-statistics stats-names options rdr)
(descriptive-statistics stats-names rdr)
(descriptive-statistics rdr)
Calculate a set of descriptive statistics on a single reader.
Available stats: #{:min :quartile-1 :sum :mean :mode :median :quartile-3 :max :variance :standard-deviation :skew :n-elems :kurtosis}
options
:nan-strategy - one of :keep, :remove, or :exception; defaults to :remove. :keep is the fastest option, but the results may then contain NaN values. You can also pass in a double predicate to filter custom double values.
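The effect of the three keyword strategies can be illustrated with a minimal sketch in plain Clojure. The helpers `apply-nan-strategy` and `mean` are hypothetical names introduced here for illustration; they are not part of tech.v3.datatype.statistics and do not reflect how the library is implemented. The form inside `(comment ...)` shows what the corresponding library call might look like, assuming the library is on the classpath.

```clojure
;; Hypothetical helper, not part of the library: returns the values a
;; statistic would be computed over under each keyword strategy.
(defn apply-nan-strategy
  [nan-strategy xs]
  (let [nan? (fn [x] (Double/isNaN (double x)))]
    (case nan-strategy
      :keep      (vec xs)                 ; NaN values pass through untouched
      :remove    (vec (remove nan? xs))   ; NaN values are dropped
      :exception (if (some nan? xs)
                   (throw (ex-info "NaN detected in data"
                                   {:nan-strategy nan-strategy}))
                   (vec xs)))))

;; Hypothetical helper: arithmetic mean of a non-empty collection.
(defn mean [xs]
  (/ (reduce + 0.0 xs) (count xs)))

(def data [1.0 2.0 Double/NaN 4.0])

(mean (apply-nan-strategy :remove data)) ; mean of 1.0, 2.0 and 4.0
(mean (apply-nan-strategy :keep data))   ; NaN propagates into the result

(comment
  ;; Assumption: the library is on the classpath.
  (require '[tech.v3.datatype.statistics :as stats])
  (stats/descriptive-statistics [:min :mean :max]
                                {:nan-strategy :remove}
                                (double-array data)))
```

With :keep the single NaN poisons the mean, which is the trade-off the option description warns about.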
percentiles
(percentiles percentages options data)
(percentiles percentages data)
Create a reader of percentile values, one for each percentage passed in. Estimation types are in the set #{:r1 :r2 ... :legacy} and are described here: https://commons.apache.org/proper/commons-math/javadocs/api-3.3/index.html.
nan-strategy can be one of :keep, :remove, or :exception and defaults to :exception.
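As a rough illustration of "one value per percentage", the sketch below computes percentiles in plain Clojure with the simple nearest-rank rule. The names `nearest-rank-percentile` and `percentiles-sketch` are hypothetical, the nearest-rank rule is chosen only for brevity, and the library's estimation types may produce different values for the same input. Percentages are assumed to lie in (0, 100] and the data is assumed to be non-empty.

```clojure
;; Hypothetical helper: nearest-rank percentile of an already sorted vector.
;; rank = ceil(pct / 100 * n), using 1-based ranks.
(defn nearest-rank-percentile
  [sorted-xs pct]
  (let [n    (count sorted-xs)
        rank (long (Math/ceil (* (/ pct 100.0) n)))]
    (nth sorted-xs (dec (max 1 rank)))))

;; Hypothetical helper: one percentile per requested percentage.
;; Throws on NaN, mirroring the documented :exception default.
(defn percentiles-sketch
  [percentages data]
  (when (some #(Double/isNaN (double %)) data)
    (throw (ex-info "NaN detected in data" {})))
  (let [sorted (vec (sort data))]
    (mapv #(nearest-rank-percentile sorted %) percentages)))

(percentiles-sketch [25 50 75] (map double (range 1 11)))
;; => [3.0 5.0 8.0]

(comment
  ;; Assumption: the library is on the classpath.
  (require '[tech.v3.datatype.statistics :as stats])
  (stats/percentiles [25 50 75]
                     {:nan-strategy :remove}
                     (double-array (range 1 11))))
```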
quartile-outlier-fn
(quartile-outlier-fn item & [range-mult])
Create a function that, given floating-point data, returns true if that data is an outlier and false otherwise. The default range-mult is 1.5, and iqr is the interquartile range (- q3 q1): (or (< val (- q1 (* range-mult iqr))) (> val (+ q3 (* range-mult iqr))))
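The outlier test above can be sketched in plain Clojure with the quartiles supplied directly. The name `outlier-pred` is hypothetical and takes precomputed q1 and q3 values rather than raw data, so it only illustrates the comparison itself, not how the library derives the quartiles.

```clojure
;; Hypothetical helper, not part of the library: build the outlier
;; predicate from precomputed quartiles q1 and q3.
(defn outlier-pred
  ([q1 q3] (outlier-pred q1 q3 1.5))      ; default range-mult of 1.5
  ([q1 q3 range-mult]
   (let [iqr (- q3 q1)                    ; interquartile range
         lo  (- q1 (* range-mult iqr))    ; lower fence
         hi  (+ q3 (* range-mult iqr))]   ; upper fence
     (fn [v] (or (< v lo) (> v hi))))))

;; With q1 = 10.0 and q3 = 20.0 the fences are -5.0 and 35.0.
(def outlier? (outlier-pred 10.0 20.0))

(outlier? 15.0) ; => false
(outlier? 35.0) ; => false, the comparison is strict
(outlier? 40.0) ; => true

(comment
  ;; Assumption: the library is on the classpath.
  (require '[tech.v3.datatype.statistics :as stats])
  (let [pred (stats/quartile-outlier-fn (double-array [1 2 3 4 5 100]))]
    (pred 100.0)))
```

A larger range-mult widens both fences, so fewer values are flagged.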
spearmans-correlation
(spearmans-correlation options lhs rhs)
(spearmans-correlation lhs rhs)