tech.v3.datatype.statistics
NaN-aware, high-quality, and reasonably efficient summation and descriptive statistics.
descriptive-statistics
(descriptive-statistics x stats-names stats-data {:keys [nan-strategy], :or {nan-strategy :remove}, :as options})
(descriptive-statistics x stats-names options)
(descriptive-statistics x stats-names)
(descriptive-statistics x)
Calculate a set of descriptive statistics on a single reader.
Available stats: #{:min :quartile-1 :sum :mean :mode :median :quartile-3 :max :variance :standard-deviation :skew :n-elems :kurtosis}
Options:
:nan-strategy - one of :keep, :remove, or :exception; defaults to :remove. The fastest option is :keep, but the results may then contain NaN values. You can also pass in a double predicate to filter custom double values.
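The effect of :nan-strategy can be seen in a short usage sketch. It assumes the dtype-next library is on the classpath and that the argument order matches the signatures listed above (data first); the sample values are made up for illustration.

```clojure
(require '[tech.v3.datatype.statistics :as stats])

;; Made-up sample data containing a single NaN.
(def data (double-array [1.0 2.0 ##NaN 4.0 5.0]))

;; Default :nan-strategy is :remove, so the NaN is dropped before
;; the requested statistics are computed.
(def removed
  (stats/descriptive-statistics data [:min :max :mean]))

;; With :keep the NaN is left in place and propagates into the mean.
(def kept
  (stats/descriptive-statistics data [:mean] {:nan-strategy :keep}))
```

The result is a map keyed by the requested stat names, so individual values can be read with (:mean removed) and so on.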
percentiles
(percentiles x percentages options)
(percentiles x percentages)
Create a reader of percentile values, one for each percentage passed in. Estimation types are in the set of #{:r1,r2...legacy} and are described here: https://commons.apache.org/proper/commons-math/javadocs/api-3.3/index.html.
:nan-strategy can be one of :keep, :remove, or :exception and defaults to :exception (unlike descriptive-statistics, which defaults to :remove).
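A minimal usage sketch follows. It assumes the dtype-next library is on the classpath, that the argument order matches the signatures above, and that percentages are given on a 0-100 scale as in Apache Commons Math; the data is made up for illustration.

```clojure
(require '[tech.v3.datatype.statistics :as stats])

;; Made-up sample data without NaN values, so the default
;; :exception strategy does not throw.
(def data (double-array [1.0 2.0 3.0 4.0 5.0]))

;; One percentile value per requested percentage; vec realizes the
;; returned reader into a persistent vector.
(def quartiles
  (vec (stats/percentiles data [25.0 50.0 75.0])))

;; Data with a NaN needs an explicit strategy to avoid the exception.
(def median-without-nan
  (vec (stats/percentiles (double-array [1.0 ##NaN 3.0])
                          [50.0]
                          {:nan-strategy :remove})))
```

The first and third quartile values depend on the estimation type, so they may differ slightly from what other tools report for the same data.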
quartile-outlier-fn
(quartile-outlier-fn x & {:keys [range-mult]})
Create a function that, given a floating point value, returns true if that value is an outlier and false otherwise. The default range-mult is 1.5, and the test applied is:
(or (< val (- q1 (* range-mult iqr)))
    (> val (+ q3 (* range-mult iqr))))
Options:
:range-mult - the multiplier applied to the interquartile range (iqr, that is q3 minus q1); defaults to 1.5.
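The outlier test can be sketched in plain Clojure. The helper iqr-outlier-fn below is hypothetical and is not the library's implementation: it takes q1 and q3 directly instead of computing them from data, which keeps the arithmetic easy to follow.

```clojure
(defn iqr-outlier-fn
  "Hypothetical helper: builds the outlier predicate from known quartiles
  rather than from a dataset."
  [q1 q3 & {:keys [range-mult] :or {range-mult 1.5}}]
  (let [iqr (- q3 q1)
        lo  (- q1 (* range-mult iqr))   ; lower fence
        hi  (+ q3 (* range-mult iqr))]  ; upper fence
    (fn [v]
      (or (< v lo) (> v hi)))))

;; With q1 = 2.0 and q3 = 4.0 the iqr is 2.0, so the default fences
;; sit at -1.0 and 7.0.
(def outlier? (iqr-outlier-fn 2.0 4.0))

;; A larger multiplier widens the fences to -4.0 and 10.0.
(def strict-outlier? (iqr-outlier-fn 2.0 4.0 :range-mult 3.0))
```

Here (outlier? 9.0) is true while (strict-outlier? 9.0) is false, which shows how :range-mult trades sensitivity for tolerance.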