tech.v3.datatype.argops
Efficient functions for operating in index space, modeled on the argsort, argmin, and similar functions from Matlab. These functions generally only work on readers, and all return some version of an index or list of indexes.
->binary-operator
(->binary-operator op)
Convert a thing to a binary operator. Thing can be a keyword or an implementation of IFn or an implementation of a BinaryOperator.
->binary-predicate
(->binary-predicate op)
Convert a thing to a binary predicate. Thing can be a keyword or an implementation of IFn or an implementation of a BinaryPredicate.
->double-comparator
(->double-comparator src-comparator)
Convert a thing to an it.unimi.dsi.fastutil.doubles.DoubleComparator.
->long-comparator
(->long-comparator src-comparator)
Convert a thing to an it.unimi.dsi.fastutil.longs.LongComparator.
->unary-operator
(->unary-operator op)
Convert a thing to a unary operator. Thing can be a keyword or an implementation of IFn or an implementation of a UnaryOperator.
->unary-predicate
(->unary-predicate op)
Convert a thing to a unary predicate. Thing can be a keyword or an implementation of IFn or an implementation of a UnaryPredicate.
arg-min-n
(arg-min-n N comparator {:keys [nan-strategy], :or {nan-strategy :last}} values)
(arg-min-n N comparator values)
(arg-min-n N values)
Return the indexes of the N minimum items. Values must be countable and random access. Takes the same options and arguments as argsort.
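As a rough illustration of what "indexes of the N minimum items" means, here is a minimal sketch in plain clojure.core. It is not the library's implementation; arg-min-n-sketch is a hypothetical helper, and returning the indexes in ascending value order is an assumption.

```clojure
;; Reference sketch only: plain clojure.core, not the library's implementation.
;; `arg-min-n-sketch` is a hypothetical name used for illustration.
(defn arg-min-n-sketch
  "Return a vector with the indexes of the `n` smallest items of `values`."
  [n values]
  (let [v (vec values)]
    ;; Sort the index range by the value each index points at, then keep n.
    (vec (take n (sort-by #(nth v %) (range (count v)))))))

(arg-min-n-sketch 2 [5 1 4 0 3]) ;; => [3 1]
```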
argfilter
(argfilter pred options rdr)
(argfilter pred rdr)
Filter values with pred, returning the indexes of the matching values as either an iterable of indexes or a reader of indexes.
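The idea of filtering in index space can be sketched with clojure.core alone. This is an illustration of the semantics, not the library's code; argfilter-sketch is a hypothetical helper, and it returns a plain vector rather than a reader.

```clojure
;; Reference sketch only: plain clojure.core, not the library's implementation.
;; `argfilter-sketch` is a hypothetical name used for illustration.
(defn argfilter-sketch
  "Return a vector of the indexes at which `pred` is truthy."
  [pred values]
  (vec (keep-indexed (fn [idx v] (when (pred v) idx)) values)))

(argfilter-sketch even? [1 2 4 7 8]) ;; => [1 2 4]
```

The returned indexes can then be used to select from the original data or from any other sequence of the same length.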
arggroup
(arggroup {:keys [storage-datatype unordered? skip-finalize? map-fn operation-space], :as options} rdr)
(arggroup rdr)
Group by elements in the reader, returning a map of value->list of indexes.
Note the options are passed through to hamf/group-by-reducer which then passes them through to preduce-reducer.
Options:
- :storage-datatype - :int32, :int64, or :bitmap, defaults to whatever will fit based on the element count of the reader.
- :unordered? - defaults to false, if true uses a slower algorithm that guarantees the resulting index lists will be ordered. In the case where storage is bitmap, unordered reductions are used as the bitmap forces the end results to be ordered.
- :key-fn - defaults to identity. In this case the reader's values are used as the keys.
arggroup-by
(arggroup-by partition-fn options rdr)
(arggroup-by partition-fn rdr)
Group by elements in the reader, returning a map of value->list of indexes. Indexes may not be ordered. :storage-datatype may be specified in the options to set the datatype of the indexes; otherwise the system decides based on reader length. See arggroup for Options.
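The grouping described for arggroup and arggroup-by can be sketched in plain clojure.core as follows. This is a serial illustration of the result shape only; arggroup-by-sketch is a hypothetical helper, and the ordered vectors it produces are an assumption that the library does not always guarantee.

```clojure
;; Reference sketch only: plain clojure.core, not the library's implementation.
;; `arggroup-by-sketch` is a hypothetical name used for illustration.
(defn arggroup-by-sketch
  "Return a map of (partition-fn value) -> vector of indexes."
  [partition-fn values]
  (reduce (fn [acc [idx v]]
            ;; Append this index to the list stored under the value's key.
            (update acc (partition-fn v) (fnil conj []) idx))
          {}
          (map-indexed vector values)))

;; With identity as the key function this mirrors plain arggroup.
(arggroup-by-sketch identity [:a :b :a :c :b]) ;; => {:a [0 2], :b [1 4], :c [3]}
(arggroup-by-sketch even? [1 2 3 4])           ;; => {false [0 2], true [1 3]}
```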
arglast-every
(arglast-every rdr pred-op)
Return the last index where (pred (rdr idx) (rdr (dec idx))) was true by comparing every value and keeping track of the last index where pred was true.
argpartition
(argpartition pred item-iterable)
(argpartition item-iterable)
Returns a sequence of partition-key index-reader. Index generation is not parallelized. This design allows group-by and partition-by to be used interchangeably (if pred is :tech.numerics/eq) as they both result in a sequence of partition-key idx-reader. This design is lazy.
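To make the partition-key and index-reader pairing concrete, here is a lazy sketch in plain clojure.core. It assumes equality as the partition predicate and represents each result element as a [partition-key indexes] vector; both are assumptions, and argpartition-sketch is a hypothetical helper rather than the library's code.

```clojure
;; Reference sketch only: plain clojure.core, not the library's implementation.
;; `argpartition-sketch` is a hypothetical name used for illustration.
(defn argpartition-sketch
  "Return a lazy sequence of [partition-key indexes] for runs of equal values."
  [values]
  (->> (map-indexed vector values)   ; pair each value with its index
       (partition-by second)         ; split into runs of equal values
       (map (fn [run] [(second (first run)) (mapv first run)]))))

(argpartition-sketch [1 1 2 2 2 1]) ;; => ([1 [0 1]] [2 [2 3 4]] [1 [5]])
```

Unlike a group-by, the key 1 appears twice here because its two runs are not adjacent.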
argpartition-by
(argpartition-by unary-op partition-pred item-iterable)
(argpartition-by unary-op item-iterable)
Returns a sequence of partition-key index-reader. Index generation is not parallelized. This design allows group-by and partition-by to be used interchangeably as they both result in a sequence of partition-key idx-reader. This design is lazy.
argshuffle
(argshuffle n-indexes {:keys [seed container-type], :or {container-type :jvm-heap}})
(argshuffle n-indexes)
Serially shuffle N indexes into an array of data. Returns an array of indexes.
Options:
- :seed - Either nil, an integer, or an implementation of java.util.Random. This seeds the random generator if provided or a new one is created if not.
- :container-type - The container type of the data, defaults to :jvm-heap. See documentation for make-container.
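The effect of a seeded index shuffle can be sketched with the JDK alone. This uses java.util.Collections/shuffle, which is not necessarily the algorithm the library uses, so the exact permutation for a given seed is not expected to match; argshuffle-sketch is a hypothetical helper.

```clojure
;; Reference sketch only: JDK shuffle, not the library's implementation.
;; `argshuffle-sketch` is a hypothetical name used for illustration.
(defn argshuffle-sketch
  "Return a vector holding a seeded random permutation of the indexes 0..n-1."
  [n seed]
  (let [idxs (java.util.ArrayList. ^java.util.Collection (range n))]
    ;; A fixed seed makes the permutation reproducible across calls.
    (java.util.Collections/shuffle idxs (java.util.Random. (long seed)))
    (vec idxs)))

;; Same seed, same permutation; every index appears exactly once.
(= (argshuffle-sketch 10 42) (argshuffle-sketch 10 42)) ;; => true
```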
argsort
(argsort comparator {:keys [parallel? nan-strategy], :or {parallel? true, nan-strategy :last}, :as _options} values)
(argsort comparator values)
(argsort values)
Sort values in index space, returning a buffer of indexes. By default uses a parallelized quicksort algorithm.
compare-fn may be one of:
- a clojure operator like clojure.core/<
- :tech.numerics/<, :tech.numerics/> for unboxing comparisons of primitive values.
- clojure.core/compare
- A custom java.util.Comparator instantiation.
Options:
- :nan-strategy - General missing strategy. Options are :first, :last, and :exception.
- :parallel? - Uses parallel quicksort when true and regular quicksort when false.
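Sorting in index space means sorting the index range while comparing the values the indexes point at. A minimal sketch in plain clojure.core follows; it is serial, ignores NaN handling, and is not the library's implementation. argsort-sketch is a hypothetical helper, and the commented-out library call assumes dtype-next is on the classpath.

```clojure
;; Reference sketch only: plain clojure.core, not the library's implementation.
;; `argsort-sketch` is a hypothetical name used for illustration.
(defn argsort-sketch
  "Return a vector of indexes that would sort `values` under `cmp`."
  ([values] (argsort-sketch compare values))
  ([cmp values]
   (let [v (vec values)]
     ;; `cmp` may return a number (like compare) or a boolean (like <).
     (vec (sort (fn [i j] (cmp (nth v i) (nth v j)))
                (range (count v)))))))

(argsort-sketch [30 10 20])   ;; => [1 2 0]
(argsort-sketch > [30 10 20]) ;; => [0 2 1]

(comment
  ;; Library call using the arity documented above (not evaluated here).
  (require '[tech.v3.datatype.argops :as argops])
  (argops/argsort [30 10 20]))
```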
binary-argfilter
(binary-argfilter pred options lhs rhs)
(binary-argfilter pred lhs rhs)
Filter out values using a binary predicate. Returns either an iterable of indexes or a reader of indexes.
binary-search
(binary-search data target options)
(binary-search data target)
Returns a long index that points either exactly to the value or to just before it. If the target is after the last value, returns elem-count.
Options:
- :comparator - a specific comparator to use; defaults to comparator.
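One common way to get this kind of result is a lower-bound binary search over sorted data, sketched below in plain clojure.core. The past-the-end case matches the description above; treating a missing interior target as "index of the first element not less than the target" is an assumption, not confirmed library behavior. lower-bound-sketch is a hypothetical helper.

```clojure
;; Reference sketch only: a standard lower-bound search, not the library's code.
;; `lower-bound-sketch` is a hypothetical name used for illustration.
(defn lower-bound-sketch
  "Return the first index in `sorted-vec` whose element is not less than
  `target`, or (count sorted-vec) when every element is less than `target`."
  [sorted-vec target]
  (loop [lo 0
         hi (count sorted-vec)]
    (if (< lo hi)
      (let [mid (quot (+ lo hi) 2)]
        (if (neg? (compare (nth sorted-vec mid) target))
          (recur (inc mid) hi)   ; element at mid is too small, search right
          (recur lo mid)))       ; element at mid may be the answer, search left
      lo)))

(lower-bound-sketch [1 3 5 7] 5) ;; => 2, exact match
(lower-bound-sketch [1 3 5 7] 4) ;; => 2, position where 4 would be inserted
(lower-bound-sketch [1 3 5 7] 9) ;; => 4, the element count
```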
binary-search-impl
macro
(binary-search-impl data target scalar-cast read-fn comparator compare-fn)
index-comparator
(index-comparator src-comparator nan-strategy values)
Given a reader of values and a source comparator, return either an IntComparator or a LongComparator, depending on the number of indexes in the reader, that compares the values using the passed-in comparator.
index-of
(index-of rdr pred value)
(index-of rdr value)
Return the first index at which pred is true given a comparison value.
last-index-of
(last-index-of rdr pred value)
(last-index-of rdr value)
Return the index of the last time pred, which defaults to :tech.numerics/eq, is true.
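The two lookups can be sketched in plain clojure.core as shown below. Several details are assumptions rather than documented behavior: the predicate is called as (pred element value), equality is the default in both functions, and nil is returned when nothing matches. index-of-sketch and last-index-of-sketch are hypothetical helpers.

```clojure
;; Reference sketch only: plain clojure.core, not the library's implementation.
;; Both function names are hypothetical and used for illustration.
(defn- matching-indexes
  "Lazy sequence of indexes where (pred element value) is truthy (assumed order)."
  [values pred value]
  (keep-indexed (fn [idx v] (when (pred v value) idx)) values))

(defn index-of-sketch
  "First matching index, or nil when there is no match (assumption)."
  ([values value] (index-of-sketch values = value))
  ([values pred value] (first (matching-indexes values pred value))))

(defn last-index-of-sketch
  "Last matching index, or nil when there is no match (assumption)."
  ([values value] (last-index-of-sketch values = value))
  ([values pred value] (last (matching-indexes values pred value))))

(index-of-sketch [4 7 4 9] 4)      ;; => 0
(last-index-of-sketch [4 7 4 9] 4) ;; => 2
(index-of-sketch [4 7 4 9] > 5)    ;; => 1, first element greater than 5
```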