streams.api
Simple API for creating streams based on random sampling from distributions, along with minimal arithmetic and manipulation pathways. Arithmetic ops can be applied to both scalars and streams.
A stream is an object that, when called as a function with no arguments, returns the next value in the stream, and that also efficiently implements clojure.lang.IReduceInit and clojure.lang.IReduce. Streams are lazy, noncaching versions of Clojure's sequences.
Streams are strictly serial entities when they are being iterated. There are no provisions made to protect against threading issues.
Only arithmetic ops are specialized to doubles for performance reasons; streams can be streams of arbitrary objects or really anything that implements IReduceInit.
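To make the description above concrete, here is a minimal sketch of an object that is both callable with no arguments and reducible. It uses only clojure.core and Java interop. The name counter-stream is hypothetical and this is not the library's implementation, which is not shown in this document.

```clojure
;; Hypothetical sketch, NOT the streams.api implementation.
(defn counter-stream
  "Returns an object yielding 0.0, 1.0, 2.0, ... Calling it with no
  arguments returns the next value; reducing it consumes up to `limit`
  values (nil means unlimited)."
  [limit]
  (let [state (long-array 1)] ; unsynchronized mutable state: not thread safe
    (reify
      clojure.lang.IFn
      (invoke [_]
        (let [v (aget state 0)]
          (aset state 0 (inc v))
          (double v)))
      clojure.lang.IReduceInit
      (reduce [this rfn init]
        (loop [acc init
               n   0]
          (if (or (reduced? acc) (and limit (>= n limit)))
            (unreduced acc)
            (recur (rfn acc (this)) (inc n))))))))

;; Nothing is cached: reducing and calling share the same position.
(def s (counter-stream nil))
(into [] (take 3) s) ; consumes three values
(s)                  ; continues from where the reduction stopped
```

Because the only state is one unsynchronized counter, the sketch also shows why such an object must be iterated from a single thread.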
Note that dtype-next has reservoir sampling.
user> (require '[streams.api :as streams])
nil
user> (streams/sample 20 (streams/+ (streams/uniform-stream)
                                    (streams/* 2.0 (streams/uniform-stream))))
[1.5501202319376306, 0.7635588117246281, 2.3532562778994093, 2.209371262799305,
1.3152501796238574, 1.0452647068536018, 0.7894558426559145, 2.198800934691462,
0.26506472311487705, 2.538111046716471, 2.9001166286861992, 1.3705779064113792,
2.1755184584145306, 1.3351040137971486, 1.6120692556203424, 1.6107428912151116,
2.2510286054117365, 0.8765206662618311, 1.213693353303307, 1.2334256767045018]
For best performance keep streams unlimited until the very end. This keeps intermediate streams from needing to check if their sub-streams have in fact ended for every item:
(streams/take 10000 (streams/+ a b))
Performs better than:
(streams/+ (streams/take 10000 a) b)
*
(* a) (* a b) (* a b c) (* a b c & args)
Binary or unary operation *. Operates in the space of doubles. Arguments may be streams or double scalars.
+
(+ a) (+ a b) (+ a b c) (+ a b c & args)
Binary or unary operation +. Operates in the space of doubles. Arguments may be streams or double scalars.
-
(- a) (- a b) (- a b c) (- a b c & args)
Binary or unary operation -. Operates in the space of doubles. Arguments may be streams or double scalars.
/
(/ a) (/ a b) (/ a b c) (/ a b c & args)
Binary or unary operation /. Operates in the space of doubles. Arguments may be streams or double scalars.
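As an illustration of how an arithmetic op can accept either streams or scalars, the following sketch lifts a two-argument function over zero-argument functions and plain numbers. The names lift-double-op, stream+ and stream* are hypothetical, and plain Clojure functions stand in for streams. The library's own macros may work quite differently.

```clojure
;; Hypothetical sketch: plain zero-argument functions stand in for streams.
(defn lift-double-op
  "Lift a two-argument numeric function so that each argument may be
  either a number or a zero-argument function producing numbers."
  [op]
  (fn [a b]
    (let [->thunk (fn [x]
                    (if (number? x)
                      (let [d (double x)] (fn [] d)) ; scalar: always the same value
                      x))                            ; stream: call for the next value
          next-a (->thunk a)
          next-b (->thunk b)]
      ;; every call pulls one value from each side and combines them as doubles
      (fn [] (double (op (double (next-a)) (double (next-b))))))))

(def stream+ (lift-double-op +))
(def stream* (lift-double-op *))

;; counter yields 1, 2, 3, ...; the scalar product is a constant 20.0
(def counter (let [n (atom 0)] (fn [] (swap! n inc))))
(def summed (stream+ counter (stream* 2.0 10)))
```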
batch-stream
(batch-stream batch-fn)
Given a function that returns a batch of data, which needs to look like a java.util.List, make a stream that reads the values of the batches.
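The idea of reading values out of successive batches can be sketched as follows. The name batch-reader is hypothetical, the result is a plain zero-argument function rather than a library stream, and the sketch assumes every batch is non-empty.

```clojure
;; Hypothetical sketch, assuming batch-fn never returns an empty batch.
(defn batch-reader
  "Returns a zero-argument function that walks through the current batch
  and calls batch-fn for a fresh java.util.List once it is exhausted."
  [batch-fn]
  (let [batch (volatile! nil)
        idx   (volatile! 0)]
    (fn []
      ;; fetch a new batch on first use or when the current one is used up
      (when (or (nil? @batch)
                (>= @idx (.size ^java.util.List @batch)))
        (vreset! batch (batch-fn))
        (vreset! idx 0))
      (let [v (.get ^java.util.List @batch (int @idx))]
        (vswap! idx inc)
        v))))

;; Clojure vectors implement java.util.List, so they work as batches.
(def pairs
  (let [n (atom 0)]
    (batch-reader (fn [] (let [b (* 2 (swap! n inc))] [(dec b) b])))))
```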
bin-stream
(bin-stream s {:keys [n-bins sample-count x-axis-name y-axis-name], :or {sample-count 100000, x-axis-name :value, y-axis-name :sample-count}}) (bin-stream s)
Bin a stream, returning a sorted vector of {x-axis-name bin-left-marker, y-axis-name sample-count} maps. Returned values are sorted by x-axis-name.
The bin-size, x-axis-name and y-axis-name are returned as metadata on the return value; the example below also shows :min and :max keys.
Example:
user> (require '[streams.api :as streams])
nil
user> (def binned (streams/bin-stream (streams/gaussian-stream)))
#'user/binned
user> (take 10 binned)
({:value -4.530608699974987, :sample-count 1}
{:value -4.2370268390694985, :sample-count 1}
{:value -4.139166218767668, :sample-count 1}
{:value -4.041305598465839, :sample-count 2}
{:value -3.8455843578621796, :sample-count 4}
{:value -3.74772373756035, :sample-count 8}
{:value -3.64986311725852, :sample-count 6}
{:value -3.552002496956691, :sample-count 4}
{:value -3.4541418766548606, :sample-count 18}
{:value -3.3562812563530313, :sample-count 15})
user> (meta binned)
{:x-axis-name :value,
:y-axis-name :sample-count,
:bin-size 0.09786062030182967,
:min -4.530608699974987,
:max 4.255453330207979}
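The shape of this result can be reproduced with a small fixed-width binning sketch over an in-memory collection. The function name bin-samples is hypothetical, and the sketch assumes the samples are not all equal (otherwise the bin width would be zero). Like the output above, it omits empty bins, although whether the library does so in the same way is not stated here.

```clojure
;; Hypothetical sketch of fixed-width binning; not the library's algorithm.
(defn bin-samples
  "Bin a collection of numbers into n-bins equal-width bins. Returns a
  vector of {:value bin-left-edge :sample-count n} maps sorted by :value,
  with :bin-size, :min and :max attached as metadata."
  [samples n-bins]
  (let [mn       (double (apply min samples))
        mx       (double (apply max samples))
        bin-size (/ (- mx mn) (double n-bins))
        ;; the maximum value would land in bin n-bins, so clamp it into the last bin
        bin-idx  (fn [x] (min (dec n-bins) (long (/ (- x mn) bin-size))))]
    (with-meta
      (->> (frequencies (map bin-idx samples))
           (sort-by key)
           (mapv (fn [[i c]]
                   {:value (+ mn (* i bin-size)) :sample-count c})))
      {:bin-size bin-size :min mn :max mx})))

(def binned-sketch (bin-samples [0.0 1.0 2.0 3.0 4.0] 2))
```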
def-double-binary-op
macro
(def-double-binary-op op-sym docstr) (def-double-binary-op op-sym)
Define a binary double operation from a function in clojure.core or another library, such as +. The operation need only have a single arity of 2.
def-double-op
macro
(def-double-op op-sym)
Define a unary and binary double operation from a function in clojure.core or another library, such as +. The operation must have arities of 1, 2, and more (variadic).
def-double-unary-op
macro
(def-double-unary-op op-sym docstr) (def-double-unary-op op-sym)
Define a unary double operation from a function in clojure.core or another library. The operation need only have a single arity of 1.
fastmath-stream
(fastmath-stream n key opts) (fastmath-stream key opts) (fastmath-stream key)
Create a stream based on a fastmath distribution.
You can provide a seed by supplying an rng:
streams.api> (def ds (fastmath-stream :exponential {:rng (fast-r/rng :mersenne 1)}))
#'streams.api/ds
streams.api> (ds)
2.0910007182186208
streams.api> (def ds (fastmath-stream :exponential {:rng (fast-r/rng :mersenne 1)}))
#'streams.api/ds
streams.api> (ds)
2.0910007182186208
gaussian-stream
(gaussian-stream n opts) (gaussian-stream n) (gaussian-stream)
Create a gaussian stream with mean 0 and variance 1. An integer seed may be provided with :seed. The specific rng you want may be selected with :rng and will be passed to fastmath.random/rng.
streams.api> (def s (gaussian-stream nil {:seed 1 :rng :mersenne}))
#'streams.api/s
streams.api> (s)
1.0019203836877835
streams.api> (def s (gaussian-stream nil {:seed 1 :rng :mersenne}))
#'streams.api/s
streams.api> (s)
1.0019203836877835
streams.api> fast-r/rngs-list
(:mersenne
:well44497a
:jdk
:well19937c
:well1024a
:well19937a
:well512a
:isaac
:well44497b)
streams.api> (def s (gaussian-stream nil {:seed 1 :rng :well512a}))
#'streams.api/s
streams.api> (s)
-1.6141338321555592
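The reproducibility shown above, where the same seed gives the same first value, can be sketched without fastmath by using java.util.Random. The name seeded-gaussian is hypothetical, and because it uses a different generator its values will not match the numbers printed above.

```clojure
;; Hypothetical sketch using java.util.Random instead of fastmath.random.
(defn seeded-gaussian
  "Returns a zero-argument function producing standard normal doubles
  from a java.util.Random constructed with the given integer seed."
  [seed]
  (let [rng (java.util.Random. (long seed))]
    (fn [] (.nextGaussian rng))))

;; Two generators built from the same seed produce identical sequences.
(def a (seeded-gaussian 1))
(def b (seeded-gaussian 1))
(= (vec (repeatedly 5 a)) (vec (repeatedly 5 b)))
```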
interleave
(interleave) (interleave c0) (interleave c0 c1) (interleave c0 c1 & args)
Fast noncaching form of interleave.
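A noncaching interleave can be pictured as a round-robin over value sources that keeps only a cursor and never stores results. In this sketch interleave-fns is a hypothetical name, zero-argument functions stand in for streams, and at least one source is required.

```clojure
;; Hypothetical sketch: round-robin over zero-argument functions.
(defn interleave-fns
  "Returns a zero-argument function that takes the next value from each
  of fs in turn. No results are retained between calls."
  [& fs]
  {:pre [(seq fs)]} ; the sketch requires at least one source
  (let [fs (vec fs)
        n  (count fs)
        i  (volatile! 0)]
    (fn []
      (let [f (nth fs @i)]
        (vreset! i (rem (inc @i) n)) ; advance the cursor, wrapping around
        (f)))))

(def abc (interleave-fns (constantly :a) (constantly :b) (constantly :c)))
```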
log1p
(log1p a)
Unary operation log1p. Operates in the space of doubles. The argument may be a stream or a double.
map
(map mapfn s) (map mapfn a b) (map mapfn a b c) (map mapfn a b c & args)
Map a function onto one or more streams. Returns a new stream whose limit is the smallest limit of any of the input streams.
prob-interleave
(prob-interleave args opts) (prob-interleave args)
Probabilistically interleave multiple streams. Each element of args must be a [stream prob] tuple; the probabilities are used with a flat (uniform) distribution to decide which stream to sample from. Iteration stops when any of the component streams is empty.
Options:
:seed - Provide an integer seed to construct a new java.util.Random.
:rng - Provide a Clojure function that takes no arguments and returns a double between 0 and 1.
Example:
streams.graphs> (streams/sample 20 (streams/prob-interleave [[(streams/gaussian-stream) 0.1]
                                                             [(streams/gaussian-stream) 0.5]
                                                             [(streams/stream 2 1) 0.5]]))
[0.3261978516358189, 0.23722603841776788, 1.0, -0.16219928642385675, 1.0,
0.43443517752548294, -1.93659876689825]
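One common way to choose among sources with a uniform draw is to walk the cumulative probabilities, sketched below. The names pick-index and prob-interleave-fns are hypothetical, zero-argument functions stand in for streams, and normalizing by the total weight is an assumption of this sketch. How the library treats probabilities that do not sum to 1 is not stated here.

```clojure
;; Hypothetical sketch; normalization by the total weight is an assumption.
(defn pick-index
  "Given a vector of weights and a uniform draw u in [0, 1), return the
  index whose cumulative weight range contains u."
  [probs u]
  (let [target (* u (reduce + probs))
        last-i (dec (count probs))]
    (loop [i 0, acc 0.0]
      (let [acc (+ acc (nth probs i))]
        (if (or (< target acc) (= i last-i))
          i
          (recur (inc i) acc))))))

(defn prob-interleave-fns
  "pairs is a sequence of [zero-arg-fn prob] tuples. Returns a zero-arg
  function that picks one source per call using a seeded java.util.Random."
  [pairs seed]
  (let [rng   (java.util.Random. (long seed))
        fs    (mapv first pairs)
        probs (mapv second pairs)]
    (fn [] ((nth fs (pick-index probs (.nextDouble rng)))))))

(def mixed (prob-interleave-fns [[(constantly :rare) 0.1]
                                 [(constantly :common) 0.9]]
                                42))
```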
sample
(sample s) (sample n s)
Sample a stream into a double array. If n is not provided, the stream must already have a limit; otherwise sampling never terminates and an out-of-memory error is imminent.
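Collecting a fixed number of values into a primitive double array can be sketched as below. The name sample-fn is hypothetical and a zero-argument function stands in for a stream, so this covers only the case where n is given.

```clojure
;; Hypothetical sketch of sampling n values into a double array.
(defn sample-fn
  "Call the zero-argument function f exactly n times and store the
  results, coerced to double, in a new double array."
  [n f]
  (let [out (double-array n)]
    (dotimes [i n]
      (aset out i (double (f))))
    out))

(def three (sample-fn 3 (constantly 1.5)))
(vec three) ; view the array contents as a vector
```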
stream
macro
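(stream code) (stream l code)
Create a 'stream': the lazy, noncaching form of repeatedly. When l is given it limits the stream to l items, as in (streams/stream 2 1) in the prob-interleave example.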
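The core idea of a macro that turns a code form into something re-evaluated on every call can be sketched in one line. The name thunk-stream is hypothetical, it covers only the single-argument form without a limit, and the real macro also has to produce an object implementing the reduce interfaces.

```clojure
;; Hypothetical sketch: wrap the code form in a zero-argument function.
(defmacro thunk-stream
  "Expands to a function that evaluates `code` afresh on every call,
  so no previously produced value is cached."
  [code]
  `(fn [] ~code))

(def calls (atom 0))
(def counting (thunk-stream (swap! calls inc)))
;; each call re-runs (swap! calls inc)
[(counting) (counting) (counting)]
```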
uniform-stream
(uniform-stream n opts) (uniform-stream n) (uniform-stream)
Create a uniform stream with values 0-1. An integer seed may be provided with :seed. The specific rng you want may be selected with :rng and will be passed to fastmath.random/rng.
The :seed and :rng options are used as in the gaussian-stream example above; the available :rng keys are those listed there in fast-r/rngs-list.