Class DType
java.lang.Object
  tech.v3.DType
public class DType extends java.lang.Object
'dtype-next' exposes a container-based API for dealing with bulk containers of primitive data efficiently and uniformly, independent of whether they have JVM-heap-backed or native-heap-backed storage. Elemwise access is provided via the 'Buffer' interface, while bulk operations such as copying and setting a constant value use fast primitives such as System.arraycopy, Arrays.fill, memset and memcpy. Extremely fast copy pathways are provided to copy from JVM heap storage (JVM primitive arrays) to native heap storage; these usually boil down to a single C memcpy call.
All the base C numeric datatypes are supported: unsigned and signed integer types from 8 to 64 bits, and 32 and 64 bit floating point types. Containers of unknown type have type ':object', strings have type ':string', etc. Unsigned integer types are denoted by types such as ':uint32' or ':uint8'.
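Java has no unsigned primitive types, so an unsigned datatype such as ':uint8' has to be carried in a signed primitive and widened on read. The following is a minimal sketch of that idea using only the JDK; the class and method names are hypothetical and it is not dtype-next's implementation.

```java
final class Uint8Sketch {
    // Narrow a value in [0, 255] into the signed byte that carries the same bits.
    static byte writeUint8(int value) {
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException("Value out of range for uint8: " + value);
        }
        return (byte) value;
    }

    // Widen the stored byte back to its unsigned value.
    static int readUint8(byte stored) {
        return Byte.toUnsignedInt(stored);
    }

    public static void main(String[] args) {
        byte stored = writeUint8(200);
        // The stored byte is negative when viewed as signed, yet reads back as 200.
        System.out.println(stored + " -> " + readUint8(stored));
    }
}
```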
Care has been taken to make creating custom buffers as easy as possible. Default methods have been provided for nearly all the methods on tech.v3.datatype.Buffer. If you only need to create a read-only buffer, which is common when the values are defined by code, there are helper interfaces that define yet more of the defaults. These helper interfaces are (in the tech.v3.datatype namespace): BooleanReader, LongReader, DoubleReader, and ObjectReader. Users implementing these interfaces need only provide an implementation of the lsize and readXXX methods, where XXX denotes the datatype. For example:
  return new LongReader() {
    public long lsize() { return 4; }
    public long readLong(long idx) { return idx; }
  };
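To make the "two methods, everything else defaulted" design concrete, here is a small JDK-only sketch of the same pattern. SketchLongReader and its default methods are hypothetical stand-ins, not the real tech.v3.datatype.LongReader, which defines many more defaults.

```java
final class ReaderSketch {
    // Hypothetical stand-in for a read-only long buffer: two abstract methods,
    // with everything else derived from them by default methods.
    interface SketchLongReader {
        long lsize();
        long readLong(long idx);

        // Derived from readLong, so implementers get it for free.
        default double readDouble(long idx) {
            return (double) readLong(idx);
        }

        // Materialize the reader into a JVM array.
        default long[] toLongArray() {
            long[] out = new long[Math.toIntExact(lsize())];
            for (int i = 0; i < out.length; i++) {
                out[i] = readLong(i);
            }
            return out;
        }
    }

    // Same shape as the example above: four elements whose values equal their index.
    static SketchLongReader indexReader() {
        return new SketchLongReader() {
            public long lsize() { return 4; }
            public long readLong(long idx) { return idx; }
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(indexReader().toLongArray()));
    }
}
```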
There are two key types not represented in this file - tech.v3.datatype.native_buffer.NativeBuffer and tech.v3.datatype.array_buffer.ArrayBuffer. These are the backing stores of nativeHeap and jvmHeap memory, respectively. They are immutable datastructures, unlike nio buffers, and they support, as best as possible, 64 bit indexing. It can be useful at times to get a direct reference to them.
-
-
Field Summary
Modifier and Type
Field
Description
static clojure.lang.Keyword
bool
Boolean keyword datatype.
static clojure.lang.Keyword
float32
32 bit floating point datatype.
static clojure.lang.Keyword
float64
64 bit floating point datatype.
static clojure.lang.Keyword
int16
Signed short datatype.
static clojure.lang.Keyword
int32
Signed int datatype.
static clojure.lang.Keyword
int64
Signed 64 bit integer datatype.
static clojure.lang.Keyword
int8
Signed byte datatype.
static clojure.lang.Keyword
jvmHeap
Allocate data on the JVM heap in JVM primitive arrays.
static clojure.lang.Keyword
nativeHeap
Allocate data on the native heap e.g. using 'malloc'.
static clojure.lang.Keyword
uint16
Unsigned short datatype.
static clojure.lang.Keyword
uint32
Unsigned int datatype.
static clojure.lang.Keyword
uint64
Unsigned 64 bit integer datatype.
static clojure.lang.Keyword
uint8
Unsigned byte datatype.
-
Method Summary
Modifier and Type
Method
Description
static tech.v3.datatype.ArrayBufferData
asArrayBuffer(java.lang.Object obj)
Attempt to get an array buffer from an object such as a tensor.
static tech.v3.datatype.NativeBufferData
asNativeBuffer(java.lang.Object obj)
Attempt to get a native buffer from an object such as a tensor or a numpy array.
static java.nio.Buffer
asNioBuffer(java.lang.Object obj)
Attempt an in-place conversion to a nio buffer.
static boolean
boolCast(java.lang.Object scalarVal)
Boolean cast that respects numeric values.
static java.lang.Object
clone(java.lang.Object data)
Clone a container of data.
static java.lang.Object
copy(java.lang.Object src, java.lang.Object dst)
Efficiently copy data from a source container into a destination container, returning the destination container.
static long
ecount(java.lang.Object val)
Return the number of elements in the container.
static java.lang.Object
elemwiseDatatype(java.lang.Object val)
Return the datatype contained in the container.
static tech.v3.datatype.Buffer
emap(clojure.lang.IFn mapFn, java.lang.Object resDtype, java.lang.Object... args)
Elementwise-map a function, creating a new lazy buffer.
static org.roaringbitmap.RoaringBitmap
emptyBitmap()
Create a new empty roaring bitmap.
static tech.v3.datatype.Buffer
indexedBuffer(java.lang.Object indexes, java.lang.Object buffer)
Create a new Buffer implementation that indexes into a previous Buffer implementation via the provided indexes.
static java.lang.Object
indexedMapReduce(long numIters, clojure.lang.IFn indexedMapFn, clojure.lang.IFn reduceFn)
Extremely efficient parallelism primitive.
static java.lang.Object
indexedMapReduce(long numIters, clojure.lang.IFn indexedMapFn, clojure.lang.IFn reduceFn, java.lang.Object options)
Extremely efficient parallelism primitive for working through a fixed number of indexes.
static java.lang.Object
makeContainer(java.lang.Object dataOrNElems)
Make a container of data.
static java.lang.Object
makeContainer(java.lang.Object dtype, java.lang.Object dataOrNElems)
Make a container of data.
static java.lang.Object
makeContainer(java.lang.Object storage, java.lang.Object dtype, java.lang.Object dataOrNElems)
Make a container of data.
static java.lang.Object
makeContainer(java.lang.Object storage, java.lang.Object dtype, java.lang.Object options, java.lang.Object dataOrNElems)
Make a container of data.
static tech.v3.datatype.Buffer
makeList(java.lang.Object dtype)
Make an efficient appendable datastructure that contains a primitive backing store.
static clojure.lang.IFn
mapFactory(java.util.List keys)
Return a function taking exactly n-keys arguments that will rapidly construct a new map.
static long
numericByteWidth(java.lang.Object dtype)
Return the numeric byte width of a given datatype so for example int32 returns 4.
static java.util.Map
opts(java.lang.Object... args)
Create an 'options' map which simply means ensuring the keys are keywords.
static java.lang.Object
reverse(java.lang.Object item)
Reverse a sequence, range or reader.
static java.lang.Object
setConstant(java.lang.Object item, long offset, long length, java.lang.Object value)
Set a container to a constant value.
static java.lang.Object
setConstant(java.lang.Object item, long offset, java.lang.Object value)
Set a container to a constant value.
static java.lang.Object
setConstant(java.lang.Object item, java.lang.Object value)
Set a container to a constant value.
static java.util.List
shape(java.lang.Object val)
Return the shape of the container as a persistent vector.
static java.lang.AutoCloseable
stackResourceContext()
Open a stack-based resource context.
static java.lang.Object
subBuffer(java.lang.Object src, long offset)
Create a sub-buffer from a larger buffer.
static java.lang.Object
subBuffer(java.lang.Object src, long offset, long length)
Create a sub-buffer from a larger buffer.
static java.lang.Object
toArray(java.lang.Object data)
Convert data into the most appropriate JVM array for the datatype.
static java.lang.Object
toArray(java.lang.Object data, java.lang.Object dtype)
Convert data into an array of the indicated datatype.
static org.roaringbitmap.RoaringBitmap
toBitmap(java.lang.Object data)
Create a roaring bitmap from arbitrary data.
static boolean[]
toBooleanArray(java.lang.Object data)
Convert data into a boolean array.
static tech.v3.datatype.Buffer
toBuffer(java.lang.Object src)
Convert an object to an implementation of tech.v3.datatype.Buffer.
static byte[]
toByteArray(java.lang.Object data)
Convert data into a byte array.
static double[]
toDoubleArray(java.lang.Object data)
Convert data into a double array.
static float[]
toFloatArray(java.lang.Object data)
Convert data into a float array.
static int[]
toIntArray(java.lang.Object data)
Convert data into an integer array.
static long[]
toLongArray(java.lang.Object data)
Convert data into a long array.
static short[]
toShortArray(java.lang.Object data)
Convert data into a short array.
static java.lang.Object
wrapAddress(java.lang.Object gcObject, long address, long nBytes)
Wrap an integer pointer into a buffer.
static java.lang.Object
wrapAddress(java.lang.Object gcObject, long address, long nBytes, java.lang.Object dtype)
Wrap an integer pointer into a buffer.
-
-
-
Field Detail
-
bool
public static final clojure.lang.Keyword bool
Boolean keyword datatype.
-
int8
public static final clojure.lang.Keyword int8
Signed byte datatype.
-
uint8
public static final clojure.lang.Keyword uint8
Unsigned byte datatype.
-
int16
public static final clojure.lang.Keyword int16
Signed short datatype.
-
uint16
public static final clojure.lang.Keyword uint16
Unsigned short datatype.
-
int32
public static final clojure.lang.Keyword int32
Signed int datatype.
-
uint32
public static final clojure.lang.Keyword uint32
Unsigned int datatype.
-
int64
public static final clojure.lang.Keyword int64
Signed 64 bit integer datatype.
-
uint64
public static final clojure.lang.Keyword uint64
Unsigned 64 bit integer datatype.
-
float32
public static final clojure.lang.Keyword float32
32 bit floating point datatype.
-
float64
public static final clojure.lang.Keyword float64
64 bit floating point datatype.
-
jvmHeap
public static final clojure.lang.Keyword jvmHeap
Allocate data on the JVM heap in JVM primitive arrays.
-
nativeHeap
public static final clojure.lang.Keyword nativeHeap
Allocate data on the native heap e.g. using 'malloc'.
-
-
Method Detail
-
indexedMapReduce
public static java.lang.Object indexedMapReduce(long numIters, clojure.lang.IFn indexedMapFn, clojure.lang.IFn reduceFn, java.lang.Object options)
Extremely efficient parallelism primitive for working through a fixed number of indexes. This corresponds to an out-of-core reduction across a wide set of indexes followed by an in-core reduction to the final result. This method uses the ForkJoinPool's common pool by default. If the calling thread is already running inside the common pool, the job runs serially on that thread, so it is safe to call this function recursively.
Parameters:
numIters - Max iteration size.
indexedMapFn - Function that takes 2 longs, startIndex and groupLen, and produces a single result.
reduceFn - Function that takes a more or less lazy sequence of results and combines them or returns them in-place. For side-effecting loops this could be the Clojure function dorun, which simply realizes everything and returns nil.
options - Options map (keyword keys) described below.
Options (may be null):
- :max-batch-size - Defaults to 64000 to respect safe points and to make the result sequence more manageable.
- :fork-join-pool - Fork join pool to use. Defaults to the common pool.
Example:
  double[] doubles = toDoubleArray(range(1000000));
  double result = (double)indexedMapReduce(doubles.length,
    new IFnDef() {
      //parallel indexed map start block
      public Object invoke(Object startIdx, Object groupLen) {
        double sum = 0.0;
        //RT.intCast is a checked cast. This could
        //potentially overflow but then the Clojure runtime would
        //throw an exception and the double array couldn't
        //address the data.
        int sidx = RT.intCast(startIdx);
        //Note max-batch-size keeps the group len from overflowing
        //size of integer.
        int glen = RT.intCast(groupLen);
        for(int idx = 0; idx < glen; ++idx ) {
          sum += doubles[sidx + idx];
        }
        return sum;
      }
    },
    //Reduction function receives the results of the per-thread
    //reduction.
    new IFnDef() {
      public Object invoke(Object data) {
        double sum = 0.0;
        for( Object c: (Iterable)data) {
          sum += (double)c;
        }
        return sum;
      }
    });
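The batching behind this primitive can be sketched with the JDK alone: split the index range into (startIndex, groupLen) batches no larger than a maximum batch size, map each batch in parallel, then hand the per-batch results to a single reduction. Everything below is an illustrative assumption (the names, the use of a parallel stream, one task per batch) and not the library's actual scheduling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

final class IndexedMapReduceSketch {
    // Hypothetical analogue of indexedMapFn: receives a start index and a group length.
    interface IndexedMapFn<T> {
        T apply(long startIdx, long groupLen);
    }

    static <T, R> R indexedMapReduce(long numIters, long maxBatchSize,
                                     IndexedMapFn<T> mapFn,
                                     Function<List<T>, R> reduceFn) {
        // Cut [0, numIters) into batches of at most maxBatchSize indexes.
        List<long[]> batches = new ArrayList<>();
        for (long start = 0; start < numIters; start += maxBatchSize) {
            batches.add(new long[] {start, Math.min(maxBatchSize, numIters - start)});
        }
        // Map the batches in parallel; the collected list keeps batch order.
        List<T> partials = batches.parallelStream()
            .map(b -> mapFn.apply(b[0], b[1]))
            .collect(Collectors.toList());
        // A single reduction combines the per-batch results.
        return reduceFn.apply(partials);
    }

    static double parallelSum(double[] data) {
        return IndexedMapReduceSketch.<Double, Double>indexedMapReduce(
            data.length, 64000,
            (start, len) -> {
                double sum = 0.0;
                int s = Math.toIntExact(start);
                int n = Math.toIntExact(len);
                for (int i = 0; i < n; i++) {
                    sum += data[s + i];
                }
                return sum;
            },
            partials -> {
                double total = 0.0;
                for (double p : partials) {
                    total += p;
                }
                return total;
            });
    }

    public static void main(String[] args) {
        double[] data = new double[1000000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(parallelSum(data));
    }
}
```

Capping the batch length is what lets the map function narrow groupLen to an int safely, which mirrors the note about max-batch-size in the example above.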
-
indexedMapReduce
public static java.lang.Object indexedMapReduce(long numIters, clojure.lang.IFn indexedMapFn, clojure.lang.IFn reduceFn)
Extremely efficient parallelism primitive. See documentation on the 4-arity form of the function.
-
elemwiseDatatype
public static java.lang.Object elemwiseDatatype(java.lang.Object val)
Return the datatype contained in the container. For example a double array has an elemwise-datatype of the Clojure keyword ':float64'.
-
ecount
public static long ecount(java.lang.Object val)
Return the number of elements in the container. For tensors this means the number of elements if the tensor is read elemwise in row-major fashion.
-
shape
public static java.util.List shape(java.lang.Object val)
Return the shape of the container as a persistent vector. null has no shape.
-
stackResourceContext
public static java.lang.AutoCloseable stackResourceContext()
Open a stack-based resource context. Further allocations of native-heap memory will be cleaned up when this object is closed. This is meant to be used within a try-with-resources pattern.
Example:
  try (AutoCloseable ac = stackResourceContext()) {
    Object nativeBuf = makeContainer(nativeHeap, int8, opts("log-level", keyword("info")), range(10));
    System.out.println(nativeBuf.toString());
  } catch (Exception e) {
    System.out.println("Error!!" + e.toString());
    e.printStackTrace(System.out);
  }
  System.out.println("After stack pop - nativemem should be released");
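As a rough mental model of a stack-based resource context, the JDK-only sketch below keeps a stack of release callbacks and runs them when the context is closed. The class, the track method and the most-recent-first release order are all assumptions made for illustration; the library's internals may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class StackContextSketch implements AutoCloseable {
    // Release callbacks registered while this context is open.
    private final Deque<Runnable> releasers = new ArrayDeque<>();

    // Track a resource by registering the action that frees it.
    void track(Runnable releaser) {
        releasers.push(releaser);
    }

    // Closing the context releases everything it tracked, most recent first.
    @Override
    public void close() {
        while (!releasers.isEmpty()) {
            releasers.pop().run();
        }
    }

    // Returns the order in which two tracked resources were released.
    static List<String> demo() {
        List<String> released = new ArrayList<>();
        try (StackContextSketch ctx = new StackContextSketch()) {
            ctx.track(() -> released.add("first"));
            ctx.track(() -> released.add("second"));
        }
        return released;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```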
-
makeContainer
public static java.lang.Object makeContainer(java.lang.Object storage, java.lang.Object dtype, java.lang.Object options, java.lang.Object dataOrNElems)
Make a container of data.
Parameters:
storage - either jvmHeap or nativeHeap.
dtype - must be a known datatype and, if nativeHeap storage is used, must be a numeric or boolean datatype.
options - a map of Clojure keyword to optional value dependent upon container type. For nativeHeap containers there is ':log-level', one of the Clojure keywords ':debug', ':trace', ':info'. This results in allocation and deallocation being logged. Another nativeHeap option is ':resource-type', which is one of ':gc', ':stack', null, or ':auto' and defaults to ':auto'. With ':auto', if there is a stack resource context open then the allocation will be tracked by the nearest stack resource context; otherwise it will be cleaned up when the garbage collector notes the object is no longer reachable.
dataOrNElems - either a container of data or an integer number of elements.
Returns:
an Object that has an efficient conversion to a buffer via toBuffer.
-
makeContainer
public static java.lang.Object makeContainer(java.lang.Object storage, java.lang.Object dtype, java.lang.Object dataOrNElems)
Make a container of data. See documentation on 4 arity version.
-
makeContainer
public static java.lang.Object makeContainer(java.lang.Object dtype, java.lang.Object dataOrNElems)
Make a container of data. See documentation on 4 arity version.
-
makeContainer
public static java.lang.Object makeContainer(java.lang.Object dataOrNElems)
Make a container of data. See documentation on 4 arity version. In this version jvmHeap will be used and it will match the datatype of the passed in data.
-
clone
public static java.lang.Object clone(java.lang.Object data)
Clone a container of data. This will use the fastest available method to copy the container's data into JVM heap memory. This is useful to, for example, copy from native containers to containers safe to return from inside a stack resource context.
-
toArray
public static java.lang.Object toArray(java.lang.Object data)
Convert data into the most appropriate JVM array for the datatype.
-
toArray
public static java.lang.Object toArray(java.lang.Object data, java.lang.Object dtype)
Convert data into an array of the indicated datatype.
-
toBooleanArray
public static boolean[] toBooleanArray(java.lang.Object data)
Convert data into a boolean array. Numbers will be converted according to the normal numeric rules e.g. 0 is false and anything else is true.
-
toByteArray
public static byte[] toByteArray(java.lang.Object data)
Convert data into a byte array. Data that is out of bounds of a byte will cause a casting exception to be thrown.
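The difference between a checked conversion and Java's silent narrowing cast can be sketched as follows. This is a JDK-only illustration with hypothetical names; ArithmeticException is this sketch's choice and is not necessarily the exception type the library throws.

```java
final class CheckedCastSketch {
    // Convert longs to bytes, refusing values that a signed byte cannot hold.
    static byte[] toByteArrayChecked(long[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            long v = data[i];
            if (v < Byte.MIN_VALUE || v > Byte.MAX_VALUE) {
                // A plain (byte) cast would wrap around silently here instead.
                throw new ArithmeticException("Value out of range for byte: " + v);
            }
            out[i] = (byte) v;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(toByteArrayChecked(new long[] {1, 127, -128})));
    }
}
```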
-
toShortArray
public static short[] toShortArray(java.lang.Object data)
Convert data into a short array. Data that is out of bounds of a short will cause a casting exception to be thrown.
-
toIntArray
public static int[] toIntArray(java.lang.Object data)
Convert data into an integer array. Data that is out of bounds of an int will cause a casting exception to be thrown.
-
toLongArray
public static long[] toLongArray(java.lang.Object data)
Convert data into a long array. Data that is out of bounds of a long will cause a casting exception to be thrown.
-
toFloatArray
public static float[] toFloatArray(java.lang.Object data)
Convert data into a float array. Data that is out of bounds of a float will cause a casting exception to be thrown.
-
toDoubleArray
public static double[] toDoubleArray(java.lang.Object data)
Convert data into a double array. Data that is out of bounds of a double will cause a casting exception to be thrown.
-
setConstant
public static java.lang.Object setConstant(java.lang.Object item, long offset, long length, java.lang.Object value)
Set a container to a constant value. This tends to be an extremely optimized operation. Returns the container.
-
setConstant
public static java.lang.Object setConstant(java.lang.Object item, long offset, java.lang.Object value)
Set a container to a constant value. This tends to be an extremely optimized operation. Returns the container.
-
setConstant
public static java.lang.Object setConstant(java.lang.Object item, java.lang.Object value)
Set a container to a constant value. This tends to be an extremely optimized operation. Returns the container.
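For a JVM primitive array, the three arities map naturally onto Arrays.fill, which the class overview names as one of the fast primitives. The sketch below works on double[] only and assumes that the 3-arity form fills from offset to the end of the container; that reading is an assumption, as the documentation above does not spell it out.

```java
import java.util.Arrays;

final class SetConstantSketch {
    // Fill item[offset, offset + length) with value and return the same array.
    static double[] setConstant(double[] item, int offset, int length, double value) {
        Arrays.fill(item, offset, offset + length, value);
        return item;
    }

    // Assumed meaning of the 3-arity form: fill from offset to the end.
    static double[] setConstant(double[] item, int offset, double value) {
        return setConstant(item, offset, item.length - offset, value);
    }

    // Fill the whole array.
    static double[] setConstant(double[] item, double value) {
        return setConstant(item, 0, item.length, value);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(setConstant(new double[4], 1, 2, 7.0)));
    }
}
```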
-
copy
public static java.lang.Object copy(java.lang.Object src, java.lang.Object dst)
Efficiently copy data from a source container into a destination container, returning the destination container.
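A JDK-only analogue of a JVM-heap to native-heap copy is a bulk put into a direct NIO buffer. This sketch only illustrates the idea of moving a whole array off-heap in one call; the names are hypothetical, and dtype-next uses its own NativeBuffer type rather than NIO buffers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.DoubleBuffer;

final class HeapToNativeCopySketch {
    // Copy a JVM array into freshly allocated off-heap memory and return a typed view of it.
    static DoubleBuffer copyToNative(double[] src) {
        ByteBuffer bytes = ByteBuffer
            .allocateDirect(src.length * Double.BYTES)
            .order(ByteOrder.nativeOrder());
        DoubleBuffer dst = bytes.asDoubleBuffer();
        // Bulk put: one call for the whole array rather than an element loop.
        dst.put(src);
        // Rewind so that callers read from element 0.
        dst.rewind();
        return dst;
    }

    public static void main(String[] args) {
        DoubleBuffer buf = copyToNative(new double[] {1.5, 2.5, 3.5});
        System.out.println(buf.get(2));
    }
}
```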
-
subBuffer
public static java.lang.Object subBuffer(java.lang.Object src, long offset, long length)
Create a sub-buffer from a larger buffer.
-
subBuffer
public static java.lang.Object subBuffer(java.lang.Object src, long offset)
Create a sub-buffer from a larger buffer.
-
toBuffer
public static tech.v3.datatype.Buffer toBuffer(java.lang.Object src)
Convert an object to an implementation of tech.v3.datatype.Buffer. This is useful to make code doing an operation independent of the type of data passed in. Conversions are provided for arrays and anything derived from both java.util.List and java.util.RandomAccess.
-
indexedBuffer
public static tech.v3.datatype.Buffer indexedBuffer(java.lang.Object indexes, java.lang.Object buffer)
Create a new Buffer implementation that indexes into a previous Buffer implementation via the provided indexes.
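The indexing idea can be sketched over java.util.List: element i of the result is read from the source at position indexes[i], and nothing is copied. The names below are hypothetical and the real method works on Buffer implementations rather than Lists.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

final class IndexedViewSketch {
    // A read-only view whose element i is source.get(indexes[i]).
    static <T> List<T> indexedView(int[] indexes, List<T> source) {
        return new AbstractList<T>() {
            @Override
            public T get(int i) {
                return source.get(indexes[i]);
            }

            @Override
            public int size() {
                return indexes.length;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(indexedView(new int[] {2, 0, 2}, Arrays.asList(10, 20, 30)));
    }
}
```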
-
boolCast
public static boolean boolCast(java.lang.Object scalarVal)
Boolean cast that respects numeric values. Numeric values of 0 are false, any other numeric value is true. Booleans cast to themselves, null casts to false.
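The casting rules stated above can be written out as a short JDK-only sketch. How other object types are handled is not stated here, so rejecting them with an IllegalArgumentException is purely this sketch's assumption.

```java
final class BoolCastSketch {
    static boolean boolCast(Object scalarVal) {
        if (scalarVal == null) {
            return false; // null casts to false
        }
        if (scalarVal instanceof Boolean) {
            return (Boolean) scalarVal; // booleans cast to themselves
        }
        if (scalarVal instanceof Number) {
            // Zero is false, any other numeric value is true.
            // Comparing via doubleValue is sufficient for the primitive wrapper types.
            return ((Number) scalarVal).doubleValue() != 0.0;
        }
        // Assumption: anything else is rejected in this sketch.
        throw new IllegalArgumentException("Cannot cast to boolean: " + scalarVal.getClass());
    }

    public static void main(String[] args) {
        System.out.println(boolCast(0) + " " + boolCast(2.5) + " " + boolCast(null));
    }
}
```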
-
reverse
public static java.lang.Object reverse(java.lang.Object item)
Reverse a sequence, range or reader. If range, returns a new range. If sequence, uses clojure.core/reverse. If reader, returns a new reader that performs an in-place reverse.
-
makeList
public static tech.v3.datatype.Buffer makeList(java.lang.Object dtype)
Make an efficient appendable datastructure that contains a primitive backing store. This object has fast conversions to buffers, fast copy semantics, and fast append semantics.
-
emap
public static tech.v3.datatype.Buffer emap(clojure.lang.IFn mapFn, java.lang.Object resDtype, java.lang.Object... args)
Elementwise-map a function, creating a new lazy buffer. Operations are performed upon indexed access to the returned Buffer.
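Laziness here means the mapping function runs when an element is read, not when the mapped buffer is created. The JDK-only sketch below shows that behaviour with a List view; the names are hypothetical, both inputs are assumed to have the same length, and the real method takes a Clojure IFn plus a result datatype.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BinaryOperator;

final class LazyMapSketch {
    // A view that applies fn to the i-th elements of lhs and rhs on each read.
    static <T> List<T> emap(BinaryOperator<T> fn, List<T> lhs, List<T> rhs) {
        return new AbstractList<T>() {
            @Override
            public T get(int i) {
                return fn.apply(lhs.get(i), rhs.get(i));
            }

            @Override
            public int size() {
                return lhs.size();
            }
        };
    }

    // Returns {calls before any read, value of element 1, calls after that read}.
    static int[] demo() {
        AtomicInteger calls = new AtomicInteger();
        List<Integer> sums = emap(
            (a, b) -> {
                calls.incrementAndGet();
                return a + b;
            },
            Arrays.asList(1, 2, 3),
            Arrays.asList(10, 20, 30));
        int before = calls.get();   // creating the view has not run the function
        int second = sums.get(1);   // reading one element runs it once
        return new int[] {before, second, calls.get()};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

Because nothing is cached in this sketch, reading the same index twice runs the function twice; copying the view into an array is the way to pay that cost only once.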
-
opts
public static java.util.Map opts(java.lang.Object... args)
Create an 'options' map, which simply means ensuring the keys are keywords. This is meant to be a quick shorthand method to create a map of keyword to option value where the user can just pass in strings for the keys.
-
numericByteWidth
public static long numericByteWidth(java.lang.Object dtype)
Return the numeric byte width of a given datatype so for example int32 returns 4.
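For the numeric datatypes listed in the field summary, the byte widths follow directly from the bit counts in their names. The lookup below is a JDK-only sketch keyed by plain strings (an assumption made for self-containment; the real method takes Clojure keywords), and its behaviour for non-numeric names is this sketch's own choice.

```java
import java.util.HashMap;
import java.util.Map;

final class ByteWidthSketch {
    private static final Map<String, Integer> WIDTHS = new HashMap<>();

    static {
        WIDTHS.put("int8", 1);
        WIDTHS.put("uint8", 1);
        WIDTHS.put("int16", 2);
        WIDTHS.put("uint16", 2);
        WIDTHS.put("int32", 4);
        WIDTHS.put("uint32", 4);
        WIDTHS.put("float32", 4);
        WIDTHS.put("int64", 8);
        WIDTHS.put("uint64", 8);
        WIDTHS.put("float64", 8);
    }

    // Width in bytes of one element of the named numeric datatype.
    static int numericByteWidth(String dtype) {
        Integer width = WIDTHS.get(dtype);
        if (width == null) {
            throw new IllegalArgumentException("Not a numeric datatype: " + dtype);
        }
        return width;
    }

    public static void main(String[] args) {
        System.out.println(numericByteWidth("int32"));
    }
}
```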
-
wrapAddress
public static java.lang.Object wrapAddress(java.lang.Object gcObject, long address, long nBytes)
Wrap an integer pointer into a buffer. If the pointer is invalid or the number of bytes is wrong then the most likely outcome is that your program will crash at some point in the future.
See the 4-arity version of this function for full documentation. Returns a native buffer.
-
wrapAddress
public static java.lang.Object wrapAddress(java.lang.Object gcObject, long address, long nBytes, java.lang.Object dtype)
Wrap an integer pointer into a buffer. If the pointer is invalid or the number of bytes is wrong then the most likely outcome is that your program will crash at some point in the future.
Data is assumed to be little endian format.
Parameters:
gcObject - An optional object passed in that the native buffer will reference. This keeps the gcObject from being cleaned up by gc-based methods until the native buffer is no longer referenceable.
address - Integer address of data.
nBytes - Number of bytes to reference at address.
dtype - Datatype to interpret the data as. nBytes must be commensurate with the binary size of dtype.
Returns:
a native buffer.
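What "little endian" and a byte count that matches the datatype mean in practice can be sketched with a JDK ByteBuffer over a byte array, standing in for raw memory. The names are hypothetical, and reading "commensurate" as "a whole multiple of the element width" is an assumption of this sketch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LittleEndianSketch {
    // Interpret raw bytes as little-endian 32 bit integers.
    static int[] readInt32(byte[] raw) {
        if (raw.length % Integer.BYTES != 0) {
            throw new IllegalArgumentException(
                "Byte count " + raw.length + " is not a multiple of " + Integer.BYTES);
        }
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        int[] out = new int[raw.length / Integer.BYTES];
        for (int i = 0; i < out.length; i++) {
            // Absolute read: the least significant byte comes first in memory.
            out[i] = buf.getInt(i * Integer.BYTES);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = readInt32(new byte[] {1, 0, 0, 0, 0, 1, 0, 0});
        System.out.println(values[0] + " " + values[1]);
    }
}
```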
-
asNativeBuffer
public static tech.v3.datatype.NativeBufferData asNativeBuffer(java.lang.Object obj)
Attempt to get a native buffer from an object such as a tensor or a numpy array.
Returns:
an instance of 'tech.v3.datatype.NativeBufferData' or null if an in-place conversion is not possible.
-
asArrayBuffer
public static tech.v3.datatype.ArrayBufferData asArrayBuffer(java.lang.Object obj)
Attempt to get an array buffer from an object such as a tensor.
Returns:
an instance of 'tech.v3.datatype.ArrayBufferData' or null if an in-place conversion is not possible.
-
asNioBuffer
public static java.nio.Buffer asNioBuffer(java.lang.Object obj)
Attempt an in-place conversion to a nio buffer. Returns null if the conversion fails.
-
toBitmap
public static org.roaringbitmap.RoaringBitmap toBitmap(java.lang.Object data)
Create a roaring bitmap from arbitrary data.
-
emptyBitmap
public static org.roaringbitmap.RoaringBitmap emptyBitmap()
Create a new empty roaring bitmap.
-
mapFactory
public static clojure.lang.IFn mapFactory(java.util.List keys)
Return a function taking exactly n-keys arguments that will rapidly construct a new map.
-
-