Package tech.v3

Class DType


  • public class DType
    extends java.lang.Object

    'dtype-next' exposes a container-based API for dealing with bulk containers of primitive data efficiently and uniformly, independent of whether they have JVM-heap-backed or native-heap-backed storage. Elemwise access is provided via the 'Buffer' interface, while bulk operations such as copying and setting a constant value use fast primitives such as arrayCopy, Arrays.fill, memset and memcpy. Extremely fast copy pathways are provided to copy from JVM heap storage (JVM primitive arrays) to native heap storage; these usually boil down to a single C memcpy call.

    All the base C numeric datatypes are supported: unsigned and signed integer types from 8 to 64 bits, and 32 and 64 bit floating point types. Containers of unknown type have type ':object', strings have type ':string', etc. Unsigned integer types are denoted by types such as ':uint32' or ':uint8'.

    Care has been taken to make creating custom buffers as easy as possible. Default methods have been provided for nearly all the methods on tech.v3.datatype.Buffer. If you only need a read-only buffer, which is common when the values are defined by code, there are helper interfaces that define yet more of the defaults. These helper interfaces are (in the tech.v3.datatype namespace): BooleanReader, LongReader, DoubleReader, and ObjectReader. Users implementing these interfaces need only provide an implementation of the lsize and readXXX methods, where XXX denotes the datatype. For example:

     return new LongReader() {
       public long lsize() { return 4; }
       public long readLong(long idx) { return idx; }
     };
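
    The way default methods reduce the work of writing a reader can be sketched in plain Java. The interface below, SketchLongReader, is a hypothetical stand-in written only for illustration; it is not tech.v3.datatype.LongReader, and the default methods it defines are assumptions about what such defaults might look like.

```java
/** Hypothetical stand-in for a read-only long reader; NOT the dtype-next interface. */
interface SketchLongReader {
  // The only two methods an implementer must supply.
  long lsize();
  long readLong(long idx);

  // Everything below is derived from the two required methods.
  default double readDouble(long idx) { return (double) readLong(idx); }
  default Object readObject(long idx) { return readLong(idx); }
  default long[] toLongArray() {
    // Math.toIntExact throws if the reader is too large for a JVM array.
    long[] out = new long[Math.toIntExact(lsize())];
    for (int i = 0; i < out.length; i++) out[i] = readLong(i);
    return out;
  }
}

final class SketchReaders {
  /** A reader whose value at each index is the index itself. */
  static SketchLongReader indices(long n) {
    return new SketchLongReader() {
      public long lsize() { return n; }
      public long readLong(long idx) { return idx; }
    };
  }
}
```

    An anonymous class that supplies only lsize and readLong still gets readDouble, readObject and toLongArray from the defaults.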
     

    There are two key types not represented in this file - tech.v3.datatype.native_buffer.NativeBuffer and tech.v3.datatype.array_buffer.ArrayBuffer. These are the backing stores of nativeHeap and jvmHeap memory, respectively. They are immutable datastructures, unlike nio buffers, and they support, as best as possible, 64 bit indexing. It can be useful at times to get a direct reference to them.

    • Field Summary

      Fields 
      Modifier and Type Field Description
      static clojure.lang.Keyword bool
      Boolean keyword datatype.
      static clojure.lang.Keyword float32
      32 bit floating point datatype.
      static clojure.lang.Keyword float64
      64 bit floating point datatype.
      static clojure.lang.Keyword int16
      Signed short datatype.
      static clojure.lang.Keyword int32
      Signed int datatype.
      static clojure.lang.Keyword int64
      Signed 64 bit integer datatype.
      static clojure.lang.Keyword int8
      Signed byte datatype.
      static clojure.lang.Keyword jvmHeap
      Allocate data on the JVM heap in JVM primitive arrays.
      static clojure.lang.Keyword nativeHeap
      Allocate data on the native heap e.g. using 'malloc'.
      static clojure.lang.Keyword uint16
      Unsigned short datatype.
      static clojure.lang.Keyword uint32
      Unsigned int datatype.
      static clojure.lang.Keyword uint64
      Unsigned 64 bit integer datatype.
      static clojure.lang.Keyword uint8
      Unsigned byte datatype.
    • Method Summary

      Modifier and Type Method Description
      static tech.v3.datatype.ArrayBufferData asArrayBuffer​(java.lang.Object obj)
      Attempt to get an array buffer from an object such as a tensor.
      static tech.v3.datatype.NativeBufferData asNativeBuffer​(java.lang.Object obj)
      Attempt to get a native buffer from an object such as a tensor or a numpy array.
      static java.nio.Buffer asNioBuffer​(java.lang.Object obj)
      Attempt an in-place conversion to a nio buffer.
      static boolean boolCast​(java.lang.Object scalarVal)
      Boolean cast that respects numeric values.
      static java.lang.Object clone​(java.lang.Object data)
      Clone a container of data.
      static java.lang.Object copy​(java.lang.Object src, java.lang.Object dst)
      Efficiently copy data from a source container into a destination container, returning the destination container.
      static long ecount​(java.lang.Object val)
      Return the number of elements in the container.
      static java.lang.Object elemwiseDatatype​(java.lang.Object val)
      Return the datatype contained in the container.
      static tech.v3.datatype.Buffer emap​(clojure.lang.IFn mapFn, java.lang.Object resDtype, java.lang.Object... args)
      Elementwise-map a function to create a new lazy buffer.
      static org.roaringbitmap.RoaringBitmap emptyBitmap()
      Create a new empty roaring bitmap.
      static tech.v3.datatype.Buffer indexedBuffer​(java.lang.Object indexes, java.lang.Object buffer)
      Create a new Buffer implementation that indexes into a previous Buffer implementation via the provided indexes.
      static java.lang.Object indexedMapReduce​(long numIters, clojure.lang.IFn indexedMapFn, clojure.lang.IFn reduceFn)
      Extremely efficient parallelism primitive.
      static java.lang.Object indexedMapReduce​(long numIters, clojure.lang.IFn indexedMapFn, clojure.lang.IFn reduceFn, java.lang.Object options)
      Extremely efficient parallelism primitive for working through a fixed number of indexes.
      static java.lang.Object makeContainer​(java.lang.Object dataOrNElems)
      Make a container of data.
      static java.lang.Object makeContainer​(java.lang.Object dtype, java.lang.Object dataOrNElems)
      Make a container of data.
      static java.lang.Object makeContainer​(java.lang.Object storage, java.lang.Object dtype, java.lang.Object dataOrNElems)
      Make a container of data.
      static java.lang.Object makeContainer​(java.lang.Object storage, java.lang.Object dtype, java.lang.Object options, java.lang.Object dataOrNElems)
      Make a container of data.
      static tech.v3.datatype.Buffer makeList​(java.lang.Object dtype)
      Make an efficient appendable datastructure that contains a primitive backing store.
      static clojure.lang.IFn mapFactory​(java.util.List keys)
      Return a function taking exactly n-keys arguments that will rapidly construct a new map.
      static long numericByteWidth​(java.lang.Object dtype)
      Return the numeric byte width of a given datatype so for example int32 returns 4.
      static java.util.Map opts​(java.lang.Object... args)
      Create an 'options' map which simply means ensuring the keys are keywords.
      static java.lang.Object reverse​(java.lang.Object item)
      Reverse a sequence, range or reader.
      static java.lang.Object setConstant​(java.lang.Object item, long offset, long length, java.lang.Object value)
      Set a container to a constant value.
      static java.lang.Object setConstant​(java.lang.Object item, long offset, java.lang.Object value)
      Set a container to a constant value.
      static java.lang.Object setConstant​(java.lang.Object item, java.lang.Object value)
      Set a container to a constant value.
      static java.util.List shape​(java.lang.Object val)
      Return the shape of the container as a persistent vector.
      static java.lang.AutoCloseable stackResourceContext()
      Open a stack-based resource context.
      static java.lang.Object subBuffer​(java.lang.Object src, long offset)
      Create a sub-buffer from a larger buffer.
      static java.lang.Object subBuffer​(java.lang.Object src, long offset, long length)
      Create a sub-buffer from a larger buffer.
      static java.lang.Object toArray​(java.lang.Object data)
      Convert data into the most appropriate JVM array for the datatype.
      static java.lang.Object toArray​(java.lang.Object data, java.lang.Object dtype)
      Convert data into an array of the indicated datatype.
      static org.roaringbitmap.RoaringBitmap toBitmap​(java.lang.Object data)
      Create a roaring bitmap from arbitrary data.
      static boolean[] toBooleanArray​(java.lang.Object data)
      Convert data into a boolean array.
      static tech.v3.datatype.Buffer toBuffer​(java.lang.Object src)
      Convert an object to an implementation of tech.v3.datatype.Buffer.
      static byte[] toByteArray​(java.lang.Object data)
      Convert data into a byte array.
      static double[] toDoubleArray​(java.lang.Object data)
      Convert data into a double array.
      static float[] toFloatArray​(java.lang.Object data)
      Convert data into a float array.
      static int[] toIntArray​(java.lang.Object data)
      Convert data into an integer array.
      static long[] toLongArray​(java.lang.Object data)
      Convert data into a long array.
      static short[] toShortArray​(java.lang.Object data)
      Convert data into a short array.
      static java.lang.Object wrapAddress​(java.lang.Object gcObject, long address, long nBytes)
      Wrap an integer pointer into a buffer.
      static java.lang.Object wrapAddress​(java.lang.Object gcObject, long address, long nBytes, java.lang.Object dtype)
      Wrap an integer pointer into a buffer.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • bool

        public static final clojure.lang.Keyword bool
        Boolean keyword datatype.
      • int8

        public static final clojure.lang.Keyword int8
        Signed byte datatype.
      • uint8

        public static final clojure.lang.Keyword uint8
        Unsigned byte datatype.
      • int16

        public static final clojure.lang.Keyword int16
        Signed short datatype.
      • uint16

        public static final clojure.lang.Keyword uint16
        Unsigned short datatype.
      • int32

        public static final clojure.lang.Keyword int32
        Signed int datatype.
      • uint32

        public static final clojure.lang.Keyword uint32
        Unsigned int datatype.
      • int64

        public static final clojure.lang.Keyword int64
        Signed 64 bit integer datatype.
      • uint64

        public static final clojure.lang.Keyword uint64
        Unsigned 64 bit integer datatype.
      • float32

        public static final clojure.lang.Keyword float32
        32 bit floating point datatype.
      • float64

        public static final clojure.lang.Keyword float64
        64 bit floating point datatype.
      • jvmHeap

        public static final clojure.lang.Keyword jvmHeap
        Allocate data on the JVM heap in JVM primitive arrays.
      • nativeHeap

        public static final clojure.lang.Keyword nativeHeap
        Allocate data on the native heap e.g. using 'malloc'.
    • Method Detail

      • indexedMapReduce

        public static java.lang.Object indexedMapReduce​(long numIters,
                                                        clojure.lang.IFn indexedMapFn,
                                                        clojure.lang.IFn reduceFn,
                                                        java.lang.Object options)
        Extremely efficient parallelism primitive for working through a fixed number of indexes. This corresponds to an out-of-core reduction across a wide set of indexes followed by an in-core reduction to the final result. This method uses the ForkJoinPool's common pool by default. It is safe to call this function recursively: if the calling thread is already running inside the common pool, the job runs serially on that thread.
        Parameters:
        numIters - Max iteration size.
        indexedMapFn - Function that takes 2 longs, startIndex and groupLen and produces a single result.
        reduceFn - fn that takes a more or less lazy sequence of results and combines them or returns them in-place. For side-effecting loops this could be the Clojure function dorun which simply realizes everything and returns nil.
        options - Options map (keyword keys) described below.

        Options (may be null):

        • :max-batch-size - Defaults to 64000 to respect safe points and to make the result sequence more manageable.
        • :fork-join-pool - Fork join pool to use. Defaults to the common pool.

        Example:

         double[] doubles = toDoubleArray(range(1000000));
         double result =
          (double)indexedMapReduce(doubles.length,
                                       new IFnDef() {
                                         //parallel indexed map start block
                                         public Object invoke(Object startIdx, Object groupLen) {
                                           double sum = 0.0;
                                           //RT.intCast is a checked cast.  This could
                                           //potentially overflow but then the Clojure runtime would
                                           //throw an exception and the double array couldn't
                                           //address the data.
                                           int sidx = RT.intCast(startIdx);
                                           //Note max-batch-size keeps the group len from overflowing
                                           //size of integer.
                                           int glen = RT.intCast(groupLen);
                                           for(int idx = 0; idx < glen; ++idx ) {
                                             sum += doubles[sidx + idx];
                                           }
                                           return sum;
                                         }
                                       },
                                       //Reduction function receives the results of the per-thread
                                       //reduction.
                                       new IFnDef() {
                                         public Object invoke(Object data) {
                                           double sum = 0.0;
        
                                           for( Object c: (Iterable)data) {
                                             sum += (double)c;
                                           }
                                           return sum;
                                         }
                                       });
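
        The (startIndex, groupLen) contract of indexedMapFn can be illustrated without the library. The plain-JDK sketch below splits an index range into batches and sums the per-batch results; IndexBatchSketch and its methods are hypothetical names, and the splitting strategy is an assumption made for illustration rather than the scheme dtype-next actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongBinaryOperator;

final class IndexBatchSketch {
  /** Split [0, numIters) into {startIndex, groupLen} pairs no longer than maxBatch. */
  static List<long[]> batches(long numIters, long maxBatch) {
    if (maxBatch <= 0) throw new IllegalArgumentException("maxBatch must be positive");
    List<long[]> out = new ArrayList<>();
    for (long start = 0; start < numIters; start += maxBatch) {
      out.add(new long[] {start, Math.min(maxBatch, numIters - start)});
    }
    return out;
  }

  /** Map every batch in parallel, then reduce the per-batch results by summing them. */
  static long mapThenSum(long numIters, long maxBatch, LongBinaryOperator mapFn) {
    return batches(numIters, maxBatch).parallelStream()
        .mapToLong(b -> mapFn.applyAsLong(b[0], b[1]))
        .sum();
  }
}
```

        Each invocation of mapFn sees one contiguous group, so the last group may be shorter than the others when numIters is not a multiple of the batch size.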
         
      • indexedMapReduce

        public static java.lang.Object indexedMapReduce​(long numIters,
                                                        clojure.lang.IFn indexedMapFn,
                                                        clojure.lang.IFn reduceFn)
        Extremely efficient parallelism primitive. See documentation on the 4-arity form of the function.
      • elemwiseDatatype

        public static java.lang.Object elemwiseDatatype​(java.lang.Object val)
        Return the datatype contained in the container. For example a double array has an elemwise-datatype of the Clojure keyword ':float64'.
      • ecount

        public static long ecount​(java.lang.Object val)
        Return the number of elements in the container. For tensors this means the number of elements if the tensor is read elemwise in row-major fashion.
      • shape

        public static java.util.List shape​(java.lang.Object val)
        Return the shape of the container as a persistent vector. null has no shape.
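
        The relationship between a shape and the row-major element count can be written down in a few lines of plain Java. ShapeSketch is a hypothetical helper for illustration only; it works on a long[] shape and does not call the library.

```java
final class ShapeSketch {
  /** Element count implied by a shape: the product of its dimensions. */
  static long ecount(long[] shape) {
    long n = 1;
    for (long d : shape) n = Math.multiplyExact(n, d); // throws on overflow
    return n;
  }

  /** Row-major flat offset of a multi-dimensional index (last dimension varies fastest). */
  static long flatIndex(long[] shape, long[] idx) {
    long off = 0;
    for (int i = 0; i < shape.length; i++) off = off * shape[i] + idx[i];
    return off;
  }
}
```

        Reading a tensor elemwise in row-major fashion visits the flat offsets 0 up to ecount minus one in order.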
      • stackResourceContext

        public static java.lang.AutoCloseable stackResourceContext()

        Open a stack-based resource context. Further allocations of native-heap memory will be cleaned up when this object is closed. This is meant to be used within a try-with-resources pattern.

        Example:
         try (AutoCloseable ac = stackResourceContext()) {
           Object nativeBuf = makeContainer(nativeHeap, int8, opts("log-level", keyword("info")),
                                            range(10));
           System.out.println(nativeBuf.toString());
         } catch (Exception e) {
           System.out.println("Error!!" + e.toString());
           e.printStackTrace(System.out);
         }
         System.out.println("After stack pop - nativemem should be released");
         
      • makeContainer

        public static java.lang.Object makeContainer​(java.lang.Object storage,
                                                     java.lang.Object dtype,
                                                     java.lang.Object options,
                                                     java.lang.Object dataOrNElems)
        Make a container of data.
        Parameters:
        storage - either jvmHeap or nativeHeap.
        dtype - must be a known datatype and if nativeHeap storage is used must be a numeric or boolean datatype.
        options - a map of Clojure keyword to optional value dependent upon container type. For nativeHeap containers there is ':log-level' - one of the Clojure keywords ':debug', ':trace', ':info'. This results in allocation and deallocation being logged. Another nativeHeap option is ':resource-type' which is one of ':gc', ':stack', null, or ':auto' and defaults to ':auto'. This means that if there is a stack resource context open then the allocation will be tracked by the nearest stack resource context else it will be cleaned up when the garbage collector notes the object is no longer reachable.
        dataOrNElems - either a container of data or an integer number of elements.
        Returns:
        an Object that has an efficient conversion to a buffer via toBuffer.
      • makeContainer

        public static java.lang.Object makeContainer​(java.lang.Object storage,
                                                     java.lang.Object dtype,
                                                     java.lang.Object dataOrNElems)
        Make a container of data. See documentation on 4 arity version.
      • makeContainer

        public static java.lang.Object makeContainer​(java.lang.Object dtype,
                                                     java.lang.Object dataOrNElems)
        Make a container of data. See documentation on 4 arity version.
      • makeContainer

        public static java.lang.Object makeContainer​(java.lang.Object dataOrNElems)
        Make a container of data. See documentation on 4 arity version. In this version jvmHeap will be used and it will match the datatype of the passed in data.
      • clone

        public static java.lang.Object clone​(java.lang.Object data)
        Clone a container of data. This will use the fastest available method to copy the container's data into JVM heap memory. This is useful to, for example, copy from native containers to containers safe to return from inside a stack resource context.
      • toArray

        public static java.lang.Object toArray​(java.lang.Object data)
        Convert data into the most appropriate JVM array for the datatype.
      • toArray

        public static java.lang.Object toArray​(java.lang.Object data,
                                               java.lang.Object dtype)
        Convert data into an array of the indicated datatype.
      • toBooleanArray

        public static boolean[] toBooleanArray​(java.lang.Object data)
        Convert data into a boolean array. Numbers will be converted according to the normal numeric rules e.g. 0 is false and anything else is true.
      • toByteArray

        public static byte[] toByteArray​(java.lang.Object data)
        Convert data into a byte array. Data that is out of bounds of a byte will cause a casting exception to be thrown.
      • toShortArray

        public static short[] toShortArray​(java.lang.Object data)
        Convert data into a short array. Data that is out of bounds of a short will cause a casting exception to be thrown.
      • toIntArray

        public static int[] toIntArray​(java.lang.Object data)
        Convert data into an integer array. Data that is out of bounds of an int will cause a casting exception to be thrown.
      • toLongArray

        public static long[] toLongArray​(java.lang.Object data)
        Convert data into a long array. Data that is out of bounds of a long will cause a casting exception to be thrown.
      • toFloatArray

        public static float[] toFloatArray​(java.lang.Object data)
        Convert data into a float array. Data that is out of bounds of a float will cause a casting exception to be thrown.
      • toDoubleArray

        public static double[] toDoubleArray​(java.lang.Object data)
        Convert data into a double array. Data that is out of bounds of a double will cause a casting exception to be thrown.
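
        The checked-cast behaviour described for the typed conversions can be sketched in plain Java for the byte case. CheckedCastSketch is a hypothetical helper; the exception type the library throws is not stated above, so the use of ArithmeticException here is an assumption.

```java
final class CheckedCastSketch {
  /** Narrow a long to a byte, failing loudly instead of silently wrapping. */
  static byte toByteChecked(long v) {
    if (v < Byte.MIN_VALUE || v > Byte.MAX_VALUE) {
      // Assumption: the exception type is chosen for this sketch only.
      throw new ArithmeticException("value out of byte range: " + v);
    }
    return (byte) v;
  }

  /** Convert every element with the checked cast above. */
  static byte[] toByteArrayChecked(long[] data) {
    byte[] out = new byte[data.length];
    for (int i = 0; i < data.length; i++) out[i] = toByteChecked(data[i]);
    return out;
  }
}
```

        A plain Java cast such as (byte) 128 would wrap to -128; the checked version rejects the value instead.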
      • setConstant

        public static java.lang.Object setConstant​(java.lang.Object item,
                                                   long offset,
                                                   long length,
                                                   java.lang.Object value)
        Set a container to a constant value. This tends to be an extremely optimized operation. Returns the container.
      • setConstant

        public static java.lang.Object setConstant​(java.lang.Object item,
                                                   long offset,
                                                   java.lang.Object value)
        Set a container to a constant value. This tends to be an extremely optimized operation. Returns the container.
      • setConstant

        public static java.lang.Object setConstant​(java.lang.Object item,
                                                   java.lang.Object value)
        Set a container to a constant value. This tends to be an extremely optimized operation. Returns the container.
      • copy

        public static java.lang.Object copy​(java.lang.Object src,
                                            java.lang.Object dst)
        Efficiently copy data from a source container into a destination container, returning the destination container.
      • subBuffer

        public static java.lang.Object subBuffer​(java.lang.Object src,
                                                 long offset,
                                                 long length)
        Create a sub-buffer from a larger buffer.
      • subBuffer

        public static java.lang.Object subBuffer​(java.lang.Object src,
                                                 long offset)
        Create a sub-buffer from a larger buffer.
      • toBuffer

        public static tech.v3.datatype.Buffer toBuffer​(java.lang.Object src)
        Convert an object to an implementation of tech.v3.datatype.Buffer. This is useful to make code doing an operation independent of the type of data passed in. Conversions are provided for arrays and anything derived from both java.util.List and java.util.RandomAccess.
      • indexedBuffer

        public static tech.v3.datatype.Buffer indexedBuffer​(java.lang.Object indexes,
                                                            java.lang.Object buffer)
        Create a new Buffer implementation that indexes into a previous Buffer implementation via the provided indexes.
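
        The idea of a buffer that indexes into another buffer can be sketched with java.util.List. IndexedViewSketch is a hypothetical name, and the sketch assumes a read-only view over int indexes; it does not reflect how the library's Buffer is implemented.

```java
import java.util.AbstractList;
import java.util.List;

final class IndexedViewSketch {
  /** Read-only view whose element i is source.get(indexes[i]); no data is copied. */
  static <T> List<T> indexedView(int[] indexes, List<T> source) {
    return new AbstractList<T>() {
      @Override public T get(int i) { return source.get(indexes[i]); }
      @Override public int size() { return indexes.length; }
    };
  }
}
```

        Indexes may repeat or appear in any order, so the view can be longer or shorter than the source.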
      • boolCast

        public static boolean boolCast​(java.lang.Object scalarVal)
        Boolean cast that respects numeric values. Numeric values of 0 are false, any other numeric value is true. Booleans cast to themselves, null casts to false.
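
        The casting rules above can be restated as a small plain-Java function. BoolCastSketch is a hypothetical helper for illustration; the behaviour for values that are neither null, Boolean nor Number is not described above, so throwing in that case is an assumption.

```java
final class BoolCastSketch {
  static boolean boolCast(Object v) {
    if (v == null) return false;                    // null casts to false
    if (v instanceof Boolean) return (Boolean) v;   // booleans cast to themselves
    if (v instanceof Number) return ((Number) v).doubleValue() != 0.0; // 0 is false
    // Assumption: other types are rejected in this sketch.
    throw new IllegalArgumentException("not a boolean or number: " + v.getClass());
  }
}
```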
      • reverse

        public static java.lang.Object reverse​(java.lang.Object item)
        Reverse a sequence, range or reader. If range, returns a new range. If sequence, uses clojure.core/reverse. If reader, returns a new reader that performs an in-place reverse.
      • makeList

        public static tech.v3.datatype.Buffer makeList​(java.lang.Object dtype)
        Make an efficient appendable datastructure that contains a primitive backing store. This object has fast conversions to buffers, fast copy semantics, and fast append semantics.
      • emap

        public static tech.v3.datatype.Buffer emap​(clojure.lang.IFn mapFn,
                                                   java.lang.Object resDtype,
                                                   java.lang.Object... args)
        Elementwise-map a function to create a new lazy buffer. Operations are performed upon indexed access to the returned Buffer.
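
        What "lazy" means here can be shown with a plain-Java view that applies a function only when an element is read. LazyMapSketch is a hypothetical helper restricted to two double arrays; it is an illustration of the concept and not the library's implementation.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.DoubleBinaryOperator;

final class LazyMapSketch {
  /** View whose element i is fn(a[i], b[i]), computed on every read and never stored. */
  static List<Double> lazyMap(DoubleBinaryOperator fn, double[] a, double[] b) {
    if (a.length != b.length) throw new IllegalArgumentException("length mismatch");
    return new AbstractList<Double>() {
      @Override public Double get(int i) { return fn.applyAsDouble(a[i], b[i]); }
      @Override public int size() { return a.length; }
    };
  }
}
```

        Because nothing is cached in this sketch, reading the same index twice calls the function twice; copying the view into an array is one way to realize the results once.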
      • opts

        public static java.util.Map opts​(java.lang.Object... args)
        Create an 'options' map which simply means ensuring the keys are keywords. This is meant to be a quick shorthand method to create a map of keyword to option value where the user can just pass in strings for the keys.
      • numericByteWidth

        public static long numericByteWidth​(java.lang.Object dtype)
        Return the numeric byte width of a given datatype so for example int32 returns 4.
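
        The byte widths follow directly from the bit counts in the datatype names. The plain-Java lookup below uses strings in place of Clojure keywords; ByteWidthSketch is a hypothetical helper, and rejecting non-numeric names with an exception is an assumption of this sketch.

```java
final class ByteWidthSketch {
  /** Byte width for the numeric datatype names listed in the field summary. */
  static int byteWidth(String dtype) {
    switch (dtype) {
      case "int8": case "uint8": return 1;
      case "int16": case "uint16": return 2;
      case "int32": case "uint32": case "float32": return 4;
      case "int64": case "uint64": case "float64": return 8;
      default:
        // Assumption: non-numeric names are rejected in this sketch.
        throw new IllegalArgumentException("not a numeric datatype: " + dtype);
    }
  }
}
```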
      • wrapAddress

        public static java.lang.Object wrapAddress​(java.lang.Object gcObject,
                                                   long address,
                                                   long nBytes)

        Wrap an integer pointer into a buffer. If the pointer is invalid or the number of bytes is wrong then the most likely outcome is that your program will crash at some point in the future.

        See the 4-arity version of this function for full documentation. Returns a native buffer.
      • wrapAddress

        public static java.lang.Object wrapAddress​(java.lang.Object gcObject,
                                                   long address,
                                                   long nBytes,
                                                   java.lang.Object dtype)

        Wrap an integer pointer into a buffer. If the pointer is invalid or the number of bytes is wrong then the most likely outcome is that your program will crash at some point in the future.

        Data is assumed to be little endian format.

        Parameters:
        gcObject - An optional object passed in that the native buffer will reference. This keeps the gcObject from being cleaned up by gc-based methods until the native buffer is no longer reachable.
        address - Integer address of data.
        nBytes - Number of bytes to reference at address.
        dtype - Datatype to interpret the data as. nBytes must be commensurate with the binary size of dtype. Returns a native buffer.
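
        The little-endian assumption and the requirement that nBytes match the datatype size can be illustrated safely on a byte array with java.nio. LittleEndianSketch is a hypothetical helper that reads int32 values from heap bytes; it does not touch native memory and does not use wrapAddress.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LittleEndianSketch {
  /** Interpret raw bytes as little-endian int32 values. */
  static int[] asInt32(byte[] raw) {
    // The byte count must be a whole number of 4-byte elements.
    if (raw.length % Integer.BYTES != 0) {
      throw new IllegalArgumentException("byte count is not a multiple of 4");
    }
    ByteBuffer bb = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
    int[] out = new int[raw.length / Integer.BYTES];
    for (int i = 0; i < out.length; i++) out[i] = bb.getInt(i * Integer.BYTES);
    return out;
  }
}
```

        In little-endian layout the least significant byte comes first, so the bytes 0, 1, 0, 0 decode to 256 rather than 65536.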
      • asNativeBuffer

        public static tech.v3.datatype.NativeBufferData asNativeBuffer​(java.lang.Object obj)
        Attempt to get a native buffer from an object such as a tensor or a numpy array.
        Returns:
        an instance of 'tech.v3.datatype.NativeBufferData' or null if an in-place conversion is not possible.
      • asArrayBuffer

        public static tech.v3.datatype.ArrayBufferData asArrayBuffer​(java.lang.Object obj)
        Attempt to get an array buffer from an object such as a tensor.
        Returns:
        an instance of 'tech.v3.datatype.ArrayBufferData' or null if an in-place conversion is not possible.
      • asNioBuffer

        public static java.nio.Buffer asNioBuffer​(java.lang.Object obj)
        Attempt an in-place conversion to a nio buffer. Returns null if the conversion fails.
      • toBitmap

        public static org.roaringbitmap.RoaringBitmap toBitmap​(java.lang.Object data)
        Create a roaring bitmap from arbitrary data.
      • emptyBitmap

        public static org.roaringbitmap.RoaringBitmap emptyBitmap()
        Create a new empty roaring bitmap.
      • mapFactory

        public static clojure.lang.IFn mapFactory​(java.util.List keys)
        Return a function taking exactly n-keys arguments that will rapidly construct a new map.