Dense Arrays

In this tutorial we will learn how to create, read, and write a simple dense array in TileDB.

Full programs
Program            Links
quickstart_dense   quickstartcpp, quickstartpy

Basic concepts and definitions

Creating a dense array

Note

The order of the dimensions (as added to the domain) matters later when specifying subarrays, because each range in a subarray is matched to a dimension by position. For instance, in the above schema, subarray [1,2], [2,4] selects the first two values of the rows dimension domain (1 and 2) and values 2, 3, 4 of the cols dimension domain.
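To make the positional matching concrete, here is a minimal plain-Python sketch. It does not use the TileDB API; the `DIMENSIONS` tuple and the `expand_subarray` helper are hypothetical names introduced only for illustration, and the only assumption taken from the text is that rows was added to the domain before cols.

```python
from itertools import product

# Hypothetical dimension order, mirroring the text: "rows" first, "cols" second.
DIMENSIONS = ("rows", "cols")


def expand_subarray(subarray):
    """Return the (row, col) cells covered by a subarray.

    The subarray holds one inclusive (low, high) range per dimension,
    listed in the same order as DIMENSIONS.
    """
    if len(subarray) != len(DIMENSIONS):
        raise ValueError("one range per dimension is required")
    # Both bounds are inclusive, so the upper bound needs +1 for range().
    axes = [range(low, high + 1) for low, high in subarray]
    return list(product(*axes))


# Subarray [1,2], [2,4]: rows 1..2, cols 2..4.
cells = expand_subarray([(1, 2), (2, 4)])
print(cells)

# Swapping the two ranges selects a different region: rows 2..4, cols 1..2.
swapped = expand_subarray([(2, 4), (1, 2)])
print(swapped)
```

Both calls cover six cells, but only the first one touches row 1, which is why the order in which the ranges are listed cannot be ignored.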

All that is left to do is create the empty array on disk so that it can be written to. We specify the name of the array to create, and the schema to use. This command will essentially persist the array schema we just created on disk.

Note

The array name here will be used to create a data directory in the current working path (see On-disk structure below). The array name can also be a full URI, for example a path like file:///home/username/my_array or an S3 URI like s3://bucket-name/array-name.
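The three forms of array name mentioned in this note can be told apart by their URI scheme. The sketch below uses only the Python standard library and is not how TileDB itself resolves names; `describe_array_name` is a hypothetical helper, and treating a scheme-less name as relative to the current working directory is an assumption drawn from the note.

```python
from pathlib import Path
from urllib.parse import urlparse


def describe_array_name(name):
    """Classify an array name as a plain name or a full URI.

    Returns a (kind, location) pair. This is illustrative only and is
    not TileDB's own resolution logic.
    """
    parsed = urlparse(name)
    if parsed.scheme == "":
        # No scheme: assume a directory under the current working path.
        return ("relative", str(Path.cwd() / name))
    # Full URI: report the scheme plus bucket/host and path.
    return (parsed.scheme, parsed.netloc + parsed.path)


for example in (
    "quickstart_dense_array",
    "file:///home/username/my_array",
    "s3://bucket-name/array-name",
):
    print(example, "->", describe_array_name(example))
```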

Writing to the array

We will populate the array with values 1, 2, ..., 16. To start, prepare the data to be written:

The cell layout is covered thoroughly in later tutorials. For now, all you need to know is that you are telling TileDB that the cell values in your buffer will be written to the cells of the array in row-major order (i.e., 1 will be stored in cell (1,1), 2 in (1,2), etc.).
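The row-major mapping from buffer position to cell can be sketched in plain Python. This does not call TileDB; `place_row_major` is a hypothetical helper, and the 4x4 domain (rows and cols both spanning [1, 4]) is an assumption inferred from the 16 values and the cell coordinates used in this tutorial.

```python
# Assumed domain, inferred from the text: rows and cols both span [1, 4].
ROW_DOMAIN = (1, 4)
COL_DOMAIN = (1, 4)


def place_row_major(buffer):
    """Map a flat buffer onto (row, col) cells in row-major order."""
    num_rows = ROW_DOMAIN[1] - ROW_DOMAIN[0] + 1
    num_cols = COL_DOMAIN[1] - COL_DOMAIN[0] + 1
    if len(buffer) != num_rows * num_cols:
        raise ValueError("buffer must hold exactly one value per cell")
    cells = {}
    for position, value in enumerate(buffer):
        # Row-major: the column index varies fastest.
        row = ROW_DOMAIN[0] + position // num_cols
        col = COL_DOMAIN[0] + position % num_cols
        cells[(row, col)] = value
    return cells


data = list(range(1, 17))  # values 1, 2, ..., 16
array = place_row_major(data)
print(array[(1, 1)], array[(1, 2)], array[(2, 1)], array[(4, 4)])  # prints "1 2 5 16"
```

The fifth value in the buffer lands in cell (2,1) because a full row of four cells is filled before the next row starts.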

The array data is now stored on disk. The resulting array is depicted in the figure below.

[Figure: ../_images/quickstart_dense1.png]

Reading from the array

We will next explain how to read the cell values in subarray [1,2], [2,4], i.e., in the blue rectangle shown in the figure above. The result values should be 2 3 4 6 7 8, reading in row-major order (i.e., first the three selected columns of row 1, then the three selected columns of row 2).

The row-major layout here means that the cells will be returned in row-major order within the subarray [1,2], [2,4] (more information on cell layouts is covered in later tutorials).
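The expected result can be checked with a small plain-Python model of the read. It is not the TileDB read API: `ARRAY`, `read_subarray`, and its `layout` argument are hypothetical names, and the 4x4 array holding 1 to 16 in row-major order is assumed from the writing step. The column-major branch is included only as a contrast.

```python
# Assumed contents: a 4x4 array holding 1..16, written in row-major order.
ARRAY = [[row * 4 + col + 1 for col in range(4)] for row in range(4)]


def read_subarray(row_range, col_range, layout="row-major"):
    """Return the values inside an inclusive, 1-based subarray."""
    rows = range(row_range[0], row_range[1] + 1)
    cols = range(col_range[0], col_range[1] + 1)
    if layout == "row-major":
        # Visit every selected column of a row before moving to the next row.
        order = [(r, c) for r in rows for c in cols]
    elif layout == "col-major":
        # Visit every selected row of a column before moving to the next column.
        order = [(r, c) for c in cols for r in rows]
    else:
        raise ValueError("unknown layout: " + layout)
    # Coordinates are 1-based; Python lists are 0-based.
    return [ARRAY[r - 1][c - 1] for r, c in order]


print(read_subarray((1, 2), (2, 4)))               # [2, 3, 4, 6, 7, 8]
print(read_subarray((1, 2), (2, 4), "col-major"))  # [2, 6, 3, 7, 4, 8]
```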

Now data holds the result cell values on attribute a. If you compile and run the example of this tutorial as shown below, you should see the following output:

On-disk structure

A TileDB array is stored on disk as a directory with the name given at the time of array creation. If we look into the array on disk after it has been written to, we will see something like the following:

$ ls -l quickstart_dense_array/
total 8
drwx------  4 stavros  staff  128 Jun 25 15:18 __1561490302161_1561490302161_15bab0281e2e44f2a803eb6f3001ed00
-rwx------  1 stavros  staff  149 Jun 25 15:18 __array_schema.tdb
-rwx------  1 stavros  staff    0 Jun 25 15:18 __lock.tdb
drwx------  2 stavros  staff   64 Jun 25 15:18 __meta

The array directory and the files __array_schema.tdb and __lock.tdb were written upon array creation, whereas the subdirectory __1561490302161_1561490302161_15bab0281e2e44f2a803eb6f3001ed00 was created when the array was written. This subdirectory, called a fragment, contains the written cell values for attribute a in file a.tdb, along with associated metadata:

$ ls -l quickstart_dense_array/__1561490302161_1561490302161_15bab0281e2e44f2a803eb6f3001ed00/
total 16
-rwx------  1 stavros  staff  602 Jun 25 15:18 __fragment_metadata.tdb
-rwx------  1 stavros  staff   84 Jun 25 15:18 a.tdb
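The fragment directory name in the listing above appears to be made of two numeric fields followed by a 32-character hexadecimal identifier. The sketch below assumes, without confirmation from this tutorial, that the two numbers are timestamps in milliseconds since the Unix epoch and that the last field is a unique identifier; `parse_fragment_name` is a hypothetical helper, not part of TileDB.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_fragment_name(name):
    """Split a fragment directory name into its three fields.

    Assumed layout (not stated in the tutorial):
    "__<start ms>_<end ms>_<32 hex characters>".
    """
    if not name.startswith("__"):
        raise ValueError("fragment names are expected to start with '__'")
    start_ms, end_ms, unique_id = name[2:].split("_")
    return {
        # Integer milliseconds avoid float rounding in the conversion.
        "start": EPOCH + timedelta(milliseconds=int(start_ms)),
        "end": EPOCH + timedelta(milliseconds=int(end_ms)),
        "id": unique_id,
    }


fragment = parse_fragment_name(
    "__1561490302161_1561490302161_15bab0281e2e44f2a803eb6f3001ed00"
)
print(fragment["start"].isoformat())  # 2019-06-25T19:18:22.161000+00:00
print(fragment["id"])
```

Under that assumption, the decoded time (19:18 UTC on June 25) is consistent with the 15:18 local modification time shown by ls if the machine was four hours behind UTC.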

The TileDB array hierarchy on disk and more details about fragments are discussed in later tutorials.