Sparse Arrays

In this tutorial we will learn how to create, read, and write a simple sparse array in TileDB.

Full programs
Program             Links
quickstart_sparse   quickstartcpp, quickstartpy

Basic concepts and definitions

Creating a sparse array

Note

The order of the dimensions (as added to the domain) matters later when specifying subarrays. For instance, with rows added before cols, subarray [1,2], [2,4] selects the first two values of the rows dimension domain and values 2, 3, 4 of the cols dimension domain.
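
The subarray convention in the note can be sketched in plain Python. This is a hypothetical illustration and not the TileDB API: `cells_in_subarray` is a name invented here, and the sketch assumes each range is inclusive on both ends and that ranges are listed in the order the dimensions were added (rows first, then cols).

```python
from itertools import product

def cells_in_subarray(subarray):
    """Enumerate the coordinates covered by a subarray.

    `subarray` holds one inclusive (low, high) range per dimension, in the
    order the dimensions were added to the domain (assumed: rows, then cols).
    """
    ranges = [range(low, high + 1) for low, high in subarray]
    return list(product(*ranges))

# Subarray [1,2], [2,4]: rows 1..2 and cols 2..4.
cells = cells_in_subarray([(1, 2), (2, 4)])
print(cells)
```

Swapping the two ranges would instead select rows 2..4 and cols 1..2, which is why the dimension order has to be kept in mind.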

All that is left to do is create the empty array on disk so that it can be written to. We specify the name of the array to create, and the schema to use. This command will essentially persist the array schema we just created on disk.

Writing to the array

We will populate the array by writing some values to its cells, specifically 1, 2, and 3 at cells (1,1), (2,4) and (2,3), respectively. Notice that, contrary to the dense case, here we specify the exact indices where the values will be written, i.e., we provide the cell coordinates.
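
The shape of such a write can be pictured as two parallel sequences, one holding coordinates and one holding values. The snippet below is an in-memory model only, not the TileDB API; the names `coords`, `data`, and `cells` are made up for illustration.

```python
# Coordinates of the non-empty cells as (row, col) pairs, in the order
# they are listed in the text.
coords = [(1, 1), (2, 4), (2, 3)]
# Value for each coordinate, in the same order.
data = [1, 2, 3]

# A sparse write pairs the i-th coordinate with the i-th value. Cells that
# are never mentioned stay empty (they are absent from the mapping).
cells = dict(zip(coords, data))

for (row, col), value in cells.items():
    print(f"cell ({row}, {col}) <- {value}")
```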

The array data is now stored on disk. The resulting array is depicted in the figure below.

[Figure: ../_images/quickstart_sparse1.png]

Reading from the array

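We will next explain how to read the cell values in subarray [1,2], [2,4], i.e., in the blue rectangle shown in the figure above. The result values should be 3 and 2, in that order: cell (1,1) lies outside the subarray, and in row-major order cell (2,3), which holds 3, comes before cell (2,4), which holds 2.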

The row-major layout here means that the cells will be returned in row-major order within the subarray [1,2], [2,4] (more information on cell layouts is covered in later tutorials).
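
To make the expected result concrete, here is a small pure-Python model of the read. It is a sketch under stated assumptions rather than TileDB code: `read_subarray` is a hypothetical helper, ranges are treated as inclusive, and row-major order is modeled by sorting on (row, col).

```python
def read_subarray(cells, row_range, col_range):
    """Return the values of non-empty cells inside the subarray, row-major."""
    (r_lo, r_hi), (c_lo, c_hi) = row_range, col_range
    hits = [
        (coord, value)
        for coord, value in cells.items()
        if r_lo <= coord[0] <= r_hi and c_lo <= coord[1] <= c_hi
    ]
    # Row-major: order by row first, then by column within a row.
    hits.sort(key=lambda item: item[0])
    return [value for _, value in hits]

# The three cells written earlier in the tutorial.
cells = {(1, 1): 1, (2, 4): 2, (2, 3): 3}

result = read_subarray(cells, (1, 2), (2, 4))
print(result)  # [3, 2]
```

Cell (1,1) is dropped because column 1 falls outside [2,4]; the remaining two cells are returned in row-major order regardless of the order in which they were written.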

If you compile and run this tutorial example as shown below, you should see the following output:

On-disk structure

A TileDB array is stored on disk as a directory with the name given at the time of array creation. If we look into the array on disk after it has been written to, we will see something like the following:

$ ls -l quickstart_sparse_array/
total 8
drwx------  5 stavros  staff  160 Jun 25 15:22 __1561490578769_1561490578769_9e429a59930b4a9c83baa57eb2fb41a8
-rwx------  1 stavros  staff  153 Jun 25 15:22 __array_schema.tdb
-rwx------  1 stavros  staff    0 Jun 25 15:22 __lock.tdb
drwx------  2 stavros  staff   64 Jun 25 15:22 __meta

The array directory and the files __array_schema.tdb and __lock.tdb were written upon array creation, whereas the subdirectory __1561490578769_1561490578769_9e429a59930b4a9c83baa57eb2fb41a8 was created when the array was written to. This subdirectory, called a fragment, contains the written cell values for attribute a in file a.tdb and the corresponding coordinates in a separate file __coords.tdb, along with associated metadata:

$ ls -l quickstart_sparse_array/__1561490578769_1561490578769_9e429a59930b4a9c83baa57eb2fb41a8/
total 24
-rwx------  1 stavros  staff  106 Jun 25 15:22 __coords.tdb
-rwx------  1 stavros  staff  611 Jun 25 15:22 __fragment_metadata.tdb
-rwx------  1 stavros  staff   32 Jun 25 15:22 a.tdb
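
The fragment directory name in the listings can be taken apart with a few lines of Python. This sketch assumes, which the text here does not state, that the name has the form `__<t1>_<t2>_<uuid>` and that the two numeric fields are Unix timestamps in milliseconds; `parse_fragment_name` is a hypothetical helper, not part of TileDB.

```python
from datetime import datetime, timedelta, timezone

def parse_fragment_name(name):
    """Split a fragment directory name into (start, end, uuid).

    Assumes the layout __<t1>_<t2>_<uuid>, where t1 and t2 are Unix
    timestamps in milliseconds (an assumption made for this sketch).
    """
    t1, t2, uuid = name.lstrip("_").split("_")

    def to_utc(ms_text):
        ms = int(ms_text)
        whole_seconds = datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
        return whole_seconds + timedelta(milliseconds=ms % 1000)

    return to_utc(t1), to_utc(t2), uuid

name = "__1561490578769_1561490578769_9e429a59930b4a9c83baa57eb2fb41a8"
start, end, uuid = parse_fragment_name(name)
print(start.isoformat(), end.isoformat(), uuid)
```

Under these assumptions both timestamps decode to 25 June 2019, 19:22 UTC, which would agree with the Jun 25 15:22 modification time in the listing if the machine's local time zone was four hours behind UTC.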

The TileDB array hierarchy on disk and more details about fragments are discussed in later tutorials.