Encryption

In this tutorial you will learn how to use the at-rest encryption feature for TileDB arrays. It is recommended that you read the dense arrays tutorial first.

Full programs
Program Links
encryption encryptioncpp

Basic concepts and definitions

Creating an encrypted array

Creating an encrypted array is very similar to the unencrypted case. First, configure the array schema with a domain, attributes, etc. in the usual way.

Next, create the array. When creating an encrypted array, you must specify both the encryption algorithm and the key to use.

The encryption key must be provided when the array is created because TileDB also encrypts metadata such as the array schema. This same encryption algorithm and key will need to be provided again for any further array operations such as reading and writing.

Writing to an encrypted array

Writing to an encrypted array is also very similar to the unencrypted case. The only difference is that when opening the array, you must specify the same encryption algorithm and key as you used when creating the array.

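To start, prepare the unencrypted (plaintext) data to be written. Then open the array for writing with the encryption algorithm and key, and submit the write query as in the unencrypted case.

Once the write completes, the data for attribute a is stored encrypted on disk.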

If you specify an incorrect or invalid encryption key when opening the array, TileDB will return an error, meaning the array was not opened and therefore cannot be written to.

Reading from an encrypted array

As with writing, the only difference when reading from encrypted arrays is that you must open the array with the correct encryption key.

After the read query completes, data (the result buffer in the example program) holds the decrypted (plaintext) values of attribute a.

If you specify an incorrect or invalid encryption key when opening the array, TileDB will return an error, meaning the array was not opened and therefore cannot be read from.

Note

By default, TileDB caches array data and metadata in main memory after opening and reading from arrays. These caches will store decrypted (plaintext) array data in the case of encrypted arrays. For a bit of extra in-flight security (at the cost of performance), you can disable the TileDB caches. See the tutorial on Configuration Parameters to learn how to do that.

Supported encryption algorithms

TileDB currently supports a single type of encryption, AES-256 in GCM mode, which is a symmetric, authenticated encryption algorithm. The details of this encryption method are outside the scope of this tutorial, but at a high level this means that when creating, reading, or writing arrays you must provide the same 256-bit encryption key. The authenticated nature of the encryption scheme means that a message authentication code (MAC) is stored together with the encrypted data, allowing verification that the persisted ciphertext was not modified.
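Because the key must be exactly 256 bits (32 bytes), it can help to validate it in application code before handing it to TileDB. The following is a minimal sketch using only the C++ standard library. It assumes the application stores its key as a 64-character hexadecimal string, which is an illustrative choice and not something TileDB requires. The names kAes256KeyBytes, is_valid_aes256_key_length, hex_value, and parse_hex_key are hypothetical helpers and are not part of the TileDB API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// AES-256 uses a 256-bit key, i.e. exactly 32 bytes.
constexpr std::size_t kAes256KeyBytes = 32;

// Returns true if `key` has the right length for AES-256.
inline bool is_valid_aes256_key_length(const std::vector<unsigned char>& key) {
  return key.size() == kAes256KeyBytes;
}

// Converts one hex digit to its value, or returns -1 if it is not a hex digit.
inline int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Decodes a 64-character hex string into a 32-byte key (assumed storage
// format). Returns false and leaves `key` empty on wrong length or
// non-hex input.
inline bool parse_hex_key(const std::string& hex,
                          std::vector<unsigned char>& key) {
  key.clear();
  if (hex.size() != 2 * kAes256KeyBytes) return false;
  for (std::size_t i = 0; i < hex.size(); i += 2) {
    const int hi = hex_value(hex[i]);
    const int lo = hex_value(hex[i + 1]);
    if (hi < 0 || lo < 0) {
      key.clear();
      return false;
    }
    key.push_back(static_cast<unsigned char>((hi << 4) | lo));
  }
  return true;
}
```

Rejecting a malformed key early gives a clearer error than waiting for the array to fail to open. It does not check that the key is the correct one for a given array; only TileDB can determine that when it opens the array.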

On macOS and Linux, TileDB uses the OpenSSL library for encryption. On Windows, it uses the Cryptography API: Next Generation (CNG).

Encryption key lifetime

TileDB never persists the encryption key, but TileDB does store a copy of the encryption key in main memory while an encrypted array is open. When the array is closed, TileDB will zero out the memory used to store its copy of the key, and free the associated memory.
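Applications that hold their own copy of the key can follow the same zero-then-free pattern. The sketch below illustrates the idea with the C++ standard library only. It is not TileDB's internal implementation, and the class name KeyBuffer and its members are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical holder for an in-memory copy of an encryption key.
class KeyBuffer {
 public:
  KeyBuffer(const unsigned char* key, std::size_t length)
      : bytes_(key, key + length) {}

  // Not copyable, to avoid accidentally duplicating the key in memory.
  KeyBuffer(const KeyBuffer&) = delete;
  KeyBuffer& operator=(const KeyBuffer&) = delete;

  // Wipe the bytes before the vector's destructor frees the storage.
  ~KeyBuffer() { wipe(); }

  // Overwrites every key byte with zero. Writing through a volatile
  // pointer discourages the compiler from optimizing the stores away.
  void wipe() {
    volatile unsigned char* p = bytes_.data();
    for (std::size_t i = 0; i < bytes_.size(); ++i) p[i] = 0;
  }

  const std::vector<unsigned char>& bytes() const { return bytes_; }

 private:
  std::vector<unsigned char> bytes_;
};
```

Note that this only clears the copy owned by the holder. The buffer the key was originally read into is the caller's responsibility, and production code would more likely rely on a platform facility designed for secure wiping than on a hand-written loop.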

Performance

Due to the extra processing required to encrypt and decrypt array metadata and attribute data, you may experience lower performance on opening, reading and writing for encrypted arrays.

To mitigate this, TileDB internally parallelizes encryption and decryption using a chunking strategy. Additionally, when compression or other filtering is configured on array metadata or attribute data, encryption occurs last, meaning that the compressed (or, more generally, filtered) data is what gets encrypted.
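To make the chunking idea concrete, the sketch below splits a buffer into fixed-size ranges that could each be encrypted or decrypted independently, for example by separate threads. It is a generic illustration using only the C++ standard library. The function name chunk_ranges and the chunk size passed to it are hypothetical, and TileDB's actual chunk sizes and scheduling are not described in this tutorial.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Splits `total_bytes` into consecutive (offset, length) ranges of at most
// `chunk_bytes` each. The last range may be shorter than the others.
inline std::vector<std::pair<std::size_t, std::size_t>> chunk_ranges(
    std::size_t total_bytes, std::size_t chunk_bytes) {
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  if (chunk_bytes == 0) return ranges;  // No valid chunking for size zero.
  for (std::size_t offset = 0; offset < total_bytes; offset += chunk_bytes) {
    const std::size_t length = std::min(chunk_bytes, total_bytes - offset);
    ranges.emplace_back(offset, length);
  }
  return ranges;
}
```

For instance, a 10-byte buffer with a 4-byte chunk size yields three ranges: two of 4 bytes and a final one of 2 bytes. Because the ranges do not overlap, work on one chunk does not need to wait for another.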

Finally, newer generations of some Intel and AMD processors offer instructions for hardware acceleration of encryption and decryption. The encryption libraries that TileDB employs are configured to use hardware acceleration if it is available.