The APIs have been designed to be close to the way of thinking of each programming language, but they share common principles. This chapter will give you a rapid overview of how the API works and the principles behind each language binding.
Most success and failure conditions are reported through return values. When an operation is successful, a function returns with the status qdb_e_ok. In Python and Java, errors are translated into exceptions.
Prior to running any command, a connection to the cluster must be established. It is possible to either connect to a single node or to multiple nodes within the cluster.
Connecting to a single node is simpler and suitable for non-critical clients. Connecting to multiple nodes enables the client, at initialization, to try several different nodes should one of them fail.
Once the connection is established, the client will lazily explore the ring as requests are made.
When putting an entry, the call succeeds if the entry does not exist and fails if it already exists. When updating an entry, the call always succeeds: if the entry does not exist, it will be created.
Unless otherwise noted, all calls are atomic. When a call returns, you can be assured the data has been persisted to disk (to the limit of what the operating system can guarantee in that aspect).
Many calls allocate memory on the client side. For example, when you get an entry, the call allocates a buffer large enough to hold the entry for you. In C, this memory must be explicitly released. In C++, Java, and Python, this is not necessary, as a memory management mechanism is included with the API.
As of quasardb 1.2.0, if the cluster uses Data replication, read queries are automatically load-balanced. Nodes containing replicated entries may respond instead of the original node to provide faster lookup times.
Any entry within quasardb can have an expiry time. Once the expiry time has passed, the entry is removed and is no longer accessible. Through the API, the expiry time precision is one second. Internally, quasardb's clock resolution is operating system dependent, but often below 100 µs.
Expiry times can be either absolute (the number of seconds relative to the epoch) or relative (the number of seconds relative to when the call is made). To prevent an entry from expiring, provide an absolute time of 0. By default, entries never expire. Specifying an expiry in the past results in the entry being removed.
Modifying an entry in any way (via an update, removal, compare and swap operation...) resets the expiry to 0 unless otherwise specified.
All absolute expiry times are UTC and 64-bit wide, meaning there is no practical limit on expiry times.
Iteration is unordered: the order in which entries are returned is undetermined. Every entry is returned exactly once; no entry is returned twice.
If a node becomes unavailable during iteration, the contents stored on that node may be skipped over, depending on the replication configuration of the cluster.
If it is impossible to recover from an error during the iteration, the iteration will prematurely stop. It is the caller’s decision to try again or give up.
The “current” state of the cluster is what is iterated upon; no “snapshot” is made. If an entry is added during iteration, it may or may not be included, depending on its placement relative to the iteration cursor. It is planned to change this behaviour to allow “consistent” iteration in a future release.
Quasardb enables you to access entries provided that you know the associated key. But what if you don’t know the key? It is still possible to iterate on the whole cluster to list all entries but this is not very efficient.
Fortunately, quasardb provides you with a prefix-based search. This feature enables you to list all keys matching a given prefix; in other words, you can list all keys starting with a specified byte sequence.
This feature transforms quasardb into a hierarchical database, since with an appropriate naming scheme it becomes possible to group keys.
Let’s say you want to store financial instruments’ values into quasardb. Imagine we have the following entries:
- instruments.forex.spot.usd.eur
- instruments.forex.spot.usd.cad
- instruments.debt.bond.mybond1
- instruments.equity.stock.mystock1
In one query you can efficiently list all available forex spots for a given currency:
// we assume a properly initialized qdb::handle named h
qdb_error_t err = qdb_e_uninitialized;
std::vector<std::string> usd_spots = h.prefix_get("instruments.forex.spot.usd.", err);
if (err != qdb_e_ok)
{
    // error management
    // ...
}
usd_spots will contain the list of all keys (with their full name) starting with “instruments.forex.spot.usd.”, in our case the list will contain:
- instruments.forex.spot.usd.eur
- instruments.forex.spot.usd.cad
Once you have this list, it’s easy to query the content.
- The client needs to have enough memory to allocate the results list
- The search prefix needs to be at least three bytes long
- It is not possible to list reserved entries (entries starting with “qdb”)
- Once the list is returned, it may change as concurrent requests may add or remove entries that ought to be in the list
How fast is the query? The complexity doesn’t depend on the number of entries in your cluster. Whether you have 1 billion entries or only two, the query runs in comparable time (setting aside the memory management overhead, which varies with the size of the result).
The complexity of the request is dependent on the number of nodes and the length of the key.
Formally, if \(k\) is the number of characters in the prefix, and \(n\) the number of nodes in the cluster, the complexity is:

\[O(k \times n)\]
This means that run time grows linearly with the cluster size.
Note
As of this writing, we are working on an improved version whose run time complexity will be:

\[O(k \times \log(n))\]
Prefix-based search brings a lot of flexibility to quasardb, enabling you to organize your data into logical trees for efficient queries. Although the runtime performance is dependent on the cluster size, performance is excellent and an order of magnitude faster than iteration. Additionally, performance for large clusters will be greatly improved in future releases.
If you have used quasardb to manage small entries (that is, entries smaller than 1 KiB), you have certainly noticed that performance isn’t as good as with larger entries. The reason is that whatever optimizations we might put into quasardb, every request to the cluster has to go through the network, back and forth.
Assuming a 1 ms latency between the client and the server, querying 1,000 entries sequentially will take at least 2 seconds, however small the entries might be and however large the bandwidth might be.
Batch operations solve this problem by enabling you to group multiple queries into a single request. This grouping can speed up processing by several orders of magnitude.
How do you query the content of many small entries at once? Assuming we have a vector of strings named “entries” containing the keys, getting all the entries is a matter of building the batch and running it:
// we assume the existence and correctness of std::vector<std::string> entries;
std::vector<qdb_operation_t> operations(entries.size());
std::transform(entries.begin(), entries.end(), operations.begin(),
    [](const std::string & str) -> qdb_operation_t
    {
        qdb_operation_t op;
        // it is paramount that unused parameters are set to zero
        memset(&op, 0, sizeof(op));
        op.error = qdb_e_uninitialized; // this is optional
        op.type = qdb_op_get_alloc; // this specifies the kind of operation we want
        op.alias = str.c_str();
        return op;
    });
// we assume a properly initialized qdb::handle named h
size_t success_count = h.run_batch(&operations[0], operations.size());
if (success_count != operations.size())
{
    // error management
    // each operation will have its error member updated properly
}
Each result is now available in the “result” structure member, and its size is stored in “result_size”. These are API-allocated buffers. Releasing all the memory is done in the following way:
qdb_free_operations(h, &operations[0], operations.size());
operations.clear();
- The order in which operations in a batch are executed is undetermined
- Each operation in a batch is ACID, however the batch as a whole is neither ACID nor transactional
- Running a batch adds overhead. Using the batch API for small batches may therefore yield unsatisfactory performance
Batches may contain any combination of gets, puts, updates, removes, compare and swaps, get and updates (atomic), get and removes (atomic) and conditional removes.
Warning
Since the execution order is undetermined, it is strongly advised to avoid dependencies between operations within a single batch. For performance reasons, the API doesn’t perform any semantic check.
Each operation receives a status, independent from other operations. If for some reason the cluster estimates that running the batch may be unsafe or unreliable, operations may be skipped and will have the qdb_e_skipped error code. This can also happen in case of a global error (unstable ring, low memory condition) or malformed batch.
A batch with an invalid request or an invalid number of operations is considered malformed as a whole and is ignored. This is because quasardb considers that a batch containing invalid entries is probably erroneous as a whole, and as a precaution even the requests that look valid are not run.
For example, if you submit a batch of put operations and one of the operations has an invalid parameter (for example an empty alias), the whole batch will be in error. The operation with the invalid parameter will have the qdb_e_invalid_argument error code and other operations will have the qdb_e_skipped error code.
Batch operations have three stages:
- Mapping - The API maps each operation to the proper node, making all the necessary requests. This phase, although very fast, depends on the cluster size and has a worst case of three requests per node.
- Dispatching - The API sends groups of operations to each node in optimally sized packets. This phase depends only on the size of the batch.
- Reduction - Results from the cluster are received, checked, and reduced. This phase depends only on the size of the batch.
Formally, if you consider the first phase as a constant overhead, the complexity of batch operations, with \(i\) being the number of operations inside a batch, is:

\[O(i)\]
Note
Because of the first phase, running batches that are smaller than three times the size of the cluster may not yield the expected performance improvement. For example, if your cluster is 10 nodes large, it is recommended to have batches of at least 30 operations.
Used properly, batch operations can transform performance, enabling you to process large sets of small operations extremely fast.