Basic Usage

The datareservoirio library provides access to DataReservoir.io through a native Python experience. The following features are supported through the Python API:

  • Manage series (time series or sequences):

    • Access existing series

    • Create and upload new series

    • Edit and append to existing series

    • Delete series

  • Manage metadata

    • Set metadata

    • Edit existing metadata

    • Delete metadata

  • Browse and search for series based on metadata

All of the above functionality is exposed as high-level methods in the Client class. The general workflow can be summarized as:

  1. Authenticate

  2. Instantiate a new client (using Client)

  3. Go… Data is waiting for you!

Examples

Manage series

Store series data

# Create and store a simple time series

import datareservoirio as drio
import numpy as np
import pandas as pd

auth = drio.Authenticator()
# Follow instructions to authenticate

client = drio.Client(auth)

dt_index = pd.date_range('2018-01-01 00:00:00', periods=10, freq='6H')
series = pd.Series(np.random.rand(10), index=dt_index)

response = client.create(series)

If the request was successful, a Python dictionary containing essential information is returned:

{
    'FileId': '2465e7c8-7a5e-4602-bb3b-a5a01382aa1f',
    'TimeSeriesId': '8050a49c-8b61-448d-bdbb-51248a23dbd9',
    'TimeOfFirstSample': 1514764800000000000,
    'TimeOfLastSample': 1514959200000000000
}
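The two sample timestamps look like integer nanoseconds since the Unix epoch. That is an assumption inferred from the example values, not something stated above. Under that assumption, a minimal sketch for turning them into readable UTC timestamps with pandas:

```python
import pandas as pd

# Example response copied from above
response = {
    'FileId': '2465e7c8-7a5e-4602-bb3b-a5a01382aa1f',
    'TimeSeriesId': '8050a49c-8b61-448d-bdbb-51248a23dbd9',
    'TimeOfFirstSample': 1514764800000000000,
    'TimeOfLastSample': 1514959200000000000,
}

# Assumption: the values are nanoseconds since 1970-01-01 00:00:00 UTC
first = pd.Timestamp(response['TimeOfFirstSample'], unit='ns', tz='UTC')
last = pd.Timestamp(response['TimeOfLastSample'], unit='ns', tz='UTC')

print(first)  # 2018-01-01 00:00:00+00:00
print(last)   # 2018-01-03 06:00:00+00:00
```

The result matches the series created above: ten samples at 6-hour spacing starting at 2018-01-01 00:00:00.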

Important

TimeSeriesId is the unique identifier (GUID) assigned to the series. It is recommended that you add some metadata to the series so that it is easier to find later, or at least store the TimeSeriesId for later reference.

Important

DataReservoir.io works with UTC time. All datetime-like objects are converted to UTC, so time zone information is lost when data is stored in DataReservoir.io.
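To see what such a conversion means in practice, the sketch below uses pandas alone (no DataReservoir.io call) to convert a time zone aware index to UTC. It only illustrates the kind of conversion described above; how the client performs it internally is not shown here.

```python
import numpy as np
import pandas as pd

# Time series indexed in local (Oslo) time; a Timedelta is used for the
# frequency to avoid version-specific frequency aliases
local_index = pd.date_range('2018-01-01 00:00:00', periods=4,
                            freq=pd.Timedelta(hours=6), tz='Europe/Oslo')
series_local = pd.Series(np.arange(4.0), index=local_index)

# Same instants in time, expressed in UTC; the Oslo offset is no longer
# part of the index
series_utc = series_local.tz_convert('UTC')

print(series_local.index[0])  # 2018-01-01 00:00:00+01:00
print(series_utc.index[0])    # 2017-12-31 23:00:00+00:00
```

If the original time zone matters for your analysis, store it yourself (for example as metadata) and convert back after download.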

You can also store a sequence of data. However, you are required to define an integer index. (This is useful when appending and updating the data later.)

Store sequence:

# Create and store a simple sequence
series = pd.Series(np.random.rand(10), index=np.arange(10))
response = client.create(series)

Edit and append data

You can always append new data to an existing time series (or sequence). However, any overlapping indices will overwrite the existing data:

dt_index = pd.date_range('2018-01-02 00:00:00', periods=10, freq='6H')
series = pd.Series(np.random.rand(10), index=dt_index)

series_id = response['TimeSeriesId']
response = client.append(series, series_id)
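The overwrite behavior can be previewed locally. The sketch below uses only pandas to mimic the rule described above (new values win on overlapping indices, all other samples are kept). It illustrates the semantics and is not the service's implementation.

```python
import pandas as pd

step = pd.Timedelta(hours=6)
existing = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.date_range('2018-01-01 00:00:00', periods=4, freq=step))
appended = pd.Series(
    [30.0, 40.0, 50.0],
    index=pd.date_range('2018-01-01 12:00:00', periods=3, freq=step))

# combine_first takes values from `appended` where they exist and falls
# back to `existing` for all other index labels
merged = appended.combine_first(existing)

print(merged.tolist())  # [1.0, 2.0, 30.0, 40.0, 50.0]
```

The samples at 12:00 and 18:00 exist in both series, so the appended values replace the old ones; the sample at 2018-01-02 00:00 is new and extends the series.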

Data verification process

Data uploaded to DataReservoir.io always goes through a validation process before it is made part of the series. By default, Client.create() and Client.append() wait for this validation process to complete successfully before appending the data to the time series. This behavior can be changed using the wait_on_verification parameter:

response = client.create(series, wait_on_verification=False)

response = client.append(series, series_id, wait_on_verification=False)

The result is that the data is queued for processing and the method returns immediately. When the validation process eventually completes, the data will be made available on the series.

Important

Setting wait_on_verification to False is significantly faster, but is only recommended when the data is “validated” in advance. If the data does not pass the server-side validation, it will be ignored.
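What validating in advance involves is not spelled out here. As a hedged sketch, a local pre-check might look like the following. The function name `precheck_series` and the specific rules (datetime or integer index, no duplicate labels, sorted index) are assumptions for illustration, not the server's actual validation rules.

```python
import numpy as np
import pandas as pd


def precheck_series(series):
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper: the rules below are illustrative assumptions.
    """
    problems = []
    index = series.index
    is_datetime = isinstance(index, pd.DatetimeIndex)
    is_integer = pd.api.types.is_integer_dtype(index)
    if not (is_datetime or is_integer):
        problems.append('index must be datetime-like or integer')
    if not index.is_unique:
        problems.append('index contains duplicates')
    if not index.is_monotonic_increasing:
        problems.append('index is not sorted')
    return problems


good = pd.Series(np.random.rand(3), index=np.arange(3))
bad = pd.Series([1.0, 2.0], index=[1, 1])

print(precheck_series(good))  # []
print(precheck_series(bad))   # ['index contains duplicates']
```

A script could run such a check first and only pass wait_on_verification=False when the returned list is empty.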

Access existing data

You can access any data for which you have the TimeSeriesId (and authorization):

# Get entire timeseries
timeseries = client.get(series_id)

# Get a slice of time series
timeseries = client.get(series_id, start='2018-01-01 12:00:00',
                        end='2018-01-02 06:00:00')

# Get a sequence
sequence = client.get(series_id, convert_date=False)

Note

Client.get() returns a pandas.Series.

Delete data

Note that deleting data is permanent, and all references to the TimeSeriesId are removed from the DataReservoir.io inventory:

client.delete(series_id)

Manage metadata

As with series, you can add, update, and delete metadata. In addition, you can assign a metadata entry to one or more series.

Create metadata entries

# Create a metadata entry
response = client.metadata_set('foo.bar', 'baz',
                               vendor='Sensor Corp', type='Voltmeter')

Update/edit metadata entries

# Update/edit a metadata entry
response = client.metadata_set('foo.bar', 'baz',
                               vendor='Sensor Corp', type='Ampermeter')

Get metadata entries

# Get a metadata entry based on namespace and key
metadata = client.metadata_get(namespace='foo.bar', key='baz')

# or directly by id
metadata = client.metadata_get(metadata_id=metadata_id)

Delete metadata entries

# Delete metadata
client.metadata_delete(metadata_id)

Set metadata on series

You can assign metadata to a series from an existing metadata entry, or create a new entry during assignment:

# Existing metadata
client.set_metadata(series_id, metadata_id=metadata_id)

# Or create a new one during assignment
client.set_metadata(series_id, namespace='foo.bar', key='baz',
                    vendor='Sensor Corp', type='Gyroscope')

Do’s and don’ts

Data size vs. memory available

When dealing with high-frequency data and/or long time spans, keep memory usage in mind. Holding all the data in memory at the same time could cause problems and make your script fail.

The examples below show how to download a month of time series data in one-hour chunks and how to calculate 1-minute statistics such as the standard deviation. In the Advanced Configuration section you can see how to enable and configure caching. Caching speeds up repeated downloads of the same series.

Use for loops to download data in chunks

It is recommended to download data in smaller chunks (such as one-day or one-hour chunks).

# Make a date iterator
start_end = pd.date_range(start="2020-01-01 00:00", end="2020-02-01 00:00", freq="1H")
start_end_iter = zip(start_end[:-1], start_end[1:])

series_id = "<your time series ID>"  # replace with your own TimeSeriesId


# Get timeseries in chunks
for start, end in start_end_iter:
    timeseries = client.get(series_id, start=start, end=end)
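The loop above replaces `timeseries` on every iteration, so whatever you need from a chunk has to be computed inside the loop. A minimal sketch of that pattern follows: each chunk is reduced to 1-minute statistics right away and only the reduced result is kept. The helper `resample_in_chunks` and the stand-in `fake_get` (which plays the role of `client.get` so the example runs offline) are hypothetical names. The sketch also assumes that consecutive chunks do not share samples; whether `client.get` includes the end point is not covered here, so check for duplicated boundary samples in real use.

```python
import numpy as np
import pandas as pd


def resample_in_chunks(get, series_id, edges, rule='1min'):
    """Fetch chunk by chunk and keep only the resampled standard deviation."""
    parts = []
    for start, end in zip(edges[:-1], edges[1:]):
        chunk = get(series_id, start=start, end=end)
        if chunk.empty:
            continue
        # Reduce immediately so the raw chunk can be garbage collected
        parts.append(chunk.resample(rule).std())
    return pd.concat(parts) if parts else pd.Series(dtype=float)


def fake_get(series_id, start, end):
    """Stand-in for client.get: synthetic 1 Hz data on [start, end)."""
    index = pd.date_range(start, end, freq=pd.Timedelta(seconds=1))[:-1]
    return pd.Series(np.sin(np.arange(len(index))), index=index)


# Three one-hour chunks
edges = pd.date_range('2020-01-01 00:00', '2020-01-01 03:00',
                      freq=pd.Timedelta(hours=1))
result = resample_in_chunks(fake_get, 'hypothetical-id', edges)

print(len(result))  # 180
```

With a real client, `client.get` would be passed in place of `fake_get`. Only 180 values are held at the end, instead of the 10800 raw samples.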

Resample data

It can be useful to resample the data. This is easily done with pandas:

# Resample using 1-minute mean
timeseries_resampled_mean = timeseries.resample("1min").mean()

# Or, get the 1-minute standard deviation
timeseries_resampled_std = timeseries.resample("1min").std()