Cookbook

There are many complex yet common use cases for datareservoirio, and we have collected some of them in this section. If you have suggestions for more recipes to add, please let us know!

Visualize data

It is really easy to visualize data with Matplotlib:

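import datareservoirio as drio
import matplotlib.pyplot as plt


auth = drio.Authenticator()
client = drio.Client(auth)

# Replace with the ID of the time series you want to plot
series_id = "<your time series ID>"

data = client.get(series_id, start='2018-02-14', end='2018-02-17')

plt.figure()
plt.plot(data)
plt.show()  # open the figure window when running as a script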

Save data to file

Sometimes you may want to dump data to a file (don’t worry, we won’t judge you):

import datareservoirio as drio


auth = drio.Authenticator()
client = drio.Client(auth)

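# Replace with the ID of the time series you want to export
series_id = "<your time series ID>"

data = client.get(series_id, start='2018-02-14', end='2018-02-17')
data.to_csv('path')  # 'path' is the destination file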

Note

The data is written to file using the built-in Pandas functionality, so CSV is just one of many file formats you can choose from.
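To illustrate the choice of formats, here is a minimal sketch that writes the same series as CSV and as a pickle, then reads both back. The series is synthetic and only stands in for what client.get would return; the file names and the column name "values" are assumptions made for this example.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Synthetic stand-in for the series returned by client.get (assumption)
index = pd.date_range("2018-02-14", periods=5, freq="1h")
data = pd.Series(np.arange(5, dtype=float), index=index, name="values")

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"
    pickle_path = Path(tmp) / "data.pkl"

    data.to_csv(csv_path)        # human-readable text
    data.to_pickle(pickle_path)  # binary, keeps dtypes and index as they are

    # Read both files back while the temporary directory still exists
    from_csv = pd.read_csv(csv_path, index_col=0, parse_dates=True)["values"]
    from_pickle = pd.read_pickle(pickle_path)
```

CSV is convenient for sharing and inspection, while a binary format avoids re-parsing timestamps when the file is only read by pandas again.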

Work with higher dimensional data

Let’s see how you can upload and store a higher dimensional dataset by creating one series per column:

import datareservoirio as drio
import numpy as np
import pandas as pd


auth = drio.Authenticator()
client = drio.Client(auth)

data_dict = {
    'x': np.random.rand(10),
    'y': np.random.rand(10),
    'z': np.random.rand(10),
}

df = pd.DataFrame(data_dict, index=np.arange(10))

# Upload each column as its own series and keep track of the returned IDs
series_ids = {}
for name, col in df.items():  # DataFrame.iteritems() was removed in pandas 2.0
    response = client.create(series=col)
    series_ids[name] = response['TimeSeriesId']

Since we have kept every TimeSeriesId, the original dataframe can now be reconstructed:

data_dict = {
    # convert_date=False keeps the integer index instead of converting it to datetimes
    name: client.get(series_id, convert_date=False)
    for name, series_id in series_ids.items()
}

df = pd.DataFrame(data_dict)
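The split-and-reassemble pattern can be tried offline before touching real data. The sketch below uses InMemoryClient, a hypothetical dict-backed stand-in that only mimics the create and get calls used above; it is not part of datareservoirio and says nothing about how the real service stores data.

```python
import uuid

import numpy as np
import pandas as pd


class InMemoryClient:
    """Hypothetical stand-in for drio.Client that keeps series in a dict."""

    def __init__(self):
        self._store = {}

    def create(self, series):
        # Mimic the response shape used above: a dict with a 'TimeSeriesId' key
        series_id = str(uuid.uuid4())
        self._store[series_id] = series.copy()
        return {"TimeSeriesId": series_id}

    def get(self, series_id, convert_date=False):
        # convert_date is accepted only to mirror the real call; it is ignored here
        return self._store[series_id].copy()


client = InMemoryClient()
original = pd.DataFrame(
    np.random.rand(10, 3), columns=["x", "y", "z"], index=np.arange(10)
)

# One series per column, remembering which ID belongs to which column
series_ids = {
    name: client.create(series=col)["TimeSeriesId"]
    for name, col in original.items()
}

# Reassemble the columns; the dict keys become the column names again
restored = pd.DataFrame(
    {name: client.get(sid, convert_date=False) for name, sid in series_ids.items()}
)
```

The mapping from column name to ID is the only link between the columns, so it is worth persisting alongside the data.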

Work with large amounts of data

When working with large amounts of data (long time spans and/or high sampling frequencies), it is often useful to download the data in chunks and resample each chunk, so that you never have all the data in memory at the same time. Let’s see how you can download five months of data, one day at a time, and get the 1-hour standard deviation:

import datareservoirio as drio
import pandas as pd


auth = drio.Authenticator()
client = drio.Client(auth)

# Daily chunk boundaries; consecutive pairs give each chunk's (start, end)
start_end = pd.date_range(start="2020-01-01 00:00", end="2020-06-01 00:00", freq="1D")
start_end_iter = zip(start_end[:-1], start_end[1:])

series_id = "<your time series ID>"


# Collect the resampled chunks and concatenate once at the end
chunks = []
for start, end in start_end_iter:
    timeseries = client.get(series_id, start=start, end=end)
    chunks.append(timeseries.resample("1h").std())

result = pd.concat(chunks)
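The chunk-and-reduce loop can be checked without a connection. In this minimal sketch, fetch_chunk is a hypothetical stand-in for client.get that generates one day of synthetic 1-minute samples; it assumes each chunk excludes its end point, so that no sample lands in two chunks.

```python
import numpy as np
import pandas as pd


def fetch_chunk(start, end):
    """Hypothetical stand-in for client.get: 1-minute samples in [start, end)."""
    # Drop the last timestamp so the end boundary belongs to the next chunk
    index = pd.date_range(start, end, freq="1min")[:-1]
    rng = np.random.default_rng(seed=index[0].dayofyear)
    return pd.Series(rng.normal(size=len(index)), index=index)


# One week of daily boundaries: 8 timestamps give 7 chunks
boundaries = pd.date_range("2020-01-01", "2020-01-08", freq="1D")

chunks = []
for start, end in zip(boundaries[:-1], boundaries[1:]):
    raw = fetch_chunk(start, end)            # 1440 samples held in memory
    chunks.append(raw.resample("1h").std())  # reduced to 24 hourly values

result = pd.concat(chunks)
```

Only one day of raw samples is alive at any time; what accumulates is the hourly summary. If your data source includes both end points of a chunk, a sample on the boundary may show up in two chunks and is worth deduplicating before resampling.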