Advanced Configuration¶
Authentication¶
Single user / interactive¶
The default and recommended method for authentication for users is using
Authenticator
. You will be guided to your organizations login
webpage, and login as usual. (We will not see or store your credentials!). Once
authenticated, you can choose to re-use your (valid) access token (i.e. not be
prompted to authenticate next time) or re-authenticate everytime:
import datareservoirio as drio
# Re-use (valid) access token from last sesssion
auth = drio.Authenticator()
# or re-authenticate
auth = drio.Authenticator(auth_force=True)
Caution
Users on shared computers should always re-authenticate since access token from a different user may unintentionally be used.
If you desire to have multiple seperate session, it is advisable to set a session key during authetication. This will keep the sessions (token cache) seperate:
auth_0 = drio.Authenticator(session_key="my_unique_session_0")
auth_1 = drio.Authenticator(session_key="my_unique_session_1")
Service account / non-interactive client¶
If you require client/backend type of authentication flow where user interaction
is not feasible nor desired, you can use the
authenticate.ClientAuthenticator
:
import datareservoirio as drio
auth = drio.authenticate.ClientAuthenticator("my_client_id", "my_client_secret")
Contact us and we will provide you the specifics.
Caching¶
The Client
class employs a disk cache to speed up repeating series
downloads. Beside turning the cache on and off, several aspects of it can be
configured during instantiation. The configuration are passed on as a
dictionary:
format
: format used to store series on disk, either ‘parquet’ or ‘csv’. Default is ‘parquet’.max_size
: size in megabytes that the cache is allowed to use. Default is 1024MB.cache_root
: control the cache storage location. Default locations are:Windows:
%LOCALAPPDATA%\\datareservoirio\\Cache
Linux:
~/.cache/datareservoirio
(XDG default)MacOs:
~/Library/Caches/datareservoirio
Example:
import datareservoirio as drio
auth = drio.Authenticator()
# Initiate a client with 32GB cache in the 'c:\project\drio_cache'
client = drio.Client(auth, cache=True,
cache_opt={'format': 'parquet', 'max_size': 32*1024,
'cache_root': r'c:\project\drio_cache'})
The cache has near disk-bound performance and will benefit greatly from fast low-latency solid state drives.
Warning
The cache is “cleaned up” during instantiation of Client
. If
it is instantiated with defaults cache options, it will potentially delete
the larger cache set up by another instance! Caution is adviced!
Note
If you are working with several “larger” projects at once, it may be a good idea to configure dedicated cache locations for each project.
Logging¶
To simplify debugging, enable logging for the logger named ‘datareservoirio’.
import logging
# Basic configuration of the root logger, including 'datareservoirio'
logging.basicConfig(format='%(asctime)s %(name)-20s %(levelname)-5s %(message)s', level=logging.INFO)
import logging
import datareservoirio
# Configure desired log level specifically for 'datareservoirio'
logger = logging.getLogger('datareservoirio')
logger.setLevel(logging.DEBUG)
# Short-hand for the above
datareservoirio.set_log_level(logging.DEBUG)
import logging
# Advanced configuration allowing control of log level, message format and output handler
logger = logging.getLogger('datareservoirio')
handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s %(name)-20s %(levelname)-5s %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
The following log names can be used to fine-tune the desired log output:
datareservoirio: top level module including configuration, authentication and client
datareservoirio.storage: storage module, including cache and data download
datareservoirio.rest_api: API module with logging of request parameters and responses
If you require even more detailed logging, consider using loggers from
requests
, oauthlib
, requests-oauthlib
and azure-storage-blob