How to load tree-structured data

Tree-structured datasets can be awkward to work with. The module irfpy.util.timeseriesdb helps by keeping track of which file covers which time.

Obtain the file that covers a specific time

Consider a dataset laid out as follows:

rootfolder -+- orb0001  # Data from 2004-01-05 15:30:00
            +- orb0002  # Data from 2004-01-05 18:30:00
            +- orb0003  # Data from 2004-01-06 00:30:00
            +- orb0004  # Data from 2004-01-06 09:30:00
            +- orb0011  # Data from 2004-01-12 11:30:00
            +- orb0012  # Data from 2004-01-12 21:30:00

In this case, you can build a database for this tree structure as follows.

import datetime
import irfpy.util.timeseriesdb
db = irfpy.util.timeseriesdb.DB()
db.append('rootfolder/orb0001', datetime.datetime(2004, 1, 5, 15, 30))
db.append('rootfolder/orb0002', datetime.datetime(2004, 1, 5, 18, 30))
db.append('rootfolder/orb0003', datetime.datetime(2004, 1, 6, 0, 30))
db.append('rootfolder/orb0004', datetime.datetime(2004, 1, 6, 9, 30))
db.append('rootfolder/orb0011', datetime.datetime(2004, 1, 12, 11, 30))
db.append('rootfolder/orb0012', datetime.datetime(2004, 1, 12, 21, 30))
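
With many orbits, the repeated append calls can be driven from a table. The following is a minimal sketch: fill_db and RecordingDB are hypothetical names introduced here, and RecordingDB is only a stand-in so the snippet runs without irfpy. The only thing assumed about the real database is the append(name, time) call shown above. Entries are appended in chronological order to mirror the listing; whether the database requires this is not stated here.

```python
import datetime

# Start time of each file, as in the listing above.
ORBIT_STARTS = {
    'rootfolder/orb0001': datetime.datetime(2004, 1, 5, 15, 30),
    'rootfolder/orb0002': datetime.datetime(2004, 1, 5, 18, 30),
    'rootfolder/orb0003': datetime.datetime(2004, 1, 6, 0, 30),
    'rootfolder/orb0004': datetime.datetime(2004, 1, 6, 9, 30),
    'rootfolder/orb0011': datetime.datetime(2004, 1, 12, 11, 30),
    'rootfolder/orb0012': datetime.datetime(2004, 1, 12, 21, 30),
}


def fill_db(db, starts):
    """Append every (name, start time) pair to db in chronological order."""
    for name, start in sorted(starts.items(), key=lambda item: item[1]):
        db.append(name, start)
    return db


class RecordingDB:
    """Hypothetical stand-in that only records append() calls."""

    def __init__(self):
        self.entries = []

    def append(self, name, start):
        self.entries.append((name, start))


recorder = fill_db(RecordingDB(), ORBIT_STARTS)
print(len(recorder.entries))
```

Passing an irfpy.util.timeseriesdb.DB() instance instead of the stand-in should give a db holding the same six entries as the explicit calls above.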

db now holds the file name and the start time of each file. The database assumes that each file starts at the specified time and ends when the next file starts. For example, the orb0003 file is assumed to cover from 2004-01-06T00:30 to 2004-01-06T09:30. The last file (orb0012) is assumed to cover from 2004-01-12T21:30 onward, with no end time.

Suppose you want to know which file provides the data at 2004-01-05T19:00:00. It is orb0002, which covers from 18:30 on that day until orb0003 starts. You can obtain it with the get() method.

>>> print(db.get(datetime.datetime(2004, 1, 5, 19, 0, 0)))
rootfolder/orb0002
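
The lookup rule (each file covers the time from its own start until the next file's start) can be illustrated with the standard library alone. This is a sketch of the rule, not irfpy's implementation: ENTRIES, STARTS and lookup are names introduced here, and raising LookupError for a time before the first file is an assumption of this sketch.

```python
import bisect
import datetime

# (start time, file name) pairs, sorted by start time.
ENTRIES = [
    (datetime.datetime(2004, 1, 5, 15, 30), 'rootfolder/orb0001'),
    (datetime.datetime(2004, 1, 5, 18, 30), 'rootfolder/orb0002'),
    (datetime.datetime(2004, 1, 6, 0, 30), 'rootfolder/orb0003'),
    (datetime.datetime(2004, 1, 6, 9, 30), 'rootfolder/orb0004'),
    (datetime.datetime(2004, 1, 12, 11, 30), 'rootfolder/orb0011'),
    (datetime.datetime(2004, 1, 12, 21, 30), 'rootfolder/orb0012'),
]
STARTS = [start for start, _ in ENTRIES]


def lookup(t):
    """Return the file whose interval [start, next start) contains t."""
    # bisect_right counts the start times <= t; the last of them is the match.
    index = bisect.bisect_right(STARTS, t) - 1
    if index < 0:
        # Assumption of this sketch: times before the first file are an error.
        raise LookupError('no file starts at or before {}'.format(t))
    return ENTRIES[index][1]


print(lookup(datetime.datetime(2004, 1, 5, 19, 0)))   # rootfolder/orb0002
print(lookup(datetime.datetime(2004, 1, 6, 0, 30)))   # rootfolder/orb0003
print(lookup(datetime.datetime(2010, 1, 1)))          # rootfolder/orb0012
```

A time equal to a file's start time belongs to that file, and any time after the last start time maps to the last file.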

Obtain the list of files that cover the interval between t0 and t1

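Let’s consider t0=2004-01-06T00:00:00 and t1=2004-01-12T12:00:00. From the data structure above, the files orb0002, orb0003, orb0004 and orb0011 are involved: orb0002 covers t0 because orb0003 does not start until 00:30, and orb0011 is the last file that starts before t1.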

>>> t0 = datetime.datetime(2004, 1, 6)
>>> t1 = datetime.datetime(2004, 1, 12, 12)
>>> file0 = db.get(t0)    # File that covers t0
>>> while db.gettime(file0) <= t1:    # Continue while the file starts no later than t1
...     print(file0)
...     try:
...         file0 = db.nextof(file0)
...     except irfpy.util.timeseriesdb.DataNotInDbError:    # No file follows the last one
...         break
rootfolder/orb0002
rootfolder/orb0003
rootfolder/orb0004
rootfolder/orb0011
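
The same selection can be written as a slice over sorted start times. The sketch below uses only the standard library and is not part of irfpy: ENTRIES, STARTS and files_between are names introduced here. It follows the loop above in including any file whose start time is at or before t1.

```python
import bisect
import datetime

# (start time, file name) pairs, sorted by start time.
ENTRIES = [
    (datetime.datetime(2004, 1, 5, 15, 30), 'rootfolder/orb0001'),
    (datetime.datetime(2004, 1, 5, 18, 30), 'rootfolder/orb0002'),
    (datetime.datetime(2004, 1, 6, 0, 30), 'rootfolder/orb0003'),
    (datetime.datetime(2004, 1, 6, 9, 30), 'rootfolder/orb0004'),
    (datetime.datetime(2004, 1, 12, 11, 30), 'rootfolder/orb0011'),
    (datetime.datetime(2004, 1, 12, 21, 30), 'rootfolder/orb0012'),
]
STARTS = [start for start, _ in ENTRIES]


def files_between(t0, t1):
    """Return the files whose coverage overlaps the interval [t0, t1]."""
    # Index of the file covering t0; clamp to 0 if t0 precedes every file.
    first = max(bisect.bisect_right(STARTS, t0) - 1, 0)
    # One past the last file that starts at or before t1.
    last = bisect.bisect_right(STARTS, t1)
    return [name for _, name in ENTRIES[first:last]]


for name in files_between(datetime.datetime(2004, 1, 6),
                          datetime.datetime(2004, 1, 12, 12)):
    print(name)
# rootfolder/orb0002
# rootfolder/orb0003
# rootfolder/orb0004
# rootfolder/orb0011
```

If t1 is earlier than the first start time, the slice is empty and the function returns an empty list.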