irfpy.mima.rawdata

Official implementation of raw data access.

A new implementation of raw data access using the data center (irfpy.util.datacenter.BaseDataCenter) approach.

Scipy version

SciPy >= 1.4.0 is recommended. Earlier versions have a minor problem loading certain data files.

For developers

This module is almost identical to the VEX version. If you change the contents here, it is recommended to change the VEX version as well.

class irfpy.mima.rawdata.DataCenterCount2d(emulate_full=True)[source]

Bases: irfpy.util.datacenter.BaseDataCenter

Raw count data center for 2D, emulated matrix.

>>> dc = DataCenterCount2d()
>>> import datetime
>>> t0 = datetime.datetime(2007, 3, 25, 5)
>>> t, d = dc.nearest(t0)
>>> print(t)
2007-03-25 04:59:57.417000
>>> print(d)
<<class 'irfpy.imacommon.imascipac.CntMatrix2D'>(MEX/IMA)@2007-03-25T04:59:57.417000:MOD=24 >>24<<:POL=08/08:CNTmax=30>
>>> t1 = datetime.datetime(2007, 3, 25, 6)
>>> tlist, dlist = dc.get_array_strict(t0, t1)
>>> from pprint import pprint
>>> pprint(tlist)    
[datetime.datetime(2007, 3, 25, 5, 0, 9, 417000),
 datetime.datetime(2007, 3, 25, 5, 0, 21, 417000),
 ...
 datetime.datetime(2007, 3, 25, 5, 59, 57, 948180)]
>>> for t, d in dc.iter_wide(t0, t1):
...     print(t)     
2007-03-25 04:59:57.417000
2007-03-25 05:00:09.417000
...
2007-03-25 05:59:57.948180
2007-03-25 06:00:09.948180

Initializer.

Parameters
  • cache_size – Size of the ring cache.

  • name – The name of the

  • copy – Whether the returned data is deep-copied (True) or returned by reference (False). Returning a copy keeps the original data intact. Returning a reference may be faster, but any post-processing of the returned data then modifies the original. It is therefore recommended to always set this to True. Individual methods can override the copy value as necessary.

search_files()[source]

Search the data files, returning a list of data files.

This method searches for data files under the base_folder. It should return a list / tuple of data file names (usually full paths).

This method is called only once, from __init__().

Returns

A list / tuple of data files. Each entry should be a full path (or a path relative to the current directory), and the entries should be sorted from earlier data to later data.
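A minimal sketch of a search that satisfies this contract is given below. The function name search_files_sketch, the "*.mat" pattern, and the file names are assumptions made for illustration; the sketch also assumes that file names sort chronologically, which may not hold for the real data set.

```python
import glob
import os
import tempfile


def search_files_sketch(base_folder, pattern="*.mat"):
    """Return data file paths under base_folder, sorted by file name.

    Assumes (hypothetically) that file names sort chronologically.
    """
    paths = glob.glob(os.path.join(base_folder, "**", pattern), recursive=True)
    return sorted(paths, key=os.path.basename)


# Demonstration on a temporary folder with hypothetical file names.
with tempfile.TemporaryDirectory() as base:
    for name in ("ima20070326.mat", "ima20070325.mat", "notes.txt"):
        open(os.path.join(base, name), "w").close()
    found = [os.path.basename(p) for p in search_files_sketch(base)]
```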

read_file(filename)[source]

Read the file and return the contents as a tuple of size 2, (tlist, dlist).

This method is an abstract method, meaning that the developer of the data center should implement it. See SampleDataCenter for more details.

The implementation of this method should follow:

  • The returned value is a tuple of size 2.
      - The first element is a tuple/list of times (each element a datetime.datetime object).
      - The second element is a tuple/list of data, in any format.
      - Both elements should have the same length.

If the given file is corrupted or empty, a pair of empty tuples should be returned (i.e., return (), ()). In this case, the exact_starttime() method should return None.
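The contract above can be sketched with a toy reader. The text format used here (one "ISO-time value" pair per line) and the name read_file_sketch are assumptions made for illustration; the actual IMA reader works on different files.

```python
import datetime
import os
import tempfile


def read_file_sketch(filename):
    """Read 'ISO-time value' lines; return (tlist, dlist), or ((), ()) on failure."""
    tlist, dlist = [], []
    try:
        with open(filename) as fp:
            for line in fp:
                if not line.strip():
                    continue  # skip blank lines
                stamp, value = line.split()
                tlist.append(datetime.datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S"))
                dlist.append(float(value))
    except (OSError, ValueError):
        return (), ()  # unreadable or corrupted file
    if not tlist:
        return (), ()  # empty file
    return tlist, dlist


# Demonstration with one well-formed and one corrupted file.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.txt")
    with open(good, "w") as fp:
        fp.write("2007-03-25T05:00:09 3\n2007-03-25T05:00:21 5\n")
    bad = os.path.join(tmp, "bad.txt")
    with open(bad, "w") as fp:
        fp.write("not a data line\n")
    tlist, dlist = read_file_sketch(good)
    empty = read_file_sketch(bad)
```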

Parameters

filename – File name

Returns

The contents of the data file

Return type

tuple

approximate_starttime(filename)[source]

Start time should be guessed for each file.

A guessed start time should be returned. It may be very approximate, but the ordering of the guessed start times must match the ordering of the exact start times. This method must be very fast, because it is called for every file in the database (i.e., all the files returned by the search_files() method).

A practical suggestion for implementation is to guess the time from the filename.
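A minimal sketch of guessing the time from the file name follows. The naming scheme (a 12-digit YYYYMMDDhhmm stamp embedded in the name), the example path, and the function name are hypothetical; the real files may be named differently.

```python
import datetime
import os
import re


def approximate_starttime_sketch(filename):
    """Guess the start time from a YYYYMMDDhhmm stamp in the file name."""
    match = re.search(r"(\d{12})", os.path.basename(filename))
    if match is None:
        raise ValueError(f"no time stamp in {filename!r}")
    return datetime.datetime.strptime(match.group(1), "%Y%m%d%H%M")


# Hypothetical file name, used only to exercise the parser.
guess = approximate_starttime_sketch("/data/ima200703250500.mat")
```

No file is opened, so the call stays cheap even when it is repeated for every file.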

Parameters

filename – A string, filename.

Returns

An approximate, guessed start time of the file

Return type

datetime.datetime

class irfpy.mima.rawdata.DataCenterCount3d(emulate_full=True)[source]

Bases: irfpy.util.datacenter.BaseDataCenter

Raw count data center for IMA 3D data.

The datacenter approach is used:

>>> dc = DataCenterCount3d()
>>> import datetime
>>> t0 = datetime.datetime(2007, 3, 25, 5)
>>> t, d = dc.nearest(t0)
>>> print(t)
2007-03-25 05:01:33.510740
>>> print(d)
<<class 'irfpy.imacommon.imascipac.CntMatrix'>(MEX/IMA)@2007-03-25T05:01:33.510740:MOD=24 >>24<<:CNTmax=19>
>>> t1 = datetime.datetime(2007, 3, 25, 6)
>>> tlist, dlist = dc.get_array_strict(t0, t1)
>>> from pprint import pprint
>>> pprint(tlist)    
[datetime.datetime(2007, 3, 25, 5, 1, 33, 510740),
 datetime.datetime(2007, 3, 25, 5, 4, 45, 479500),
...
 datetime.datetime(2007, 3, 25, 5, 55, 57, 979440),
 datetime.datetime(2007, 3, 25, 5, 59, 9, 948180)]
>>> for t, d in dc.iter_wide(t0, t1):
...     print(t)     
2007-03-25 04:58:21.417000
2007-03-25 05:01:33.510740
...
2007-03-25 05:59:09.948180
2007-03-25 06:02:22.010680

Initializer.

Parameters
  • cache_size – Size of the ring cache.

  • name – The name of the

  • copy – Whether the returned data is deep-copied (True) or returned by reference (False). Returning a copy keeps the original data intact. Returning a reference may be faster, but any post-processing of the returned data then modifies the original. It is therefore recommended to always set this to True. Individual methods can override the copy value as necessary.

search_files()[source]

Search the data files, returning a list of data files.

This method searches for data files under the base_folder. It should return a list / tuple of data file names (usually full paths).

This method is called only once, from __init__().

Returns

A list / tuple of data files. Each entry should be a full path (or a path relative to the current directory), and the entries should be sorted from earlier data to later data.

read_file(filename)[source]

Read the file and return the contents as a tuple of size 2, (tlist, dlist).

This method is an abstract method, meaning that the developer of the data center should implement it. See SampleDataCenter for more details.

The implementation of this method should follow:

  • The returned value is a tuple of size 2.
      - The first element is a tuple/list of times (each element a datetime.datetime object).
      - The second element is a tuple/list of data, in any format.
      - Both elements should have the same length.

If the given file is corrupted or empty, a pair of empty tuples should be returned (i.e., return (), ()). In this case, the exact_starttime() method should return None.

Parameters

filename – File name

Returns

The contents of the data file

Return type

tuple

approximate_starttime(filename)[source]

Start time should be guessed for each file.

A guessed start time should be returned. It may be very approximate, but the ordering of the guessed start times must match the ordering of the exact start times. This method must be very fast, because it is called for every file in the database (i.e., all the files returned by the search_files() method).

A practical suggestion for implementation is to guess the time from the filename.
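One way to see why the ordering requirement matters is a sorted lookup over the guessed start times. The sketch below is a hypothetical illustration (the starts and files lists and the candidate_file helper are assumptions), not a description of how BaseDataCenter works internally.

```python
import bisect
import datetime

# Hypothetical file index: approximate start times, sorted, one per file.
starts = [datetime.datetime(2007, 3, 25, h) for h in (0, 6, 12, 18)]
files = ["f00.mat", "f06.mat", "f12.mat", "f18.mat"]


def candidate_file(t):
    """Return the file whose approximate start time is the latest one not after t."""
    i = bisect.bisect_right(starts, t) - 1
    return files[max(i, 0)]  # fall back to the first file for early times


picked = candidate_file(datetime.datetime(2007, 3, 25, 7, 30))
```

If the guessed order differed from the true order, a binary search like this could select the wrong file even when each individual guess is close.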

Parameters

filename – A string, filename.

Returns

An approximate, guessed start time of the file

Return type

datetime.datetime

irfpy.mima.rawdata.read_mat_file(filename, emulate_full=True)[source]
irfpy.mima.rawdata.read_mat_file_3d(filename, emulate_full=True)[source]

Return the 3D data series.

Parameters

filename – The MATLAB file name

Returns

(tlist, dlist), where tlist is the observation time and dlist is for the data.