irfpy.util.intermfile
Intermediate file handling.
Code author: Futaana
This module handles intermediate files. Here, an intermediate file is any data file that stores the result of a calculation or processing step.
Example use cases:
You read a huge data set but need only part of it in the next processing step. The needed part can then be saved to a separate file on the local hard disk.
(More will come)
Sample usage:
Using the intermfile() function:
from irfpy.util.intermfile import intermfile
def my_function(a, b, c=0, d=0):
    return (a + b, c * d)
dat = intermfile('datafile.pickle', my_function, args=[5, 10], kwds={'c': 3, 'd': 7})
Using the interm() decorator:
from irfpy.util.intermfile import interm
@interm('datafile.pickle')
def my_function(a, b, c=0, d=0):
    return (a + b, c * d)
my_function(5, 10, c=3, d=7)
In either case, the call always returns (15, 21). The value is obtained by running my_function() if ‘datafile.pickle’ is not found; otherwise it is read from the intermediate file ‘datafile.pickle’.
Which to choose? Function (option 1) is flexible but less intuitive. Decorator (option 2) is easy but less flexible.
Note
The advantage of the interm() decorator is its simple syntax: the call is trivial. However, the decorator is less flexible, because the metadata (intermediate file name, version, and expiration time) must be fixed when the function is defined, not at runtime.
The advantage of the intermfile() function is flexibility. The user can determine the intermediate file name at runtime, so multiple intermediate files can be written from the same function (e.g. using different parameters). The drawback is that the syntax is less straightforward, and the function’s arguments and keywords must be provided in a non-standard way.
Note
Developer note:
This module is inspired by irfpy.util.filepairv. The difference is that this module is more general-purpose, whereas irfpy.util.filepairv is only for reading and caching files.
Which to recommend to users? As of 2016-11-15, irfpy.util.filepairv is better tested and more widely used, so it is more stable. This module may replace irfpy.util.filepairv in the future.
- class irfpy.util.intermfile.IntermFileMetaData(version, creation_time)
Bases:
tuple
Class to store the intermediate file’s meta data.
- Parameters:
version – The version number
creation_time – Creation time
Create new instance of IntermFileMetaData(version, creation_time)
- creation_time
Alias for field number 1
- version
Alias for field number 0
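The entry above (a tuple subclass whose fields are aliases for positions 0 and 1) reads like a named tuple. The following is a minimal sketch under that assumption. The definition below is a local stand-in written for illustration, not the class shipped in irfpy.util.intermfile.

```python
import time
from collections import namedtuple

# Assumption: a plain namedtuple with the two documented fields, defined
# locally for illustration (not imported from irfpy).
IntermFileMetaData = namedtuple("IntermFileMetaData", ["version", "creation_time"])

meta = IntermFileMetaData(version=1, creation_time=time.time())

# Fields can be read by name or by position: version is field 0,
# creation_time is field 1.
same_version = meta.version == meta[0]
same_time = meta.creation_time == meta[1]

# Because the record is a tuple, it unpacks directly.
version, creation_time = meta
```

Keeping the metadata as a tuple makes it easy to pickle alongside the data itself.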
- irfpy.util.intermfile.intermfile(intermediate_filename, processing_function, args=None, kwds=None, version=0, refresh=False, expire=inf, compresslevel=9)
Process the data and save the result to the given file, or read the result from the file if it is already available.
- Parameters:
intermediate_filename – Intermediate file name.
processing_function – A function that should be called.
args (None or list) – Arguments to be given to the processing function.
kwds (None or dict) – Keywords to be given to the processing function.
version – Version number of the intermediate data file. If the version number in the data file differs from the given version, the data is reprocessed.
refresh – If set to True, the data is reprocessed.
expire – Lifetime in seconds, after which the intermediate file becomes invalid.
compresslevel – Gzip / Bzip2 compression level. It is only used if the intermediate file name ends with “.gz” or “.bz2”.
- Returns:
The resulting object, either the object read from intermediate_filename or the result of processing_function.
The call
from irfpy.util.intermfile import intermfile
dat = intermfile('datafile.pickle', my_function)
is equivalent to
import os
import pickle
if os.path.exists('datafile.pickle'):
    with open('datafile.pickle', 'rb') as fp:
        dat = pickle.load(fp)
else:
    dat = my_function()
    with open('datafile.pickle', 'wb') as fp:
        pickle.dump(dat, fp)
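The version, refresh, and expire parameters add validity checks on top of this basic load-or-compute pattern. The sketch below shows one way such checks could fit together. The helper name load_or_process and the on-disk layout (a pickled tuple of version, creation time, and data) are assumptions made for illustration and are not taken from the irfpy source.

```python
import math
import os
import pickle
import tempfile
import time


def load_or_process(filename, func, args=None, kwds=None,
                    version=0, refresh=False, expire=math.inf):
    """Hypothetical stand-in for the documented behaviour, not irfpy's code."""
    args = [] if args is None else args
    kwds = {} if kwds is None else kwds

    if not refresh and os.path.exists(filename):
        with open(filename, "rb") as fp:
            stored_version, created, data = pickle.load(fp)
        # Reuse the file only if the version matches and it has not expired.
        if stored_version == version and time.time() - created <= expire:
            return data

    # Otherwise run the function and store the result with its metadata.
    data = func(*args, **kwds)
    with open(filename, "wb") as fp:
        pickle.dump((version, time.time(), data), fp)
    return data


calls = []  # records every real execution of my_function


def my_function(a, b, c=0, d=0):
    calls.append((a, b, c, d))
    return (a + b, c * d)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "datafile.pickle")
    kwds = {"c": 3, "d": 7}
    first = load_or_process(path, my_function, args=[5, 10], kwds=kwds)
    second = load_or_process(path, my_function, args=[5, 10], kwds=kwds)
    # A different version number invalidates the stored file.
    third = load_or_process(path, my_function, args=[5, 10], kwds=kwds, version=1)
```

In this sketch the second call is served from the file, while the third call re-runs my_function because the requested version no longer matches the stored one.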
- irfpy.util.intermfile.interm(filename, version=0, expire=inf, compresslevel=9)
Decorator version of the intermediate file handling.
- Parameters:
filename – The name of the intermediate file.
version – Version of the program. If the given version differs from the version stored in the intermediate file (filename), the function is re-run.
expire – Expiration time in seconds.
compresslevel – Compression level, used for .gz or .bz2 files.
interm() is the decorator version of intermfile().
Assume you have a function user_function that returns some data.
def user_function(a, b, c=0, d=0):
    return (a + b, c * d)
This function can be decorated as follows:
from irfpy.util.intermfile import interm
@interm('user_function.dat', version=1, expire=86400)
def user_function(a, b, c=0, d=0):
    return (a + b, c * d)
Then user_function() returns the tuple and, on the first call, also dumps the returned value as a pickle file to user_function.dat.
> user_function(1, 5, c=10, d=40) # => (6, 400)
% ls ... user_function.dat ...
The second call of user_function then relies on the pickle file.
> user_function(1, 5, c=10, d=40) # => (6, 400)
Indeed, this tuple is read from the pickle file.
This means that changing the arguments or keywords has no effect on the result as long as the user_function.dat file exists.
> user_function(2, 7, c=1, d=4) # => (6, 400)
You might expect to get (9, 4) from this call, but that is not the case because, again, the data is read from the intermediate file.
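One way to avoid this pitfall is to choose the intermediate file name at runtime, so that each set of arguments gets its own file. The sketch below illustrates the idea with standard-library code only. The helpers filename_for and cached_call are hypothetical names introduced here, and cached_call merely imitates the basic load-or-compute behaviour rather than calling irfpy.

```python
import hashlib
import os
import pickle
import tempfile


def filename_for(prefix, args, kwds):
    """Hypothetical helper: derive a file name from the call arguments."""
    key = repr((tuple(args), sorted(kwds.items())))
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    return "{}_{}.pickle".format(prefix, digest)


def cached_call(filename, func, args, kwds):
    """Stand-in for the basic load-or-compute behaviour, not irfpy's code."""
    if os.path.exists(filename):
        with open(filename, "rb") as fp:
            return pickle.load(fp)
    result = func(*args, **kwds)
    with open(filename, "wb") as fp:
        pickle.dump(result, fp)
    return result


def user_function(a, b, c=0, d=0):
    return (a + b, c * d)


with tempfile.TemporaryDirectory() as tmpdir:
    results = []
    for args, kwds in [((1, 5), {"c": 10, "d": 40}), ((2, 7), {"c": 1, "d": 4})]:
        # Each distinct argument set maps to its own intermediate file.
        name = os.path.join(tmpdir, filename_for("user_function", args, kwds))
        results.append(cached_call(name, user_function, args, kwds))
    n_files = len(os.listdir(tmpdir))
```

With the module itself, a file name built this way could presumably be passed as intermediate_filename to intermfile(), since that argument is evaluated at runtime. The repr-based key only suits arguments whose repr is stable between runs.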