irfpy.util.intermfile

Intermediate file handling.

Code author: Futaana

This module handles intermediate files. Here, an intermediate file means any data file that stores the result of a calculation or processing step.

Example use cases:

  • You read a huge data set but need only a part of it in the next processing step. The needed part can then be saved into a separate file on the local hard disk.

  • (More will come)

Sample usage:

  1. Using intermfile() function:

from irfpy.util.intermfile import intermfile

def my_function(a, b, c=0, d=0):
    return (a + b, c * d)

dat = intermfile('datafile.pickle', my_function, args=[5, 10], kwds={'c': 3, 'd': 7})

  2. Using decorator, interm():

from irfpy.util.intermfile import interm

@interm('datafile.pickle')
def my_function(a, b, c=0, d=0):
    return (a + b, c * d)

my_function(5, 10, c=3, d=7)

In either case, the result is always (15, 21). The value is obtained by running my_function() if ‘datafile.pickle’ is not found; otherwise it is read from the intermediate file ‘datafile.pickle’.

Which to choose? The function (option 1) is flexible but less intuitive. The decorator (option 2) is easy but less flexible.

Note

  • The advantage of the interm() decorator is its simple syntax: the call is trivial. However, the decorator is less flexible, because the metadata (intermediate file name, version, or expiration time) must be fixed when the function is defined, not at runtime.

  • The advantage of the intermfile() function is flexibility. The user can choose the intermediate file name at runtime, so the same function can dump multiple intermediate files (e.g. for different parameters). The drawback is that the syntax is less trivial, and the function’s arguments and keywords must be passed in a non-standard way.
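
The runtime choice of file name described in the second point can be sketched as follows. This is only an illustration: cached_call is a hypothetical stand-in written for this example (it is not the irfpy implementation and ignores versioning, expiration and compression), and the file naming scheme is an assumption.

```python
import os
import pickle
import tempfile


def cached_call(filename, func, args=None, kwds=None):
    """Hypothetical stand-in: load the pickle if present, else compute and save."""
    if os.path.exists(filename):
        with open(filename, 'rb') as fp:
            return pickle.load(fp)
    result = func(*(args or []), **(kwds or {}))
    with open(filename, 'wb') as fp:
        pickle.dump(result, fp)
    return result


def my_function(a, b, c=0, d=0):
    return (a + b, c * d)


workdir = tempfile.mkdtemp()
results = {}
for a, b in [(5, 10), (1, 2)]:
    # The file name is built at runtime, giving one file per parameter set.
    fname = os.path.join(workdir, 'my_function_a{}_b{}.pickle'.format(a, b))
    results[(a, b)] = cached_call(fname, my_function, args=[a, b], kwds={'c': 3, 'd': 7})
```

Because each parameter set maps to its own file, the two results are cached independently of each other.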

Note

Developer note:

It is inspired by irfpy.util.filepairv.

The differences are:

  • This module is for more generic purposes.

  • irfpy.util.filepairv is only for reading and caching files.

Which to recommend to users? As of 2016-11-15, irfpy.util.filepairv is better tested and more widely used, so it is more stable. This module may replace irfpy.util.filepairv in the future.

class irfpy.util.intermfile.IntermFileMetaData(version, creation_time)

Bases: tuple

Class to store the intermediate file’s meta data.

Parameters:
  • version – The version number

  • creation_time – Creation time

Create new instance of IntermFileMetaData(version, creation_time)

creation_time

Alias for field number 1

version

Alias for field number 0
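
The "Bases: tuple" and "Alias for field number" entries suggest that this class behaves like a named tuple. The sketch below builds an equivalent type with collections.namedtuple to show how the two fields can be accessed; it does not import irfpy, and storing creation_time as seconds since the epoch is an assumption made for the example.

```python
import time
from collections import namedtuple

# Stand-in with the same field order as the documented class (assumption:
# the real class is a namedtuple-like tuple subclass).
IntermFileMetaData = namedtuple('IntermFileMetaData', ['version', 'creation_time'])

# Assumption: creation_time is stored as seconds since the epoch.
meta = IntermFileMetaData(version=1, creation_time=time.time())

# Fields are reachable by name or by position.
by_name = (meta.version, meta.creation_time)
by_index = (meta[0], meta[1])

# Tuple unpacking also works.
version, creation_time = meta
```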

irfpy.util.intermfile.intermfile(intermediate_filename, processing_function, args=None, kwds=None, version=0, refresh=False, expire=inf, compresslevel=9)[source]

Process the data, and save to the given file, or read from the file.

Parameters:
  • intermediate_filename – Intermediate file name.

  • processing_function – A function that should be called.

  • args (None or list) – Positional arguments to be given to the processing function.

  • kwds (None or dict) – Keyword arguments to be given to the processing function.

  • version – Version number of the intermediate data file. If the version number in the data file differs from the given version, the data is reprocessed.

  • refresh – If set to True, the data is reprocessed.

  • expire – After the given value (in seconds), the intermediate file will be invalid.

  • compresslevel – Gzip / Bzip2 compression level. It is only valid if the intermediate file name ends with “.gz” or “.bz2”.

Returns:

The resulting object, either the object read from the intermediate_filename or the results of processing_function
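
As an illustration of the compresslevel parameter, the sketch below picks an opener from the file name extension using the standard gzip and bz2 modules. The helper open_by_extension is hypothetical and only shows one way such a dispatch could look; the source does not state how the module implements it.

```python
import bz2
import gzip
import os
import pickle
import tempfile


def open_by_extension(filename, mode, compresslevel=9):
    """Hypothetical helper: choose gzip, bz2 or plain open from the suffix."""
    if filename.endswith('.gz'):
        return gzip.open(filename, mode, compresslevel=compresslevel)
    if filename.endswith('.bz2'):
        return bz2.open(filename, mode, compresslevel=compresslevel)
    # compresslevel has no effect on uncompressed files.
    return open(filename, mode)


workdir = tempfile.mkdtemp()
data = {'values': list(range(1000))}
loaded = {}
for name in ['dat.pickle', 'dat.pickle.gz', 'dat.pickle.bz2']:
    path = os.path.join(workdir, name)
    with open_by_extension(path, 'wb', compresslevel=5) as fp:
        pickle.dump(data, fp)
    with open_by_extension(path, 'rb') as fp:
        loaded[name] = pickle.load(fp)
```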

The call

from irfpy.util.intermfile import intermfile
dat = intermfile('datafile.pickle', my_function)

is equivalent to

import os
import pickle

if os.path.exists('datafile.pickle'):
    # Pickle files must be opened in binary mode.
    with open('datafile.pickle', 'rb') as fp:
        dat = pickle.load(fp)
else:
    dat = my_function()
    with open('datafile.pickle', 'wb') as fp:
        pickle.dump(dat, fp)
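
The equivalent code above covers only the default case. A minimal sketch of how the version, refresh and expire parameters could be honoured is given below. The function cached_call and the on-disk layout (a plain dict holding the metadata next to the data) are assumptions made for this example, not the actual irfpy file format.

```python
import math
import os
import pickle
import tempfile
import time


def cached_call(filename, func, args=None, kwds=None,
                version=0, refresh=False, expire=math.inf):
    """Hypothetical stand-in showing version / refresh / expire checks."""
    if not refresh and os.path.exists(filename):
        with open(filename, 'rb') as fp:
            stored = pickle.load(fp)
        age = time.time() - stored['creation_time']
        # Reuse the file only if the version matches and it has not expired.
        if stored['version'] == version and age < expire:
            return stored['data']
    data = func(*(args or []), **(kwds or {}))
    with open(filename, 'wb') as fp:
        pickle.dump({'version': version, 'creation_time': time.time(), 'data': data}, fp)
    return data


calls = []  # records every real execution of slow_square


def slow_square(x):
    calls.append(x)
    return x * x


path = os.path.join(tempfile.mkdtemp(), 'square.pickle')
first = cached_call(path, slow_square, args=[4])                            # computed
second = cached_call(path, slow_square, args=[4])                           # read from file
third = cached_call(path, slow_square, args=[5], version=1)                 # version differs
fourth = cached_call(path, slow_square, args=[6], version=1, refresh=True)  # forced
fifth = cached_call(path, slow_square, args=[7], version=1, expire=0)       # expired
```

Only the second call is served from the file; every other call triggers a recomputation for the reason given in its comment.
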
irfpy.util.intermfile.interm(filename, version=0, expire=inf, compresslevel=9)[source]

Decorator version of the intermediate file handling.

Parameters:
  • filename – The name of the intermediate file.

  • version – Specify the version of the program. If the given version is different from the version in the intermediate file (filename), the function is re-run.

  • expire – Expiration time in seconds.

  • compresslevel – Compression level, used only for .gz or .bz2 files.

interm() is the decorator version of intermfile().

Assume you have a function user_function that returns some data.

def user_function(a, b, c=0, d=0):
    return (a + b, c * d)

This function can be decorated as follows:

from irfpy.util.intermfile import interm
@interm('user_function.dat', version=1, expire=86400)
def user_function(a, b, c=0, d=0):
    return (a + b, c * d)

Then user_function() returns the tuple and, on the first call, also dumps the returned value as a pickle file to user_function.dat.

> user_function(1, 5, c=10, d=40)
# => (6, 400)
% ls
...   user_function.dat  ...

The second call of user_function then relies on the pickle file.

> user_function(1, 5, c=10, d=40)
# => (6, 400)

Indeed, this tuple is read from the pickle file.

This means that changing the arguments or keywords does not change the result as long as the user_function.dat file exists.

> user_function(2, 7, c=1, d=4)
# => (6, 400)

You might expect (9, 4) from this call, but that is not what you get because, again, the data is read from the intermediate file.
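
The behaviour described above can be reproduced with a small self-contained decorator. The name cache_to_file is hypothetical and the code is a simplified sketch (no version, expiration or compression handling), not the irfpy implementation; it only shows why the arguments have no effect once the file exists.

```python
import functools
import os
import pickle
import tempfile


def cache_to_file(filename):
    """Hypothetical stand-in for a file-caching decorator: one file per function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwds):
            if os.path.exists(filename):
                # The arguments are not consulted when the file exists.
                with open(filename, 'rb') as fp:
                    return pickle.load(fp)
            result = func(*args, **kwds)
            with open(filename, 'wb') as fp:
                pickle.dump(result, fp)
            return result
        return wrapper
    return decorator


path = os.path.join(tempfile.mkdtemp(), 'user_function.dat')


@cache_to_file(path)
def user_function(a, b, c=0, d=0):
    return (a + b, c * d)


first = user_function(1, 5, c=10, d=40)   # computed and saved
second = user_function(2, 7, c=1, d=4)    # read from the file, arguments ignored
os.remove(path)                           # deleting the file forces a recomputation
third = user_function(2, 7, c=1, d=4)
```

In this sketch, removing the intermediate file is the only way to make new arguments take effect.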