Databoxes
Databoxes extend the standard `dict` class (technically, they are a subclass), serving as a universal tool for storing and manipulating unstructured data organized as key-value pairs. The values stored within Databoxes can be of any type. Databoxes can use any methods implemented for the standard `dict` objects, and have additional functionalities for data item manipulation, batch processing, importing and exporting data, and more.
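Because a Databox is a `dict` subclass, the ordinary dictionary protocol carries over unchanged. The sketch below does not use the real Databox class; `MiniDatabox` is a hypothetical stand-in that only illustrates what subclassing `dict` implies. Its `num_items` property mirrors the property of the same name listed further down, but the implementation shown here is an assumption.

```python
class MiniDatabox(dict):
    """Hypothetical stand-in for a dict subclass; not the library class."""

    @property
    def num_items(self) -> int:
        # Assumed here to be the number of key-value pairs
        return len(self)


db = MiniDatabox()
db["alpha"] = 0.5                 # values can be of any type
db["labels"] = ["a", "b"]
db.update(comment="any value")    # inherited dict method works as usual

names = sorted(db.keys())
```

Any code that accepts a plain `dict` should therefore accept such an object as well, since `isinstance(db, dict)` holds.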
Categorical list of functions
Creating a new Databox
Function | Description |
---|---|
Databox.empty | Create an empty Databox |
Databox.from_array | Create a new Databox from a numpy array |
Databox.from_csv | Create a new Databox by reading time series from a CSV file |
Databox.from_dict | Create a new Databox from a dict |
steady | Create a steady-state Databox for a model |
Copying and converting a Databox
Function | Description |
---|---|
shallow | Create a shallow copy of the Databox |
to_dict | Convert a Databox to a plain dictionary |
Acquiring data via third-party APIs
Function | Description |
---|---|
from_fred | Download time series from FRED (St Louis Fed Database) |
Getting information about a Databox
Function | Description |
---|---|
filter | Filter items in a Databox |
get_description | Get the description attached to an Iris Pie object |
get_missing_names | Identify names not present in a Databox |
get_names | Get all item names from a Databox |
get_series_names_by_frequency | Retrieve time series names by frequency |
get_span_by_frequency | Retrieve the date span for time series by frequency |
set_description | Set the description for an Iris Pie object |
Manipulating a Databox
Function | Description |
---|---|
apply | Apply a function to items in a Databox |
clip | Clip the span of time series in a Databox |
copy | Create a copy of the Databox |
keep | Keep specified items in a Databox |
minus_control | Subtract control values from a Databox |
remove | Remove specified items from a Databox |
rename | Rename items in a Databox |
Evaluating a Databox
Function | Description |
---|---|
evaluate_expression | Evaluate an expression within a Databox context |
Manipulating multiple Databoxes
Function | Description |
---|---|
merge | Merge Databoxes |
overlay | Overlay another Databox time series onto the ones in the current Databox |
prepend | Prepend time series data to a Databox |
underlay | Underlay another Databox time series beneath those in the current Databox |
Importing and exporting a Databox
Function | Description |
---|---|
from_pickle | Read a Databox from a pickled file |
to_csv | Write Databox time series to a CSV file |
to_json | Save a Databox to a JSON file |
to_pickle | Write Databox to a pickle file |
Directly accessible properties
Property | Description |
---|---|
num_items | Number of items in the databox |
☐ Databox.empty
Create an empty Databox
Generate a new, empty Databox instance. This class method is useful for initializing a Databox without any pre-existing data.
Databox.empty()
Input arguments
No input arguments are required for this method.
Returns
Databox
Returns a new instance of an empty Databox.
☐ Databox.from_array
Create a new Databox from a numpy array
Convert a two-dimensional numpy array into a Databox, with the individual time series created from the rows or columns of the numeric array.
self = Databox.from_array(
array,
names,
*,
descriptions=None,
periods=None,
start=None,
target_db=None,
orientation="vertical",
)
Input arguments
array
A numpy array containing the data to be included in the Databox.
names
A sequence of names corresponding to the series in the array.
descriptions
Descriptions for each series in the array.
periods
An iterable of time periods corresponding to the rows of the array. Used if the data represents time series.
start
The start period for the time series data. Used if 'periods' is not provided.
target_db
An existing Databox to which the array data will be added. If `None`, a new Databox is created.
orientation
The orientation of the array, indicating how time series are arranged:
- `"horizontal"` means each row is a time series;
- `"vertical"` means each column is a time series.
Returns
self
Returns a new Databox populated with the data from the numpy array.
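The effect of `orientation` can be pictured without the library. The sketch below uses nested lists in place of a numpy array, and `split_into_series` is a hypothetical helper, not a library function; it only shows how rows or columns would be paired with `names`.

```python
def split_into_series(array, names, orientation="vertical"):
    """Hypothetical helper: pair rows or columns of a 2-D array with names."""
    if orientation == "horizontal":
        series = [list(row) for row in array]          # each row is a series
    elif orientation == "vertical":
        series = [list(col) for col in zip(*array)]    # each column is a series
    else:
        raise ValueError(f"Unknown orientation: {orientation!r}")
    if len(series) != len(names):
        raise ValueError("Number of names must match number of series")
    return dict(zip(names, series))


# Three periods (rows) by two variables (columns)
array = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
]
vertical = split_into_series(array, ["x", "y"], orientation="vertical")
horizontal = split_into_series(array, ["p1", "p2", "p3"], orientation="horizontal")
```

With the default `"vertical"` orientation, the number of names has to match the number of columns; with `"horizontal"`, it has to match the number of rows.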
☐ Databox.from_csv
Create a new Databox by reading time series from a CSV file
self = Databox.from_csv(
file_name,
*,
period_from_string=None,
start_period_only=False,
description_row=False,
delimiter=",",
csv_reader_settings={},
numpy_reader_settings={},
name_row_transform=None,
)
Input arguments
file_name
Path to the CSV file to be read.
period_from_string
A callable for creating date objects from string representations. If `None`, a default method based on the SDMX string format is used.
start_period_only
If `True`, only the start date of each time series is parsed from the CSV; subsequent periods are inferred based on frequency.
description_row
Indicates if the CSV contains a row for descriptions of the time series. Defaults to `False`.
delimiter
Character used to separate values in the CSV file.
name_row_transform
A function to transform names in the name row of the CSV.
csv_reader_settings
Additional settings for the CSV reader.
numpy_reader_settings
Settings for reading data into numpy arrays.
databox_settings
Settings for the Databox constructor.
Returns
self
A Databox populated with time series from the CSV file.
☐ Databox.from_dict
Create a new Databox from a dict
Create a new Databox instance populated with data from a provided dictionary. This class method can be used to convert a standard Python dictionary into a Databox, incorporating all its functionalities.
self = Databox.from_dict(_dict)
Input arguments
_dict
A dictionary containing the data to populate the new Databox. Each key-value pair in the dictionary will be an item in the Databox.
Returns
self
Returns a new Databox populated with the contents of the provided dictionary.
☐ steady
Create a steady-state Databox for a model
Create a Databox with steady-state values for a model, based on the provided model object and the time span. This method generates steady-state time series data for each item in the model. This constructor can be used for models that have a well-defined steady state, i.e. Simultaneous models and VectorAutoregression models.
steady_databox = self.steady(
model,
span,
deviation=False,
)
Input arguments
model
The model object for which to generate steady-state time series data.
span
The time span for which to generate steady-state time series data.
deviation
If `True`, the steady-state values are generated as deviations from the steady state in the form depending on the log status of each variable. If `False`, the steady-state values are generated in their original level form.
Returns
steady_databox
A Databox containing steady-state time series for the model.
☐ apply
Apply a function to items in a Databox
Apply a function to selected Databox items, either in place or by reassigning the results.
self.apply(
func,
source_names=None,
in_place=True,
when_fails="critical",
strict_names=False,
)
Input arguments
func
The function to apply to each selected item in the Databox.
source_names
Names of the items to which the function will be applied. Can be a list of names, a single name, a callable returning `True` for names to include, or `None` to apply to all items.
in_place
Determines if the results of the function should be assigned back to the items in-place. If `True`, items are updated in-place; if `False`, the results returned by the function are reassigned to the items.
when_fails
Specifies the action to take if applying the function fails. Options are "critical", "error", "warning", or "silent".
strict_names
If set to `True`, strictly adheres to the provided names, raising an error if any source name is not found in the Databox.
Returns
None
Modifies items in the Databox in-place (note that the `in_place` input argument only applies to the Databox items, and not the Databox itself) and does not return a value. Errors are handled based on the `when_fails` setting.
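The difference between the two `in_place` modes is easier to see on a plain dictionary. The sketch below is one reading of the description above, not the library's implementation: `resolve_names` and `apply_to_items` are hypothetical helpers, and silently skipping missing names when `strict_names=False` is an assumption.

```python
def resolve_names(db, source_names):
    """Hypothetical name resolution: None, single name, list, or callable."""
    if source_names is None:
        return tuple(db.keys())
    if isinstance(source_names, str):
        return (source_names,)
    if callable(source_names):
        return tuple(n for n in db.keys() if source_names(n))
    return tuple(source_names)


def apply_to_items(db, func, source_names=None, in_place=True, strict_names=False):
    names = resolve_names(db, source_names)
    missing = [n for n in names if n not in db]
    if missing and strict_names:
        raise KeyError(f"Names not found: {missing}")
    for name in names:
        if name in missing:
            continue                   # assumed: skip names that are absent
        if in_place:
            func(db[name])             # func mutates the item itself
        else:
            db[name] = func(db[name])  # returned value replaces the item


db = {"a": [3, 1, 2], "b": [9, 8], "n": 4}
apply_to_items(db, list.sort, source_names=["a", "b"], in_place=True)
apply_to_items(db, lambda x: x * 10, source_names="n", in_place=False)
```

In this reading, `in_place=True` suits functions that mutate their argument and return nothing useful, while `in_place=False` suits functions that return a new value.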
☐ clip
Clip the span of time series in a Databox
Adjust the time series in a Databox by clipping them to a new specified start and/or end date. This allows for refining the data span within which the series operate, based on given periods.
self.clip(
new_start_date=None,
new_end_date=None,
)
Input arguments
new_start_date
The new start date for clipping the series. If `None`, only `new_end_date` is considered.
new_end_date
The new end date for clipping the series. If `None`, only `new_start_date` is considered.
Returns
This method modifies the databox in place and returns `None`.
Details
The `clip` method adjusts only those time series in the Databox that match the time frequency of the `new_start_date` and/or `new_end_date`. All other series are left unchanged.
☐ copy
Create a copy of the Databox
Produce a deep copy of the Databox, with options to filter and rename items during the duplication process.
new_databox = self.copy(
source_names=None,
target_names=None,
strict_names=False,
)
Input arguments
source_names
Names of the items to include in the copy. Can be a list of names, a single name, a callable returning `True` for names to include, or `None` to copy all items.
target_names
New names for the copied items, corresponding to `source_names`. Can be a list of names, a single name, or a callable function taking a source name and returning the new target name.
strict_names
If set to `True`, strictly adheres to the provided names, raising an error if any source name is not found in the Databox.
Returns
new_databox
A new Databox instance that is a deep copy of the current one, containing either all items or only those specified.
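A filtered, renamed deep copy can be sketched on a plain dictionary as follows. `copy_items` is a hypothetical helper written for illustration only; it mimics the `source_names` and `target_names` options described above and relies on the standard `copy.deepcopy`.

```python
import copy


def copy_items(db, source_names=None, target_names=None):
    """Hypothetical sketch of a filtered, renamed deep copy of a dict."""
    if source_names is None:
        sources = list(db.keys())
    elif isinstance(source_names, str):
        sources = [source_names]
    elif callable(source_names):
        sources = [n for n in db.keys() if source_names(n)]
    else:
        sources = list(source_names)

    if target_names is None:
        targets = sources
    elif isinstance(target_names, str):
        targets = [target_names]
    elif callable(target_names):
        targets = [target_names(n) for n in sources]
    else:
        targets = list(target_names)

    # Each value is duplicated, so the copy shares nothing with the original
    return {t: copy.deepcopy(db[s]) for s, t in zip(sources, targets)}


original = {"gdp": [1.0, 2.0], "cpi": [3.0, 4.0], "note": "raw"}
new = copy_items(
    original,
    source_names=lambda n: n != "note",
    target_names=lambda n: n + "_copy",
)
new["gdp_copy"].append(99.0)   # does not touch the original list
```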
☐ evaluate_expression
Evaluate an expression within a Databox context
Evaluate a given string expression using the entries in the Databox as contextual variables. This method first checks if the expression directly matches an entry name within the Databox; if not, it attempts to evaluate the expression using Python's `eval()` with the current entries as the variable context.
result = self.evaluate_expression(
expression,
context=None,
)
Shortcut syntax:
result = self(expression, context=None)
Input arguments
expression
The string expression to evaluate. If the expression matches an item name in the Databox, the corresponding item is returned without further evaluation.
context
An optional dictionary providing additional context for evaluation. Can include variables that are not present directly in the Databox.
Returns
result
The result of the evaluated expression, which can be any valid Python data type based on the content of the expression and available context.
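The two-step lookup described above (exact item name first, `eval()` second) can be sketched on a plain dictionary. The function below is a hypothetical stand-in, not the library method; in particular, letting `context` entries take precedence over Databox items with the same name is an assumption.

```python
def evaluate_expression(db, expression, context=None):
    """Hypothetical sketch: direct name lookup first, then eval()."""
    if expression in db:
        return db[expression]          # exact item name, no evaluation
    variables = dict(db)
    if context:
        variables.update(context)      # extra names for the evaluation
    return eval(expression, {}, variables)


db = {"a": 2, "b": 5, "a+b": "stored under a literal key"}
direct = evaluate_expression(db, "a+b")
computed = evaluate_expression(db, "a * b + c", context={"c": 1})
```

Since `eval()` executes arbitrary Python code, expressions of this kind should only come from trusted sources.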
☐ filter
Filter items in a Databox
Select Databox items based on custom name or value test functions.
filtered_names = self.filter(
name_test=None,
value_test=None,
)
Input arguments
name_test
A callable function to test each item's name. Returns `True` for names that meet the specified condition.
value_test
A callable function to test each item's value. Returns `True` for values that meet the specified condition.
Returns
filtered_names
A tuple of item names that meet the specified conditions.
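A minimal sketch of name and value tests on a plain dictionary follows. `filter_names` is a hypothetical helper; requiring an item to pass both tests when both are supplied is an assumption about how the two conditions combine.

```python
def filter_names(db, name_test=None, value_test=None):
    """Hypothetical sketch: names passing every test that is supplied."""
    return tuple(
        name
        for name, value in db.items()
        if (name_test is None or name_test(name))
        and (value_test is None or value_test(value))
    )


db = {"gdp_us": 1.5, "gdp_eu": "n/a", "cpi_us": 2.0}
gdp_names = filter_names(db, name_test=lambda n: n.startswith("gdp_"))
numeric_gdp = filter_names(
    db,
    name_test=lambda n: n.startswith("gdp_"),
    value_test=lambda v: isinstance(v, float),
)
```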
☐ from_fred
Download time series from FRED (St Louis Fed Database)
This method downloads time series data from the FRED database. The data is downloaded using the FRED API. The method requires an API key, which is provided by the FRED website. The API key is stored in the `_API_KEY` variable in the `_fred.py` module. The method downloads the data for the specified series IDs and returns a Databox object with the downloaded series.
db = Databox.from_fred(
mapper,
)
Input arguments
mapper
A dictionary or list of series IDs to download from FRED. If a dictionary is provided, the keys are used as the FRED codes and the values are used for the names of the time series in the Databox. If a list of strings is provided, the series IDs are used as the names of the series in the Databox object.
Returns
db
A Databox object containing the downloaded time series data.
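The two accepted forms of `mapper` can be reduced to a single mapping from FRED code to Databox name. The helper below, `normalize_mapper`, is hypothetical and makes no network request; it only illustrates how the dictionary and list forms relate. `GDPC1` and `CPIAUCSL` are FRED series IDs used as sample input.

```python
def normalize_mapper(mapper):
    """Hypothetical helper: return {fred_code: databox_name}."""
    if isinstance(mapper, dict):
        return dict(mapper)                    # keys are codes, values are names
    return {code: code for code in mapper}     # list: code doubles as the name


from_dict = normalize_mapper({"GDPC1": "gdp", "CPIAUCSL": "cpi"})
from_list = normalize_mapper(["GDPC1", "CPIAUCSL"])
```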
☐ from_pickle
Read a Databox from a pickled file
self = Databox.from_pickle(
file_name,
**kwargs,
)
Input arguments
file_name
Path to the pickled file to be read.
kwargs
Additional keyword arguments to pass to the `pickle.load` function.
Returns
self
A Databox object read from the pickled file.
☐ get_description
Get the description attached to an Iris Pie object
description = self.get_description()
Input arguments
self
An Iris Pie object from which to get the description.
Returns
description
The description attached to the Iris Pie object.
☐ get_missing_names
Identify names not present in a Databox
Find and return the names from a provided list that are not present in the Databox. This method is helpful for checking which items are missing or have yet to be added to the Databox.
missing_names = self.get_missing_names(names)
Input arguments
names
An iterable of names to check against the Databox's items.
Returns
missing_names
A tuple of names that are not found in the Databox.
☐ get_names
Get all item names from a Databox
names = self.get_names()
Input arguments
No input arguments are required for this method.
Returns
names
A tuple containing all the names of items in the Databox.
☐ get_series_names_by_frequency
Retrieve time series names by frequency
Obtain a list of time series names that match a specified frequency.
time_series_names = self.get_series_names_by_frequency(frequency)
Input arguments
self
The Databox object from which to retrieve time series names.
frequency
The frequency to filter the time series names by. It should be a valid frequency from the `irispie.Frequency` enumeration.
Returns
time_series_names
A list of time series names in the Databox that match the specified frequency.
☐ get_span_by_frequency
Retrieve the date span for time series by frequency
Get the encompassing date span for all time series with a specified frequency.
date_span = self.get_span_by_frequency(frequency)
Input arguments
self
The Databox object from which to retrieve the date span.
frequency
The frequency for which to determine the date span. Can be an instance of `irispie.Frequency` or a plain integer representing the frequency.
Returns
date_span
The date span, as a `Span` object, encompassing all time series in the Databox that match the specified frequency.
☐ keep
Keep specified items in a Databox
Retain selected items in a Databox, removing all others. Specify the items to keep using `keep_names`, which can be a list of names, a single name, or a callable function determining which items to retain.
self.keep(
keep_names=None,
strict_names=False,
)
Input arguments
keep_names
The names of the items to be retained in the Databox. Can be a list of names, a single name, or a callable function determining the items to keep.
strict_names
If set to `True`, enforces strict adherence to the provided names, with an error raised for any name not found in the Databox.
Returns
None
Modifies the Databox in-place, keeping only the specified items, and does not return a value.
☐ merge
Merge Databoxes
Combine one or more databoxes into a single databox using a specified merge strategy to handle potential conflicts between duplicate keys.
self.merge(
other,
merge_strategy="stack",
)
Input arguments
other
The databox or iterable of databoxes to merge into the current databox. If merging a single databox, it should be passed directly; for multiple databoxes, pass an iterable containing all.
merge_strategy
Determines how to process keys that exist in more than one databox. The default strategy is `"stack"`.
- `"stack"`: Stack values; this means combine time series into multiple columns, or combine lists, or convert non-lists to lists for stacking.
- `"replace"`: Replace existing values with new values.
- `"discard"` and `"silent"`: Retain original values and ignore new values.
- `"warning"`: Behave like `"discard"` but issue a warning for each conflict.
- `"error"`: Raise an error on encountering the first duplicate key.
- `"critical"`: Raise a critical error on encountering the first duplicate key.
Returns
This method modifies the databox in place and returns `None`.
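The merge strategies can be pictured on plain dictionaries. The sketch below is a hypothetical illustration, not the library's code: `merge_into` covers only the list and non-list cases of `"stack"` (time series are not modeled), and it treats `"critical"` the same as `"error"` for simplicity.

```python
import warnings


def merge_into(target, other, merge_strategy="stack"):
    """Hypothetical sketch of duplicate-key handling on plain dicts."""
    for key, new in other.items():
        if key not in target:
            target[key] = new
            continue
        old = target[key]
        if merge_strategy == "stack":
            # Only the list/non-list cases; time series are not modeled here
            old_list = old if isinstance(old, list) else [old]
            new_list = new if isinstance(new, list) else [new]
            target[key] = old_list + new_list
        elif merge_strategy == "replace":
            target[key] = new
        elif merge_strategy in ("discard", "silent"):
            pass                        # keep the original value
        elif merge_strategy == "warning":
            warnings.warn(f"Duplicate key kept from original: {key}")
        elif merge_strategy in ("error", "critical"):
            raise KeyError(f"Duplicate key: {key}")
        else:
            raise ValueError(f"Unknown merge strategy: {merge_strategy!r}")


stacked = {"a": 1, "b": [2]}
merge_into(stacked, {"a": 10, "b": [20], "c": 3}, merge_strategy="stack")

replaced = {"a": 1}
merge_into(replaced, {"a": 10}, merge_strategy="replace")

kept = {"a": 1}
merge_into(kept, {"a": 10}, merge_strategy="discard")
```

Keys that appear in only one of the dictionaries are copied over regardless of the strategy; the strategy matters only for duplicates.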
☐ minus_control
Subtract control values from a Databox
Subtract control values (usually steady-state values or control simulation values) from the corresponding time series in the Databox.
self.minus_control(
model,
control_databox,
)
Input arguments
model
The underlying model object based on which `self` and `control_databox` were created.
control_databox
The Databox containing control values to subtract from the corresponding time series in `self`.
Returns
This method modifies the Databox in place and returns `None`.
☐ overlay
Overlay another Databox time series onto the ones in the current Databox
Overlay another Databox's time series onto the corresponding time series in the
current Databox, aligning and incorporating data series using the overlay
method defined in the Series class. This operation modifies the current Databox
in-place by applying the overlay technique to each individual series that exists
in both Databoxes.
self.overlay(
other,
names=None,
strict_names=False,
)
Input arguments
self
The Databox onto which the overlay will be applied. It contains the original time series data.
other
The Databox that provides the time series to overlay onto `self`. Only series present in both Databoxes will be affected.
names
An optional iterable of names to overlay. If `None`, the overlay operation is attempted on all time series present in both Databoxes.
strict_names
If `True`, the names provided in `names` are strictly adhered to, and an error is raised if any name is not found in both Databoxes.
Returns
This method modifies the Databox in place and returns `None`.
Details
The `overlay` method ensures that corresponding time series in both the source Databox and the other Databox are merged based on the overlay logic determined by the Series class.
☐ prepend
Prepend time series data to a Databox
Add time series data from another Databox to the beginning of the current Databox, up to a specified end date.
self.prepend(
other,
end_prepending,
)
Input arguments
self
The Databox to which the time series data will be added.
other
The Databox containing the time series data to prepend to `self`.
end_prepending
The end date up to which the time series data from the `other` Databox will be added to `self`.
Returns
This method modifies the Databox in place and returns `None`.
Details
This method uses the `underlay` method to add the time series data from the `other` Databox to the beginning of the `self` Databox.
☐ remove
Remove specified items from a Databox
Remove specified items from the Databox based on the provided names or a filtering function. Items to be removed can be specified as a list of names, a single name, a callable that returns `True` for names to be removed, or `None`.
self.remove(
remove_names=None,
*,
strict_names=False,
)
Input arguments
remove_names
Names of the items to be removed from the Databox. Can be a list of names, a single name, a callable that returns `True` for names to be removed, or `None`. If `None`, no items are removed.
strict_names
If `True`, strictly adheres to the provided names, raising an error if any name is not found in the Databox.
Returns
Returns `None`; `self` is modified in place.
☐ rename
Rename items in a Databox
Rename existing items in a Databox by specifying `source_names` and `target_names`. The `source_names` can be a list of names, a single name, or a callable function returning `True` for names to be renamed. Define `target_names` as the new names for these items, either as a corresponding list, a single name, or a callable function taking a source name and returning the new target name.
self.rename(
source_names=None,
target_names=None,
strict_names=False,
)
Input arguments
source_names
The current names of the items to be renamed. Accepts a list of names, a single name, or a callable returning `True` for names to be renamed.
target_names
The new names for the items. Should align with 'source_names'. Can be a list of names, a single name, or a callable function taking each source name and returning the corresponding target name.
strict_names
If set to `True`, enforces strict adherence to the provided names, with an error raised for any source name not found in the Databox.
Returns
Returns `None`; `self` is modified in place.
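The pairing of source and target names can be sketched on a plain dictionary. `rename_items` is a hypothetical helper for illustration; it shows both the list form and the callable form, and it leaves absent source names untouched, which is an assumption about the non-strict behavior.

```python
def rename_items(db, source_names, target_names):
    """Hypothetical sketch of renaming keys in a dict in place."""
    if isinstance(source_names, str):
        sources = [source_names]
    elif callable(source_names):
        sources = [n for n in db.keys() if source_names(n)]
    else:
        sources = list(source_names)

    if isinstance(target_names, str):
        targets = [target_names]
    elif callable(target_names):
        targets = [target_names(n) for n in sources]
    else:
        targets = list(target_names)

    for source, target in zip(sources, targets):
        if source in db:
            db[target] = db.pop(source)   # move the value under the new key


db = {"gdp": 1, "cpi": 2, "rate": 3}
rename_items(db, ["gdp", "cpi"], ["y", "p"])         # list to list
rename_items(db, lambda n: n == "rate", str.upper)   # callable to callable
```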
☐ set_description
Set the description for an Iris Pie object
self.set_description(
description,
)
Input arguments
self
An Iris Pie object to which to attach the description.
description
The description to attach to the Iris Pie object.
Returns
This method modifies the Iris Pie object in place and returns `None`.
☐ shallow
Create a shallow copy of the Databox
Generate a shallow copy of the Databox, with options to filter and rename items during the duplication process. A shallow copy retains the original items and references, but does not copy the items themselves.
shallow_databox = self.shallow(
source_names=None,
target_names=None,
strict_names=False,
)
Input arguments
self
The Databox object to copy.
source_names
Names of the items to include in the copy. Can be a list of names, a single name, a callable returning `True` for names to include, or `None` to copy all items.
target_names
New names for the copied items, corresponding to 'source_names'. Can be a list of names, a single name, or a callable function taking a source name and returning the new target name.
strict_names
If set to `True`, strictly adheres to the provided names, raising an error if any source name is not found in the Databox.
Returns
shallow_databox
A new Databox instance that is a shallow copy of the current one, containing either all items or only those specified.
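The practical difference between a shallow and a deep copy shows up when an item is mutated. The sketch below uses plain dictionaries and the standard `copy` module rather than the Databox methods, so it illustrates the general Python semantics only.

```python
import copy

original = {"series": [1.0, 2.0], "label": "x"}

shallow = dict(original)            # new container, same item objects
deep = copy.deepcopy(original)      # new container, new item objects

original["series"].append(3.0)      # mutate an item through the original

shares_item = shallow["series"] is original["series"]
```

After the mutation, the shallow copy sees the appended value because it refers to the same list, while the deep copy keeps its own two-element list.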
☐ to_csv
Write Databox time series to a CSV file
self.to_csv(
file_name,
*,
frequency_span=None,
names=None,
description_row=False,
frequency=None,
numeric_format="g",
nan_str="",
delimiter=",",
round=None,
date_formatter=None,
csv_writer_settings={},
when_empty="warning",
)
Input arguments
file_name
Name of the CSV file where the data will be written.
frequency_span
Specifies the frequencies and their corresponding date ranges for exporting data. If `None`, exports data for all available frequencies and their full date ranges in the databox.
names
A list of series names to export. If `None`, exports all series for the specified frequencies.
description_row
If `True`, include a row of series descriptions in the CSV.
frequency
Frequency of the data to export.
numeric_format
The numeric format for data values, e.g., 'g', 'f', etc.
nan_str
String representation for NaN values in the output.
delimiter
Character to separate columns in the CSV file.
round
Number of decimal places to round numeric values.
date_formatter
Function to format date values. If `None`, SDMX string formatter is used.
csv_writer_settings
Additional settings for the CSV writer.
when_empty
Behavior when no data is available for export. Can be "error", "warning", or "silent".
Returns
info
A dictionary with details about the export:
- `names_exported`: Names of the series exported to the CSV file.
☐ to_dict
Convert a Databox to a plain dictionary
Convert a Databox to a standard Python dictionary, with the keys and values retained. This method is useful for converting a Databox to a format that can be used with other Python libraries or functions.
diction = self.to_dict()
Input arguments
self
The Databox object to convert to a dictionary.
Returns
diction
A dictionary containing the items from the Databox.
☐ to_json
Save a Databox to a JSON file
Save a Databox to a JSON file, preserving the structure and data of the Databox object. This method is useful for storing Databoxes in a format that can be easily shared or imported into other applications.
self.to_json(
file_name,
**kwargs,
)
Input arguments
self
The Databox object to save to a JSON file.
file_name
Path to the JSON file where the Databox will be saved.
**kwargs
Additional keyword arguments to pass to the JSON encoder.
Returns
Returns `None`; the Databox is saved to the specified JSON file.
☐ to_pickle
Write Databox to a pickle file
self.to_pickle(
file_name,
**kwargs,
)
Input arguments
file_name
Path to the pickle file where the data will be written.
kwargs
Additional keyword arguments for the pickle writer.
Returns
This method returns `None`.
☐ underlay
Underlay another Databox time series beneath those in the current Databox
Underlay another Databox's time series beneath the corresponding time series in the current Databox, aligning and incorporating data series using the underlay method defined in the Series class. This operation modifies the current Databox in-place by applying the underlay technique to each individual series that exists in both Databoxes.
self.underlay(
other,
names=None,
strict_names=False,
)
Input arguments
self
The Databox beneath which the underlay will be applied. It contains the original time series data.
other
The Databox that provides the time series to underlay beneath `self`. Only series present in both Databoxes will be affected.
names
An optional iterable of names to underlay. If `None`, the underlay operation is attempted on all time series present in both Databoxes.
strict_names
If `True`, the names provided in `names` are strictly adhered to, and an error is raised if any name is not found in both Databoxes.
Returns
This method modifies the Databox in place and returns `None`.
Details
The `underlay` method ensures that corresponding time series in both the source Databox and the other Databox are merged based on the underlay logic determined by the Series class.
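The exact overlay and underlay logic lives in the Series class and is not spelled out here. The sketch below shows one common reading of the two terms, stated as an assumption: in an overlay the incoming values win wherever they exist, while in an underlay the incoming values only fill periods that the current series lacks. Series are modeled as hypothetical period-to-value dictionaries, with `None` marking a missing observation.

```python
def overlay_series(base, other):
    """Assumed overlay: values from `other` win wherever they exist."""
    result = dict(base)
    result.update({p: v for p, v in other.items() if v is not None})
    return result


def underlay_series(base, other):
    """Assumed underlay: `other` only fills periods missing in `base`."""
    result = {p: v for p, v in other.items() if v is not None}
    result.update({p: v for p, v in base.items() if v is not None})
    return result


base = {2020: 1.0, 2021: None, 2022: 3.0}
other = {2019: 9.0, 2020: 8.0, 2021: 7.0}

overlaid = overlay_series(base, other)
underlaid = underlay_series(base, other)
```

Under this reading, the two results differ only in the period where both series hold an observation (2020 in the sample data).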