recpack.datasets.DummyDataset
- class recpack.datasets.DummyDataset(path: str = 'data', filename: Optional[str] = None, use_default_filters=True, seed=None, num_users=100, num_items=20, num_interactions=500, min_t=0, max_t=500)
Small randomly generated dummy dataset that allows testing of pipelines and other components without needing to load a full scale dataset.
- Parameters
path (str, optional) – The path to the data directory. Unused, because the dataset is generated rather than read from a file. Defaults to 'data'.
filename (str, optional) – Unused, because the dataset is generated rather than read from a file. Defaults to None.
use_default_filters (bool, optional) – Whether to initialise a default set of filters. Defaults to True.
seed (int, optional) – Seed for the random data generation. Defaults to None.
num_users (int, optional) – The number of users to use when generating data, defaults to 100
num_items (int, optional) – The number of items to use when generating data, defaults to 20
num_interactions (int, optional) – The number of interactions to generate, defaults to 500
min_t (int, optional) – The minimum timestamp when generating data, defaults to 0
max_t (int, optional) – The maximum timestamp when generating data, defaults to 500
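To make the roles of these parameters concrete, here is a minimal standard-library sketch of how such a dataset could be generated. It is not RecPack's implementation: the helper name generate_dummy_interactions, the uniform sampling, and the half-open timestamp range [min_t, max_t) are all assumptions made for illustration. Only the parameter names, their defaults, and the column names come from this page.

```python
import random
from typing import Dict, List, Optional


def generate_dummy_interactions(
    seed: Optional[int] = None,
    num_users: int = 100,
    num_items: int = 20,
    num_interactions: int = 500,
    min_t: int = 0,
    max_t: int = 500,
) -> List[Dict[str, int]]:
    """Hypothetical helper: draw random (user, item, timestamp) rows."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return [
        {
            # Assumption: users and items are sampled uniformly.
            "user_id": rng.randrange(num_users),
            "item_id": rng.randrange(num_items),
            # Assumption: timestamps are drawn from [min_t, max_t).
            "timestamp": rng.randrange(min_t, max_t),
        }
        for _ in range(num_interactions)
    ]


# Two runs with the same seed yield the same rows.
rows_a = generate_dummy_interactions(seed=42)
rows_b = generate_dummy_interactions(seed=42)
```

With seed left at None, each call would produce different rows, which is usually undesirable in a test that asserts on specific values.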
Methods
add_filter(_filter[, index]) – Add a filter to be applied when loading the data.
fetch_dataset([force]) – Check whether the dataset is present; if not, download it.
load() – Loads data into an InteractionMatrix object.
Attributes
DEFAULT_FILENAME – Default filename that will be used if it is not specified by the user.
ITEM_IX – Name of the column in the DataFrame that contains item identifiers.
TIMESTAMP_IX – Name of the column in the DataFrame that contains time of interaction in seconds since epoch.
USER_IX – Name of the column in the DataFrame that contains user identifiers.
file_path – The fully qualified path to the file from which the dataset will be loaded.
- ITEM_IX = 'item_id'
Name of the column in the DataFrame that contains item identifiers.
- TIMESTAMP_IX = 'timestamp'
Name of the column in the DataFrame that contains time of interaction in seconds since epoch.
- USER_IX = 'user_id'
Name of the column in the DataFrame that contains user identifiers.
- add_filter(_filter: recpack.preprocessing.filters.Filter, index=None)
Add a filter to be applied when loading the data.
If the index is specified, the filter is inserted at the specified index. Otherwise it is appended.
- Parameters
_filter (Filter) – Filter to be applied to the loaded DataFrame before it is processed into an InteractionMatrix.
index (int) – The index to insert the filter at, None will append the filter. Defaults to None
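The insert-or-append behaviour described above can be sketched with a plain Python list. The FilterList class and the three filter functions below are hypothetical stand-ins, not RecPack classes; they only mirror the documented semantics of the index argument.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for a filter: any callable that maps rows to rows.
RowFilter = Callable[[list], list]


class FilterList:
    """Hypothetical container mirroring the documented add_filter semantics."""

    def __init__(self) -> None:
        self.filters: List[RowFilter] = []

    def add_filter(self, _filter: RowFilter, index: Optional[int] = None) -> None:
        if index is None:
            # No index given: the filter goes to the end of the list.
            self.filters.append(_filter)
        else:
            # Index given: the filter is inserted at that position.
            self.filters.insert(index, _filter)


def drop_empty(rows: list) -> list:
    return [r for r in rows if r]


def keep_first_ten(rows: list) -> list:
    return rows[:10]


def sort_rows(rows: list) -> list:
    return sorted(rows)


pipeline = FilterList()
pipeline.add_filter(drop_empty)          # appended
pipeline.add_filter(keep_first_ten)      # appended
pipeline.add_filter(sort_rows, index=0)  # inserted at the front
order = [f.__name__ for f in pipeline.filters]
```

Here order ends up as sort_rows, drop_empty, keep_first_ten, so the filter added last is the first one in the list.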
- fetch_dataset(force=False)
Check whether the dataset is present; if not, download it.
- Parameters
force (bool, optional) – If True, dataset will be downloaded, even if the file already exists. Defaults to False.
- property file_path
The fully qualified path to the file from which the dataset will be loaded.
- load() recpack.matrix.interaction_matrix.InteractionMatrix
Loads data into an InteractionMatrix object.
Data is loaded into a DataFrame using the _load_dataframe function. The resulting DataFrame is then parsed into an InteractionMatrix object. During parsing, the filters are applied in order.
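Because the filters are applied in order, the same set of filters can give different results depending on how they are arranged. The sketch below illustrates this with plain lists of dictionaries instead of a DataFrame; apply_filters, recent_only, and first_two are hypothetical names and do not reflect how RecPack implements load().

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
RowFilter = Callable[[List[Row]], List[Row]]


def apply_filters(rows: List[Row], filters: List[RowFilter]) -> List[Row]:
    """Hypothetical helper: feed each filter the output of the previous one."""
    for f in filters:
        rows = f(rows)
    return rows


def recent_only(rows: List[Row]) -> List[Row]:
    # Keep interactions with a timestamp of at least 10.
    return [r for r in rows if r["timestamp"] >= 10]


def first_two(rows: List[Row]) -> List[Row]:
    # Keep only the first two rows.
    return rows[:2]


rows = [
    {"user_id": 0, "item_id": 1, "timestamp": 5},
    {"user_id": 1, "item_id": 2, "timestamp": 12},
    {"user_id": 2, "item_id": 0, "timestamp": 30},
    {"user_id": 0, "item_id": 2, "timestamp": 40},
]

filtered_then_truncated = apply_filters(rows, [recent_only, first_two])
truncated_then_filtered = apply_filters(rows, [first_two, recent_only])
```

The first ordering keeps the rows with timestamps 12 and 30, while the second keeps only the row with timestamp 12.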
- Returns
The resulting InteractionMatrix
- Return type