recpack.datasets.MovieLens100K
- class recpack.datasets.MovieLens100K(path: str = 'data', filename: Optional[str] = None, use_default_filters=True)
Handles the MovieLens 100K dataset.
All information on the dataset can be found at https://grouplens.org/datasets/movielens/100k/. Uses the u.data file to generate an interaction matrix.
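For orientation, `u.data` is a tab-separated text file with one rating per line, in the order user id, item id, rating, timestamp. The sketch below parses such rows using only the standard library. The `Rating` type, the `parse_u_data` helper, and the sample rows are illustrative assumptions and are not part of recpack, whose own loader may work differently.

```python
import csv
import io
from typing import List, NamedTuple


class Rating(NamedTuple):
    """One row of a u.data-style file (hypothetical container, not a recpack type)."""
    user_id: int
    item_id: int
    rating: int
    timestamp: int  # seconds since epoch


def parse_u_data(text: str) -> List[Rating]:
    """Parse tab-separated 'user item rating timestamp' rows, skipping blank lines."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [Rating(*map(int, row)) for row in reader if row]


# Two sample rows in the u.data layout.
sample = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"
ratings = parse_u_data(sample)
```

Each parsed row carries the four values that the column-name attributes on this page (`USER_IX`, `ITEM_IX`, `RATING_IX`, `TIMESTAMP_IX`) refer to.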
Default processing is done as in “Variational autoencoders for collaborative filtering.” Liang, Dawen, et al.:
Ratings above or equal to 4 are interpreted as implicit feedback
Each remaining item has been interacted with by at least 5 users
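The effect of the two default steps can be sketched on plain tuples. This is a minimal illustration of the idea, assuming the rating threshold is applied first and the per-item user count second. The functions `min_rating` and `min_users_per_item` are hypothetical stand-ins, not recpack's filter classes.

```python
from collections import defaultdict
from typing import List, Tuple

Interaction = Tuple[int, int, int]  # (user_id, item_id, rating)


def min_rating(rows: List[Interaction], threshold: int) -> List[Interaction]:
    """Keep only interactions whose rating is at least `threshold`."""
    return [r for r in rows if r[2] >= threshold]


def min_users_per_item(rows: List[Interaction], min_users: int) -> List[Interaction]:
    """Keep only interactions with items seen by at least `min_users` distinct users."""
    users_per_item = defaultdict(set)
    for user, item, _ in rows:
        users_per_item[item].add(user)
    return [r for r in rows if len(users_per_item[r[1]]) >= min_users]


rows = [(u, 1, 5) for u in range(5)]   # item 1: five users, all rated 5
rows += [(u, 2, 4) for u in range(3)]  # item 2: only three positive users
rows += [(u, 3, 2) for u in range(6)]  # item 3: six users, but every rating is below 4

# Apply the rating threshold first, then the per-item user count.
kept = min_users_per_item(min_rating(rows, 4), 5)
```

In this toy data only item 1 survives: item 3 is removed by the rating threshold, and item 2 has too few remaining users.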
To use a different minimum rating for marking an interaction as positive, set the preprocessing filters manually:
    from recpack.preprocessing.filters import MinRating, MinItemsPerUser, MinUsersPerItem
    from recpack.datasets import MovieLens100K

    d = MovieLens100K(path='path/to/', use_default_filters=False)
    d.add_filter(MinRating(3, d.RATING_IX, 3))
    d.add_filter(MinItemsPerUser(3, d.ITEM_IX, d.USER_IX))
    d.add_filter(MinUsersPerItem(5, d.ITEM_IX, d.USER_IX))
- Parameters
path (str, optional) – The path to the data directory. Defaults to data
filename (str, optional) – Name of the file. If no name is provided, the dataset's default filename is used, if one is known.
use_default_filters (bool, optional) – Should a default set of filters be initialised? Defaults to True
Methods
add_filter(_filter[, index]) – Add a filter to be applied when loading the data.
fetch_dataset([force]) – Check whether the dataset is present and download it if it is not.
load() – Loads data into an InteractionMatrix object.
Attributes
DATASETURL
DEFAULT_FILENAME – Default filename that will be used if it is not specified by the user.
ITEM_IX – Name of the column in the DataFrame that contains item identifiers.
RATING_IX – Name of the column in the DataFrame that contains the rating a user gave to the item.
REMOTE_FILENAME – Name of the file containing user ratings on the MovieLens server.
REMOTE_ZIPNAME – Name of the zip-file on the MovieLens server.
TIMESTAMP_IX – Name of the column in the DataFrame that contains time of interaction in seconds since epoch.
USER_IX – Name of the column in the DataFrame that contains user identifiers.
file_path – The fully qualified path to the file from which the dataset will be loaded.
- property DEFAULT_FILENAME: str
Default filename that will be used if it is not specified by the user.
- add_filter(_filter: recpack.preprocessing.filters.Filter, index=None)
Add a filter to be applied when loading the data.
If the index is specified, the filter is inserted at the specified index. Otherwise it is appended.
- Parameters
_filter (Filter) – Filter to be applied to the loaded DataFrame before it is converted to an interaction matrix.
index (int) – The index at which to insert the filter. If None, the filter is appended. Defaults to None
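The insert-or-append behaviour, together with the fact that filters run in list order at load time, can be sketched with a plain Python list. `FilterPipeline`, `make_filter`, and the filter names below are hypothetical and only mirror the semantics described here. They are not recpack's implementation.

```python
from typing import Callable, List, Optional

Rows = List[tuple]
FilterFn = Callable[[Rows], Rows]


class FilterPipeline:
    """Toy stand-in for a dataset's ordered filter list."""

    def __init__(self) -> None:
        self.filters: List[FilterFn] = []

    def add_filter(self, _filter: FilterFn, index: Optional[int] = None) -> None:
        # No index: append to the end. Otherwise insert at the given position.
        if index is None:
            self.filters.append(_filter)
        else:
            self.filters.insert(index, _filter)

    def apply(self, rows: Rows) -> Rows:
        # Filters run in list order; each one sees the previous one's output.
        for f in self.filters:
            rows = f(rows)
        return rows


order: List[str] = []


def make_filter(name: str) -> FilterFn:
    """Build a pass-through filter that records when it was called."""
    def _filter(rows: Rows) -> Rows:
        order.append(name)
        return rows
    return _filter


pipeline = FilterPipeline()
pipeline.add_filter(make_filter("min_rating"))
pipeline.add_filter(make_filter("min_users_per_item"))
pipeline.add_filter(make_filter("deduplicate"), index=0)  # runs first
pipeline.apply([])
```

Because each filter operates on the output of the one before it, the position chosen with `index` can change which interactions survive.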
- fetch_dataset(force=False)
Check whether the dataset is present and download it if it is not.
- Parameters
force (bool, optional) – If True, dataset will be downloaded, even if the file already exists. Defaults to False.
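The check-then-download logic, including the `force` override, might look roughly like the sketch below. The helper `fetch_if_missing` and the `fake_download` callable are assumptions made for illustration. The real method downloads from the MovieLens server and its internals may differ.

```python
import os
import tempfile
from typing import Callable


def fetch_if_missing(file_path: str, download: Callable[[str], None], force: bool = False) -> bool:
    """Run `download` unless the file already exists; return True if a download ran."""
    if force or not os.path.exists(file_path):
        download(file_path)
        return True
    return False


def fake_download(target: str) -> None:
    # Placeholder for a real network download: writes a single u.data-style row.
    with open(target, "w", encoding="utf-8") as fh:
        fh.write("196\t242\t3\t881250949\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "u.data")
    first = fetch_if_missing(path, fake_download)               # file absent: downloads
    second = fetch_if_missing(path, fake_download)              # file present: skipped
    forced = fetch_if_missing(path, fake_download, force=True)  # forced: downloads again
```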
- property file_path
The fully qualified path to the file from which the dataset will be loaded.
- load() → recpack.matrix.interaction_matrix.InteractionMatrix
Loads data into an InteractionMatrix object.
Data is loaded into a DataFrame using the _load_dataframe function. The resulting DataFrame is then parsed into an InteractionMatrix object. During parsing, the filters are applied in order.
- Returns
The resulting InteractionMatrix
- Return type