recpack.datasets.MovieLens25M
- class recpack.datasets.MovieLens25M(path: str = 'data', filename: Optional[str] = None, use_default_filters=True)
Handles the MovieLens 25M dataset.
Full information on the dataset can be found at https://grouplens.org/datasets/movielens/25m/. The ratings.csv file is used to generate the interaction matrix.
By default, the data is preprocessed as in “Variational autoencoders for collaborative filtering” (Liang, Dawen, et al.):
Ratings of 4 or higher are interpreted as implicit feedback.
Only items that at least 5 users have interacted with are kept.
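The two default steps can be sketched in plain Python as follows. This is a minimal illustration, not recpack's implementation: the function `default_preprocess` and the toy triples are hypothetical, and the rating filter is assumed to run before the item-count filter, as the word "remaining" suggests.

```python
from collections import defaultdict

# Hypothetical (user, item, rating) triples, not taken from MovieLens.
RATINGS = [
    (1, 10, 5.0), (2, 10, 4.0), (3, 10, 4.5), (4, 10, 4.0), (5, 10, 5.0),
    (6, 10, 2.0),
    (1, 20, 3.5), (2, 20, 4.0), (3, 20, 4.0),
]


def default_preprocess(ratings, min_rating=4.0, min_users_per_item=5):
    """Binarise ratings, then drop items with too few distinct users (sketch)."""
    # Step 1: keep ratings >= min_rating; each kept row becomes a
    # positive (implicit) user-item interaction.
    positives = [(u, i) for u, i, r in ratings if r >= min_rating]

    # Step 2: count distinct users per item among the remaining interactions.
    users_per_item = defaultdict(set)
    for u, i in positives:
        users_per_item[i].add(u)

    # Keep only interactions on items with enough distinct users.
    return [(u, i) for u, i in positives if len(users_per_item[i]) >= min_users_per_item]


print(default_preprocess(RATINGS))  # → [(1, 10), (2, 10), (3, 10), (4, 10), (5, 10)]
```

Item 20 is dropped: after binarisation only users 2 and 3 remain on it, fewer than the required 5.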
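You can also set the preprocessing filters manually, for example:
from recpack.preprocessing.filters import MinRating, MinItemsPerUser, MinUsersPerItem
from recpack.datasets import MovieLens25M

# Disable the default filters so that only the filters added below are applied.
d = MovieLens25M(path='path/to/', use_default_filters=False)
# Keep ratings of at least 3.
d.add_filter(MinRating(3, d.RATING_IX, 3))
# Keep users with at least 3 interactions.
d.add_filter(MinItemsPerUser(3, d.ITEM_IX, d.USER_IX))
# Keep items with at least 5 interacting users.
d.add_filter(MinUsersPerItem(5, d.ITEM_IX, d.USER_IX))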
- Parameters
path (str, optional) – The path to the data directory. Defaults to 'data'.
filename (str, optional) – Name of the file. If no name is provided, the dataset's default filename is used, if known.
use_default_filters (bool, optional) – Whether to initialise the default set of filters. Defaults to True.
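A minimal sketch of how `path` and `filename` might combine into the path that is read, assuming the file simply lives at `path/filename`. `resolve_file_path` is a hypothetical helper, and `"ratings.csv"` is only a placeholder for the dataset's default filename, whose actual value is not given here.

```python
import os
from typing import Optional

# Placeholder for the dataset's default filename (assumption; the real
# DEFAULT_FILENAME value may differ).
DEFAULT_FILENAME = "ratings.csv"


def resolve_file_path(path: str = "data", filename: Optional[str] = None) -> str:
    """Join the data directory with the filename, falling back to the default."""
    return os.path.join(path, filename if filename is not None else DEFAULT_FILENAME)


print(resolve_file_path())                  # default directory and filename
print(resolve_file_path("path/to", "x.csv"))  # user-supplied directory and filename
```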
Methods
add_filter(_filter[, index]) – Add a filter to be applied when loading the data.
fetch_dataset([force]) – Check if the dataset is present and download it if it is not.
load() – Load the data into an InteractionMatrix object.
Attributes
DATASETURL
DEFAULT_FILENAME – Default filename that will be used if it is not specified by the user.
ITEM_IX – Name of the column in the DataFrame that contains item identifiers.
RATING_IX – Name of the column in the DataFrame that contains the rating a user gave to the item.
REMOTE_FILENAME – Name of the file containing user ratings on the MovieLens server.
REMOTE_ZIPNAME – Name of the zip file on the MovieLens server.
TIMESTAMP_IX – Name of the column in the DataFrame that contains the time of interaction, in seconds since the epoch.
USER_IX – Name of the column in the DataFrame that contains user identifiers.
file_path – The fully qualified path to the file from which the dataset will be loaded.
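The `*_IX` attributes name the DataFrame columns. The sketch below assumes they match the header of the MovieLens ratings.csv file (`userId,movieId,rating,timestamp`); the actual attribute values are an assumption, and `parse_rows` and the sample row are hypothetical. It also shows how a `TIMESTAMP_IX` value, in seconds since the epoch, converts to a UTC datetime.

```python
import csv
import io
from datetime import datetime, timezone

# Assumed column names, taken from the ratings.csv header; the real values
# of USER_IX, ITEM_IX, RATING_IX and TIMESTAMP_IX may differ.
USER_IX, ITEM_IX, RATING_IX, TIMESTAMP_IX = "userId", "movieId", "rating", "timestamp"

# Hypothetical sample in the ratings.csv layout (not real MovieLens data).
SAMPLE = "userId,movieId,rating,timestamp\n1,10,4.5,86400\n"


def parse_rows(text: str):
    """Yield (user, item, rating, UTC datetime) tuples from ratings.csv-style text."""
    for row in csv.DictReader(io.StringIO(text)):
        yield (
            int(row[USER_IX]),
            int(row[ITEM_IX]),
            float(row[RATING_IX]),
            # Timestamps are seconds since the Unix epoch, interpreted as UTC.
            datetime.fromtimestamp(int(row[TIMESTAMP_IX]), tz=timezone.utc),
        )


for record in parse_rows(SAMPLE):
    print(record)
```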
- property DEFAULT_FILENAME: str
Default filename that will be used if it is not specified by the user.
- add_filter(_filter: recpack.preprocessing.filters.Filter, index=None)
Add a filter to be applied when loading the data.
If an index is specified, the filter is inserted at that index. Otherwise, it is appended.
- Parameters
_filter (Filter) – Filter to be applied to the loaded DataFrame before it is processed into an interaction matrix.
index (int) – The index at which to insert the filter; None appends the filter. Defaults to None.
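The insert-or-append behaviour of `add_filter` corresponds to Python's `list.insert` and `list.append`. In this hypothetical sketch, `FilterPipeline` and the list-based filters `keep_high` and `keep_even` stand in for recpack's `Filter` objects, which operate on DataFrames.

```python
from typing import Callable, List, Optional

Rows = List[int]
FilterFn = Callable[[Rows], Rows]


class FilterPipeline:
    """Hypothetical sketch of add_filter's insert-or-append behaviour."""

    def __init__(self) -> None:
        self.filters: List[FilterFn] = []

    def add_filter(self, _filter: FilterFn, index: Optional[int] = None) -> None:
        if index is None:
            self.filters.append(_filter)  # no index: append to the end
        else:
            self.filters.insert(index, _filter)  # place the filter at position `index`

    def apply(self, rows: Rows) -> Rows:
        # Filters run in list order, each on the previous filter's output.
        for f in self.filters:
            rows = f(rows)
        return rows


def keep_high(rows: Rows) -> Rows:
    return [r for r in rows if r >= 4]


def keep_even(rows: Rows) -> Rows:
    return [r for r in rows if r % 2 == 0]


pipeline = FilterPipeline()
pipeline.add_filter(keep_high)
pipeline.add_filter(keep_even, index=0)  # inserted before keep_high
print([f.__name__ for f in pipeline.filters])  # → ['keep_even', 'keep_high']
print(pipeline.apply([1, 2, 4, 5, 6]))  # → [4, 6]
```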
- fetch_dataset(force=False)
Check if the dataset is present and download it if it is not.
- Parameters
force (bool, optional) – If True, dataset will be downloaded, even if the file already exists. Defaults to False.
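A sketch of the check-then-download logic controlled by `force`, assuming the dataset counts as present when its file exists on disk. `fetch_if_missing` is a hypothetical free function rather than the real method, and `fake_download` stands in for the network download so the example runs offline.

```python
import os
import tempfile
from typing import Callable


def fetch_if_missing(file_path: str, download: Callable[[str], None], force: bool = False) -> bool:
    """Download to file_path if it is missing or force is set; return True if downloaded."""
    if force or not os.path.exists(file_path):
        download(file_path)
        return True
    return False


def fake_download(target: str) -> None:
    """Hypothetical stand-in for the real download: writes a header-only CSV."""
    with open(target, "w") as fh:
        fh.write("userId,movieId,rating,timestamp\n")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "ratings.csv")
    first = fetch_if_missing(target, fake_download)               # file missing: downloads
    second = fetch_if_missing(target, fake_download)              # file present: skipped
    forced = fetch_if_missing(target, fake_download, force=True)  # force: downloads again

print(first, second, forced)  # → True False True
```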
- property file_path
The fully qualified path to the file from which the dataset will be loaded.
- load() → recpack.matrix.interaction_matrix.InteractionMatrix
Loads data into an InteractionMatrix object.
Data is loaded into a DataFrame using the _load_dataframe function. The resulting DataFrame is parsed into an InteractionMatrix object. During parsing, the filters are applied in order.
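Because the filters are applied in order, the same filters can produce different data when their order changes. The hypothetical filters below are simplified, list-based versions of a minimum-rating and a minimum-users-per-item filter, not recpack's `MinRating` or `MinUsersPerItem`; they illustrate the effect on toy data.

```python
from collections import defaultdict

# Hypothetical (user, item, rating) triples.
ROWS = [
    ("u1", "a", 5), ("u2", "a", 4), ("u3", "a", 1),
    ("u1", "b", 5), ("u2", "b", 2),
]


def min_rating(rows, threshold=4):
    """Keep rows whose rating is at least `threshold`."""
    return [r for r in rows if r[2] >= threshold]


def min_users_per_item(rows, minimum=2):
    """Keep rows on items rated by at least `minimum` distinct users."""
    users = defaultdict(set)
    for user, item, _ in rows:
        users[item].add(user)
    return [r for r in rows if len(users[r[1]]) >= minimum]


def apply_in_order(rows, filters):
    """Apply each filter to the output of the previous one."""
    for f in filters:
        rows = f(rows)
    return rows


rating_first = apply_in_order(ROWS, [min_rating, min_users_per_item])
count_first = apply_in_order(ROWS, [min_users_per_item, min_rating])
print(rating_first)  # → [('u1', 'a', 5), ('u2', 'a', 4)]
print(count_first)   # → [('u1', 'a', 5), ('u2', 'a', 4), ('u1', 'b', 5)]
```

Item b survives only when the user count is taken before low ratings are removed, since u2's rating of 2 still counts towards its two users.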
- Returns
The resulting InteractionMatrix
- Return type