recpack.datasets.MovieLens10M
- class recpack.datasets.MovieLens10M(path: str = 'data', filename: Optional[str] = None, use_default_filters=True)
- Handles the MovieLens 10M dataset.
- All information on the dataset can be found at https://grouplens.org/datasets/movielens/10m/. Uses the ratings.dat file to generate an interaction matrix.
- Default processing is done as in “Variational autoencoders for collaborative filtering.” Liang, Dawen, et al.:
- Ratings of 4 or higher are interpreted as implicit feedback
- Each remaining item has been interacted with by at least 5 users 
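The two default steps above can be sketched in plain Python. This is only an illustration of the logic, not recpack's implementation: the function name `apply_default_filters` and the use of `(user, item, rating)` tuples instead of a DataFrame are assumptions made for the example.

```python
from collections import defaultdict


def apply_default_filters(ratings, min_rating=4, min_users_per_item=5):
    """Hypothetical helper mirroring the default processing described above.

    ratings: iterable of (user_id, item_id, rating) tuples.
    Returns a list of (user_id, item_id) interactions.
    """
    # Step 1: keep only ratings >= min_rating and drop the rating value,
    # so each remaining pair is a positive implicit-feedback interaction.
    positives = [(u, i) for (u, i, r) in ratings if r >= min_rating]

    # Step 2: collect the distinct users that interacted with each item.
    users_per_item = defaultdict(set)
    for u, i in positives:
        users_per_item[i].add(u)

    # Step 3: keep only interactions whose item has enough distinct users.
    return [
        (u, i)
        for (u, i) in positives
        if len(users_per_item[i]) >= min_users_per_item
    ]
```

Note that the rating threshold is applied first, so an item only counts users whose rating passed that threshold.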
 - To use another value as the minimal rating that marks an interaction as positive, you have to set the preprocessing filters manually:

    from recpack.preprocessing.filters import MinRating, MinItemsPerUser, MinUsersPerItem
    from recpack.datasets import MovieLens10M

    # Disable the default filters, then add the custom ones in the desired order.
    d = MovieLens10M(path='path/to/', use_default_filters=False)
    d.add_filter(MinRating(3, d.RATING_IX, 3))
    d.add_filter(MinItemsPerUser(3, d.ITEM_IX, d.USER_IX))
    d.add_filter(MinUsersPerItem(5, d.ITEM_IX, d.USER_IX))

 - Parameters
- path (str, optional) – The path to the data directory. Defaults to 'data'.
- filename (str, optional) – Name of the file. If no name is provided, the dataset default is used, if known.
- use_default_filters (bool, optional) – Whether to initialise a default set of filters. Defaults to True.
 
 - Methods
 - add_filter(_filter[, index]) - Add a filter to be applied when loading the data.
 - fetch_dataset([force]) - Check whether the dataset is present; if not, download it.
 - load() - Loads data into an InteractionMatrix object.
 - Attributes
 - DATASETURL - URL to fetch the dataset from.
 - DEFAULT_FILENAME - Default filename that will be used if it is not specified by the user.
 - ITEM_IX - Name of the column in the DataFrame that contains item identifiers.
 - RATING_IX - Name of the column in the DataFrame that contains the rating a user gave to the item.
 - REMOTE_FILENAME - Name of the file containing user ratings on the MovieLens server.
 - REMOTE_ZIPNAME - Name of the zip-file on the MovieLens server.
 - TIMESTAMP_IX - Name of the column in the DataFrame that contains time of interaction in seconds since epoch.
 - USER_IX - Name of the column in the DataFrame that contains user identifiers.
 - file_path - The fully qualified path to the file from which the dataset will be loaded.
 - property DEFAULT_FILENAME: str
- Default filename that will be used if it is not specified by the user. 
 - add_filter(_filter: recpack.preprocessing.filters.Filter, index=None)
- Add a filter to be applied when loading the data.
- If the index is specified, the filter is inserted at the specified index. Otherwise it is appended.
- Parameters
- _filter (Filter) – Filter to be applied to the loaded DataFrame before it is converted into an interaction matrix.
- index (int) – The index to insert the filter at. None will append the filter. Defaults to None.
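Because filters run in order, where a filter is placed can change the result. The sketch below illustrates the insert-or-append behaviour described above with a hypothetical `FilterPipeline` class and two hypothetical filter functions; it is a stand-in for illustration, not recpack's own code, and it uses plain `(user, item, rating)` tuples.

```python
from collections import Counter


class FilterPipeline:
    """Hypothetical stand-in for a dataset's ordered filter list."""

    def __init__(self):
        self.filters = []

    def add_filter(self, _filter, index=None):
        # No index: append to the end. Otherwise insert at the given position.
        if index is None:
            self.filters.append(_filter)
        else:
            self.filters.insert(index, _filter)

    def apply(self, rows):
        # Each filter receives the output of the previous one.
        for f in self.filters:
            rows = f(rows)
        return rows


def keep_high_ratings(rows):
    # Keep rows whose rating is at least 4.
    return [r for r in rows if r[2] >= 4]


def keep_popular_items(rows):
    # Keep rows whose item occurs in at least 2 of the remaining rows.
    counts = Counter(item for _, item, _ in rows)
    return [r for r in rows if counts[r[1]] >= 2]


pipeline = FilterPipeline()
pipeline.add_filter(keep_popular_items)          # appended
pipeline.add_filter(keep_high_ratings, index=0)  # inserted before it
```

With this ordering the rating filter runs first, so item popularity is counted only over the high-rated rows.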
 
 
 - fetch_dataset(force=False)
- Check whether the dataset is present; if not, download it.
- Parameters
- force (bool, optional) – If True, dataset will be downloaded, even if the file already exists. Defaults to False. 
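The check-then-download behaviour and the effect of `force` can be sketched as follows. The function `fetch_if_missing` and its `download` callable are hypothetical names introduced for this example; how recpack actually performs the download is not shown here.

```python
import os


def fetch_if_missing(file_path, download, force=False):
    """Hypothetical sketch of a fetch step.

    download: a callable that writes the dataset to file_path.
    Returns True if a download was triggered, False otherwise.
    """
    # Download when forced, or when the file is not there yet.
    if force or not os.path.exists(file_path):
        download(file_path)
        return True
    # File already present and no force requested: nothing to do.
    return False
```

Passing the downloader as a callable keeps the sketch free of network access, which also makes the logic easy to exercise with a fake downloader.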
 
 - property file_path
- The fully qualified path to the file from which the dataset will be loaded.
 - load() → recpack.matrix.interaction_matrix.InteractionMatrix
- Loads data into an InteractionMatrix object.
- Data is loaded into a DataFrame using the _load_dataframe function. The resulting DataFrame is parsed into an InteractionMatrix object. During parsing, the filters are applied in order.
- Returns
- The resulting InteractionMatrix 
- Return type