recpack.preprocessing.filters.Deduplicate
- class recpack.preprocessing.filters.Deduplicate(item_ix: str, user_ix: str, timestamp_ix: Optional[str] = None)
Deduplicate entries with the same user and item.
Removes all but one occurrence of each user-item pair in the DataFrame. If timestamps are available, the first interaction is kept.
- Parameters
item_ix (str) – Name of the column in which item identifiers are listed.
user_ix (str) – Name of the column in which user identifiers are listed.
timestamp_ix (str, optional) – Name of the column in which timestamps are listed, defaults to None.
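The behaviour described above can be approximated in plain pandas. This is a minimal sketch, not the library's implementation: the helper name `deduplicate_sketch`, the column names `user`, `item` and `ts`, and the reading of "first interaction" as the row with the smallest timestamp are all assumptions made for illustration.

```python
from typing import Optional

import pandas as pd


def deduplicate_sketch(
    df: pd.DataFrame,
    item_ix: str,
    user_ix: str,
    timestamp_ix: Optional[str] = None,
) -> pd.DataFrame:
    """Hypothetical stand-in that keeps one row per user-item pair."""
    if timestamp_ix is not None:
        # Assumption: "first interaction" means the smallest timestamp,
        # so sort ascending before dropping duplicates.
        df = df.sort_values(timestamp_ix, kind="stable")
    # Keep the first row seen for each (user, item) pair.
    return df.drop_duplicates(subset=[user_ix, item_ix], keep="first")


# Example data: user 1 interacted with item 10 twice (ts=5 and ts=3).
interactions = pd.DataFrame(
    {
        "user": [1, 1, 1, 2],
        "item": [10, 10, 20, 10],
        "ts": [5, 3, 4, 1],
    }
)

deduped = deduplicate_sketch(interactions, item_ix="item", user_ix="user", timestamp_ix="ts")
kept_ts = deduped.loc[(deduped["user"] == 1) & (deduped["item"] == 10), "ts"].tolist()
```

With the class itself, the corresponding call would follow the signature documented above, for example `Deduplicate(item_ix="item", user_ix="user", timestamp_ix="ts").apply(interactions)`.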
Methods
apply(df) – Apply Filter to the DataFrame passed.
apply_all(*dfs) – Apply the filter to a list of pandas DataFrames.
- apply(df)
Apply Filter to the DataFrame passed.
- Parameters
df (pd.DataFrame) – DataFrame to filter
- apply_all(*dfs: pandas.core.frame.DataFrame) → List[pandas.core.frame.DataFrame]
Apply the filter to a list of pandas DataFrames.
The filter is applied independently to each of the DataFrames.
- Returns
The list of processed DataFrames
- Return type
List[pd.DataFrame]
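Because the filter is applied independently to each DataFrame, a user-item pair that occurs in two different frames survives in both. The sketch below illustrates this with plain pandas. `apply_all_sketch` and `dedupe` are hypothetical helpers, and the column names `user` and `item` are assumptions, not part of the library.

```python
from typing import Callable, List

import pandas as pd


def apply_all_sketch(
    apply: Callable[[pd.DataFrame], pd.DataFrame], *dfs: pd.DataFrame
) -> List[pd.DataFrame]:
    """Hypothetical stand-in: run the same filter on each frame separately."""
    return [apply(df) for df in dfs]


def dedupe(df: pd.DataFrame) -> pd.DataFrame:
    # Illustrative filter: keep one row per (user, item) pair within one frame.
    return df.drop_duplicates(subset=["user", "item"], keep="first")


first_df = pd.DataFrame({"user": [1, 1], "item": [10, 10]})
second_df = pd.DataFrame({"user": [1, 2], "item": [10, 20]})

# Each frame is deduplicated on its own; the pair (1, 10) stays in both results.
first_out, second_out = apply_all_sketch(dedupe, first_df, second_df)
```

If duplicates across frames also need to be removed, one option is to concatenate the frames and apply the filter once to the combined DataFrame.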