recpack.preprocessing.filters.Deduplicate

class recpack.preprocessing.filters.Deduplicate(item_ix: str, user_ix: str, timestamp_ix: Optional[str] = None)

Deduplicate entries with the same user and item.

Keeps exactly one row for each user-item pair in the DataFrame and removes the rest. If timestamp_ix is provided, the earliest interaction of each pair (by timestamp) is the one kept.

Parameters
  • item_ix (str) – Name of the column in which item identifiers are listed.

  • user_ix (str) – Name of the column in which user identifiers are listed.

  • timestamp_ix (str, optional) – Name of the column in which timestamps are listed. Defaults to None.
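
The deduplication rule described above can be approximated in plain pandas. This is a minimal sketch, not recpack's implementation: the helper name deduplicate_sketch and the column names "user", "item" and "ts" are hypothetical, and keeping the first row in row order when no timestamp column is given is an assumption.

```python
import pandas as pd


def deduplicate_sketch(df, item_ix, user_ix, timestamp_ix=None):
    """Hypothetical stand-in for the documented behaviour (not recpack code)."""
    if timestamp_ix is not None:
        # Sort by timestamp so the earliest interaction of each pair comes first.
        # mergesort is stable, so rows with equal timestamps keep their order.
        df = df.sort_values(timestamp_ix, kind="mergesort")
    # Keep only the first row seen for every (user, item) pair.
    return df.drop_duplicates(subset=[user_ix, item_ix], keep="first")


interactions = pd.DataFrame(
    {
        "user": [1, 1, 2, 2],
        "item": [10, 10, 20, 30],
        "ts": [5, 3, 1, 2],
    }
)

result = deduplicate_sketch(
    interactions, item_ix="item", user_ix="user", timestamp_ix="ts"
)
print(result)
```

In this toy input, user 1 interacted with item 10 twice, so only one of those two rows survives: the one with the smaller timestamp.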

Methods

apply(df)

Apply the filter to the DataFrame passed.

apply_all(*dfs)

Apply the filter to one or more pandas DataFrames.

apply(df)

Apply the filter to the DataFrame passed.

Parameters

df (pd.DataFrame) – DataFrame to filter.

apply_all(*dfs: pandas.core.frame.DataFrame) → List[pandas.core.frame.DataFrame]

Apply the filter to one or more pandas DataFrames, passed as positional arguments.

The filter is applied independently to each of the DataFrames.

Returns

The list of processed DataFrames

Return type

List[pd.DataFrame]
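
The relationship between apply and apply_all can be sketched as follows. This is an illustration under stated assumptions rather than recpack's source: the class DropDuplicatePairs is a hypothetical filter, and the only behaviour taken from the text above is that apply_all accepts several DataFrames and processes each one independently, returning a list.

```python
from typing import List

import pandas as pd


class DropDuplicatePairs:
    """Hypothetical filter used only to illustrate apply / apply_all."""

    def __init__(self, item_ix: str, user_ix: str):
        self.item_ix = item_ix
        self.user_ix = user_ix

    def apply(self, df: pd.DataFrame) -> pd.DataFrame:
        # Keep the first row of every (user, item) pair within this frame.
        return df.drop_duplicates(subset=[self.user_ix, self.item_ix], keep="first")

    def apply_all(self, *dfs: pd.DataFrame) -> List[pd.DataFrame]:
        # Each frame is filtered on its own; no state is shared between frames.
        return [self.apply(df) for df in dfs]


frame_a = pd.DataFrame({"user": [1, 1, 2], "item": [10, 10, 20]})
frame_b = pd.DataFrame({"user": [1, 3, 3], "item": [10, 30, 30]})

filtered_a, filtered_b = DropDuplicatePairs("item", "user").apply_all(frame_a, frame_b)
print(len(filtered_a), len(filtered_b))
```

Because the frames are handled independently, the pair (user 1, item 10) remains present in both outputs: duplicates are removed within a frame, not across frames.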