recpack.matrix.InteractionMatrix
- class recpack.matrix.InteractionMatrix(df: pandas.core.frame.DataFrame, item_ix: str, user_ix: str, timestamp_ix: Optional[str] = None, shape: Optional[Tuple[int, int]] = None)
An InteractionMatrix contains interactions between users and items at a certain time.
It provides a number of properties and methods for easy manipulation of this interaction data.
Note
The InteractionMatrix does not assume binary user-item pairs. If a user interacts with an item more than once, there will be multiple entries for this user-item pair, one per interaction.
- Parameters
df (pd.DataFrame) – Dataframe containing user-item interactions. Must contain at least item ids and user ids.
item_ix (string) – Item ids column name.
user_ix (string) – User ids column name.
timestamp_ix (string, optional) – Interaction timestamps column name.
shape (Tuple[int, int], optional) – The desired shape of the matrix, i.e. the number of users and items. If no shape is specified, the number of users will be equal to the maximum user id plus one, the number of items to the maximum item id plus one.
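The default shape rule can be illustrated with a minimal pure-Python sketch. It does not call recpack; the `interactions` list and the `infer_shape` helper are hypothetical names used only to show the "maximum id plus one" rule described for the shape parameter, assuming zero-based integer ids.

```python
# Hypothetical interaction log: (user_id, item_id) pairs with zero-based integer ids.
interactions = [(0, 2), (0, 2), (1, 0), (3, 1)]


def infer_shape(pairs):
    """Return (num_users, num_items) as the maximum id plus one on each axis."""
    max_user = max(user for user, _ in pairs)
    max_item = max(item for _, item in pairs)
    return (max_user + 1, max_item + 1)


inferred_shape = infer_shape(interactions)
print(inferred_shape)  # (4, 3): user 2 has no interactions but still gets a row
```

Under this rule, ids that never occur (user 2 above) still occupy a row or column, which is one reason to pass an explicit shape when several matrices must line up.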
Methods
copy() – Create a deep copy of this InteractionMatrix.
eliminate_timestamps([inplace]) – Remove all timestamp information.
from_csr_matrix(X) – Create an InteractionMatrix from a csr_matrix containing interactions.
get_timestamp(interaction_id) – Return the timestamp of a specific interaction by interaction ID.
indices_in(u_i_lists[, inplace]) – Select interactions between the specified user-item combinations.
interactions_in(interaction_ids[, inplace]) – Select the interactions by their interaction ids.
load(file_prefix) – Create a new interaction matrix instance from saved file.
nonzero()
save(file_prefix) – Save the interaction matrix to files.
timestamps_gt(timestamp[, inplace]) – Select interactions after a given timestamp.
timestamps_gte(timestamp[, inplace]) – Select interactions after and including a given timestamp.
timestamps_lt(timestamp[, inplace]) – Select interactions up to a given timestamp.
timestamps_lte(timestamp[, inplace]) – Select interactions up to and including a given timestamp.
union(im) – Combine events from this InteractionMatrix with another.
users_in(U[, inplace]) – Keep only interactions by one of the specified users.
Attributes
INTERACTION_IX
ITEM_IX
TIMESTAMP_IX
USER_IX
active_users – The set of all users with at least one interaction.
binary_item_history – The unique items interacted with, per user.
binary_values – All user-item interactions as a sparse, binary matrix of size (users, items).
density – The density of the interaction matrix.
has_timestamps – Boolean indicating whether instance has timestamp information.
indices – Returns a tuple of lists of user IDs and item IDs corresponding to interactions.
interaction_history – The interactions per user.
last_timestamps_matrix – A sparse matrix with the last timestamp for each user, item pair.
num_active_users – The number of users with at least one interaction.
num_interactions – The total number of interactions.
properties
sorted_interaction_history – The interaction IDs per user, sorted by timestamp (ascending).
sorted_item_history – For every user, the items the user interacted with, sorted by timestamp (ascending).
timestamps – Timestamps of interactions as a pandas Series, indexed by user ID and item ID.
values – All user-item interactions as a sparse matrix of size (|users|, |items|).
- class InteractionMatrixProperties(num_users: int, num_items: int, has_timestamps: bool)
- property active_users: Set[int]
The set of all users with at least one interaction.
- Returns
Set of user IDs with at least one interaction.
- Return type
Set[int]
- property binary_item_history: Iterator[Tuple[int, List[int]]]
The unique items interacted with, per user.
- Yield
Tuples of user ID, list of distinct item IDs the user interacted with.
- Return type
List[Tuple[int, List[int]]]
- property binary_values: scipy.sparse._csr.csr_matrix
All user-item interactions as a sparse, binary matrix of size (users, items).
An entry is 1 if there is at least one interaction between that user and item. In all other cases the entry is 0.
- Returns
Binary csr_matrix of interactions.
- Return type
csr_matrix
- copy() recpack.matrix.interaction_matrix.InteractionMatrix
Create a deep copy of this InteractionMatrix.
- Returns
Deep copy of this InteractionMatrix.
- Return type
InteractionMatrix
- property density: float
The density of the interaction matrix.
The density is computed as the fraction of user-item pairs that have at least one interaction.
- Returns
The density.
- Return type
float
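As a rough illustration of the density definition, the sketch below computes it in plain Python rather than through recpack. It assumes that a user-item pair with repeated interactions counts once, which is one reading of "pairs that have an interaction"; the `density` helper and the sample data are hypothetical.

```python
def density(pairs, shape):
    """Fraction of cells in a (num_users, num_items) grid with an interaction."""
    num_users, num_items = shape
    # Assumption: a repeated (user, item) pair is counted only once.
    distinct_pairs = set(pairs)
    return len(distinct_pairs) / (num_users * num_items)


interactions = [(0, 2), (0, 2), (1, 0), (3, 1)]
d = density(interactions, (4, 3))
print(d)  # 3 distinct pairs out of 12 cells
```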
- eliminate_timestamps(inplace: bool = False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]
Remove all timestamp information.
- Parameters
inplace (bool) – Modify the data matrix in place. If False, returns a new object.
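The `inplace` convention used by this and the selection methods can be sketched with a toy class. `ToyMatrix` is a hypothetical stand-in, not recpack's implementation; it only illustrates the pattern of returning None when modifying in place and a new object otherwise.

```python
import copy


class ToyMatrix:
    """Hypothetical stand-in used to illustrate the inplace pattern."""

    def __init__(self, events):
        # Each event is (user_id, item_id, timestamp or None).
        self.events = list(events)

    def eliminate_timestamps(self, inplace=False):
        # Work on this object when inplace is True, otherwise on a deep copy.
        target = self if inplace else copy.deepcopy(self)
        target.events = [(user, item, None) for user, item, _ in target.events]
        return None if inplace else target


original = ToyMatrix([(0, 2, 10), (1, 0, 20)])
stripped = original.eliminate_timestamps()  # new object, original untouched
before = list(original.events)
result = original.eliminate_timestamps(inplace=True)  # mutates original, returns None
```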
- classmethod from_csr_matrix(X: scipy.sparse._csr.csr_matrix) recpack.matrix.interaction_matrix.InteractionMatrix
Create an InteractionMatrix from a csr_matrix containing interactions.
Warning
No timestamps can be passed this way!
- Returns
InteractionMatrix constructed from the csr_matrix.
- Return type
InteractionMatrix
- get_timestamp(interaction_id: int) int
Return the timestamp of a specific interaction by interaction ID.
- Parameters
interaction_id (int) – The ID of the interaction in the DataFrame whose timestamp should be returned.
- Raises
AttributeError – Raised if the object does not have timestamps.
- Returns
The timestamp of the interaction.
- Return type
int
- property has_timestamps: bool
Boolean indicating whether instance has timestamp information.
- Returns
True if timestamp information is available, False otherwise.
- Return type
bool
- property indices: Tuple[List[int], List[int]]
Returns a tuple of lists of user IDs and item IDs corresponding to interactions.
- Returns
Tuple of lists of user IDs and item IDs that correspond to at least one interaction.
- Return type
Tuple[List[int], List[int]]
- indices_in(u_i_lists: Tuple[List[int], List[int]], inplace=False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]
Select interactions between the specified user-item combinations.
- Parameters
u_i_lists (Tuple[List[int], List[int]]) – Two lists as a tuple. The first list contains the user indices and the second the item indices. Both lists should have the same length.
inplace (bool, optional) – Apply the selection in place to the object, defaults to False
- Returns
None if inplace is True, otherwise a new InteractionMatrix object with the selection of events.
- Return type
Union[InteractionMatrix, None]
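The selection can be pictured with a small pure-Python sketch. It assumes the two lists are paired positionally, so that the k-th user index and the k-th item index form one user-item combination; this reading is suggested by the equal-length requirement but is an assumption here. The sample data is hypothetical and recpack itself is not used.

```python
# Hypothetical interactions as (user_id, item_id) pairs.
events = [(0, 2), (0, 1), (1, 0), (3, 1)]

# Same layout as u_i_lists: (user indices, item indices), equal length.
u_i_lists = ([0, 3], [2, 1])

# Pair the lists positionally: (0, 2) and (3, 1).
wanted = set(zip(*u_i_lists))

# Keep only interactions whose (user, item) combination was requested.
selected = [event for event in events if event in wanted]
print(selected)
```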
- property interaction_history: Iterator[Tuple[int, List[int]]]
The interactions per user.
- Yield
Tuples of user ID, list of interaction IDs.
- Return type
List[Tuple[int, List[int]]]
- interactions_in(interaction_ids: List[int], inplace: bool = False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]
Select the interactions by their interaction ids.
- Parameters
interaction_ids (List[int]) – A list of interaction ids
inplace (bool, optional) – Apply the selection in place, or return a new InteractionMatrix object, defaults to False
- Returns
None if inplace, otherwise new InteractionMatrix object with the selected interactions
- Return type
Union[None, InteractionMatrix]
- property last_timestamps_matrix: scipy.sparse._csr.csr_matrix
A sparse matrix with the last timestamp for each user, item pair.
By using the maximal timestamp for each pair, we make it possible to use non-deduplicated datasets.
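A minimal sketch of the "maximal timestamp per pair" idea, using a plain dictionary instead of a sparse matrix. The `events` data is hypothetical and this is not recpack's code; it only shows how duplicate user-item pairs collapse to their latest timestamp.

```python
# Hypothetical events as (user_id, item_id, timestamp); pair (0, 2) occurs three times.
events = [(0, 2, 10), (0, 2, 25), (1, 0, 7), (0, 2, 18)]

last_seen = {}
for user, item, ts in events:
    key = (user, item)
    # Keep the largest timestamp observed so far for this user-item pair.
    last_seen[key] = max(ts, last_seen.get(key, ts))

print(last_seen)
```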
- classmethod load(file_prefix) recpack.matrix.interaction_matrix.InteractionMatrix
Create a new interaction matrix instance from saved file.
- Parameters
file_prefix (str) – The prefix of the files to load, should end in the filename, but without extension (no .csv or such).
- Returns
InteractionMatrix created from file.
- Return type
InteractionMatrix
- property num_active_users: int
The number of users with at least one interaction.
- Returns
Number of active users.
- Return type
int
- property num_interactions: int
The total number of interactions.
- Returns
Total interaction count.
- Return type
int
- save(file_prefix: str) None
Save the interaction matrix to files.
Creates two files: one at <file_prefix>.csv with the raw dataframe, and a second at <file_prefix>_properties.yaml, which contains the properties of the interaction matrix.
- Parameters
file_prefix (str) – The prefix of the files to save, should end in the filename, but without extension (no .csv or such).
- property sorted_interaction_history: Iterator[Tuple[int, List[int]]]
The interaction IDs per user, sorted by timestamp (ascending).
- Raises
AttributeError – If there is no timestamp column, so the interactions cannot be sorted.
- Yield
Tuples of user ID, list of interaction IDs sorted by timestamp.
- Return type
List[Tuple[int, List[int]]]
- property sorted_item_history: Iterator[Tuple[int, List[int]]]
For every user, the items the user interacted with, sorted by timestamp (ascending).
- Raises
AttributeError – If there is no timestamp column.
- Yield
Tuple of user ID, list of item IDs sorted by timestamp.
- Return type
List[Tuple[int, List[int]]]
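The per-user, timestamp-ordered history can be sketched in plain Python as follows. The `events` data and the helper function are hypothetical, and the sketch assumes that repeated items are kept in the history (in contrast to binary_item_history, which lists distinct items); users are yielded in ascending id order purely for determinism.

```python
from collections import defaultdict

# Hypothetical events as (user_id, item_id, timestamp).
events = [(0, 2, 25), (1, 0, 7), (0, 1, 10), (0, 2, 18)]


def sorted_item_history(events):
    """Yield (user_id, item_ids) with item_ids ordered by ascending timestamp."""
    per_user = defaultdict(list)
    # Sort once by timestamp, then append items to each user's list in that order.
    for user, item, _ts in sorted(events, key=lambda event: event[2]):
        per_user[user].append(item)
    for user in sorted(per_user):
        yield user, per_user[user]


history = list(sorted_item_history(events))
print(history)
```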
- property timestamps: pandas.core.series.Series
Timestamps of interactions as a pandas Series, indexed by user ID and item ID.
- Raises
AttributeError – If there is no timestamp column.
- Returns
Series of interactions with multi-index on (user ID, item ID)
- Return type
pd.Series
- timestamps_gt(timestamp: float, inplace: bool = False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]
Select interactions after a given timestamp.
- Parameters
timestamp (float) – The timestamp against which each interaction's timestamp is compared.
inplace (bool, optional) – Apply the selection in place if True, defaults to False
- Returns
None if inplace, otherwise returns a new InteractionMatrix object
- Return type
Union[InteractionMatrix, None]
- timestamps_gte(timestamp: float, inplace: bool = False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]
Select interactions after and including a given timestamp.
- Parameters
timestamp (float) – The timestamp against which each interaction's timestamp is compared.
inplace (bool, optional) – Apply the selection in place if True, defaults to False
- Returns
None if inplace, otherwise returns a new InteractionMatrix object
- Return type
Union[InteractionMatrix, None]
- timestamps_lt(timestamp: float, inplace: bool = False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]
Select interactions up to a given timestamp.
- Parameters
timestamp (float) – The timestamp against which each interaction's timestamp is compared.
inplace (bool, optional) – Apply the selection in place if True, defaults to False
- Returns
None if inplace, otherwise returns a new InteractionMatrix object
- Return type
Union[InteractionMatrix, None]
- timestamps_lte(timestamp: float, inplace: bool = False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]
Select interactions up to and including a given timestamp.
- Parameters
timestamp (float) – The timestamp against which each interaction's timestamp is compared.
inplace (bool, optional) – Apply the selection in place if True, defaults to False
- Returns
None if inplace, otherwise returns a new InteractionMatrix object
- Return type
Union[InteractionMatrix, None]
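The difference between the four timestamp selections comes down to the comparison operator. The sketch below shows this on a plain list of hypothetical events; `select_by_timestamp` and `COMPARATORS` are illustrative names, not part of recpack, and the strict or inclusive boundaries follow the method descriptions above.

```python
import operator

# Hypothetical events as (user_id, item_id, timestamp).
events = [(0, 2, 10), (1, 0, 20), (3, 1, 30)]

# Map each method suffix to the comparison it stands for.
COMPARATORS = {
    "gt": operator.gt,   # strictly after
    "gte": operator.ge,  # after, including the boundary
    "lt": operator.lt,   # strictly before
    "lte": operator.le,  # before, including the boundary
}


def select_by_timestamp(events, kind, timestamp):
    """Return the events whose timestamp satisfies the chosen comparison."""
    compare = COMPARATORS[kind]
    return [event for event in events if compare(event[2], timestamp)]


after = select_by_timestamp(events, "gt", 20)
from_boundary = select_by_timestamp(events, "gte", 20)
```

A common use of such complementary selections is a time-based split: one call with "lt" and one with "gte" on the same boundary partition the events without overlap.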
- union(im: recpack.matrix.interaction_matrix.InteractionMatrix) recpack.matrix.interaction_matrix.InteractionMatrix
Combine events from this InteractionMatrix with another.
The matrices need to have the same shape and either both have timestamps or neither.
- Parameters
im (InteractionMatrix) – InteractionMatrix to union with.
- Returns
Union of interactions in this InteractionMatrix and the other.
- Return type
InteractionMatrix
- users_in(U: Union[Set[int], List[int]], inplace=False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]
Keep only interactions by one of the specified users.
- Parameters
U (Union[Set[int], List[int]]) – A Set or List of users to select the interactions from.
inplace (bool, optional) – Apply the selection in place or not, defaults to False
- Returns
None if inplace, otherwise returns a new InteractionMatrix object
- Return type
Union[InteractionMatrix, None]
- property values: scipy.sparse._csr.csr_matrix
All user-item interactions as a sparse matrix of size (|users|, |items|).
Each entry is the number of interactions between that user and item. If there are no interactions between a user and item, the entry is 0.
- Returns
Interactions between users and items as a csr_matrix.
- Return type
csr_matrix
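To make the contrast between values (interaction counts) and binary_values (presence only) concrete, here is a small sketch that builds both as dense nested lists. The real properties return sparse csr_matrix objects; the dense layout, the sample data, and the variable names are simplifications assumed for illustration.

```python
from collections import Counter

# Hypothetical interactions; user 0 interacted with item 2 twice.
interactions = [(0, 2), (0, 2), (1, 0), (3, 1)]
shape = (4, 3)  # (number of users, number of items)

# Count how often each (user, item) pair occurs.
counts = Counter(interactions)

# Count matrix: each entry is the number of interactions for that pair.
values = [
    [counts.get((user, item), 0) for item in range(shape[1])]
    for user in range(shape[0])
]

# Binary matrix: 1 if the pair has at least one interaction, else 0.
binary_values = [[1 if count > 0 else 0 for count in row] for row in values]
```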