recpack.matrix.InteractionMatrix

class recpack.matrix.InteractionMatrix(df: pandas.core.frame.DataFrame, item_ix: str, user_ix: str, timestamp_ix: Optional[str] = None, shape: Optional[Tuple[int, int]] = None)

An InteractionMatrix contains interactions between users and items at a certain time.

It provides a number of properties and methods for easy manipulation of this interaction data.

Note

The InteractionMatrix does not assume binary user-item pairs. If a user interacts with an item more than once, there will be one entry per interaction for this user-item pair.

Parameters
  • df (pd.DataFrame) – Dataframe containing user-item interactions. Must contain at least item ids and user ids.

  • item_ix (string) – Item ids column name.

  • user_ix (string) – User ids column name.

  • timestamp_ix (string, optional) – Interaction timestamps column name.

  • shape (Tuple[int, int], optional) – The desired shape of the matrix, i.e. the number of users and items. If no shape is specified, the number of users will be equal to the maximum user id plus one, the number of items to the maximum item id plus one.
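The default-shape rule can be illustrated with a minimal sketch in plain Python. The helper `default_shape` and the sample pairs are hypothetical and not part of recpack; the sketch only mirrors the rule stated above, assuming non-negative integer ids and at least one interaction.

```python
from typing import Iterable, Tuple


def default_shape(interactions: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Infer (num_users, num_items) as (max user id + 1, max item id + 1).

    Hypothetical helper for illustration; assumes a non-empty input.
    """
    pairs = list(interactions)
    max_user = max(user for user, _ in pairs)
    max_item = max(item for _, item in pairs)
    return (max_user + 1, max_item + 1)


# (user id, item id) pairs. Users 1 and 2 never appear, but ids up to 3
# are still counted, so the inferred number of users is 4.
sample = [(0, 2), (3, 0), (3, 5)]
inferred = default_shape(sample)
```

Under this rule, ids that never occur still take up a row or column, which is one reason to pass `shape` explicitly when several matrices must line up.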

Methods

copy()

Create a deep copy of this InteractionMatrix.

eliminate_timestamps([inplace])

Remove all timestamp information.

from_csr_matrix(X)

Create an InteractionMatrix from a csr_matrix containing interactions.

get_timestamp(interaction_id)

Return the timestamp of a specific interaction by interaction ID.

indices_in(u_i_lists[, inplace])

Select interactions between the specified user-item combinations.

interactions_in(interaction_ids[, inplace])

Select interactions by their interaction IDs.

load(file_prefix)

Create a new interaction matrix instance from saved file.

nonzero()

save(file_prefix)

Save the interaction matrix to files.

timestamps_gt(timestamp[, inplace])

Select interactions after a given timestamp.

timestamps_gte(timestamp[, inplace])

Select interactions at or after a given timestamp.

timestamps_lt(timestamp[, inplace])

Select interactions before a given timestamp.

timestamps_lte(timestamp[, inplace])

Select interactions up to and including a given timestamp.

union(im)

Combine events from this InteractionMatrix with another.

users_in(U[, inplace])

Keep only interactions by one of the specified users.

Attributes

INTERACTION_IX

ITEM_IX

TIMESTAMP_IX

USER_IX

active_users

The set of all users with at least one interaction.

binary_item_history

The unique items interacted with, per user.

binary_values

All user-item interactions as a sparse, binary matrix of size (users, items).

density

The density of the interaction matrix.

has_timestamps

Boolean indicating whether instance has timestamp information.

indices

Returns a tuple of lists of user IDs and item IDs corresponding to interactions.

interaction_history

The interaction IDs per user.

last_timestamps_matrix

A sparse matrix with the last timestamp for each user, item pair.

num_active_users

The number of users with at least one interaction.

num_interactions

The total number of interactions.

properties

sorted_interaction_history

The interaction IDs per user, sorted by timestamp (ascending).

sorted_item_history

The items each user interacted with, sorted by timestamp (ascending).

timestamps

Timestamps of interactions as a pandas Series, indexed by user ID and item ID.

values

All user-item interactions as a sparse matrix of size (|users|, |items|).

class InteractionMatrixProperties(num_users: int, num_items: int, has_timestamps: bool)

property active_users: Set[int]

The set of all users with at least one interaction.

Returns

Set of user IDs with at least one interaction.

Return type

Set[int]

property binary_item_history: Iterator[Tuple[int, List[int]]]

The unique items interacted with, per user.

Yield

Tuples of user ID, list of distinct item IDs the user interacted with.

Return type

List[Tuple[int, List[int]]]

property binary_values: scipy.sparse._csr.csr_matrix

All user-item interactions as a sparse, binary matrix of size (users, items).

An entry is 1 if there is at least one interaction between that user and item. In all other cases the entry is 0.

Returns

Binary csr_matrix of interactions.

Return type

csr_matrix

copy() recpack.matrix.interaction_matrix.InteractionMatrix

Create a deep copy of this InteractionMatrix.

Returns

Deep copy of this InteractionMatrix.

Return type

InteractionMatrix

property density: float

The density of the interaction matrix.

The density is computed as the fraction of user-item pairs that have at least one interaction.

Returns

The density.

Return type

float
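As a rough illustration of the density definition, the sketch below counts distinct user-item pairs and divides by the number of cells in the matrix. The function `density` here is a hypothetical stand-alone helper, not recpack's property, and it assumes that repeated interactions for the same pair count only once, which is how the description above reads.

```python
from typing import Iterable, Tuple


def density(pairs: Iterable[Tuple[int, int]], num_users: int, num_items: int) -> float:
    """Fraction of (user, item) cells that hold at least one interaction."""
    distinct_pairs = set(pairs)  # duplicates collapse to a single cell
    return len(distinct_pairs) / (num_users * num_items)


# User 0 interacts with item 1 twice; that pair is still one filled cell.
d = density([(0, 1), (0, 1), (1, 0), (2, 2)], num_users=3, num_items=3)
```

Here three of the nine cells are filled, so `d` is one third.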

eliminate_timestamps(inplace: bool = False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]

Remove all timestamp information.

Parameters

inplace (bool) – Modify the data matrix in place. If False, returns a new object.

classmethod from_csr_matrix(X: scipy.sparse._csr.csr_matrix) recpack.matrix.interaction_matrix.InteractionMatrix

Create an InteractionMatrix from a csr_matrix containing interactions.

Warning

No timestamps can be passed this way!

Returns

InteractionMatrix constructed from the csr_matrix.

Return type

InteractionMatrix

get_timestamp(interaction_id: int) int

Return the timestamp of a specific interaction by interaction ID.

Parameters

interaction_id (int) – the interaction ID in the DataFrame to fetch the timestamp of.

Raises

AttributeError – Raised if the object does not have timestamps.

Returns

The timestamp of the interaction.

Return type

int

property has_timestamps: bool

Boolean indicating whether instance has timestamp information.

Returns

True if timestamps information is available, False otherwise.

Return type

bool

property indices: Tuple[List[int], List[int]]

Returns a tuple of lists of user IDs and item IDs corresponding to interactions.

Returns

Tuple of lists of user IDs and item IDs that correspond to at least one interaction.

Return type

Tuple[List[int], List[int]]

indices_in(u_i_lists: Tuple[List[int], List[int]], inplace=False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]

Select interactions between the specified user-item combinations.

Parameters
  • u_i_lists (Tuple[List[int], List[int]]) – Two lists as a tuple: the first holds user indices and the second holds item indices. Both lists should have the same length.

  • inplace (bool, optional) – Apply the selection in place to the object, defaults to False

Returns

None if inplace is True, otherwise a new InteractionMatrix object with the selection of events.

Return type

Union[InteractionMatrix, None]
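The pairing of the two lists can be sketched in plain Python: position k of the user list is matched with position k of the item list, and only interactions for those pairs are kept. `select_pairs` is a hypothetical helper, and the length check that raises `ValueError` is an assumption of this sketch rather than documented recpack behavior.

```python
from typing import List, Sequence, Tuple


def select_pairs(
    events: Sequence[Tuple[int, int]],
    u_i_lists: Tuple[List[int], List[int]],
) -> List[Tuple[int, int]]:
    """Keep only events whose (user, item) pair appears in the zipped lists."""
    users, items = u_i_lists
    if len(users) != len(items):
        # Assumed guard for this sketch only.
        raise ValueError("user and item lists must have the same length")
    wanted = set(zip(users, items))
    return [event for event in events if event in wanted]


events = [(0, 1), (0, 1), (1, 0), (2, 2)]
# Select the pairs (0, 1) and (2, 2); repeated interactions are all kept.
selected = select_pairs(events, ([0, 2], [1, 2]))
```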

property interaction_history: Iterator[Tuple[int, List[int]]]

The interaction IDs per user.

Yield

Tuples of user ID, list of interaction IDs.

Return type

List[Tuple[int, List[int]]]

interactions_in(interaction_ids: List[int], inplace: bool = False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]

Select interactions by their interaction IDs.

Parameters
  • interaction_ids (List[int]) – A list of interaction IDs.

  • inplace (bool, optional) – Apply the selection in place, or return a new InteractionMatrix object, defaults to False

Returns

None if inplace, otherwise a new InteractionMatrix object with the selected interactions.

Return type

Union[None, InteractionMatrix]

property last_timestamps_matrix: scipy.sparse._csr.csr_matrix

A sparse matrix with the last timestamp for each user, item pair.

Using the maximal timestamp for each pair makes it possible to work with datasets that have not been deduplicated.
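The idea of keeping the maximal timestamp per pair can be sketched with a dictionary in plain Python. `last_timestamps` and the sample events are hypothetical; the real property returns a sparse matrix, whereas this sketch uses a dict keyed by (user, item) for brevity.

```python
from typing import Dict, Iterable, Tuple


def last_timestamps(events: Iterable[Tuple[int, int, int]]) -> Dict[Tuple[int, int], int]:
    """Map each (user, item) pair to the largest timestamp seen for it."""
    latest: Dict[Tuple[int, int], int] = {}
    for user, item, timestamp in events:
        key = (user, item)
        if key not in latest or timestamp > latest[key]:
            latest[key] = timestamp
    return latest


# (user id, item id, timestamp); the pair (0, 1) occurs three times.
events = [(0, 1, 10), (0, 1, 25), (1, 0, 7), (0, 1, 18)]
latest = last_timestamps(events)
```

The repeated pair collapses to its most recent timestamp, 25, regardless of the order in which the events arrive.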

classmethod load(file_prefix) recpack.matrix.interaction_matrix.InteractionMatrix

Create a new interaction matrix instance from saved file.

Parameters

file_prefix (str) – The prefix of the files to load. It should end in the file name, without an extension (no .csv or similar).

Returns

InteractionMatrix created from file.

Return type

InteractionMatrix

property num_active_users: int

The number of users with at least one interaction.

Returns

Number of active users.

Return type

int

property num_interactions: int

The total number of interactions.

Returns

Total interaction count.

Return type

int

save(file_prefix: str) None

Save the interaction matrix to files.

Creates two files: one at <file_prefix>.csv with the raw dataframe, and a second at <file_prefix>_properties.yaml, which contains the properties of the interaction matrix.

Parameters

file_prefix (str) – The prefix of the files to save. It should end in the file name, without an extension (no .csv or similar).

property sorted_interaction_history: Iterator[Tuple[int, List[int]]]

The interaction IDs per user, sorted by timestamp (ascending).

Raises

AttributeError – If there is no timestamp column, so the interactions cannot be sorted.

Yield

Tuple of user ID, list of interaction IDs sorted by timestamp.

Return type

List[Tuple[int, List[int]]]

property sorted_item_history: Iterator[Tuple[int, List[int]]]

The items each user interacted with, sorted by timestamp (ascending).

Raises

AttributeError – If there is no timestamp column.

Yield

Tuple of user ID, list of item IDs sorted by timestamp.

Return type

List[Tuple[int, List[int]]]
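A minimal sketch of a per-user, timestamp-ordered item history in plain Python follows. The generator below is a hypothetical stand-alone function, not recpack's property. It assumes that users are yielded in ascending id order and that repeated items are kept; neither detail is stated in the description above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def sorted_item_history(
    events: Iterable[Tuple[int, int, int]],
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (user, items) with items ordered by ascending timestamp."""
    per_user: Dict[int, List[int]] = defaultdict(list)
    # Sorting all events by timestamp first keeps each user's list ordered.
    for user, item, _timestamp in sorted(events, key=lambda e: e[2]):
        per_user[user].append(item)
    for user in sorted(per_user):
        yield user, per_user[user]


# (user id, item id, timestamp)
events = [(1, 4, 30), (0, 2, 20), (0, 3, 10), (1, 4, 5)]
history = list(sorted_item_history(events))
```

User 0 saw item 3 before item 2, so the history for that user is `[3, 2]`; user 1 interacted with item 4 twice, and both occurrences are kept in this sketch.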

property timestamps: pandas.core.series.Series

Timestamps of interactions as a pandas Series, indexed by user ID and item ID.

Raises

AttributeError – If there is no timestamp column.

Returns

Series of interaction timestamps with a multi-index on (user ID, item ID).

Return type

pd.Series

timestamps_gt(timestamp: float, inplace: bool = False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]

Select interactions after a given timestamp.

Parameters
  • timestamp (float) – The timestamp against which each interaction's timestamp is compared.

  • inplace (bool, optional) – Apply the selection in place if True, defaults to False

Returns

None if inplace, otherwise returns a new InteractionMatrix object

Return type

Union[InteractionMatrix, None]

timestamps_gte(timestamp: float, inplace: bool = False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]

Select interactions at or after a given timestamp.

Parameters
  • timestamp (float) – The timestamp against which each interaction's timestamp is compared.

  • inplace (bool, optional) – Apply the selection in place if True, defaults to False

Returns

None if inplace, otherwise returns a new InteractionMatrix object

Return type

Union[InteractionMatrix, None]

timestamps_lt(timestamp: float, inplace: bool = False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]

Select interactions before a given timestamp.

Parameters
  • timestamp (float) – The timestamp against which each interaction's timestamp is compared.

  • inplace (bool, optional) – Apply the selection in place if True, defaults to False

Returns

None if inplace, otherwise returns a new InteractionMatrix object

Return type

Union[InteractionMatrix, None]

timestamps_lte(timestamp: float, inplace: bool = False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]

Select interactions up to and including a given timestamp.

Parameters
  • timestamp (float) – The timestamp against which each interaction's timestamp is compared.

  • inplace (bool, optional) – Apply the selection in place if True, defaults to False

Returns

None if inplace, otherwise returns a new InteractionMatrix object

Return type

Union[InteractionMatrix, None]
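The four timestamp selections differ only in the comparison they apply. The sketch below maps the method suffixes to comparison operators in plain Python; `filter_by_timestamp` and `COMPARATORS` are hypothetical names, and reading `gt`, `gte`, `lt` and `lte` as `>`, `>=`, `<` and `<=` is an assumption based on the method names.

```python
import operator
from typing import Callable, Dict, List, Sequence, Tuple

Event = Tuple[int, int, float]  # (user id, item id, timestamp)

# Assumed mapping from method suffix to comparison operator.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def filter_by_timestamp(events: Sequence[Event], timestamp: float, kind: str) -> List[Event]:
    """Keep events whose timestamp satisfies the chosen comparison."""
    keep = COMPARATORS[kind]
    return [event for event in events if keep(event[2], timestamp)]


events = [(0, 1, 5), (0, 2, 10), (1, 1, 15)]
after = filter_by_timestamp(events, 10, "gt")
at_or_after = filter_by_timestamp(events, 10, "gte")
before = filter_by_timestamp(events, 10, "lt")
up_to = filter_by_timestamp(events, 10, "lte")
```

Only the inclusive variants keep the event whose timestamp equals the threshold, which matters when splitting data at a boundary: pairing `lt` with `gte` (or `lte` with `gt`) places every interaction in exactly one of the two parts.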

union(im: recpack.matrix.interaction_matrix.InteractionMatrix) recpack.matrix.interaction_matrix.InteractionMatrix

Combine events from this InteractionMatrix with another.

The matrices need to have the same shape and either both have timestamps or neither.

Parameters

im (InteractionMatrix) – InteractionMatrix to union with.

Returns

Union of interactions in this InteractionMatrix and the other.

Return type

InteractionMatrix

users_in(U: Union[Set[int], List[int]], inplace=False) Optional[recpack.matrix.interaction_matrix.InteractionMatrix]

Keep only interactions by one of the specified users.

Parameters
  • U (Union[Set[int], List[int]]) – A Set or List of users to select the interactions from.

  • inplace (bool, optional) – Apply the selection in place or not, defaults to False

Returns

None if inplace, otherwise returns a new InteractionMatrix object

Return type

Union[InteractionMatrix, None]
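The `inplace` convention shared by the selection methods can be sketched with a small stand-in class. `EventLog` is hypothetical and much simpler than InteractionMatrix; it only shows the documented return behavior, where an in-place selection returns None and a regular selection returns a new object.

```python
from typing import Iterable, List, Optional, Tuple


class EventLog:
    """Hypothetical stand-in for an interaction container; not recpack's class."""

    def __init__(self, events: Iterable[Tuple[int, int]]):
        self.events: List[Tuple[int, int]] = list(events)

    def users_in(self, users: Iterable[int], inplace: bool = False) -> Optional["EventLog"]:
        """Keep only events by the given users."""
        wanted = set(users)
        kept = [event for event in self.events if event[0] in wanted]
        if inplace:
            self.events = kept  # mutate this object and return nothing
            return None
        return EventLog(kept)  # leave this object untouched


log = EventLog([(0, 1), (1, 0), (2, 2)])
subset = log.users_in({0, 2})             # new object; log is unchanged
result = log.users_in([1], inplace=True)  # log itself is filtered
```

With `inplace=True` the return value is None, so chaining calls only works in the default, copying mode.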

property values: scipy.sparse._csr.csr_matrix

All user-item interactions as a sparse matrix of size (|users|, |items|).

Each entry is the number of interactions between that user and item. If there are no interactions between a user and item, the entry is 0.

Returns

Interactions between users and items as a csr_matrix.

Return type

csr_matrix
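The difference between `values` and `binary_values` can be illustrated without sparse matrices. In the sketch below, a `Counter` plays the role of the count matrix and a plain dict plays the role of the binary matrix; both containers and the helper `entry` are hypothetical stand-ins, not recpack's data structures.

```python
from collections import Counter
from typing import Dict, Mapping, Tuple

# Hypothetical interaction log of (user id, item id) pairs.
# User 0 interacts with item 1 twice.
log = [(0, 1), (0, 1), (1, 0), (2, 2)]

# Analogous to `values`: number of interactions per (user, item) pair.
counts: Counter = Counter(log)

# Analogous to `binary_values`: 1 for every pair with at least one interaction.
binary: Dict[Tuple[int, int], int] = {pair: 1 for pair in counts}


def entry(matrix: Mapping[Tuple[int, int], int], user: int, item: int) -> int:
    """Read a cell, treating missing pairs as 0 like an implicit sparse zero."""
    return matrix.get((user, item), 0)


repeated_count = entry(counts, 0, 1)   # 2: the pair occurs twice
repeated_binary = entry(binary, 0, 1)  # 1: presence only
missing = entry(counts, 1, 1)          # 0: no interaction
```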