recpack.scenarios.TimedLastItemPrediction

class recpack.scenarios.TimedLastItemPrediction(t: float, t_validation: Optional[float] = None, n_most_recent_in: Optional[int] = 2147483647, delta_out: int = 2147483647, validation: bool = False, seed: Optional[int] = None)

Predict users’ last interaction, given information about historical interactions.

Scenario frequently used in evaluation of sequential recommendation algorithms.

Warning

The scenario can only be used when the dataset has timestamp information, because the order of interactions is needed to correctly split the data.

The scenario splits the data such that the last interaction of a user is the target for prediction, while the earlier ones are used for training and as history. Only users with an interaction after t (or t_validation) are considered for evaluation.

full_training_data contains all events from all users before t.
validation_training_data contains all events from all users before t_validation.
validation_data_out contains the most recent interaction of a user in the interval [t_validation, min(t, t_validation + delta_out)[.
validation_data_in contains the earlier interactions of the validation users. The n_most_recent_in are used per user.
test_data_out contains the most recent interaction of a user in the interval [t, t + delta_out].
test_data_in contains the earlier interactions of the test users. The n_most_recent_in are used per user.

Example

As an example, splitting the following data with t = 4, t_validation = 2 and validation = True:

time    0   1   2   3   4   5
Alice   X   X
Bob         X   X   X   X
Carol       X   X   X

would yield full_training_data:

time    0   1   2   3   4   5
Alice   X   X
Bob         X   X   X
Carol       X   X   X

validation_training_data:

time    0   1   2   3   4   5
Alice   X   X
Bob         X
Carol       X

validation_data_in:

time    0   1   2   3   4   5
Bob         X   X
Carol       X   X

validation_data_out:

time    0   1   2   3   4   5
Bob                 X
Carol               X

test_data_in:

time    0   1   2   3   4   5
Bob         X   X   X

test_data_out:

time    0   1   2   3   4   5
Bob                     X
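
The tables above can be reproduced with a few lines of plain Python. This is a minimal sketch of the splitting rules as described on this page, not recpack's implementation: it tracks only timestamps per user (no items), assumes the default n_most_recent_in and delta_out (effectively unbounded), and the helper names before and last_item_split are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Events = Dict[str, List[int]]  # user -> interaction timestamps


def before(events: Events, t: float) -> Events:
    """Keep each user's interactions with a timestamp strictly before t."""
    kept = {user: sorted(x for x in ts if x < t) for user, ts in events.items()}
    return {user: ts for user, ts in kept.items() if ts}


def last_item_split(events: Events, lower: float, upper: float) -> Tuple[Events, Events]:
    """Return (data_in, data_out) with targets taken from [lower, upper[."""
    data_in: Events = {}
    data_out: Events = {}
    for user, ts in events.items():
        in_window = [x for x in ts if lower <= x < upper]
        if not in_window:
            continue  # no interaction to predict for this user
        target = max(in_window)
        history = sorted(x for x in ts if x < target)
        if not history:
            continue  # nothing to fold in
        data_in[user] = history
        data_out[user] = [target]
    return data_in, data_out


# Same data as the example tables.
events = {"Alice": [0, 1], "Bob": [1, 2, 3, 4], "Carol": [1, 2, 3]}
t, t_validation = 4, 2

full_training_data = before(events, t)
validation_training_data = before(events, t_validation)
validation_data_in, validation_data_out = last_item_split(events, t_validation, t)
test_data_in, test_data_out = last_item_split(events, t, math.inf)

print(validation_data_in)  # {'Bob': [1, 2], 'Carol': [1, 2]}
print(test_data_out)       # {'Bob': [4]}
```

Note that the fold-in history runs up to the target interaction, not up to the split timestamp, which is why Bob's validation history includes his interaction at time 2.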

Parameters

t – Timestamp for splitting full training data and test data. Simulates a training moment in real time evaluation.
t_validation – Timestamp for splitting validation training and evaluation data. Only required if validation is True.
n_most_recent_in (int, optional) – Number of interactions per user to use as history; the most recent ones are kept. Defaults to the maximum 32-bit integer value (2147483647).
delta_out (int, optional) – Number of seconds past t (or t_validation) that bounds the timestamps of interactions in the held-out datasets. Defaults to np.iinfo(np.int32).max, which is effectively unbounded.
validation (boolean, optional) – If True, also create validation datasets by splitting at t_validation; otherwise split into a training and a test dataset only. Defaults to False.
seed (int, optional) – Seed for the random parts of the scenario. This scenario is deterministic, so changing the seed should have no effect. Defaults to None, in which case a random seed is generated.
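
To illustrate how n_most_recent_in and delta_out interact, the sketch below applies both bounds to the timeline of a single user. It is an illustration under assumptions rather than the library's code: the function bounded_split and the sample timestamps are hypothetical, and the upper bound t + delta_out is treated as inclusive, following the test interval as written above.

```python
from typing import List, Optional, Tuple

MAX_INT32 = 2**31 - 1  # 2147483647, the documented default for both parameters


def bounded_split(
    timestamps: List[int],
    t: int,
    n_most_recent_in: int = MAX_INT32,
    delta_out: int = MAX_INT32,
) -> Optional[Tuple[List[int], int]]:
    """Return (history, target) for one user, or None if there is no target."""
    # Candidate targets: interactions at or after t, at most delta_out later.
    window = [x for x in timestamps if t <= x <= t + delta_out]
    if not window:
        return None
    target = max(window)
    # History: interactions before the target, truncated to the most recent ones.
    earlier = sorted(x for x in timestamps if x < target)
    history = earlier[-n_most_recent_in:] if n_most_recent_in > 0 else []
    return history, target


timeline = [10, 20, 30, 40, 55, 90]

print(bounded_split(timeline, t=35))
# ([10, 20, 30, 40, 55], 90)
print(bounded_split(timeline, t=35, n_most_recent_in=2, delta_out=30))
# ([30, 40], 55)
```

With the defaults, the target is the user's very last interaction and the full history is kept. A smaller delta_out can move the target to an earlier interaction, and n_most_recent_in then trims the history that precedes it.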

Methods

split(data_m)

Splits data_m according to the scenario.

Attributes

`full_training_data`	The full training dataset, which should be used for a final training after hyper parameter optimisation.
`test_data`	The test dataset.
`test_data_in`	Fold-in part of the test dataset
`test_data_out`	Held-out part of the test dataset
`validation_data`	The validation dataset.
`validation_data_in`	Fold-in part of the validation dataset
`validation_data_out`	Held-out part of the validation dataset
`validation_training_data`	The training data to be used during validation.

property full_training_data: recpack.matrix.interaction_matrix.InteractionMatrix

The full training dataset, which should be used for a final training after hyper parameter optimisation.

Returns: Interaction Matrix of training interactions.
Return type: InteractionMatrix

split(data_m: recpack.matrix.interaction_matrix.InteractionMatrix) → None

Splits data_m according to the scenario.

After splitting, the properties full_training_data, validation_training_data, validation_data and test_data can be used to retrieve the split data.

Parameters: data_m – Interaction matrix that should be split.

property test_data: Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]

The test dataset. Consists of a fold-in and a held-out set of interactions.

Data is processed such that both matrices contain the exact same users. Users that were present in only one of the matrices and not in the other are removed.
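
The user alignment described above amounts to keeping only users that appear in both parts. A minimal sketch, assuming a plain dictionary representation (user to timestamps) rather than InteractionMatrix; the helper align_users and the users Dave and Erin are hypothetical:

```python
from typing import Dict, List, Tuple

Events = Dict[str, List[int]]  # user -> interaction timestamps


def align_users(data_in: Events, data_out: Events) -> Tuple[Events, Events]:
    """Drop users that are present in only one of the two parts."""
    shared = sorted(data_in.keys() & data_out.keys())
    return (
        {user: data_in[user] for user in shared},
        {user: data_out[user] for user in shared},
    )


# Dave has history but no target; Erin has a target but no history.
fold_in = {"Bob": [1, 2, 3], "Dave": [0]}
held_out = {"Bob": [4], "Erin": [5]}

aligned_in, aligned_out = align_users(fold_in, held_out)
print(aligned_in)   # {'Bob': [1, 2, 3]}
print(aligned_out)  # {'Bob': [4]}
```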

Returns: Test data matrices as InteractionMatrix in, InteractionMatrix out.
Return type: Tuple[InteractionMatrix, InteractionMatrix]

property test_data_in

Fold-in part of the test dataset.

property test_data_out

Held-out part of the test dataset.

property validation_data: Optional[Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]]

The validation dataset. Consists of a fold-in and a held-out set of interactions.

Data is processed such that both matrices contain the exact same users. Users that were present in only one of the matrices and not in the other are removed.

Returns: Validation data matrices as InteractionMatrix in, InteractionMatrix out.
Return type: Tuple[InteractionMatrix, InteractionMatrix]

property validation_data_in

Fold-in part of the validation dataset.

property validation_data_out

Held-out part of the validation dataset.

property validation_training_data: recpack.matrix.interaction_matrix.InteractionMatrix

The training data to be used during validation.