recpack.scenarios.StrongGeneralizationTimedMostRecent

class recpack.scenarios.StrongGeneralizationTimedMostRecent(t: float, t_validation: Optional[float] = None, n_most_recent_out: int = 1, validation: bool = False, seed=None)

Predict the next interaction(s) for previously unseen users.

  • full_training_data contains all events of the users whose most recent interaction was before t.

Test data contains all users whose most recent interaction was at or after t:

  • test_data_out contains the n_most_recent_out most recent interactions of each test user.

  • test_data_in contains all earlier interactions of the test users.
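The fold-in/hold-out rule for a single test user can be sketched in plain Python. This is not recpack's implementation: it assumes a user's history is a list of hypothetical (item, timestamp) pairs, and that events with equal timestamps keep their input order.

```python
from typing import List, Tuple

# Hypothetical representation of one user's history: (item, timestamp) pairs.
Event = Tuple[str, float]


def split_most_recent(
    events: List[Event], n_most_recent_out: int
) -> Tuple[List[Event], List[Event]]:
    """Split one user's history into (data_in, data_out).

    data_out holds the n_most_recent_out most recent events and data_in
    holds every earlier event. Ties in timestamp keep their input order
    (an assumption; sorted() is stable).
    """
    ordered = sorted(events, key=lambda event: event[1])  # oldest first
    cut = max(len(ordered) - n_most_recent_out, 0)
    return ordered[:cut], ordered[cut:]


# Bob's timestamps from the example on this page; the item ids are made up.
bob = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
bob_in, bob_out = split_most_recent(bob, n_most_recent_out=2)
print(bob_in)   # [('a', 1), ('b', 2)]
print(bob_out)  # [('c', 3), ('d', 4)]
```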

If validation data is requested, the validation evaluation users are those training users whose most recent interaction occurred at or after t_validation.

  • validation_training_data contains all events of the training users whose most recent interaction happened before t_validation.

  • validation_data_out contains the n_most_recent_out most recent interactions of each user whose most recent interaction was in the interval [t_validation, t).

  • validation_data_in contains all earlier interactions of the validation evaluation users.
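Which set a user ends up in depends only on the time of that user's last interaction. The sketch below illustrates this, assuming a hypothetical dict that maps each user to their last timestamp; the function name assign_users is made up and is not part of recpack.

```python
from typing import Dict, Optional


def assign_users(
    last_time: Dict[str, float], t: float, t_validation: Optional[float] = None
) -> Dict[str, str]:
    """Assign each user to a split based on the time of their last interaction.

    'test'       : last interaction at or after t
    'validation' : last interaction in [t_validation, t)
    'train'      : everything else (validation_training_data when validating)
    Users labelled 'validation' or 'train' together form full_training_data.
    """
    assignment = {}
    for user, last in last_time.items():
        if last >= t:
            assignment[user] = "test"
        elif t_validation is not None and last >= t_validation:
            assignment[user] = "validation"
        else:
            assignment[user] = "train"
    return assignment


last_time = {"Alice": 1, "Bob": 4, "Carol": 3}
print(assign_users(last_time, t=4, t_validation=2))
# {'Alice': 'train', 'Bob': 'test', 'Carol': 'validation'}
print(assign_users(last_time, t=4))
# {'Alice': 'train', 'Bob': 'test', 'Carol': 'train'}
```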

Warning

The scenario can only be used when the dataset has timestamp information.

Example

As an example, splitting the following data with t = 4, t_validation = 2, n_most_recent_out = 2 and validation = True:

time    0   1   2   3   4   5
Alice   X   X
Bob         X   X   X   X
Carol       X   X   X

would yield full_training_data:

time    0   1   2   3   4   5
Alice   X   X
Carol       X   X   X

validation_training_data:

time    0   1   2   3   4   5
Alice   X   X

validation_data_in:

time    0   1   2   3   4   5
Carol       X

validation_data_out:

time    0   1   2   3   4   5
Carol           X   X

test_data_in:

time    0   1   2   3   4   5
Bob         X   X

test_data_out:

time    0   1   2   3   4   5
Bob                 X   X
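The tables above hold out the two most recent interactions of each evaluation user. A toy re-implementation of the split rules (the function split_example is hypothetical, works on timestamps only and is not recpack's code) reproduces them when called with n_most_recent_out=2:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The example data as (user, timestamp) pairs; item ids are left out.
EVENTS: List[Tuple[str, int]] = [
    ("Alice", 0), ("Alice", 1),
    ("Bob", 1), ("Bob", 2), ("Bob", 3), ("Bob", 4),
    ("Carol", 1), ("Carol", 2), ("Carol", 3),
]


def split_example(
    events: List[Tuple[str, int]], t: int, t_validation: int, n_most_recent_out: int
) -> Dict[str, Dict[str, List[int]]]:
    """Return, per output set, a mapping user -> sorted timestamps."""
    history: Dict[str, List[int]] = defaultdict(list)
    for user, ts in events:
        history[user].append(ts)

    result: Dict[str, Dict[str, List[int]]] = defaultdict(dict)
    for user, times in history.items():
        times = sorted(times)
        if times[-1] >= t:
            prefix = "test"
        else:
            # Every non-test user belongs to full_training_data.
            result["full_training_data"][user] = times
            if times[-1] < t_validation:
                result["validation_training_data"][user] = times
                continue
            prefix = "validation"
        # Hold out the n most recent interactions, keep the rest as fold-in.
        cut = max(len(times) - n_most_recent_out, 0)
        result[prefix + "_data_in"][user] = times[:cut]
        result[prefix + "_data_out"][user] = times[cut:]
    return dict(result)


parts = split_example(EVENTS, t=4, t_validation=2, n_most_recent_out=2)
for name, users in parts.items():
    print(name, users)
```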

Parameters
  • t – Users whose last action has time >= t are placed in the test set, all other users are placed in the training or validation sets.

  • t_validation – Users whose last action has time >= t_validation and time < t are put in the validation set. Users whose last action has time < t_validation are put in train. Only required if validation is True.

  • n_most_recent_out – The number of most recent interactions per evaluation user to hold out as the target. Defaults to 1.

  • validation (boolean, optional) – Assign a portion of the full training dataset to validation data if True, else split without validation data into only a training and test dataset.

  • seed (int, optional) – Seed for the randomised parts of the scenario. This scenario is deterministic, so the seed should not affect the result. Defaults to None, in which case a random seed is generated.
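The parameter descriptions imply two consistency requirements: t_validation must be given when validation is True, and the validation window [t_validation, t) is only non-empty when t_validation < t. The helper below is hypothetical and only spells these out; whether and how recpack itself enforces them (exception type, message) is an assumption not stated on this page.

```python
from typing import Optional


def check_scenario_params(
    t: float,
    t_validation: Optional[float] = None,
    validation: bool = False,
) -> None:
    """Hypothetical checks implied by the parameter descriptions."""
    if validation and t_validation is None:
        # t_validation is only required (and only used) when validation=True.
        raise ValueError("t_validation is required when validation=True")
    if validation and t_validation >= t:
        # The validation window [t_validation, t) would be empty.
        raise ValueError("t_validation must be smaller than t")


check_scenario_params(4, 2, validation=True)  # the example's settings pass
check_scenario_params(4)                      # no validation, no t_validation needed
```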

Methods

split(data_m)

Splits data_m according to the scenario.

Attributes

full_training_data

The full training dataset, which should be used for a final training after hyperparameter optimisation.

test_data

The test dataset.

test_data_in

Fold-in part of the test dataset.

test_data_out

Held-out part of the test dataset.

validation_data

The validation dataset.

validation_data_in

Fold-in part of the validation dataset.

validation_data_out

Held-out part of the validation dataset.

validation_training_data

The training data to be used during validation.

property full_training_data: recpack.matrix.interaction_matrix.InteractionMatrix

The full training dataset, which should be used for a final training after hyperparameter optimisation.

Returns

Interaction Matrix of training interactions.

Return type

InteractionMatrix

split(data_m: recpack.matrix.interaction_matrix.InteractionMatrix) → None

Splits data_m according to the scenario.

After splitting, the properties full_training_data, validation_training_data, validation_data and test_data can be used to retrieve the split data.

Parameters

data_m – Interaction matrix that should be split.

property test_data: Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]

The test dataset. Consists of a fold-in and a hold-out set of interactions.

Data is processed such that both matrices contain the exact same users. Users that were present in only one of the matrices and not in the other are removed.
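The user alignment described above amounts to intersecting the user sets of the two matrices. A minimal sketch, assuming a hypothetical dict-of-lists format instead of InteractionMatrix and treating a user as present only if they have at least one interaction (align_users and the user Dave are made up for illustration):

```python
from typing import Dict, List, Tuple

History = Dict[str, List[int]]  # user -> interactions (hypothetical format)


def align_users(data_in: History, data_out: History) -> Tuple[History, History]:
    """Keep only users that have interactions in both data_in and data_out."""
    users_in = {user for user, events in data_in.items() if events}
    users_out = {user for user, events in data_out.items() if events}
    shared = users_in & users_out
    return (
        {user: data_in[user] for user in data_in if user in shared},
        {user: data_out[user] for user in data_out if user in shared},
    )


# Dave has a single interaction, so with n_most_recent_out=1 his fold-in is empty.
test_in = {"Bob": [1, 2], "Dave": []}
test_out = {"Bob": [3, 4], "Dave": [7]}
print(align_users(test_in, test_out))
# ({'Bob': [1, 2]}, {'Bob': [3, 4]})
```

In this scenario, that filtering would likely drop any evaluation user whose entire history fits within n_most_recent_out, since their fold-in part is empty.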

Returns

Test data matrices as InteractionMatrix in, InteractionMatrix out.

Return type

Tuple[InteractionMatrix, InteractionMatrix]

property test_data_in

Fold-in part of the test dataset.

property test_data_out

Held-out part of the test dataset.

property validation_data: Optional[Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]]

The validation dataset. Consists of a fold-in and a hold-out set of interactions.

Data is processed such that both matrices contain the exact same users. Users that were present in only one of the matrices and not in the other are removed.

Returns

Validation data matrices as InteractionMatrix in, InteractionMatrix out.

Return type

Tuple[InteractionMatrix, InteractionMatrix]

property validation_data_in

Fold-in part of the validation dataset.

property validation_data_out

Held-out part of the validation dataset.

property validation_training_data: recpack.matrix.interaction_matrix.InteractionMatrix

The training data to be used during validation.