recpack.scenarios.StrongGeneralizationTimed

class recpack.scenarios.StrongGeneralizationTimed(frac_users_in, t, t_validation=None, delta_out=None, delta_in=None, validation=False, seed=None)

Predict future interactions for previously unseen users.

full_training_data contains interactions from a fraction frac_users_in of the users. Only interactions whose timestamps are in the interval [t - delta_in, t[ are used.
test_data_in contains the interactions of the remaining users (a fraction 1 - frac_users_in of all users) whose timestamps are in [t - delta_in, t[.
test_data_out contains interactions of the test users with timestamps in [t, t + delta_out[.

If validation data is requested, 80% of the full training users are used as validation training users; the remaining 20% are used for validation evaluation:

validation_training_data contains interactions of validation training users with timestamps in [t_validation - delta_in, t_validation[.
validation_data_in contains all interactions of the validation evaluation users with timestamps in [t_validation - delta_in, t_validation[.
validation_data_out contains the interactions of the validation evaluation users with timestamps in [t_validation, min(t, t_validation + delta_out)[.
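The intervals above can be restated as a small Python sketch that computes each window's bounds. It is not part of recpack: the helpers `scenario_windows` and `in_window` are hypothetical, every window is half-open ([start, end[), and a bound of None stands for an unbounded side, mirroring delta_in=None and delta_out=None.

```python
from typing import Dict, Optional, Tuple

# Hypothetical helper, not recpack API.
# (start, end) of a half-open window [start, end[; None means unbounded on that side.
Window = Tuple[Optional[int], Optional[int]]


def scenario_windows(
    t: int,
    t_validation: Optional[int] = None,
    delta_in: Optional[int] = None,
    delta_out: Optional[int] = None,
) -> Dict[str, Window]:
    """Restate the documented timestamp windows for each dataset."""

    def before(split: int) -> Window:
        # [split - delta_in, split[ ; no lower bound when delta_in is None
        return (None if delta_in is None else split - delta_in, split)

    def after(split: int, cap: Optional[int] = None) -> Window:
        # [split, split + delta_out[ ; optionally capped at `cap`
        end = None if delta_out is None else split + delta_out
        if cap is not None:
            end = cap if end is None else min(cap, end)
        return (split, end)

    windows = {
        "full_training_data": before(t),
        "test_data_in": before(t),
        "test_data_out": after(t),
    }
    if t_validation is not None:
        windows["validation_training_data"] = before(t_validation)
        windows["validation_data_in"] = before(t_validation)
        # validation targets never reach past t: [t_validation, min(t, t_validation + delta_out)[
        windows["validation_data_out"] = after(t_validation, cap=t)
    return windows


def in_window(ts: int, window: Window) -> bool:
    """True if timestamp `ts` falls inside the half-open window."""
    start, end = window
    return (start is None or ts >= start) and (end is None or ts < end)
```

With the parameters of the example below (t=4, t_validation=2, delta_in=None, delta_out=2), this gives (None, 4) for full_training_data, (4, 6) for test_data_out and (2, 4) for validation_data_out.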

Warning

The scenario can only be used when the dataset has timestamp information.

Example

As an example, we split this data with frac_users_in = 5/6, t = 4, t_validation = 2, delta_in = None (infinity), delta_out = 2 and validation = True:

time    0   1   2   3   4   5
Alice   X   X
Bob         X   X   X   X
Carol   X   X       X       X
Dave    X       X   X
Erin    X   X   X           X
Frank   X   X   X   X

would yield full_training_data:

time    0   1   2   3   4   5
Alice   X   X
Bob         X   X   X
Dave    X       X   X
Erin    X   X   X
Frank   X   X   X   X

validation_training_data:

time    0   1   2   3   4   5
Alice   X   X
Dave    X
Erin    X   X
Frank   X   X

validation_data_in:

time    0   1   2   3   4   5
Bob         X

validation_data_out:

time    0   1   2   3   4   5
Bob             X   X

test_data_in:

time    0   1   2   3   4   5
Carol   X   X       X

test_data_out:

time    0   1   2   3   4   5
Carol                       X
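To check the example by hand, the following sketch applies the documented intervals to the toy data using plain Python dicts instead of InteractionMatrix objects. The user roles are fixed by hand (Carol as test user, Bob as validation evaluation user) to stand in for the outcome of the random user split, so this illustrates the time windows only, not recpack's splitter; the helper `select` is hypothetical.

```python
from typing import Dict, List, Optional, Set

# The toy dataset from the example: user -> timestamps of their interactions.
DATA: Dict[str, List[int]] = {
    "Alice": [0, 1],
    "Bob": [1, 2, 3, 4],
    "Carol": [0, 1, 3, 5],
    "Dave": [0, 2, 3],
    "Erin": [0, 1, 2, 5],
    "Frank": [0, 1, 2, 3],
}


def select(users: Set[str], start: Optional[int], end: Optional[int]) -> Dict[str, List[int]]:
    """Keep interactions of `users` with start <= ts < end (None = unbounded)."""
    out = {}
    for user in sorted(users):
        kept = [ts for ts in DATA[user]
                if (start is None or ts >= start) and (end is None or ts < end)]
        if kept:  # users without interactions in the window are dropped
            out[user] = kept
    return out


t, t_validation, delta_out = 4, 2, 2  # delta_in = None (infinity)
test_users = {"Carol"}  # assumed outcome of the random user split
val_users = {"Bob"}     # assumed outcome of the random 80/20 validation split
train_users = set(DATA) - test_users
val_train_users = train_users - val_users

full_training_data = select(train_users, None, t)
test_data_in = select(test_users, None, t)
test_data_out = select(test_users, t, t + delta_out)
validation_training_data = select(val_train_users, None, t_validation)
validation_data_in = select(val_users, None, t_validation)
validation_data_out = select(val_users, t_validation, min(t, t_validation + delta_out))
```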

Parameters

frac_users_in (float) – The fraction of users to use as training users.
t (int) – Timestamp to split the interactions of the test users into test_data_in and test_data_out; and select full_training_data out of all interactions of the training users.
t_validation (int, optional) – Timestamp to split the interactions of the validation users into validation_data_in and validation_data_out; and select validation_training_data out of all interactions of the training users. Required if validation is True.
delta_out (int, optional) – Size of interval in seconds for the target datasets (test_data_out and validation_data_out). Both sets will contain interactions that occurred within delta_out seconds after the splitting timestamp; validation_data_out is additionally capped at t. Defaults to None (all interactions past the splitting timestamp).
delta_in (int, optional) – Size of interval in seconds for full_training_data, validation_training_data, validation_data_in and test_data_in. All sets will contain interactions that occurred within delta_in seconds before the splitting timestamp. Defaults to None (all interactions before the splitting timestamp).
validation (boolean, optional) – Assign a portion of the training dataset to validation data if True, else split without validation data into only a training and test dataset. Defaults to False.
seed (int, optional) – The seed to use for the random components of the splitter. If None, a random seed will be used. Defaults to None.
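frac_users_in and seed control a random partition of the users, and the validation split partitions the training users again (80/20). The sketch below shows one way such a seeded partition can work. It is an assumption, not recpack's implementation: recpack may use a different random generator and rounding rule, so the same seed will not reproduce recpack's assignment. The function name `split_users` is hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def split_users(
    users: Sequence[str], frac_in: float, seed: Optional[int] = None
) -> Tuple[List[str], List[str]]:
    """Hypothetical seeded user split (not recpack's implementation).

    Puts round(frac_in * n) shuffled users in the first group and the rest
    in the second; the rounding rule here is an assumption.
    """
    rng = random.Random(seed)  # seed=None -> seeded from system entropy
    shuffled = sorted(users)   # sort first so the result does not depend on input order
    rng.shuffle(shuffled)
    n_in = round(frac_in * len(shuffled))
    return shuffled[:n_in], shuffled[n_in:]


users = ["Alice", "Bob", "Carol", "Dave", "Erin", "Frank"]
train_users, test_users = split_users(users, 5 / 6, seed=42)
val_train_users, val_eval_users = split_users(train_users, 0.8, seed=42)
```

Using a dedicated random.Random instance keeps the split reproducible for a given seed without touching Python's global random state.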

Methods

split(data_m)

Splits data_m according to the scenario.

Attributes

`full_training_data`	The full training dataset, which should be used for a final training after hyperparameter optimisation.
`test_data`	The test dataset.
`test_data_in`	Fold-in part of the test dataset.
`test_data_out`	Held-out part of the test dataset.
`validation_data`	The validation dataset.
`validation_data_in`	Fold-in part of the validation dataset.
`validation_data_out`	Held-out part of the validation dataset.
`validation_training_data`	The training data to be used during validation.

property full_training_data: recpack.matrix.interaction_matrix.InteractionMatrix

The full training dataset, which should be used for a final training after hyperparameter optimisation.

Returns: Interaction Matrix of training interactions.
Return type: InteractionMatrix

split(data_m: recpack.matrix.interaction_matrix.InteractionMatrix) → None

Splits data_m according to the scenario.

After splitting, the properties full_training_data, validation_training_data, validation_data and test_data can be used to retrieve the split data.

Parameters: data_m – Interaction matrix that should be split.

property test_data: Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]

The test dataset. Consists of a fold-in and a hold-out set of interactions.

Data is processed such that both matrices contain the exact same users. Users that were present in only one of the matrices and not in the other are removed.

Returns: Test data matrices as InteractionMatrix in, InteractionMatrix out.
Return type: Tuple[InteractionMatrix, InteractionMatrix]
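The user alignment described above amounts to intersecting the user sets of the fold-in and hold-out parts. A minimal sketch, assuming each part is a plain dict mapping a user to their interaction timestamps (rather than an InteractionMatrix); `align_users` is a hypothetical helper, not recpack API.

```python
from typing import Dict, List, Tuple


def align_users(
    data_in: Dict[str, List[int]], data_out: Dict[str, List[int]]
) -> Tuple[Dict[str, List[int]], Dict[str, List[int]]]:
    """Keep only users that have interactions in both the fold-in and hold-out parts."""
    shared = data_in.keys() & data_out.keys()  # users present in both parts
    return (
        {u: v for u, v in data_in.items() if u in shared},
        {u: v for u, v in data_out.items() if u in shared},
    )


fold_in, hold_out = align_users(
    {"Carol": [0, 1, 3], "Zoe": [2]},  # Zoe has no held-out interactions
    {"Carol": [5], "Yann": [4]},       # Yann has no fold-in interactions
)
```

Here only Carol survives in both parts; Zoe and Yann are dropped because each appears in only one of them.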

property test_data_in: Fold-in part of the test dataset

property test_data_out: Held-out part of the test dataset

property validation_data: Optional[Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]]

The validation dataset. Consists of a fold-in and a hold-out set of interactions.

Data is processed such that both matrices contain the exact same users. Users that were present in only one of the matrices and not in the other are removed.

Returns: Validation data matrices as InteractionMatrix in, InteractionMatrix out.
Return type: Tuple[InteractionMatrix, InteractionMatrix]

property validation_data_in: Fold-in part of the validation dataset

property validation_data_out: Held-out part of the validation dataset

property validation_training_data: recpack.matrix.interaction_matrix.InteractionMatrix: The training data to be used during validation.