recpack.scenarios.StrongGeneralization

class recpack.scenarios.StrongGeneralization(frac_users_train: float = 0.8, frac_interactions_in: float = 0.8, validation: bool = False, seed: Optional[int] = None)

Predict (randomly) held-out interactions of previously unseen users.

During splitting, each user is randomly assigned to one of three groups: training, validation or testing. Validation users are only drawn when validation is True.

The full training data contains frac_users_train of the users. If validation data is requested, validation_training_data contains frac_users_train * 0.8 of the users; the remaining 20% of the full training users are assigned to the validation data.
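The user assignment described above can be sketched in plain Python. This is a minimal sketch, not recpack's implementation: the helper assign_users is hypothetical, it assumes users are shuffled with a seeded random.Random and that group sizes are rounded down to whole users.

```python
import random
from typing import Dict, Hashable, List, Optional


def assign_users(
    users: List[Hashable],
    frac_users_train: float = 0.8,
    validation: bool = False,
    seed: Optional[int] = None,
) -> Dict[str, List[Hashable]]:
    """Hypothetical helper: randomly partition users into the scenario's user groups."""
    rng = random.Random(seed)  # seed=None seeds from system entropy
    shuffled = list(users)
    rng.shuffle(shuffled)

    # Assumption: group sizes are rounded down to whole users.
    n_train = int(len(shuffled) * frac_users_train)
    groups = {
        "full_training": shuffled[:n_train],
        "test": shuffled[n_train:],
    }
    if validation:
        # 80% of the full training users train during validation,
        # the remaining 20% become validation users.
        n_val_train = int(n_train * 0.8)
        groups["validation_training"] = shuffled[:n_val_train]
        groups["validation"] = shuffled[n_val_train:n_train]
    return groups


groups = assign_users([f"u{i}" for i in range(10)], frac_users_train=0.8, validation=True, seed=42)
print({name: len(members) for name, members in groups.items()})
# {'full_training': 8, 'test': 2, 'validation_training': 6, 'validation': 2}
```

Validation and validation-training users are both drawn from the full training users, so the test users stay unseen in every training set.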

The interactions of test and validation users are split into a data_in (fold-in) and a data_out (held-out) set: a fraction frac_interactions_in of each user's interactions is assigned to the fold-in set, the remainder to the held-out set.
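For a single test or validation user, the fold-in/held-out split can be sketched as follows. The helper split_interactions is hypothetical; it assumes the interactions are shuffled with a seeded random.Random and that the fold-in size is rounded down, which may differ from recpack's own rounding.

```python
import random
from typing import Hashable, List, Optional, Tuple


def split_interactions(
    interactions: List[Hashable],
    frac_interactions_in: float = 0.8,
    seed: Optional[int] = None,
) -> Tuple[List[Hashable], List[Hashable]]:
    """Hypothetical helper: split one user's interactions into fold-in and held-out parts."""
    rng = random.Random(seed)
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    # Assumption: the fold-in size is rounded down.
    n_in = int(len(shuffled) * frac_interactions_in)
    return shuffled[:n_in], shuffled[n_in:]


# Bob's items from the example matrix, with frac_interactions_in = 0.75.
data_in, data_out = split_interactions([1, 2, 3, 4], frac_interactions_in=0.75, seed=1)
print(len(data_in), len(data_out))  # 3 1
```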

Strong generalization is considered more robust, harder and more realistic than recpack.scenarios.WeakGeneralization, because the evaluated users are never seen during training.

Example

As an example, we split this interaction matrix with frac_users_train = 5/6, frac_interactions_in = 0.75 and validation = True:

        0   1   2   3   4   5
Alice   X   X
Bob         X   X   X   X
Carol   X   X       X       X
Dave    X       X   X
Erin    X   X   X
Frank   X   X   X   X

Since users and interactions are assigned randomly, one possible outcome is the following full_training_data:

        0   1   2   3   4   5
Alice   X   X
Carol   X   X       X       X
Dave    X       X   X
Erin    X   X   X
Frank   X   X   X   X

validation_training_data:

        0   1   2   3   4   5
Alice   X   X
Carol   X   X       X       X
Erin    X   X   X
Frank   X   X   X   X

validation_data_in:

        0   1   2   3   4   5
Dave    X           X

validation_data_out:

        0   1   2   3   4   5
Dave            X

test_data_in:

        0   1   2   3   4   5
Bob         X       X   X

test_data_out:

        0   1   2   3   4   5
Bob             X
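The group and fold-in sizes in this example can be checked with exact fractions. This sketch assumes fold-in sizes are rounded down; that matches both evaluated users here (3 of Bob's 4 interactions, 2 of Dave's 3) but is not confirmed as recpack's rounding rule.

```python
from fractions import Fraction
from math import floor

# The example interaction matrix, as user -> set of item ids.
matrix = {
    "Alice": {0, 1},
    "Bob": {1, 2, 3, 4},
    "Carol": {0, 1, 3, 5},
    "Dave": {0, 2, 3},
    "Erin": {0, 1, 2},
    "Frank": {0, 1, 2, 3},
}

frac_users_train = Fraction(5, 6)
frac_interactions_in = Fraction(3, 4)

n_users = len(matrix)
n_full_train = n_users * frac_users_train                  # 5 users
n_val_train = n_users * frac_users_train * Fraction(4, 5)  # 4 users
n_val = n_full_train - n_val_train                         # 1 user (Dave)
n_test = n_users - n_full_train                            # 1 user (Bob)

# Fold-in sizes for the two evaluated users, rounded down (assumption).
bob_in = floor(len(matrix["Bob"]) * frac_interactions_in)    # 3 of 4
dave_in = floor(len(matrix["Dave"]) * frac_interactions_in)  # 2 of 3
print(n_full_train, n_val_train, n_val, n_test, bob_in, dave_in)  # 5 4 1 1 3 2
```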

Parameters

frac_users_train (float, optional) – Fraction of users assigned to the full training dataset. Between 0 and 1. Defaults to 0.8.
frac_interactions_in (float, optional) – Fraction of the users’ interactions to be used as fold-in set (user history). Between 0 and 1. Defaults to 0.8.
validation (boolean, optional) – If True, assign a portion of the full training dataset to validation datasets; otherwise split only into a training and a test dataset. Defaults to False.
seed (int, optional) – The seed to use for the random components of the splitter. If None, a random seed will be used. Defaults to None.

Methods

split(data_m)

Splits data_m according to the scenario.

Attributes

`full_training_data`	The full training dataset, which should be used for a final training after hyperparameter optimisation.
`test_data`	The test dataset.
`test_data_in`	Fold-in part of the test dataset.
`test_data_out`	Held-out part of the test dataset.
`validation_data`	The validation dataset.
`validation_data_in`	Fold-in part of the validation dataset.
`validation_data_out`	Held-out part of the validation dataset.
`validation_training_data`	The training data to be used during validation.

property full_training_data: recpack.matrix.interaction_matrix.InteractionMatrix

The full training dataset, which should be used for a final training after hyperparameter optimisation.

Returns: InteractionMatrix of training interactions.
Return type: InteractionMatrix

split(data_m: recpack.matrix.interaction_matrix.InteractionMatrix) → None

Splits data_m according to the scenario.

After splitting, the properties full_training_data, validation_training_data, validation_data and test_data can be used to retrieve the split data.

Parameters: data_m – Interaction matrix that should be split.

property test_data: Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]

The test dataset. Consists of a fold-in and a held-out set of interactions.

Data is processed such that both matrices contain exactly the same users: users present in only one of the two matrices are removed.

Returns: Test data matrices as InteractionMatrix in, InteractionMatrix out.
Return type: Tuple[InteractionMatrix, InteractionMatrix]
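The user alignment described above can be sketched with plain dictionaries standing in for the two interaction matrices. The helper align_users and the users Gina and Hank are hypothetical; a user can end up in only one matrix, for example when all of their interactions land in the held-out set.

```python
from typing import Dict, Hashable, Set, Tuple

UserItems = Dict[Hashable, Set[Hashable]]


def align_users(data_in: UserItems, data_out: UserItems) -> Tuple[UserItems, UserItems]:
    """Hypothetical helper: keep only users present in both fold-in and held-out data."""
    # A user counts as present only if they have at least one interaction.
    users_in = {u for u, items in data_in.items() if items}
    users_out = {u for u, items in data_out.items() if items}
    shared = users_in & users_out
    return (
        {u: items for u, items in data_in.items() if u in shared},
        {u: items for u, items in data_out.items() if u in shared},
    )


test_in = {"Bob": {1, 3, 4}, "Gina": {2}}  # Gina has no held-out interactions
test_out = {"Bob": {2}, "Hank": {5}}       # Hank has no fold-in interactions
aligned_in, aligned_out = align_users(test_in, test_out)
print(aligned_in, aligned_out)  # {'Bob': {1, 3, 4}} {'Bob': {2}}
```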

property test_data_in: Fold-in part of the test dataset

property test_data_out: Held-out part of the test dataset

property validation_data: Optional[Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]]

The validation dataset. Consists of a fold-in and a held-out set of interactions.

Data is processed such that both matrices contain exactly the same users: users present in only one of the two matrices are removed.

Returns: Validation data matrices as InteractionMatrix in, InteractionMatrix out.
Return type: Tuple[InteractionMatrix, InteractionMatrix]

property validation_data_in: Fold-in part of the validation dataset

property validation_data_out: Held-out part of the validation dataset

property validation_training_data: recpack.matrix.interaction_matrix.InteractionMatrix

The training data to be used during validation.