recpack.scenarios.WeakGeneralization
- class recpack.scenarios.WeakGeneralization(frac_data_in: float = 0.8, validation: bool = False, seed: Optional[int] = None)
Predict (randomly) held-out interactions for all users, with remaining data used for training.
For each user, their events I_u are distributed over the datasets as follows:
- full_training_data (IT_u) contains frac_data_in * |I_u| (rounded up) of the user’s interactions in the full dataset.
- test_data_in contains the same events as full_training_data.
- test_data_out contains the remaining (1 - frac_data_in) * |I_u| (rounded down) of the user’s interactions in the full dataset.
- validation_training_data contains frac_data_in * |IT_u| (rounded up) of the user’s interactions in the full training dataset.
- validation_data_in contains the same events as validation_training_data.
- validation_data_out contains the remaining (1 - frac_data_in) * |IT_u| (rounded down) of the user’s interactions in the full training dataset.
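The size arithmetic above can be sketched in plain Python. This is a minimal illustration of the rounding rules only, not RecPack's implementation; the helper name `split_sizes` and the dictionary layout are hypothetical.

```python
import math


def split_sizes(n_interactions: int, frac_data_in: float) -> dict:
    """Per-user dataset sizes implied by the rounding rules (hypothetical helper)."""
    # full_training_data (and test_data_in): frac_data_in * |I_u|, rounded up.
    n_full_train = math.ceil(frac_data_in * n_interactions)
    # test_data_out: whatever is left of the user's interactions.
    n_test_out = n_interactions - n_full_train
    # validation_training_data (and validation_data_in): frac_data_in * |IT_u|, rounded up.
    n_val_train = math.ceil(frac_data_in * n_full_train)
    # validation_data_out: whatever is left of the full training interactions.
    n_val_out = n_full_train - n_val_train
    return {
        "full_training_data": n_full_train,
        "test_data_out": n_test_out,
        "validation_training_data": n_val_train,
        "validation_data_out": n_val_out,
    }


# A user with 3 interactions and one with 4, both split with frac_data_in = 0.5.
print(split_sizes(3, 0.5))
print(split_sizes(4, 0.5))
```

Because the fold-in size is rounded up, a user with a single interaction keeps it in the training data and has nothing left for the held-out part.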
Example
As an example, splitting the following data with frac_data_in = 0.5 and validation = True:
item 0 1 2 3 4 5
Alice X X X
Bob X X X X
would yield full_training_data:
item 0 1 2 3 4 5
Alice X X
Bob X X
validation_training_data:
item 0 1 2 3 4 5
Alice X
Bob X
validation_data_in:
item 0 1 2 3 4 5
Alice X
Bob X
validation_data_out:
item 0 1 2 3 4 5
Alice X
Bob X
test_data_in:
item 0 1 2 3 4 5
Alice X X
Bob X X
test_data_out:
item 0 1 2 3 4 5
Alice X
Bob X X
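A per-user random split of this kind can be sketched with the standard library. This is an assumption-laden illustration, not RecPack's code: `split_user` is a hypothetical helper, the item ids assigned to Alice and Bob are made up for the sketch, and the seed is arbitrary.

```python
import math
import random


def split_user(items, frac_data_in, rng):
    """Randomly split one user's items into (data_in, data_out) (hypothetical helper)."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    # The fold-in part gets frac_data_in of the items, rounded up.
    n_in = math.ceil(frac_data_in * len(shuffled))
    return sorted(shuffled[:n_in]), sorted(shuffled[n_in:])


# Illustrative interactions: the item ids are assumed, only the counts
# (3 for Alice, 4 for Bob) come from the example.
interactions = {"Alice": [0, 1, 2], "Bob": [1, 2, 3, 4]}
rng = random.Random(42)  # fixed seed so repeated runs give the same split

splits = {}
for user, items in interactions.items():
    # First split: full training data versus held-out test data.
    full_train, test_out = split_user(items, 0.5, rng)
    # Second split, applied to the full training data only.
    val_train, val_out = split_user(full_train, 0.5, rng)
    splits[user] = {
        "full_training_data": full_train,
        "test_data_out": test_out,
        "validation_training_data": val_train,
        "validation_data_out": val_out,
    }

for user, parts in splits.items():
    print(user, parts)
```

Which items end up in each part depends on the seed, but the number of items per part matches the tables above.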
- Parameters
frac_data_in (float, optional) – Fraction of interactions per user used for training. The interactions are randomly chosen. Defaults to 0.8.
validation (boolean, optional) – If True, assign a portion of the training dataset to validation data; otherwise split into only a training and a test dataset. Defaults to False.
seed (int, optional) – The seed to use for the random components of the splitter. If None, a random seed will be used. Defaults to None.
Methods
split(data_m) – Splits data_m according to the scenario.
Attributes
full_training_data – The full training dataset, which should be used for a final training after hyperparameter optimisation.
test_data – The test dataset.
test_data_in – Fold-in part of the test dataset.
test_data_out – Held-out part of the test dataset.
validation_data – The validation dataset.
validation_data_in – Fold-in part of the validation dataset.
validation_data_out – Held-out part of the validation dataset.
validation_training_data – The training data to be used during validation.
- property full_training_data: recpack.matrix.interaction_matrix.InteractionMatrix
The full training dataset, which should be used for a final training after hyperparameter optimisation.
- Returns
Interaction Matrix of training interactions.
- Return type
InteractionMatrix
- split(data_m: recpack.matrix.interaction_matrix.InteractionMatrix) → None
Splits data_m according to the scenario.
After splitting, the properties training_data, validation_data and test_data can be used to retrieve the split data.
- Parameters
data_m – Interaction matrix that should be split.
- property test_data: Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]
The test dataset. Consists of a fold-in and a held-out set of interactions.
Data is processed such that both matrices contain the exact same users. Users that were present in only one of the matrices and not in the other are removed.
- Returns
Test data matrices as InteractionMatrix in, InteractionMatrix out.
- Return type
Tuple[InteractionMatrix, InteractionMatrix]
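The user alignment described above can be illustrated with plain dictionaries. This is a sketch of the idea only, not RecPack's code: `align_users` is a hypothetical helper, and Carol, a user with a single interaction, is an assumed example.

```python
def align_users(data_in, data_out):
    """Keep only users with interactions in both parts (hypothetical helper)."""
    users_in = {user for user, items in data_in.items() if items}
    users_out = {user for user, items in data_out.items() if items}
    shared = sorted(users_in & users_out)
    return (
        {user: data_in[user] for user in shared},
        {user: data_out[user] for user in shared},
    )


# Carol has one interaction: with frac_data_in = 0.5 it is rounded up into
# the fold-in part, so her held-out part is empty.
raw_in = {"Alice": [0, 2], "Carol": [5]}
raw_out = {"Alice": [1], "Carol": []}

fold_in, held_out = align_users(raw_in, raw_out)
print(fold_in)
print(held_out)
```

After alignment both parts contain exactly the same users, so every fold-in history has held-out interactions to predict.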
- property test_data_in
Fold-in part of the test dataset
- property test_data_out
Held-out part of the test dataset
- property validation_data: Optional[Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]]
The validation dataset. Consists of a fold-in and a held-out set of interactions.
Data is processed such that both matrices contain the exact same users. Users that were present in only one of the matrices and not in the other are removed.
- Returns
Validation data matrices as InteractionMatrix in, InteractionMatrix out.
- Return type
Tuple[InteractionMatrix, InteractionMatrix]
- property validation_data_in
Fold-in part of the validation dataset
- property validation_data_out
Held-out part of the validation dataset
- property validation_training_data: recpack.matrix.interaction_matrix.InteractionMatrix
The training data to be used during validation.