recpack.scenarios.Timed
- class recpack.scenarios.Timed(t, t_validation: Optional[int] = None, delta_out: int = 2147483647, delta_in: int = 2147483647, validation: bool = False, seed: Optional[int] = None)
Predict users’ future interactions, given information about historical interactions.
full_training_data is constructed by using all interactions whose timestamps are in the interval [t - delta_in, t[.
test_data_in are events with timestamps in [t - delta_in, t[.
test_data_out are events with timestamps in [t, t + delta_out[.
validation_training_data are all interactions with timestamps in [t_validation - delta_in, t_validation[.
validation_data_in are interactions with timestamps in [t_validation - delta_in, t_validation[.
validation_data_out are interactions with timestamps in [t_validation, min(t, t_validation + delta_out)[.
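The interval definitions above can be read as simple membership tests. The following is a minimal sketch in plain Python, not RecPack's implementation: the helper `datasets_for` and the constant `INF` are hypothetical names introduced here for illustration. It only applies the timestamp intervals (lower bound included, upper bound excluded) and ignores the later step that removes users missing from either the fold-in or the held-out part.

```python
INF = 2147483647  # the documented default delta, acting as infinity


def datasets_for(timestamp, t, t_validation, delta_in=INF, delta_out=INF):
    """Return the names of the data sets whose interval contains `timestamp`."""
    names = []
    # [t - delta_in, t[ feeds both the full training data and the test fold-in.
    if t - delta_in <= timestamp < t:
        names += ["full_training_data", "test_data_in"]
    # [t, t + delta_out[ is the held-out part of the test data.
    if t <= timestamp < t + delta_out:
        names.append("test_data_out")
    # [t_validation - delta_in, t_validation[ feeds validation training and fold-in.
    if t_validation - delta_in <= timestamp < t_validation:
        names += ["validation_training_data", "validation_data_in"]
    # The validation held-out interval is capped at t so it never overlaps the test data.
    if t_validation <= timestamp < min(t, t_validation + delta_out):
        names.append("validation_data_out")
    return names


for ts in (1, 3, 4, 6):
    print(ts, datasets_for(ts, t=4, t_validation=2, delta_out=2))
```

With `t = 4`, `t_validation = 2` and `delta_out = 2`, a timestamp of 6 falls in no interval, because the upper bound `t + delta_out` is excluded.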
Warning
The scenario can only be used when the dataset has timestamp information.
Example
As an example, we split this data with t = 4, t_validation = 2, delta_in = None (infinity), delta_out = 2, and validation = True:
time 0 1 2 3 4 5 6
Alice X X X
Bob X X X X
Carol X X X X X
would yield full_training_data:
time 0 1 2 3 4 5 6
Alice X X
Bob X X X
Carol X X X
validation_training_data:
time 0 1 2 3 4 5 6
Alice X X
Bob X
Carol X X
validation_data_in:
time 0 1 2 3 4 5 6
Bob X
Carol X X
validation_data_out:
time 0 1 2 3 4 5 6
Bob X X
Carol X
test_data_in:
time 0 1 2 3 4 5 6
Alice X X
Carol X X X
test_data_out:
time 0 1 2 3 4 5 6
Alice X
Carol X X
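The example can be traced by hand with a small sketch in plain Python. This is an illustration under stated assumptions, not RecPack code: the timestamps in `events` are hypothetical, chosen only so that the number of interactions per user in every resulting set matches the example, and the helpers `window` and `align` are names introduced here. `align` mirrors the documented behaviour that the fold-in and held-out parts contain exactly the same users.

```python
INF = 2147483647  # stands in for the "infinite" delta_in of the example

# Hypothetical timestamps per user (assumption, see the lead-in).
events = {
    "Alice": [0, 1, 5],
    "Bob": [1, 2, 3, 6],
    "Carol": [0, 1, 3, 4, 5],
}


def window(events, low, high):
    """Keep, per user, the timestamps in the half-open interval [low, high[."""
    selected = {
        user: [ts for ts in stamps if low <= ts < high]
        for user, stamps in events.items()
    }
    # Users without any interaction in the interval do not appear in the set.
    return {user: stamps for user, stamps in selected.items() if stamps}


def align(data_in, data_out):
    """Drop users that do not appear in both the fold-in and the held-out part."""
    shared = sorted(data_in.keys() & data_out.keys())
    return {u: data_in[u] for u in shared}, {u: data_out[u] for u in shared}


t, t_validation, delta_in, delta_out = 4, 2, INF, 2

full_training_data = window(events, t - delta_in, t)
validation_training_data = window(events, t_validation - delta_in, t_validation)
validation_data_in, validation_data_out = align(
    window(events, t_validation - delta_in, t_validation),
    window(events, t_validation, min(t, t_validation + delta_out)),
)
test_data_in, test_data_out = align(
    window(events, t - delta_in, t),
    window(events, t, t + delta_out),
)

print(validation_data_in)   # {'Bob': [1], 'Carol': [0, 1]}
print(validation_data_out)  # {'Bob': [2, 3], 'Carol': [3]}
print(test_data_in)         # {'Alice': [0, 1], 'Carol': [0, 1, 3]}
print(test_data_out)        # {'Alice': [5], 'Carol': [4, 5]}
```

Under these assumed timestamps, Alice is absent from the validation data because she has no interaction in [2, 4[, and Bob is absent from the test data because his last interaction falls outside [4, 6[.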
- Parameters
t (int) – Timestamp to split target dataset test_data_out from the remainder of the data.
t_validation (int, optional) – Timestamp to split validation_data_out from validation_training_data. Required if validation is True.
delta_out (int, optional) – Size of interval in seconds for both validation_data_out and test_data_out. Both sets will contain interactions that occurred within delta_out seconds after the splitting timestamp. Defaults to 2147483647, the maximal integer value (acting as infinity).
delta_in (int, optional) – Size of interval in seconds for full_training_data, validation_training_data, validation_data_in and test_data_in. All sets will contain interactions that occurred within delta_in seconds before the splitting timestamp. Defaults to 2147483647, the maximal integer value (acting as infinity).
validation (boolean, optional) – Assign a portion of the full training dataset to validation data if True, else split without validation data into only a training and test dataset.
seed (int, optional) – Seed for the randomised parts of the scenario. The Timed scenario is deterministic, so changing the seed should not matter. Defaults to None, in which case a random seed is generated.
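To make the role of delta_in and delta_out concrete, here is a small sketch in plain Python that computes the test-time interval bounds in seconds. It is an illustration only: `split_windows`, `MAX_DELTA`, `DAY` and the chosen value of `t` are hypothetical and not part of RecPack. It shows why the default of 2147483647 behaves like infinity for Unix timestamps in seconds: the fold-in interval then starts before timestamp 0.

```python
MAX_DELTA = 2147483647  # documented default for delta_in and delta_out (2**31 - 1)
DAY = 24 * 60 * 60      # one day in seconds


def split_windows(t, delta_in=MAX_DELTA, delta_out=MAX_DELTA):
    """Return the half-open (low, high) bounds of the fold-in and held-out intervals."""
    fold_in = (t - delta_in, t)
    held_out = (t, t + delta_out)
    return fold_in, held_out


t = 1_600_000_000  # hypothetical split point, a Unix timestamp in seconds

# Defaults: the fold-in interval reaches back past timestamp 0.
default_in, default_out = split_windows(t)
print(default_in[0] < 0)  # True

# One week of history, one day of held-out interactions.
week_in, day_out = split_windows(t, delta_in=7 * DAY, delta_out=DAY)
print(week_in)  # (1599395200, 1600000000)
print(day_out)  # (1600000000, 1600086400)
```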
Methods
split(data_m) – Splits data_m according to the scenario.
Attributes
full_training_data – The full training dataset, which should be used for a final training after hyperparameter optimisation.
test_data – The test dataset.
test_data_in – Fold-in part of the test dataset.
test_data_out – Held-out part of the test dataset.
validation_data – The validation dataset.
validation_data_in – Fold-in part of the validation dataset.
validation_data_out – Held-out part of the validation dataset.
validation_training_data – The training data to be used during validation.
- property full_training_data: recpack.matrix.interaction_matrix.InteractionMatrix
The full training dataset, which should be used for a final training after hyperparameter optimisation.
- Returns
InteractionMatrix of training interactions.
- Return type
InteractionMatrix
- split(data_m: recpack.matrix.interaction_matrix.InteractionMatrix) → None
Splits data_m according to the scenario.
After splitting, the properties training_data, validation_data and test_data can be used to retrieve the split data.
- Parameters
data_m – Interaction matrix that should be split.
- property test_data: Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]
The test dataset. Consists of a fold-in and a held-out set of interactions.
Data is processed such that both matrices contain the exact same users. Users that were present in only one of the matrices and not in the other are removed.
- Returns
Test data matrices as InteractionMatrix in, InteractionMatrix out.
- Return type
Tuple[InteractionMatrix, InteractionMatrix]
- property test_data_in
Fold-in part of the test dataset.
- property test_data_out
Held-out part of the test dataset.
- property validation_data: Optional[Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]]
The validation dataset. Consists of a fold-in and a held-out set of interactions.
Data is processed such that both matrices contain the exact same users. Users that were present in only one of the matrices and not in the other are removed.
- Returns
Validation data matrices as InteractionMatrix in, InteractionMatrix out.
- Return type
Optional[Tuple[InteractionMatrix, InteractionMatrix]]
- property validation_data_in
Fold-in part of the validation dataset.
- property validation_data_out
Held-out part of the validation dataset.
- property validation_training_data: recpack.matrix.interaction_matrix.InteractionMatrix
The training data to be used during validation.