recpack.pipelines.Pipeline
- class recpack.pipelines.Pipeline(results_directory: str, algorithm_entries: List[recpack.pipelines.registries.AlgorithmEntry], metric_entries: List[recpack.pipelines.registries.MetricEntry], full_training_data: recpack.matrix.interaction_matrix.InteractionMatrix, validation_training_data: Optional[recpack.matrix.interaction_matrix.InteractionMatrix], validation_data: Optional[Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]], test_data: Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix], optimisation_metric_entry: Optional[recpack.pipelines.registries.OptimisationMetricEntry], post_processor: recpack.postprocessing.postprocessors.Postprocessor, remove_history: bool)
Performs hyperparameter optimisation, training, prediction and evaluation.
Pipeline is run per algorithm. First, if an optimisation_metric is specified, hyperparameters are optimised by training on validation_training_data and evaluating using validation_data. Next, unless the model is based on recpack.algorithms.TorchMLAlgorithm, the model with the optimised parameters is retrained on full_training_data. The final evaluation happens using the test_data. Results can be accessed via the get_metrics() method.
- Parameters
results_directory (string) – Path to a directory in which to save results of the pipeline when save_metrics() is called.
algorithm_entries (List[AlgorithmEntry]) – List of AlgorithmEntry objects to evaluate in this pipeline. An AlgorithmEntry defines which algorithm to train, with which fixed parameters (params) and which parameters to optimize (grid).
metric_entries (List[MetricEntry]) – List of MetricEntry objects to evaluate each algorithm on. A MetricEntry defines which metric to compute and the value of its cutoff parameter K (number of recommendations).
full_training_data (InteractionMatrix) – The data to train models on, in the final evaluation.
validation_training_data (InteractionMatrix) – The data to train models on when optimising parameters.
validation_data (Union[Tuple[InteractionMatrix, InteractionMatrix], None]) – The data to use for optimising parameters; can be None only if none of the algorithms require optimisation.
test_data (Tuple[InteractionMatrix, InteractionMatrix]) – The data to perform evaluation on, as a (test_in, test_out) tuple.
optimisation_metric_entry (Union[OptimisationMetricEntry, None]) – The metric to optimise each algorithm on.
post_processor (Postprocessor) – A postprocessor instance to apply filters on the recommendation scores.
remove_history (bool) – Configures whether the recommendations may include items the user previously interacted with.
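The per-algorithm flow described above (grid search on the validation split, retrain on the full training data, evaluate on the test split) can be condensed into a library-free sketch. `run_pipeline`, `fit`, and `score` below are hypothetical stand-ins for recpack internals, not part of the recpack API, and the TorchMLAlgorithm special case (no retraining) is omitted:

```python
from itertools import product

# Library-free sketch of the per-algorithm pipeline flow: optimise
# hyperparameters on the validation split, retrain the best setting on
# the full training data, then evaluate on the test split.
def run_pipeline(algorithm_entries, train_full, train_val, val_data,
                 test_data, fit, score):
    results = {}
    for entry in algorithm_entries:
        name, grid = entry["name"], entry.get("grid", {})
        best_params = entry.get("params", {})
        if grid:
            # Try every hyperparameter combination in the grid, keep
            # the one scoring best on the validation data.
            keys = list(grid)
            best_score = float("-inf")
            for values in product(*(grid[k] for k in keys)):
                params = dict(zip(keys, values))
                model = fit(name, params, train_val)
                s = score(model, val_data)
                if s > best_score:
                    best_params, best_score = params, s
        # Retrain with the chosen parameters on the full training data,
        # then evaluate on the held-out test data.
        model = fit(name, best_params, train_full)
        results[name] = score(model, test_data)
    return results

# Toy stand-ins: a "model" is just its parameters, and the score is
# higher the closer K is to a target value hidden in the data.
entries = [{"name": "ItemKNN", "grid": {"K": [100, 200, 500]}}]
fit = lambda name, params, data: dict(params)
score = lambda model, target: -abs(model.get("K", 0) - target)
print(run_pipeline(entries, "full", "val", 180, 210, fit, score))
# → {'ItemKNN': -10}
```

With validation target 180, K=200 wins the grid search; the reported result is then the test score of the retrained K=200 model.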
Methods

get_metrics([short]) – Get the metrics for the pipeline.
get_num_users() – Get the number of users used in the evaluation.
run() – Runs the pipeline.
save_metrics() – Save the metrics in a JSON file.

Attributes

optimisation_results – Contains a result for each of the hyperparameter combinations tried out, for each of the algorithms evaluated.
- get_metrics(short: Optional[bool] = False) → pandas.core.frame.DataFrame
Get the metrics for the pipeline.
- Parameters
short (bool, optional) – If short is True, only the algorithm names are returned, and not the parameters. Defaults to False.
- Returns
Algorithms and their respective performance.
- Return type
pd.DataFrame
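For illustration only, the frame has one row per evaluated algorithm (including its parameters, unless short=True) and one column per metric@K pair. The algorithm names, column labels, and numbers below are invented, not recpack output:

```python
import pandas as pd

# Hypothetical shape of a get_metrics() result: rows are algorithms
# (with parameters), columns are metric@K pairs. All values invented.
metrics = pd.DataFrame(
    {"NDCGK_10": [0.21, 0.24], "RecallK_20": [0.35, 0.39]},
    index=["ItemKNN(K=200)", "Popularity()"],
)
print(metrics)
```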
- get_num_users() → int
Get the number of users used in the evaluation.
- Returns
The number of users used in the evaluation.
- Return type
int
- property optimisation_results
Contains a result for each of the hyperparameter combinations tried out, for each of the algorithms evaluated.
- run()
Runs the pipeline.
- save_metrics() → None
Save the metrics in a JSON file.
The file will be saved in the experiment directory.