recpack.pipelines.Pipeline

class recpack.pipelines.Pipeline(results_directory: str, algorithm_entries: List[recpack.pipelines.registries.AlgorithmEntry], metric_entries: List[recpack.pipelines.registries.MetricEntry], full_training_data: recpack.matrix.interaction_matrix.InteractionMatrix, validation_training_data: Optional[recpack.matrix.interaction_matrix.InteractionMatrix], validation_data: Optional[Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]], test_data: Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix], optimisation_metric_entry: Optional[recpack.pipelines.registries.OptimisationMetricEntry], post_processor: recpack.postprocessing.postprocessors.Postprocessor, remove_history: bool)

Performs optimisation, training, prediction and evaluation, keeping track of results.

Pipeline is run per algorithm. First, the parameters in grid are optimised by training on validation_training_data and evaluating on validation_data. Next, unless the algorithm is based on recpack.algorithms.TorchMLAlgorithm, the model with the optimised parameters is retrained on full_training_data. Final evaluation happens on test_data.

Results can be accessed via the get_metrics() method.

Parameters
  • results_directory (string) – Path to a directory in which to save results of the pipeline when save_metrics() is called.

  • algorithm_entries (List[AlgorithmEntry]) – List of AlgorithmEntry objects to evaluate in this pipeline. An AlgorithmEntry defines which algorithm to train, with which fixed parameters (params) and which parameters to optimize (grid).

  • metric_entries (List[MetricEntry]) – List of MetricEntry objects to evaluate each algorithm on. A MetricEntry defines which metric to compute and the value of the parameter K (the number of recommendations to consider).

  • full_training_data (InteractionMatrix) – The data to train models on for the final evaluation.

  • validation_training_data (InteractionMatrix) – The data to train models on when optimising parameters.

  • validation_data (Union[Tuple[InteractionMatrix, InteractionMatrix], None]) – The data used to optimise parameters; may be None only if none of the algorithms require optimisation.

  • test_data (Tuple[InteractionMatrix, InteractionMatrix]) – The data to perform evaluation on, as a (test_in, test_out) tuple.

  • optimisation_metric_entry (Union[OptimisationMetricEntry, None]) – The metric to optimise each algorithm on.

  • post_processor (Postprocessor) – A postprocessor instance to apply filters on the recommendation scores.

  • remove_history (bool) – Whether items a user has already interacted with should be removed from that user's recommendations.


Methods

get_metrics([short])

Get the metrics for the pipeline.

get_num_users()

Get the number of users used in the evaluation.

run()

Runs the pipeline.

save_metrics()

Save the metrics to a JSON file.

Attributes

optimisation_results

Contains a result for each of the hyperparameter combinations tried out, for each of the algorithms evaluated.

get_metrics(short: Optional[bool] = False) → pandas.core.frame.DataFrame

Get the metrics for the pipeline.

Parameters

short (bool, optional) – If True, only the algorithm names are returned, without their parameters. Defaults to False.

Returns

Algorithms and their respective performance.

Return type

pd.DataFrame

get_num_users() → int

Get the number of users used in the evaluation.

Returns

The number of users used in the evaluation.

Return type

int

property optimisation_results

Contains a result for each of the hyperparameter combinations tried out, for each of the algorithms evaluated.

run()

Runs the pipeline.

save_metrics() → None

Save the metrics to a JSON file.

The file will be saved in the experiment directory.