recpack.pipelines.Pipeline

class recpack.pipelines.Pipeline(results_directory: str, algorithm_entries: List[recpack.pipelines.registries.AlgorithmEntry], metric_entries: List[recpack.pipelines.registries.MetricEntry], full_training_data: recpack.matrix.interaction_matrix.InteractionMatrix, validation_training_data: Optional[recpack.matrix.interaction_matrix.InteractionMatrix], validation_data: Optional[Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix]], test_data: Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix], optimisation_metric_entry: Optional[recpack.pipelines.registries.OptimisationMetricEntry], post_processor: recpack.postprocessing.postprocessors.Postprocessor, remove_history: bool)

Performs optimisation, training, prediction and evaluation, keeping track of results.

Pipeline is run per algorithm. First, the parameters in grid are optimised by training on validation_training_data and evaluating on validation_data. Next, unless the algorithm is based on recpack.algorithms.TorchMLAlgorithm, the model with the optimised parameters is retrained on full_training_data. Final evaluation happens on test_data.

Results can be accessed via the get_metrics() method.

Parameters
  • results_directory (string) – Path to a directory in which to save results of the pipeline when save_metrics() is called.

  • algorithm_entries (List[AlgorithmEntry]) – List of AlgorithmEntry objects to evaluate in this pipeline. An AlgorithmEntry defines which algorithm to train, with which fixed parameters (params) and which parameters to optimize (grid).

  • metric_entries (List[MetricEntry]) – List of MetricEntry objects to evaluate each algorithm on. A MetricEntry defines which metric to compute and the value of the parameter K (the number of recommendations to consider).

  • full_training_data (InteractionMatrix) – The data to train models on for the final evaluation.

  • validation_training_data (InteractionMatrix) – The data to train models on when optimising parameters.

  • validation_data (Union[Tuple[InteractionMatrix, InteractionMatrix], None]) – The data used to optimise parameters; may be None only if none of the algorithms require optimisation.

  • test_data (Tuple[InteractionMatrix, InteractionMatrix]) – The data to perform evaluation on, as a (test_in, test_out) tuple.

  • optimisation_metric_entry (Union[OptimisationMetricEntry, None]) – The metric to optimise each algorithm on.

  • post_processor (Postprocessor) – A postprocessor instance to apply filters on the recommendation scores.

  • remove_history (bool) – Whether items a user has already interacted with should be removed from that user's recommendations.


Methods

get_metrics([short])

Get the metrics for the pipeline.

get_num_users()

Get the number of users used in the evaluation.

run()

Runs the pipeline.

save_metrics()

Save the metrics to a JSON file.

Attributes

optimisation_results

Contains a result for each of the hyperparameter combinations tried out, for each of the algorithms evaluated.

get_metrics(short: Optional[bool] = False) → pandas.core.frame.DataFrame

Get the metrics for the pipeline.

Parameters

short (bool, optional) – If True, only the algorithm names are returned, without their parameters. Defaults to False.

Returns

Algorithms and their respective performance.

Return type

pd.DataFrame

get_num_users() → int

Get the number of users used in the evaluation.

Returns

The number of users used in the evaluation.

Return type

int

property optimisation_results

Contains a result for each of the hyperparameter combinations tried out, for each of the algorithms evaluated.

run()

Runs the pipeline.

save_metrics() → None

Save the metrics to a JSON file.

The file will be saved in the experiment directory.