recpack.pipelines

Pipelines for easy running of experiments.

`PipelineBuilder`([folder_name, base_path])	Builder to facilitate construction of pipelines.
`Pipeline`(results_directory, ...)	Performs hyperparameter optimisation, training, prediction and evaluation.
`registries.AlgorithmRegistry`()	Registry for easy retrieval of algorithm types by name.
`registries.MetricRegistry`()	Registry for easy retrieval of metric types by name.

To simplify running an experiment, the pipelines module provides a Pipeline that performs the necessary steps (hyperparameter optimisation, training, prediction and evaluation) in the right order. To define a pipeline, use the PipelineBuilder class, which constructs the pipeline through a few builder methods.

An example of usage is:

import pandas as pd
import recpack.pipelines

# Construct pipeline_builder and add data.
# Assumes a scenario has been created and split before.
pipeline_builder = recpack.pipelines.PipelineBuilder('demo')
pipeline_builder.set_data_from_scenario(scenario)

# Add ItemKNN with a fixed value for the K parameter (no optimisation).
pipeline_builder.add_algorithm('ItemKNN', params={'K': 100})

# Add NDCG and Recall to be evaluated at 10, 20, 50 and 100
pipeline_builder.add_metric('NDCGK', [10, 20, 50, 100])
pipeline_builder.add_metric('RecallK', [10, 20, 50, 100])

# Construct pipeline
pipeline = pipeline_builder.build()

# Run pipeline: trains the algorithm, then evaluates it (no optimisation is configured here)
pipeline.run()

# Get the metric results.
# This will be a dict with the results of the run.
# Turning it into a dataframe makes reading easier
pd.DataFrame.from_dict(pipeline.get_metrics())

If you want to use the pipelines with your own algorithms or metrics, register them with ALGORITHM_REGISTRY and METRIC_REGISTRY respectively. For the available functions, see registries.AlgorithmRegistry and registries.MetricRegistry.

Example to register an algorithm:

from recpack.pipelines import ALGORITHM_REGISTRY
from recpack.algorithms import ItemKNN

# Create a new algorithm, that is just a copy of ItemKNN
class NewAlgorithm(ItemKNN):
    pass

ALGORITHM_REGISTRY.register('NewAlgorithm', NewAlgorithm)

# Construct a NewAlgorithm object from the registry
algo = ALGORITHM_REGISTRY.get('NewAlgorithm')(K=20)

Example to register a metric:

from recpack.pipelines import METRIC_REGISTRY
from recpack.metrics import RecallK

# Define a new metric (that is just a copy of RecallK)
class NewMetric(RecallK):
    pass

METRIC_REGISTRY.register('NewMetric', NewMetric)

# Construct a NewMetric object with parameter K=20
metric = METRIC_REGISTRY.get('NewMetric')(K=20)
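
The register/get pattern used in both examples can be illustrated with a small, standard-library-only sketch. The names `SimpleRegistry` and `ToyAlgorithm` are hypothetical and exist only for illustration. This is not RecPack's implementation, and the real registries may behave differently (for example when a name is registered twice).

```python
class SimpleRegistry:
    """Hypothetical name -> type registry; not RecPack's implementation."""

    def __init__(self):
        self._registered = {}

    def register(self, key, cls):
        # Assumption for this sketch: refuse to silently overwrite an entry.
        if key in self._registered:
            raise KeyError(f"{key!r} is already registered")
        self._registered[key] = cls

    def get(self, key):
        # Return the registered type itself, not an instance of it.
        return self._registered[key]

    def __contains__(self, key):
        return key in self._registered


class ToyAlgorithm:
    """Stand-in for an algorithm class with a K hyperparameter."""

    def __init__(self, K=10):
        self.K = K


registry = SimpleRegistry()
registry.register("ToyAlgorithm", ToyAlgorithm)

# Look the type up by name, then instantiate it with hyperparameters.
algo = registry.get("ToyAlgorithm")(K=20)
print(type(algo).__name__, algo.K)
```

Storing types rather than instances is what lets a pipeline refer to an algorithm or metric by name and construct it later with whichever parameters it needs.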

Optimising Hyperparameters

Hyperparameter optimisation is a fundamental part of a recommendation pipeline: you want to find the set of hyperparameters that performs best. When adding an algorithm to the pipeline_builder, you can specify optimisation info, which tells RecPack to optimise the hyperparameters of that algorithm.

RecPack supports either an exhaustive grid search, or an optimised random search using a Tree-structured Parzen Estimator (TPE) as implemented in the hyperopt library.

An example of using optimisation:

import recpack.pipelines

# Construct pipeline_builder and add data.
# Assumes a scenario has been created and split before.
pipeline_builder = recpack.pipelines.PipelineBuilder('demo')
pipeline_builder.set_data_from_scenario(scenario)

# We'll have the pipeline optimise the K parameter from the given values.
pipeline_builder.add_algorithm('ItemKNN', grid={'K': [100, 200, 300]})
pipeline_builder.set_optimisation_metric('NDCGK', K=10)

# Add NDCG and Recall to be evaluated at 10, 20, 50 and 100
pipeline_builder.add_metric('NDCGK', [10, 20, 50, 100])
pipeline_builder.add_metric('RecallK', [10, 20, 50, 100])

# Construct pipeline
pipeline = pipeline_builder.build()

# Run pipeline, will first do optimisation, and then evaluation
pipeline.run()

# Get the optimisation metric values
pipeline.optimisation_results

# Get the metric results of the algorithm with optimised hyperparameters.
pipeline.get_metrics()
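
To make concrete what an exhaustive grid search over a grid such as `{'K': [100, 200, 300]}` involves, here is a minimal standard-library sketch. The helpers `expand_grid` and `toy_score` are hypothetical. In a real run the score would come from training the algorithm and computing the optimisation metric, and this sketch makes no claim about how RecPack implements the search internally.

```python
from itertools import product


def expand_grid(grid):
    """Return every combination of values in `grid` as a list of dicts."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def toy_score(params):
    # Hypothetical stand-in for "train, predict, compute the optimisation metric".
    # It peaks at K=200 so the search has a clear winner.
    return -abs(params["K"] - 200)


grid = {"K": [100, 200, 300]}
candidates = expand_grid(grid)

# Exhaustive search: score every candidate and keep the best one.
best = max(candidates, key=toy_score)
print(candidates)
print(best)
```

The number of candidates is the product of the lengths of all value lists, so a grid search grows expensive quickly once several hyperparameters are tuned together.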

`OptimisationInfo`()	Base class for Optimisation Info.
`GridSearchInfo`(params)	Info for a grid search optimisation.
`HyperoptInfo`(space[, timeout, max_evals])	Information for hyperopt parameter optimisation
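
As a rough illustration of a budgeted search, the sketch below runs a plain random search that stops after a maximum number of evaluations or a time limit. It assumes that `timeout` and `max_evals` in `HyperoptInfo` bound the search by wall-clock time and by number of trials. All function names here are hypothetical. Note that this is uniform random sampling, not the Tree-structured Parzen Estimator used by hyperopt, which adapts its sampling based on earlier trials.

```python
import random
import time


def random_search(sample, score, max_evals=50, timeout=None, seed=0):
    """Plain random search; stops after `max_evals` trials or `timeout` seconds."""
    rng = random.Random(seed)
    start = time.monotonic()
    best_params, best_score, trials = None, float("-inf"), 0
    while trials < max_evals:
        # Check the time budget before starting another trial.
        if timeout is not None and time.monotonic() - start > timeout:
            break
        params = sample(rng)
        value = score(params)
        trials += 1
        if value > best_score:
            best_params, best_score = params, value
    return best_params, best_score, trials


def sample_params(rng):
    # Hypothetical search space: K drawn uniformly from the integers 50..1000.
    return {"K": rng.randint(50, 1000)}


def toy_score(params):
    # Hypothetical objective that peaks at K=200.
    return -abs(params["K"] - 200)


best_params, best_score, trials = random_search(
    sample_params, toy_score, max_evals=30, timeout=5
)
print(best_params, best_score, trials)
```

Unlike a grid, a sampled search space can cover a wide range of values with a fixed budget, which is why a trial limit or time limit is needed to end the search.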