recpack.pipelines.PipelineBuilder

class recpack.pipelines.PipelineBuilder(folder_name: Optional[str] = None, base_path: Optional[str] = None)

Builder to facilitate construction of pipelines.

The builder contains functions to set specific values for the pipeline. Save and Load make it possible to easily recreate pipelines.

To disable history filtering in the pipeline, set the remove_history attribute to False:

pipeline_builder.remove_history = False

Parameters
  • folder_name (str, optional) – The name of the folder where pipeline information will be stored. If no name is specified, the timestamp of creation is used.

  • base_path (str, optional) – The base path to store the pipeline in. Defaults to the current working directory.
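As a rough illustration of how the builder methods listed on this page fit together, the sketch below wraps a typical sequence of calls in a helper. The helper name `build_example_pipeline`, the choice of the `WeakGeneralization` scenario, and the algorithm and metric names are assumptions made for the example; recpack is imported lazily inside the function, so an installed recpack is only needed when the function is actually called.

```python
def build_example_pipeline(interaction_matrix, folder_name=None, base_path=None):
    """Build (but do not run) a pipeline from an InteractionMatrix.

    Illustrative helper only; it is not part of recpack.
    """
    # Lazy imports: recpack is required only when this function is called.
    from recpack.pipelines import PipelineBuilder
    from recpack.scenarios import WeakGeneralization

    # Split the data. validation=True also produces validation data,
    # which is needed when a parameter grid has to be optimised.
    scenario = WeakGeneralization(validation=True)
    scenario.split(interaction_matrix)

    builder = PipelineBuilder(folder_name=folder_name, base_path=base_path)
    builder.set_data_from_scenario(scenario)

    # One algorithm with a small grid over its K parameter (example values).
    builder.add_algorithm("ItemKNN", grid={"K": [100, 200, 500]})

    # Metric used to pick the best parameters, and metrics to report.
    builder.set_optimisation_metric("NDCGK", K=10)
    builder.add_metric("NDCGK", K=[10, 20, 50])

    return builder.build()
```

The returned pipeline would then typically be executed with `pipeline.run()` and inspected with `pipeline.get_metrics()`.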

Methods

add_algorithm(algorithm[, grid, params])

Add an algorithm to use in the pipeline.

add_metric(metric, K)

Register a metric to evaluate.

add_post_filter(filter)

Add a filter which will be applied on the recommendation scores before prediction.

build()

Construct a pipeline object, given the set values.

set_data_from_scenario(scenario)

Set the train, validation and test data by extracting them from the scenario.

set_full_training_data(train_data)

Set the full_training dataset.

set_optimisation_metric(metric, K[, minimise])

Set the metric for optimisation of parameters in algorithms.

set_test_data(test_data)

Set the test datasets.

set_validation_data(validation_data)

Set the validation datasets.

set_validation_training_data(train_data)

Set the validation training dataset.

Attributes

remove_history

True to enable removal of a user's previous interactions, False to disable.

add_algorithm(algorithm: Union[str, type], grid: Optional[Dict[str, List]] = None, params: Optional[Dict[str, Any]] = None)

Add an algorithm to use in the pipeline.

If the algorithm is not implemented by default in recpack, you should register it in the ALGORITHM_REGISTRY.

Parameters
  • algorithm (Union[str, type]) – Algorithm class name or type of the algorithm to add.

  • grid (Dict[str, List], optional) – Parameters to optimise. The dict is turned into a grid such that each combination of values is used. Defaults to None.

  • params (Dict[str, Any], optional) – The fixed parameters for running the algorithm, represented as a key-value dictionary. Defaults to None.

Raises

ValueError – If the passed algorithm can’t be resolved to a key in the ALGORITHM_REGISTRY.
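To make the phrase "each combination of values is used" concrete, the sketch below expands a grid dictionary into one parameter dictionary per combination. The helper `expand_grid` and the example parameter names are hypothetical, and recpack's internal implementation may differ; this only illustrates the idea.

```python
from itertools import product
from typing import Any, Dict, List


def expand_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one parameter dict per combination of values in `grid`."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    # product() yields every combination, varying the last name fastest.
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Example grid: two values for each of two parameters gives four combinations.
combos = expand_grid({"K": [100, 200], "similarity": ["cosine", "conditional_probability"]})
for params in combos:
    print(params)
```

The number of combinations is the product of the list lengths, so large grids grow quickly.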

add_metric(metric: Union[str, type], K: Union[List, int])

Register a metric to evaluate.

Parameters
  • metric (Union[str, type]) – Metric name or type.

  • K (Union[List, int]) – The K value(s) used to construct metrics. If a list is given, one metric is added per value.

Raises

ValueError – If metric can’t be resolved to a key in the METRIC_REGISTRY.
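The behaviour described above (one metric per K value, and a ValueError for unknown metrics) can be sketched in plain Python. The names `KNOWN_METRICS` and `expand_metric` are hypothetical stand-ins and not recpack's own code; the real METRIC_REGISTRY contains more entries than shown here.

```python
from typing import List, Tuple, Union

# Stand-in for recpack's METRIC_REGISTRY (assumption: only two entries shown).
KNOWN_METRICS = {"NDCGK", "CoverageK"}


def expand_metric(metric: str, K: Union[List[int], int]) -> List[Tuple[str, int]]:
    """Return one (metric, K) pair per requested K value."""
    if metric not in KNOWN_METRICS:
        raise ValueError(f"Metric {metric} could not be resolved.")
    # A single int is treated as a one-element list.
    k_values = K if isinstance(K, list) else [K]
    return [(metric, k) for k in k_values]


print(expand_metric("NDCGK", [10, 20, 50]))
print(expand_metric("CoverageK", 10))
```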

add_post_filter(filter: recpack.postprocessing.filters.PostFilter) → None

Add a filter which will be applied on the recommendation scores before prediction.

Parameters

filter (PostFilter) – Filter to apply. Cannot be of type RemoveHistory; history filtering is controlled through the remove_history attribute.

build() → recpack.pipelines.pipeline.Pipeline

Construct a pipeline object, given the set values.

If required fields are not set, raises an error.

Returns

The constructed pipeline.

Return type

Pipeline

property remove_history

True to enable removal of a user’s previous interactions, False to disable. Defaults to True.
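Conceptually, history filtering means that items a user has already interacted with are not recommended again. The plain-Python sketch below illustrates that idea on dictionaries; the helper `drop_seen_items` and the data layout are assumptions for illustration, and recpack itself operates on matrices rather than on dictionaries like these.

```python
from typing import Dict, Set

Scores = Dict[str, Dict[str, float]]


def drop_seen_items(scores: Scores, history: Dict[str, Set[str]]) -> Scores:
    """Remove every (user, item) score where the user already interacted with the item."""
    return {
        user: {
            item: score
            for item, score in item_scores.items()
            if item not in history.get(user, set())
        }
        for user, item_scores in scores.items()
    }


scores = {"u1": {"a": 0.9, "b": 0.4, "c": 0.1}}
history = {"u1": {"a"}}  # u1 has already interacted with item "a"
filtered = drop_seen_items(scores, history)
print(filtered)  # {'u1': {'b': 0.4, 'c': 0.1}}
```

With remove_history set to False, the equivalent of this step would be skipped and item "a" could still be recommended to u1.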

set_data_from_scenario(scenario: recpack.scenarios.scenario_base.Scenario)

Set the train, validation and test data by extracting them from the scenario.
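As a sketch of what extracting the data from a scenario amounts to, the helper below calls the individual setters documented on this page. The helper name `set_data_manually` is hypothetical, and the scenario attribute names (`validation`, `full_training_data`, `test_data`, `validation_training_data`, `validation_data`) are assumptions about the Scenario interface rather than something this page confirms.

```python
def set_data_manually(builder, scenario):
    """Call the individual data setters using data taken from a scenario.

    Illustrative only: the attribute names on `scenario` are assumed.
    """
    # Data for the final training run and the evaluation on the test set.
    builder.set_full_training_data(scenario.full_training_data)
    builder.set_test_data(scenario.test_data)  # (test_in, test_out) tuple

    # Validation data is only available if the scenario was split with validation.
    if scenario.validation:
        builder.set_validation_training_data(scenario.validation_training_data)
        builder.set_validation_data(scenario.validation_data)  # (validation_in, validation_out)
```

Calling the setters one by one like this is mainly useful when the datasets do not come from a scenario object.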

set_full_training_data(train_data: recpack.matrix.interaction_matrix.InteractionMatrix)

Set the full_training dataset. This dataset is used for the final training before evaluation on the test dataset.

Parameters

train_data (InteractionMatrix) – The interaction matrix to use for training.

set_optimisation_metric(metric: Union[str, type], K: int, minimise=False)

Set the metric for optimisation of parameters in algorithms.

If the metric is not implemented by default in recpack, you should register it in the METRIC_REGISTRY.

Parameters
  • metric (Union[str, type]) – Metric name or metric type.

  • K (int) – The K value for the metric.

  • minimise (bool, optional) – If True, a lower metric value is considered better. Defaults to False.

Raises

ValueError – If metric can’t be resolved to a key in the METRIC_REGISTRY.
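The effect of the minimise flag can be sketched as a choice between taking the lowest or the highest validation score. The helper `pick_best` and the result layout below are hypothetical and only illustrate the selection rule, not recpack's actual optimisation code.

```python
from typing import Any, Dict, List, Tuple


def pick_best(
    results: List[Tuple[Dict[str, Any], float]], minimise: bool = False
) -> Dict[str, Any]:
    """Return the parameter dict whose score is best under the given direction."""
    chooser = min if minimise else max
    best_params, _best_score = chooser(results, key=lambda result: result[1])
    return best_params


# Made-up validation scores for three parameter settings.
results = [({"K": 100}, 0.21), ({"K": 200}, 0.25), ({"K": 500}, 0.19)]
print(pick_best(results))                 # highest score wins: {'K': 200}
print(pick_best(results, minimise=True))  # lowest score wins: {'K': 500}
```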

set_test_data(test_data: Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix])

Set the test datasets.

Test data should be a tuple of InteractionMatrices.

Parameters

test_data (Tuple[InteractionMatrix, InteractionMatrix]) – The test data, as a (test_in, test_out) tuple.

Raises

ValueError – If tuple does not contain two InteractionMatrices.
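The kind of check that leads to this ValueError might look like the sketch below. The helper `check_in_out_pair` is hypothetical, and the matrix type is passed in as a parameter so that the example runs without recpack; the actual validation performed by the builder may be stricter or looser.

```python
from typing import Any, Tuple


def check_in_out_pair(data: Any, matrix_type: type) -> Tuple[Any, Any]:
    """Return `data` unchanged if it is a tuple of exactly two `matrix_type` objects."""
    if not isinstance(data, tuple) or len(data) != 2:
        raise ValueError("Expected a tuple of exactly two matrices.")
    if not all(isinstance(part, matrix_type) for part in data):
        raise ValueError(f"Both elements should be of type {matrix_type.__name__}.")
    return data
```

With recpack installed, `matrix_type` would be `InteractionMatrix`, and the same check would apply to both test and validation data.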

set_validation_data(validation_data: Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix])

Set the validation datasets.

Validation data should be a tuple of InteractionMatrices.

Parameters

validation_data (Tuple[InteractionMatrix, InteractionMatrix]) – The validation data, as a (validation_in, validation_out) tuple.

Raises

ValueError – If tuple does not contain two InteractionMatrices.

set_validation_training_data(train_data: recpack.matrix.interaction_matrix.InteractionMatrix)

Set the validation training dataset. This dataset is used for training models during parameter optimisation, or for incrementally trained models.

Parameters

train_data (InteractionMatrix) – The interaction matrix to use for training.