recpack.pipelines.PipelineBuilder

class recpack.pipelines.PipelineBuilder(folder_name: Optional[str] = None, base_path: Optional[str] = None)

Builder to facilitate construction of pipelines.

The builder contains functions to set specific values for the pipeline. Save and Load make it possible to easily recreate pipelines.

To disable history filtering in the pipeline, set the remove_history attribute to False:

pipeline_builder.remove_history = False

Parameters

folder_name (str, optional) – The name of the folder where pipeline information will be stored. If no name is specified, the timestamp of creation is used.
base_path (str, optional) – The base path in which to store the pipeline. Defaults to the current working directory.

Methods

`add_algorithm`(algorithm[, grid, params, ...])	Add an algorithm to use in the pipeline.
`add_metric`(metric[, K])	Register a metric to evaluate.
`add_post_filter`(filter)	Add a filter which will be applied on the recommendation scores before prediction.
`build`()	Construct a pipeline object, given the set values.
`set_data_from_scenario`(scenario)	Set the train, validation and test data by extracting them from the scenario.
`set_full_training_data`(train_data)	Set the full_training dataset.
`set_optimisation_metric`(metric, K[, minimise])	Set the metric for optimisation of parameters in algorithms.
`set_test_data`(test_data)	Set the test datasets.
`set_validation_data`(validation_data)	Set the validation datasets.
`set_validation_training_data`(train_data)	Set the validation training dataset.

Attributes

remove_history

`True` to enable removal of a user's previous interactions, `False` to disable.

add_algorithm(algorithm: Union[str, type], grid: Optional[Dict[str, List]] = None, params: Optional[Dict[str, Any]] = None, optimisation_info: Optional[recpack.pipelines.hyperparameter_optimisation.OptimisationInfo] = None)

Add an algorithm to use in the pipeline.

If the algorithm is not implemented by default in recpack, you should register it in the ALGORITHM_REGISTRY.

Parameters

algorithm (Union[str, type]) – Algorithm class name or type of the algorithm to add.
grid (Dict[str, List], optional) – [DEPRECATED] Parameters to optimise. The dict is turned into a grid such that each combination of values is used. Defaults to None.
params (Dict[str, Any], optional) – The fixed parameters for running the algorithm, represented as a key-value dictionary. Defaults to None.
optimisation_info (OptimisationInfo) – Optimisation info, contains information for the optimiser to define the parameter space.

Raises

ValueError – If the passed algorithm can’t be resolved to a key in the ALGORITHM_REGISTRY.
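
The grid expansion described for the deprecated grid parameter can be illustrated in plain Python. This is a minimal sketch of the idea, not recpack's implementation: the helper name expand_grid and the parameter names in the example grid are assumptions made for illustration.

```python
from itertools import product
from typing import Any, Dict, List


def expand_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one parameter dict per combination of the listed values."""
    keys = list(grid)
    value_lists = [grid[key] for key in keys]
    # product() yields every combination, one value taken from each list.
    return [dict(zip(keys, values)) for values in product(*value_lists)]


# Hypothetical grid; the parameter names are illustrative only.
combinations = expand_grid({"K": [50, 100], "normalize": [True, False]})
for params in combinations:
    print(params)
```

Two values for each of two parameters give four combinations, so the number of runs grows multiplicatively with every parameter added to the grid.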

add_metric(metric: Union[str, type], K: Optional[Union[List, int]] = None)

Register a metric to evaluate.

Parameters

metric (Union[str, type]) – Metric name or type.
K (Optional[Union[List, int]], optional) – The K value(s) used to construct metrics. If a list is given, one metric is added for each value.

Raises

ValueError – If metric can’t be resolved to a key in the METRIC_REGISTRY.
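
How a single K, a list of K values, or no K at all could map to registered metrics is sketched below. The function metric_keys and the "name_K" identifier format are hypothetical and chosen only for this illustration; the identifiers recpack uses internally may differ.

```python
from typing import List, Optional, Union


def metric_keys(metric: str, K: Optional[Union[List[int], int]] = None) -> List[str]:
    """Return one hypothetical identifier per metric that would be registered."""
    if K is None:
        # A metric without a cutoff is registered once.
        return [metric]
    ks = K if isinstance(K, list) else [K]
    # One entry per K value, mirroring "for each value a metric is added".
    return [f"{metric}_{k}" for k in ks]


print(metric_keys("NDCGK", [10, 20]))
print(metric_keys("NDCGK", 10))
```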

add_post_filter(filter: recpack.postprocessing.filters.PostFilter) → None

Add a filter which will be applied on the recommendation scores before prediction.

Parameters: filter (PostFilter) – Filter to apply, cannot be of type RemoveHistory

build() → recpack.pipelines.pipeline.Pipeline

Construct a pipeline object, given the set values.

If required fields are not set, raises an error.

Returns: The constructed pipeline.
Return type: Pipeline
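
A typical sequence of builder calls leading up to build() might look like the sketch below. It uses only the methods documented on this page. The function name build_example_pipeline is hypothetical, and the sketch assumes that scenario has already been split and that "ItemKNN" and "NDCGK" are names registered in the installed recpack version.

```python
def build_example_pipeline(scenario):
    """Configure a PipelineBuilder from an already split scenario and build it."""
    # Imported lazily so the function can be defined without recpack installed.
    from recpack.pipelines import PipelineBuilder

    builder = PipelineBuilder()
    # Takes the training, validation and test data from the scenario.
    builder.set_data_from_scenario(scenario)
    # Fixed parameters only, so no optimisation metric is set here.
    # "ItemKNN" and its "K" parameter are assumed to be available.
    builder.add_algorithm("ItemKNN", params={"K": 100})
    # A list of K values registers one metric per value.
    builder.add_metric("NDCGK", K=[10, 20])
    # build() raises an error if a required field has not been set.
    return builder.build()
```

Keeping the configuration in one function makes it easy to rebuild the same pipeline for several scenarios.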

property remove_history: True to enable removal of a user’s previous interactions, `False` to disable. Defaults to True.

set_data_from_scenario(scenario: recpack.scenarios.scenario_base.Scenario): Set the train, validation and test data by extracting them from the scenario.

set_full_training_data(train_data: recpack.matrix.interaction_matrix.InteractionMatrix)

Set the full_training dataset. This dataset is used for the final training before evaluation on the test dataset.

Parameters: train_data (InteractionMatrix) – The interaction matrix to use for training.

set_optimisation_metric(metric: Union[str, type], K: int, minimise=False)

Set the metric for optimisation of parameters in algorithms.

If the metric is not implemented by default in recpack, you should register it in the METRIC_REGISTRY.

Parameters

metric (Union[str, type]) – Metric name or metric type.
K (int) – The K value for the metric.
minimise (bool, optional) – If True, a lower metric value is better. Defaults to False.

Raises

ValueError – If metric can’t be resolved to a key in the METRIC_REGISTRY.
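
The role of the minimise flag can be shown with a small selection routine. This is a sketch of the general idea rather than recpack's optimiser; select_best and the example results are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def select_best(
    results: List[Tuple[Dict[str, Any], float]], minimise: bool = False
) -> Dict[str, Any]:
    """Return the parameter dict whose metric value is best."""
    # With minimise=True the lowest value wins, otherwise the highest.
    chooser = min if minimise else max
    best_params, _ = chooser(results, key=lambda pair: pair[1])
    return best_params


# Hypothetical (parameters, metric value) pairs from validation runs.
results = [({"K": 50}, 0.21), ({"K": 100}, 0.27), ({"K": 200}, 0.24)]

print(select_best(results))
print(select_best(results, minimise=True))
```

A ranking metric would normally be maximised, while an error-style metric would call for minimise=True.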

set_test_data(test_data: Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix])

Set the test datasets.

Test data should be a tuple of InteractionMatrices.

Parameters: test_data (Tuple[InteractionMatrix, InteractionMatrix]) – The tuple of test data, as (test_in, test_out) tuple.
Raises: ValueError – If tuple does not contain two InteractionMatrices.
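
The kind of check implied by the ValueError above could look like the following sketch. The InteractionMatrix class here is an empty stand-in and check_data_tuple is a hypothetical helper; neither reflects recpack's actual code or error messages.

```python
class InteractionMatrix:
    """Stand-in for recpack's InteractionMatrix, used only by this sketch."""


def check_data_tuple(data) -> None:
    """Raise ValueError unless data is a tuple of two InteractionMatrix objects."""
    if not isinstance(data, tuple) or len(data) != 2:
        raise ValueError("Expected a (data_in, data_out) tuple of length 2.")
    if not all(isinstance(part, InteractionMatrix) for part in data):
        raise ValueError("Both tuple elements must be InteractionMatrix instances.")


# A well-formed (test_in, test_out) pair passes silently.
check_data_tuple((InteractionMatrix(), InteractionMatrix()))
```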

set_validation_data(validation_data: Tuple[recpack.matrix.interaction_matrix.InteractionMatrix, recpack.matrix.interaction_matrix.InteractionMatrix])

Set the validation datasets.

Validation data should be a tuple of InteractionMatrices.

Parameters: validation_data (Tuple[InteractionMatrix, InteractionMatrix]) – The tuple of validation data, as (validation_in, validation_out) tuple.
Raises: ValueError – If tuple does not contain two InteractionMatrices.

set_validation_training_data(train_data: recpack.matrix.interaction_matrix.InteractionMatrix)

Set the validation training dataset. This dataset is used for training models during parameter optimisation, or for incrementally trained models.

Parameters: train_data (InteractionMatrix) – The interaction matrix to use for training.