recpack.algorithms

The algorithms module in recpack contains a wide array of state-of-the-art collaborative filtering algorithms. Also included are some baseline algorithms, as well as several reusable building blocks such as commonly used loss functions and sampling methods.

Example of use:

import numpy as np
from scipy.sparse import csr_matrix
from recpack.algorithms import Random

X = csr_matrix(np.array([[1, 0, 1], [1, 1, 0], [1, 1, 0]]))

# Set hyper-parameter values
algo = Random(K=3)

# Fit algorithm
algo.fit(X)

# Get random recommendations for each user with at least one interaction
predictions = algo.predict(X)

# predictions is a csr_matrix; inspect the scores with
predictions.toarray()

Baselines

In recpack, baseline algorithms are algorithms that are not personalized. Use these baselines if you wish to quickly test a pipeline, or for comparison in experiments.

Popularity([K])

Baseline algorithm recommending the most popular items in training data.

Random([K, seed, use_only_interacted_items])

Uniform random algorithm, each item has an equal chance of getting recommended.
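
As an illustration of what a non-personalized baseline computes, the sketch below ranks items by how often they were interacted with and returns the same top K for everyone. It is plain Python and does not use recpack; the helper name top_k_popular and the (user, item) pair format are assumptions made for this example only.

```python
from collections import Counter


def top_k_popular(interactions, k):
    """Return the k most frequently interacted-with items, most popular first."""
    counts = Counter(item for _, item in interactions)
    # Sort by descending count; break ties by item id so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]


# (user, item) pairs matching the nonzero entries of the example matrix X
interactions = [(0, 0), (0, 2), (1, 0), (1, 1), (2, 0), (2, 1)]
top_items = top_k_popular(interactions, k=2)
```

Every user receives the same list, which is why such a baseline is mainly useful for smoke-testing a pipeline or as a reference point in experiments.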

Item Similarity Algorithms

Item similarity algorithms exploit relationships between items to make recommendations. At prediction time, the user is represented by the items they have interacted with.

SLIM([l1_reg, l2_reg, fit_intercept, ...])

Implementation of the SLIM model.

ItemKNN([K, similarity, pop_discount, ...])

Item K Nearest Neighbours model.

ItemPNN([K, similarity, pop_discount, ...])

Item Probabilistic Nearest Neighbours model.

NMFItemToItem([num_components, seed])

Computes similarities between items as the similarity between their NMF item embeddings.

SVDItemToItem([num_components, seed])

Computes similarities between items as the similarity between their SVD embeddings.

Prod2Vec([num_components, num_negatives, ...])

Prod2Vec algorithm from the paper: "E-commerce in Your Inbox: Product Recommendations at Scale".

Prod2VecClustered([num_components, ...])

Clustered Prod2Vec implementation outlined in: E-commerce in Your Inbox: Product Recommendations at Scale (https://arxiv.org/abs/1606.07154)
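
The general idea behind item similarity models can be sketched without recpack: compute a similarity between every pair of items, then score candidates for a user by summing their similarity to the items in the user's history. The sketch below uses cosine similarity on binary interactions. The function names are hypothetical, and recpack's own implementations work on sparse matrices and offer more options (for example keeping only the K nearest neighbours).

```python
import math


def cosine_item_similarity(user_items, num_items):
    """Cosine similarity between item columns of a binary interaction matrix."""
    # Collect, for every item, the set of users who interacted with it.
    item_users = [set() for _ in range(num_items)]
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    sim = [[0.0] * num_items for _ in range(num_items)]
    for i in range(num_items):
        for j in range(num_items):
            if i == j or not item_users[i] or not item_users[j]:
                continue  # no self-similarity, and none for items without users
            overlap = len(item_users[i] & item_users[j])
            sim[i][j] = overlap / math.sqrt(len(item_users[i]) * len(item_users[j]))
    return sim


def score_items(history, sim):
    """Score every item as the summed similarity to the items in the history."""
    return [sum(sim[h][j] for h in history) for j in range(len(sim))]


user_items = {0: {0, 2}, 1: {0, 1}, 2: {0, 1}}
sim = cosine_item_similarity(user_items, num_items=3)
scores = score_items(user_items[0], sim)
```

Here user 0 has not seen item 1, yet it receives the highest score because it co-occurs with item 0 in the histories of the other two users.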

Hybrid Similarity Algorithms

Hybrid similarity algorithms use a combination of user and item similarities to generate recommendations.

KUNN([Ku, Ki])

Unified Nearest Neighbour algorithm combining user and item neighbourhood methods.

Factorization Algorithms

Factorization algorithms factorize the interaction matrix into a user embedding matrix (U) and an item embedding matrix (V), which can be used to approximately reconstruct the original interaction matrix: R ≈ UV^T.

NMF([num_components, seed, alpha, l1_ratio])

Non-negative matrix factorization.

SVD([num_components, seed])

Singular Value Decomposition used as a matrix factorization algorithm.

WeightedMatrixFactorization([...])

WMF algorithm by Yifan Hu, Yehuda Koren and Chris Volinsky.

BPRMF([num_components, lambda_h, lambda_w, ...])

Implements Matrix Factorization by using the BPR-OPT objective and SGD optimization.
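
To make the reconstruction step concrete, the sketch below multiplies a user embedding matrix U with the transpose of an item embedding matrix V to obtain a score for every user and item pair. The embedding values are made up for illustration, and the helper reconstruct is hypothetical rather than part of recpack; how U and V are learned differs per algorithm and is not shown.

```python
def reconstruct(U, V):
    """Return the score matrix U V^T for user factors U and item factors V."""
    return [[sum(u_f * v_f for u_f, v_f in zip(u, v)) for v in V] for u in U]


# Two users and three items embedded in two latent dimensions (made-up values)
U = [[1.0, 0.0], [0.5, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
R_hat = reconstruct(U, V)
```

Each row of R_hat holds one user's predicted scores, so recommending amounts to picking the highest-scoring items in that row.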

Autoencoder Algorithms

Autoencoder algorithms aim to learn a function f, such that X = f(X). More information on autoencoders can be found on Wikipedia.

RecVAE([batch_size, max_epochs, ...])

RecVAE Algorithm as first discussed in 'RecVAE: a New Variational Autoencoder for Top-N Recommendations with Implicit Feedback', I.

MultVAE([batch_size, max_epochs, ...])

MultVAE Algorithm as first discussed in 'Variational Autoencoders for Collaborative Filtering', D.

EASE([l2, alpha, density])

Implementation of the EASEr algorithm.

Session-Based Algorithms

GRU4RecNegSampling([num_layers, ...])

A recurrent neural network for session-based recommendations.

GRU4RecCrossEntropy([num_layers, ...])

A recurrent neural network for session-based recommendations.

STAN([K, interaction_decay, session_decay, ...])

Sequence and Time Aware Neighbourhoods algorithm.

SequentialRules([K, max_steps])

Recommends the item that most likely follows a user's last interaction.
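
The "what most likely follows the last interaction" idea can be sketched by counting direct successors in past sequences. This is a simplified illustration under stated assumptions: it only looks one step ahead and weights every occurrence equally, whereas SequentialRules exposes a max_steps parameter that this sketch does not model. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def fit_successors(sequences):
    """Count how often each item is directly followed by each other item."""
    successors = defaultdict(Counter)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            successors[current][following] += 1
    return successors


def recommend_next(successors, last_item, k):
    """Return up to k items that most often followed last_item."""
    ranked = sorted(successors[last_item].items(), key=lambda p: (-p[1], p[0]))
    return [item for item, _ in ranked[:k]]


sequences = [["a", "b", "c"], ["a", "b", "d"], ["b", "c"]]
rules = fit_successors(sequences)
next_items = recommend_next(rules, last_item="b", k=2)
```

An item that never appeared as a predecessor yields an empty recommendation list.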

Time Aware Algorithms

TARSItemKNN([K, fit_decay, predict_decay, ...])

Framework for time aware variants of the ItemKNN algorithm.

TARSItemKNNDing([K, predict_decay, similarity])

Time aware variant of ItemKNN which uses an exponential decay function at prediction time and cosine similarity.

TARSItemKNNLee([K, w, similarity])

Time aware variant of ItemKNN which uses a hard-coded decay matrix and cosine or Pearson similarity.

TARSItemKNNLiu([K, fit_decay, predict_decay])

Time aware variant of ItemKNN which uses an exponential decay function and cosine similarity.

TARSItemKNNLiu2012([K, decay])

Time aware variant of ItemKNN which uses a logarithmic decay function.

TARSItemKNNVaz([K, fit_decay, predict_decay])

Time aware variant of ItemKNN which uses an exponential decay function and Pearson similarity.

TARSItemKNNCoocDistance([K, fit_decay, ...])

Framework for time aware variants of ItemKNN that consider the time between two interactions when computing similarity between two items.

TARSItemKNNHermann([K, decay_interval])

Time aware variant of ItemKNN that considers the time between two interactions when computing similarity between two items, as well as the age of an event.

TARSItemKNNXia([K, fit_decay, ...])

Time aware variant of ItemKNN that considers the time between two interactions when computing similarity between two items.
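
Several of these variants down-weight older interactions with a decay function. As a rough sketch of the exponential case, each interaction can be weighted by exp(-decay * age), where age is the time elapsed since the interaction. The parameter names, the time unit and the exact functional form here are assumptions for illustration; the individual recpack classes may scale or parameterize the decay differently.

```python
import math


def exponential_decay(timestamps, now, decay):
    """Weight each interaction by exp(-decay * age), where age = now - timestamp."""
    return [math.exp(-decay * (now - t)) for t in timestamps]


# Interaction times in days; `now` is the moment of prediction
weights = exponential_decay([0.0, 5.0, 10.0], now=10.0, decay=0.1)
```

The most recent interaction keeps a weight of 1, and the weight shrinks smoothly towards 0 as interactions get older; a larger decay value makes the model forget faster.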

Abstract Base Classes

Recpack algorithm implementations inherit from one of these base classes. These base classes provide the basic building blocks to easily create new algorithm implementations that can be used within the recpack evaluation framework.

For more information on how to create your own recpack algorithm, see Creating your own algorithms.

Algorithm()

Base class for all recpack algorithm implementations.

ItemSimilarityMatrixAlgorithm()

Base algorithm for algorithms that fit an item-to-item similarity model.

TopKItemSimilarityMatrixAlgorithm(K)

Base algorithm for algorithms that fit an item-to-item similarity model with K similar items for every item.

FactorizationAlgorithm([num_components])

Base class for factorization algorithms.

TorchMLAlgorithm(batch_size, max_epochs, ...)

Base class for PyTorch algorithms optimized by means of gradient descent/ascent.

Stopping Criterion

When creating an algorithm that learns a model iteratively, we need a way to decide which is the best model, and when to stop. The Stopping Criterion module provides this functionality.

StoppingCriterion(loss_function[, minimize, ...])

StoppingCriterion provides a wrapper around any loss function used in the validation stage of an iterative algorithm.

EarlyStoppingException

Raised when Early Stopping condition is met.
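
The logic of "keep the best model, stop when progress stalls" can be sketched as follows. This is not recpack's StoppingCriterion: the class names BestValueTracker and StopTraining, the patience parameter and the update method are hypothetical and only illustrate the mechanism of tracking a best validation value and raising an exception to end training.

```python
class StopTraining(Exception):
    """Hypothetical stand-in for an early stopping signal."""


class BestValueTracker:
    """Tracks the best validation value and signals when progress stalls."""

    def __init__(self, minimize=True, patience=2):
        self.minimize = minimize
        self.patience = patience  # evaluations allowed without improvement
        self.best_value = None
        self.stale_evaluations = 0

    def update(self, value):
        """Return True on a new best; raise StopTraining after too many misses."""
        improved = self.best_value is None or (
            value < self.best_value if self.minimize else value > self.best_value
        )
        if improved:
            self.best_value = value
            self.stale_evaluations = 0
            return True  # the caller would save the current model here
        self.stale_evaluations += 1
        if self.stale_evaluations >= self.patience:
            raise StopTraining(f"no improvement in {self.patience} evaluations")
        return False


tracker = BestValueTracker(minimize=True, patience=2)
stopped_at = None
for epoch, validation_loss in enumerate([0.9, 0.7, 0.8, 0.75, 0.6]):
    try:
        tracker.update(validation_loss)
    except StopTraining:
        stopped_at = epoch
        break
```

In this run the loss stops improving after the second evaluation, so training halts before the final value is ever seen. That is the usual trade-off of early stopping: a small patience saves time but can miss late improvements.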

Loss Functions

Recommendation models learned iteratively by means of gradient descent (or ascent) require a loss function. In this module you will find some of the most common loss functions that can be used with any TorchMLAlgorithm.

To use these loss functions in a StoppingCriterion, we also provide metric wrappers around the raw loss functions.

covariance_loss(H, W)

Covariance loss.

warp_loss(dist_pos_interaction, ...)

WARP loss.

warp_loss_wrapper(X_true, X_pred[, ...])

Metric wrapper around the warp_loss() function.

bpr_loss(positive_sim, negative_sim)

Bayesian Personalized Ranking loss.

bpr_loss_wrapper(X_true, X_pred[, ...])

Wrapper around bpr_loss() function for use with recpack.algorithms.stopping_criterion.StoppingCriterion.

vae_loss(reconstructed_X, mu, logvar, X[, ...])

VAE loss function for use with autoencoders.

bpr_max_loss(positive_scores, negative_scores)

Bayesian Personalized Ranking Max Loss.

top1_loss(positive_scores, negative_scores)

TOP1 Loss.

top1_max_loss(positive_scores, negative_scores)

TOP1 Max Loss.
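
As an example of what a pairwise ranking loss measures, the sketch below computes the Bayesian Personalized Ranking objective in its common form: the mean of -log(sigmoid(positive score - negative score)). It is written in plain Python for readability and is not recpack's bpr_loss, which is meant for use with TorchMLAlgorithm; the name bpr_loss_sketch and the mean reduction are assumptions of this example.

```python
import math


def bpr_loss_sketch(positive_scores, negative_scores):
    """Mean of -log(sigmoid(pos - neg)) over paired positive and negative scores."""
    losses = []
    for pos, neg in zip(positive_scores, negative_scores):
        # -log(sigmoid(d)) equals log(1 + exp(-d)); log1p keeps small values accurate.
        losses.append(math.log1p(math.exp(-(pos - neg))))
    return sum(losses) / len(losses)


well_ranked = bpr_loss_sketch([2.0, 3.0], [0.0, 1.0])
badly_ranked = bpr_loss_sketch([0.0, 1.0], [2.0, 3.0])
```

The loss is small when positives score above their paired negatives and grows when the order is reversed, so minimizing it pushes interacted items above non-interacted ones.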

Samplers

Sampling methods play an important role in multiple recommendation algorithms (e.g. BPRMF). As such, recpack contains a number of commonly used sampling methods.

PositiveNegativeSampler([num_negatives, ...])

Samples linked positive and negative interactions for users.

BootstrapSampler([num_negatives, ...])

Sampler that samples positives with replacement.

WarpSampler([num_negatives, batch_size, exact])

Samples num_negatives negatives for each positive.

SequenceMiniBatchSampler(pad_token[, batch_size])

Samples batches of user, input sequences.

SequenceMiniBatchPositivesTargetsNegativesSampler(...)

Samples num_negatives negatives for every positive in a sequence.
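
The basic pattern behind positive and negative sampling can be sketched as: for every observed (user, item) interaction, draw one or more items the user has not interacted with. The function below is a hypothetical, dependency-free illustration and not one of the recpack samplers, which operate on sparse matrices and yield batches. In particular, this sketch always excludes known positives from the negatives, which is an assumption of the example.

```python
import random


def sample_positive_negative(user_items, num_items, num_negatives=1, seed=0):
    """Yield (user, positive, negatives) triples for every observed interaction."""
    rng = random.Random(seed)
    for user, positives in user_items.items():
        # Items the user has not interacted with are the negative candidates.
        candidates = [i for i in range(num_items) if i not in positives]
        if not candidates:
            continue  # a user who interacted with everything has no negatives
        for positive in sorted(positives):
            negatives = [rng.choice(candidates) for _ in range(num_negatives)]
            yield user, positive, negatives


user_items = {0: {0, 2}, 1: {0, 1}, 2: {0, 1}}
samples = list(sample_positive_negative(user_items, num_items=3, num_negatives=2))
```

Negatives are drawn with replacement here, so the same negative item can appear more than once for a given positive.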

Utility Functions

The util module contains a number of utility functions used across algorithms. Use these to simplify certain tasks (such as batching) when creating a new algorithm.

get_batches(iterable[, batch_size])

Get batches from an iterable.

sample_rows(*args[, sample_size])

Samples rows from the matrices.

naive_sparse2tensor(data)

Naively converts sparse csr_matrix to torch Tensor.

naive_tensor2sparse(tensor)

Converts torch Tensor to sparse csr_matrix.
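
Batching, the task mentioned at the start of this section, boils down to splitting an iterable into consecutive chunks of a fixed maximum size. The sketch below shows one way to do that with the standard library. The helper batched is hypothetical and is not recpack's get_batches, whose exact behaviour and return types may differ.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield consecutive lists of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the iterable is exhausted
        yield batch


user_batches = list(batched(range(7), batch_size=3))
```

The last batch is shorter when the number of elements is not a multiple of the batch size, so code that consumes batches should not assume a fixed length.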