.. _guides-algorithms:

Creating your own algorithms
==============================

Recpack's goal is to make it easy to set up an offline recommendation experiment
to evaluate the merits of a recommendation algorithm.
That is why we've made it super easy to create your own algorithm from scratch,
or to adapt one of the many algorithm implementations included.
We have created a number of base classes that provide the necessary plumbing for your algorithm
to fit within the recpack evaluation framework, so that you can focus on what really matters:
the algorithm itself!

We explain when and how to use these base classes by means of four example algorithms.
After reading through these examples you should be able to

- Know which base class to pick for a new algorithm
- Know which functions to overwrite to implement the algorithm

.. _guides-algorithms-pop-softmax:

Randomized Softmax Popularity Algorithm
----------------------------------------

Description
^^^^^^^^^^^^

The Softmax Popularity Algorithm samples recommended items relative to a
"temperature weighted item popularity".

.. math::

    P(i) = \frac{\exp(q(i)/\tau)}{\sum\limits_{j \in I} \exp(q(j)/\tau)}

where :math:`q(i)` is the popularity score of item :math:`i`, computed as the natural logarithm of
the number of times the item was visited, and :math:`\tau` is the temperature parameter.
With a temperature of 1 we end up with the standard softmax function.
As the temperature approaches 0, the most popular item is almost always chosen,
whereas high temperatures resemble uniformly random sampling.
For more information on the softmax function, check out
`Wikipedia <https://en.wikipedia.org/wiki/Softmax_function>`_.

We limit our recommendations to the K most popular items, to avoid ever recommending a very unpopular item.
At prediction time, recommendations are sampled according to their probability (as defined by the softmax).
The algorithm can thus be considered randomized or stochastic:
subsequent calls of the ``predict`` method will not necessarily yield the same results.

Implementation
^^^^^^^^^^^^^^

We start by selecting a base class as parent.
We have no need for PyTorch, so :class:`recpack.algorithms.base.TorchMLAlgorithm` is easily discarded.
Likewise, this is neither a factorization nor an item-similarity algorithm,
so we use the base class of base classes: :class:`recpack.algorithms.base.Algorithm`.

Even this base class already implements quite a bit:

- ``fit(X)`` provides a wrapper around the ``_fit`` function we need to implement,
  and handles type management and some additional checks.
- ``predict(X)`` provides a wrapper around the ``_predict`` function we need to implement.
- ``_transform_predict_input(X)`` and ``_transform_fit_input(X)`` are used by ``fit`` and ``predict``
  to convert their input matrices (X) into the required types.
  By default, this base class transforms the data into a csr_matrix,
  which suits our purpose perfectly as we have no need of timestamps.
- ``_check_fit_complete()`` is called at the end of the ``fit`` method to make sure fitting was successful.
- ``_check_prediction(X_pred, X)`` is called at the end of ``predict``,
  to make sure the output predicted is as expected. If not, it will log a warning.

All we really need to do is thus implement ``__init__`` to set the hyperparameters, ``_fit`` and ``_predict``.

__init__
""""""""

Let's get started: we define our class, add a docstring for reference and implement the ``__init__`` function.
Our algorithm has two hyperparameters:

- ``K``: The number of items that can be considered for recommendations.
- ``tau``: The temperature parameter.
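Before we implement the class itself, the short standalone sketch below illustrates the effect of ``tau`` on the sampling distribution.
It is not part of the algorithm: the popularity values are made up purely for illustration.

::

    import numpy as np

    # Made-up popularity scores q(i) for five items
    # (in the algorithm these are the natural logarithms of the item visit counts)
    q = np.array([4.0, 3.0, 2.5, 1.0, 0.5])

    def softmax_with_temperature(q, tau):
        # Subtract the maximum before exponentiating for numerical stability,
        # just like the _fit method further down will do
        z = (q - q.max()) / tau
        e = np.exp(z)
        return e / e.sum()

    for tau in (0.1, 1.0, 10.0):
        print(tau, softmax_with_temperature(q, tau).round(3))

    # A small tau concentrates almost all probability on the most popular item,
    # tau = 1 gives the standard softmax,
    # and a large tau approaches uniform sampling over the items.

With that intuition in place, we can define the class: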
::

    import numpy as np
    from scipy.sparse import csr_matrix

    from recpack.algorithms.base import Algorithm


    class RandomizedSoftmaxPopularity(Algorithm):
        """Recommend items using softmax on the natural logarithm of item counts.

        Recommendations are sampled from the probability distribution
        created by taking the softmax of the natural logarithm of item counts.

        Items are scored such that the distance between the item in first place
        and the item in second place is the same as between all other items.

        :param K: Only the K most frequent items are considered for recommendation
        :param tau: Temperature in the softmax computation
        """

        def __init__(self, K, tau):
            self.K = K
            self.tau = tau

_fit
""""

Next we implement the ``_fit`` function.
Our input is the matrix of interactions considered for training.
We compute the natural logarithm of the number of times users interacted with each item,
then take the softmax over the scores of the K most popular items.

::

    def _fit(self, X: csr_matrix):
        # Compute popularity by taking the logarithm of the raw counts.
        # A1 puts it into a 1d array, making all subsequent operations easy.
        pop = np.log(np.sum(X, axis=0)).A1
        max_pop = np.max(pop)

        # Cut to top K
        self.top_k_pop_items_ = np.argsort(pop)[-self.K:]
        top_k_pop = pop[self.top_k_pop_items_]

        # To make the softmax numerically stable, we compute
        # exp((pop - max(pop)) / self.tau) instead of exp(pop / self.tau):
        #
        # The softmax for item i can then be computed as
        # e^((pop[i] - max(pop)) / tau) / sum([e^((pop[j] - max(pop)) / tau) for j in topK])
        top_k_pop_minus_max = (top_k_pop - max_pop) / self.tau
        top_k_exp = np.exp(top_k_pop_minus_max)
        top_k_pop_sum = np.sum(top_k_exp)

        self.softmax_scores_ = top_k_exp / top_k_pop_sum

After fitting, the model is ready for prediction.

_predict
""""""""

Finally we implement ``_predict``.
Here we sample recommendations for each user with at least one interaction in the matrix of interactions.
Remember that sampling probabilities were stored in ``softmax_scores_`` during fitting.

::

    def _predict(self, X: csr_matrix):
        # Randomly sample items, with weights decided by the softmax scores
        users = X.nonzero()[0]

        # The resulting score = (K - ix) / K
        # The first sampled item gets score 1, and the last sampled item score 1/K
        score_list = [
            (u, i, (self.K - ix) / self.K)
            for u in set(users)
            for ix, i in enumerate(
                np.random.choice(
                    self.top_k_pop_items_,
                    size=self.K,
                    replace=False,
                    p=self.softmax_scores_
                )
            )
        ]
        user_idxs, item_idxs, scores = list(zip(*score_list))
        score_matrix = csr_matrix((scores, (user_idxs, item_idxs)), shape=X.shape)

        return score_matrix

This algorithm can now be used in evaluation pipelines just like any other algorithm already available in recpack!

.. _guides-algorithms-recency:

Recency
---------

Description
^^^^^^^^^^^^

Next we create an algorithm that recommends the items that have been interacted with most recently.
This algorithm can be considered a baseline, as it is not personalized.

Implementation
^^^^^^^^^^^^^^

Again, we start from :class:`recpack.algorithms.base.Algorithm`.
This new algorithm is different from :ref:`guides-algorithms-pop-softmax`
in that it needs the time of interaction to be able to make recommendations.
Thankfully, the recpack data format :class:`recpack.data.matrix.InteractionMatrix`
has a ``timestamps`` attribute that stores the time of interaction.

Our algorithm has no hyperparameters, so we have no use for an ``__init__`` method.

_transform_fit_input
""""""""""""""""""""

To make sure we receive a :class:`recpack.data.matrix.InteractionMatrix` at fitting time,
we update ``_transform_fit_input``.
::

    import numpy as np
    from scipy.sparse import csr_matrix, lil_matrix

    from recpack.algorithms.base import Algorithm
    # get_users is a small helper that returns the users with interactions in X.
    # We assume here that it can be imported from recpack.algorithms.util.
    from recpack.algorithms.util import get_users
    from recpack.data.matrix import InteractionMatrix


    class Recency(Algorithm):

        def _transform_fit_input(self, X):
            # X needs to be an InteractionMatrix for us to have access to
            # the time of interaction at fitting time
            self._assert_is_interaction_matrix(X)
            # X needs to have timestamps available
            self._assert_has_timestamps(X)

            # No transformation needed
            return X

_fit
"""""

Now that we have asserted that ``_fit`` receives an object of type :class:`recpack.data.matrix.InteractionMatrix`,
we fit our algorithm by extracting, for each item, its most recent time of interaction.
We then scale these values to the interval [0, 1] using min-max normalisation.

::

    def _fit(self, X: InteractionMatrix):
        # X.timestamps gives a pandas MultiIndex object, indexed by user and item.
        # We drop the index, group by just the item index
        # and then select the maximal timestamp from this groupby.
        max_ts_per_item = X.timestamps.reset_index().groupby('iid')['ts'].max()

        # Apply min-max normalisation
        recency = np.zeros(X.shape[1])
        recency[max_ts_per_item.index] = max_ts_per_item.values
        most_recent = np.max(recency)
        least_recent = np.min(recency)

        recency = (recency - least_recent) / (most_recent - least_recent)
        self.recency_ = recency.copy()

At fitting time, the base class' ``fit`` method calls both ``_transform_fit_input`` and ``_fit``.
The model is then ready for use, with the attribute ``self.recency_`` containing the recommendation score per item.

_predict
"""""""""

Prediction is now easy: for each nonzero user in the input matrix,
we set every item's score equal to the recency score we computed in ``_fit``.

::

    def _predict(self, X: csr_matrix):
        results = lil_matrix(X.shape)
        users = get_users(X)

        results[users] = self.recency_

        return results.tocsr()

Here we go, another algorithm ready for use in evaluation!

.. _guides-algorithms-svd:

Singular Value Decomposition
------------------------------

Description
^^^^^^^^^^^^

Let's now implement SVD, a well-known matrix factorization algorithm.
Singular Value Decomposition decomposes a matrix of interactions into three matrices
which, when multiplied together, approximately reconstruct the original matrix: :math:`X = U \times \Sigma \times V`.

If matrix :math:`X` is of shape ``(|users| x |items|)``, then :math:`U` is of shape ``(|users| x num_components)``,
:math:`\Sigma` is a ``(num_components x num_components)`` matrix,
and finally :math:`V` is a ``(num_components x |items|)`` matrix.

Implementation
^^^^^^^^^^^^^^^

Rather than implement the SVD computation ourselves,
we adapt the optimised TruncatedSVD implementation in sklearn so that it matches the recpack interface.
As the name suggests, it makes sense to use :class:`recpack.algorithms.base.FactorizationAlgorithm` as base class in this example.

In addition to the methods implemented in :class:`recpack.algorithms.base.Algorithm`,
which we have highlighted in :ref:`guides-algorithms-pop-softmax`, this class provides:

- ``_predict`` generates recommendations by multiplying the user embeddings of nonzero users with all item embeddings.
- ``_check_fit_complete`` performs an additional check on the dimensions of the embeddings.

All that remains for us to implement is ``__init__`` to set the hyperparameters and ``_fit`` to compute the embeddings.

__init__
"""""""""

For simplicity we use only one hyperparameter: ``num_components``, which defines the dimension of the embedding.
We also add a parameter ``random_state``, which is a parameter of ``TruncatedSVD``, to ensure reproducibility.

.. warning::

    The random_state parameter should not be considered a hyperparameter,
    i.e. we should not perform a parameter search to determine its optimal value.

::

    import numpy as np
    from scipy.sparse import csr_matrix, lil_matrix, diags
    from sklearn.decomposition import TruncatedSVD

    from recpack.algorithms.base import FactorizationAlgorithm


    class SVD(FactorizationAlgorithm):
        """Singular Value Decomposition as a dimension reduction recommendation algorithm.

        SVD is computed using the TruncatedSVD implementation from sklearn.

        U x Sigma x V = X
        U are the user features, and the item features are computed as Sigma x V.

        :param num_components: The size of the latent dimension
        :type num_components: int

        :param random_state: The seed for the random state to allow for comparison
        :type random_state: int
        """

        def __init__(self, num_components=100, random_state=42):
            super().__init__(num_components=num_components)

            self.random_state = random_state

_fit
"""""

In ``_fit`` we initialize an object of type TruncatedSVD.
For simplicity's sake we expose only ``num_components`` in our algorithm.
All other hyperparameters are left at their default values.

SVD decomposes the matrix into three matrices,
while the :class:`recpack.algorithms.base.FactorizationAlgorithm` class expects only two: a user and an item embedding.
Therefore we take the item embedding to be the product of :math:`\Sigma` and :math:`V`.
Since :math:`\Sigma` is a square matrix, this does not change the matrix dimensions:
:math:`\Sigma \times V` is still a ``(num_components x |items|)`` matrix.

::

    def _fit(self, X: csr_matrix):
        model = TruncatedSVD(
            n_components=self.num_components,
            n_iter=7,
            random_state=self.random_state
        )
        # Factorization computes U x Sigma x V
        # U are the user features,
        # Sigma x V are the item features.
        self.user_embedding_ = model.fit_transform(X)

        V = model.components_
        sigma = diags(model.singular_values_)
        self.item_embedding_ = sigma @ V

        return self

This concludes the modification of the TruncatedSVD algorithm for use in recpack!

.. _guides-algorithms-silly-mf:

SillyMF (Gradient Descent Algorithm)
--------------------------------------

Description
^^^^^^^^^^^^

In this example we implement a very silly, iterative matrix factorization algorithm in PyTorch.
It is by no means sophisticated or even guaranteed to converge, but it serves well for illustration purposes.

The model learns the weights of a matrix factorization of the initial matrix :math:`X` as :math:`X = U \times V^T`.

Implementation
^^^^^^^^^^^^^^^

Because we are now dealing with an algorithm optimised by means of gradient descent,
it makes sense to use :class:`recpack.algorithms.base.TorchMLAlgorithm` as base class in this example.

This base class comes with quite a bit more plumbing than the others:

- ``_predict`` generates recommendations by calling ``_batch_predict`` for batches of users (to keep the memory footprint low).
- ``_check_fit_complete`` performs an additional check on the dimensions of the embeddings.
- ``_check_prediction`` makes sure predictions were made for all nonzero users.
- ``fit(X, validation_data)`` performs a number of training epochs, each followed by an evaluation step on the validation data.
  Unlike the other base classes, ``fit`` takes an additional ``validation_data`` argument to perform this evaluation step.
- ``save`` saves the current PyTorch model to disk.
- ``load`` loads a PyTorch model from file.
- ``filename`` generates a unique filename for the current best model.
- ``_transform_predict_input`` transforms the input matrix to a ``csr_matrix`` by default.
- ``_transform_fit_input`` transforms the input matrices to a ``csr_matrix`` by default.
- ``_evaluate`` performs one evaluation step, which consists of making predictions for the validation data
  and subsequently updating the stopping criterion.
- ``_load_best`` loads the best model encountered during training as the final model used to make predictions.
- ``_save_best`` saves the best model encountered during training to a temporary file.

Which leaves ``__init__``, ``_init_model``, ``_train_epoch``, ``my_loss`` and ``_batch_predict`` to be implemented,
as well as the ``nn.Module`` that is the actual PyTorch model.

MFModule
""""""""

First we create a PyTorch model that encodes this factorization.
The ``forward`` method is also used to make recommendations at prediction time.

::

    from typing import List

    import numpy as np
    from scipy.sparse import csr_matrix, lil_matrix
    import torch
    import torch.optim as optim
    import torch.nn as nn

    from recpack.algorithms.base import TorchMLAlgorithm
    from recpack.algorithms.stopping_criterion import StoppingCriterion
    # get_users, get_batches and naive_sparse2tensor are small helpers used further down.
    # We assume here that they can be imported from recpack.algorithms.util.
    from recpack.algorithms.util import get_batches, get_users, naive_sparse2tensor


    class MFModule(nn.Module):
        """MF torch module, encodes the embeddings and the forward functionality.

        :param num_users: the number of users
        :type num_users: int
        :param num_items: the number of items
        :type num_items: int
        :param num_components: the size of the embedding per user and item, defaults to 100
        :type num_components: int, optional
        """

        def __init__(self, num_users, num_items, num_components=100):
            super().__init__()

            self.num_components = num_components
            self.num_users = num_users
            self.num_items = num_items

            self.user_embedding = nn.Embedding(num_users, num_components)  # User embedding
            self.item_embedding = nn.Embedding(num_items, num_components)  # Item embedding

            self.std = 1 / num_components ** 0.5

            # Initialise embeddings to a random start
            nn.init.normal_(self.user_embedding.weight, std=self.std)
            nn.init.normal_(self.item_embedding.weight, std=self.std)

        def forward(
            self, user_tensor: torch.Tensor, item_tensor: torch.Tensor
        ) -> torch.Tensor:
            """
            Compute the dot product of user embedding (w_u) and item embedding (h_i)
            for every user and item pair in user_tensor and item_tensor.

            :param user_tensor: tensor of user ids to compute scores for
            :type user_tensor: torch.Tensor
            :param item_tensor: tensor of item ids to compute scores for
            :type item_tensor: torch.Tensor
            """
            w_u = self.user_embedding(user_tensor)
            h_i = self.item_embedding(item_tensor)

            return w_u.matmul(h_i.T)

__init__
"""""""""

Next, we create the actual :class:`recpack.algorithms.base.TorchMLAlgorithm`.
The ``__init__`` of any TorchMLAlgorithm expects at least the following default hyperparameters to be defined:

- ``batch_size``, which dictates how many users make up a training batch.
- ``max_epochs``, which defines the maximum number of training epochs to run.
- ``learning_rate``, which determines how much to change the model with every update.
- ``stopping_criterion``, which defines how to evaluate the model's performance, and if and when to stop early.

Other optional hyperparameters of the base class can be found in the documentation.

We define one additional hyperparameter:

- ``num_components``, which is the dimension of our embeddings for both users and items.

For the sake of example we use a fixed random seed, set to guarantee reproducibility of results.
As :class:`recpack.algorithms.stopping_criterion.StoppingCriterion` we use Recall,
by default computed on the top 50 recommendations.
By default, early stopping is disabled.

::

    class SillyMF(TorchMLAlgorithm):
        def __init__(self, batch_size, max_epochs, learning_rate, num_components=100):
            super().__init__(
                batch_size=batch_size,
                max_epochs=max_epochs,
                learning_rate=learning_rate,
                stopping_criterion="recall",
                seed=42
            )
            self.num_components = num_components

_init_model
""""""""""""

Next we implement ``_init_model``.
We cannot initialize ``MFModule`` as part of SillyMF's ``__init__``,
because at that stage we are unaware of the dimensions of the interaction matrix.
We also define the optimizer here. We use simple SGD, but any PyTorch optimizer can be used.

::

    def _init_model(self, X: csr_matrix):
        num_users, num_items = X.shape

        self.model_ = MFModule(
            num_users, num_items, num_components=self.num_components
        ).to(self.device)

        # We'll use a basic SGD optimiser
        self.optimizer = optim.SGD(self.model_.parameters(), lr=self.learning_rate)
        self.steps = 0

_train_epoch
"""""""""""""

Next we implement training.
First, we need to define a loss function that indicates how well our current embeddings perform at the task we set.
As mentioned in the description, this task is to reconstruct the matrix :math:`X`.
Our loss function computes, per user, the total absolute error between :math:`U \times V^T` and the original matrix :math:`X`,
and averages this over all users.

.. note::

    For an overview of commonly used loss functions, check out the
    `PyTorch documentation <https://pytorch.org/docs/stable/nn.html#loss-functions>`_.

::

    def my_loss(true_sim, predicted_sim):
        """Computes the total absolute error from predicted compared to true,
        and averages over all users.
        """
        return torch.mean(torch.sum(torch.abs(true_sim - predicted_sim), axis=1))

Now we can continue with ``_train_epoch``.
Every epoch loops through the entire dataset (most often in batches).
For every batch, the loss and resulting gradients are computed and the embeddings are updated.

::

    def _train_epoch(self, X):
        losses = []
        item_tensor = torch.arange(X.shape[1]).to(self.device)

        for users in get_batches(get_users(X), batch_size=self.batch_size):
            self.optimizer.zero_grad()

            user_tensor = torch.LongTensor(users).to(self.device)

            scores = self.model_.forward(user_tensor, item_tensor)
            expected_scores = naive_sparse2tensor(X[users])
            loss = my_loss(expected_scores, scores)

            # Backwards propagation of the loss
            loss.backward()
            losses.append(loss.item())

            # Update the weights according to the gradients.
            # All automated thanks to torch.
            self.optimizer.step()
            self.steps += 1

        return losses

_batch_predict
"""""""""""""""

Now we can move on to the final step of our implementation: prediction.
Predicting items is the same here as in :ref:`guides-algorithms-svd`:
the user embeddings of nonzero users are multiplied with the item embeddings.
As mentioned in the definition of ``MFModule``, ``MFModule.forward`` is used to make predictions.
This method takes a PyTorch ``Tensor`` of user ids and a ``Tensor`` of item ids as inputs.
It then computes the matrix multiplication of the corresponding embeddings.

::

    def _batch_predict(self, X: csr_matrix, users: List[int] = None) -> csr_matrix:
        """Predict scores for matrix X, given the selected users.

        If there are no selected users, you can assume X is a full matrix,
        and users can be retrieved as the nonzero indices in the X matrix.

        :param X: Matrix of user item interactions
        :type X: csr_matrix
        :param users: users selected for recommendation
        :type users: List[int]
        :return: sparse matrix of scores per user item pair.
        :rtype: csr_matrix
        """
        X_pred = lil_matrix(X.shape)
        if users is None:
            users = get_users(X)

        # Turn the np arrays and lists into torch tensors
        user_tensor = torch.LongTensor(users).to(self.device)
        item_tensor = torch.arange(X.shape[1]).to(self.device)

        X_pred[users] = self.model_(user_tensor, item_tensor).detach().cpu().numpy()

        return X_pred.tocsr()

And that's all for implementing SillyMF!

.. _guides-algorithms-use-pipeline:

Compare your algorithm to the state of the art
----------------------------------------------

Now that you have learned how to create your own algorithm,
you obviously want to know how well it performs compared to state-of-the-art recommendation algorithms.
Recpack provides pipeline functionality, which simplifies running experiments as well as making them reproducible.
Because we want you to use your own algorithms with the recpack pipelines,
we have made it easy to set up a pipeline with your own algorithm.

The first (and only) step to using a new algorithm is to make sure it is registered in the
:class:`recpack.pipelines.ALGORITHM_REGISTRY`.
Registering a new algorithm is done using the ``register`` function.
This function takes two arguments: the name of the algorithm to register and the class.

::

    from recpack.pipelines import ALGORITHM_REGISTRY

    ALGORITHM_REGISTRY.register(SillyMF.__name__, SillyMF)

Once the algorithm is registered, you can use it when constructing a pipeline.
As an example we will compare the SillyMF algorithm to the ItemKNN and EASE algorithms.

::

    import pandas as pd

    from recpack.data.datasets import MovieLens25M
    from recpack.pipelines import PipelineBuilder
    from recpack.splitters.scenarios import StrongGeneralization

    from recpack.pipelines import ALGORITHM_REGISTRY

    ALGORITHM_REGISTRY.register(SillyMF.__name__, SillyMF)

    # Get data to test on.
    # use_default_filters=False means no default preprocessing filters are applied.
    dataset = MovieLens25M("data/ml25.csv", use_default_filters=False)
    im = dataset.load_interaction_matrix()

    # Data splitting scenario
    scenario = StrongGeneralization(frac_users_train=0.7, frac_interactions_in=0.8, validation=True)
    # Perform the split, so that training, validation and test data become available
    scenario.split(im)

    # Construct our pipeline object
    pipeline_builder = PipelineBuilder()
    pipeline_builder.set_train_data(scenario.training_data)
    pipeline_builder.set_test_data(scenario.test_data)
    pipeline_builder.set_validation_data(scenario.validation_data)

    # Add the baseline algorithms.
    # Grid parameters will be optimised using grid search before final evaluation.
    pipeline_builder.add_algorithm('ItemKNN', grid={'K': [100, 200, 400, 800]})
    pipeline_builder.add_algorithm('EASE', grid={'l2': [10, 100, 1000], 'alpha': [0, 0.1, 0.5]})

    # Add our new algorithm:
    # optimising learning_rate and num_components,
    # setting fixed values for max_epochs and batch_size.
    pipeline_builder.add_algorithm(
        'SillyMF',
        grid={
            'learning_rate': [0.1, 0.01, 0.3],
            'num_components': [100, 200, 400]
        },
        params={
            'max_epochs': 5,
            'batch_size': 1024
        }
    )

    # Add NDCG and Recall to be evaluated at 10, 20, 50 and 100
    pipeline_builder.add_metric('NDCGK', [10, 20, 50, 100])
    pipeline_builder.add_metric('RecallK', [10, 20, 50, 100])

    # Set the optimisation metric;
    # it will be used to select the best values from the grid for each algorithm.
    pipeline_builder.set_optimisation_metric('RecallK', 20)

    # Construct pipeline
    pipeline = pipeline_builder.build()

    # Run the pipeline; this will first do the optimisation, and then the evaluation
    pipeline.run()

    # Get the metric results.
    # This will be a dict with the results of the run.
    # Turning it into a dataframe makes reading easier
    pd.DataFrame.from_dict(pipeline.get_metrics()).T

And there you have it; hopefully the new algorithm is better than the baseline algorithms!
For more information on how to use pipelines, see :mod:`recpack.pipelines`.
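If you want to sanity-check a newly implemented algorithm before running a full pipeline,
you can also fit and predict on a small matrix directly.
Below is a minimal sketch using the ``RandomizedSoftmaxPopularity`` class from the first example;
the interaction matrix is tiny and made up purely for illustration,
and the scores will differ between runs because the algorithm samples its recommendations.

::

    import numpy as np
    from scipy.sparse import csr_matrix

    # A tiny, made-up interaction matrix: 4 users x 6 items
    X = csr_matrix(np.array([
        [1, 1, 0, 0, 1, 0],
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 1, 0, 0],
        [0, 0, 0, 1, 1, 1],
    ]))

    algo = RandomizedSoftmaxPopularity(K=3, tau=1.0)
    algo.fit(X)

    # predict returns a sparse (|users| x |items|) matrix of scores,
    # with nonzero rows only for the users present in the input.
    scores = algo.predict(X)
    print(scores.toarray().round(2))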