class recpack.algorithms.GRU4RecNegSampling(num_layers: int = 1, hidden_size: int = 100, num_components: int = 250, dropout: float = 0.0, loss_fn: str = 'bpr', optimization_algorithm: str = 'adagrad', momentum: float = 0.0, clipnorm: float = 1.0, bptt: int = 1, num_negatives: int = 50, batch_size: int = 512, max_epochs: int = 5, learning_rate: float = 0.03, stopping_criterion: str = 'recall', stop_early: bool = False, max_iter_no_change: int = 5, min_improvement: float = 0.0, seed: Optional[int] = None, save_best_to_file: bool = False, keep_last: bool = False, predict_topK: Optional[int] = None, validation_sample_size: Optional[int] = None)

A recurrent neural network for session-based recommendations.

The algorithm, also known as GRU4Rec, was introduced in the 2016 and 2018 papers “Session-based Recommendations with Recurrent Neural Networks” and “Recurrent Neural Networks with Top-k Gains for Session-based Recommendations”.

This class implements the negative-sampling variant of the algorithm. For the cross-entropy variant, see GRU4RecCrossEntropy.

The algorithm makes recommendations by training a recurrent neural network to predict the next action of a user, and using the most likely next actions as recommendations. At the heart of it is a Gated Recurrent Unit (GRU), a recurrent network architecture that is able to form long-term memories.

Predictions are made by processing a user’s actions so far one by one, in chronological order:

0 --> [ GRU ] --> [ GRU ] --> [ GRU ]
         |           |           |
       iid_0       iid_1       iid_2

Here, ‘iid’ denotes an item id, which can represent a page view, a purchase, or some other action. The GRU builds up a memory of the actions so far and predicts the next action based on what other users with similar histories did next. While originally devised to make recommendations based on (often short) user sessions, the algorithm can also be used with long user histories.
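
The sequential processing in the diagram can be sketched in plain Python. This is a minimal illustration under stated assumptions, not recpack's implementation: it uses a single GRU layer with the gate equations of Chung et al. (biases omitted), toy randomly initialised weights instead of trained ones, and a hypothetical output_embeddings matrix that scores each item by a dot product with the final hidden state. The names TinyGRUCell, encode_session and score_items are illustrative only.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Vector) -> Vector:
    # Matrix-vector product: one dot product per row of m.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyGRUCell:
    """One GRU layer using Chung et al.'s gate equations, without biases (sketch)."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.hidden_size = hidden_size
        # Input-to-hidden (W_*) and hidden-to-hidden (U_*) weights for each gate.
        self.W_z = random_matrix(hidden_size, input_size, rng)
        self.U_z = random_matrix(hidden_size, hidden_size, rng)
        self.W_r = random_matrix(hidden_size, input_size, rng)
        self.U_r = random_matrix(hidden_size, hidden_size, rng)
        self.W_h = random_matrix(hidden_size, input_size, rng)
        self.U_h = random_matrix(hidden_size, hidden_size, rng)

    def step(self, x: Vector, h: Vector) -> Vector:
        def pre_activation(W: Matrix, U: Matrix, state: Vector) -> Vector:
            return [a + b for a, b in zip(matvec(W, x), matvec(U, state))]

        z = [sigmoid(v) for v in pre_activation(self.W_z, self.U_z, h)]  # update gate
        r = [sigmoid(v) for v in pre_activation(self.W_r, self.U_r, h)]  # reset gate
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        candidate = [math.tanh(v) for v in pre_activation(self.W_h, self.U_h, reset_h)]
        # Interpolate between the previous state and the candidate state.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]


def encode_session(item_ids: List[int], item_embeddings: Matrix, cell: TinyGRUCell) -> Vector:
    h = [0.0] * cell.hidden_size  # the initial "0" state in the diagram
    for iid in item_ids:  # one GRU step per action, in chronological order
        h = cell.step(item_embeddings[iid], h)
    return h


def score_items(h: Vector, output_embeddings: Matrix) -> Vector:
    # Score every item against the final hidden state.
    return [sum(a * b for a, b in zip(h, e)) for e in output_embeddings]


# Usage with toy, untrained weights: 5 items, embedding size 4, hidden size 3.
rng = random.Random(42)
n_items, num_components, hidden_size = 5, 4, 3
item_embeddings = random_matrix(n_items, num_components, rng)
output_embeddings = random_matrix(n_items, hidden_size, rng)
cell = TinyGRUCell(num_components, hidden_size, rng)

h = encode_session([0, 3, 1], item_embeddings, cell)
scores = score_items(h, output_embeddings)
ranking = sorted(range(n_items), key=lambda i: scores[i], reverse=True)
```

In the toy setup, num_components plays the role of the item embedding size and hidden_size the size of the GRU state, matching the parameters of the same names below; in the real algorithm all weights are learned during fit().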

For the mathematical details of GRU see “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling” by Chung et al.
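
For reference, the GRU update as written by Chung et al. is reproduced below, with x_t the embedding of the item at step t, h_{t-1} the previous hidden state, σ the logistic sigmoid and ⊙ element-wise multiplication (bias terms omitted, as in that paper):

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1}\right) && \text{(update gate)} \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1}\right) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\left(W x_t + U\,(r_t \odot h_{t-1})\right) && \text{(candidate state)} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```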

Note: Cross-entropy loss was mentioned in the paper, but is omitted from this class for implementation reasons; it is available in GRU4RecCrossEntropy.

Parameters

  • num_layers (int, optional) – Number of hidden layers in the RNN. Defaults to 1

  • hidden_size (int, optional) – Number of neurons in the hidden layer(s). Defaults to 100

  • num_components (int, optional) – Size of item embeddings. Defaults to 250

  • dropout (float, optional) – Dropout applied to embeddings and hidden layer(s). Defaults to 0.0 (no dropout)

  • loss_fn (str, optional) – Loss function. One of “top1”, “top1-max”, “bpr”, “bpr-max”. Defaults to “bpr”

  • optimization_algorithm (str, optional) – Gradient descent optimizer, one of “sgd”, “adagrad”. Defaults to “adagrad”

  • momentum (float, optional) – Momentum when using the sgd optimizer. Defaults to 0.0

  • clipnorm (float, optional) – Clip the gradient’s L2 norm to this value, or None for no clipping. Defaults to 1.0

  • bptt (int, optional) – Number of backpropagation through time steps. Defaults to 1

  • num_negatives (int, optional) – Number of negatives to sample for every positive. Defaults to 50

  • batch_size (int, optional) – Number of examples in a mini-batch. Defaults to 512.

  • max_epochs (int, optional) – Maximum number of training epochs (full passes over the dataset). Defaults to 5

  • learning_rate (float, optional) – Initial learning rate for gradient descent. Defaults to 0.03

  • stopping_criterion (str, optional) – Name of the stopping criterion to use for training. For available values, check recpack.algorithms.stopping_criterion.StoppingCriterion.FUNCTIONS. Defaults to “recall”

  • stop_early (bool, optional) – If True, early stopping is enabled: after max_iter_no_change iterations in which the improvement of the loss function is below min_improvement, optimisation stops, even if max_epochs has not been reached. Defaults to False

  • max_iter_no_change (int, optional) – If early stopping is enabled, stop after this number of iterations without change. Defaults to 5

  • min_improvement (float, optional) – If early stopping is enabled, an iteration counts as having no change if the improvement is below this value. Defaults to 0.0

  • seed (int, optional) – Seed for the random number generators, useful for reproducible results. Defaults to None

  • save_best_to_file (bool, optional) – If True, the best model will be saved after training. Defaults to False

  • keep_last (bool, optional) – Retain the last model rather than the best one (according to the stopping criterion value on validation data). Defaults to False

  • predict_topK (int, optional) – Number of highest-scoring recommendations to keep per row (user) of the output matrix. Use when the full user x item output matrix would not fit in RAM. Defaults to None, which results in no filtering.

  • validation_sample_size (int, optional) – Number of users sampled to compute the validation loss and stopping criterion value. This reduces computation time during validation and can therefore substantially shorten training. If None, all nonzero users are used. Defaults to None.
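
The loss_fn and num_negatives parameters can be illustrated with a small sketch. Assumptions: negatives are drawn uniformly at random (with replacement, excluding the positive item), which is a simplification; the BPR, TOP1, BPR-max and TOP1-max formulas follow the 2016 and 2018 GRU4Rec papers as written there, not recpack's own code; and the reg weight of the BPR-max score regularisation is a hypothetical parameter of this sketch. The -max variants weight each negative by a softmax over the negative scores, so the loss concentrates on the highest-scoring negatives.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sample_negatives(positive: int, num_items: int, num_negatives: int,
                     rng: random.Random) -> List[int]:
    """Uniformly sample num_negatives item ids (with replacement), skipping the positive."""
    negatives: List[int] = []
    while len(negatives) < num_negatives:
        candidate = rng.randrange(num_items)
        if candidate != positive:
            negatives.append(candidate)
    return negatives


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bpr(pos: float, negs: Sequence[float]) -> float:
    # Mean over negatives of -log sigma(r_pos - r_neg).
    return -sum(math.log(sigmoid(pos - n)) for n in negs) / len(negs)


def top1(pos: float, negs: Sequence[float]) -> float:
    # Mean over negatives of sigma(r_neg - r_pos) + sigma(r_neg^2).
    return sum(sigmoid(n - pos) + sigmoid(n * n) for n in negs) / len(negs)


def bpr_max(pos: float, negs: Sequence[float], reg: float = 1.0) -> float:
    s = softmax(negs)  # softmax weights over the negative scores
    ranking_term = -math.log(sum(sj * sigmoid(pos - n) for sj, n in zip(s, negs)))
    return ranking_term + reg * sum(sj * n * n for sj, n in zip(s, negs))


def top1_max(pos: float, negs: Sequence[float]) -> float:
    s = softmax(negs)
    return sum(sj * (sigmoid(n - pos) + sigmoid(n * n)) for sj, n in zip(s, negs))


# Usage: score one positive item against num_negatives sampled negatives.
rng = random.Random(0)
num_items, num_negatives = 1000, 50
item_scores = [rng.gauss(0.0, 1.0) for _ in range(num_items)]
positive = 7
negatives = sample_negatives(positive, num_items, num_negatives, rng)
neg_scores = [item_scores[j] for j in negatives]
losses = {
    "bpr": bpr(item_scores[positive], neg_scores),
    "top1": top1(item_scores[positive], neg_scores),
    "bpr-max": bpr_max(item_scores[positive], neg_scores),
    "top1-max": top1_max(item_scores[positive], neg_scores),
}
```

All four losses decrease as the positive item's score rises above the scores of the sampled negatives.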

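The interaction of stop_early, max_iter_no_change and min_improvement is sketched below. The EarlyStopper class and epochs_run helper are hypothetical and not part of recpack; the sketch assumes one stopping-criterion value is computed on validation data after every epoch, and that higher is better unless minimize=True (recall, the default stopping criterion, is maximised).

```python
from typing import List, Optional


class EarlyStopper:
    """Hypothetical helper mirroring stop_early / max_iter_no_change / min_improvement."""

    def __init__(self, max_iter_no_change: int = 5, min_improvement: float = 0.0,
                 minimize: bool = False):
        self.max_iter_no_change = max_iter_no_change
        self.min_improvement = min_improvement
        self.minimize = minimize  # False for metrics such as recall, True for losses
        self.best: Optional[float] = None
        self.iters_no_change = 0

    def update(self, value: float) -> bool:
        """Record one validation value; return True when training should stop."""
        if self.best is None:
            self.best = value
            return False
        improvement = self.best - value if self.minimize else value - self.best
        if improvement > self.min_improvement:
            self.best = value
            self.iters_no_change = 0
        else:
            # An improvement below min_improvement counts as "no change".
            self.iters_no_change += 1
        return self.iters_no_change >= self.max_iter_no_change


def epochs_run(values: List[float], **stopper_kwargs) -> int:
    """Simulate training on precomputed validation values; return epochs actually run."""
    stopper = EarlyStopper(**stopper_kwargs)
    for epoch, value in enumerate(values, start=1):
        if stopper.update(value):
            return epoch
    return len(values)


# Usage: the last two improvements (0.001 and 0.002) are below min_improvement.
n_epochs = epochs_run([0.1, 0.2, 0.201, 0.202, 0.3], max_iter_no_change=2, min_improvement=0.01)
```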

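A minimal sketch of what predict_topK does to the output, assuming score rows can be produced one user at a time: only the K highest-scoring items per row are kept, so the stored result grows with users x K instead of users x items. top_k_per_row is a hypothetical helper, and the dict-of-dicts result stands in for a sparse matrix.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def top_k_per_row(rows: Iterable[Tuple[int, List[float]]], k: int) -> Dict[int, Dict[int, float]]:
    """Keep only the k highest scores of each user's row.

    rows yields (user_id, dense score row) pairs one at a time, so only one dense
    row has to be in memory; the result is a sparse dict-of-dicts.
    """
    result: Dict[int, Dict[int, float]] = {}
    for user, row in rows:
        best_items = heapq.nlargest(k, range(len(row)), key=row.__getitem__)
        result[user] = {item: row[item] for item in best_items}
    return result


# Usage: two users, four items, keep the top 2 items per user.
rows = [(0, [0.1, 0.9, 0.5, 0.3]), (2, [0.7, 0.2, 0.8, 0.0])]
top2 = top_k_per_row(rows, k=2)
```
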
Methods

  • fit(X, validation_data) – Fit the parameters of the model.

  • get_params([deep]) – Get parameters for this estimator.

  • load(filename) – Load torch model from file.

  • predict(X) – Predicts scores, given the interactions in X.

  • save() – Save the current model to disk.

  • set_params(**params) – Set the parameters of the estimator.

Attributes

  • filename – Name of the file at which save(self) will write the current best model.

  • identifier – Name of the object.

  • name – Name of the object’s class.

property filename

Name of the file at which save(self) will write the current best model.

fit(X: Union[recpack.matrix.interaction_matrix.InteractionMatrix, scipy.sparse._csr.csr_matrix], validation_data: Tuple[Union[recpack.matrix.interaction_matrix.InteractionMatrix, scipy.sparse._csr.csr_matrix], Union[recpack.matrix.interaction_matrix.InteractionMatrix, scipy.sparse._csr.csr_matrix]]) → recpack.algorithms.base.TorchMLAlgorithm

Fit the parameters of the model.

Interaction matrix X is used for training; the validation data tuple is used to compute the evaluation scores.

This function provides the generic framework for training a PyTorch algorithm, such that each child class only needs to implement the _transform_fit_input(), _init_model(), _train_epoch() and _evaluate() functions.

The function will:

  • Transform input data to the expected types

  • Initialize the model using _init_model()

  • Iterate for each epoch until max epochs, or when early stopping conditions are met.

    • Training step using _train_epoch()

    • Evaluation step using _evaluate()

Once the model has been fit, the best model is stored to disk if save_best_to_file was set to True at initialisation.


Returns

self, fitted algorithm

Return type

recpack.algorithms.base.TorchMLAlgorithm

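The training loop described for fit() follows the template-method pattern: the base class owns fit(), and subclasses supply the hooks. The sketch below mirrors that structure in plain Python. The hook names come from the description above; TorchMLAlgorithmSketch, ToyAlgorithm and EarlyStoppingException are illustrative names, and saving the best model to disk is left out.

```python
from typing import Any, List, Tuple


class EarlyStoppingException(Exception):
    """Signals that training should end before max_epochs (illustrative name)."""


class TorchMLAlgorithmSketch:
    """Template-method sketch of the fit() loop; not recpack's code."""

    def __init__(self, max_epochs: int = 5):
        self.max_epochs = max_epochs

    def fit(self, X: Any, validation_data: Tuple[Any, Any]) -> "TorchMLAlgorithmSketch":
        X, validation_data = self._transform_fit_input(X, validation_data)
        self._init_model(X)
        try:
            for _ in range(self.max_epochs):
                self._train_epoch(X)
                self._evaluate(validation_data)
        except EarlyStoppingException:
            pass  # early stopping conditions were met
        return self

    # Hooks that a child class fills in.
    def _transform_fit_input(self, X, validation_data):
        return X, validation_data

    def _init_model(self, X):
        raise NotImplementedError

    def _train_epoch(self, X):
        raise NotImplementedError

    def _evaluate(self, validation_data):
        raise NotImplementedError


class ToyAlgorithm(TorchMLAlgorithmSketch):
    """Records which hooks ran and requests early stopping after the second evaluation."""

    def __init__(self, max_epochs: int = 5):
        super().__init__(max_epochs)
        self.calls: List[str] = []

    def _init_model(self, X):
        self.calls.append("init")

    def _train_epoch(self, X):
        self.calls.append("train")

    def _evaluate(self, validation_data):
        self.calls.append("evaluate")
        if self.calls.count("evaluate") >= 2:
            raise EarlyStoppingException()


algo = ToyAlgorithm(max_epochs=5).fit([[1, 0]], ([[1, 0]], [[0, 1]]))
```

Because fit() returns self, calls can be chained, as in the last line.
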

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

property identifier

Name of the object.

Name is made by combining the class name with the parameters passed at construction time.

Constructed by recreating the initialisation call. Example: Algorithm(param_1=value)
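
A minimal sketch of how such an identifier can be built from a class name and its constructor parameters. make_identifier and the toy Algorithm class are hypothetical, and the exact separator between parameters (a comma without a space here) is an assumption.

```python
from typing import Any, Dict


def make_identifier(class_name: str, params: Dict[str, Any]) -> str:
    """Recreate an initialisation call such as Algorithm(param_1=value)."""
    args = ",".join(f"{key}={value}" for key, value in params.items())
    return f"{class_name}({args})"


class Algorithm:
    """Toy class exposing name and identifier like the properties documented here."""

    def __init__(self, param_1: str = "value"):
        self.param_1 = param_1

    @property
    def name(self) -> str:
        return type(self).__name__  # name of the object's class

    @property
    def identifier(self) -> str:
        return make_identifier(self.name, {"param_1": self.param_1})
```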


load(filename)

Load torch model from file.

Parameters

filename (str) – File to load the model from.

property name

Name of the object’s class.

predict(X: Union[recpack.matrix.interaction_matrix.InteractionMatrix, scipy.sparse._csr.csr_matrix]) → scipy.sparse._csr.csr_matrix

Predicts scores, given the interactions in X.

Recommends items for each nonzero user in the X matrix.

This function is a wrapper around the _predict() method and performs checks on input and output data to guarantee proper computation:

  • Checks that the model is fitted correctly

  • Checks the output using the _check_prediction() function


Parameters

X (Matrix) – Interactions to predict from.

Returns

The recommendation scores in a sparse matrix format.

Return type

scipy.sparse._csr.csr_matrix

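The wrapper structure of predict() can be sketched as follows, with interaction and score matrices simplified to Python dicts (user id to a set of item ids, and user id to item scores). NotFittedError, PredictWrapperSketch and PopularitySketch are hypothetical names; the output check here only verifies that no scores are returned for users without interactions, which is an assumption about what _check_prediction() covers.

```python
from collections import Counter
from typing import Dict, Set


class NotFittedError(Exception):
    """Raised when predict() is called before fit() (hypothetical name)."""


class PredictWrapperSketch:
    """Sketch of a predict() wrapper around _predict(); not recpack's code."""

    def __init__(self):
        self.fitted = False

    def predict(self, X: Dict[int, Set[int]]) -> Dict[int, Dict[int, float]]:
        if not self.fitted:  # check that the model is fitted
            raise NotFittedError("call fit() before predict()")
        X_pred = self._predict(X)
        self._check_prediction(X_pred, X)  # check the output
        return X_pred

    def _check_prediction(self, X_pred, X):
        nonzero_users = {u for u, items in X.items() if items}
        unexpected = set(X_pred) - nonzero_users
        if unexpected:
            raise ValueError(f"scores for users without interactions: {sorted(unexpected)}")

    def _predict(self, X):
        raise NotImplementedError


class PopularitySketch(PredictWrapperSketch):
    """Toy child: scores every item by how many users interacted with it."""

    def fit(self, X: Dict[int, Set[int]]) -> "PopularitySketch":
        self.counts = Counter(item for items in X.values() for item in items)
        self.fitted = True
        return self

    def _predict(self, X):
        # One row of scores per nonzero user.
        scores = {item: float(c) for item, c in self.counts.items()}
        return {u: dict(scores) for u, items in X.items() if items}


X = {0: {1, 2}, 1: set(), 2: {2}}
X_pred = PopularitySketch().fit(X).predict(X)
```
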


save()

Save the current model to disk.

The name of the file the model is saved to is defined by the filename property.


set_params(**params)

Set the parameters of the estimator.

Parameters

**params (dict) – Estimator parameters.
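
The get_params and set_params descriptions match the scikit-learn estimator convention, in which parameter names are read from the signature of __init__. A minimal sketch of that idea follows; ParamsMixinSketch and ToyGRU4Rec are hypothetical classes, and deep is accepted but ignored because the sketch has no nested estimators.

```python
import inspect
from typing import Any, Dict


class ParamsMixinSketch:
    """scikit-learn-style get_params/set_params driven by the __init__ signature."""

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        # deep is accepted for API compatibility; there are no nested estimators here.
        signature = inspect.signature(type(self).__init__)
        names = [name for name in signature.parameters if name != "self"]
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params: Any) -> "ParamsMixinSketch":
        valid = self.get_params()
        for key, value in params.items():
            if key not in valid:
                raise ValueError(f"Invalid parameter {key!r} for {type(self).__name__}")
            setattr(self, key, value)
        return self


class ToyGRU4Rec(ParamsMixinSketch):
    def __init__(self, hidden_size: int = 100, learning_rate: float = 0.03):
        self.hidden_size = hidden_size
        self.learning_rate = learning_rate


model = ToyGRU4Rec().set_params(hidden_size=50)
```

Storing each constructor argument under an attribute of the same name is what lets get_params recover it; this is also why parameters are typically stored unmodified in __init__.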