recpack.scenarios

The scenarios module contains many of the most commonly encountered evaluation scenarios in recommendation.

A scenario consists of a training dataset and a test dataset, and sometimes also a validation dataset. The validation and test datasets each have two components: a fold-in set of interactions, which serves as input for prediction, and a held-out set of interactions, which is to be predicted.

Each scenario describes a complex situation, e.g. “Train on all user interactions before time T, predict interactions after T+10 using interactions from T+5 until T+10”.
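The quoted example can be sketched in plain Python to make the three resulting sets concrete. This is a toy illustration, not recpack code: the `Interaction` tuple, the `timed_split` function and its `gap` and `window` parameters are hypothetical names, and the choice of which interval ends are inclusive is an assumption.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    user: int
    item: int
    timestamp: int


def timed_split(interactions, t, gap=5, window=5):
    """Toy split for: train before T, fold in [T+5, T+10), predict from T+10 on.

    Hypothetical helper. The boundary handling (half-open intervals) is an
    assumption of this sketch.
    """
    # Training data: everything strictly before t.
    train = [x for x in interactions if x.timestamp < t]
    # Fold-in data: the window that starts `gap` after t and lasts `window`.
    fold_in = [x for x in interactions if t + gap <= x.timestamp < t + gap + window]
    # Held-out data: everything from the end of the fold-in window onwards.
    held_out = [x for x in interactions if x.timestamp >= t + gap + window]
    return train, fold_in, held_out


log = [
    Interaction(0, 10, 2), Interaction(0, 11, 9),
    Interaction(0, 12, 16), Interaction(0, 13, 22),
    Interaction(1, 10, 4), Interaction(1, 14, 12),
    Interaction(1, 15, 18), Interaction(1, 16, 25),
]

train, fold_in, held_out = timed_split(log, t=10)
```

With T = 10, the interaction at timestamp 12 falls in the gap between T and T+5 and therefore ends up in none of the three sets.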

Scenario([validation, seed])

Base class for defining an evaluation scenario.

Timed(t[, t_validation, delta_out, ...])

Predict users' future interactions, given information about historical interactions.

LastItemPrediction([validation, seed, ...])

Predict a user's next interaction.

TimedLastItemPrediction(t[, t_validation, ...])

Predict users’ last interaction, given information about historical interactions.

WeakGeneralization([frac_data_in, ...])

Predict (randomly) held-out interactions for all users, with remaining data used for training.

StrongGeneralization([frac_users_train, ...])

Predict (randomly) held-out interactions of previously unseen users.

StrongGeneralizationTimed(frac_users_in, t)

Predict future interactions for previously unseen users.

StrongGeneralizationTimedMostRecent(t[, ...])

Predict the next interaction(s) for previously unseen users.

A scenario is stateful. Its parameters are passed at initialization. The splits become available only after calling Scenario.split with a recpack.matrix.InteractionMatrix; they can then be retrieved through Scenario.full_training_data, Scenario.validation_training_data, Scenario.validation_data and Scenario.test_data.
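The configure-then-split-then-read pattern can be illustrated with a small stand-in class. `ToyScenario` below is hypothetical and does not use recpack; it only borrows the property names `full_training_data` and `test_data`. Raising an error when a property is read before `split` is an assumption of this sketch, as is the random three-way split.

```python
import random


class ToyScenario:
    """Hypothetical stand-in mimicking the stateful pattern; not recpack code."""

    def __init__(self, frac_train=0.5, seed=0):
        # Only parameters are stored at initialization; no data is seen yet.
        self.frac_train = frac_train
        self.seed = seed
        self._splits = {}  # stays empty until split() is called

    def split(self, interactions):
        # Shuffle a copy reproducibly, then cut it into train / fold-in / held-out.
        shuffled = list(interactions)
        random.Random(self.seed).shuffle(shuffled)
        cut = int(len(shuffled) * self.frac_train)
        rest = shuffled[cut:]
        half = len(rest) // 2
        self._splits = {
            "train": shuffled[:cut],
            "test": (rest[:half], rest[half:]),  # (fold-in, held-out)
        }

    def _get(self, name):
        # Assumption of this sketch: reading before split() is an error.
        if name not in self._splits:
            raise RuntimeError(f"Call split() before reading {name!r}.")
        return self._splits[name]

    @property
    def full_training_data(self):
        return self._get("train")

    @property
    def test_data(self):
        return self._get("test")


scenario = ToyScenario(frac_train=0.5, seed=0)

try:
    scenario.full_training_data
    read_before_split = True
except RuntimeError:
    read_before_split = False

# 12 toy (user, item) pairs stand in for an interaction matrix.
scenario.split([(u, i) for u in range(4) for i in range(3)])
test_in, test_out = scenario.test_data
```

Keeping the parameters separate from the data means the same configured object can be described and reused before any interactions are loaded.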

Splitters

Splitters are the building blocks of the scenarios. A splitter performs a simple split of the data into two InteractionMatrices according to a single criterion, e.g. “the fold-in set is all interactions before T, the held-out set is all interactions after T”.

Should you want to implement a new scenario that is not yet supported, these splitters facilitate easy implementation.

UserSplitter(users_in, users_out)

Split data by the user identifiers of the interactions.

FractionInteractionSplitter(in_frac[, seed])

Split data randomly, such that a fraction in_frac of the interactions is assigned to the first return value and the remainder to the second.

TimestampSplitter(t[, delta_out, delta_in])

Split data so that the first return value contains interactions in [t-delta_in, t[, and the second those in [t, t+delta_out[.

StrongGeneralizationSplitter([in_frac, ...])

Randomly split the users into two sets, so that all interactions of a given user occur in exactly one of the two splits.

UserInteractionTimeSplitter(t)

Split users based on the time of their most recent interactions.

MostRecentSplitter(n)

Split data so that each user's n most recent interactions are assigned to the second return value, and that user's earlier interactions to the first.
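To show how single-criterion splits can be chained into a scenario, the sketch below reimplements two of the criteria above on plain `(user, item, timestamp)` tuples. It is a toy illustration and not recpack's implementation: `timestamp_split` and `most_recent_split` are hypothetical function names, and treating a missing `delta_in` or `delta_out` as an unbounded window is an assumption.

```python
from collections import defaultdict


def timestamp_split(interactions, t, delta_in=None, delta_out=None):
    """First value: timestamps in [t - delta_in, t[; second: [t, t + delta_out[.

    Assumption of this sketch: a delta of None means the window is unbounded.
    """
    lower = float("-inf") if delta_in is None else t - delta_in
    upper = float("inf") if delta_out is None else t + delta_out
    data_in = [x for x in interactions if lower <= x[2] < t]
    data_out = [x for x in interactions if t <= x[2] < upper]
    return data_in, data_out


def most_recent_split(interactions, n):
    """Second value: each user's n most recent interactions; first: the rest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    by_user = defaultdict(list)
    for x in interactions:
        by_user[x[0]].append(x)
    earlier, recent = [], []
    for events in by_user.values():
        events.sort(key=lambda x: x[2])  # oldest first
        earlier.extend(events[:-n])
        recent.extend(events[-n:])
    return earlier, recent


log = [
    (0, 10, 1), (0, 11, 4), (0, 12, 6), (0, 13, 9),
    (1, 10, 2), (1, 14, 7), (1, 15, 8),
]

# Chain the two criteria: train on everything before t = 5, then, within the
# later data, hold out each user's most recent interaction.
train, future = timestamp_split(log, t=5)
fold_in, held_out = most_recent_split(future, n=1)

# Bounded windows: [3, 5[ goes in, [5, 8[ goes out.
window_in, window_out = timestamp_split(log, t=5, delta_in=2, delta_out=3)
```

Each function applies exactly one criterion and returns two values, so a new scenario reduces to deciding the order in which such splits are applied.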