irspack.split.holdout_specific_interactions

irspack.split.holdout_specific_interactions(df, user_column, item_column, interaction_indicator, validatable_user_ratio_val=0.2, validatable_user_ratio_test=0.2, random_state=None)

Holds out (part of) the interactions specified by interaction_indicator.

All the users will be split into two categories:

1. Those who have an interaction in the specified subset. We denote them as “validatable” users.

2. Those who don’t.

We split the validatable users into three groups (train, validation, and test users) and hold out the specified interactions. The interactions of non-validatable users become part of the training dataset.
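The user-level partition described above can be sketched in plain Python. This is only an illustration of the idea, not irspack's implementation: the helper name sketch_user_split, its list-based inputs, and the rounding of group sizes with int() are all assumptions made for this example.

```python
import random
from typing import Dict, Hashable, List, Sequence


def sketch_user_split(
    users: Sequence[Hashable],
    held_out_flags: Sequence[bool],
    ratio_val: float = 0.2,
    ratio_test: float = 0.2,
    seed: int = 0,
) -> Dict[str, List[Hashable]]:
    """Group users as in the description above (hypothetical helper).

    `users[i]` is the user of the i-th interaction and `held_out_flags[i]`
    says whether that interaction belongs to the specified subset.
    """
    # A user is "validatable" if at least one of their rows is flagged.
    validatable = {u for u, flag in zip(users, held_out_flags) if flag}
    non_validatable = set(users) - validatable

    # Sort first so the shuffle depends only on the seed, not on set order.
    shuffled = sorted(validatable, key=repr)
    random.Random(seed).shuffle(shuffled)

    # Assumed rounding scheme: truncate the group sizes toward zero.
    n_val = int(len(shuffled) * ratio_val)
    n_test = int(len(shuffled) * ratio_test)

    return {
        "val": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
        # These users contribute all of their interactions to training.
        "non_validatable": sorted(non_validatable, key=repr),
    }
```

With ten validatable users and the default ratios, this sketch places two users in the validation group, two in the test group, and the remaining six in the train group.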

This split is useful when you want to:

recommend only part of the items (e.g., rather unpopular ones) to the users. In this case, the held-out interactions will be the ones with these specific items.

split the dataframe by a certain timepoint, and ensure that no information after that timepoint contaminates the training set.
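For both use cases, the held-out interactions are marked by one boolean flag per row of the data source. The sketch below builds such flags with plain Python lists over a made-up interaction log; the rows, item names, and cutoff date are hypothetical, and whether the cutoff itself counts as held out is a choice made here for illustration. The function itself expects an ndarray aligned with df, so the lists would still need to be converted before being passed as interaction_indicator.

```python
from datetime import date

# Hypothetical interaction log: one (user, item, day) row per interaction.
rows = [
    ("u1", "popular_item", date(2021, 1, 5)),
    ("u1", "niche_item", date(2021, 3, 1)),
    ("u2", "popular_item", date(2021, 2, 10)),
    ("u3", "niche_item", date(2021, 1, 20)),
]

# Use case 1: hold out interactions with a chosen subset of items.
target_items = {"niche_item"}
indicator_by_item = [item in target_items for _, item, _ in rows]

# Use case 2: hold out every interaction on or after a cutoff date.
cutoff = date(2021, 2, 1)
indicator_by_time = [day >= cutoff for _, _, day in rows]
```

In the first case, u1 and u3 would be the validatable users; in the second, u1 and u2.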

Parameters:

df (DataFrame) – The data source.
user_column (str) – The column name of the users.
item_column (str) – The column name of the items.
interaction_indicator (ndarray) – Specifies where in df the held-out interactions are.
validatable_user_ratio_val (float) – The ratio of “validation-set users” in the “validatable users”. Defaults to 0.2.
validatable_user_ratio_test (float) – The ratio of “test-set users” in the “validatable users”. Defaults to 0.2.
random_state (Union[None, int, RandomState]) – The random seed used to split validatable users into three. Defaults to None.

Returns:

A tuple consisting of:

The aligned list of all the items.

A dictionary with train/val/test user pairs.

Return type:

Tuple[List[Any], Dict[str, UserTrainTestInteractionPair]]