irspack.split.holdout_specific_interactions

irspack.split.holdout_specific_interactions(df, user_column, item_column, interaction_indicator, validatable_user_ratio_val=0.2, validatable_user_ratio_test=0.2, random_state=None)

Holds out (part of) the interactions specified by interaction_indicator.

All the users will be split into two categories:

  1. Those who have an interaction in the specified subset. We denote them as “validatable” users.

  2. Those who don’t.

We split the users in category 1 into three parts (train, validation, and test users) and hold out the specified interactions. The interactions of non-validatable users will be part of the train dataset.
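The user partition described above can be sketched in plain Python. This is an illustrative sketch only, not irspack's implementation: the helper name `split_validatable_users`, the rounding rule, and the shuffling strategy are all assumptions made for this example.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def split_validatable_users(
    interactions: Sequence[Tuple[Hashable, Hashable]],
    indicator: Sequence[bool],
    ratio_val: float = 0.2,
    ratio_test: float = 0.2,
    seed: int = 0,
) -> Dict[str, List[Hashable]]:
    """Hypothetical helper: assign each user to train, val, or test."""
    all_users = sorted({user for user, _ in interactions})
    # A user is "validatable" if at least one of their rows is flagged.
    validatable = sorted(
        {user for (user, _), flag in zip(interactions, indicator) if flag}
    )
    rng = random.Random(seed)
    rng.shuffle(validatable)

    # Assumed rounding rule; the library may round differently.
    n_val = int(len(validatable) * ratio_val)
    n_test = int(len(validatable) * ratio_test)
    val_users = validatable[:n_val]
    test_users = validatable[n_val : n_val + n_test]
    held_out_users = set(val_users) | set(test_users)

    # Non-validatable users and the remaining validatable users go to train.
    train_users = [u for u in all_users if u not in held_out_users]
    return {
        "train": train_users,
        "val": sorted(val_users),
        "test": sorted(test_users),
    }


# Ten users, two items each; only users 0..4 have a flagged interaction.
interactions = [(u, i) for u in range(10) for i in ("a", "b")]
indicator = [item == "b" and user < 5 for user, item in interactions]
groups = split_validatable_users(interactions, indicator)
```

With the default ratios of 0.2, one of the five validatable users lands in each of the validation and test groups, and the other eight users (including all non-validatable ones) stay in the train group.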

This split will be useful when you want to:

  • recommend only a subset of the items (e.g., relatively unpopular ones) to the users. In this case, the held-out interactions will be the ones with these specific items.

  • split the dataframe by a certain timepoint, and ensure that no information after that timepoint contaminates the training set.
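For either use case, the indicator can be derived from the dataframe itself. The sketch below assumes that `interaction_indicator` is a boolean array aligned row by row with `df`; the column names (`user_id`, `item_id`, `timestamp`), the sample data, and the cutoff date are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3],
        "item_id": ["a", "b", "a", "c", "b"],
        "timestamp": pd.to_datetime(
            ["2021-01-01", "2021-03-01", "2021-01-15", "2021-02-20", "2021-01-10"]
        ),
    }
)

# Use case 1: hold out interactions with a chosen subset of items.
target_items = {"b", "c"}  # assumed "unpopular" items for this example
item_indicator = df["item_id"].isin(target_items).to_numpy()

# Use case 2: hold out every interaction at or after a cutoff time.
cutoff = pd.Timestamp("2021-02-01")
time_indicator = (df["timestamp"] >= cutoff).to_numpy()
```

Either array could then be passed as `interaction_indicator`, together with `user_column="user_id"` and `item_column="item_id"`.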

Parameters:
  • df (DataFrame) – The data source.

  • user_column (str) – The column name of the users.

  • item_column (str) – The column name of the items.

  • interaction_indicator (ndarray) – Specifies where in df the held-out interactions are.

  • validatable_user_ratio_val (float) – The ratio of “validation-set users” in the “validatable users”. Defaults to 0.2.

  • validatable_user_ratio_test (float) – The ratio of “test-set users” in the “validatable users”. Defaults to 0.2.

  • random_state (Union[None, int, RandomState]) – The random seed used to split validatable users into three groups. Defaults to None.

Returns:

A tuple consisting of:

  • The aligned list of all the items.

  • A dictionary with train/val/test user pairs.

Return type:

Tuple[List[Any], Dict[str, UserTrainTestInteractionPair]]