irspack.split.holdout_specific_interactions
- irspack.split.holdout_specific_interactions(df, user_column, item_column, interaction_indicator, validatable_user_ratio_val=0.2, validatable_user_ratio_test=0.2, random_state=None)
Holds out (part of) the interactions that the caller specifies via interaction_indicator.
All users are split into two categories:
1. Those who have an interaction in the specified subset. We call them “validatable” users.
2. Those who don’t.
The users in 1. are split into three groups (train, validation, and test users), and the specified interactions are held out. The interactions of non-validatable users become part of the train dataset.
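The user partition described above can be sketched in plain Python. This is only an illustration of the idea, not irspack's implementation: the helper name `partition_users`, the truncating rounding of group sizes, and the use of the standard `random` module are all assumptions made for the example.

```python
import random


def partition_users(user_ids, indicator, ratio_val=0.2, ratio_test=0.2, seed=0):
    """Hypothetical helper: split users into train/val/test groups.

    user_ids  -- one user id per interaction row
    indicator -- one boolean per row, True for a "specified" interaction
    """
    all_users = set(user_ids)
    # Validatable users have at least one specified interaction.
    validatable = {u for u, flag in zip(user_ids, indicator) if flag}
    non_validatable = all_users - validatable

    # Shuffle validatable users reproducibly, then cut into three groups.
    shuffled = sorted(validatable)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * ratio_val)    # assumption: truncate
    n_test = int(len(shuffled) * ratio_test)  # assumption: truncate
    val_users = shuffled[:n_val]
    test_users = shuffled[n_val:n_val + n_test]
    train_validatable = shuffled[n_val + n_test:]

    return {
        # Non-validatable users always end up on the train side.
        "train": sorted(train_validatable + list(non_validatable)),
        "val": sorted(val_users),
        "test": sorted(test_users),
    }


# Users 1..10 each have one specified interaction; users 11 and 12 have none.
user_ids = list(range(1, 11)) + [11, 12]
indicator = [True] * 10 + [False, False]
groups = partition_users(user_ids, indicator)
```

With the default ratios, two of the ten validatable users land in each of the validation and test groups, and the remaining six join users 11 and 12 on the train side.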
This split is useful when you want to:
recommend only part of the items (e.g., rather unpopular ones) to the users. In this case, the held-out interactions will be the ones with these specific items.
split the dataframe by a certain timepoint, and ensure that no information after that timepoint contaminates the training set.
- Parameters:
df (DataFrame) – The data source.
user_column (str) – The column name of the users.
item_column (str) – The column name of the items.
interaction_indicator (ndarray) – Specifies where in df the held-out interactions are.
validatable_user_ratio_val (float) – The ratio of “validation-set users” in the “validatable users”. Defaults to 0.2.
validatable_user_ratio_test (float) – The ratio of “test-set users” in the “validatable users”. Defaults to 0.2.
random_state (Union[None, int, RandomState]) – The random seed used to split validatable users into three groups. Defaults to None.
- Returns:
A tuple consisting of:
The aligned list of all the items.
A dictionary with train/val/test user pairs.
- Return type:
Tuple[List[Any], Dict[str, UserTrainTestInteractionPair]]
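For the time-based use case mentioned above, a call might look like the following sketch. The toy DataFrame, its column names (`user_id`, `item_id`, `timestamp`), the cutoff date, and the wrapper function `split_by_cutoff` are assumptions made for illustration; only the function name, its parameters, and the two-element return value come from the signature documented here. The irspack import is placed inside the wrapper so that the indicator construction can be read and run on its own.

```python
import pandas as pd

# Toy interaction log; the column names are illustrative only.
df = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3, 3, 4],
        "item_id": [10, 11, 10, 12, 11, 12, 10],
        "timestamp": pd.to_datetime(
            [
                "2020-11-03",
                "2021-01-15",
                "2020-12-20",
                "2020-12-28",
                "2021-02-01",
                "2020-10-10",
                "2021-01-05",
            ]
        ),
    }
)

# Mark every interaction at or after the cutoff as a held-out candidate.
cutoff = pd.Timestamp("2021-01-01")
interaction_indicator = (df["timestamp"] >= cutoff).to_numpy()


def split_by_cutoff(frame, indicator):
    """Hypothetical wrapper around the documented function."""
    from irspack.split import holdout_specific_interactions

    item_ids, user_pairs = holdout_specific_interactions(
        frame,
        "user_id",
        "item_id",
        indicator,
        validatable_user_ratio_val=0.2,
        validatable_user_ratio_test=0.2,
        random_state=0,
    )
    return item_ids, user_pairs
```

Here users 1, 3, and 4 are validatable because each has an interaction on or after the cutoff, while user 2 is not. With so few validatable users the default ratios may leave the validation or test group empty, so a realistic dataset is needed to see a meaningful split.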