irspack.split.split_dataframe_partial_user_holdout
- irspack.split.split_dataframe_partial_user_holdout(df_all, user_column, item_column, time_column=None, rating_column=None, n_val_user=None, n_test_user=None, val_user_ratio=0.1, test_user_ratio=0.1, heldout_ratio_val=0.5, n_heldout_val=None, heldout_ratio_test=0.5, n_heldout_test=None, ceil_n_heldout=False, random_state=None)
Splits the DataFrame and builds interaction matrices, holding out random interactions for a subset of randomly selected users (whom we call “validation users” and “test users”).
- Parameters:
df_all (DataFrame) – The user-item interaction event log.
user_column (str) – The column name for user_id.
item_column (str) – The column name for item_id.
time_column (Optional[str]) – The column name (if any) specifying the time of the interaction. If this is set, the split will be based on time, and some of the most recent interactions will be held out for each user. Defaults to None.
rating_column (Optional[str]) – The column name for ratings. If None, the rating will be treated as 1 for all interactions. Defaults to None.
n_val_user (Optional[int]) – The number of “validation users”. Defaults to None.
n_test_user (Optional[int]) – The number of “test users”. Defaults to None.
val_user_ratio (float) – The fraction of “validation users” with respect to all users. Ignored when n_val_user is set. Defaults to 0.1.
test_user_ratio (float) – The fraction of “test users” with respect to all users. Ignored when n_test_user is set. Defaults to 0.1.
heldout_ratio_val (float) – The fraction of held-out interactions for “validation users”. Ignored if n_heldout_val is specified. Defaults to 0.5.
n_heldout_val (Optional[int]) – The maximal number of held-out interactions for “validation users”. Defaults to None.
heldout_ratio_test (float) – The fraction of held-out interactions for “test users”. Ignored if n_heldout_test is specified. Defaults to 0.5.
n_heldout_test (Optional[int]) – The maximal number of held-out interactions for “test users”. Defaults to None.
ceil_n_heldout (bool) – If True, the number of held-out interactions of user u will be ceil(heldout_ratio_val * N_u) for validation users and ceil(heldout_ratio_test * N_u) for test users, where N_u is the number of interactions of user u. If False, the floor function will be used instead. Defaults to False.
random_state (Union[None, int, RandomState]) – The random state for this procedure. Defaults to None.
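The way the count and ratio parameters interact can be sketched in plain Python. This is only an illustration of the rules described above, not the library's implementation: the helper names (`resolve_n_users`, `check_user_counts`, `n_heldout_for_user`) are hypothetical, truncating the ratio-based user count with `int()` is an assumption, and so is capping `n_heldout` at the user's interaction count.

```python
import math
from typing import Optional


def resolve_n_users(n_total: int, n_user: Optional[int], user_ratio: float) -> int:
    """Return the number of validation (or test) users.

    An explicit count takes precedence over the ratio, as documented.
    Truncating with int() is an assumption made for this sketch.
    """
    if n_user is not None:
        return n_user
    return int(n_total * user_ratio)


def check_user_counts(n_total: int, n_val_user: int, n_test_user: int) -> None:
    """Mirror the documented ValueError condition."""
    if n_val_user + n_test_user > n_total:
        raise ValueError("n_val_user + n_test_user exceeds the number of users.")


def n_heldout_for_user(
    n_interactions: int,
    heldout_ratio: float = 0.5,
    n_heldout: Optional[int] = None,
    ceil_n_heldout: bool = False,
) -> int:
    """Return how many of a user's interactions would be held out."""
    if n_heldout is not None:
        # Assumption: n_heldout is an upper bound ("maximal number"),
        # so a user with fewer interactions cannot hold out more than they have.
        return min(n_heldout, n_interactions)
    rounding = math.ceil if ceil_n_heldout else math.floor
    return rounding(heldout_ratio * n_interactions)
```

For a user with 5 interactions and the default ratio of 0.5, the floor variant holds out 2 interactions and the ceil variant holds out 3, which is the only difference `ceil_n_heldout` makes.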
- Raises:
ValueError – When n_val_user + n_test_user is greater than the total number of users.
- Returns:
A tuple consisting of:
A dictionary with "train", "val", and "test" as its keys and the corresponding datasets as its values.
A list of unique item ids (which correspond to the columns of the datasets).
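A minimal usage sketch, assuming `irspack` and `pandas` are installed. The toy log, the column names `"user_id"` and `"item_id"`, and the helper names `make_toy_log` and `run_split` are made up for illustration; only the function call and the shape of its return value follow the documentation above. The third-party imports are deferred into `run_split` so that the toy-data helper can be inspected on its own.

```python
from typing import Any, Dict, List, Tuple


def make_toy_log() -> List[Dict[str, int]]:
    """Build a tiny interaction log: 6 users with 4 distinct items each."""
    return [
        {"user_id": user, "item_id": (user + offset) % 5}
        for user in range(6)
        for offset in range(4)
    ]


def run_split(rows: List[Dict[str, int]]) -> Tuple[Dict[str, Any], List[Any]]:
    """Split the toy log into train/val/test datasets."""
    # Deferred imports: both packages are only needed when the split is run.
    import pandas as pd
    from irspack.split import split_dataframe_partial_user_holdout

    df_all = pd.DataFrame(rows)
    datasets, item_ids = split_dataframe_partial_user_holdout(
        df_all,
        "user_id",
        "item_id",
        n_val_user=2,            # explicit counts, so the ratios are ignored
        n_test_user=2,
        heldout_ratio_val=0.5,   # hold out half of each validation user's log
        heldout_ratio_test=0.5,
        random_state=0,          # fixed seed for a reproducible split
    )
    # datasets has the keys "train", "val" and "test";
    # item_ids lists the unique items, i.e. the columns of each dataset.
    return datasets, item_ids
```

Calling `run_split(make_toy_log())` should return the dictionary of datasets and the item id list described under Returns. With 6 users, requesting `n_val_user=2` and `n_test_user=2` stays within the total and therefore should not trigger the documented `ValueError`.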