irspack.split.split_last_n_interaction_df
- irspack.split.split_last_n_interaction_df(df, user_column, timestamp_column, n_heldout=None, heldout_ratio=0.1, ceil_n_heldout=False)
Split a dataframe by holding out, for each user, either the last n_heldout interactions or the last heldout_ratio fraction of that user's interactions.
- Parameters:
df (DataFrame) – The Dataframe to be split.
user_column (str) – The column name for users.
timestamp_column (str) – The column name for “timestamp” (it doesn’t have to be datetime).
n_heldout (Optional[int]) – If not None, specifies the maximal number of last actions to be held-out. Defaults to None.
heldout_ratio (float) – Specifies the fraction of the interactions of each user to be held out. Ignored if n_heldout is not None. Defaults to 0.1.
ceil_n_heldout (bool) – If this is True and n_heldout is None, the number of test interactions for a given user u will be ceil(N_u * heldout_ratio), where N_u is the number of interactions of u. If this is False, floor(N_u * heldout_ratio) will be used instead. Defaults to False.
- Returns:
The earlier (retained) interactions and the held-out interactions, in that order.
- Return type:
Tuple[DataFrame, DataFrame]
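The parameter descriptions above can be illustrated with a small pandas-only sketch. This is not irspack's implementation: `sketch_split_last_n` and the `interactions` frame are hypothetical names introduced here, and the handling of tied timestamps, the clipping of n_heldout to N_u, and the row order of the outputs are assumptions chosen for illustration.

```python
import math
from typing import Optional, Tuple

import pandas as pd


def sketch_split_last_n(
    df: pd.DataFrame,
    user_column: str,
    timestamp_column: str,
    n_heldout: Optional[int] = None,
    heldout_ratio: float = 0.1,
    ceil_n_heldout: bool = False,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical re-statement of the documented behaviour (not the library code)."""
    # Put each user's interactions in chronological order.
    ordered = df.sort_values([user_column, timestamp_column], kind="stable")
    # Position counted from the end of each user's history (0 = most recent).
    pos_from_end = ordered.groupby(user_column).cumcount(ascending=False)
    # N_u, repeated on every row belonging to user u.
    n_total = ordered.groupby(user_column)[user_column].transform("size")
    if n_heldout is not None:
        # Assumption: "maximal number" means min(n_heldout, N_u) rows are held out.
        n_hold = n_total.clip(upper=n_heldout)
    else:
        rounder = math.ceil if ceil_n_heldout else math.floor
        n_hold = n_total.map(lambda n: rounder(n * heldout_ratio))
    is_heldout = pos_from_end < n_hold
    return ordered[~is_heldout], ordered[is_heldout]


# Toy data: user "a" has N_u = 3 interactions, user "b" has N_u = 2.
interactions = pd.DataFrame(
    {
        "user": ["a", "a", "a", "b", "b"],
        "item": [10, 11, 12, 20, 21],
        "ts": [1, 2, 3, 5, 4],
    }
)

# floor(3 * 0.5) = 1 row for "a", floor(2 * 0.5) = 1 row for "b".
train_floor, test_floor = sketch_split_last_n(
    interactions, "user", "ts", heldout_ratio=0.5
)

# ceil(3 * 0.5) = 2 rows for "a", ceil(2 * 0.5) = 1 row for "b".
train_ceil, test_ceil = sketch_split_last_n(
    interactions, "user", "ts", heldout_ratio=0.5, ceil_n_heldout=True
)

# n_heldout takes precedence over heldout_ratio: one row per user.
train_n, test_n = sketch_split_last_n(
    interactions, "user", "ts", n_heldout=1, heldout_ratio=0.5
)
```

With a fractional N_u * heldout_ratio, as for user "a" here, ceil_n_heldout decides whether the extra interaction goes to the held-out part; when the product is an integer, as for user "b", the flag makes no difference.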