irspack.split.split_last_n_interaction_df

irspack.split.split_last_n_interaction_df(df, user_column, timestamp_column, n_heldout=None, heldout_ratio=0.1, ceil_n_heldout=False)

Split a dataframe by holding out, for each user, either the last n_heldout interactions or the last heldout_ratio fraction of that user's interactions.

Parameters:
  • df (DataFrame) – The Dataframe to be split.

  • user_column (str) – The column name for users.

  • timestamp_column (str) – The column name for “timestamp” (it doesn’t have to be datetime).

  • n_heldout (Optional[int]) – If not None, specifies the maximum number of each user's last interactions to be held out. Defaults to None.

  • heldout_ratio (float) – Specifies the fraction of each user's interactions to be held out. Ignored if n_heldout is not None. Defaults to 0.1.

  • ceil_n_heldout (bool) – If this is True and n_heldout is None, the number of test interactions for a given user u will be ceil(N_u * heldout_ratio), where N_u is the number of interactions for u. If this is False, floor(N_u * heldout_ratio) will be used instead. Defaults to False.
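
The per-user held-out count described by the parameters above can be sketched in plain Python. This is an illustration of the documented arithmetic only, not the library's code: the helper name `heldout_count` is hypothetical, and capping at N_u when n_heldout is given is an assumption based on the word "maximum" in the n_heldout description.

```python
import math
from typing import Optional


def heldout_count(
    n_u: int,
    n_heldout: Optional[int] = None,
    heldout_ratio: float = 0.1,
    ceil_n_heldout: bool = False,
) -> int:
    """Number of interactions held out for a user with n_u interactions.

    Hypothetical helper that mirrors the parameter descriptions above.
    """
    if n_heldout is not None:
        # heldout_ratio is ignored; a user cannot lose more rows than they have
        # (assumed reading of "maximum number").
        return min(n_heldout, n_u)
    raw = n_u * heldout_ratio
    # ceil_n_heldout chooses between rounding up and rounding down.
    return math.ceil(raw) if ceil_n_heldout else math.floor(raw)
```

With the default heldout_ratio=0.1 and ceil_n_heldout=False, a user with fewer than 10 interactions contributes no held-out rows under this reading, since floor(N_u * 0.1) is 0; setting ceil_n_heldout=True holds out at least one row for every user who has any interactions.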

Returns:

A pair of dataframes: the first (earlier) interactions and the held-out (last) interactions.

Return type:

Tuple[DataFrame, DataFrame]
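
To make the shape of the returned pair concrete, here is a minimal sketch of the same idea on plain tuples rather than a DataFrame. It assumes a fixed n_heldout and rows of the form (user, item, timestamp); the function name `split_last_n` and the row layout are hypothetical, and the library's own implementation may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (user, item, timestamp); hypothetical layout


def split_last_n(rows: List[Row], n_heldout: int) -> Tuple[List[Row], List[Row]]:
    """Hold out each user's last n_heldout rows, ordered by timestamp."""
    by_user: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    first: List[Row] = []
    heldout: List[Row] = []
    for user_rows in by_user.values():
        # Any sortable value works as the "timestamp"; it need not be a datetime.
        ordered = sorted(user_rows, key=lambda r: r[2])
        cut = max(len(ordered) - n_heldout, 0)
        first.extend(ordered[:cut])
        heldout.extend(ordered[cut:])
    return first, heldout


rows = [
    ("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 3),
    ("u2", "a", 5), ("u2", "d", 4),
]
first, heldout = split_last_n(rows, n_heldout=1)
```

Here `first` keeps the two earliest rows of u1 and the earliest row of u2, while `heldout` receives each user's most recent row, matching the (first interactions, held-out interactions) order of the return value.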