Hyperparameter Optimization
In this tutorial, we first demonstrate how the performance of P3alphaRecommender can be optimized with the optuna-backed tune function.
Then, by further splitting the ground-truth interactions into train, validation, and test sets, we compare several recommenders whose hyperparameters are optimized on the validation set and whose performance is measured on the test set.
[1]:
from IPython.display import clear_output, display
import numpy as np
import scipy.sparse as sps
from sklearn.model_selection import train_test_split
from irspack.dataset import MovieLens1MDataManager
from irspack import (
P3alphaRecommender, rowwise_train_test_split, Evaluator,
df_to_sparse
)
Read the ML1M dataset again.
We again prepare the sparse matrix X.
[2]:
loader = MovieLens1MDataManager()
df = loader.read_interaction()
movies = loader.read_item_info()
movies.head()
X, unique_user_ids, unique_movie_ids = df_to_sparse(
df, 'userId', 'movieId'
)
Split scheme 2. Hold-out for partial users.
To perform hyperparameter optimization, we have to repeatedly measure accuracy metrics on a validation set. As mentioned in the previous tutorial, doing this for all users is time-consuming (often heavier than the recommender's training itself), so we evaluate only a subset of users, as follows:
First, split users into “train” and “validation” (and “test”) users.
For train users, feed all of their interactions into the recommender. For validation (test) users, hold out part of their interactions for evaluation (the “prediction” part), and feed the rest (the “learning” part) into the recommender.
After fitting, ask the recommender to output scores only for the validation (test) users, and check how it ranks their held-out interactions.
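The three steps above can be sketched in plain Python. This is a minimal illustration under our own assumptions, not irspack's implementation: interactions are a toy dict mapping each user to a set of item indices, and split_users and holdout_rows are hypothetical helper names.

```python
import random


def split_users(users, val_ratio, seed=0):
    """Randomly partition user ids into (train_users, val_users)."""
    rng = random.Random(seed)
    shuffled = list(users)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]


def holdout_rows(interactions, users, heldout_ratio, seed=0):
    """For each given user, move a fraction of their items into a held-out set."""
    rng = random.Random(seed)
    learn, predict = {}, {}
    for u in users:
        items = sorted(interactions[u])
        rng.shuffle(items)
        n_heldout = int(len(items) * heldout_ratio)
        predict[u] = set(items[:n_heldout])   # "prediction" part (ground truth)
        learn[u] = set(items[n_heldout:])     # "learning" part (fed to the model)
    return learn, predict


# Toy data: 5 users, each with 4 interacted items.
interactions = {u: {u, u + 1, u + 2, u + 3} for u in range(5)}
train_users, val_users = split_users(interactions.keys(), val_ratio=0.4)
val_learn, val_predict = holdout_rows(interactions, val_users, heldout_ratio=0.5)

# Training data: all interactions of train users + the learning part of validation users.
training_data = {u: interactions[u] for u in train_users}
training_data.update(val_learn)
# val_predict is what the scores for val_users are ranked against.
```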
Although we have prepared a dedicated function for this procedure, let us first do it manually.
[3]:
# Split users into train and validation users.
X_train_user, X_valid_user = train_test_split(X, test_size=.4, random_state=0)
# Split the validation users' interactions into learning (50%) and prediction (50%) parts.
X_valid_learn, X_valid_predict = rowwise_train_test_split(
X_valid_user, test_ratio=.5, random_state=0
)
Define the evaluator and optimize the validation metric
As illustrated above, we will use the following as the recommender's training data:
All interactions of the train users (X_train_user).
50% of the interactions of the validation users (X_valid_learn).
The remaining interactions of the validation users (X_valid_predict) serve as the held-out ground truth:
[4]:
X_train_val_learn = sps.vstack([X_train_user, X_valid_learn])
evaluator = Evaluator(X_valid_predict, offset=X_train_user.shape[0], target_metric='ndcg', cutoff=20)
The offset parameter specifies the row at which the validation-user block begins in the training matrix (that is, where the train-user block ends).
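To make the role of offset concrete, the following toy sketch shows only the index bookkeeping: validation user i corresponds to row offset + i of the stacked training matrix. Rows are represented as Python sets of item indices and the sizes are made up; in this notebook, offset is X_train_user.shape[0].

```python
# Toy stand-ins for the sparse matrices: one set of item indices per row.
train_rows = [{0, 1}, {1, 2}, {2, 3}]      # train users occupy rows 0..2
valid_learn_rows = [{0}, {3}]              # validation users occupy rows 3..4
stacked = train_rows + valid_learn_rows    # analogue of sps.vstack([X_train_user, X_valid_learn])

valid_predict_rows = [{1}, {2}]            # held-out ground truth, one row per validation user
offset = len(train_rows)                   # analogue of X_train_user.shape[0]

# Pair each held-out row with the training row of the same user.
pairs = [
    (stacked[offset + i], truth)           # (what the model saw, what it should rank highly)
    for i, truth in enumerate(valid_predict_rows)
]
for seen, truth in pairs:
    print(f"seen={sorted(seen)} held_out={sorted(truth)}")
```

Without the offset, held-out row 0 would be compared against the scores of train user 0, which is a different person.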
Now let us start the optimization.
[5]:
best_params, validation_results = P3alphaRecommender.tune(X_train_val_learn, evaluator, random_seed=0, n_trials=20)
clear_output() # output is a bit lengthy
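tune delegates the search to optuna, which by default uses a TPE sampler. Stripped of that machinery, the loop amounts to "propose parameters, fit, score on the validation set, keep the best". The sketch below is a simplified stand-in, not what tune does internally: it uses plain random search instead of TPE, a toy objective in place of fitting P3alpha and calling the evaluator, and a made-up search space. All three are assumptions for illustration.

```python
import random


def toy_objective(params):
    """Hypothetical stand-in for "fit the recommender and return validation ndcg@20"."""
    # Purely illustrative: peaks at top_k = 200 with normalize_weight=True.
    penalty = abs(params["top_k"] - 200) / 1000
    bonus = 0.05 if params["normalize_weight"] else 0.0
    return 0.45 + bonus - penalty


def random_search(objective, n_trials, seed=0):
    """Sample parameters uniformly, evaluate each trial, and return the best one."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_trials):
        params = {
            "top_k": rng.randint(10, 1000),              # assumed search range
            "normalize_weight": rng.choice([True, False]),
        }
        history.append((objective(params), params))
    _, best = max(history, key=lambda trial: trial[0])
    return best, history


best_found, history = random_search(toy_objective, n_trials=20)
print(best_found)
```

Here history plays a role similar to validation_results above: one record per trial, from which the best row is picked.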
The best ndcg@20 value is
[6]:
validation_results['ndcg@20'].max()
[6]:
0.5159628863136182
which was obtained with these hyperparameters:
[7]:
best_params
[7]:
{'top_k': 217, 'normalize_weight': True}
Meanwhile, the default arguments of P3alphaRecommender (which we have used so far) attain ndcg@20 = 0.4084. So tuning yields a substantial improvement, roughly 26% in relative terms:
[8]:
rec_default = P3alphaRecommender(X_train_val_learn).learn()
evaluator.get_score(rec_default)['ndcg']
[8]:
0.4084060191998281
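For reference, both numbers above are ndcg@20 scores. A common binary-relevance definition of NDCG@k for a single user is sketched below; the reported value would then be an average over the validation users. This follows the textbook formula and is not taken from irspack's source, which may differ in details such as tie handling.

```python
import math


def ndcg_at_k(ranked_items, relevant, k):
    """Binary-relevance NDCG@k for a single user."""
    # DCG: each relevant item at 0-based rank r contributes 1 / log2(r + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    # Ideal DCG: all relevant items (up to k of them) placed at the top.
    n_ideal = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(n_ideal))
    return dcg / idcg if idcg > 0 else 0.0


# One relevant item ranked first, the other ranked third.
score = ndcg_at_k([10, 20, 30, 40], relevant={10, 30}, k=3)
print(round(score, 4))
```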
Check the recommender’s output again
Let us check how our recommender has evolved from the first tutorial. We consider the same setting (a new user has watched “Toy Story”), but fit the recommender using the obtained parameters.
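P3alpha is an item-based method: it scores items through an item-to-item similarity matrix. So, in principle, recommending for a new user only requires aggregating the similarities to the items in the user's profile and dropping the items already seen. The sketch below shows that generic idea with an invented 4x4 similarity matrix and a hypothetical index-to-movieId list; it is not irspack's recommend_for_new_user implementation.

```python
def recommend_for_profile(similarity, profile, cutoff):
    """Score unseen items by their summed similarity to the profile items."""
    n_items = len(similarity)
    seen = set(profile)
    scores = [sum(similarity[j][i] for j in profile) for i in range(n_items)]
    candidates = [i for i in range(n_items) if i not in seen]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in candidates[:cutoff]]


# Invented symmetric item-item similarities (column index = internal item index).
similarity = [
    [0.0, 0.9, 0.1, 0.4],
    [0.9, 0.0, 0.2, 0.3],
    [0.1, 0.2, 0.0, 0.8],
    [0.4, 0.3, 0.8, 0.0],
]
item_ids = [1, 3114, 2571, 34]  # hypothetical mapping from internal index to movieId

top = recommend_for_profile(similarity, profile=[0], cutoff=2)
top_ids = [(item_ids[i], s) for i, s in top]
print(top_ids)
```

The final mapping step is the job an ID mapper does: the model works with column indices, while the user-facing result should be movie IDs.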
[9]:
rec_tuned = P3alphaRecommender(X, **best_params).learn()
from irspack import ItemIDMapper
id_mapper = ItemIDMapper(unique_movie_ids)
[10]:
toystory_id = 1
recommended_id_and_score = id_mapper.recommend_for_new_user(
rec_tuned, user_profile=[toystory_id], cutoff=10
)
# Top-10 recommendations
movies.reindex([movie_id for movie_id, score in recommended_id_and_score])
[10]:
movieId | title | genres | release_year
---|---|---|---
1265 | Groundhog Day (1993) | Comedy\|Romance | 1993
2396 | Shakespeare in Love (1998) | Comedy\|Romance | 1998
3114 | Toy Story 2 (1999) | Animation\|Children's\|Comedy | 1999
1270 | Back to the Future (1985) | Comedy\|Sci-Fi | 1985
2028 | Saving Private Ryan (1998) | Action\|Drama\|War | 1998
34 | Babe (1995) | Children's\|Comedy\|Drama | 1995
2571 | Matrix, The (1999) | Action\|Sci-Fi\|Thriller | 1999
356 | Forrest Gump (1994) | Comedy\|Romance\|War | 1994
2355 | Bug's Life, A (1998) | Animation\|Children's\|Comedy | 1998
1197 | Princess Bride, The (1987) | Action\|Adventure\|Comedy\|Romance | 1987
Note how drastically the recommendations have changed (e.g., the increased presence of the “Children’s” genre and the disappearance of the “Star Wars” series).
A train/validation/test split example
To rigorously compare the performance of different recommender algorithms, we should measure the final scores on the test set rather than on the validation set used for tuning. With the tools above, this is now straightforward.
To begin with, we have prepared a function called split_dataframe_partial_user_holdout, which splits the users in the original dataframe into train/validation/test users and holds out part of the interactions of each validation/test user:
[11]:
from irspack.split import split_dataframe_partial_user_holdout
dataset, item_ids = split_dataframe_partial_user_holdout(
df, 'userId', 'movieId', val_user_ratio=.3, test_user_ratio=.3,
heldout_ratio_val=.5, heldout_ratio_test=.5
)
dataset
[11]:
{'train': <irspack.split.userwise.UserTrainTestInteractionPair at 0x7ff61f483430>,
'val': <irspack.split.userwise.UserTrainTestInteractionPair at 0x7ff61f4831f0>,
'test': <irspack.split.userwise.UserTrainTestInteractionPair at 0x7ff61f482bc0>}
As you can see, the returned dataset is a dictionary that stores the interactions of the train/validation/test users, each as an instance of UserTrainTestInteractionPair.
[12]:
train_users = dataset['train']
val_users = dataset['val']
test_users = dataset['test']
# Concatenate train/validation users into one.
train_and_val_users = train_users.concat(val_users)
[13]:
val_users.X_train
[13]:
<1812x3706 sparse matrix of type '<class 'numpy.float64'>'
with 152333 stored elements in Compressed Sparse Row format>
[14]:
val_users.X_test
[14]:
<1812x3706 sparse matrix of type '<class 'numpy.float64'>'
with 151435 stored elements in Compressed Sparse Row format>
[15]:
val_users.X_all # which equals val_users.X_train + val_users.X_test
[15]:
<1812x3706 sparse matrix of type '<class 'numpy.float64'>'
with 303768 stored elements in Compressed Sparse Row format>
[16]:
# For train users, there is no "test" interaction held out.
train_users.X_test
[16]:
<2416x3706 sparse matrix of type '<class 'numpy.float64'>'
with 0 stored elements in Compressed Sparse Row format>
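The printed shapes line up with the split parameters. A quick arithmetic check, using the 6,040 users of MovieLens 1M and the counts shown above (that the test block also holds 1,812 users is our inference; it is not printed here):

```python
# Counts copied from the outputs above; 6040 is the number of users in MovieLens 1M.
n_users = 6040
n_val = 1812      # rows of val_users.X_train / X_test / X_all
n_train = 2416    # rows of train_users.X_test

# val_user_ratio=.3 assigns 30% of the users to validation (and, we infer, 30% to test).
assert n_val == round(n_users * 0.3)
n_test = n_users - n_train - n_val
assert n_test == n_val

# X_train and X_test of the validation users do not overlap: their nonzeros add up to X_all's.
nnz_train, nnz_test, nnz_all = 152333, 151435, 303768
assert nnz_train + nnz_test == nnz_all

# The held-out share is close to, but not exactly, heldout_ratio_val=.5.
heldout_share = nnz_test / nnz_all
print(f"held-out share: {heldout_share:.4f}")
```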
For each recommender algorithm (here P3alpha, RP3beta, IALS, and DenseSLIM), we perform two phases:
Hyperparameter optimization. In this phase, we use all interactions of the train users and the train interactions of the validation users as training data, and the test interactions of the validation users as the held-out ground truth.
Evaluation. In this phase, we fit the model with the parameters obtained in the optimization phase, using all interactions of the train and validation users together with the train interactions of the test users. We then measure the recommender's performance against the test interactions of the test users.
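Both phases stack user blocks vertically, so each evaluator must know where its users begin. A small sketch of that bookkeeping follows; block_layout is a hypothetical helper, the train and validation sizes come from the outputs above, and 1,812 test users is inferred as 6040 - 2416 - 1812.

```python
def block_layout(blocks):
    """Return {name: (start_row, end_row)} for blocks stacked top to bottom."""
    layout, start = {}, 0
    for name, n_rows in blocks:
        layout[name] = (start, start + n_rows)
        start += n_rows
    return layout


# Phase 1 (tuning): train users (all) on top of validation users (train part).
tuning = block_layout([("train users (all)", 2416), ("val users (train part)", 1812)])
# Phase 2 (evaluation): train+val users (all) on top of test users (train part).
evaluation = block_layout([("train+val users (all)", 2416 + 1812), ("test users (train part)", 1812)])

val_offset = tuning["val users (train part)"][0]
test_offset = evaluation["test users (train part)"][0]
print(val_offset, test_offset)
```

The two offsets play the roles of train_users.n_users and train_and_val_users.n_users in the evaluators defined in the next cells.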
[17]:
from typing import Type
from irspack import DenseSLIMRecommender, RP3betaRecommender, IALSRecommender, BaseRecommender
[18]:
val_evaluator = Evaluator(
val_users.X_test,
offset=train_users.n_users,
cutoff=20, target_metric="ndcg"
)
test_evaluator = Evaluator(
test_users.X_test,
offset=train_and_val_users.n_users
)
test_results = []
recommender_name_vs_best_parameter = {}
recommender_class: Type[BaseRecommender]
for recommender_class in [IALSRecommender, DenseSLIMRecommender, P3alphaRecommender, RP3betaRecommender]:
    print(f'Start tuning {recommender_class.__name__}.')
    # Phase 1: tune on train users (all) + validation users (train part).
    best_params, validation_results_df = recommender_class.tune(
        sps.vstack([train_users.X_all, val_users.X_train]),
        val_evaluator, n_trials=40, random_seed=0
    )
    # Phase 2: refit on train/validation users (all) + test users (train part).
    recommender = recommender_class(
        sps.vstack([train_and_val_users.X_all, test_users.X_train]),
        **best_params
    ).learn()
    recommender_name_vs_best_parameter[recommender_class.__name__] = best_params
    test_score = dict(
        algorithm=recommender_class.__name__,
        **test_evaluator.get_scores(recommender, cutoffs=[20])
    )
    test_results.append(test_score)
clear_output()
As you can see below, iALS and DenseSLIM outperform the others in terms of accuracy measures (recall, ndcg, map).
iALS also leads on the diversity scores (highest entropy and appeared_item, lowest gini-index).
[19]:
import pandas as pd
pd.DataFrame(test_results)
[19]:
algorithm | hit@20 | recall@20 | ndcg@20 | map@20 | precision@20 | gini_index@20 | entropy@20 | appeared_item@20
---|---|---|---|---|---|---|---|---
IALSRecommender | 0.996137 | 0.208540 | 0.576164 | 0.135950 | 0.528201 | 0.915915 | 5.989211 | 1108.0
DenseSLIMRecommender | 0.995033 | 0.207463 | 0.572965 | 0.135436 | 0.525055 | 0.926926 | 5.859120 | 1018.0
P3alphaRecommender | 0.993377 | 0.182982 | 0.526259 | 0.114564 | 0.477152 | 0.962736 | 5.174319 | 690.0
RP3betaRecommender | 0.995033 | 0.188152 | 0.537070 | 0.119353 | 0.486010 | 0.957136 | 5.297020 | 846.0
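The diversity columns summarize how the recommended items are spread across the catalogue. irspack's exact formulas (log base, normalization) are not shown in this tutorial, so the sketch below uses common textbook definitions as an assumption: appeared_item counts distinct recommended items, entropy is the Shannon entropy of item occurrences, and the Gini index measures how concentrated those occurrences are over all items.

```python
import math
from collections import Counter


def diversity_metrics(rec_lists, n_items):
    """Coverage/concentration of items across all users' top-k lists (generic definitions)."""
    counts = Counter(item for recs in rec_lists for item in recs)
    total = sum(counts.values())

    # Number of distinct items that appear in at least one list.
    appeared_item = len(counts)

    # Shannon entropy (natural log) of the item-occurrence distribution.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())

    # Gini index over the whole catalogue; never-recommended items count as zero.
    freqs = sorted([0] * (n_items - appeared_item) + list(counts.values()))
    weighted = sum(rank * f for rank, f in enumerate(freqs, start=1))
    gini = 2 * weighted / (n_items * total) - (n_items + 1) / n_items
    return appeared_item, entropy, gini


# Two users, top-2 lists over a 4-item catalogue; item 3 is never recommended.
appeared, ent, gini = diversity_metrics([[0, 1], [0, 2]], n_items=4)
print(appeared, round(ent, 4), round(gini, 4))
```

Under these definitions, higher entropy and appeared_item together with a lower Gini index mean that recommendations are spread over more of the catalogue, which is the pattern iALS shows in the table.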
Let’s ask each recommender: “What would you recommend to a user who has just watched ‘Toy Story’?”
IALS, DenseSLIM, and RP3beta rank “Toy Story 2” at the top of the recommendation list, which seems appropriate.
[20]:
for recommender_class in [IALSRecommender, DenseSLIMRecommender, RP3betaRecommender, P3alphaRecommender]:
    rec_tuned = recommender_class(X, **recommender_name_vs_best_parameter[recommender_class.__name__]).learn()
    toystory_id = 1
    recommended_id_and_score = id_mapper.recommend_for_new_user(
        rec_tuned, user_profile=[toystory_id], cutoff=10
    )
    print(f"{recommender_class.__name__}'s result:")
    # Top-10 recommendations
    display(movies.reindex([movie_id for movie_id, score in recommended_id_and_score]))
IALSRecommender's result:
movieId | title | genres | release_year
---|---|---|---
3114 | Toy Story 2 (1999) | Animation\|Children's\|Comedy | 1999
34 | Babe (1995) | Children's\|Comedy\|Drama | 1995
2355 | Bug's Life, A (1998) | Animation\|Children's\|Comedy | 1998
1265 | Groundhog Day (1993) | Comedy\|Romance | 1993
588 | Aladdin (1992) | Animation\|Children's\|Comedy\|Musical | 1992
2396 | Shakespeare in Love (1998) | Comedy\|Romance | 1998
2321 | Pleasantville (1998) | Comedy | 1998
356 | Forrest Gump (1994) | Comedy\|Romance\|War | 1994
1148 | Wrong Trousers, The (1993) | Animation\|Comedy | 1993
595 | Beauty and the Beast (1991) | Animation\|Children's\|Musical | 1991
DenseSLIMRecommender's result:
movieId | title | genres | release_year
---|---|---|---
3114 | Toy Story 2 (1999) | Animation\|Children's\|Comedy | 1999
2355 | Bug's Life, A (1998) | Animation\|Children's\|Comedy | 1998
34 | Babe (1995) | Children's\|Comedy\|Drama | 1995
588 | Aladdin (1992) | Animation\|Children's\|Comedy\|Musical | 1992
1265 | Groundhog Day (1993) | Comedy\|Romance | 1993
2396 | Shakespeare in Love (1998) | Comedy\|Romance | 1998
356 | Forrest Gump (1994) | Comedy\|Romance\|War | 1994
1148 | Wrong Trousers, The (1993) | Animation\|Comedy | 1993
1641 | Full Monty, The (1997) | Comedy | 1997
1923 | There's Something About Mary (1998) | Comedy | 1998
RP3betaRecommender's result:
movieId | title | genres | release_year
---|---|---|---
3114 | Toy Story 2 (1999) | Animation\|Children's\|Comedy | 1999
1265 | Groundhog Day (1993) | Comedy\|Romance | 1993
2396 | Shakespeare in Love (1998) | Comedy\|Romance | 1998
34 | Babe (1995) | Children's\|Comedy\|Drama | 1995
2355 | Bug's Life, A (1998) | Animation\|Children's\|Comedy | 1998
1270 | Back to the Future (1985) | Comedy\|Sci-Fi | 1985
260 | Star Wars: Episode IV - A New Hope (1977) | Action\|Adventure\|Fantasy\|Sci-Fi | 1977
2028 | Saving Private Ryan (1998) | Action\|Drama\|War | 1998
356 | Forrest Gump (1994) | Comedy\|Romance\|War | 1994
1210 | Star Wars: Episode VI - Return of the Jedi (1983) | Action\|Adventure\|Romance\|Sci-Fi\|War | 1983
P3alphaRecommender's result:
movieId | title | genres | release_year
---|---|---|---
1265 | Groundhog Day (1993) | Comedy\|Romance | 1993
2396 | Shakespeare in Love (1998) | Comedy\|Romance | 1998
3114 | Toy Story 2 (1999) | Animation\|Children's\|Comedy | 1999
1270 | Back to the Future (1985) | Comedy\|Sci-Fi | 1985
2028 | Saving Private Ryan (1998) | Action\|Drama\|War | 1998
34 | Babe (1995) | Children's\|Comedy\|Drama | 1995
356 | Forrest Gump (1994) | Comedy\|Romance\|War | 1994
2355 | Bug's Life, A (1998) | Animation\|Children's\|Comedy | 1998
1197 | Princess Bride, The (1987) | Action\|Adventure\|Comedy\|Romance | 1987
588 | Aladdin (1992) | Animation\|Children's\|Comedy\|Musical | 1992