irspack.recommenders.IALSRecommender

class irspack.recommenders.IALSRecommender(X_train_all, n_components=20, alpha0=0.0, reg=0.001, nu=1.0, confidence_scaling='none', epsilon=1.0, init_std=0.1, solver_type='CG', max_cg_steps=3, ialspp_subspace_dimension=64, loss_type='IALSPP', nu_star=None, random_seed=42, n_threads=None, train_epochs=16, prediction_time_max_cg_steps=5, prediction_time_ialspp_iteration=7)[source]

Bases: BaseRecommenderWithEarlyStopping, BaseRecommenderWithUserEmbedding, BaseRecommenderWithItemEmbedding

Implementation of implicit Alternating Least Squares (iALS) or Weighted Matrix Factorization (WMF).

By default, it tries to minimize the following loss:

\[\frac{1}{2} \sum _{u, i \in S} c_{ui} (\mathbf{u}_u \cdot \mathbf{v}_i - 1) ^ 2 + \frac{\alpha_0}{2} \sum_{u, i} (\mathbf{u}_u \cdot \mathbf{v}_i) ^ 2 + \frac{\text{reg}}{2} \left( \sum_u (\alpha_0 I + N_u) ^ \nu || \mathbf{u}_u || ^2 + \sum_i (\alpha_0 U + N_i) ^ \nu || \mathbf{v}_i || ^2 \right)\]

where \(S\) denotes the set of all pairs \((u, i)\) where \(X_{ui}\) is non-zero.
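As a sanity check on the formula, the loss can be evaluated directly on a small dense example. The sketch below is not the library's implementation. It assumes that \(I\) and \(U\) are the numbers of items and users, and that \(N_u\) and \(N_i\) count the non-zero entries in each row and column of \(X\), which the text above does not state explicitly. The function name `ials_loss` and its arguments are hypothetical.

```python
import numpy as np


def ials_loss(X, C, P, Q, alpha0, reg, nu):
    """Evaluate the loss above on dense arrays (illustrative only).

    X: (U, I) interaction matrix, C: (U, I) confidences c_ui,
    P: (U, k) user factors, Q: (I, k) item factors.
    """
    n_users, n_items = X.shape
    observed = X != 0                      # the set S of non-zero pairs
    scores = P @ Q.T                       # u_u . v_i for every pair

    observed_term = 0.5 * np.sum(C[observed] * (scores[observed] - 1.0) ** 2)
    unobserved_term = 0.5 * alpha0 * np.sum(scores ** 2)

    # Assumption: N_u / N_i are per-row / per-column non-zero counts.
    n_u = observed.sum(axis=1)
    n_i = observed.sum(axis=0)
    user_weight = (alpha0 * n_items + n_u) ** nu
    item_weight = (alpha0 * n_users + n_i) ** nu
    reg_term = 0.5 * reg * (
        np.sum(user_weight * np.sum(P ** 2, axis=1))
        + np.sum(item_weight * np.sum(Q ** 2, axis=1))
    )
    return observed_term + unobserved_term + reg_term


X = np.array([[1.0, 0.0], [0.0, 2.0]])
P = np.array([[1.0], [0.0]])
Q = np.array([[1.0], [0.0]])
loss = ials_loss(X, X, P, Q, alpha0=0.5, reg=0.1, nu=1.0)
```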

See the seminal paper:

Collaborative filtering for implicit feedback datasets

By default, it uses the conjugate gradient (CG) solver described in:

Applications of the conjugate gradient method for implicit feedback collaborative filtering

The loss above is slightly different from the original version. See the following paper for the loss used here:

Revisiting the Performance of iALS on Item Recommendation Benchmarks

Parameters:

X_train_all (Union[scipy.sparse.csr_matrix, scipy.sparse.csc_matrix]) – Input interaction matrix.
n_components (int, optional) – The dimension for latent factor. Defaults to 20.
alpha0 (float, optional) – The “unobserved” weight \(\alpha_0\) in the loss above. Defaults to 0.0.
reg (float, optional) – Regularization coefficient for both user & item factors. Defaults to 1e-3.
nu (float, optional) – Controls the frequency regularization introduced in the paper “Revisiting the Performance of iALS on Item Recommendation Benchmarks”. Defaults to 1.0.
confidence_scaling (str, optional) –
Specifies how to scale the confidence \(c_{ui}\). Must be either “none” or “log”. If “none”, a non-zero (not necessarily 1) \(X_{ui}\) yields

\[c_{ui} = A + X_{ui}\]

If “log”,

\[c_{ui} = A + \log (1 + X_{ui} / \epsilon )\]

The constant \(A\) above is 0 if loss_type is "IALSPP", and \(\alpha_0\) if loss_type is "ORIGINAL".

Defaults to “none”.
epsilon (float, optional) – The \(\epsilon\) parameter for the log-scaling described above. Has no effect if confidence_scaling is “none”. Defaults to 1.0.
init_std (float, optional) – Standard deviation of the normal distribution used for initialization. The actual standard deviation of each user/item vector component is scaled by 1 / n_components ** .5. Defaults to 0.1.
solver_type ("CHOLESKY" | "CG" | "IALSPP", optional) – Which solver to use. Defaults to “CG”.
max_cg_steps (int, optional) – Maximal number of conjugate gradient steps during training. Used only when solver_type == "CG". Increasing this parameter brings the result closer to that of the Cholesky decomposition method (i.e., solver_type == "CHOLESKY"), but training takes longer. Defaults to 3.
ialspp_subspace_dimension (int, optional) – The subspace dimension of iALS++ (ignored if solver_type is not “IALSPP”). If this value is 1, the specialized implementation described in “Fast Matrix Factorization for Online Recommendation with Implicit Feedback” will be used instead. Defaults to 64.
loss_type (Literal["IALSPP", "ORIGINAL"], optional) – Selects between the iALS++ loss and the original loss, which differ in the constant \(A\) described under confidence_scaling. Defaults to "IALSPP".
nu_star (Optional[float], optional) – If not None, used as the reference scale for nu described in the “Revisiting…” paper. Defaults to None.
random_seed (int, optional) – The random seed to initialize the parameters. Defaults to 42.
n_threads (Optional[int], optional) – Specifies the number of threads to use for the computation. If None, the environment variable "IRSPACK_NUM_THREADS_DEFAULT" will be looked up, and if the variable is not set, it will be set to os.cpu_count(). Defaults to None.
train_epochs (int, optional) – Maximal number of epochs. Defaults to 16.
prediction_time_max_cg_steps (int, optional) – Maximal number of conjugate gradient steps during the prediction time, i.e., when a user unseen at training time is given as a history matrix. Defaults to 5.
prediction_time_ialspp_iteration (int) – Defaults to 7.
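The two confidence_scaling formulas and the role of loss_type can be illustrated for a single entry. This is a minimal sketch under the definitions given for confidence_scaling above, not the library's code. The helper name `confidence` is hypothetical.

```python
import math


def confidence(x_ui, alpha0, confidence_scaling="none", epsilon=1.0,
               loss_type="IALSPP"):
    """Return c_ui for one non-zero entry x_ui (illustrative helper)."""
    # The constant A is 0 for the iALS++ loss and alpha0 for the original loss.
    if loss_type == "IALSPP":
        a = 0.0
    elif loss_type == "ORIGINAL":
        a = alpha0
    else:
        raise ValueError(f"unknown loss_type: {loss_type!r}")

    if confidence_scaling == "none":
        return a + x_ui
    if confidence_scaling == "log":
        return a + math.log(1.0 + x_ui / epsilon)
    raise ValueError(f"unknown confidence_scaling: {confidence_scaling!r}")


c_ialspp = confidence(3.0, alpha0=0.5)
c_original = confidence(3.0, alpha0=0.5, loss_type="ORIGINAL")
c_log = confidence(1.0, alpha0=0.0, confidence_scaling="log", epsilon=1.0)
```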

Examples

>>> from irspack import IALSRecommender, rowwise_train_test_split, Evaluator
>>> from irspack.utils.sample_data import mf_example_data
>>> X = mf_example_data(100, 30, random_state=1)
>>> X_train, X_test = rowwise_train_test_split(X, random_state=0)
>>> rec = IALSRecommender(X_train)
>>> rec.learn()
>>> evaluator=Evaluator(X_test)
>>> print(evaluator.get_scores(rec, [20]))
OrderedDict([('hit@20', 1.0), ('recall@20', 0.9003412698412698), ('ndcg@20', 0.6175493479217139), ('map@20', 0.3848785870622406), ('precision@20', 0.3385), ('gini_index@20', 0.0814), ('entropy@20', 3.382497875272383), ('appeared_item@20', 30.0)])

__init__(X_train_all, n_components=20, alpha0=0.0, reg=0.001, nu=1.0, confidence_scaling='none', epsilon=1.0, init_std=0.1, solver_type='CG', max_cg_steps=3, ialspp_subspace_dimension=64, loss_type='IALSPP', nu_star=None, random_seed=42, n_threads=None, train_epochs=16, prediction_time_max_cg_steps=5, prediction_time_ialspp_iteration=7)[source]

Parameters:

X_train_all (Union[csr_matrix, csc_matrix]) –
n_components (int) –
alpha0 (float) –
reg (float) –
nu (float) –
confidence_scaling (str) –
epsilon (float) –
init_std (float) –
solver_type (typing_extensions.Literal[CG, CHOLESKY, IALSPP]) –
max_cg_steps (int) –
ialspp_subspace_dimension (int) –
loss_type (typing_extensions.Literal[IALSPP, ORIGINAL]) –
nu_star (Optional[float]) –
random_seed (int) –
n_threads (Optional[int]) –
train_epochs (int) –
prediction_time_max_cg_steps (int) –
prediction_time_ialspp_iteration (int) –

Return type:

None

Methods

`__init__`(X_train_all[, n_components, ...])
`compute_item_embedding`(X)	Given unknown items' interactions with known users, computes the latent factors of the items by least squares (fixing user embeddings).
`compute_user_embedding`(X)	Given unknown users' interactions with known items, computes the latent factors of the users by least squares (fixing item embeddings).
`default_suggest_parameter`(trial, fixed_params)
`from_config`(X_train_all, config)
`get_item_embedding`()	Get item embedding vectors.
`get_score`(user_indices)	Compute the item recommendation score for a subset of users.
`get_score_block`(begin, end)	Compute the score for a block of the users.
`get_score_cold_user`(X)	Compute the item recommendation score for unseen users whose profiles are given as another user-item relation matrix.
`get_score_cold_user_remove_seen`(X)	Compute the item recommendation score for unseen users whose profiles are given as another user-item relation matrix.
`get_score_from_item_embedding`(user_indices, ...)
`get_score_from_user_embedding`(user_embedding)	Compute the item score from user embedding.
`get_score_remove_seen`(user_indices)	Compute the item scores and mask the items in the training set.
`get_score_remove_seen_block`(begin, end)	Compute the score for a block of the users, and mask the items in the training set.
`get_user_embedding`()	Get user embedding vectors.
`learn`()	Learns and returns itself.
`learn_with_optimizer`(evaluator, trial[, ...])	Learning procedures with early stopping and pruning.
`load_state`()
`run_epoch`()
`save_state`()
`start_learning`()
`tune`(data, evaluator[, n_trials, timeout, ...])	Perform the optimization step.
`tune_doubling_dimension`(data, evaluator, ...)	Perform tuning gradually doubling n_components.
`tune_with_study`(study, data, evaluator[, ...])

Attributes

`default_tune_range`
`trainer_as_ials`

X_train_all: sps.csr_matrix: The matrix to feed into recommender.

compute_item_embedding(X)[source]

Given unknown items’ interactions with known users, computes the latent factors of the items by least squares (fixing user embeddings).

Parameters:: X (Union[csr_matrix, csc_matrix]) – The interaction history of the new items. X.shape[0] must be equal to self.n_users.
Return type:: ndarray

compute_user_embedding(X)[source]

Given unknown users’ interactions with known items, computes the latent factors of the users by least squares (fixing item embeddings).

Parameters:: X (Union[csr_matrix, csc_matrix]) – The interaction history of the new users. X.shape[1] must be equal to self.n_items.
Return type:: ndarray
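For intuition, setting the gradient of the class-level loss with respect to one user vector to zero (item embeddings fixed) gives a small linear system. The sketch below solves it densely with NumPy for a single user, assuming confidence_scaling="none", loss_type="IALSPP", and that \(N_u\) is the number of non-zero entries in the user's row. The library uses its own solvers (CG, Cholesky, or iALS++), so this only illustrates the underlying least-squares problem. The name `solve_user_embedding` is hypothetical.

```python
import numpy as np


def solve_user_embedding(x, item_factors, alpha0=0.0, reg=1e-3, nu=1.0):
    """Least-squares user vector for one interaction row (illustrative).

    x: (n_items,) interaction row, item_factors: (n_items, k).
    """
    n_items, k = item_factors.shape
    seen = x != 0
    c = x[seen]                      # c_ui = X_ui under the stated assumptions
    q_seen = item_factors[seen]

    # Normal equations:
    # (alpha0 * Q^T Q + sum_S c_i q_i q_i^T + reg * w * Id) p = sum_S c_i q_i
    lhs = alpha0 * (item_factors.T @ item_factors)
    lhs = lhs + (q_seen * c[:, None]).T @ q_seen
    weight = (alpha0 * n_items + seen.sum()) ** nu
    lhs = lhs + reg * weight * np.eye(k)
    rhs = q_seen.T @ c
    # Note: lhs is singular if alpha0 == 0 and the row has no interactions.
    return np.linalg.solve(lhs, rhs)


item_factors = np.eye(2)
p = solve_user_embedding(np.array([1.0, 0.0]), item_factors,
                         alpha0=0.0, reg=1.0, nu=1.0)
```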

get_item_embedding()[source]

Get item embedding vectors.

Returns:: The latent vector representation of items. Its number of rows is equal to the number of the items.
Return type:: ndarray

get_score(user_indices)[source]

Compute the item recommendation score for a subset of users.

Parameters:: user_indices (ndarray) – The index defines the subset of users.
Returns:: The item scores. Its shape will be (len(user_indices), self.n_items)
Return type:: ndarray

get_score_block(begin, end)[source]

Compute the score for a block of the users.

Parameters:

begin (int) – where the evaluated user block begins.
end (int) – where the evaluated user block ends.

Returns:

The item scores. Its shape will be (end - begin, self.n_items)

Return type:

ndarray
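Block-wise scoring is typically driven by a loop over consecutive user ranges. The generator below is a hypothetical helper, not part of the library. It assumes that `end` is exclusive, which is consistent with the documented output shape (end - begin, self.n_items).

```python
def iter_user_blocks(n_users, block_size):
    """Yield (begin, end) pairs covering range(n_users), with end exclusive."""
    for begin in range(0, n_users, block_size):
        yield begin, min(begin + block_size, n_users)


blocks = list(iter_user_blocks(5, 2))
```

Each pair could then be passed as `get_score_block(begin, end)` to keep the score matrix held in memory at any one time small.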

get_score_cold_user(X)[source]

Compute the item recommendation score for unseen users whose profiles are given as another user-item relation matrix.

Parameters:: X (Union[csr_matrix, csc_matrix]) – The profile user-item relation matrix for unseen users. Its number of rows is arbitrary, but the number of columns must be self.n_items.
Returns:: Computed item scores for users. Its shape is equal to that of X.
Return type:: ndarray

get_score_cold_user_remove_seen(X)

Compute the item recommendation score for unseen users whose profiles are given as another user-item relation matrix. The score will then be masked by the input.

Parameters:: X (Union[csr_matrix, csc_matrix]) – The profile user-item relation matrix for unseen users. Its number of rows is arbitrary, but the number of columns must be self.n_items.
Returns:: Computed & masked item scores for users. Its shape is equal to that of X.
Return type:: ndarray

get_score_from_user_embedding(user_embedding)[source]

Compute the item score from user embedding. Mainly used for cold-start scenario.

Parameters:: user_embedding (ndarray) – Latent user representation obtained elsewhere.
Returns:: The score array. Its shape will be (user_embedding.shape[0], self.n_items)
Return type:: DenseScoreArray
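The documented output shape matches a plain inner product between user and item factors, which is also the quantity \(\mathbf{u}_u \cdot \mathbf{v}_i\) appearing in the loss. The following NumPy sketch assumes exactly that and nothing more. Any additional processing the library may perform is not covered, and the function name is hypothetical.

```python
import numpy as np


def score_from_user_embedding(user_embedding, item_embedding):
    """Inner-product scores of shape (n_given_users, n_items)."""
    return user_embedding @ item_embedding.T


user_embedding = np.array([[1.0, 2.0]])                        # one user, k = 2
item_embedding = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # three items
scores = score_from_user_embedding(user_embedding, item_embedding)
```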

get_score_remove_seen(user_indices)

Compute the item scores and mask the items in the training set. Masked items will have the score -inf.

Parameters:: user_indices (ndarray) – Specifies the subset of users.
Returns:: The masked item scores. Its shape will be (len(user_indices), self.n_items)
Return type:: ndarray
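The masking step can be pictured as follows: wherever the training matrix has a non-zero entry, the score is overwritten with -inf so that the item can never rank among the top recommendations. This is an illustrative dense NumPy sketch with a hypothetical helper name, not the library's implementation.

```python
import numpy as np


def mask_seen_items(scores, X_seen):
    """Return a copy of scores with already-seen items set to -inf."""
    masked = np.array(scores, dtype=float)   # copy, so the input is untouched
    masked[np.asarray(X_seen) != 0] = -np.inf
    return masked


scores = np.array([[0.9, 0.2, 0.5]])
X_seen = np.array([[1, 0, 0]])
masked = mask_seen_items(scores, X_seen)
top_item = int(np.argmax(masked[0]))         # best item among the unseen ones
```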

get_score_remove_seen_block(begin, end)

Compute the score for a block of the users, and mask the items in the training set. Masked items will have the score -inf.

Parameters:

begin (int) – where the evaluated user block begins.
end (int) – where the evaluated user block ends.

Returns:

The masked item scores. Its shape will be (end - begin, self.n_items)

Return type:

ndarray

get_user_embedding()[source]

Get user embedding vectors.

Returns:: The latent vector representation of users. Its number of rows is equal to the number of the users.
Return type:: ndarray

learn()

Learns and returns itself.

Returns:: The model after fitting process.
Parameters:: self (R) –
Return type:: R

learn_with_optimizer(evaluator, trial, max_epoch=128, validate_epoch=5, score_degradation_max=5)

Learning procedures with early stopping and pruning.

Parameters:

evaluator (Optional[evaluation.Evaluator]) – The evaluator to measure the score.
trial (Optional[Trial]) – The current optuna trial under the study (if any).
max_epoch (int) – Maximal number of epochs. If iterative learning procedure is not available, this parameter will be ignored. Defaults to 128.
validate_epoch (int) – The frequency of validation score measurement. If iterative learning procedure is not available, this parameter will be ignored. Defaults to 5.
score_degradation_max (int) – Maximal number of allowed score degradation. If iterative learning procedure is not available, this parameter will be ignored. Defaults to 5.

Return type:

None
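How max_epoch, validate_epoch and score_degradation_max interact can be sketched with a plain loop. The exact counting rule used by the library is not stated here, so the version below makes assumptions: validation runs every validate_epoch epochs, and training stops once the score has failed to improve on the best value score_degradation_max times in a row. `score_at` is a hypothetical stand-in for running the evaluator.

```python
def early_stopping(score_at, max_epoch=128, validate_epoch=5,
                   score_degradation_max=5):
    """Return (best_epoch, best_score) under the assumed stopping rule."""
    best_score = float("-inf")
    best_epoch = None
    degradations = 0
    for epoch in range(1, max_epoch + 1):
        # One training epoch would run here.
        if epoch % validate_epoch != 0:
            continue
        score = score_at(epoch)
        if score > best_score:
            best_score, best_epoch, degradations = score, epoch, 0
        else:
            degradations += 1
            if degradations >= score_degradation_max:
                break  # assumed rule: too many consecutive non-improvements
    return best_epoch, best_score


# Toy validation curve that peaks at epoch 10.
result = early_stopping(lambda epoch: -(epoch - 10) ** 2,
                        validate_epoch=5, score_degradation_max=2)
```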

classmethod tune(data, evaluator, n_trials=20, timeout=None, data_suggest_function=None, parameter_suggest_function=None, fixed_params={}, random_seed=None, prunning_n_startup_trials=10, max_epoch=16, validate_epoch=1, score_degradation_max=3, logger=None)[source]

Perform the optimization step. optuna.Study object is created inside this function.

Parameters:

data (Optional[Union[csr_matrix, csc_matrix]]) – The training data. You can also provide tunable parameter dependent training data by providing data_suggest_function. In that case, data must be None.
evaluator (evaluation.Evaluator) – The validation evaluator that measures the performance of the recommenders.
n_trials (int) – The number of expected trials (including pruned ones). Defaults to 20.
timeout (Optional[int]) – If set to some value (in seconds), the study will exit after that time period. Note that running trials are not interrupted, though. Defaults to None.
data_suggest_function (Optional[Callable[[Trial], Union[csr_matrix, csc_matrix]]]) – If not None, this must be a function which takes optuna.Trial as its argument and returns training data. Defaults to None.
parameter_suggest_function (Optional[Callable[[Trial], Dict[str, Any]]]) – If not None, this must be a function which takes optuna.Trial as its argument and returns Dict[str, Any] (i.e., some keyword arguments of the recommender class). If None, cls.default_suggest_parameter will be used for the parameter suggestion. Defaults to None.
fixed_params (Dict[str, Any]) – Fixed parameters passed to recommenders during the optimization procedure. This will replace the suggested parameter (either by cls.default_suggest_parameter or parameter_suggest_function). Defaults to dict().
random_seed (Optional[int]) – The random seed to control optuna.samplers.TPESampler. Defaults to None.
prunning_n_startup_trials (int) – n_startup_trials argument passed to the constructor of optuna.pruners.MedianPruner. Defaults to 10.
max_epoch (int) – The maximal number of epochs for the training. If iterative learning procedure is not available, this parameter will be ignored. Defaults to 16.
validate_epoch (int, optional) – The frequency of validation score measurement. If iterative learning procedure is not available, this parameter will be ignored. Defaults to 1.
score_degradation_max (int, optional) – Maximal number of allowed score degradation. If iterative learning procedure is not available, this parameter will be ignored. Defaults to 3.
logger (Optional[Logger]) –

Returns:

A tuple that consists of

A dict containing the best parameters. This dict can be passed to the recommender as **kwargs.

A pandas.DataFrame that contains the history of optimization.

Return type:

Tuple[Dict[str, Any], DataFrame]
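The statement that fixed_params replaces suggested parameters amounts to a dictionary merge in which the fixed values win. The snippet below is a sketch of that precedence only. The parameter values are made up for illustration and `merge_params` is a hypothetical helper.

```python
def merge_params(suggested, fixed_params):
    """Combine parameters so that fixed_params overrides suggestions."""
    return {**suggested, **fixed_params}


suggested = {"n_components": 32, "alpha0": 0.05, "reg": 0.002}  # made-up values
fixed_params = {"n_components": 64}
merged = merge_params(suggested, fixed_params)
```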

classmethod tune_doubling_dimension(data, evaluator, initial_dimension, maximal_dimension, storage=None, study_name_prefix=None, n_trials_initial=40, n_trials_following=20, n_startup_trials_initial=10, n_startup_trials_following=5, max_epoch=16, validate_epoch=1, score_degradation_max=3, neighborhood_scale=3.0, suggest_function_initial=None, random_seed=None)[source]

Perform tuning gradually doubling n_components. Typically, with the initial n_components, the search will be more exhaustive, and with larger n_components, less exploration will be done around previously found parameters. This strategy is described in Revisiting the Performance of iALS on Item Recommendation Benchmarks.

Parameters:

initial_dimension (int) – The initial dimension.
maximal_dimension (int) – The maximal (inclusive) dimension to be tried.
storage (Optional[RDBStorage]) – The storage where multiple optuna.Study will be created corresponding to the various dimensions. If None, all Study will be created in-memory.
study_name_prefix (Optional[str]) – The prefix for the names of optuna.Study. For dimension d, the full name of the Study will be “{study_name_prefix}_{d}”. If None, we will use a random string for this prefix.
n_trials_initial (int) – The number of trials for the initial dimension.
n_trials_following (int) – The number of trials for the following dimensions.
n_startup_trials_initial (int) – Passed on to n_startup_trials argument of optuna.pruners.MedianPruner in the initial optuna.Study. Defaults to 10.
n_startup_trials_following (int) – Passed on to n_startup_trials argument of optuna.pruners.MedianPruner in the following optuna.Study. Defaults to 5.
neighborhood_scale (float) – alpha0 and reg parameters will be searched within the log-uniform range [previous_dimension_result / neighborhood_scale, previous_dimension_result * neighborhood_scale]. Defaults to 3.0.
suggest_overwrite_initial – Overwrites the suggestion parameters in the initial optuna.Study. Defaults to [].
random_seed (Optional[int]) – The random seed to control optuna.samplers.TPESampler. Defaults to None.
data (Union[csr_matrix, csc_matrix]) –
evaluator (Evaluator) –
max_epoch (int) –
validate_epoch (int) –
score_degradation_max (int) –
suggest_function_initial (Optional[Callable[[Trial], Dict[str, Any]]]) –

Returns:

A tuple that consists of

A dict containing the best parameters. This dict can be passed to the recommender as **kwargs.
A pandas.DataFrame that contains the history of optimization for all dimensions.

Return type:

Tuple[Dict[str, Any], DataFrame]
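The dimension schedule and the search window described by neighborhood_scale can be written out explicitly. Both helpers below are hypothetical illustrations of the description above. In particular, how the library treats a maximal_dimension that is not reached exactly by doubling is not stated, so this sketch simply stops at the last value not exceeding it.

```python
def doubling_dimensions(initial_dimension, maximal_dimension):
    """Dimensions tried in order: initial, 2x, 4x, ... up to the maximum."""
    dims = []
    d = initial_dimension
    while d <= maximal_dimension:
        dims.append(d)
        d *= 2
    return dims


def neighborhood_range(previous_best, neighborhood_scale=3.0):
    """Bounds of the log-uniform window around the previous best value."""
    return previous_best / neighborhood_scale, previous_best * neighborhood_scale


dims = doubling_dimensions(16, 128)
low, high = neighborhood_range(0.3, neighborhood_scale=3.0)
```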