irspack.recommenders.CosineKNNRecommender

class irspack.recommenders.CosineKNNRecommender(X_train_all, shrinkage=0.0, normalize=False, top_k=100, feature_weighting='NONE', bm25_k1=1.2, bm25_b=0.75, n_threads=None)[source]

Bases: BaseKNNRecommender

K-nearest-neighbor recommender based on cosine similarity. The item-item similarity matrix W, restricted column-wise to the top_k largest entries, is given by

\[\begin{split}\mathrm{W}_{i,j} = \begin{cases} \frac{\sum_{u} X_{ui} X_{uj}}{||X_{*i}||_2 ||X_{*j}||_2 + \mathrm{shrinkage}} & (\text{if normalize = True}) \\ \sum_{u} X_{ui} X_{uj} & (\text{if normalize = False}) \end{cases}\end{split}\]
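
As a rough illustration of the formula above, the following NumPy sketch computes a dense W for a tiny interaction matrix and then applies a column-wise top-k restriction. The names `cosine_similarity_matrix` and `restrict_top_k_columnwise` are hypothetical helpers, not part of irspack, and the library's own sparse, multi-threaded implementation may differ in details such as tie-breaking or how the diagonal is handled.

```python
import numpy as np


def cosine_similarity_matrix(X, shrinkage=0.0, normalize=False):
    """Dense item-item similarity following the documented formula (sketch)."""
    X = np.asarray(X, dtype=np.float64)
    gram = X.T @ X  # gram[i, j] = sum_u X[u, i] * X[u, j]
    if not normalize:
        return gram
    norms = np.linalg.norm(X, axis=0)  # column norms ||X_{*i}||_2
    denom = np.outer(norms, norms) + shrinkage
    out = np.zeros_like(gram)
    # Leave entries at zero where the denominator vanishes (empty columns).
    np.divide(gram, denom, out=out, where=denom > 0)
    return out


def restrict_top_k_columnwise(W, top_k):
    """Keep the top_k largest entries of every column and zero the rest."""
    W = np.asarray(W, dtype=np.float64)
    if top_k >= W.shape[0]:
        return W.copy()
    result = np.zeros_like(W)
    # Row indices of the top_k largest values in each column.
    idx = np.argpartition(-W, top_k - 1, axis=0)[:top_k]
    cols = np.arange(W.shape[1])
    result[idx, cols] = W[idx, cols]
    return result


# Three users (rows) and three items (columns).
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1]])
W_raw = cosine_similarity_matrix(X)                    # normalize=False
W_cos = cosine_similarity_matrix(X, normalize=True)    # plain cosine
W_shrunk = cosine_similarity_matrix(X, shrinkage=2.0, normalize=True)
W_top1 = restrict_top_k_columnwise(W_cos, top_k=1)
```

Note that shrinkage only enters the denominator, so it has no effect when normalize is False.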

Parameters:

X_train_all (Union[scipy.sparse.csr_matrix, scipy.sparse.csc_matrix]) – Input interaction matrix.
shrinkage (float, optional) – The shrinkage parameter for regularization. Defaults to 0.0.
normalize (bool, optional) – Whether to normalize the similarity. Defaults to False.
top_k (int, optional) – Specifies the maximal number of allowed neighbors. Defaults to 100.
feature_weighting (str, optional) –
Specifies how to weight the feature. Must be one of:
- "NONE": no feature weighting
- "TF_IDF": TF-IDF weighting
- "BM_25": Okapi BM-25 weighting
Defaults to "NONE".
bm25_k1 (float, optional) – The k1 parameter for BM25. Ignored if feature_weighting is not “BM_25”. Defaults to 1.2.
bm25_b (float, optional) – The b parameter for BM25. Ignored if feature_weighting is not “BM_25”. Defaults to 0.75.
n_threads (Optional[int], optional) – Specifies the number of threads to use for the computation. If None, the environment variable "IRSPACK_NUM_THREADS_DEFAULT" will be looked up, and if the variable is not set, it will be set to os.cpu_count(). Defaults to None.
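
The fallback order described for n_threads can be sketched as follows. `resolve_n_threads` is a hypothetical helper written for illustration only; the library's internal function may be named and structured differently, and the final `or 1` guard is an assumption added here for the case where `os.cpu_count()` returns None.

```python
import os


def resolve_n_threads(n_threads=None):
    """Resolve the thread count: explicit value, then env var, then CPU count."""
    if n_threads is not None:
        return n_threads
    env_value = os.environ.get("IRSPACK_NUM_THREADS_DEFAULT")
    if env_value is not None:
        return int(env_value)
    # os.cpu_count() can return None on some platforms; fall back to 1 (assumption).
    return os.cpu_count() or 1
```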

__init__(X_train_all, shrinkage=0.0, normalize=False, top_k=100, feature_weighting='NONE', bm25_k1=1.2, bm25_b=0.75, n_threads=None)[source]

Parameters:

X_train_all (Union[csr_matrix, csc_matrix]) –
shrinkage (float) –
normalize (bool) –
top_k (int) –
feature_weighting (str) –
bm25_k1 (float) –
bm25_b (float) –
n_threads (Optional[int]) –

Return type:

None

Methods

`__init__`(X_train_all[, shrinkage, ...])
`default_suggest_parameter`(trial, fixed_params)
`from_config`(X_train_all, config)
`get_score`(user_indices)	Compute the item recommendation score for a subset of users.
`get_score_block`(begin, end)	Compute the score for a block of the users.
`get_score_cold_user`(X)	Compute the item recommendation score for unseen users whose profiles are given as another user-item relation matrix.
`get_score_cold_user_remove_seen`(X)	Compute the item recommendation score for unseen users whose profiles are given as another user-item relation matrix.
`get_score_remove_seen`(user_indices)	Compute the item score and mask the items in the training set.
`get_score_remove_seen_block`(begin, end)	Compute the score for a block of the users, and mask the items in the training set.
`learn`()	Learns and returns itself.
`learn_with_optimizer`(evaluator, trial[, ...])	Learning procedures with early stopping and pruning.
`tune`(data, evaluator[, n_trials, timeout, ...])	Perform the optimization step.
`tune_with_study`(study, data, evaluator[, ...])

Attributes

`W`	The computed item-item similarity weight matrix.
`default_tune_range`

property W: Union[csr_matrix, csc_matrix, ndarray]: The computed item-item similarity weight matrix.

X_train_all: sps.csr_matrix: The matrix to feed into recommender.

get_score(user_indices)

Compute the item recommendation score for a subset of users.

Parameters: user_indices (ndarray) – The index defines the subset of users.
Returns: The item scores. Its shape will be (len(user_indices), self.n_items)
Return type: ndarray
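
For intuition about the returned shape, here is a minimal sketch that assumes the score is the product of the selected user rows with the similarity matrix W, which is the usual form for item-based KNN recommenders. This page does not state the exact computation, and `score_users` is a hypothetical helper, not an irspack function.

```python
import numpy as np
import scipy.sparse as sps


def score_users(X_train, W, user_indices):
    """Return dense scores of shape (len(user_indices), n_items) as X[users] @ W."""
    X_csr = sps.csr_matrix(X_train)
    scores = X_csr[user_indices] @ sps.csc_matrix(W)
    return scores.toarray()


X_train = np.array([[1, 0, 1],
                    [0, 1, 0]])
W = np.array([[0, 1, 0],
              [1, 0, 2],
              [0, 2, 0]])
# Rows of the result follow the order of user_indices, here user 1 then user 0.
scores = score_users(X_train, W, np.array([1, 0]))
```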

get_score_block(begin, end)

Compute the score for a block of the users.

Parameters:

begin (int) – where the evaluated user block begins.
end (int) – where the evaluated user block ends.

Returns:

The item scores. Its shape will be (end - begin, self.n_items)

Return type:

ndarray

get_score_cold_user(X)

Compute the item recommendation score for unseen users whose profiles are given as another user-item relation matrix.

Parameters: X (Union[csr_matrix, csc_matrix]) – The profile user-item relation matrix for unseen users. Its number of rows is arbitrary, but the number of columns must be self.n_items.
Returns: Computed item scores for users. Its shape equals that of X.
Return type: ndarray

get_score_cold_user_remove_seen(X)

Compute the item recommendation score for unseen users whose profiles are given as another user-item relation matrix. The score will then be masked by the input.

Parameters: X (Union[csr_matrix, csc_matrix]) – The profile user-item relation matrix for unseen users. Its number of rows is arbitrary, but the number of columns must be self.n_items.
Returns: Computed and masked item scores for users. Its shape equals that of X.
Return type: ndarray

get_score_remove_seen(user_indices)

Compute the item score and mask the items in the training set. Masked items will have the score -inf.

Parameters: user_indices (ndarray) – Specifies the subset of users.
Returns: The masked item scores. Its shape will be (len(user_indices), self.n_items)
Return type: ndarray
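
The masking step can be pictured with the sketch below: every (user, item) pair that has a nonzero entry in the interaction matrix receives the score -inf, so it can never rank among the top items. `mask_seen` is a hypothetical helper for illustration, not the library's code.

```python
import numpy as np
import scipy.sparse as sps


def mask_seen(scores, X_seen):
    """Return a copy of scores with already-seen (user, item) pairs set to -inf."""
    masked = np.array(scores, dtype=np.float64, copy=True)
    rows, cols = sps.csr_matrix(X_seen).nonzero()
    masked[rows, cols] = -np.inf
    return masked


scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
X_seen = np.array([[1, 0, 0],
                   [0, 0, 1]])
masked = mask_seen(scores, X_seen)
best_items = masked.argmax(axis=1)  # best unseen item per user
```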

get_score_remove_seen_block(begin, end)

Compute the score for a block of the users, and mask the items in the training set. Masked items will have the score -inf.

Parameters:

begin (int) – where the evaluated user block begins.
end (int) – where the evaluated user block ends.

Returns:

The masked item scores. Its shape will be (end - begin, self.n_items)

Return type:

ndarray
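
Block-wise scoring is useful when the full (n_users, n_items) score matrix would not fit in memory. The sketch below collects top-n items one user block at a time. It assumes the block covers users begin up to but not including end, which is consistent with the documented shape (end - begin, self.n_items). `top_n_by_blocks` and the `score_block` callable are hypothetical; with a fitted recommender, a bound method with the same (begin, end) signature would play that role.

```python
import numpy as np


def top_n_by_blocks(score_block, n_users, block_size, n):
    """Collect the top-n item indices per user, scoring one user block at a time."""
    results = []
    for begin in range(0, n_users, block_size):
        end = min(begin + block_size, n_users)
        scores = score_block(begin, end)  # shape (end - begin, n_items)
        # Sort descending; items masked with -inf end up last.
        top = np.argsort(-scores, axis=1)[:, :n]
        results.append(top)
    return np.vstack(results)


# Stand-in score function backed by a small dense matrix.
all_scores = np.arange(20, dtype=np.float64).reshape(5, 4)
top_items = top_n_by_blocks(lambda b, e: all_scores[b:e], n_users=5, block_size=2, n=2)
```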

learn()

Learns and returns itself.

Returns: The model after the fitting process.
Parameters: self (R) –
Return type: R

learn_with_optimizer(evaluator, trial, max_epoch=128, validate_epoch=5, score_degradation_max=5)

Learning procedures with early stopping and pruning.

Parameters:

evaluator (Optional[evaluation.Evaluator]) – The evaluator to measure the score.
trial (Optional[Trial]) – The current optuna trial under the study (if any).
max_epoch (int) – Maximal number of epochs. If iterative learning procedure is not available, this parameter will be ignored. Defaults to 128.
validate_epoch (int) – The frequency of validation score measurement. If iterative learning procedure is not available, this parameter will be ignored. Defaults to 5.
score_degradation_max (int) – Maximum number of allowed score degradations. If iterative learning procedure is not available, this parameter will be ignored. Defaults to 5.

Return type:

None
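
For recommenders that do train iteratively, the interplay of max_epoch, validate_epoch and score_degradation_max can be sketched as below. This is one common reading of these parameters (validate every validate_epoch epochs and stop after score_degradation_max consecutive validations without improvement), stated here as an assumption rather than the library's exact logic. `train_with_early_stopping`, `run_epoch` and `evaluate` are hypothetical names. For a KNN recommender, which has no iterative procedure, these parameters are ignored.

```python
def train_with_early_stopping(
    run_epoch, evaluate, max_epoch=128, validate_epoch=5, score_degradation_max=5
):
    """Run epochs, validating periodically, and stop after repeated degradations."""
    best_score = float("-inf")
    best_epoch = 0
    n_degradations = 0
    for epoch in range(1, max_epoch + 1):
        run_epoch()
        if epoch % validate_epoch != 0:
            continue  # only validate every validate_epoch epochs
        score = evaluate()
        if score > best_score:
            best_score, best_epoch, n_degradations = score, epoch, 0
        else:
            n_degradations += 1
            if n_degradations >= score_degradation_max:
                break
    return best_epoch, best_score
```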

classmethod tune(data, evaluator, n_trials=20, timeout=None, data_suggest_function=None, parameter_suggest_function=None, fixed_params={}, random_seed=None, prunning_n_startup_trials=10, max_epoch=128, validate_epoch=5, score_degradation_max=5, logger=None)

Perform the optimization step. An optuna.Study object is created inside this function.

Parameters:

data (Optional[Union[csr_matrix, csc_matrix]]) – The training data. You can also supply training data that depends on tunable parameters by providing data_suggest_function. In that case, data must be None.
evaluator (evaluation.Evaluator) – The validation evaluator that measures the performance of the recommenders.
n_trials (int) – The number of expected trials (including pruned ones). Defaults to 20.
timeout (Optional[int]) – If set to some value (in seconds), the study will exit after that time period. Note that trials already running are not interrupted. Defaults to None.
data_suggest_function (Optional[Callable[[Trial], Union[csr_matrix, csc_matrix]]]) – If not None, this must be a function which takes optuna.Trial as its argument and returns training data. Defaults to None.
parameter_suggest_function (Optional[Callable[[Trial], Dict[str, Any]]]) – If not None, this must be a function which takes optuna.Trial as its argument and returns Dict[str, Any] (i.e., some keyword arguments of the recommender class). If None, cls.default_suggest_parameter will be used for the parameter suggestion. Defaults to None.
fixed_params (Dict[str, Any]) – Fixed parameters passed to recommenders during the optimization procedure. These override the suggested parameters (whether suggested by cls.default_suggest_parameter or by parameter_suggest_function). Defaults to dict().
random_seed (Optional[int]) – The random seed to control optuna.samplers.TPESampler. Defaults to None.
prunning_n_startup_trials (int) – n_startup_trials argument passed to the constructor of optuna.pruners.MedianPruner. Defaults to 10.
max_epoch (int) – The maximal number of epochs for the training. If iterative learning procedure is not available, this parameter will be ignored. Defaults to 128.
validate_epoch (int, optional) – The frequency of validation score measurement. If iterative learning procedure is not available, this parameter will be ignored. Defaults to 5.
score_degradation_max (int, optional) – Maximum number of allowed score degradations. If iterative learning procedure is not available, this parameter will be ignored. Defaults to 5.
logger (Optional[Logger]) –

Returns:

A tuple that consists of:

- A dict containing the best parameters. This dict can be passed to the recommender as **kwargs.
- A pandas.DataFrame that contains the history of optimization.

Return type:

Tuple[Dict[str, Any], DataFrame]
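
To make the roles of parameter_suggest_function and fixed_params more concrete, here is a hedged sketch. `suggest_cosine_knn_params` is a hypothetical suggestion function with illustrative search ranges, not the library's defaults. `StubTrial` is a stand-in that mimics the three optuna.Trial methods used, so the sketch runs without optuna. `apply_fixed_params` shows one plausible way for fixed parameters to override suggestions and is an assumption about behavior, not the library's code.

```python
class StubTrial:
    """Stand-in for optuna.Trial: returns the lower bound or the first choice."""

    def suggest_int(self, name, low, high, log=False):
        return low

    def suggest_float(self, name, low, high, log=False):
        return low

    def suggest_categorical(self, name, choices):
        return choices[0]


def suggest_cosine_knn_params(trial):
    """Map a trial to constructor keyword arguments (ranges are illustrative)."""
    return {
        "top_k": trial.suggest_int("top_k", 4, 1000, log=True),
        "shrinkage": trial.suggest_float("shrinkage", 0.0, 1000.0),
        "normalize": trial.suggest_categorical("normalize", [False, True]),
    }


def apply_fixed_params(suggested, fixed_params):
    """Fixed parameters take precedence over suggested ones."""
    return {**suggested, **fixed_params}


params = apply_fixed_params(
    suggest_cosine_knn_params(StubTrial()), {"normalize": True}
)
```

A dict of this form is what would be unpacked as **kwargs when constructing the recommender alongside the training matrix.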