[feature] leave-k-out split mode #2121

Open
wants to merge 7 commits into base: master
14 changes: 14 additions & 0 deletions asset/model_list.json
@@ -154,6 +154,20 @@
"repository": "RecBole",
"repo_link": "https://github.com/RUCAIBox/RecBole"
},
{
"category": "General Recommendation",
"cate_link": "/docs/user_guide/model_intro.html#general-recommendation",
"year": "2013",
"pub": "RecSys'13",
"model": "AsymKNN",
"model_link": "/docs/user_guide/model/general/asymknn.html",
"paper": "Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets",
"paper_link": "https://doi.org/10.1145/2507157.2507189",
"authors": "Fabio Aiolli",
"ref_code": "",
"repository": "RecBole",
"repo_link": "https://github.com/RUCAIBox/RecBole"
},
{
"category": "General Recommendation",
"cate_link": "/docs/user_guide/model_intro.html#general-recommendation",
@@ -0,0 +1,4 @@
.. automodule:: recbole.model.general_recommender.asymknn
:members:
:undoc-members:
:show-inheritance:
1 change: 1 addition & 0 deletions docs/source/recbole/recbole.model.general_recommender.rst
@@ -4,6 +4,7 @@ recbole.model.general\_recommender
.. toctree::
:maxdepth: 4

recbole.model.general_recommender.admmslim
recbole.model.general_recommender.asymknn
recbole.model.general_recommender.bpr
recbole.model.general_recommender.cdae
2 changes: 1 addition & 1 deletion docs/source/user_guide/config/evaluation_settings.rst
@@ -12,7 +12,7 @@ Evaluation settings are designed to set parameters about model evaluation.

- ``order (str)``: decides how we sort the data in `.inter`. Now we support two kinds of ordering strategies: ``['RO', 'TO']``, which denotes the random ordering and temporal ordering. For ``RO``, we will shuffle the data and then split them in this order. For ``TO``, we will sort the data by the column of `TIME_FIELD` in ascending order and the split them in this order. The default value is ``RO``.

- ``split (dict)``: decides how we split the data in `.inter`. Now we support two kinds of splitting strategies: ``['RS','LS']``, which denotes the ratio-based data splitting and leave-one-out data splitting. If the key of ``split`` is ``RS``, you need to set the splitting ratio like ``[0.8,0.1,0.1]``, ``[7,2,1]`` or ``[8,0,2]``, which denotes the ratio of training set, validation set and testing set respectively. If the key of ``split`` is ``LS``, now we support three kinds of ``LS`` mode: ``['valid_and_test', 'valid_only', 'test_only']`` and you should choose one mode as the value of ``LS``. The default value of ``split`` is ``{'RS': [0.8,0.1,0.1]}``.
- ``split (dict)``: decides how we split the data in `.inter`. Now we support three kinds of splitting strategies: ``['RS', 'LS', 'LK']``, which denote ratio-based splitting, leave-one-out splitting, and leave-k-out splitting. If the key of ``split`` is ``RS``, you need to set the splitting ratio like ``[0.8,0.1,0.1]``, ``[7,2,1]`` or ``[8,0,2]``, which denotes the ratio of the training set, validation set and testing set respectively. If the key is ``LS``, you should choose one of three modes as its value: ``['valid_and_test', 'valid_only', 'test_only']``. If the key is ``LK``, you need to provide a list containing the mode and the number ``k``, in the format ``['valid_and_test', k]``, where ``k`` is the number of interactions left out per user according to the specified mode. The default value of ``split`` is ``{'RS': [0.8,0.1,0.1]}``.
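To make the new ``LK`` mode concrete, a minimal ``eval_args`` block could look as follows (a sketch only: it mirrors the ``RS``/``LS`` examples in this guide, and ``k=5`` is an arbitrary illustrative value):

.. code:: yaml

    eval_args:
        split: {'LK': ['valid_and_test', 5]}
        order: TO
        group_by: user

With this setting, the last 5 interactions of each user (under the chosen ordering) form the test set and the 5 interactions before those form the validation set.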

- ``mode (str|dict)``: decides the data range when we evaluate the model during ``valid`` and ``test`` phase. Now we support four kinds of evaluation mode: ``['full','unixxx','popxxx','labeled']``. ``full`` , ``unixxx`` and ``popxxx`` are designed for the evaluation on implicit feedback (data without label). For implicit feedback, we regard the items with observed interactions as positive items and those without observed interactions as negative items. ``full`` means evaluating the model on the set of all items. ``unixxx``, for example ``uni100``, means uniformly sample 100 negative items for each positive item in testing set, and evaluate the model on these positive items with their sampled negative items. ``popxxx``, for example ``pop100``, means sample 100 negative items for each positive item in testing set based on item popularity (:obj:`Counter(item)` in `.inter` file), and evaluate the model on these positive items with their sampled negative items. Here the `xxx` must be an integer. For explicit feedback (data with label), you should set the mode as ``labeled`` and we will evaluate the model based on your label. You can use ``valid`` and ``test`` as the dict key to set specific ``mode`` in different phases. The default value is ``full``, which is equivalent to ``{'valid': 'full', 'test': 'full'}``.

88 changes: 88 additions & 0 deletions docs/source/user_guide/model/general/asymknn.rst
@@ -0,0 +1,88 @@
AsymKNN
===========

Introduction
---------------------

`[paper] <https://dl.acm.org/doi/pdf/10.1145/2507157.2507189>`_

**Title:** Efficient Top-N Recommendation for Very Large Scale Binary Rated Datasets

**Authors:** Fabio Aiolli

**Abstract:** We present a simple and scalable algorithm for top-N recommendation able to deal with very large datasets and (binary rated) implicit feedback. We focus on memory-based collaborative filtering algorithms similar to the well known neighbor-based technique for explicit feedback. The major difference, that makes the algorithm particularly scalable, is that it uses positive feedback only and no explicit computation of the complete (user-by-user or item-by-item) similarity matrix needs to be performed. The study of the proposed algorithm has been conducted on data from the Million Songs Dataset (MSD) challenge whose task was to suggest a set of songs (out of more than 380k available songs) to more than 100k users given half of the user listening history and complete listening history of other 1 million people. In particular, we investigate on the entire recommendation pipeline, starting from the definition of suitable similarity and scoring functions and suggestions on how to aggregate multiple ranking strategies to define the overall recommendation. The technique we are proposing extends and improves the one that already won the MSD challenge last year.

In this article, we introduce a versatile class of recommendation algorithms that calculate either user-to-user or item-to-item similarities as the foundation for generating recommendations. This approach enables the flexibility to switch between UserKNN and ItemKNN models depending on the desired application.

A distinguishing feature of this class of algorithms, exemplified by AsymKNN, is its use of asymmetric cosine similarity, which generalizes the traditional cosine similarity. Specifically, when the asymmetry parameter
``alpha = 0.5``, the method reduces to the standard cosine similarity, while other values of ``alpha`` allow for tailored emphasis on specific aspects of the interaction data. Furthermore, setting the parameter
``beta = 1.0`` recovers a traditional UserKNN or ItemKNN, as the final scores are then only divided by a fixed positive constant, which preserves the order of recommendations.
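To illustrate the similarity at the heart of AsymKNN, here is a minimal, self-contained sketch of asymmetric cosine on binary feedback. This is only an illustration of the formula, not RecBole's vectorized implementation; the set-based profile representation is an assumption made for readability.

```python
def asymmetric_cosine(users_x, users_y, alpha=0.5):
    """Asymmetric cosine similarity between two binary interaction profiles.

    users_x, users_y: sets of user ids that interacted with item x / item y
    (swap the roles for a user-to-user similarity).
    With alpha = 0.5 this reduces to the standard cosine similarity.
    """
    overlap = len(users_x & users_y)
    if overlap == 0:
        return 0.0
    return overlap / (len(users_x) ** alpha * len(users_y) ** (1 - alpha))

# alpha = 0.5 recovers plain cosine: |X & Y| / sqrt(|X| * |Y|)
sim_cos = asymmetric_cosine({1, 2, 3, 4}, {3, 4}, alpha=0.5)
# alpha = 1.0 normalizes by |X| only, down-weighting popular target items
sim_asym = asymmetric_cosine({1, 2, 3, 4}, {3, 4}, alpha=1.0)
```

The ``beta`` and ``q`` hyper-parameters described below act on the scoring stage rather than on this similarity itself.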

Running with RecBole
-------------------------

**Model Hyper-Parameters:**

- ``k (int)`` : The neighborhood size. Defaults to ``100``.

- ``alpha (float)`` : Weight parameter for asymmetric cosine similarity. Defaults to ``0.5``.

- ``beta (float)`` : Parameter for controlling the balance between factors in the final score normalization. Defaults to ``1.0``.

- ``q (int)`` : The 'locality of scoring function' parameter. Defaults to ``1``.

**Additional Parameters:**

- ``knn_method (str)`` : Calculates user-to-user similarities if set to ``'user'``; otherwise calculates item-to-item similarities. Defaults to ``item``.


**A Running Example:**

Write the following code to a Python file, such as `run.py`:

.. code:: python

from recbole.quick_start import run_recbole

run_recbole(model='AsymKNN', dataset='ml-100k')

And then:

.. code:: bash

python run.py

Tuning Hyper Parameters
-------------------------

If you want to use ``HyperTuning`` to tune hyper parameters of this model, you can copy the following settings and name it as ``hyper.test``.

.. code:: bash

k choice [10,50,100,200,250,300,400,500,1000,1500,2000,2500]
alpha choice [0.0,0.2,0.5,0.8,1.0]
beta choice [0.0,0.2,0.5,0.8,1.0]
q choice [1,2,3,4,5,6]

Note that these hyper-parameter ranges are provided for reference only; we cannot guarantee that they contain the optimal values for this model.

Then, with the source code of RecBole (you can download it from GitHub), you can run ``run_hyper.py`` for tuning:

.. code:: bash

python run_hyper.py --model=[model_name] --dataset=[dataset_name] --config_files=[config_files_path] --params_file=hyper.test

For more details about Parameter Tuning, refer to :doc:`../../../user_guide/usage/parameter_tuning`.

If you want to change parameters, dataset or evaluation settings, take a look at

- :doc:`../../../user_guide/config_settings`
- :doc:`../../../user_guide/data_intro`
- :doc:`../../../user_guide/train_eval_intro`
- :doc:`../../../user_guide/usage`
1 change: 1 addition & 0 deletions docs/source/user_guide/model_intro.rst
@@ -13,6 +13,7 @@ task of top-n recommendation. All the collaborative filter(CF) based models are
.. toctree::
:maxdepth: 1

model/general/asymknn
model/general/pop
model/general/itemknn
model/general/bpr
3 changes: 2 additions & 1 deletion docs/source/user_guide/train_eval_intro.rst
@@ -42,6 +42,7 @@ items or a sampled-based ranking.
RO Random Ordering
TO Temporal Ordering
LS Leave-one-out Splitting
LK Leave-k-out Splitting
RS Ratio-based Splitting
full full ranking with all item candidates
uniN sample-based ranking: each positive item is paired with N sampled negative items in uniform distribution
@@ -54,7 +55,7 @@ The parameters used to control the evaluation method are as follows:
including ``split``, ``group_by``, ``order`` and ``mode``.

- ``split (dict)``: Control the splitting of dataset and the split ratio. The key is splitting method
and value is the list of split ratio. The range of key is ``[RS,LS]``. Defaults to ``{'RS':[0.8, 0.1, 0.1]}``
and value is the list of split ratio. The range of key is ``[RS,LS,LK]``. Defaults to ``{'RS':[0.8, 0.1, 0.1]}``
- ``group_by (str)``: Whether to split dataset with the group of user.
Range in ``[None, user]`` and defaults to ``user``.
- ``order (str)``: Control the ordering of data and affect the splitting of data.
72 changes: 72 additions & 0 deletions recbole/data/dataset/dataset.py
@@ -1729,6 +1729,74 @@ def leave_one_out(self, group_by, leave_one_mode):
next_ds = [self.copy(_) for _ in next_df]
return next_ds

def _split_index_by_leave_k_out(self, grouped_index, leave_k_num, k):
"""Split indexes by strategy leave k out.

Args:
grouped_index (list of list of int): Index to be split.
leave_k_num (int): Number of parts whose length is expected to be ``k``.
k (int): Number of interactions to leave out for each part.

Returns:
list: List of index that has been split.
"""
next_index = [[] for _ in range(leave_k_num + 1)]
for index in grouped_index:
index = list(index)
tot_cnt = len(index)
# shrink the number of held-out parts so that at least one
# interaction remains in the training split
legal_leave_k_num = min(leave_k_num, (tot_cnt - 1) // k)
pr = tot_cnt - legal_leave_k_num * k
next_index[0].extend(index[:pr])
for i in range(legal_leave_k_num):
# take disjoint chunks of k interactions from the tail
next_index[-legal_leave_k_num + i].extend(index[pr : pr + k])
pr += k
return next_index
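Outside of RecBole's class machinery, the intended index arithmetic can be sketched as a standalone helper (hypothetical names; a simplified illustration of the leave-k-out idea, which pads trailing parts with empty chunks when a user has too few interactions):

```python
def split_leave_k_out(index, parts, k):
    """Leave-k-out split of one user's ordered interaction indexes.

    Returns ``parts + 1`` lists: the training indexes first, then ``parts``
    held-out chunks of ``k`` interactions each, taken from the end of the
    sequence. At least one interaction is always kept for training.
    """
    total = len(index)
    # how many full chunks of size k can be held out
    legal_parts = min(parts, max((total - 1) // k, 0))
    cut = total - legal_parts * k
    splits = [index[:cut]]
    for i in range(legal_parts):
        splits.append(index[cut + i * k : cut + (i + 1) * k])
    # pad with empty chunks if the user could not fill every part
    splits += [[] for _ in range(parts - legal_parts)]
    return splits

train, valid, test = split_leave_k_out(list(range(10)), parts=2, k=2)
# train = [0, 1, 2, 3, 4, 5], valid = [6, 7], test = [8, 9]
```

The held-out chunks are disjoint slices of the tail, so no interaction ever appears in more than one split.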

def leave_k_out(self, group_by, leave_k_mode, k):
"""Split interaction records by leave k out strategy.

Args:
group_by (str): Field name that interaction records should grouped by before splitting.
leave_k_mode (str): The way to leave k out. It can only take three values:
'valid_and_test', 'valid_only' and 'test_only'.
k (int): Number of interactions to leave out for each held-out split.

Returns:
list: List of :class:`~Dataset`, whose interaction features has been split.
"""
self.logger.debug(
f"leave k out, group_by=[{group_by}], leave_k_mode=[{leave_k_mode}]"
)
if group_by is None:
raise ValueError("leave k out strategy requires a group field")

grouped_inter_feat_index = self._grouped_index(
self.inter_feat[group_by].numpy()
)
if leave_k_mode == "valid_and_test":
next_index = self._split_index_by_leave_k_out(
grouped_inter_feat_index, leave_k_num=2, k=k
)
elif leave_k_mode == "valid_only":
next_index = self._split_index_by_leave_k_out(
grouped_inter_feat_index, leave_k_num=1, k=k
)
next_index.append([])
elif leave_k_mode == "test_only":
next_index = self._split_index_by_leave_k_out(
grouped_inter_feat_index, leave_k_num=1, k=k
)
next_index = [next_index[0], [], next_index[1]]
else:
raise NotImplementedError(
f"The leave_k_mode [{leave_k_mode}] has not been implemented."
)

self._drop_unused_col()
next_df = [self.inter_feat[index] for index in next_index]
next_ds = [self.copy(_) for _ in next_df]
return next_ds

def shuffle(self):
"""Shuffle the interaction records inplace."""
self.inter_feat.shuffle()
@@ -1799,6 +1867,10 @@ def build(self):
datasets = self.leave_one_out(
group_by=self.uid_field, leave_one_mode=split_args["LS"]
)
elif split_mode == "LK":
datasets = self.leave_k_out(
group_by=self.uid_field, leave_k_mode=split_args["LK"][0], k=split_args["LK"][1]
)
else:
raise NotImplementedError(
f"The splitting_method [{split_mode}] has not been implemented."
1 change: 1 addition & 0 deletions recbole/model/general_recommender/__init__.py
@@ -1,3 +1,4 @@
from recbole.model.general_recommender.asymknn import AsymKNN
from recbole.model.general_recommender.bpr import BPR
from recbole.model.general_recommender.cdae import CDAE
from recbole.model.general_recommender.convncf import ConvNCF