[Bug] Nightly test_approximate_nearest_neighbors.py::test_ivfflat FAILED #827

Closed
yinqingh opened this issue Jan 24, 2025 · 1 comment

Comments

@yinqingh

Failed job: spark-rapids-ml_nightly/612
Failed case:

FAILED tests/test_approximate_nearest_neighbors.py::test_ivfflat[float32-combo0] - assert (0.5233380000000001 > 0.5335820000000001 or 0.01024400000000003 <= 0.01)

Detailed log:

[2025-01-24T04:59:48.166Z] =================================== FAILURES ===================================
[2025-01-24T04:59:48.166Z] _________________________ test_ivfflat[float32-combo0] _________________________
[2025-01-24T04:59:48.166Z] [gw0] linux -- Python 3.10.16 /root/miniconda3/bin/python3.10
[2025-01-24T04:59:48.166Z] 
[2025-01-24T04:59:48.166Z] combo = ('ivfflat', 'array', 10000, None, 'euclidean')
[2025-01-24T04:59:48.166Z] data_type = <class 'numpy.float32'>
[2025-01-24T04:59:48.166Z] 
[2025-01-24T04:59:48.166Z]     @pytest.mark.parametrize(
[2025-01-24T04:59:48.166Z]         "combo",
[2025-01-24T04:59:48.166Z]         [
[2025-01-24T04:59:48.166Z]             (
[2025-01-24T04:59:48.166Z]                 "ivfflat",
[2025-01-24T04:59:48.166Z]                 "array",
[2025-01-24T04:59:48.166Z]                 10000,
[2025-01-24T04:59:48.166Z]                 None,
[2025-01-24T04:59:48.166Z]                 "euclidean",
[2025-01-24T04:59:48.166Z]             ),
[2025-01-24T04:59:48.166Z]             (
[2025-01-24T04:59:48.166Z]                 "ivfflat",
[2025-01-24T04:59:48.166Z]                 "vector",
[2025-01-24T04:59:48.166Z]                 2000,
[2025-01-24T04:59:48.166Z]                 {"nlist": 10, "nprobe": 2},
[2025-01-24T04:59:48.166Z]                 "euclidean",
[2025-01-24T04:59:48.166Z]             ),
[2025-01-24T04:59:48.166Z]             (
[2025-01-24T04:59:48.166Z]                 "ivfflat",
[2025-01-24T04:59:48.166Z]                 "multi_cols",
[2025-01-24T04:59:48.166Z]                 5000,
[2025-01-24T04:59:48.166Z]                 {"nlist": 20, "nprobe": 4},
[2025-01-24T04:59:48.166Z]                 "euclidean",
[2025-01-24T04:59:48.166Z]             ),
[2025-01-24T04:59:48.166Z]             (
[2025-01-24T04:59:48.166Z]                 "ivfflat",
[2025-01-24T04:59:48.166Z]                 "array",
[2025-01-24T04:59:48.166Z]                 2000,
[2025-01-24T04:59:48.166Z]                 {"nlist": 10, "nprobe": 2},
[2025-01-24T04:59:48.166Z]                 "sqeuclidean",
[2025-01-24T04:59:48.166Z]             ),
[2025-01-24T04:59:48.166Z]             ("ivfflat", "vector", 5000, {"nlist": 20, "nprobe": 4}, "l2"),
[2025-01-24T04:59:48.166Z]             (
[2025-01-24T04:59:48.166Z]                 "ivfflat",
[2025-01-24T04:59:48.166Z]                 "multi_cols",
[2025-01-24T04:59:48.166Z]                 2000,
[2025-01-24T04:59:48.166Z]                 {"nlist": 10, "nprobe": 2},
[2025-01-24T04:59:48.166Z]                 "inner_product",
[2025-01-24T04:59:48.166Z]             ),
[2025-01-24T04:59:48.166Z]             (
[2025-01-24T04:59:48.166Z]                 "ivfflat",
[2025-01-24T04:59:48.166Z]                 "array",
[2025-01-24T04:59:48.166Z]                 2000,
[2025-01-24T04:59:48.166Z]                 {"nlist": 10, "nprobe": 2},
[2025-01-24T04:59:48.166Z]                 "cosine",
[2025-01-24T04:59:48.166Z]             ),
[2025-01-24T04:59:48.166Z]         ],
[2025-01-24T04:59:48.166Z]     )  # vector feature type will be converted to float32 to be compatible with cuml single-GPU NearestNeighbors Class
[2025-01-24T04:59:48.166Z]     @pytest.mark.parametrize("data_type", [np.float32])
[2025-01-24T04:59:48.166Z]     def test_ivfflat(
[2025-01-24T04:59:48.166Z]         combo: Tuple[str, str, int, Optional[Dict[str, Any]], str],
[2025-01-24T04:59:48.166Z]         data_type: np.dtype,
[2025-01-24T04:59:48.166Z]     ) -> None:
[2025-01-24T04:59:48.166Z]         algoParams = combo[3]
[2025-01-24T04:59:48.166Z]     
[2025-01-24T04:59:48.166Z]         # cuvs ivf_flat None sets nlist to 1000 and nprobe to 20, leading to unstable results when run multiple times
[2025-01-24T04:59:48.166Z]         expected_avg_recall: float = 0.95 if algoParams != None else 0.5
[2025-01-24T04:59:48.166Z]         expected_avg_dist_gap: float = 1e-4 if algoParams != None else 1e-2
[2025-01-24T04:59:48.166Z]         tolerance: float = 1e-4 if algoParams != None else 1e-2
[2025-01-24T04:59:48.166Z]         data_shape: Tuple[int, int] = (10000, 50)
[2025-01-24T04:59:48.166Z] >       ann_algorithm_test_func(
[2025-01-24T04:59:48.166Z]             combo=combo,
[2025-01-24T04:59:48.166Z]             data_shape=data_shape,
[2025-01-24T04:59:48.166Z]             data_type=data_type,
[2025-01-24T04:59:48.166Z]             expected_avg_recall=expected_avg_recall,
[2025-01-24T04:59:48.166Z]             expected_avg_dist_gap=expected_avg_dist_gap,
[2025-01-24T04:59:48.166Z]             tolerance=tolerance,
[2025-01-24T04:59:48.166Z]         )
[2025-01-24T04:59:48.166Z] 
[2025-01-24T04:59:48.166Z] tests/test_approximate_nearest_neighbors.py:632: 
[2025-01-24T04:59:48.166Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2025-01-24T04:59:48.166Z] tests/test_approximate_nearest_neighbors.py:506: in ann_algorithm_test_func
[2025-01-24T04:59:48.166Z]     ann_evaluator.compare_with_cuml_or_cuvs_sg(
[2025-01-24T04:59:48.166Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2025-01-24T04:59:48.166Z] 
[2025-01-24T04:59:48.166Z] self = <tests.test_approximate_nearest_neighbors.ANNEvaluator object at 0x7ff92393b4c0>
[2025-01-24T04:59:48.166Z] algorithm = 'ivfflat', algoParams = None
[2025-01-24T04:59:48.166Z] given_indices = array([[   0, 4709, 9361, ..., 3312, 7312, 5266],
[2025-01-24T04:59:48.166Z]        [   1, 8804, 1531, ..., 7962,  705, 2092],
[2025-01-24T04:59:48.166Z]        [   2, 5018...5482, 6680, 9051],
[2025-01-24T04:59:48.166Z]        [9998, 1102, 9694, ..., 1317, 2800,   17],
[2025-01-24T04:59:48.166Z]        [9999, 6308, 7655, ..., 8746, 3210, 8370]])
[2025-01-24T04:59:48.166Z] given_distances = array([[0.        , 0.17746431, 0.17917138, ..., 0.20973857, 0.21014556,
[2025-01-24T04:59:48.166Z]         0.21043234],
[2025-01-24T04:59:48.166Z]        [0.        , 0.17...84,
[2025-01-24T04:59:48.166Z]         0.22275288],
[2025-01-24T04:59:48.166Z]        [0.        , 0.15567206, 0.17591041, ..., 0.20936921, 0.20940953,
[2025-01-24T04:59:48.166Z]         0.20962319]])
[2025-01-24T04:59:48.166Z] tolerance = 0.01
[2025-01-24T04:59:48.166Z] 
[2025-01-24T04:59:48.166Z]     def compare_with_cuml_or_cuvs_sg(
[2025-01-24T04:59:48.166Z]         self,
[2025-01-24T04:59:48.166Z]         algorithm: str,
[2025-01-24T04:59:48.166Z]         algoParams: Optional[Dict[str, Any]],
[2025-01-24T04:59:48.166Z]         given_indices: np.ndarray,
[2025-01-24T04:59:48.166Z]         given_distances: np.ndarray,
[2025-01-24T04:59:48.166Z]         tolerance: float,
[2025-01-24T04:59:48.166Z]     ) -> None:
[2025-01-24T04:59:48.166Z]         # compare with cuml sg ANN on avg_recall and avg_dist_gap
[2025-01-24T04:59:48.166Z]         cuvssg_distances, cuvssg_indices = self.get_cuvs_sg_results(
[2025-01-24T04:59:48.166Z]             algorithm=algorithm, algoParams=algoParams
[2025-01-24T04:59:48.166Z]         )
[2025-01-24T04:59:48.166Z]     
[2025-01-24T04:59:48.166Z]         # compare cuml sg with given results
[2025-01-24T04:59:48.166Z]         avg_recall_cumlann = self.cal_avg_recall(cuvssg_indices)
[2025-01-24T04:59:48.166Z]         avg_recall = self.cal_avg_recall(given_indices)
[2025-01-24T04:59:48.166Z] >       assert (avg_recall > avg_recall_cumlann) or abs(
[2025-01-24T04:59:48.166Z]             avg_recall - avg_recall_cumlann
[2025-01-24T04:59:48.166Z]         ) <= tolerance
[2025-01-24T04:59:48.166Z] E       assert (0.5233380000000001 > 0.5335820000000001 or 0.01024400000000003 <= 0.01)
[2025-01-24T04:59:48.166Z] E        +  where 0.01024400000000003 = abs((0.5233380000000001 - 0.5335820000000001))
[2025-01-24T04:59:48.166Z] 
[2025-01-24T04:59:48.166Z] tests/test_approximate_nearest_neighbors.py:308: AssertionError
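
For readers skimming the log, the failing check can be reproduced in isolation. The sketch below is illustrative only: `recall_within_tolerance` is a hypothetical helper name, not a function in the test suite, and the numbers are copied from the assertion message above. It shows that the recall gap (about 0.010244) is just over the 0.01 tolerance used when `algoParams` is `None`.

```python
def recall_within_tolerance(
    avg_recall: float, reference_recall: float, tolerance: float
) -> bool:
    """Mirror the assertion in the log (hypothetical helper, not from the repo).

    Passes when the tested recall beats the single-GPU reference recall,
    or trails it by no more than `tolerance`.
    """
    return (avg_recall > reference_recall) or (
        abs(avg_recall - reference_recall) <= tolerance
    )


# Values taken verbatim from the assertion message in the log.
avg_recall = 0.5233380000000001          # recall of the results under test
avg_recall_cumlann = 0.5335820000000001  # recall of the single-GPU reference
tolerance = 1e-2                         # tolerance used when algoParams is None

gap = abs(avg_recall - avg_recall_cumlann)
passed = recall_within_tolerance(avg_recall, avg_recall_cumlann, tolerance)
print(f"gap={gap:.6f} tolerance={tolerance} passed={passed}")
```

Because both runs use the default index parameters, which the test comment describes as unstable across runs, a gap this close to the threshold is consistent with run-to-run variation rather than a large regression; the log alone does not confirm the cause.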
@lijinf2
Collaborator

lijinf2 commented Jan 28, 2025

The fix has been merged in #828.
The nightly test now passes.

@lijinf2 lijinf2 closed this as completed Jan 28, 2025