Commit

(fix): fixed inconsistencies in the code and removed debug statements
Signed-off-by: Kannav02 <[email protected]>
Kannav02 committed Nov 20, 2024
1 parent afd2259 commit e4ccf8d
Showing 1 changed file with 5 additions and 7 deletions.
common/mongoClient.py (12 changes: 5 additions, 7 deletions)
@@ -9,15 +9,12 @@
feedback_collection_name = "feedback"
context_collection_name = "context"

-print(os.getenv("MONGO_DB_URI"))


-def get_mongo_client() -> Database:
+def get_mongo_db_client() -> Database:
"""
Get the MongoDB client.
Returns:
-- MongoClient: The MongoDB client.
+- DatabaseClient: The Database client.
Note:
MongoDB doesn't create a collection or a database until it gets content, so no need to check if the data already exists or not.
@@ -149,5 +146,6 @@ def create_collection(collection_name:str,client_database:Database,validator:obj
except Exception as e:
print(f"Failed to create collection: {e}")
return None

-submit_feedback()

+if __name__ == "__main__":
+    submit_feedback()
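The removed `print(os.getenv("MONGO_DB_URI"))` ran at module level, so it printed the full connection string whenever the module was loaded. MongoDB connection strings can carry credentials (`mongodb://user:password@host/...`), so if the URI ever has to appear in logs, masking the password first is the safer option. A minimal sketch using only the standard library; `redact_mongo_uri` is a hypothetical helper, not part of this repository:

```python
from urllib.parse import urlsplit


def redact_mongo_uri(uri: str) -> str:
    """Return `uri` with the password in its userinfo part replaced by '****'.

    Hypothetical helper for illustration; not part of common/mongoClient.py.
    """
    parts = urlsplit(uri)
    # netloc looks like "user:password@host1:27017,host2:27017" when credentials exist.
    userinfo, sep, hosts = parts.netloc.rpartition("@")
    if not sep:
        # No credentials in the URI, nothing to mask.
        return uri
    user, colon, _password = userinfo.partition(":")
    masked = f"{user}:****" if colon else user
    return parts._replace(netloc=f"{masked}@{hosts}").geturl()
```

For example, `print(redact_mongo_uri(os.getenv("MONGO_DB_URI", "")))` would show the scheme, hosts and database while hiding the password. Only the userinfo part is masked; anything sensitive in the query string is left as-is.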

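The second hunk replaces a bare module-level `submit_feedback()` call with an `if __name__ == "__main__":` guard. Python executes module-level statements on import, so before this change any module that imported this file, for example just to use `get_mongo_db_client`, would also trigger `submit_feedback()`. With the guard, the call happens only when the file is run directly as a script. The sketch below shows the difference with a throwaway module; the `calls` list is a hypothetical stand-in for whatever side effect `submit_feedback()` really has.

```python
import importlib
import sys
import tempfile
from pathlib import Path

BODY = (
    "calls = []\n"
    "def submit_feedback():\n"
    "    calls.append('ran')\n"
)
# Old layout: the entry point is called at module level.
UNGUARDED = BODY + "submit_feedback()\n"
# New layout: the call only happens when the file is executed as a script.
GUARDED = BODY + 'if __name__ == "__main__":\n    submit_feedback()\n'


def calls_on_import(source: str, module_name: str) -> list:
    """Write `source` to a temporary module, import it, and return its `calls` list."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, f"{module_name}.py").write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(module_name)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
    return module.calls


print(calls_on_import(UNGUARDED, "demo_unguarded"))  # ['ran']
print(calls_on_import(GUARDED, "demo_guarded"))      # []
```

On import, `__name__` is the module's name rather than `"__main__"`, so the guarded version defines `submit_feedback` without calling it.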
2 comments on commit e4ccf8d

@luarss (Collaborator) commented on e4ccf8d Nov 20, 2024

===================================
==> Dataset: EDA Corpus
==> Running tests for agent-retriever
/home/luarss/actions-runner/_work/ORAssistant/ORAssistant/evaluation/.venv/lib/python3.12/site-packages/deepeval/__init__.py:49: UserWarning: You are using deepeval version 1.4.9, however version 1.5.8 is available. You should consider upgrading via the "pip install --upgrade deepeval" command.
warnings.warn(

Fetching 2 files: 100%|██████████| 2/2 [00:00<00:00, 8.31it/s]

Evaluating: 100%|██████████| 100/100 [18:29<00:00, 11.09s/it]
✨ You're running DeepEval's latest Contextual Precision Metric! (using
gemini-1.5-pro-002, strict=False, async_mode=True)...
✨ You're running DeepEval's latest Contextual Recall Metric! (using
gemini-1.5-pro-002, strict=False, async_mode=True)...
✨ You're running DeepEval's latest Hallucination Metric! (using
gemini-1.5-pro-002, strict=False, async_mode=True)...

Evaluating 100 test case(s) in parallel: |██████████|100% (100/100) [Time Taken: 01:02, 1.61test case/s]
✓ Tests finished 🎉! Run 'deepeval login' to save and analyze evaluation results
on Confident AI.
‼️ Friendly reminder 😇: You can also run evaluations with ALL of deepeval's
metrics directly on Confident AI instead.
Average Metric Scores:
Contextual Precision 0.7099702380952381
Contextual Recall 0.8604999999999999
Hallucination 0.47328030303030305
Metric Passrates:
Contextual Precision 0.68
Contextual Recall 0.83
Hallucination 0.59
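Each metric is reported twice: as an average score over the 100 test cases and as a pass rate. A minimal sketch of how the two relate, assuming each test case passes when its score meets a per-metric threshold. deepeval's documented default threshold is 0.5, and for its Hallucination metric lower scores are better, so a case passes when its score is at or below the threshold. The thresholds actually used in this run are not shown in the log, and the scores below are made up.

```python
from statistics import mean


def summarize(scores: list[float], threshold: float = 0.5,
              lower_is_better: bool = False) -> tuple[float, float]:
    """Return (average score, pass rate) for one metric over a set of test cases.

    Assumption: a case passes when score >= threshold, or score <= threshold
    for metrics where lower is better (e.g. a hallucination score).
    """
    if lower_is_better:
        passed = sum(score <= threshold for score in scores)
    else:
        passed = sum(score >= threshold for score in scores)
    return mean(scores), passed / len(scores)


# Made-up scores for five test cases, for illustration only.
toy_scores = [0.0, 0.25, 0.75, 1.0, 1.0]
print(summarize(toy_scores))                        # (0.6, 0.6)
print(summarize(toy_scores, lower_is_better=True))  # (0.6, 0.4)
```

The same scores give different pass rates depending on the direction, so the Hallucination average and pass rate are best read with "lower is better" in mind.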

@luarss (Collaborator) commented on e4ccf8d Nov 20, 2024

===================================
==> Dataset: EDA Corpus
==> Running tests for agent-retriever
/home/luarss/actions-runner/_work/ORAssistant/ORAssistant/evaluation/.venv/lib/python3.12/site-packages/deepeval/__init__.py:49: UserWarning: You are using deepeval version 1.4.9, however version 1.5.8 is available. You should consider upgrading via the "pip install --upgrade deepeval" command.
warnings.warn(

Fetching 2 files: 100%|██████████| 2/2 [00:00<00:00, 8.37it/s]

Evaluating: 100%|██████████| 100/100 [18:57<00:00, 11.37s/it]
✨ You're running DeepEval's latest Contextual Precision Metric! (using
gemini-1.5-pro-002, strict=False, async_mode=True)...
✨ You're running DeepEval's latest Contextual Recall Metric! (using
gemini-1.5-pro-002, strict=False, async_mode=True)...
✨ You're running DeepEval's latest Hallucination Metric! (using
gemini-1.5-pro-002, strict=False, async_mode=True)...

Evaluating 100 test case(s) in parallel: |██████████|100% (100/100) [Time Taken: 00:29, 3.39test case/s]
✓ Tests finished 🎉! Run 'deepeval login' to save and analyze evaluation results
on Confident AI.
‼️ Friendly reminder 😇: You can also run evaluations with ALL of deepeval's
metrics directly on Confident AI instead.
Average Metric Scores:
Contextual Precision 0.7293988095238094
Contextual Recall 0.858
Hallucination 0.5123170995670996
Metric Passrates:
Contextual Precision 0.68
Contextual Recall 0.81
Hallucination 0.6
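The two comments report separate runs of the same agent-retriever evaluation on the EDA Corpus dataset, with the same deepeval version and `gemini-1.5-pro-002` as the judge model. To compare them, a small sketch that parses the two summary blocks (copied verbatim from the logs above) and prints the per-metric differences; `parse_summary` and `diff_runs` are hypothetical helpers, not deepeval APIs.

```python
import re

RUN_1 = """\
Average Metric Scores:
Contextual Precision 0.7099702380952381
Contextual Recall 0.8604999999999999
Hallucination 0.47328030303030305
Metric Passrates:
Contextual Precision 0.68
Contextual Recall 0.83
Hallucination 0.59
"""

RUN_2 = """\
Average Metric Scores:
Contextual Precision 0.7293988095238094
Contextual Recall 0.858
Hallucination 0.5123170995670996
Metric Passrates:
Contextual Precision 0.68
Contextual Recall 0.81
Hallucination 0.6
"""

# "<metric name> <number>", e.g. "Contextual Recall 0.858".
METRIC_LINE = re.compile(r"^(?P<name>[A-Za-z ]+?) (?P<value>[0-9.]+)$")


def parse_summary(text: str) -> dict[str, dict[str, float]]:
    """Group metric lines under the most recent 'Heading:' line."""
    sections: dict[str, dict[str, float]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(":"):
            current = sections.setdefault(line[:-1], {})
        elif current is not None and (match := METRIC_LINE.match(line)):
            current[match["name"]] = float(match["value"])
    return sections


def diff_runs(before: dict, after: dict) -> dict[str, dict[str, float]]:
    """Return after - before for every metric present in both runs."""
    return {
        section: {
            name: after[section][name] - value
            for name, value in metrics.items()
            if name in after.get(section, {})
        }
        for section, metrics in before.items()
    }


for section, deltas in diff_runs(parse_summary(RUN_1), parse_summary(RUN_2)).items():
    print(section)
    for name, delta in deltas.items():
        print(f"  {name:<22}{delta:+.4f}")
```

On these two summaries, the second run's Contextual Precision average is about 0.019 higher and its Hallucination average about 0.039 higher (worse, since lower is better), while every pass rate is within 0.02 of the first run. The log alone does not show whether gaps this small exceed the normal run-to-run variation of an LLM-judged evaluation.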
