Paig evaluation paig UI support #204

Merged
merged 70 commits into from
Feb 27, 2025
70 commits
a7f831f
evaluation UI added for promptfoo
adityasinght Dec 27, 2024
3ca9f45
backend api code added
adityasinght Dec 30, 2024
6bcd279
paig_evaluation_oss_ui_support: Integrated a stepper component in CEv…
stalinnadar07 Jan 2, 2025
639cefc
paig_evaluation backend database support added. UI fixed.
adityasinght Jan 8, 2025
6094caa
rerun functionality added
adityasinght Jan 10, 2025
1632e83
multiple target support added for evaluation
adityasinght Jan 10, 2025
7199dea
paig evaluation revamp
Jan 31, 2025
b9c106a
eval target logic updated
Feb 5, 2025
1d1aa8b
web ui revamp #1
Feb 5, 2025
29e4bba
query fixed for get target list
Feb 10, 2025
0a23251
search fixed for eval targets
Feb 10, 2025
877a341
purpose component added
Feb 10, 2025
fe9f336
code added to get target by id
Feb 10, 2025
7f5f4fd
eval target new api added and issues fixed
Feb 10, 2025
7e93a13
added categories form and purpose form
Feb 10, 2025
7aa9ba3
file corrected
Feb 10, 2025
f671721
paig_evaluation_oss_ui_support: Added details page configuration form…
stalinnadar07 Feb 11, 2025
f67eb3a
updated table
Feb 12, 2025
dc5d310
updated table
Feb 12, 2025
1e08194
paig_evaluation_oss_ui_support: Code clean up and Purpose form update
stalinnadar07 Feb 12, 2025
025b9b0
paig_evaluation UI and backend supported revamped
stalinnadar07 Feb 12, 2025
70f93f6
paig_evaluation_oss_ui_support: Added fixes for card content and also…
stalinnadar07 Feb 18, 2025
d5a6349
paig_evaluation_oss_ui_support: Added timestamp(localtimezone) in bo…
stalinnadar07 Feb 18, 2025
79878e4
paig_evaluation_oss_ui_support: converted default report name to ddmm…
stalinnadar07 Feb 18, 2025
732a316
evaluation results new tables added. Rerun fixed
Feb 18, 2025
b56759a
paig_evaluation_oss_ui_support: Added refresh list after re run save
stalinnadar07 Feb 18, 2025
aa00d6f
paig_evaluation_oss_ui_support: Updated autogenerated string for Conf…
stalinnadar07 Feb 18, 2025
10bcd16
cumulative result api added
Feb 18, 2025
f2d1261
support for eval result details added
Feb 19, 2025
19dd5bb
category table added
Feb 19, 2025
3e75466
check added to avoid duplication in target names
Feb 19, 2025
3567aa5
paig_evaluation_oss_ui_support: Added initial report support for eval…
stalinnadar07 Feb 19, 2025
0af1d97
Added support for category and prompt searching
Feb 20, 2025
1c1474b
cascading deletion added when eval run is deleted
Feb 20, 2025
af4c8b5
paig_evaluation_oss_ui_support: Added graphs and table for Evaluation…
stalinnadar07 Feb 20, 2025
ba96334
paig_evaluation_oss_ui_support: Added graphs and table for Evaluation…
stalinnadar07 Feb 20, 2025
e70d10f
paig_evaluation_oss_ui_support: Added graphs and table for Evaluation…
stalinnadar07 Feb 20, 2025
4b7ce1f
added error reason in eval reports view
Feb 21, 2025
3c8bc8b
review comments resolved
Feb 21, 2025
0c27367
paig_evaluation_oss_ui_support: Added Feedback changes
stalinnadar07 Feb 21, 2025
f3b20bc
Merge pull request #192 from adityasinght/paig_evaluation_oss_ui_support
adityasinght Feb 21, 2025
0724b8f
unit test fixed, dependency added in CI action
Feb 21, 2025
9597938
Merge branch 'paig-evaluation-paig-ui-support' into paig_evaluation_o…
adityasinght Feb 21, 2025
3d9ebfa
Merge pull request #201 from adityasinght/paig_evaluation_oss_ui_support
adityasinght Feb 21, 2025
4253183
dependency added in toml for paig-server
Feb 21, 2025
0b2e7f3
Merge pull request #202 from adityasinght/paig_evaluation_oss_ui_support
adityasinght Feb 21, 2025
e6a3858
duplicate code removed
Feb 21, 2025
3847e53
duplicate code removed
Feb 21, 2025
932ad9b
paig_evaluation_oss_ui_support: Added tooltip for loading status
stalinnadar07 Feb 21, 2025
0136bd6
Merge pull request #203 from adityasinght/paig_evaluation_oss_ui_support
adityasinght Feb 24, 2025
a21fb73
headers removed if empty
Feb 24, 2025
e9ddaf4
Merge pull request #205 from adityasinght/paig_evaluation_oss_ui_support
adityasinght Feb 24, 2025
9e09ce1
review comments resolved
Feb 25, 2025
daf8527
Merge pull request #208 from adityasinght/paig_evaluation_oss_ui_support
adityasinght Feb 25, 2025
b48153d
paig_evaluation_oss_ui_support: Added Feedback changes
stalinnadar07 Feb 26, 2025
5948aed
Merge pull request #210 from adityasinght/paig_evaluation_oss_ui_support
adityasinght Feb 26, 2025
4c2e7d6
paig_evaluation_oss_ui_support: Added UI Feedback changes for Eval pages
stalinnadar07 Feb 26, 2025
5cd93fc
paig_evaluation_oss_ui_support: Added UI Feedback changes for Eval pages
stalinnadar07 Feb 26, 2025
80eb539
Merge pull request #213 from adityasinght/paig_evaluation_oss_ui_support
adityasinght Feb 26, 2025
595b0de
paig_evaluation_oss_ui_support: Code clean up
stalinnadar07 Feb 26, 2025
4d8ec15
unit tests fixed and email changed
Feb 26, 2025
86eb3f3
Merge pull request #215 from adityasinght/paig_evaluation_oss_ui_support
adityasinght Feb 26, 2025
903129c
paig_evaluation_oss_ui_support: Code clean up
stalinnadar07 Feb 27, 2025
4c9ff30
Merge pull request #218 from adityasinght/paig_evaluation_oss_ui_support
adityasinght Feb 27, 2025
25d2c2c
paig_evaluation_oss_ui_support: Code clean up
stalinnadar07 Feb 27, 2025
d447b52
Merge pull request #219 from adityasinght/paig_evaluation_oss_ui_support
adityasinght Feb 27, 2025
4ff148b
paig_evaluation_oss_ui_support: Code clean up
stalinnadar07 Feb 27, 2025
3a61739
Merge pull request #220 from adityasinght/paig_evaluation_oss_ui_support
adityasinght Feb 27, 2025
2929361
paig_evaluation_oss_ui_support: Added Feedback changes
stalinnadar07 Feb 27, 2025
140610d
Merge pull request #221 from adityasinght/paig_evaluation_oss_ui_support
adityasinght Feb 27, 2025
10 changes: 9 additions & 1 deletion .github/workflows/paig-server-ci.yml
@@ -50,14 +50,22 @@ jobs:
          python3 -m build -w
          pip install dist/*.whl
          cd ..

      - name: Build and Install paig-authorizer-core wheel
        run: |
          . venv/bin/activate
          cd paig-authorizer-core
          python3 -m build -w
          pip install dist/*.whl
          cd ..

      - name: Build and Install paig-evaluation wheel
        run: |
          . venv/bin/activate
          cd paig-evaluation
          python3 -m build -w
          pip install dist/*.whl
          cd ..

      - name: Install PAIG-Server dependencies
        run: |
2 changes: 1 addition & 1 deletion paig-server/backend/paig/__main__.py
@@ -66,7 +66,7 @@ def main(
     action: str,
 ) -> None:
 
-    if action.lower() == "stop":
+    if action and action.lower() == "stop":
         stop_server()
         return
     elif action.lower() == "status":
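Review note: the guard added above covers only the "stop" branch. In the lines shown, the `elif action.lower() == "status"` branch still calls `.lower()` directly, so an `action` of `None` would skip the first branch and raise `AttributeError` on the second. One way to avoid repeating the guard is to normalize the value once. The sketch below is hypothetical: `resolve_action`, `dispatch`, the returned strings, and the `"run"` fallback are illustrative names and assumptions, not code from this PR.

```python
from typing import Optional


def resolve_action(action: Optional[str]) -> str:
    """Normalize the CLI action once; a missing action falls back to 'run' (assumed default)."""
    return (action or "run").strip().lower()


def dispatch(action: Optional[str]) -> str:
    """Pick a branch from the normalized action; return values are placeholders."""
    normalized = resolve_action(action)
    if normalized == "stop":
        return "stopping"
    if normalized == "status":
        return "reporting status"
    return "starting"


result_none = dispatch(None)
result_stop = dispatch("STOP")
result_status = dispatch(" status ")
```

With this shape every branch compares against the same normalized string, so no branch can be reached with a `None` value.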
1 change: 1 addition & 0 deletions paig-server/backend/paig/alembic_db/env.py
@@ -31,6 +31,7 @@
from api.user.database.db_models import user_model, groups_model
from api.audit.RDS_service.db_models import access_audit_model
from api.encryption.database.db_models import encryption_master_key_model, encryption_key_model
from api.evaluation.database.db_models import eval_model, eval_targets, eval_config
from core.db_session.session import Base
target_metadata = Base.metadata
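
Review note: the new import matters because `target_metadata = Base.metadata` only knows about tables whose model classes have been defined, which happens when their modules are imported. Alembic autogenerate compares that metadata with the database, so a model module that is never imported is invisible to it. The sketch below is a stdlib-only analogy of that registration behavior, not SQLAlchemy itself; the `Base` class, its `registry` dict, and `known_tables` are stand-ins invented for illustration.

```python
from typing import Dict, List, Type


class Base:
    """Stand-in for a declarative base: subclasses register when they are defined."""

    registry: Dict[str, Type["Base"]] = {}
    __tablename__ = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs at class-definition time, i.e. when the defining module is imported.
        Base.registry[cls.__tablename__] = cls


def known_tables() -> List[str]:
    """Return the table names the base currently knows about."""
    return sorted(Base.registry)


tables_before = known_tables()


# Defining the class plays the role of importing the model module.
class EvalRun(Base):
    __tablename__ = "eval_run"


tables_after = known_tables()
```

Before the class is defined the registry is empty; afterwards it contains `eval_run`, which mirrors why the evaluation models have to be imported in `env.py`.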

@@ -0,0 +1,139 @@
"""Added evaluation tables

Revision ID: 701ddf55a1b4
Revises: a95b604c47fb
Create Date: 2025-02-20 12:12:01.526199

"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa
import core.db_models.utils


# revision identifiers, used by Alembic.
revision: str = '701ddf55a1b4'
down_revision: Union[str, None] = 'a95b604c47fb'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('eval_config',
    sa.Column('name', sa.String(length=255), nullable=True),
    sa.Column('purpose', sa.Text(), nullable=True),
    sa.Column('application_ids', sa.String(length=255), nullable=True),
    sa.Column('application_names', sa.Text(), nullable=True),
    sa.Column('categories', sa.Text(), nullable=True),
    sa.Column('custom_prompts', sa.Text(), nullable=True),
    sa.Column('version', sa.Integer(), nullable=False),
    sa.Column('owner', sa.String(length=255), nullable=True),
    sa.Column('id', sa.Integer(), autoincrement=True, nullable=False),
    sa.Column('create_time', sa.DateTime(), nullable=True),
    sa.Column('update_time', sa.DateTime(), nullable=True),
    sa.Column('status', sa.String(length=255), nullable=True),
    sa.PrimaryKeyConstraint('id')
    )
    op.create_index(op.f('ix_eval_config_id'), 'eval_config', ['id'], unique=False)
    op.create_table('eval_run',
    sa.Column('name', sa.String(length=255), nullable=False),
    sa.Column('owner', sa.String(length=255), nullable=False),
    sa.Column('purpose', sa.Text(), nullable=True),
    sa.Column('eval_id', sa.String(length=255), nullable=False),
    sa.Column('config_id', sa.String(length=255), nullable=False),
    sa.Column('config_name', sa.String(length=255), nullable=True),
    sa.Column('application_names', sa.Text(), nullable=True),
    sa.Column('cumulative_result', sa.Text(), nullable=True),
    sa.Column('passed', sa.String(length=255), nullable=True),
    sa.Column('failed', sa.String(length=255), nullable=True),
    sa.Column('base_run_id', sa.String(length=255), nullable=True),
    sa.Column('id', sa.Integer(), autoincrement=True, nullable=False),
    sa.Column('create_time', sa.DateTime(), nullable=True),
    sa.Column('update_time', sa.DateTime(), nullable=True),
    sa.Column('status', sa.String(length=255), nullable=True),
    sa.PrimaryKeyConstraint('id')
    )
    op.create_index(op.f('ix_eval_run_id'), 'eval_run', ['id'], unique=False)
    op.create_table('eval_config_history',
    sa.Column('name', sa.String(length=255), nullable=True),
    sa.Column('purpose', sa.Text(), nullable=True),
    sa.Column('application_ids', sa.String(length=255), nullable=True),
    sa.Column('application_names', sa.Text(), nullable=True),
    sa.Column('generated_config', sa.Text(), nullable=True),
    sa.Column('categories', sa.Text(), nullable=True),
    sa.Column('custom_prompts', sa.Text(), nullable=True),
    sa.Column('version', sa.Integer(), nullable=False),
    sa.Column('owner', sa.String(length=255), nullable=True),
    sa.Column('eval_config_id', sa.Integer(), nullable=False),
    sa.Column('id', sa.Integer(), autoincrement=True, nullable=False),
    sa.Column('create_time', sa.DateTime(), nullable=True),
    sa.Column('update_time', sa.DateTime(), nullable=True),
    sa.Column('status', sa.String(length=255), nullable=True),
    sa.ForeignKeyConstraint(['eval_config_id'], ['eval_config.id'], ),
    sa.PrimaryKeyConstraint('id')
    )
    op.create_index(op.f('ix_eval_config_history_id'), 'eval_config_history', ['id'], unique=False)
    op.create_table('eval_result_prompt',
    sa.Column('eval_run_id', sa.String(length=255), nullable=False),
    sa.Column('eval_id', sa.String(length=255), nullable=False),
    sa.Column('prompt_uuid', sa.String(length=255), nullable=False),
    sa.Column('prompt', sa.Text(), nullable=False),
    sa.Column('id', sa.Integer(), autoincrement=True, nullable=False),
    sa.Column('create_time', sa.DateTime(), nullable=True),
    sa.Column('update_time', sa.DateTime(), nullable=True),
    sa.Column('status', sa.String(length=255), nullable=True),
    sa.ForeignKeyConstraint(['eval_run_id'], ['eval_run.id'], ondelete='CASCADE'),
    sa.PrimaryKeyConstraint('id')
    )
    op.create_index(op.f('ix_eval_result_prompt_id'), 'eval_result_prompt', ['id'], unique=False)
    op.create_table('eval_target',
    sa.Column('application_id', sa.Integer(), nullable=True),
    sa.Column('config', sa.JSON(), nullable=False),
    sa.Column('name', sa.String(length=255), nullable=True),
    sa.Column('url', sa.Text(), nullable=True),
    sa.Column('id', sa.Integer(), autoincrement=True, nullable=False),
    sa.Column('create_time', sa.DateTime(), nullable=True),
    sa.Column('update_time', sa.DateTime(), nullable=True),
    sa.Column('status', sa.String(length=255), nullable=True),
    sa.ForeignKeyConstraint(['application_id'], ['ai_application.id'], ),
    sa.PrimaryKeyConstraint('id')
    )
    op.create_index(op.f('ix_eval_target_id'), 'eval_target', ['id'], unique=False)
    op.create_table('eval_result_response',
    sa.Column('eval_run_id', sa.String(length=255), nullable=False),
    sa.Column('eval_result_prompt_uuid', sa.String(length=255), nullable=False),
    sa.Column('eval_id', sa.String(length=255), nullable=False),
    sa.Column('response', sa.Text(), nullable=True),
    sa.Column('application_name', sa.String(length=255), nullable=False),
    sa.Column('failure_reason', sa.Text(), nullable=True),
    sa.Column('category_score', sa.Text(), nullable=True),
    sa.Column('category', sa.String(length=255), nullable=True),
    sa.Column('id', sa.Integer(), autoincrement=True, nullable=False),
    sa.Column('create_time', sa.DateTime(), nullable=True),
    sa.Column('update_time', sa.DateTime(), nullable=True),
    sa.Column('status', sa.String(length=255), nullable=True),
    sa.ForeignKeyConstraint(['eval_result_prompt_uuid'], ['eval_result_prompt.prompt_uuid'], ),
    sa.ForeignKeyConstraint(['eval_run_id'], ['eval_run.id'], ondelete='CASCADE'),
    sa.PrimaryKeyConstraint('id')
    )
    op.create_index(op.f('ix_eval_result_response_id'), 'eval_result_response', ['id'], unique=False)
    # ### end Alembic commands ###


def downgrade() -> None:
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_index(op.f('ix_eval_result_response_id'), table_name='eval_result_response')
    op.drop_table('eval_result_response')
    op.drop_index(op.f('ix_eval_target_id'), table_name='eval_target')
    op.drop_table('eval_target')
    op.drop_index(op.f('ix_eval_result_prompt_id'), table_name='eval_result_prompt')
    op.drop_table('eval_result_prompt')
    op.drop_index(op.f('ix_eval_config_history_id'), table_name='eval_config_history')
    op.drop_table('eval_config_history')
    op.drop_index(op.f('ix_eval_run_id'), table_name='eval_run')
    op.drop_table('eval_run')
    op.drop_index(op.f('ix_eval_config_id'), table_name='eval_config')
    op.drop_table('eval_config')
    # ### end Alembic commands ###
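
Review note: the `ondelete='CASCADE'` constraints on `eval_run_id` are what make deleting an eval run remove its prompt and response rows. The sketch below illustrates that behavior with the stdlib `sqlite3` module and a cut-down two-table schema. It is an illustration under assumptions, not the PR's schema: the migration declares `eval_run_id` as `String(255)` while `eval_run.id` is an `Integer`, whereas the sketch uses `INTEGER` on both sides, and the column set and `demo_cascade` helper are invented here.

```python
import sqlite3


def demo_cascade():
    """Insert one run with two prompts, delete the run, and count prompts before and after."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE eval_run (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL
        );
        CREATE TABLE eval_result_prompt (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            eval_run_id INTEGER NOT NULL REFERENCES eval_run(id) ON DELETE CASCADE,
            prompt TEXT NOT NULL
        );
        """
    )
    run_id = conn.execute(
        "INSERT INTO eval_run (name) VALUES (?)", ("run-1",)
    ).lastrowid
    conn.executemany(
        "INSERT INTO eval_result_prompt (eval_run_id, prompt) VALUES (?, ?)",
        [(run_id, "prompt one"), (run_id, "prompt two")],
    )

    def count_prompts() -> int:
        return conn.execute("SELECT COUNT(*) FROM eval_result_prompt").fetchone()[0]

    before = count_prompts()
    # Deleting the parent row removes the dependent prompt rows via the cascade.
    conn.execute("DELETE FROM eval_run WHERE id = ?", (run_id,))
    after = count_prompts()
    conn.close()
    return before, after


before_count, after_count = demo_cascade()
```

Whether a given production database enforces the cascade the same way depends on its foreign key settings and on the column types matching, so this is worth verifying against the real backend.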
Empty file.
@@ -0,0 +1,95 @@
from fastapi import Query
from pydantic import BaseModel, Field
from typing import List, Optional

from core.api_schemas.base_view import BaseView
from core.factory.database_initiator import BaseAPIFilter

class ConfigCommonModel(BaseModel):
    purpose: str = Field(..., description="The purpose of the config")
    name: str = Field(..., max_length=1024)
    # default_factory must be a zero-argument callable, so pass `list` rather than a list instance
    categories: List[str] = Field(default_factory=list, description="The categories of evaluation")
    custom_prompts: List[str] = Field(default_factory=list, description="Custom prompts for evaluation")


class ConfigCreateRequest(ConfigCommonModel):
    application_ids: str

class ConfigUpdateRequest(ConfigCommonModel):
    application_ids: str
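
Review note: for list-valued fields such as `categories` and `custom_prompts`, a `default_factory` is meant to receive a zero-argument callable (for example `list`), not a list instance, so that each model instance gets its own fresh default. The sketch below shows the same idea with the stdlib `dataclasses` module as a stand-in for Pydantic; `ConfigSketch` is a hypothetical class used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for the request model, limited to two fields."""

    name: str
    # The factory is called once per instance, so defaults are never shared.
    categories: List[str] = field(default_factory=list)


first = ConfigSketch("first")
second = ConfigSketch("second")
first.categories.append("pii")

# A list instance is not callable, which is why it cannot serve as a factory.
list_instance_is_callable = callable([])
list_type_is_callable = callable(list)
```

After the append, `second.categories` is still empty, which is the isolation a factory is there to provide.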


class EvalConfigFilter(BaseAPIFilter):
    """
    Filter class for evaluation config queries.

    Attributes:
        id (int, optional): Filter by ID.
        purpose (str, optional): Filter by purpose.
        name (str, optional): Filter by name.
    """

    id: Optional[int] = Field(default=None, description="Filter by id")
    purpose: Optional[str] = Field(default=None, description="Filter by purpose")
    name: Optional[str] = Field(default=None, description="Filter by name")

class EvalConfigView(BaseView):
    purpose: str = Field(..., description="The purpose of the config")
    name: str = Field(..., max_length=1024, description="The name of the config")
    categories: str = Field(..., description="The categories of evaluation")
    custom_prompts: str = Field(..., description="Custom prompts for evaluation")
    status: str = Field(..., max_length=1024, description="The status of the config")
    version: int = Field(..., gt=0, description="The version of the config")
    application_names: str = Field(..., description="The application names")
    eval_run_count: int = Field(..., ge=0, description="The number of evaluation runs")
    owner: Optional[str] = Field(None, description="The User Name", alias="owner")
    model_config = BaseView.model_config


class QueryParamsBase(BaseAPIFilter):
    purpose: Optional[str] = Field(None, description="purpose", alias="purpose")
    name: Optional[str] = Field(None, description="The Config name", alias="name")
    owner: Optional[str] = Field(None, description="The User ID", alias="owner")
    application_names: Optional[str] = Field(None, description="The Application name", alias="application_names")



class IncludeQueryParams(QueryParamsBase):
    pass

def include_query_params(
        include_query_application_names: Optional[str] = Query(None, alias="includeQuery.application_names"),
        include_query_purpose: Optional[str] = Query(None, alias="includeQuery.purpose"),
        include_query_name: Optional[str] = Query(None, alias="includeQuery.name"),
        include_query_owner: Optional[str] = Query(None, alias="includeQuery.owner"),
) -> IncludeQueryParams:
    return IncludeQueryParams(
        application_names=include_query_application_names,
        purpose=include_query_purpose,
        name=include_query_name,
        owner=include_query_owner
    )


def exclude_query_params(
        exclude_query_application_names: Optional[str] = Query(None, alias="excludeQuery.application_names"),
        exclude_query_purpose: Optional[str] = Query(None, alias="excludeQuery.purpose"),
        exclude_query_name: Optional[str] = Query(None, alias="excludeQuery.name"),
        exclude_query_owner: Optional[str] = Query(None, alias="excludeQuery.owner"),
) -> QueryParamsBase:
    return QueryParamsBase(
        application_names=exclude_query_application_names,
        purpose=exclude_query_purpose,
        name=exclude_query_name,
        owner=exclude_query_owner
    )


def extract_include_query_params(params):
    params_dict = params.model_dump(exclude=BaseAPIFilter.model_fields.keys(), by_alias=False, exclude_none=True)

    # Extract only the required fields
    filtered_params = {params.model_fields[field].alias: value for field, value in params_dict.items() if
                       value is not None}

    return filtered_params
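
Review note: taken together, the helpers above turn `includeQuery.*` query-string parameters into a params object and then reduce it to a dict of only the fields that were actually set. The sketch below walks through that flow with stdlib dataclasses instead of FastAPI and Pydantic. It is a simplified illustration under assumptions: `IncludeQuerySketch`, `parse_include_query`, and `extract_set_fields` are hypothetical names, and the raw query string is modeled as a plain dict.

```python
from dataclasses import asdict, dataclass, fields
from typing import Dict, Optional


@dataclass
class IncludeQuerySketch:
    """Hypothetical stand-in for IncludeQueryParams with the same four fields."""

    purpose: Optional[str] = None
    name: Optional[str] = None
    owner: Optional[str] = None
    application_names: Optional[str] = None


def parse_include_query(raw: Dict[str, str]) -> IncludeQuerySketch:
    """Pick out 'includeQuery.<field>' keys and map them onto the dataclass."""
    prefix = "includeQuery."
    known = {f.name for f in fields(IncludeQuerySketch)}
    values = {
        key[len(prefix):]: value
        for key, value in raw.items()
        if key.startswith(prefix) and key[len(prefix):] in known
    }
    return IncludeQuerySketch(**values)


def extract_set_fields(params: IncludeQuerySketch) -> Dict[str, str]:
    """Keep only the fields that carry a value, mirroring the None filtering above."""
    return {key: value for key, value in asdict(params).items() if value is not None}


raw_query = {"includeQuery.name": "cfg", "includeQuery.owner": "alice", "page": "0"}
filters = extract_set_fields(parse_include_query(raw_query))
```

Here `filters` ends up holding only `name` and `owner`; the unrelated `page` parameter and the unset fields are dropped.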