PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation
This repository contains benchmark contracts, human-written & LLM-written formal properties and experiment results in support of the NDSS2025 submission for PropertyGPT.
├── README.md
├── RQ1
│ ├── PropertyGPT sub-project level evaluation.md
│ └── assets
├── RQ2 cve
│ ├── CVEs
│ ├── CVEs_basic_scan_result.xlsx
│ └── CVEs_comparision.xlsx
├── RQ2 smartInv
│ ├── assets
│ ├── smartInvRQ2.md
│ └── smartInv_RQ1
├── RQ3
│ ├── asset
│ └── readme.md
├── RQ3 failure info
│ ├── asset
│ └── readme.md
├── RQ4
│ ├── RQ4.md
│ └── RQ4.xlsx
├── certora_projects
│ ├── aave_l2_bridge
│ ├── aave_proof_of_reserve
│ ├── aave_rescue_mission
│ ├── aave_staked_token
│ ├── aave_starknet_bridge
│ ├── aave_static_token
│ ├── aave_v2
│ ├── aave_v3
│ ├── aave_vault
│ ├── celo_governance
│ ├── combined_output_train_all.csv
│ ├── compound_moneymarket
│ ├── contract_extractor.py
│ ├── extractor.py
│ ├── furucombo
│ ├── gho-core
│ ├── keep_fully
│ ├── lido_v2
│ ├── notional_finance_v2
│ ├── openzepplin
│ ├── opyn_gamma_protocol
│ ├── ousd
│ ├── popsicle_v3_optimizer
│ ├── radicle_drips
│ ├── reports_info.md
│ ├── rocket_joe
│ └── sushi_benttobox
├── rule_classification
│ ├── all_rules.csv
│ ├── assets
│ ├── category_of_spec
│ └── rule_classification.md
└── weight_calcu
├── linear_analysis.py
└── path_to_save_results.xlsx
-
We include sampled property dataset and property generation results in RQ1 and all the crawled Certora audit projects are available at Certora_projects.
-
RQ1 includes detailed links to associated project files, offering specific counts of properties for each evaluated project.
- CVE: We detail the collection and evaluation process of CVEs, including comparisons with results from
gptscan
. - Curated SmartInv benchmark: We include the selection process of this benchmark, property generation results and vulnerability detection results.
- Property Fix: We document the success rates and number of repair times during property generation driven by LLM.
- Top-K setting: We show precision, recall, and F1-score for various Top-K settings for selecting appropriate properties as candidates.
We list the results of zero-day bugs found by PropertyGPT and detail the corresponding properties generated by PropertyGPT.