Evaluation Results_2024 08 19

Example evaluation run: evaluation_results_2024-08-19_10-52-31.csv

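Run 1 (nearly default settings)

Experiment settings

Apart from setting Index method to High quality, the evaluation was run with Dify's default settings.
The knowledge data consists of 32 documents collected automatically with Firecrawl, 5 manually uploaded PDFs, and 1 Excel file.
Bot under evaluation: http://aidrd.japaneast.cloudapp.azure.com/app/0972d5ca-505a-46fa-9b1f-1a9d218cd96d/configuration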

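Results

Results table: evaluation_results_all_default.csv

source_urls_f1_score = 0.44 and Average Answer Similarity Score = 2.54, so numerically the accuracy is roughly one half.
When source_urls_f1_score is 0 (the relevant information source was not retrieved), the score for that question is, as expected, low. In addition, even when the correct file is referenced, retrieval can still fail at the chunk level. For example, for the question "If a condition is designated as an intractable disease, which types of care are not eligible for subsidy?" (original: 「難病と指定された場合に，助成対象とはならない介護の内容は？」), the bot retrieved the chunk on "types of care eligible for subsidy" and failed to reference the chunk on "expenses not eligible for subsidy".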
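
To make the source-URL metric concrete, here is a minimal sketch of how an F1 score over source URLs could be computed for a single question. The evaluation script itself is not shown on this page, so the function name `source_urls_f1`, the set-based matching, and the handling of empty inputs are all assumptions for illustration, not the actual definition behind source_urls_f1_score.

```python
def source_urls_f1(expected_urls, retrieved_urls):
    """F1 between expected and retrieved source URLs (hypothetical helper).

    Assumption: URLs are compared by exact string match and duplicates
    are ignored. The real evaluation script may normalize URLs differently.
    """
    expected = set(expected_urls)
    retrieved = set(retrieved_urls)

    # Assumed convention: nothing expected and nothing retrieved is a perfect match.
    if not expected and not retrieved:
        return 1.0

    true_positives = len(expected & retrieved)
    if true_positives == 0:
        # No relevant source was retrieved, so the score is 0.
        return 0.0

    precision = true_positives / len(retrieved)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # One of two expected sources retrieved, plus one irrelevant source.
    print(source_urls_f1(["doc_a", "doc_b"], ["doc_a", "doc_c"]))
```

Note that a metric of this kind only checks which files were referenced. It cannot detect the chunk-level failure described above, where the right file is retrieved but the wrong passage inside it is used.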

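Run 2 (chunk_size=200)

The settings were almost the same as in Run 1, but the chunking method was changed from Auto to Custom, with a chunk size of 200 characters. (With Auto, the chunk size varied depending on where the text happened to be split, from a minimum of 40 characters to a maximum of over 1,000 characters, averaging about 500 characters.)

The following is quoted from the Dify documentation: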

Although automated segmentation methods based on character length, identifiers, or NLP semantic segmentation can significantly reduce the workload of large-scale text segmentation, the quality of segmentation is related to the text structure of different document formats and the semantic context. Manual checking and correction can effectively compensate for the shortcomings of machine segmentation in semantic recognition.
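
As a rough illustration of why chunk lengths differ between the two runs, the sketch below contrasts delimiter-based splitting (lengths follow the text structure) with fixed-size splitting at 200 characters, and reports the min, max, and mean chunk length. This is not Dify's segmentation code: the helper names `chunk_by_delimiter`, `chunk_fixed`, and `length_stats`, and the choice of a blank line as the delimiter, are assumptions made only for this example.

```python
from statistics import mean


def chunk_by_delimiter(text, delimiter="\n\n"):
    """Split on a delimiter; chunk lengths depend on the text structure."""
    return [part for part in text.split(delimiter) if part.strip()]


def chunk_fixed(text, chunk_size=200):
    """Split into consecutive chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def length_stats(chunks):
    """Return min, max, and mean chunk length in characters."""
    lengths = [len(chunk) for chunk in chunks]
    return {"min": min(lengths), "max": max(lengths), "mean": mean(lengths)}


if __name__ == "__main__":
    # Synthetic text: one short paragraph and one long paragraph.
    sample = "a" * 40 + "\n\n" + "b" * 1000
    print(length_stats(chunk_by_delimiter(sample)))
    print(length_stats(chunk_fixed(sample, chunk_size=200)))
```

Fixed-size splitting bounds the maximum chunk length, but it can cut a sentence in the middle, which is one reason the quoted documentation recommends manual checking of the resulting segments.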
