Toxic Document Filtering

The filter described below is a fastText-based filter trained on labeled data generated automatically with the Perspective API. A more accurate DeBERTa-based filter, trained on the manually labeled LLM-jp Toxicity Dataset, is also available.

Preparation

pip install fasttext 'fugashi[unidic-lite]'
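Assume that the files to be filtered, such as CC-MAIN-2013-2016.jsonl.gz and CC-MAIN-2017-04.jsonl.gz, are in /path/to/ja_cc/.
ft_model.bin ... The fastText model used for filtering.
predict_ja_cc_gz.py ... Loads the model and a file to be filtered, and computes the toxicity score of each text in the file.
do_filter.py ... Filters the texts based on their toxicity scores, classifying each one as toxic or non-toxic.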
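
As a rough illustration of the per-file scoring step, the sketch below reads a gzipped JSONL file and emits one score per text, in file order. It assumes each line is a JSON object with a `text` field. `score_file` and `dummy_score` are hypothetical names, not taken from predict_ja_cc_gz.py, and `dummy_score` is only a placeholder for the real model call (presumably fugashi tokenization followed by a fastText prediction), which is not reproduced here.

```python
import gzip
import json
from typing import Callable, Iterator


def score_file(path: str, score_fn: Callable[[str], float]) -> Iterator[float]:
    """Yield one score per JSONL record, preserving line order."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # Assumption: the document body is stored under the "text" key.
            yield score_fn(record["text"])


def dummy_score(text: str) -> float:
    # Placeholder only: this is NOT the fastText model from ft_model.bin.
    return 1.0 if "bad" in text else 0.0
```

Keeping the output in the same order as the input lines is what would let a later step pair each score with its text by line position.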

Procedure

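First, create file_list, a list of the files such as CC-MAIN-2013-2016.jsonl.gz and CC-MAIN-2017-04.jsonl.gz (with the .gz suffix removed), and compute the toxicity score of each text in each file. The per-file jobs run in parallel with GNU parallel.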

ls /path/to/ja_cc/ | sed 's/\.gz//' > file_list

mkdir toxic_scores/

parallel -j 30 'python3 predict_ja_cc_gz.py \
    ft_model.bin /path/to/ja_cc/{}.gz > toxic_scores/{.} ' \
    :::: file_list

As a result, toxic_scores/ contains the toxicity scores for each text in CC-MAIN-2013-2016.jsonl.gz, CC-MAIN-2017-04.jsonl.gz, and so on. For example, the scores for CC-MAIN-2013-2016.jsonl.gz are written to toxic_scores/CC-MAIN-2013-2016. Each line holds the toxicity score of one text. This step may take roughly 3 to 6 hours.
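
The file naming used by the commands above can be traced with a small sketch. It mirrors two steps in Python: the `sed 's/\.gz//'` substitution that builds each file_list entry, and GNU parallel's `{.}` replacement string, which is the input with its last extension removed. The function names `list_entry` and `score_path` are hypothetical and exist only for this illustration.

```python
from pathlib import PurePosixPath


def list_entry(gz_name: str) -> str:
    """Mirror sed 's/\\.gz//': drop the first '.gz' in the name."""
    return gz_name.replace(".gz", "", 1)


def score_path(entry: str, out_dir: str = "toxic_scores") -> str:
    """Mirror GNU parallel's {.}: the entry without its last extension."""
    return str(PurePosixPath(out_dir) / PurePosixPath(entry).stem)


entry = list_entry("CC-MAIN-2013-2016.jsonl.gz")
print(entry)              # CC-MAIN-2013-2016.jsonl
print(score_path(entry))  # toxic_scores/CC-MAIN-2013-2016
```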

Next, filter each text based on its toxicity score, that is, classify it as toxic or non-toxic.

mkdir ja_cc_toxic ja_cc_toxicity_filtered

python3 do_filter.py /path/to/ja_cc/ toxic_scores/ ja_cc_toxic/ ja_cc_toxicity_filtered/

Toxic texts are written to ja_cc_toxic/ and non-toxic texts to ja_cc_toxicity_filtered/. The line threshold = 0.99 in do_filter.py sets the classification threshold.
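
A minimal sketch of threshold-based splitting follows. It is not the code of do_filter.py: the function name `split_by_score` is hypothetical, and the sketch assumes that scores and texts are paired by line position and that a score greater than or equal to the threshold counts as toxic (the original script may use a strict comparison instead).

```python
from typing import List, Tuple

THRESHOLD = 0.99  # same value as the threshold mentioned for do_filter.py


def split_by_score(
    lines: List[str], scores: List[float], threshold: float = THRESHOLD
) -> Tuple[List[str], List[str]]:
    """Return (toxic, non_toxic) lines, pairing each line with its score."""
    if len(lines) != len(scores):
        raise ValueError("expected exactly one score per line")
    toxic: List[str] = []
    non_toxic: List[str] = []
    for line, score in zip(lines, scores):
        # Assumption: score >= threshold means the text is classified as toxic.
        (toxic if score >= threshold else non_toxic).append(line)
    return toxic, non_toxic
```

Under these assumptions, lowering the threshold moves more texts into the toxic set, and raising it keeps more texts in the filtered output.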

