A BERT-based solution for the information extraction (IE) task of the 2019 Language and Intelligence Challenge.
- Python 3.7
- PyTorch 1.10
- OpenKE
Download the dataset from the 2019 Language and Intelligence Challenge and put the data files into `raw_data/chinese`.
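A quick way to check the downloaded data is to parse it into triples. The sketch below assumes the LIC 2019 IE layout: one JSON object per line with a `text` field and an `spo_list` whose entries hold `subject`, `predicate`, and `object` (plus type fields it ignores). `load_triples` is a hypothetical helper, not part of this repo.

```python
import json
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def load_triples(lines: Iterable[str]) -> List[Tuple[str, List[Triple]]]:
    """Parse JSON-lines records into (text, [(subject, predicate, object), ...]).

    Assumed record shape (LIC 2019 IE data):
    {"text": ..., "spo_list": [{"subject": ..., "predicate": ..., "object": ...}]}
    Records without "spo_list" (e.g. test data) yield an empty triple list.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        triples = [
            (spo["subject"], spo["predicate"], spo["object"])
            for spo in record.get("spo_list", [])
        ]
        samples.append((record["text"], triples))
    return samples
```

For example, `with open("raw_data/chinese/train_data.json", encoding="utf-8") as f: samples = load_triples(f)`, where the file name `train_data.json` is an assumption about your download.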
The two-step model first performs multi-label relation extraction (`pso_1`) and then entity extraction (`pso_2`), both on top of a BERT encoder.
(Figures for `pso_1` and `pso_2`; images not available.)
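The two steps can be sketched as decoding logic. This is a minimal sketch under stated assumptions, not the repo's code: step 1 keeps every predicate whose independent (sigmoid-style) probability passes a threshold, and step 2 tags subject/object spans for each kept predicate. The BIO tag names `SUB`/`OBJ`, the 0.5 threshold, and the `tagger` callable are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def select_predicates(probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Step 1 (pso_1): multi-label relation extraction.

    Each predicate has its own probability, so a sentence may keep
    zero, one, or several predicates.
    """
    return [p for p, prob in probs.items() if prob >= threshold]


def bio_spans(tokens: Sequence[str], tags: Sequence[str], label: str) -> List[str]:
    """Collect the text of every B-label/I-label span in a BIO tag sequence."""
    spans: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{label}":
            if current:
                spans.append("".join(current))
            current = [token]
        elif tag == f"I-{label}" and current:
            current.append(token)
        else:  # any other tag closes an open span
            if current:
                spans.append("".join(current))
            current = []
    if current:
        spans.append("".join(current))
    return spans


def extract_triples(
    tokens: Sequence[str],
    predicate_probs: Dict[str, float],
    tagger: Callable[[Sequence[str], str], Sequence[str]],  # hypothetical pso_2 tagger
    threshold: float = 0.5,
) -> List[Tuple[str, str, str]]:
    """Step 2 (pso_2): tag subject/object spans conditioned on each kept predicate."""
    triples = []
    for predicate in select_predicates(predicate_probs, threshold):
        tags = tagger(tokens, predicate)  # one BIO tag per token
        for s in bio_spans(tokens, tags, "SUB"):
            for o in bio_spans(tokens, tags, "OBJ"):
                triples.append((s, predicate, o))
    return triples
```

Pairing every subject span with every object span is the simplest choice; a real decoder may restrict the pairing.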
The multi-head selection model replaces the LSTM encoder in the paper *Joint entity recognition and relation extraction as a multi-head selection problem* with BERT.
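Multi-head selection scores every (token, head, relation) combination with an independent sigmoid, so one token can take part in several relations. The sketch below is our reading of that scoring layer, with assumptions: `z` stands for per-token encoder vectors (BERT outputs here), `tanh` is the activation, and `U`, `W`, `b` are shared while each relation has its own output vector `V[r]`. Check the paper for its exact parametrization.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Vector) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def head_probs(
    z: Sequence[Vector],
    j: int,
    U: Matrix,
    W: Matrix,
    b: Vector,
    V: Dict[str, Vector],
) -> Dict[Tuple[int, str], float]:
    """Probability that token i is a head of token j with relation r, for all (i, r).

    score(j, i, r) = V[r] . tanh(U z_j + W z_i + b); each (i, r) gets its own
    sigmoid rather than a softmax, so several heads/relations can be selected.
    """
    uz_j = matvec(U, z[j])
    probs = {}
    for i, z_i in enumerate(z):
        hidden = [math.tanh(a + c + d) for a, c, d in zip(uz_j, matvec(W, z_i), b)]
        for rel, v_r in V.items():
            probs[(i, rel)] = sigmoid(sum(x * h for x, h in zip(v_r, hidden)))
    return probs
```

A pair is usually kept as a relation when its probability exceeds a threshold such as 0.5.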
Run preprocessing, training, reloading, and evaluation with `main.py` (`chinese_bert_re` is the multi-head selection model, `chinese_bert_pso` the two-step model):
```shell
# preprocessing
python main.py --mode preprocessing --exp_name chinese_bert_re
python main.py --mode preprocessing --exp_name chinese_bert_pso --type pso_1
# training
python main.py --mode train --exp_name chinese_bert_re
python main.py --mode train --exp_name chinese_bert_pso --type pso_1
python main.py --mode train --exp_name chinese_bert_pso --type pso_2
# reload the model saved at a given epoch
python main.py --mode reload --exp_name chinese_bert_re --epoch $epoch
python main.py --mode reload --exp_name chinese_bert_pso --type pso_1 --epoch $epoch
python main.py --mode reload --exp_name chinese_bert_pso --type pso_2 --epoch $epoch
# evaluation
python main.py --mode evaluation --exp_name chinese_bert_re
python main.py --mode evaluation --exp_name chinese_bert_pso --type pso_1
python main.py --mode evaluation --exp_name chinese_bert_pso --type pso_2
```
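If you want to run every stage in order, a small driver can generate the same `main.py` invocations listed above. This is a hypothetical helper (call it `run_pipeline.py`), not part of the repo; it uses only the flags shown in the commands and prints by default instead of executing.

```python
import shlex
import subprocess
from typing import List, Optional

# Stage order taken from the commands above (hypothetical run_pipeline.py).
MODES = ["preprocessing", "train", "reload", "evaluation"]


def build_commands(mode: str, epoch: Optional[int] = None) -> List[List[str]]:
    """Return the main.py invocations for one mode, in the README's order."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "reload" and epoch is None:
        raise ValueError("reload needs --epoch")
    runs = [["--exp_name", "chinese_bert_re"]]
    # Only pso_1 is preprocessed in the commands above; pso_2 is not.
    pso_types = ["pso_1"] if mode == "preprocessing" else ["pso_1", "pso_2"]
    runs += [["--exp_name", "chinese_bert_pso", "--type", t] for t in pso_types]
    commands = []
    for extra in runs:
        cmd = ["python", "main.py", "--mode", mode] + extra
        if mode == "reload":
            cmd += ["--epoch", str(epoch)]
        commands.append(cmd)
    return commands


def run_all(epoch: int, dry_run: bool = True) -> None:
    """Print every command; execute them only when dry_run is False."""
    for mode in MODES:
        for cmd in build_commands(mode, epoch):
            print(" ".join(shlex.quote(part) for part in cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)
```

`run_all` reuses one epoch for all three models. In practice the best epoch may differ per model, so call `build_commands("reload", epoch)` separately if needed.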
Post-processing:
- build a knowledge graph of triples
- use distant supervision to enrich the triples
- classify the candidate triples with XGBoost
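Distant supervision usually rests on one assumption: if a knowledge-graph triple's subject and object both appear in a sentence, the sentence may express that triple. A minimal sketch of that idea, with a hypothetical `distant_candidates` helper; whether the repo's own search matches on exactly these terms is not confirmed here. Matches are noisy and are meant as candidates for the classifier, not final output.

```python
from typing import AbstractSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def distant_candidates(
    text: str,
    kg: Iterable[Triple],
    known: AbstractSet[Triple] = frozenset(),
) -> List[Triple]:
    """Propose every KG triple whose subject and object both occur in the text
    and that is not already among the triples predicted for this sample."""
    found = []
    for s, p, o in kg:
        if s in text and o in text and (s, p, o) not in known:
            found.append((s, p, o))
    return found
```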
The XGBoost triple classifier uses these features:

Feature | Description | Resource |
---|---|---|
score | confidence score of the triple from the two-step model or the multi-head selection model | |
rank | relative rank of the triple's confidence score among the candidate triples of one sample | |
transe | TransE score of the triple (s, p, o) | Triple Trustworthiness Measurement for Knowledge Graph |
sdvalidate | SDValidate score of the triple (s, p, o) | Improving the Quality of Linked Data Using Statistical Distributions |
one hot label | one-hot encoding of the predicates of one sample's candidate triples | |
seg | whether the subject/object boundaries are consistent with the word segmentation | |
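The feature table above can be turned into one row per candidate triple roughly as follows. This is a sketch with assumptions: `rank` is 0 for the highest score, "one hot label" is read as the candidate's predicate over a fixed predicate list, and `seg` is 1 only when both subject and object start and end on word boundaries. The TransE and SDValidate values are taken as precomputed inputs.

```python
from typing import List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]


def rank_feature(scores: Sequence[float]) -> List[int]:
    """Rank of each candidate's confidence within one sample (0 = highest)."""
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    ranks = [0] * len(scores)
    for position, k in enumerate(order):
        ranks[k] = position
    return ranks


def word_boundaries(words: Sequence[str]) -> Set[int]:
    """Character offsets where a segmented word starts or ends."""
    cuts, pos = {0}, 0
    for w in words:
        pos += len(w)
        cuts.add(pos)
    return cuts


def seg_feature(text: str, words: Sequence[str], entity: str) -> int:
    """1 if the entity's first occurrence starts and ends on word boundaries."""
    start = text.find(entity)
    if start < 0:
        return 0
    cuts = word_boundaries(words)
    return int(start in cuts and start + len(entity) in cuts)


def feature_rows(
    text: str,
    words: Sequence[str],
    candidates: Sequence[Triple],
    scores: Sequence[float],
    transe: Sequence[float],
    sdvalidate: Sequence[float],
    predicates: Sequence[str],
) -> List[List[float]]:
    """One row per candidate: score, rank, transe, sdvalidate, one-hot, seg."""
    ranks = rank_feature(scores)
    rows = []
    for k, (s, p, o) in enumerate(candidates):
        one_hot = [1.0 if p == q else 0.0 for q in predicates]
        seg = float(seg_feature(text, words, s) and seg_feature(text, words, o))
        rows.append([scores[k], float(ranks[k]), transe[k], sdvalidate[k]] + one_hot + [seg])
    return rows
```

The rows, together with 0/1 labels for negative/positive triples, could then be passed to `xgboost.XGBClassifier().fit(X, y)`.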
Uncomment `self.spo_search(text,result)`.
To build the XGBoost training data:
- prepare positive/negative samples
- prepare TransE scores
To prepare the positive/negative samples, uncomment `tester.run()` and run the following to get the `pos_neg.json` output:
```shell
python lib/kg/negative_sample.py
```
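One common way to create negative triples is to corrupt gold triples by swapping the subject or the object for another entity. Whether `lib/kg/negative_sample.py` works this way is not documented here, so treat the sketch below as an illustration of the technique. `corrupt_negatives` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]


def corrupt_negatives(
    gold: Sequence[Triple],
    entities: Sequence[str],
    per_triple: int = 1,
    seed: int = 0,
    max_tries: int = 100,
) -> List[Triple]:
    """Make negatives by replacing the subject or the object of each gold triple
    with a random entity; candidates that are gold or already drawn are skipped."""
    rng = random.Random(seed)  # fixed seed for reproducible samples
    gold_set: Set[Triple] = set(gold)
    seen: Set[Triple] = set()
    negatives: List[Triple] = []
    for s, p, o in gold:
        made = 0
        for _ in range(max_tries):
            if made == per_triple:
                break
            e = rng.choice(entities)
            cand = (e, p, o) if rng.random() < 0.5 else (s, p, e)
            if cand in gold_set or cand in seen:
                continue
            seen.add(cand)
            negatives.append(cand)
            made += 1
    return negatives
```

Corrupted triples keep the original predicate, so the classifier has to rely on the entity-level features rather than the predicate alone.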
Set the `xgb_train_root` tag in `experiments` to the corresponding `.json` file, e.g. `pos_neg_score.json`. Then rerun the model to get its confidence scores for the positive/negative data; the output is `pos_neg_score.json`.
```shell
# multi-head selection
python main.py --mode postcheck --exp_name chinese_bert_re
# pso two steps model
python main.py --mode postcheck --exp_name chinese_bert_pso --type pso_2
```
For the dev data, set `xgb_train_root` to `dev.json`.
For the multi-head selection model, just run:
```shell
python main.py --mode postcheck --exp_name chinese_bert_re
```
For the two-step `pso` model, first run:
```shell
python main.py --mode postcheck --exp_name chinese_bert_pso --type pso_2
```
Then comment out `tester.spo_search_res()`, copy the output into the data folder, and rerun:
```shell
scp error/chinese_bert_pso/dev.json data/chinese_bert/pso/dev.json
python main.py --mode postcheck --exp_name chinese_bert_pso --type pso_2
```
To prepare the TransE scores, first convert the train/dev data into the format OpenKE expects; the results are written to the `data/transe` folder:
```shell
python prepare_transe.py --mode train  # or --mode dev
```
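For reference, OpenKE's benchmark files use a simple layout. `entity2id.txt` and `relation2id.txt` start with a count line followed by `name<TAB>id` lines. `train2id.txt` starts with the triple count followed by `head_id tail_id relation_id` lines (note the head, tail, relation order). The sketch below writes that layout from (s, p, o) triples. It is a hypothetical stand-in to show the format, not the repo's `prepare_transe.py`.

```python
import os
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]


def write_openke_files(triples: Iterable[Triple], out_dir: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Write entity2id.txt, relation2id.txt and train2id.txt in OpenKE's layout."""
    triples = list(triples)
    ent2id: Dict[str, int] = {}
    rel2id: Dict[str, int] = {}
    for s, p, o in triples:
        ent2id.setdefault(s, len(ent2id))  # ids assigned in first-seen order
        ent2id.setdefault(o, len(ent2id))
        rel2id.setdefault(p, len(rel2id))
    os.makedirs(out_dir, exist_ok=True)

    def dump(name: str, mapping: Dict[str, int]) -> None:
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.write(f"{len(mapping)}\n")
            for key, idx in mapping.items():
                f.write(f"{key}\t{idx}\n")

    dump("entity2id.txt", ent2id)
    dump("relation2id.txt", rel2id)
    unique = list(dict.fromkeys(triples))  # drop duplicate triples, keep order
    with open(os.path.join(out_dir, "train2id.txt"), "w", encoding="utf-8") as f:
        f.write(f"{len(unique)}\n")
        for s, p, o in unique:
            f.write(f"{ent2id[s]} {ent2id[o]} {rel2id[p]}\n")  # head tail relation
    return ent2id, rel2id
```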
Place `entity2id.txt`, `relation2id.txt`, and `train2id.txt` under `base_path` in the OpenKE package, then run:
```shell
python lib/transe/run_transe.py --mode train  # or --mode dev
```
To get the embeddings of the (s, p, o) triples, copy the output (`*.pickle`) to `lib/transe` and run:
```shell
python lib/transe/get_embedding.py
```
Run these steps first for the training data, then for the dev data.
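TransE models a true triple as `h + r ≈ t` in embedding space, so a common plausibility score is the negative distance `-||h + r - t||`. Values closer to 0 indicate a more plausible triple. The sketch below computes that score from three embedding vectors. How `get_embedding.py` actually turns the pickled embeddings into the `transe` feature is not documented here, so the L1/L2 choice is an assumption.

```python
import math
from typing import Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float], norm: int = 1) -> float:
    """TransE plausibility of (h, r, t): the negative distance -||h + r - t||.

    norm=1 uses the L1 distance, norm=2 the L2 distance.
    """
    diffs = [a + b - c for a, b, c in zip(h, r, t)]
    if norm == 1:
        return -sum(abs(d) for d in diffs)
    if norm == 2:
        return -math.sqrt(sum(d * d for d in diffs))
    raise ValueError("norm must be 1 or 2")
```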
Finally, run:
```shell
python kg/post_check.py
```
For attention visualization, see `lib/metrics/attn_vis.py`.