
WSDM 2023 Cup: Unbiased Learning to Rank for a Large-Scale Search Dataset from the Baidu Search Engine

1 Background

This repository contains our solution to the WSDM Cup 2023 Unbiased Learning for Web Search task. Building on the Dual Learning Algorithm (DLA), our solution conducts extensive research on unbiased learning to rank and proposes a strategy of using multiple behavioral features for unbiased learning, which greatly improves the performance of the ranking model.

2 Model Overview

The overall framework of the model is shown in Fig.1.

[Fig.1: overall framework of the model]

Taking the data of one search session as an example, as shown in Fig.1, the text features of the document at position n are fed into the relevance model to output the relevance score r, while the other features of the document, which can be used to estimate the examination propensity, are fed into the propensity model to get the propensity score p. Subsequently, p and r are multiplied to obtain the score s of the document at position n being clicked.
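A minimal sketch of this factorization (module shapes and names are our assumptions, not the repository's code):

```python
import torch
import torch.nn as nn

# Stand-ins for the two models in Fig.1 (the real relevance model is BERT-based).
relevance_model = nn.Linear(768, 1)                              # text features -> r
propensity_model = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())  # behavioral features -> p

text_features = torch.randn(6, 768)    # one group of 6 documents
behavior_features = torch.randn(6, 8)  # position, media type, etc.

r = relevance_model(text_features).squeeze(-1)       # relevance score r
p = propensity_model(behavior_features).squeeze(-1)  # propensity score p
s = p * r                                            # predicted click score s
```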

Note that instead of inputting the entire document list of the session, we pick a group of documents of size 6 from the document list, consisting of 1 clicked document (positive sample) and 5 unclicked documents (negative samples). In addition, only the propensity score of the positive sample is produced by the model, while the propensity score of each negative sample is forced to a fixed value of 0.1, which means that p1, p3, and pn in Fig.1 are 0.1.
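A sketch of this group sampling and the fixed negative propensity (helper names and the positive-first ordering are hypothetical):

```python
import random
import torch

FIXED_NEG_PROPENSITY = 0.1  # constant propensity assigned to unclicked documents

def sample_group(clicked_idx, doc_indices, group_size=6):
    """Return 1 clicked (positive) doc followed by group_size - 1 unclicked ones."""
    negatives = [i for i in doc_indices if i != clicked_idx]
    return [clicked_idx] + random.sample(negatives, group_size - 1)

group = sample_group(clicked_idx=2, doc_indices=list(range(10)))

# Only the positive keeps its model-estimated propensity; negatives are fixed to 0.1.
p_model = torch.rand(len(group))                   # propensities from the model
p = torch.full_like(p_model, FIXED_NEG_PROPENSITY)
p[0] = p_model[0]                                  # index 0 is the positive sample
```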

3 Environment

The environment for the unbiased learning to rank task is the same as for the Pre-training for Web Search task.

4 Quick Start

4.1 Prepare the corpus

Suppose you have downloaded the Web Search Session Data (training data) and annotation_data_0522.txt (test data) from Google Drive. Alternative download links are available for those who cannot access Google Drive.

Note: unzipping the training data may take a long time.

4.2 The Pre-trained Language Model

A pre-trained language model is important for the model in Fig.1. You can download the pre-trained language models we trained from the table below:

| PTM Version | URL |
| --- | --- |
| Bert_Layer12_Head12 | Bert_Layer12_Head12 |
| Bert_Layer12_Head12 wwm | Bert_Layer12_Head12 wwm |
| Bert_Layer24_Head12 | Bert_Layer24_Head12 |
In the table, wwm means that whole word masking was used.

4.3 Directory Structure

After the corpus and the pre-trained language model are ready, you should organize them with the following directory structure:

Your Data Root Path
|——baidu_ultr
|       |——data
|       |        |——part-00000
|       |        |——part-00001
|       |        |——...
|       |——annotate_data
|       |        |——annotation_data_0522.txt
|       |        |——wsdm_test_1.txt
|       |        |——wsdm_test_2_all.txt
|       |——ckpt
|       |        |——submit
|       |        |        |——model_name
|       |        |                |——config.json
|       |        |                |——pytorch.bin
|       |        |——pretrain
|       |        |        |——model_name
|       |        |                |——config.json
|       |        |                |——pytorch.bin
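To sanity-check a downloaded checkpoint against this layout, the weights can be loaded directly with PyTorch (a sketch; the contents of the state dict are our assumption):

```python
import torch

# Load the checkpoint on CPU; the path follows the directory layout above.
state_dict = torch.load(
    "Your Data Root Path/baidu_ultr/ckpt/pretrain/model_name/pytorch.bin",
    map_location="cpu",
)
print(list(state_dict)[:5])  # peek at the first few parameter names
```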

4.4 Training Model

  • Modify data_root in ./pretrain/start.sh to Your Data Root Path
  • Then,
cd pretrain
sh start.sh
  • You can run TensorBoard on output_dir (e.g. `tensorboard --logdir output_dir`) to monitor the training metrics

4.5 Test Model

4.5.1 Single Model

To quickly test the model's performance, you can directly download a model trained by us, whose DCG@10 is 10.25 on annotation_data_0522.txt (the validation dataset).
Then, in ./submit/start.sh, modify data_root to Your Data Root Path, model_name_or_path to the path of the model you want to test, and model_w to 1.
Finally:

cd submit
sh start.sh
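For reference, the DCG@10 figures reported in this README can be computed along the following lines (a sketch using the common linear-gain form; whether the leaderboard uses exactly this convention is an assumption on our part):

```python
import numpy as np

def dcg_at_k(relevances, k=10):
    """DCG@k = sum of rel_i / log2(i + 1) over the top-k documents (ranks 1-indexed)."""
    rel = np.asarray(relevances, dtype=float)[:k]
    return float(np.sum(rel / np.log2(np.arange(2, rel.size + 2))))

# Relevance labels of the top-10 documents as ranked by the model (e.g. 0-4 grades).
print(dcg_at_k([4, 3, 3, 2, 0, 1, 2, 0, 0, 1]))
```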

4.5.2 Model Ensemble

To further improve performance, we use as the final relevance score the weighted sum of the output scores of multiple models trained under different settings during our experiments (a sketch of the weighted sum appears at the end of this section).
You can download these models with their different settings from the table below:

| Model Name | URL | DCG@10 on val dataset |
| --- | --- | --- |
| group6_pos_slipoff_mtype_serph_emb8_mlp5l_maxmeancls_bs48 | Download | 10.03 |
| group6_pos_slipoff_mtype_serph_emb8_mlp5l_maxmeancls | Download | 10.14 |
| group6_pos_slipoff_mtype_serph_emb8_mlp5l_wwm | Download | 10.16 |
| group6_pos_slipoff_serph_emb8_mlp5l_24l | Download | 10.10 |
| group6_pos_slipoff_serph_emb8_mlp5l | Download | 10.25 |
| group6_pos_slipoff_mtype_serph_emb8_bnnoelu_mlp5l_relu | Download | 10.20 |
| group6_pos_slipoff_mtype_serph_emb8_bnnoelu_dropout_mlp5l_relu | Download | 10.14 |
| group6_pos_slipoff_mtype_serph_emb8_bnnoelu_mlp5l_relu_24l | Download | 10.23 |
| group6_pos_slipoff_mtype_serh_emb8_bnnoelu | Download | 10.15 |
| group6_pos_slipoff_mtype_emb8_bnnoelu | Download | 10.15 |
| group6_pos_slipoff_serh_emb8 | Download | 10.05 |
| group6_pos_slipoff_pad_with_pretrain_emb8 | Download | 10.05 |

Then, in ./submit/start.sh, modify data_root to Your Data Root Path, model_name_or_path to the paths of the models you want to ensemble, and model_w to 0.10,0.35,0.50,0.25,0.40,0.10,0.10,0.55,0.35,0.05,0.1,0.50, where model_w is set manually (one weight per model).

Finally:

cd submit
sh start.sh

The DCG@10 of the model ensemble on the validation dataset is 10.54 (10.14 on the final test dataset).
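A minimal sketch of the weighted-sum ensembling (array and variable names are ours; the weights are the model_w values above):

```python
import numpy as np

# Per-model relevance scores for the same candidate documents: (n_models, n_docs).
scores = np.random.rand(12, 1000)

# Manually tuned weights, one per model (the model_w values from start.sh above).
w = np.array([0.10, 0.35, 0.50, 0.25, 0.40, 0.10,
              0.10, 0.55, 0.35, 0.05, 0.10, 0.50])

final_score = w @ scores            # weighted sum -> final relevance per document
ranking = np.argsort(-final_score)  # rank documents by descending ensemble score
```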

Contacts