Unlike an embedding model, a reranker takes a query and a document as input and directly outputs a similarity score rather than an embedding. You can obtain a relevance score by feeding a query and a passage to the reranker, and the score can be mapped to a float in [0, 1] with the sigmoid function.

For more detailed usage, see reranker-encoder only or reranker-decoder only.
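As a quick illustration of the sigmoid mapping described above, here is a minimal sketch in plain Python (no FlagEmbedding dependency):

```python
import math

def normalize(score: float) -> float:
    """Map a raw reranker logit to [0, 1] with the sigmoid function."""
    return 1 / (1 + math.exp(-score))

print(normalize(-1.5263671875))  # ~0.1785, matching the normalize=True example below
```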
Model | Base model | Language | Layerwise | Feature |
---|---|---|---|---|
BAAI/bge-reranker-base | xlm-roberta-base | Chinese and English | - | Lightweight reranker model, easy to deploy, with fast inference. |
BAAI/bge-reranker-large | xlm-roberta-large | Chinese and English | - | Lightweight reranker model, easy to deploy, with fast inference. |
BAAI/bge-reranker-v2-m3 | bge-m3 | Multilingual | - | Lightweight reranker model, possesses strong multilingual capabilities, easy to deploy, with fast inference. |
BAAI/bge-reranker-v2-gemma | gemma-2b | Multilingual | - | Suitable for multilingual contexts, performs well in both English proficiency and multilingual capabilities. |
BAAI/bge-reranker-v2-minicpm-layerwise | MiniCPM-2B-dpo-bf16 | Multilingual | 8-40 | Suitable for multilingual contexts, performs well in both English and Chinese proficiency, allows freedom to select layers for output, facilitating accelerated inference. |
You can select the model according to your scenario and resources:

- For multilingual tasks, use `BAAI/bge-reranker-v2-m3` or `BAAI/bge-reranker-v2-gemma`.
- For Chinese or English, use `BAAI/bge-reranker-v2-m3` or `BAAI/bge-reranker-v2-minicpm-layerwise`.
- For efficiency, use `BAAI/bge-reranker-v2-m3` or the lower layers of `BAAI/bge-reranker-v2-minicpm-layerwise`.
- For the best performance, we recommend `BAAI/bge-reranker-v2-minicpm-layerwise` or `BAAI/bge-reranker-v2-gemma`.
You can use `FlagAutoReranker` to load the model. For a custom model (one not included in `AUTO_RERANKER_MAPPING`), you must specify the `model_class` parameter. You can also submit a pull request to add your released model to the `AUTO_RERANKER_MAPPING` dictionary; if needed, create a new `<model>.py` file here or here.
```python
from FlagEmbedding import FlagAutoReranker

reranker = FlagAutoReranker.from_finetuned('BAAI/bge-reranker-large',
                                           query_max_length=256,
                                           passage_max_length=512,
                                           use_fp16=True,
                                           devices=['cuda:1'])  # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'])
print(score)  # -1.5263671875

# You can map the score into [0, 1] by setting normalize=True, which applies the sigmoid function to the score
score = reranker.compute_score(['query', 'passage'], normalize=True)
print(score)  # 0.1785258315203034

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)  # [-5.60546875, 5.76171875]

# You can map the scores into [0, 1] by setting normalize=True, which applies the sigmoid function to the scores
scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], normalize=True)
print(scores)  # [0.0036642203307843528, 0.9968641641227171]
```
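As a usage sketch, a typical retrieve-then-rerank step sorts candidate passages by their reranker scores (reusing the `reranker` loaded above; the candidate list is illustrative):

```python
query = 'what is panda?'
candidates = [
    'hi',
    'The giant panda (Ailuropoda melanoleuca) is a bear species endemic to China.',
]

# Score every (query, passage) pair, then sort passages from most to least relevant.
scores = reranker.compute_score([[query, p] for p in candidates], normalize=True)
ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
for passage, score in ranked:
    print(f'{score:.4f}\t{passage}')
```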
For your custom model (suppose the model is fine-tuned from `BAAI/bge-reranker-large`; the model class is then `encoder-only-base`), you can use the following code:
```python
from FlagEmbedding import FlagAutoReranker

reranker = FlagAutoReranker.from_finetuned('your_model_name_or_path',
                                           model_class='encoder-only-base',
                                           query_max_length=256,
                                           passage_max_length=512,
                                           use_fp16=True,
                                           devices=['cuda:1'])  # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'])
print(score)
```
The `model_class` parameter currently supports the following options:

- `encoder-only-base`: for encoder-only reranker models, such as `BAAI/bge-reranker-large`
- `decoder-only-base`: for decoder-only reranker models, such as `BAAI/bge-reranker-v2-gemma`
- `decoder-only-layerwise`: for decoder-only layerwise reranker models, such as `BAAI/bge-reranker-v2-minicpm-layerwise`
- `decoder-only-lightweight`: for decoder-only lightweight reranker models, such as `BAAI/bge-reranker-v2.5-gemma2-lightweight`
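For instance, to select the decoder-only class explicitly when loading `BAAI/bge-reranker-v2-gemma` (a minimal sketch, assuming `from_finetuned` accepts the same arguments as in the examples above):

```python
from FlagEmbedding import FlagAutoReranker

# Explicitly pick the model class instead of relying on AUTO_RERANKER_MAPPING
reranker = FlagAutoReranker.from_finetuned('BAAI/bge-reranker-v2-gemma',
                                           model_class='decoder-only-base',
                                           query_max_length=256,
                                           passage_max_length=512,
                                           use_fp16=True)
print(reranker.compute_score(['query', 'passage']))
```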
`FlagReranker` supports `BAAI/bge-reranker-base`, `BAAI/bge-reranker-large`, and `BAAI/bge-reranker-v2-m3`:
```python
from FlagEmbedding import FlagReranker

reranker = FlagReranker(
    'BAAI/bge-reranker-v2-m3',
    query_max_length=256,
    passage_max_length=512,
    use_fp16=True,
    devices=['cuda:1']
)  # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'])
print(score)  # -5.65234375

# You can map the score into [0, 1] by setting normalize=True, which applies the sigmoid function to the score
score = reranker.compute_score(['query', 'passage'], normalize=True)
print(score)  # 0.003497010252573502

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)  # [-8.1875, 5.26171875]

# You can map the scores into [0, 1] by setting normalize=True, which applies the sigmoid function to the scores
scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], normalize=True)
print(scores)  # [0.00027803096387751553, 0.9948403768236574]
```
`FlagLLMReranker` supports `BAAI/bge-reranker-v2-gemma`:
```python
from FlagEmbedding import FlagLLMReranker

reranker = FlagLLMReranker(
    'BAAI/bge-reranker-v2-gemma',
    query_max_length=256,
    passage_max_length=512,
    use_fp16=True,
    devices=['cuda:1']
)  # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'])
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']])
print(scores)
```
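If you want scores in [0, 1], `normalize=True` should work here as well (an assumption based on the shared `compute_score` interface shown above):

```python
# Assumption: normalize=True applies the sigmoid here too, as with FlagReranker
score = reranker.compute_score(['query', 'passage'], normalize=True)
print(score)
```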
`LayerWiseFlagLLMReranker` supports `BAAI/bge-reranker-v2-minicpm-layerwise`:
```python
from FlagEmbedding import LayerWiseFlagLLMReranker

reranker = LayerWiseFlagLLMReranker(
    'BAAI/bge-reranker-v2-minicpm-layerwise',
    query_max_length=256,
    passage_max_length=512,
    use_fp16=True,
    devices=['cuda:1']
)  # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28])  # Adjust cutoff_layers to select which layers are used to compute the score
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], cutoff_layers=[28])
print(scores)
```
`LightWeightFlagLLMReranker` supports `BAAI/bge-reranker-v2.5-gemma2-lightweight`:
```python
from FlagEmbedding import LightWeightFlagLLMReranker

reranker = LightWeightFlagLLMReranker(
    'BAAI/bge-reranker-v2.5-gemma2-lightweight',
    query_max_length=256,
    passage_max_length=512,
    use_fp16=True,
    devices=['cuda:1']
)  # Setting use_fp16 to True speeds up computation with a slight performance degradation

score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28], compress_ratio=2, compress_layers=[24, 40])  # Adjust cutoff_layers to select which layers are used to compute the score
print(score)

scores = reranker.compute_score([['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']], cutoff_layers=[28], compress_ratio=2, compress_layers=[24, 40])
print(scores)
```
You can also use the Hugging Face `transformers` library directly. For `BAAI/bge-reranker-base`, `BAAI/bge-reranker-large`, and `BAAI/bge-reranker-v2-m3`:
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-m3')
model = AutoModelForSequenceClassification.from_pretrained('BAAI/bge-reranker-v2-m3')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    # The classification head outputs one logit per pair: the relevance score
    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
    print(scores)
```
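To get scores in [0, 1] (the equivalent of `normalize=True` above), apply the sigmoid yourself:

```python
# Map raw logits to [0, 1] with the sigmoid function
probs = torch.sigmoid(scores)
print(probs)
```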
For `BAAI/bge-reranker-v2-gemma`:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def get_inputs(pairs, tokenizer, prompt=None, max_length=1024):
    # Build inputs of the form: <bos> 'A: {query}' + sep + 'B: {passage}' + sep + prompt
    if prompt is None:
        prompt = "Given a query A and a passage B, determine whether the passage contains an answer to the query by providing a prediction of either 'Yes' or 'No'."
    sep = "\n"
    prompt_inputs = tokenizer(prompt,
                              return_tensors=None,
                              add_special_tokens=False)['input_ids']
    sep_inputs = tokenizer(sep,
                           return_tensors=None,
                           add_special_tokens=False)['input_ids']
    inputs = []
    for query, passage in pairs:
        query_inputs = tokenizer(f'A: {query}',
                                 return_tensors=None,
                                 add_special_tokens=False,
                                 max_length=max_length * 3 // 4,
                                 truncation=True)
        passage_inputs = tokenizer(f'B: {passage}',
                                   return_tensors=None,
                                   add_special_tokens=False,
                                   max_length=max_length,
                                   truncation=True)
        item = tokenizer.prepare_for_model(
            [tokenizer.bos_token_id] + query_inputs['input_ids'],
            sep_inputs + passage_inputs['input_ids'],
            truncation='only_second',
            max_length=max_length,
            padding=False,
            return_attention_mask=False,
            return_token_type_ids=False,
            add_special_tokens=False
        )
        item['input_ids'] = item['input_ids'] + sep_inputs + prompt_inputs
        item['attention_mask'] = [1] * len(item['input_ids'])
        inputs.append(item)
    return tokenizer.pad(
        inputs,
        padding=True,
        max_length=max_length + len(sep_inputs) + len(prompt_inputs),
        pad_to_multiple_of=8,
        return_tensors='pt',
    )

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-gemma')
model = AutoModelForCausalLM.from_pretrained('BAAI/bge-reranker-v2-gemma')
yes_loc = tokenizer('Yes', add_special_tokens=False)['input_ids'][0]
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = get_inputs(pairs, tokenizer)
    # The score is the logit of the 'Yes' token at the last position
    scores = model(**inputs, return_dict=True).logits[:, -1, yes_loc].view(-1, ).float()
    print(scores)
```
For `BAAI/bge-reranker-v2-minicpm-layerwise`:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def get_inputs(pairs, tokenizer, prompt=None, max_length=1024):
    if prompt is None:
        prompt = "Given a query A and a passage B, determine whether the passage contains an answer to the query by providing a prediction of either 'Yes' or 'No'."
    sep = "\n"
    prompt_inputs = tokenizer(prompt,
                              return_tensors=None,
                              add_special_tokens=False)['input_ids']
    sep_inputs = tokenizer(sep,
                           return_tensors=None,
                           add_special_tokens=False)['input_ids']
    inputs = []
    for query, passage in pairs:
        query_inputs = tokenizer(f'A: {query}',
                                 return_tensors=None,
                                 add_special_tokens=False,
                                 max_length=max_length * 3 // 4,
                                 truncation=True)
        passage_inputs = tokenizer(f'B: {passage}',
                                   return_tensors=None,
                                   add_special_tokens=False,
                                   max_length=max_length,
                                   truncation=True)
        item = tokenizer.prepare_for_model(
            [tokenizer.bos_token_id] + query_inputs['input_ids'],
            sep_inputs + passage_inputs['input_ids'],
            truncation='only_second',
            max_length=max_length,
            padding=False,
            return_attention_mask=False,
            return_token_type_ids=False,
            add_special_tokens=False
        )
        item['input_ids'] = item['input_ids'] + sep_inputs + prompt_inputs
        item['attention_mask'] = [1] * len(item['input_ids'])
        inputs.append(item)
    return tokenizer.pad(
        inputs,
        padding=True,
        max_length=max_length + len(sep_inputs) + len(prompt_inputs),
        pad_to_multiple_of=8,
        return_tensors='pt',
    )

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2-minicpm-layerwise', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained('BAAI/bge-reranker-v2-minicpm-layerwise', trust_remote_code=True, torch_dtype=torch.bfloat16)
model = model.to('cuda')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs = get_inputs(pairs, tokenizer).to(model.device)
    all_scores = model(**inputs, return_dict=True, cutoff_layers=[28])
    # One score tensor per requested cutoff layer
    all_scores = [scores[:, -1].view(-1, ).float() for scores in all_scores[0]]
    print(all_scores)
```
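Requesting several cutoff layers yields one score set per layer; a minimal sketch, reusing `model` and `inputs` from above (assumption: the entries of the first output field line up with the requested layers in order, as in the loop above):

```python
with torch.no_grad():
    outputs = model(**inputs, return_dict=True, cutoff_layers=[24, 28])
    for layer, logits in zip([24, 28], outputs[0]):
        print(layer, logits[:, -1].view(-1, ).float())
```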
For `BAAI/bge-reranker-v2.5-gemma2-lightweight`:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def last_logit_pool(logits: torch.Tensor,
                    attention_mask: torch.Tensor) -> torch.Tensor:
    # Pick the logits at the last non-padding position of each sequence
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return logits[:, -1]
    else:
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = logits.shape[0]
        return torch.stack([logits[i, sequence_lengths[i]] for i in range(batch_size)], dim=0)

def get_inputs(pairs, tokenizer, prompt=None, max_length=1024):
    if prompt is None:
        prompt = "Predict whether passage B contains an answer to query A."
    sep = "\n"
    prompt_inputs = tokenizer(prompt,
                              return_tensors=None,
                              add_special_tokens=False)['input_ids']
    sep_inputs = tokenizer(sep,
                           return_tensors=None,
                           add_special_tokens=False)['input_ids']
    inputs = []
    query_lengths = []
    prompt_lengths = []
    for query, passage in pairs:
        query_inputs = tokenizer(f'A: {query}',
                                 return_tensors=None,
                                 add_special_tokens=False,
                                 max_length=max_length * 3 // 4,
                                 truncation=True)
        passage_inputs = tokenizer(f'B: {passage}',
                                   return_tensors=None,
                                   add_special_tokens=False,
                                   max_length=max_length,
                                   truncation=True)
        item = tokenizer.prepare_for_model(
            [tokenizer.bos_token_id] + query_inputs['input_ids'],
            sep_inputs + passage_inputs['input_ids'],
            truncation='only_second',
            max_length=max_length,
            padding=False,
            return_attention_mask=False,
            return_token_type_ids=False,
            add_special_tokens=False
        )
        item['input_ids'] = item['input_ids'] + sep_inputs + prompt_inputs
        item['attention_mask'] = [1] * len(item['input_ids'])
        inputs.append(item)
        query_lengths.append(len([tokenizer.bos_token_id] + query_inputs['input_ids'] + sep_inputs))
        prompt_lengths.append(len(sep_inputs + prompt_inputs))
    return tokenizer.pad(
        inputs,
        padding=True,
        max_length=max_length + len(sep_inputs) + len(prompt_inputs),
        pad_to_multiple_of=8,
        return_tensors='pt',
    ), query_lengths, prompt_lengths

tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-v2.5-gemma2-lightweight', trust_remote_code=True)
tokenizer.padding_side = 'right'
model = AutoModelForCausalLM.from_pretrained('BAAI/bge-reranker-v2.5-gemma2-lightweight', trust_remote_code=True)
model = model.to('cuda')
model.eval()

pairs = [['what is panda?', 'hi'], ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
with torch.no_grad():
    inputs, query_lengths, prompt_lengths = get_inputs(pairs, tokenizer)
    inputs = inputs.to(model.device)
    outputs = model(**inputs,
                    return_dict=True,
                    cutoff_layers=[28],
                    compress_ratio=2,
                    compress_layer=[24, 40],
                    query_lengths=query_lengths,
                    prompt_lengths=prompt_lengths)
    scores = []
    for i in range(len(outputs.logits)):
        logits = last_logit_pool(outputs.logits[i], outputs.attention_masks[i])
        scores.append(logits.cpu().float().tolist())
    print(scores)
```
If you have downloaded `bge-reranker-v2-minicpm-layerwise` locally, you can load it as follows:

- Make sure `configuration_minicpm_reranker.py` and `modeling_minicpm_reranker.py` from `BAAI/bge-reranker-v2-minicpm-layerwise` are in your local path.
- Modify the following part of `config.json`:

```json
"auto_map": {
    "AutoConfig": "configuration_minicpm_reranker.LayerWiseMiniCPMConfig",
    "AutoModel": "modeling_minicpm_reranker.LayerWiseMiniCPMModel",
    "AutoModelForCausalLM": "modeling_minicpm_reranker.LayerWiseMiniCPMForCausalLM"
},
```
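After these edits, loading from the local directory should pick up the local module files; a minimal sketch (the path is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

local_path = '/path/to/bge-reranker-v2-minicpm-layerwise'  # placeholder: your local directory
tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(local_path, trust_remote_code=True, torch_dtype=torch.bfloat16)
```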
Similarly, if you have downloaded `bge-reranker-v2.5-gemma2-lightweight` locally:

- Make sure `gemma_config.py` and `gemma_model.py` from `BAAI/bge-reranker-v2.5-gemma2-lightweight` are in your local path.
- Modify the following part of `config.json`:

```json
"auto_map": {
    "AutoConfig": "gemma_config.CostWiseGemmaConfig",
    "AutoModel": "gemma_model.CostWiseGemmaModel",
    "AutoModelForCausalLM": "gemma_model.CostWiseGemmaForCausalLM"
},
```
If you find this repository useful, please consider giving it a star ⭐ and a citation:
```bibtex
@misc{li2023making,
  title={Making Large Language Models A Better Foundation For Dense Retrieval},
  author={Chaofan Li and Zheng Liu and Shitao Xiao and Yingxia Shao},
  year={2023},
  eprint={2312.15503},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{chen2024bge,
  title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
  author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
  year={2024},
  eprint={2402.03216},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{li2024makingtextembeddersfewshot,
  title={Making Text Embedders Few-Shot Learners},
  author={Chaofan Li and MingHao Qin and Shitao Xiao and Jianlyu Chen and Kun Luo and Yingxia Shao and Defu Lian and Zheng Liu},
  year={2024},
  eprint={2409.15700},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2409.15700}
}
```