Getting nan score for spec2vec_similarity function #87

r00bit · 2022-11-18T21:13:01Z

Hi,
I am using two similarity functions to compute similarity scores between the spectra of two files:

Spec2vec: scores = calculate_scores(ref_spectrums, query_spectrums, spec2vec_similarity)
CosineGreedy: scores = calculate_scores(references=spectrums1, queries=spectrums2, similarity_function=CosineGreedy())
However, I get nan scores for Spec2Vec, which I think is because of the low similarity between the pairs of spectra.
For example, these are the results of the two similarity functions for a pair of spectra:
spec2vec
Reference scan id: F1:2478
Query scan id: 3350
Score: [nan]

CosineGreedy
Reference scan id: F1:2478
Query scan id: 3850
Score: [0.004275957907034389, 4]
I tried raising the allowed missing percentage from 5 to a higher value, but it didn't help.
Could you please tell me how I can get a score rather than nan from the Spec2Vec similarity function?
Thanks!

florian-huber · 2022-12-02T12:35:23Z

This could be multiple things. Usually I would expect the score to be 0 if something went wrong.

How did you get the score?

r00bit · 2022-12-02T17:02:25Z

Here is the code I used to calculate the similarity scores for two files containing 5 spectra each (just for testing):

from matchms import calculate_scores
from spec2vec import Spec2Vec

# load_data, create_spectrum_documents and create_model are helper functions defined elsewhere (not shown)

def calculate_similarity_spec2vec(ref_file, query_file, model_file):
    # Load reference and query spectrums
    ref_spectrums = load_data(ref_file)
    query_spectrums = load_data(query_file)

    # Create spectrum documents
    ref_documents = create_spectrum_documents(ref_spectrums)
    query_documents = create_spectrum_documents(query_spectrums)

    # Build model (currently from the query documents only)
    #model = create_model(ref_documents, model_file)
    model = create_model(query_documents, model_file)

    # Define similarity function
    spec2vec_similarity = Spec2Vec(model=model, intensity_weighting_power=0.5,
                                   allowed_missing_percentage=5.0)

    # Calculate scores on all combinations of reference and query spectrums
    #scores = calculate_scores(ref_documents, query_spectrums, spec2vec_similarity)
    scores = calculate_scores(ref_spectrums, query_spectrums, spec2vec_similarity)

scores is an ndarray with shape (5, 5) containing nan values.
When I set allowed_missing_percentage to .98, the scores are high, which is not correct.
When the reference and query files are the same, it returns high scores.
The similarity scores returned by the ModifiedCosine similarity function are not high for any pair of spectra; the values are between 0 and 0.03.
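
To see whether every pair fails or only certain spectra, it can help to locate the nan entries first. The following is a minimal sketch with plain NumPy, assuming the scores are available as a 2-D float array. The name `scores_array` and its values are made up for illustration and are not taken from the files discussed here.

```python
import numpy as np

# Hypothetical 3x3 score matrix (rows: references, columns: queries).
scores_array = np.array([
    [np.nan, np.nan, np.nan],
    [0.12, 0.40, 0.33],
    [0.08, np.nan, 0.21],
])

nan_mask = np.isnan(scores_array)

# References (rows) for which every score is nan.
all_nan_rows = np.where(nan_mask.all(axis=1))[0]
# Queries (columns) for which at least one score is nan.
any_nan_cols = np.where(nan_mask.any(axis=0))[0]

print("rows that are entirely nan:", all_nan_rows.tolist())
print("columns containing nan:", any_nan_cols.tolist())
print("number of nan scores:", int(nan_mask.sum()))
```

If whole rows or columns are nan, the problem is more likely tied to individual spectra than to the pairwise comparison itself.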

r00bit · 2023-01-05T20:32:32Z

I learned that this issue occurs because there is low similarity between the spectra, so missing_percentage is larger than allowed_missing_percentage in the _check_model_coverage function. For example, for the files I tested, the calculated missing percentage was around 86, so I set allowed_missing_percentage to 88, but then it calculated high similarity scores, which is not correct:
[[ nan nan nan nan nan]
[0.90276866 0.89489005 0.93134088 0.91878785 0.92963196]
[0.91237498 0.90594655 0.94265932 0.92595671 0.94527687]
[0.92896287 0.92476147 0.95530618 0.93518096 0.95431371]
[0.93886538 0.92957287 0.95451101 0.93058971 0.95301684]]
I think there is no way to fix it, right?
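
The coverage check mentioned above can be approximated outside the library to see how much of each spectrum the model actually knows. The sketch below is not spec2vec's actual `_check_model_coverage` code. It assumes the missing percentage is the share of a spectrum's weighted intensity carried by words (peaks) that are absent from the model vocabulary, with weights raised to `intensity_weighting_power`. The function name `weighted_missing_percentage` and the example words and weights are hypothetical.

```python
import numpy as np

def weighted_missing_percentage(words, weights, vocabulary,
                                intensity_weighting_power=0.5):
    """Percentage of power-weighted intensity carried by words not in `vocabulary`.

    Illustrative re-implementation under the stated assumptions, not library code.
    """
    raised = np.asarray(weights, dtype=float) ** intensity_weighting_power
    missing = np.array([w not in vocabulary for w in words], dtype=bool)
    total = raised.sum()
    if total == 0:
        return 0.0
    return 100.0 * raised[missing].sum() / total

# Hypothetical document: four words with relative intensities as weights.
words = ["peak@100.00", "peak@150.05", "peak@200.10", "peak@250.20"]
weights = [1.0, 0.25, 0.25, 0.0625]
vocabulary = {"peak@100.00"}  # assume the model knows only one of the four words

pct = weighted_missing_percentage(words, weights, vocabulary)
print(f"missing percentage: {pct:.1f}")
```

In the code posted earlier in this thread the model is built from the query documents only, so words that occur only in the reference spectra would presumably count as missing under a check of this kind.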
