Getting nan score for spec2vec_similarity function #87

Open
r00bit opened this issue Nov 18, 2022 · 3 comments

@r00bit

r00bit commented Nov 18, 2022

Hi,
I am using two similarity functions to compute similarity scores between the spectra of two files:

  1. Spec2vec: scores = calculate_scores(ref_spectrums, query_spectrums, spec2vec_similarity)
  2. CosineGreedy: scores = calculate_scores(references=spectrums1, queries=spectrums2, similarity_function=CosineGreedy())

However, I get nan scores for Spec2vec, which I think is caused by the low similarity between the pairs of spectra.
For example, these are the results of the two similarity functions for one pair of spectra:

  1. Spec2vec
    Reference scan id: F1:2478
    Query scan id: 3350
    Score: [nan]

  2. CosineGreedy
    Reference scan id: F1:2478
    Query scan id: 3850
    Score: [0.004275957907034389, 4]

I tried raising the allowed missing percentage from 5 to a higher value, but it didn't work.
Could you please tell me how I can get a score rather than nan when applying the Spec2vec similarity function?
Thanks!
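
As a rough illustration of what an "allowed missing percentage" threshold refers to, the sketch below computes an intensity-weighted share of peaks that a model has never seen. This is not the spec2vec implementation; the function name `missing_percentage`, the word format `peak@100.00`, and the exact weighting are assumptions made for the example.

```python
def missing_percentage(words, weights, vocabulary, intensity_weighting_power=0.5):
    """Intensity-weighted percentage of a document's words absent from the vocabulary.

    Hypothetical helper for illustration only, not part of spec2vec.
    """
    # Raise each peak weight (intensity) to the chosen power, as a weighting step.
    raised = [w ** intensity_weighting_power for w in weights]
    total = sum(raised)
    if total == 0:
        return 0.0
    # Sum the weights of the words the model does not know.
    missing = sum(r for word, r in zip(words, raised) if word not in vocabulary)
    return 100.0 * missing / total


# Toy data (made up): the model knows two of the four peaks.
vocabulary = {"peak@100.00", "peak@150.00"}
words = ["peak@100.00", "peak@150.00", "peak@200.00", "peak@250.00"]
weights = [1.0, 0.25, 0.25, 0.25]

pct = missing_percentage(words, weights, vocabulary)
print(f"{pct:.1f}% of the weighted peaks are missing from the model")
```

If a value computed along these lines exceeds the threshold, the spectrum is poorly covered by the model, so raising the threshold only hides the coverage problem rather than fixing it.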
@florian-huber
Member

This could be multiple things. Usually I would expect the score to be 0 if something went wrong.

How did you get the score?

@r00bit
Author

r00bit commented Dec 2, 2022

Here is the code I used to calculate the similarity score for two files containing 5 spectra (just for test):

from matchms import calculate_scores
from spec2vec import Spec2Vec

# load_data, create_spectrum_documents and create_model are helper functions not shown here.

def calculate_similarity_spec2vec(ref_file, query_file, model_file):
    # Load reference spectrums and create spectrum documents
    ref_spectrums = load_data(ref_file)
    ref_documents = create_spectrum_documents(ref_spectrums)

    # Load query spectrums and create spectrum documents
    query_spectrums = load_data(query_file)
    query_documents = create_spectrum_documents(query_spectrums)

    # Build model (here from the query documents)
    # model = create_model(ref_documents, model_file)
    model = create_model(query_documents, model_file)

    # Define similarity function
    spec2vec_similarity = Spec2Vec(model=model, intensity_weighting_power=0.5,
                                   allowed_missing_percentage=5.0)

    # Calculate scores on all combinations of reference and query spectrums
    # scores = calculate_scores(ref_documents, query_spectrums, spec2vec_similarity)
    scores = calculate_scores(ref_spectrums, query_spectrums, spec2vec_similarity)
    return scores

scores is an ndarray with shape (5, 5) containing 'nan' values.
When I set allowed_missing_percentage to .98, the scores are high, which is not correct.
When the ref and query files are the same, it returns high scores.
The similarity scores returned by the ModifiedCosine similarity function are not high for any pair of spectra; the values are between 0 and 0.03.
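
One thing that may be worth checking: the code above builds the model from query_documents only, so peaks that occur only in the reference spectra would presumably be unknown to the model. The sketch below estimates that gap with plain Python. The function name `coverage_report`, the toy documents, and the `peak@...` word format are assumptions for illustration, not spec2vec API.

```python
def coverage_report(training_documents, other_documents):
    """Per document in other_documents, the fraction of its words unseen in training.

    Each document is assumed to be a plain list of word strings.
    """
    # Vocabulary = every word that appears in at least one training document.
    vocabulary = {word for doc in training_documents for word in doc}
    report = []
    for doc in other_documents:
        unseen = [word for word in doc if word not in vocabulary]
        report.append(len(unseen) / len(doc) if doc else 0.0)
    return report


# Toy documents (made up): the "model" is trained on the query documents only.
query_docs = [["peak@100.00", "peak@150.00"], ["peak@150.00", "peak@300.00"]]
ref_docs = [["peak@100.00", "peak@410.10"], ["peak@512.30", "peak@620.45"]]

report = coverage_report(query_docs, ref_docs)
print(report)  # fraction of unseen words for each reference document
```

If most reference words turn out to be unseen, building the model from both the reference and the query documents (or using a model trained on a much larger library) is one option to consider before touching allowed_missing_percentage.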

@r00bit
Author

r00bit commented Jan 5, 2023

I learned that this happens because the similarity between the spectra is low, so the missing percentage computed in the _check_model_coverage function is larger than allowed_missing_percentage. For example, for the files I tested, the calculated missing percentage was around 86, so I set allowed_missing_percentage to 88. With that setting, however, it calculated high similarity scores, which is not correct:
[[       nan        nan        nan        nan        nan]
 [0.90276866 0.89489005 0.93134088 0.91878785 0.92963196]
 [0.91237498 0.90594655 0.94265932 0.92595671 0.94527687]
 [0.92896287 0.92476147 0.95530618 0.93518096 0.95431371]
 [0.93886538 0.92957287 0.95451101 0.93058971 0.95301684]]
I think there is no way to fix this, right?
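
In the matrix above, the nan values are confined to a single row, which hints that one particular spectrum is affected rather than every pair. A small sketch for locating such rows follows. It assumes that rows correspond to reference spectra, as in calculate_scores(references, queries); the helper name `all_nan_rows` is made up for this example.

```python
import math


def all_nan_rows(scores):
    """Return the indices of rows whose entries are all nan."""
    return [
        i for i, row in enumerate(scores)
        if len(row) > 0 and all(math.isnan(value) for value in row)
    ]


nan = float("nan")
# Score matrix copied from the comment above.
scores = [
    [nan, nan, nan, nan, nan],
    [0.90276866, 0.89489005, 0.93134088, 0.91878785, 0.92963196],
    [0.91237498, 0.90594655, 0.94265932, 0.92595671, 0.94527687],
    [0.92896287, 0.92476147, 0.95530618, 0.93518096, 0.95431371],
    [0.93886538, 0.92957287, 0.95451101, 0.93058971, 0.95301684],
]

rows = all_nan_rows(scores)
print(rows)
```

Inspecting the peaks of the spectrum at the reported index (for instance, checking whether any of them are known to the model) could narrow down where the nan comes from.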
