Add SMILES standardizer code #23

Merged: 13 commits merged on Sep 17, 2024

@shntnu (Contributor) commented Apr 6, 2024


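import pandas as pd

# Load the restandardized JUMP-CP compound library
df = pd.read_csv("/home/ec2-user/jump-cellpainting/3.standardize/standardize_ksiling_jumpmoa_jumptarget2/data/05_release/2022_10_18_JUMP-CP_compound_library_restandardized.csv", low_memory=False)
# Drop jcp2020_id so that drop_duplicates below ignores it
df = df.drop(columns=["jcp2020_id"])

# Keep rows whose new standardized InChI differs from the original standardized InChI
df2 = df.loc[df.InChI_standardized != df.InChI_standardized_orig].drop_duplicates()
# Transpose so each remaining record becomes a column (two records here)
df2 = df2.transpose()
df2.columns = ['X1', 'X2']
df2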
X1 X2
SMILES_original C=C(NC(=O)C(=C)NC(=O)c1csc(C2=NC3c4csc(n4)C4NC... C=C(NC(=O)C(=C)NC(=O)c1csc(C2=NC3c4csc(n4)C4NC...
SMILES_standardized CCC1NC(=O)C(C(C)O)NC(=O)c2csc(n2)C23CCC(c4nc(C... CCC1NC(=O)C(C(C)O)NC(=O)c2csc(n2)C23CCC(c4nc(C...
InChI_standardized InChI=1S/C72H85N19O18S5/c1-14-26(3)47-63(105)7... InChI=1S/C72H85N19O18S5/c1-14-26(3)47-63(105)7...
InChIKey_standardized AXHZBYJITSPJMH-UHFFFAOYSA-N AXHZBYJITSPJMH-UHFFFAOYSA-N
jcp2022_id JCP2022_091373 JCP2022_091373
pert_iname thiostrepton thiostrepton
InChIKey_orig NSFFHOGKXHRQEW-DVRIZHICSA-N NSFFHOGKXHRQEW-AIHSUZKVSA-N
InChIKey_standardized_orig UTBOEBCWXGDOGI-UHFFFAOYSA-N UTBOEBCWXGDOGI-UHFFFAOYSA-N
InChI_standardized_orig InChI=1S/C72H85N19O18S5/c1-14-26(3)47-63(105)7... InChI=1S/C72H85N19O18S5/c1-14-26(3)47-63(105)7...
jump_cp_control_type NaN NaN
pert_control_iname NaN NaN
Source[1] Broad Broad
Source[2] NaN NaN
Source[3] NaN NaN
Source[4] NaN NaN
Selection[1] T T
Selection[2] NaN NaN
Selection[3] NaN NaN
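Both records share the same new (AXHZBYJITSPJMH-UHFFFAOYSA-N) and old (UTBOEBCWXGDOGI-UHFFFAOYSA-N) standardized InChIKeys and differ in InChIKey_orig. A standard InChIKey has three hyphen-separated blocks: a 14-character hash of the connectivity layer, a 10-character block (an 8-character hash of the remaining layers such as stereochemistry and isotopes, followed by the standard flag and version characters), and a one-character protonation indicator. The helper below is a minimal sketch for reading the output; the names `split_inchikey` and `differing_blocks` are hypothetical and not part of this PR. Applied to the keys above, it suggests the two original records differ only outside the connectivity layer, while the new standardization changes the connectivity hash itself.

```python
from __future__ import annotations

# Hypothetical labels for the three blocks of a standard InChIKey
BLOCK_NAMES = ("connectivity", "other_layers", "protonation")


def split_inchikey(key: str) -> tuple[str, str, str]:
    """Split a standard 27-character InChIKey into its three blocks."""
    blocks = key.split("-")
    # Expect 14 + 10 + 1 characters separated by two hyphens
    if [len(b) for b in blocks] != [14, 10, 1]:
        raise ValueError(f"not a standard InChIKey: {key!r}")
    return blocks[0], blocks[1], blocks[2]


def differing_blocks(key_a: str, key_b: str) -> list[str]:
    """Return the names of the blocks in which two InChIKeys differ."""
    pairs = zip(BLOCK_NAMES, split_inchikey(key_a), split_inchikey(key_b))
    return [name for name, a, b in pairs if a != b]


# InChIKey_orig values of the two thiostrepton records
print(differing_blocks("NSFFHOGKXHRQEW-DVRIZHICSA-N", "NSFFHOGKXHRQEW-AIHSUZKVSA-N"))
# → ['other_layers']

# InChIKey_standardized vs InChIKey_standardized_orig (identical for both records)
print(differing_blocks("AXHZBYJITSPJMH-UHFFFAOYSA-N", "UTBOEBCWXGDOGI-UHFFFAOYSA-N"))
# → ['connectivity']
```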
df2.loc[df2.X1 != df2.X2]  # fields on which the two records differ
X1 X2
InChIKey_orig NSFFHOGKXHRQEW-DVRIZHICSA-N NSFFHOGKXHRQEW-AIHSUZKVSA-N
jump_cp_control_type NaN NaN
pert_control_iname NaN NaN
Source[2] NaN NaN
Source[3] NaN NaN
Source[4] NaN NaN
Selection[2] NaN NaN
Selection[3] NaN NaN
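Apart from InChIKey_orig, every row in this output is NaN in both columns. Those rows appear because pandas' element-wise `!=` treats NaN as unequal to itself, so InChIKey_orig is the only field that actually differs. A minimal NaN-aware variant, shown on a small hypothetical frame that mirrors three rows of the transposed df2, masks out positions where both values are missing:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring three rows of the transposed df2
df2 = pd.DataFrame(
    {
        "X1": ["NSFFHOGKXHRQEW-DVRIZHICSA-N", np.nan, "T"],
        "X2": ["NSFFHOGKXHRQEW-AIHSUZKVSA-N", np.nan, "T"],
    },
    index=["InChIKey_orig", "Source[2]", "Selection[1]"],
)

# Plain != reports a NaN/NaN pair as a difference
naive = df2.loc[df2.X1 != df2.X2]

# Treat positions where both values are missing as equal
both_missing = df2.X1.isna() & df2.X2.isna()
real_diff = df2.loc[(df2.X1 != df2.X2) & ~both_missing]

print(naive.index.tolist())      # ['InChIKey_orig', 'Source[2]']
print(real_diff.index.tolist())  # ['InChIKey_orig']
```

pandas' `DataFrame.compare` and `Series.compare` take the same view: matching NaNs are not reported as differences.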

@shntnu requested a review from @afermg on April 6, 2024 12:39
@shntnu changed the title "Add standardizer code" to "Add SMILES standardizer code" on Apr 6, 2024
@afermg (Collaborator) commented Apr 8, 2024

I think some HTML is leaking into the PR, such as " <style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; } ". I presume that is not relevant to the PR itself.

@shntnu (Contributor, Author) commented Apr 8, 2024

Yes, you can ignore the leak.

@afermg (Collaborator) left a review comment

It requires some minor adjustments, but it looks okay to me. Let me know if you think any of my comments are too cumbersome to implement. Also, I was able to install all dependencies using pip and lock them (see 0e53775; it uses Poetry, though). Would you consider adding that as an alternative dependency-management solution? Space is already very limited on the DGX, and conda is well known for its large environment sizes.

libs/smiles/environment.yml (outdated, resolved)
libs/smiles/src/smiles/standardize_smiles.py (outdated, resolved)
libs/smiles/src/smiles/standardize_smiles.py (outdated, resolved)
libs/smiles/environment.yml (outdated, resolved)
libs/smiles/src/smiles/standardize_smiles.py (outdated, resolved)
libs/smiles/src/smiles/standardize_smiles.py (outdated, resolved)
libs/smiles/test/test_standardize_smiles.py (outdated, resolved)
@afermg self-assigned this on May 1, 2024
@shntnu (Contributor, Author) commented Sep 17, 2024

@afermg Thank you for your thorough review! I'll go ahead and merge now.

@shntnu merged commit f8031fd into main on Sep 17, 2024
1 check passed
@shntnu deleted the smiles branch on September 17, 2024 00:56

2 participants