Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add first draft of nomeclature converters #10

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open

Conversation

MrtinoRG
Copy link
Collaborator

@MrtinoRG MrtinoRG commented Dec 6, 2024

No description provided.

@MrtinoRG MrtinoRG requested a review from kjappelbaum December 6, 2024 13:04
@MrtinoRG MrtinoRG marked this pull request as draft December 6, 2024 15:00
@MrtinoRG
Copy link
Collaborator Author

MrtinoRG commented Dec 6, 2024

I still have to add the functions here to the modal app

@kjappelbaum
Copy link
Collaborator

For the OPSIN calls http://cinfony.github.io might be worth a trial

return None

@backoff.on_exception(backoff.expo, (aiohttp.ClientError, asyncio.TimeoutError), max_time=10, logger=logger)
async def opsin(self) -> Optional[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we could also run it locally (see the other comment). It should certainly also give more throughput.

Comment on lines 139 to 164
@backoff.on_exception(backoff.expo, (aiohttp.ClientError, asyncio.TimeoutError), max_time=10, logger=logger)
async def unknown(self) -> Optional[str]:
"""
Queries the Unknown API to convert chemical name to SMILES.

Returns:
str: SMILES notation if successful, None otherwise

Raises:
aiohttp.ClientError: On API connection errors
asyncio.TimeoutError: If request times out
"""
url = f"http://46.4.119.202:8082/?name=%7B{self.name}%7D&what=smiles"

async with aiohttp.ClientSession() as session:
try:
async with session.get(url, timeout=self.timeout) as response:
if response.status == 200:
message_content = await response.text()
message_text = message_content.split('Message:')[1].strip()
return message_text
logger.warning(f"Unknown API failed with status {response.status}")
return None
except Exception as e:
logger.error(f"Unknown API error: {e}")
return None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we should make this one public

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about the current solution? Maybe it can be refined so anyone can use it with a local database or a private endpoint

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice tests!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice that you also mocked some parts!

@MrtinoRG MrtinoRG marked this pull request as ready for review December 14, 2024 17:21
@@ -4,6 +4,20 @@
get_number_of_topologically_distinct_atoms as _get_topologically_distinct_atoms,
get_element_info as _get_element_info,
)
from chemenv.tools.converters import (
_converters_image,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what was the reason for making all of them semantically private (leading _)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that most of the functions of the app are called the same as the working functions, or they were at some point

Comment on lines 57 to 60
# Basic usage with IUPAC name
>>> converter = Name2Smiles("2-propanone")
>>> await converter.get_smiles()
'CC(=O)C'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be indented 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Thanks!

logger.error(f"Error encoding name: {e}")
raise ValueError(f"Invalid chemical name: {name}")

self.timeout = 10 # seconds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make this customizable via the constructor? Or is this a very robust setting that we will not often touch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 6365350 they are customizable

raise e

async def get_smiles(self) -> Optional[str]:
"""Query all APIs in parallel until a valid SMILES is found.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey - this is super cool! :)

raise ValueError(f"Invalid SMILES: {smiles}")

self.smiles = smiles
self.timeout = 10 # seconds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar question as above: should we move the timeout to the constructor?

Query PubChem API to get IUPAC name from SMILES.

Returns:
Optional[str]: IUPAC name if found, None if the query failed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Is this the correct syntax for optional arguments in this docstring style?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unified into the Google PEP 484 type annotations in 6365350

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And thanks for the note!!

Takes a SMILES and return the DeepSMILES encoding.

Args:
smiles: SMILES string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a micro-nitpick is that you could still add the type annotations to some of those docstrings

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the medium term (a separate PR), it would probably be cool to have some RXN-SMILES converters. Those might not be useful for solving chembench but for your RXN project.

Comment on lines +205 to +207
mock_cactus = AsyncMock(return_value="ethanol")
mock_pubchem = AsyncMock(return_value=None)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very cool that you have such tests!

@kjappelbaum kjappelbaum self-requested a review January 20, 2025 05:11
@MrtinoRG MrtinoRG requested a review from n0w0f January 20, 2025 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants