Drug_Discovery_Model #1123

Closed
wants to merge 1 commit into from

@from13 from13 commented Oct 15, 2024

Project: Drug Discovery Using Machine Learning
Objective: The goal of this project is to predict whether a chemical compound (molecule) has biological activity, which is a critical step in drug discovery. This approach helps identify potential drug candidates by screening large sets of chemical compounds for promising properties.

Concept:
In drug discovery, molecules are tested for their biological activity (i.e., the ability to interact with a biological target, such as a protein). Testing compounds in labs is costly and time-consuming, so machine learning can help filter out likely inactive compounds, thus accelerating the discovery of new drugs.

Steps Involved:
Dataset:

The dataset consists of chemical compounds represented in SMILES format (Simplified Molecular Input Line Entry System), which encodes the molecular structure as strings.
It also contains labels indicating whether the molecule is biologically active (1) or inactive (0).
Example dataset source: ZINC Database.
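As a rough illustration of the dataset layout described above, the sketch below parses a small inline CSV into SMILES strings and 0/1 labels. The column names `smiles` and `active` are assumptions, and the labels are invented placeholders rather than real assay results.

```python
import csv
import io

# Toy stand-in for a real dataset file. The column names are hypothetical and
# the labels are invented placeholders, not measured activity data.
RAW_CSV = """smiles,active
CCO,0
c1ccccc1,0
CC(=O)Oc1ccccc1C(=O)O,1
CN1C=NC2=C1C(=O)N(C(=O)N2C)C,1
"""


def load_records(text):
    """Parse CSV text into (smiles, label) pairs with integer labels."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["smiles"], int(row["active"])) for row in reader]


records = load_records(RAW_CSV)
smiles_list = [smiles for smiles, _ in records]
labels = [label for _, label in records]
```

For a real file, the same function could be fed the contents of the file instead of the inline string.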
Molecular Fingerprint Representation:

Machine learning models cannot take molecular structures as input directly, so we convert each molecule into a numerical representation known as a Morgan fingerprint.
Morgan fingerprints capture the presence of chemical substructures (specific patterns) within a molecule. These patterns are encoded into binary vectors, which serve as input features for machine learning models.
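One way this conversion might look with RDKit is sketched below. The radius of 2 and the 2048-bit length are common choices assumed here, not values stated in the PR, and `smiles_to_fingerprint` is a hypothetical helper name.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

# Radius and vector length are assumptions; the PR does not specify them.
_generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def smiles_to_fingerprint(smiles):
    """Return a 0/1 NumPy vector for a SMILES string, or None if it cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for invalid SMILES
        return None
    return _generator.GetFingerprintAsNumPy(mol)


fp = smiles_to_fingerprint("CCO")  # ethanol
```

Each set bit corresponds to a hashed circular substructure, so two molecules that share substructures will share some set bits.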
Random Forest Classifier:

We use the RandomForestClassifier from the scikit-learn library, a robust and widely used model for classification tasks. It trains many decision trees on bootstrap samples of the data and averages their predictions into a single, more stable prediction.
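To make the "many trees, one combined output" idea concrete, the sketch below fits a small forest on synthetic binary vectors and compares the forest's class probabilities with the mean of the individual trees' probabilities. The data and the labeling rule are made up purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for fingerprint bits; the labeling rule is arbitrary.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 64))
y = (X[:, 0] | X[:, 1]).astype(int)

forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X, y)

# Average the per-tree class probabilities by hand.
tree_probs = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
averaged = tree_probs.mean(axis=0)

# The forest's own probabilities for the same inputs.
forest_probs = forest.predict_proba(X)
```

In scikit-learn the predicted class is the one with the highest averaged probability, which is why `averaged` and `forest_probs` are expected to agree.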
Model Training:

The molecular fingerprints (features) and biological activity labels (target) are split into training and testing sets.
The model is trained on the training set and then evaluated on the testing set.
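A minimal sketch of the split, train, and evaluate sequence follows. It uses synthetic binary features in place of real fingerprints; the 80/20 split, the stratification, and accuracy as the metric are assumptions, since the PR does not name them.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic features and labels standing in for fingerprints and activity.
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(300, 128))
y = (X[:, :3].sum(axis=1) >= 2).astype(int)

# Hold out 20% of the data; stratify keeps the class balance similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate only on compounds the model has not seen during training.
accuracy = accuracy_score(y_test, model.predict(X_test))
```

For imbalanced activity data, a metric such as ROC AUC or precision/recall may be more informative than accuracy.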
Prediction:

After training, we use the model to predict whether a new chemical compound (represented by a SMILES string) is likely to be biologically active or inactive.
How It Works:
SMILES to Fingerprints: SMILES strings are converted into molecular fingerprints using RDKit.
Training: The machine learning model is trained on these fingerprints along with activity labels (active/inactive).
Prediction: After training, the model can predict the biological activity of new molecules based on their chemical structures.
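The three steps above can be sketched end to end as follows. This is not the code from the PR: the training molecules and their labels are invented placeholders, `featurize` and `predict_activity` are hypothetical helper names, and the fingerprint settings are assumptions.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
from sklearn.ensemble import RandomForestClassifier

# Fingerprint settings are assumptions, not taken from the PR.
generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=1024)


def featurize(smiles):
    """SMILES to fingerprint: convert a SMILES string into a 0/1 vector."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return generator.GetFingerprintAsNumPy(mol)


# Toy training set with invented labels (1 = "active"), for illustration only.
train = [
    ("CCO", 0), ("CCCO", 0), ("CCCCO", 0),
    ("c1ccccc1O", 1), ("Cc1ccccc1O", 1), ("c1ccc(O)cc1C", 1),
]
X = np.array([featurize(smiles) for smiles, _ in train])
y = np.array([label for _, label in train])

# Training: fit the classifier on fingerprints and labels.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)


def predict_activity(smiles):
    """Prediction: return (label, probability of class 1) for a new molecule."""
    features = featurize(smiles).reshape(1, -1)
    label = int(model.predict(features)[0])
    probability = float(model.predict_proba(features)[0, 1])
    return label, probability


label, probability = predict_activity("CCc1ccccc1O")
```

With six training molecules the output carries no real meaning; the point is only the shape of the pipeline.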

Thank you for submitting your pull request! 🙌 We'll review it as soon as possible. If there are any specific instructions or feedback regarding your PR, we'll provide them here. Thanks again for your contribution! 😊

@from13 from13 marked this pull request as draft October 15, 2024 08:53
@from13 from13 marked this pull request as ready for review October 15, 2024 08:54
@from13
Author

from13 commented Oct 15, 2024

Issue number: #1123

@Niketkumardheeryan
Owner

@from13 we don't accept .py or text files; please submit an .ipynb file with a proper README file.

@Niketkumardheeryan
Owner

You need to create a new PR now.
