Drug_Discovery_Model #1123

from13 · 2024-10-15T08:53:13Z

Project: Drug Discovery Using Machine Learning
Objective: The goal of this project is to predict whether a chemical compound (molecule) has biological activity, which is a critical step in drug discovery. This approach helps identify potential drug candidates by screening large sets of chemical compounds for promising properties.

Concept:
In drug discovery, molecules are tested for their biological activity (i.e., the ability to interact with a biological target, such as a protein). Testing compounds in labs is costly and time-consuming, so machine learning can help filter out likely inactive compounds, thus accelerating the discovery of new drugs.

Steps Involved:

Dataset:

The dataset consists of chemical compounds represented in SMILES format (Simplified Molecular Input Line Entry System), which encodes the molecular structure as strings. It also contains labels indicating whether the molecule is biologically active (1) or inactive (0). Example dataset source: ZINC Database.

Molecular Fingerprint Representation:

Machine learning models cannot take molecular structures as input directly, so we convert each molecule into a numerical representation known as a Morgan fingerprint. Morgan fingerprints capture the presence of chemical substructures (specific patterns) within a molecule. These patterns are encoded into binary vectors, which serve as input features for machine learning models.

Random Forest Classifier:

We use the RandomForestClassifier from the scikit-learn library, a robust and widely used machine learning model for classification tasks. It works by building multiple decision trees and combining their outputs into a single prediction.

Model Training:

The molecular fingerprints (features) and biological activity labels (target) are split into training and testing sets. The model is trained on the training set and then evaluated on the testing set.

Prediction:

After training, we use the model to predict whether a new chemical compound (represented by a SMILES string) is likely to be biologically active or inactive.

How It Works:

SMILES to Fingerprints: SMILES strings are converted into molecular fingerprints using RDKit.
Training: The machine learning model is trained on these fingerprints along with activity labels (active/inactive).
Prediction: After training, the model can predict the biological activity of new molecules based on their chemical structures.
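
The steps described above could be sketched roughly as follows. This is a minimal illustration and not the code submitted in this PR: the helper names (`smiles_to_fingerprint`, `predict_activity`), the fingerprint settings (radius 2, 2048 bits), the 75/25 split, and the 100-tree forest are all assumptions. The eight molecules are real, but their 0/1 labels are arbitrary placeholders (1 simply marks an aromatic ring), not assay results.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumed fingerprint settings; the PR description does not state them.
_MORGAN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def smiles_to_fingerprint(smiles):
    """Return a 2048-bit Morgan fingerprint as a NumPy array, or None if parsing fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        return None
    return _MORGAN.GetFingerprintAsNumPy(mol)


def predict_activity(model, smiles):
    """Predict 1 (active) or 0 (inactive) for a single SMILES string."""
    fp = smiles_to_fingerprint(smiles)
    if fp is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return int(model.predict(fp.reshape(1, -1))[0])


# Placeholder data: labels are arbitrary (1 = contains an aromatic ring),
# NOT measured biological activity.
toy_data = [
    ("CCO", 0),                    # ethanol
    ("CC(=O)O", 0),                # acetic acid
    ("CCCC", 0),                   # butane
    ("CCN", 0),                    # ethylamine
    ("c1ccccc1", 1),               # benzene
    ("Cc1ccccc1", 1),              # toluene
    ("Oc1ccccc1", 1),              # phenol
    ("CC(=O)Oc1ccccc1C(=O)O", 1),  # aspirin
]

# Features: one fingerprint row per molecule. Target: the 0/1 labels.
X = np.array([smiles_to_fingerprint(s) for s, _ in toy_data])
y = np.array([label for _, label in toy_data])

# Hold out 25% of the molecules for evaluation, keeping both classes in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("Prediction for aniline:", predict_activity(model, "Nc1ccccc1"))
```

With only eight molecules the reported accuracy is not meaningful; the sketch only shows how the pieces connect. A real run would load SMILES strings and measured activity labels from a dataset file and drop any rows whose SMILES fail to parse before building the feature matrix.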


github-actions · 2024-10-15T08:53:25Z

Thank you for submitting your pull request! 🙌 We'll review it as soon as possible. If there are any specific instructions or feedback regarding your PR, we'll provide them here. Thanks again for your contribution! 😊

from13 · 2024-10-15T08:54:16Z

Issue number: #1123

Niketkumardheeryan · 2024-10-21T15:08:29Z

@from13 We don't accept .py or text files. Please submit an .ipynb file with a proper README file.

Niketkumardheeryan · 2024-11-05T05:19:40Z

You will need to create a new PR now.

from13 marked this pull request as draft October 15, 2024 08:53

from13 marked this pull request as ready for review October 15, 2024 08:54

Niketkumardheeryan closed this Nov 5, 2024
