
Releases: swansonk14/chemfunc

TPSA

17 Jan 00:11
a931b8c

Added TPSA (topological polar surface area) as a molecular property for compute_properties.

SDF to SMARTS and setup.py to pyproject.toml

19 Sep 23:12
b141d05

SDF to SMARTS

Added the function sdf_to_smarts, which behaves just like sdf_to_smiles but converts an SDF file to SMARTS instead of SMILES. This required refactoring the code to create a new convert_sdf function, which both sdf_to_smiles and sdf_to_smarts call with different parameters.

setup.py to pyproject.toml

Refactored the code base to replace the setup.py (using setuptools) with pyproject.toml (using hatchling).

PAINS filter

08 Jul 17:39
74706e4

Added filters for PAINS and other unwanted substructures as the "pains_plus" property in chemfunc compute_properties.

MCS denominator options

21 Jun 16:53
3aee73f

The maximum common substructure (MCS) similarity calculation in molecular_similarities.py now offers additional options for its denominator. The denominator is specified with --denominator <denominator>, where <denominator> is one of the following three options.

mol_1: similarity = (MCS size) / (number of atoms in mol_1)
mol_2: similarity = (MCS size) / (number of atoms in mol_2)
avg: similarity = 0.5 * [(MCS size) / (number of atoms in mol_1) + (MCS size) / (number of atoms in mol_2)]

The previous definition was mol_2, so mol_2 is the default.
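
As an illustration of the three options above, here is a minimal sketch of the arithmetic only. The function name mcs_similarity and its arguments are hypothetical and are not taken from molecular_similarities.py; it assumes the MCS size and the atom counts have already been computed.

```python
def mcs_similarity(
    mcs_size: int,
    num_atoms_1: int,
    num_atoms_2: int,
    denominator: str = "mol_2",
) -> float:
    """Hypothetical helper: MCS similarity for a given denominator option.

    mcs_size is the number of atoms in the maximum common substructure;
    num_atoms_1 and num_atoms_2 are the atom counts of mol_1 and mol_2.
    """
    if denominator == "mol_1":
        return mcs_size / num_atoms_1
    if denominator == "mol_2":
        return mcs_size / num_atoms_2
    if denominator == "avg":
        # Mean of the two one-sided similarities.
        return 0.5 * (mcs_size / num_atoms_1 + mcs_size / num_atoms_2)
    raise ValueError(f"Unknown denominator: {denominator!r}")


# An MCS of 6 atoms shared by an 8-atom mol_1 and a 12-atom mol_2.
for option in ("mol_1", "mol_2", "avg"):
    print(option, mcs_similarity(6, 8, 12, denominator=option))
```

With mol_1 or mol_2 the similarity is asymmetric (swapping the two molecules changes the value), whereas avg gives the same value in either order.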

Improved MCS and Regression to Classification

26 May 21:26
e4737ee

MCS

The maximum common substructure (MCS) similarity function in molecular_similarities.py now accepts additional parameters for modifying the MCS calculation. Specifically, it now allows for match_valences, ring_matches_ring_only, and complete_rings_only (see https://www.rdkit.org/docs/source/rdkit.Chem.MCS.html). These are also accessible via the command line when running chemfunc nearest_neighbor.

Regression to Classification

The regression_to_classification.py script now includes a delete_class_indices flag to delete certain class indices. The primary use case is for building binary classification datasets with a gap between the active and inactive categories. For example, setting thresholds = [0.4, 0.6] and delete_class_indices = {1} will label data < 0.4 as 0 and data >= 0.6 as 1 (originally labeled 2) and will delete data in between 0.4 and 0.6 (originally labeled 1).
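
The labeling and deletion logic in the example above can be sketched as follows. This is an illustrative reimplementation in plain Python, not the code in regression_to_classification.py: the function name and signature are hypothetical, and placing a value exactly equal to a threshold in the higher class is an assumption inferred from the "< 0.4" and ">= 0.6" wording.

```python
from bisect import bisect_right


def label_with_gap(values, thresholds, delete_class_indices=frozenset()):
    """Hypothetical sketch: bin values into classes, drop some classes,
    and relabel the remaining classes with consecutive indices."""
    thresholds = sorted(thresholds)
    num_classes = len(thresholds) + 1

    # Map each surviving original class index to its new, consecutive index.
    kept = [c for c in range(num_classes) if c not in delete_class_indices]
    relabel = {old: new for new, old in enumerate(kept)}

    labeled = []
    for value in values:
        # Class index = number of thresholds that are <= value.
        class_index = bisect_right(thresholds, value)
        if class_index in relabel:
            labeled.append((value, relabel[class_index]))
    return labeled


data = [0.1, 0.4, 0.5, 0.6, 0.9]
print(label_with_gap(data, thresholds=[0.4, 0.6], delete_class_indices={1}))
# [(0.1, 0), (0.6, 1), (0.9, 1)]
```

Here 0.4 and 0.5 fall into the original class 1 and are removed, while the original class 2 is relabeled as 1, leaving a binary dataset with a gap between the two categories.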

SDF to SMILES Properties

26 May 03:31
05a1dae

SDF to SMILES

This release primarily modifies the sdf_to_smiles function. Previously, the user had to specify which properties to extract from the SDF file (along with the SMILES) using the properties flag. That option remains, but the user can now alternatively request that all properties be extracted from the SDF with the all_properties flag. Additionally, the user can now specify the name of the CSV column that will contain the SMILES using the smiles_column flag (previously it was hard-coded to "smiles").

Morgan fingerprints

This release also changes the data type of Morgan fingerprints from bool to np.float32. This brings them in line with the RDKit fingerprints, which are already np.float32, and helps avoid issues when using Morgan fingerprints in ML models that expect float-type vectors.

Fixing t-SNE

29 Dec 16:36
269c7c9

This release fixes an issue in the plot_tsne.py script, where the TSNE object was initialized with a now-deprecated parameter called square_distances. The script no longer passes this parameter.

Save Fingerprints Script

13 Jul 23:26
429c55f

Added a save_fingerprints script to compute fingerprints (RDKit or Morgan) from the SMILES in a CSV file and save them as an NPZ file. The computation is done in parallel for speed.

Also fixed a version issue between scipy and descriptastorus.

Minor Fixes

02 Jun 15:52
72b0626

Fixed RDKit fingerprints with NumPy version >= 1.24.0 and fixed metrics for the nearest neighbor scripts.

Fixing SA Score Import

01 Jun 19:02
d560ae5

Fixed the SA score import.