Similarity kernel implementation? #150

alejandroarche · 2024-12-16T14:57:13Z

Hi all!

I am reading https://doi.org/10.1039/C6CP00415F and I would like to use a global similarity kernel, as defined in Equation 18, which basically gives information about mixed-species components:

Is this feature available in dscribe? Or perhaps could you show me the corresponding source code to be modified? In the end it's just about adding a customised function that lives between 0 and 1. For instance;

Many thanks in advance.

Best,
Alejandro

lauri-codes · 2024-12-17T18:54:23Z

Hi @alejandroarche!

The SOAP output from dscribe contains all of the information needed for the vector $\vec{p}$ in that equation you shared, as explained in the documentation.

In that same documentation page, you can see how the output vector is ordered, and you can also get a part belonging to a specific $\alpha$, $\beta$ species pair with SOAP.get_.location(alpha, beta).

So if I understood the alchemical kernel correctly, you should be able to build it from the dscribe SOAP output. It would look something like this:

import numpy as np
from dscribe.descriptors import SOAP
from ase.build import molecule
from ase import Atoms

# Setup two systems that have completely different species.
system_1 = Atoms(symbols=['H'], positions=[[0,0,0]])
system_2 = Atoms(symbols=['O'], positions=[[0,0,0]])

# Set up the SOAP descriptor
species = ['H', 'O']
soap = SOAP(
    species=species,
    periodic=False,
    r_cut=2,
    sigma=0.5,
    n_max=6,
    l_max=6,
)

# Create power spectrums. A "normal" kernel would see these two SOAP vectors as
# completely different as they dont share any species.
p_1 = soap.create(system_1, centers=[0])
p_2 = soap.create(system_2, centers=[0])

# Define dummy "alchemical" kernel
kernel = {
    ("H", "H"): 1,
    ("H", "O"): 1,
    ("O", "H"): 1,
    ("O", "O"): 1,
}

# Calculate similarity between p_1 and p_2: our dummy kernel says that H and O
# are chemically identical, so the similarity will be very big (note that the
# kernel is not normalized).
similarity = 0
for alpha_1 in species:
    for beta_1 in species:
        for alpha_2 in species:
            for beta_2 in species:
                # TODO: Here I only use the same-species terms, which is a
                # major simplification. The same-species terms have a smaller
                # size than cross-species terms due to the fact that dscribe
                # does not store the symmetric elements. To have shapes that
                # are compatible for the dot-product, you would have to
                # re-create the dropped terms here manually. Maybe the SOAP
                # descriptor should have flag that disables the dropping out of
                # these terms? See also issue #48
                if alpha_1 == beta_1 and alpha_2 == beta_2:
                    similarity += np.dot(
                        p_1[0, soap.get_location([alpha_1, beta_1])],
                        p_2[0, soap.get_location([alpha_2, beta_2])]
                    ) * kernel[alpha_1, alpha_2] * kernel[beta_1, beta_2]

print(similarity)

The only catch is that the dscribe output has removed some of the symmetric elements, which don't carry any extra information. This means that the size of a same-species portion of the vector is actually smaller than a cross-species portion of the vector and you can't take their dot product directly. You would first need to need to reintroduce those dropped out parts manually. Issue #48 might help to understand this a bit more.

It should be possible to introduce a flag in SOAP that disables the dropping out of these symmetric parts. That would just require a bit of digging around in the codebase (see e.g. here). We would be more than happy to include such a PR.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Similarity kernel implementation? #150

Similarity kernel implementation? #150

alejandroarche commented Dec 16, 2024

lauri-codes commented Dec 17, 2024

Similarity kernel implementation? #150

Similarity kernel implementation? #150

Comments

alejandroarche commented Dec 16, 2024

lauri-codes commented Dec 17, 2024