Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Similarity kernel implementation? #150

Open
alejandroarche opened this issue Dec 16, 2024 · 1 comment
Open

Similarity kernel implementation? #150

alejandroarche opened this issue Dec 16, 2024 · 1 comment

Comments

@alejandroarche
Copy link

Hi all!

I am reading https://doi.org/10.1039/C6CP00415F and I would like to use a global similarity kernel, as defined in Equation 18, which basically gives information about mixed-species components:
Screenshot 2024-12-16 at 14 55 52

Is this feature available in dscribe? Or perhaps could you show me the corresponding source code to be modified? In the end it's just about adding a customised function that lives between 0 and 1. For instance;
Screenshot 2024-12-16 at 14 56 56

Many thanks in advance.

Best,
Alejandro

@lauri-codes
Copy link
Contributor

Hi @alejandroarche!

The SOAP output from dscribe contains all of the information needed for the vector $\vec{p}$ in that equation you shared, as explained in the documentation.

In that same documentation page, you can see how the output vector is ordered, and you can also get a part belonging to a specific $\alpha$, $\beta$ species pair with SOAP.get_.location(alpha, beta).

So if I understood the alchemical kernel correctly, you should be able to build it from the dscribe SOAP output. It would look something like this:

import numpy as np
from dscribe.descriptors import SOAP
from ase.build import molecule
from ase import Atoms

# Setup two systems that have completely different species.
system_1 = Atoms(symbols=['H'], positions=[[0,0,0]])
system_2 = Atoms(symbols=['O'], positions=[[0,0,0]])

# Set up the SOAP descriptor
species = ['H', 'O']
soap = SOAP(
    species=species,
    periodic=False,
    r_cut=2,
    sigma=0.5,
    n_max=6,
    l_max=6,
)

# Create power spectrums. A "normal" kernel would see these two SOAP vectors as
# completely different as they dont share any species.
p_1 = soap.create(system_1, centers=[0])
p_2 = soap.create(system_2, centers=[0])

# Define dummy "alchemical" kernel
kernel = {
    ("H", "H"): 1,
    ("H", "O"): 1,
    ("O", "H"): 1,
    ("O", "O"): 1,
}

# Calculate similarity between p_1 and p_2: our dummy kernel says that H and O
# are chemically identical, so the similarity will be very big (note that the
# kernel is not normalized).
similarity = 0
for alpha_1 in species:
    for beta_1 in species:
        for alpha_2 in species:
            for beta_2 in species:
                # TODO: Here I only use the same-species terms, which is a
                # major simplification. The same-species terms have a smaller
                # size than cross-species terms due to the fact that dscribe
                # does not store the symmetric elements. To have shapes that
                # are compatible for the dot-product, you would have to
                # re-create the dropped terms here manually. Maybe the SOAP
                # descriptor should have flag that disables the dropping out of
                # these terms? See also issue #48
                if alpha_1 == beta_1 and alpha_2 == beta_2:
                    similarity += np.dot(
                        p_1[0, soap.get_location([alpha_1, beta_1])],
                        p_2[0, soap.get_location([alpha_2, beta_2])]
                    ) * kernel[alpha_1, alpha_2] * kernel[beta_1, beta_2]

print(similarity)

The only catch is that the dscribe output has removed some of the symmetric elements, which don't carry any extra information. This means that the size of a same-species portion of the vector is actually smaller than a cross-species portion of the vector and you can't take their dot product directly. You would first need to need to reintroduce those dropped out parts manually. Issue #48 might help to understand this a bit more.

It should be possible to introduce a flag in SOAP that disables the dropping out of these symmetric parts. That would just require a bit of digging around in the codebase (see e.g. here). We would be more than happy to include such a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants