Commit

Luthaf committed Aug 28, 2024
0 parents commit 1dc9ae8
Showing 23,282 changed files with 8,222,994 additions and 0 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
Empty file added .nojekyll
Empty file.
1 change: 1 addition & 0 deletions CNAME
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
docs.metatensor.org
4 changes: 4 additions & 0 deletions README.md
@@ -0,0 +1,4 @@
# metatensor-docs

Documentation website for metatensor. This is in a separate repository to limit
the size of the main repository.
11 changes: 11 additions & 0 deletions index.html
@@ -0,0 +1,11 @@
<!DOCTYPE html>
<html>

<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
<meta http-equiv="refresh" content="0;URL=latest/index.html" />
</head>

<body></body>
</html>
4 changes: 4 additions & 0 deletions latest/.buildinfo
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 28d93f123c0060d0856cc677117cf63a
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,237 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n\n# Handling sparsity\n\nThe one sentence introduction to metatensor mentions that this is a \"self-describing\n**sparse** tensor data format\". The `previous tutorial <core-tutorial-first-steps>`\nexplained the self-describing part of the format, and in this tutorial we will explore\nwhat makes metatensor a sparse format, and how to remove the corresponding sparsity when\nrequired.\n\nLike in the `previous tutorial <core-tutorial-first-steps>`, we will load the data\nwe need from a file. The code used to generate this file can be found below:\n\n.. details:: Show the code used to generate the :file:`radial-spectrum.npz` file\n\n ..\n\n The data was generated with `rascaline`_, a package to compute atomistic\n representations for machine learning applications.\n\n\n .. literalinclude:: radial-spectrum.py.example\n :language: python\n\nThe file contains a representation of three molecules called the radial spectrum. The atom\n$i$ is represented by the radial spectrum $R_i^\\alpha$, which is an\nexpansion of the neighbor density $\\rho_i^\\alpha(r)$ on a set of radial basis\nfunctions $f_n(r)$:\n\n\\begin{align}R_i^\\alpha(n) = \\int f_n(r) \\rho_i^\\alpha(r) dr\\end{align}\n\nThe density $\\rho_i^\\alpha(r)$ associated with all neighbors of species\n$\\alpha$ of the atom $i$ (each neighbor is replaced with a Gaussian function\ncentered on the neighbor $g(r_{ij})$) is defined as:\n\n\\begin{align}\\rho_i^\\alpha(r) = \\sum_{j \\in \\text{ neighborhood of i }} g(r_{ij})\n \\delta_{\\alpha_j,\\alpha}\\end{align}\n\n\nThe exact mathematical details above don't matter too much for this tutorial; the main\npoint is that this representation treats atomic species as completely independent,\neffectively using the neighbor species $\\alpha$ for `one-hot encoding`_.\n\n\n.. py:currentmodule:: metatensor\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import ase\nimport ase.visualize.plot\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nimport metatensor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will work on the radial spectrum representation of three molecules in our system:\na carbon monoxide, an oxygen molecule and a nitrogen molecule.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"atoms = ase.Atoms(\n \"COO2N2\",\n positions=[(0, 0, 0), (1.2, 0, 0), (0, 6, 0), (1.1, 6, 0), (6, 0, 0), (7.3, 0, 0)],\n)\n\nfig, ax = plt.subplots(figsize=(3, 3))\nase.visualize.plot.plot_atoms(atoms, ax)\nax.set_axis_off()\nplt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sparsity in ``TensorMap``\n\nThe radial spectrum representation has two keys: ``center_type`` indicating the\nspecies of the central atom (atom $i$ in the equations); and\n``neighbor_type`` indicating the species of the neighboring atoms (atom $j$\nin the equations).\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"radial_spectrum = metatensor.load(\"radial-spectrum.npz\")\n\nprint(radial_spectrum)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This shows the first level of sparsity in ``TensorMap``: block sparsity.\n\nOut of all possible combinations of ``center_type`` and ``neighbor_type``, some\nare missing, such as ``center_type=7, neighbor_type=8``. This is because we are\nusing a spherical cutoff of 2.5 \u00c5, and as such there are no oxygen neighbor atoms\nclose enough to the nitrogen centers. This means that all the corresponding radial\nspectrum coefficients $R_i^\\alpha(n)$ will be zero (since the neighbor density\n$\\rho_i^\\alpha(r)$ is zero everywhere).\n\nInstead of wasting memory by storing all of these zeros explicitly, we simply\navoid creating the corresponding blocks in the first place.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now look at the block containing the representation for oxygen centers and\ncarbon neighbors:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"block = radial_spectrum.block(center_type=8, neighbor_type=6)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Naively, this block should contain samples for all oxygen atoms (since\n``center_type=8``); in practice we only have a single sample!\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(block.samples)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is a second level of sparsity happening here, using a format related to\ncoordinate sparse arrays (COO format). Since there is only one oxygen atom\nwith carbon neighbors, we only include this atom in the samples, and the\ndensity/radial spectrum coefficient for all the other oxygen atoms is assumed to be\nzero.\n\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Making the data dense again\n\nSometimes, we might have to use data in a sparse metatensor format with code that does\nnot understand this sparsity. One solution is to convert the data to a dense format,\nmaking the zeros explicit as much as possible. Metatensor provides functions to\nconvert sparse data to a dense format for the keys sparsity, and the metadata needed\nto convert to a dense format for the sample sparsity.\n\nFirst, the sample sparsity can be removed block by block by creating a new array full\nof zeros, and copying the data according to the indices in ``block.samples``:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"dense_block_data = np.zeros((len(atoms), block.values.shape[1]))\n\n# only copy the non-zero data stored in the block\ndense_block_data[block.samples[\"atom\"]] = block.values\n\nprint(dense_block_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, we can undo the keys sparsity with\n:py:meth:`TensorMap.keys_to_samples` and :py:meth:`TensorMap.keys_to_properties`,\nwhich merge multiple blocks along the samples or properties dimensions respectively.\n\nWhich one of these functions to call will depend on the data you are handling.\nTypically, one-hot encoding (the ``neighbor_type`` key here) should be merged\nalong the properties dimension; and keys that define subsets of the samples\n(``center_type``) should be merged along the samples dimension.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"dense_radial_spectrum = radial_spectrum.keys_to_samples(\"center_type\")\ndense_radial_spectrum = dense_radial_spectrum.keys_to_properties(\"neighbor_type\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After calling these two functions, we now have a :py:class:`TensorMap` with a single\nblock and no keys:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(dense_radial_spectrum)\n\nblock = dense_radial_spectrum.block()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the resulting dense data array contains a lot of zeros (and has a well\ndefined block-sparse structure):\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"with np.printoptions(precision=3):\n print(block.values)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And using the metadata attached to the block, we can understand which part of the data\nis zero and why. For example, the lower-right corner of the array corresponds to\nnitrogen atoms (the last two samples):\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(block.samples.print(max_entries=-1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And these two bottom rows are zero everywhere, except in the part representing the\nnitrogen neighbor density:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(block.properties.print(max_entries=-1))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
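The two levels of sparsity discussed in this notebook (missing blocks, and missing samples inside a block) can also be illustrated without metatensor. The following is only a minimal NumPy sketch under stated assumptions: the `sparse_blocks` dictionary, the `densify` helper and all coefficient values are hypothetical and made up for illustration. They are not the content of `radial-spectrum.npz`, and this is not how `TensorMap.keys_to_samples` or `TensorMap.keys_to_properties` are implemented.

```python
import numpy as np

# Hypothetical toy data (NOT the content of radial-spectrum.npz): each key
# (center_type, neighbor_type) maps to the indices of the atoms stored in the
# block and to one row of made-up coefficients per stored atom.
n_atoms, n_radial = 6, 2
sparse_blocks = {
    (6, 8): (np.array([0]), np.full((1, n_radial), 1.0)),
    (8, 6): (np.array([1]), np.full((1, n_radial), 2.0)),
    (8, 8): (np.array([2, 3]), np.full((2, n_radial), 3.0)),
    (7, 7): (np.array([4, 5]), np.full((2, n_radial), 4.0)),
}


def densify(blocks, n_atoms, n_radial):
    """Build an (n_atoms, n_neighbor_types * n_radial) array with explicit zeros."""
    # one group of n_radial columns per neighbor type (one-hot like layout)
    neighbor_types = sorted({neighbor for _, neighbor in blocks})
    dense = np.zeros((n_atoms, len(neighbor_types) * n_radial))

    for (_, neighbor), (atom_indices, values) in blocks.items():
        start = neighbor_types.index(neighbor) * n_radial
        # only the stored samples are copied; everything else stays zero
        dense[atom_indices, start : start + n_radial] = values

    return dense, neighbor_types


dense, neighbor_types = densify(sparse_blocks, n_atoms, n_radial)
print(neighbor_types)
print(dense)
```

In this sketch the center type only selects which rows are filled, while the neighbor type selects a group of columns. This mirrors the advice above to merge keys that define subsets of the samples along the samples dimension, and one-hot encoded keys along the properties dimension.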
Binary file not shown.
Binary file not shown.