Indexing
In this chapter you will learn how Elasticsuite indexes content into Elasticsearch.
This guide does not cover Elasticsearch basics, such as "what is an index?" or "what is a field?". You should already know the main concepts of Elasticsearch before exploring this guide.
Elasticsuite will create an index in Elasticsearch for each entity type and store view.
For now, indexed entity types are Products, Categories, and Synonyms.
Index names are built from:
- the alias defined in configuration (see Indices Settings in Module install)
- the store code
- the indexed entity type
- a timestamp (date and time of index creation)
Let's say we have a Magento store with 2 store views (with 'en' and 'fr' as store codes) and the alias set to magento2. The following indices will be created:
- magento2_en_catalog_category_20171110_113448
- magento2_en_catalog_product_20171110_113610
- magento2_en_thesaurus_20171110_113449
- magento2_fr_catalog_category_20171110_113448
- magento2_fr_catalog_product_20171110_113610
- magento2_fr_thesaurus_20171110_113449
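The naming scheme above can be sketched as a small helper. This is only an illustration: `buildIndexName` is a hypothetical function, not part of Elasticsuite, and the `Ymd_His` date format is inferred from the example names listed above.

```php
<?php
declare(strict_types=1);

/**
 * Build a timestamped index name (illustrative helper, not part of Elasticsuite).
 */
function buildIndexName(
    string $alias,
    string $storeCode,
    string $identifier,
    DateTimeInterface $createdAt
): string {
    // alias + store code + entity identifier + creation date and time
    return sprintf('%s_%s_%s_%s', $alias, $storeCode, $identifier, $createdAt->format('Ymd_His'));
}

$createdAt = new DateTimeImmutable('2017-11-10 11:36:10');
echo buildIndexName('magento2', 'en', 'catalog_product', $createdAt), PHP_EOL;
// prints "magento2_en_catalog_product_20171110_113610"
```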
The configuration of these indices is driven by the elasticsuite_indices.xml file. You can declare a new elasticsuite_indices.xml file in your module if you plan to index other entities.
Let's see how it is declared for the products index:
<indices xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="urn:magento:module:Smile_ElasticsuiteCore:etc/elasticsuite_indices.xsd">
    <index identifier="catalog_product" defaultSearchType="product">
        <type name="product" idFieldName="entity_id">
            ...
The indexer is declared in Magento's indexer.xml file. Here is how it is done for category indexing (the product indexer is not shown since it is already declared in Magento and only modified by Elasticsuite):
<indexer id="elasticsuite_categories_fulltext" view_id="elasticsuite_categories_fulltext" class="Smile\ElasticsuiteCatalog\Model\Category\Indexer\Fulltext">
    <title translate="true">ElasticSuite Category Indexing</title>
    <description translate="true">Reindex ElasticSuite catalog categories.</description>
</indexer>
Finally, your indexing model must use the proper indexer handler (which must extend \Smile\ElasticsuiteCore\Indexer\GenericIndexerHandler) and have the proper index name and type defined. This can be done via DI.
E.g. for the categories:
<virtualType name="catalogCategorySearchIndexHandler" type="\Smile\ElasticsuiteCore\Indexer\GenericIndexerHandler">
    <arguments>
        <argument name="indexName" xsi:type="string">catalog_category</argument>
        <argument name="typeName" xsi:type="string">category</argument>
    </arguments>
</virtualType>
<type name="Smile\ElasticsuiteCatalog\Model\Category\Indexer\Fulltext">
    <arguments>
        <argument name="indexerHandler" xsi:type="object">catalogCategorySearchIndexHandler</argument>
    </arguments>
</type>
Now it's time to write your Indexer Model.
Take a look at the Elasticsuite categories indexer, which is basically an implementation of \Magento\Framework\Indexer\ActionInterface and \Magento\Framework\Mview\ActionInterface:
class Fulltext implements \Magento\Framework\Indexer\ActionInterface, \Magento\Framework\Mview\ActionInterface
{
    /**
     * @var string
     */
    const INDEXER_ID = 'elasticsuite_categories_fulltext';

    /**
     * @var IndexerInterface
     */
    private $indexerHandler;

    /**
     * @var StoreManagerInterface
     */
    private $storeManager;

    /**
     * @var DimensionFactory
     */
    private $dimensionFactory;

    /**
     * @var Full
     */
    private $fullAction;

    /**
     * @param Full                  $fullAction       The full index action
     * @param IndexerInterface      $indexerHandler   The index handler
     * @param StoreManagerInterface $storeManager     The Store Manager
     * @param DimensionFactory      $dimensionFactory The dimension factory
     */
    public function __construct(
        Full $fullAction,
        IndexerInterface $indexerHandler,
        StoreManagerInterface $storeManager,
        DimensionFactory $dimensionFactory
    ) {
        $this->fullAction = $fullAction;
        $this->indexerHandler = $indexerHandler;
        $this->storeManager = $storeManager;
        $this->dimensionFactory = $dimensionFactory;
    }

    /**
     * Execute materialization on ids entities
     *
     * @param int[] $ids The ids
     *
     * @return void
     */
    public function execute($ids)
    {
        $storeIds = array_keys($this->storeManager->getStores());
        foreach ($storeIds as $storeId) {
            $dimension = $this->dimensionFactory->create(['name' => 'scope', 'value' => $storeId]);
            $this->indexerHandler->deleteIndex([$dimension], new \ArrayObject($ids));
            $this->indexerHandler->saveIndex([$dimension], $this->fullAction->rebuildStoreIndex($storeId, $ids));
        }
    }

    /**
     * Execute full indexation
     *
     * @return void
     */
    public function executeFull()
    {
        $storeIds = array_keys($this->storeManager->getStores());
        foreach ($storeIds as $storeId) {
            $dimension = $this->dimensionFactory->create(['name' => 'scope', 'value' => $storeId]);
            $this->indexerHandler->cleanIndex([$dimension]);
            $this->indexerHandler->saveIndex([$dimension], $this->fullAction->rebuildStoreIndex($storeId));
        }
    }

    /**
     * {@inheritDoc}
     */
    public function executeList(array $categoryIds)
    {
        $this->execute($categoryIds);
    }

    /**
     * {@inheritDoc}
     */
    public function executeRow($categoryId)
    {
        $this->execute([$categoryId]);
    }
}
As you can see, the main part is the call to $this->fullAction->rebuildStoreIndex($storeId, $ids).
This action model just retrieves the entities to index. It can contain custom logic: for products, for example, it only selects the products which are visible.
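The Full action itself is not shown in this guide. Since Magento's saveIndex() accepts a \Traversable of documents, one plausible shape for such a method is a generator that loads entities batch by batch and yields them keyed by ID. The sketch below only illustrates that idea: `rebuildStoreIndexSketch` and the `$loadBatch` callable are hypothetical names, and the in-memory array stands in for a real database query.

```php
<?php
declare(strict_types=1);

/**
 * Yield entity rows keyed by ID, loading them batch by batch.
 * $loadBatch is a hypothetical callable: fn(int $fromId, int $limit): array of rows.
 */
function rebuildStoreIndexSketch(callable $loadBatch, int $batchSize = 2): Generator
{
    $lastId = 0;
    do {
        $rows = $loadBatch($lastId, $batchSize);
        foreach ($rows as $row) {
            $lastId = (int) $row['entity_id'];
            yield $lastId => $row;
        }
    } while (count($rows) > 0); // stop once a batch comes back empty
}

// In-memory stand-in for the main entity table.
$table = [
    ['entity_id' => 1, 'name' => 'Root'],
    ['entity_id' => 2, 'name' => 'Men'],
    ['entity_id' => 3, 'name' => 'Women'],
];

$loadBatch = function (int $fromId, int $limit) use ($table): array {
    $rows = array_filter($table, fn (array $row): bool => $row['entity_id'] > $fromId);

    return array_slice(array_values($rows), 0, $limit);
};

foreach (rebuildStoreIndexSketch($loadBatch) as $id => $row) {
    echo $id, ' => ', $row['name'], PHP_EOL;
}
```

Yielding rows instead of returning one big array keeps memory usage bounded by the batch size, which matters for large catalogs.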
Once you have this, you are done with your index definition and your indexer model.
But for now, you are only iterating over the main table of your entities and are missing most of your data.
Let's see now how you will add content into your Elasticsearch index.
The mapping is the part that defines which fields are stored into Elasticsearch, how they are stored (this differs depending on whether a field is used for filtering or sorting), and what type they have.
You can read more about mappings in the Elasticsearch documentation.
Each index can have several data sources. These objects are meant to retrieve data (from MySQL, or even elsewhere if needed) and aggregate them into documents that will be sent to Elasticsearch.
Let's go to the di.xml file of the Smile_ElasticsuiteCatalog module and see what we have for the product index:
<!-- Datasources resolver -->
<type name="Smile\ElasticsuiteCore\Index\DataSourceResolver">
    <arguments>
        <argument name="datasources" xsi:type="array">
            <item name="catalog_product" xsi:type="array">
                <item name="prices" xsi:type="object">Smile\ElasticsuiteCatalog\Model\Product\Indexer\Fulltext\Datasource\PriceData</item>
                <item name="categories" xsi:type="object">Smile\ElasticsuiteCatalog\Model\Product\Indexer\Fulltext\Datasource\CategoryData</item>
                <item name="attributes" xsi:type="object">Smile\ElasticsuiteCatalog\Model\Product\Indexer\Fulltext\Datasource\AttributeData</item>
                <item name="stock" xsi:type="object">Smile\ElasticsuiteCatalog\Model\Product\Indexer\Fulltext\Datasource\InventoryData</item>
                <item name="searchPositions" xsi:type="object">Smile\ElasticsuiteCatalog\Model\Product\Indexer\Fulltext\Datasource\SearchPositionData</item>
            </item>
        </argument>
    </arguments>
</type>
We have 5 datasources, each retrieving a different kind of data. Being able to have several allows us to write tiny data sources that are easy to maintain and each do one precise job.
It is also really easy for anybody to add a custom data source by declaring it in a new di.xml in their own module.
Each data source is basically a simple model implementing Smile\ElasticsuiteCore\Api\Index\DatasourceInterface, which has only one method: addData($storeId, array $indexData)
- $storeId is the ID of the store being reindexed.
- $indexData is the "current" data being indexed. Since it may already have gone through other datasources, it can hold a varying amount of data. What matters is that the array keys are the values of the idFieldName defined in elasticsuite_indices.xml. E.g. in the product datasources, we often do $productIds = array_keys($indexData); and then retrieve product data and add it to $indexData.
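To picture how several datasources enrich the same ID-keyed array, here is a minimal sketch. It assumes a local stand-in interface (`DatasourceSketchInterface`) that mirrors the single-method contract described above, plus two made-up datasources. How Elasticsuite actually chains its datasources internally is not shown in this guide, so the loop is only an illustration.

```php
<?php
declare(strict_types=1);

// Local stand-in mirroring the single-method contract, so the sketch runs on its own.
interface DatasourceSketchInterface
{
    public function addData($storeId, array $indexData);
}

// Hypothetical datasource: adds a name to every document.
final class NameDatasource implements DatasourceSketchInterface
{
    public function addData($storeId, array $indexData)
    {
        foreach (array_keys($indexData) as $productId) {
            $indexData[$productId]['name'] = 'Product #' . $productId;
        }

        return $indexData;
    }
}

// Hypothetical datasource: records the store being reindexed.
final class StoreIdDatasource implements DatasourceSketchInterface
{
    public function addData($storeId, array $indexData)
    {
        foreach (array_keys($indexData) as $productId) {
            $indexData[$productId]['store_id'] = (int) $storeId;
        }

        return $indexData;
    }
}

// Documents are keyed by the idFieldName value (entity_id here).
$indexData = [10 => ['entity_id' => 10], 11 => ['entity_id' => 11]];

// Each datasource receives the output of the previous one.
foreach ([new NameDatasource(), new StoreIdDatasource()] as $datasource) {
    $indexData = $datasource->addData(1, $indexData);
}

print_r($indexData[10]);
```

Because every datasource returns the full array, the array keys stay stable from one datasource to the next and each one can rely on array_keys($indexData) to know which entities to load.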
Let's see an example with the stock datasource, which uses a resource model to load stock data and then pushes it into $indexData:
/**
 * Add inventory data to the index data.
 * {@inheritdoc}
 */
public function addData($storeId, array $indexData)
{
    $inventoryData = $this->resourceModel->loadInventoryData($storeId, array_keys($indexData));

    foreach ($inventoryData as $inventoryDataRow) {
        $productId = (int) $inventoryDataRow['product_id'];
        $indexData[$productId]['stock'] = [
            'is_in_stock' => (bool) $inventoryDataRow['stock_status'],
            'qty'         => (int) $inventoryDataRow['qty'],
        ];
    }

    return $indexData;
}
By now, you can probably think of additional data sources which would be easy to implement:
- fetch product ratings from the database.
- add data coming from external services via an API if needed.
- and so on...
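As an illustration of the first idea, a ratings datasource could look like the sketch below. Everything in it is hypothetical: the `RatingData` class, the `rating` field layout and the `$loadRatings` callable (which stands in for a resource model or an API client) are made up, and the interface is redeclared locally only so the example runs on its own. In a real module the class would implement Smile\ElasticsuiteCore\Api\Index\DatasourceInterface and be registered through di.xml as described above.

```php
<?php
declare(strict_types=1);

// Local stand-in for Smile\ElasticsuiteCore\Api\Index\DatasourceInterface.
interface DatasourceInterface
{
    public function addData($storeId, array $indexData);
}

/**
 * Hypothetical datasource adding rating data to each product document.
 */
class RatingData implements DatasourceInterface
{
    /** @var callable Hypothetical loader: fn($storeId, int[] $productIds): array of rows */
    private $loadRatings;

    public function __construct(callable $loadRatings)
    {
        $this->loadRatings = $loadRatings;
    }

    public function addData($storeId, array $indexData)
    {
        $rows = ($this->loadRatings)($storeId, array_keys($indexData));

        foreach ($rows as $row) {
            $productId = (int) $row['product_id'];
            $indexData[$productId]['rating'] = [
                'average' => (float) $row['average'],
                'count'   => (int) $row['count'],
            ];
        }

        return $indexData;
    }
}

// Stand-in for a database query or an external API call.
$loader = function ($storeId, array $productIds): array {
    $ratings = [10 => ['product_id' => '10', 'average' => '4.5', 'count' => '12']];

    return array_values(array_intersect_key($ratings, array_flip($productIds)));
};

$datasource = new RatingData($loader);
$indexData  = $datasource->addData(1, [10 => ['entity_id' => 10], 11 => ['entity_id' => 11]]);

print_r($indexData);
```

Products without any rating are simply left untouched, in the same way the stock datasource only writes to the products returned by its resource model.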
Once the data sources are done, you can define how the data you have just added should be indexed into Elasticsearch.
This part is basically about how data coming from Magento will be converted into Elasticsearch fields.
You can learn more about Elasticsearch field types in the Elasticsearch documentation.
The easy way is to define the fields directly in the elasticsuite_indices.xml file, like this:
<mapping>
    <!-- Static fields handled by the base indexer (not datasource) -->
    <field name="entity_id" type="integer" />
    <field name="attribute_set_id" type="integer" />
    <field name="has_options" type="boolean" />
    <field name="required_options" type="boolean" />
    <field name="created_at" type="date" />
    <field name="updated_at" type="date" />
    <field name="type_id" type="string" />
    <field name="visibility" type="integer" />
    ...
In this file, you can also define custom properties of fields directly. Let's see how the SKU field is declared:
<field name="sku" type="string">
    <isSearchable>1</isSearchable>
    <isUsedInSpellcheck>1</isUsedInSpellcheck>
    <defaultSearchAnalyzer>whitespace</defaultSearchAnalyzer>
</field>
Here you can define the following optional properties:
- isSearchable (defaults to false): whether queries on this index search in this field
- isFilterable (defaults to true): whether the field can be used in filtering queries (it is then indexed differently)
- isUsedInSpellcheck (defaults to false): whether the engine checks for exact matching in this field
- isUsedForSortBy (defaults to false): whether you plan to sort on this field (it is then indexed differently)
- searchWeight (defaults to 1): the weight given to this field when searching
- defaultSearchAnalyzer (defaults to standard): covered later in the custom analysis part.
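The effect of these defaults can be sketched as a simple merge between the documented default values and what a field declares. `resolveFieldConfig` and `FIELD_DEFAULTS` are hypothetical names used for illustration only; the real configuration reader in Elasticsuite may work differently.

```php
<?php
declare(strict_types=1);

// Default values as documented in the list above.
const FIELD_DEFAULTS = [
    'isSearchable'          => false,
    'isFilterable'          => true,
    'isUsedInSpellcheck'    => false,
    'isUsedForSortBy'       => false,
    'searchWeight'          => 1,
    'defaultSearchAnalyzer' => 'standard',
];

/**
 * Merge the properties declared on a field with the defaults (illustrative helper).
 */
function resolveFieldConfig(array $declared): array
{
    // Declared keys win; missing keys fall back to the defaults.
    return array_replace(FIELD_DEFAULTS, $declared);
}

// Properties declared on the SKU field in the XML example above.
$sku = resolveFieldConfig([
    'isSearchable'          => true,
    'isUsedInSpellcheck'    => true,
    'defaultSearchAnalyzer' => 'whitespace',
]);

var_export($sku);
```

Note that the SKU field stays filterable even though the XML does not say so, because isFilterable defaults to true.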
You can also store some fields as objects or nested objects. We will not cover the difference between the two here; please refer to the Elasticsearch documentation to better understand these concepts.
E.g.: stock is stored as an object field.
<field name="stock.is_in_stock" type="boolean" />
<field name="stock.qty" type="integer" />
E.g.: price is stored as a nested field.
<field name="price.price" type="double" nestedPath="price" />
<field name="price.original_price" type="double" nestedPath="price" />
<field name="price.is_discount" type="boolean" nestedPath="price" />
<field name="price.customer_group_id" type="integer" nestedPath="price" />
E.g.: category nested field with custom properties.
<field name="category.category_id" type="integer" nestedPath="category" />
<field name="category.position" type="integer" nestedPath="category" />
<field name="category.is_parent" type="boolean" nestedPath="category" />
<field name="category.name" type="string" nestedPath="category">
    <isSearchable>1</isSearchable>
    <isUsedInSpellcheck>1</isUsedInSpellcheck>
    <isFilterable>0</isFilterable>
</field>
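To picture what these declarations mean for a document, the sketch below shows a plausible shape for a product document as a PHP array. The values are made up, and "one price entry per customer group" is an assumption inferred from the customer_group_id field. The `priceForGroup` helper is hypothetical and only shows why the entries of a nested field have to be matched one entry at a time.

```php
<?php
declare(strict_types=1);

$document = [
    'entity_id' => 42,
    // Object field: a single set of sub-fields.
    'stock' => ['is_in_stock' => true, 'qty' => 10],
    // Nested field: a list of entries, each kept as its own unit.
    'price' => [
        ['customer_group_id' => 0, 'price' => 19.9, 'original_price' => 24.9, 'is_discount' => true],
        ['customer_group_id' => 1, 'price' => 24.9, 'original_price' => 24.9, 'is_discount' => false],
    ],
];

/**
 * Return the price entry of one customer group, or null (illustrative helper).
 */
function priceForGroup(array $document, int $groupId): ?array
{
    foreach ($document['price'] as $entry) {
        if ($entry['customer_group_id'] === $groupId) {
            return $entry;
        }
    }

    return null;
}

echo json_encode($document, JSON_PRETTY_PRINT), PHP_EOL;
var_dump(priceForGroup($document, 1)['is_discount']);
```

With a nested mapping, Elasticsearch can answer "group 1 and not discounted" against a single entry, instead of mixing values coming from different entries of the list.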
Defining the mapping directly in XML works well, but it does not fit evolving data such as product attributes, which can easily be added or removed in the back office. Their type can even be changed by users!
And, in fact, as you may have seen, the product attributes are not declared in our elasticsuite_indices.xml file. Guess why?
You remember the previous part about data sources, right?
If your datasource implements Smile\ElasticsuiteCore\Api\Index\Mapping\DynamicFieldProviderInterface, the engine will automatically detect it and call the getFields() method of your datasource.
If you take a look at the getFields() and initField() methods located in Smile\ElasticsuiteCatalog\Model\Eav\Indexer\Fulltext\Datasource\AbstractAttributeData, you will see that they automatically convert each attribute configuration (defined via Magento's back office) into an array of \Smile\ElasticsuiteCore\Api\Index\Mapping\FieldInterface according to the values of each attribute's settings (is_filterable, is_searchable, search_weight and so on).
You may implement the same logic if you plan to index custom EAV content or extensible data that does not come with a strongly typed, fixed structure.
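The general idea of deriving fields from attribute settings can be sketched as follows. This is not the code of AbstractAttributeData: plain arrays stand in for FieldInterface objects, `attributesToFields` is a hypothetical helper, and the mapping from backend type to field type is an assumption made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Turn attribute settings into field definitions.
 * Plain arrays stand in for FieldInterface objects; the type map is an assumption.
 */
function attributesToFields(array $attributes): array
{
    $typeMap = ['int' => 'integer', 'decimal' => 'double', 'datetime' => 'date'];
    $fields  = [];

    foreach ($attributes as $attribute) {
        $fields[$attribute['attribute_code']] = [
            // Unknown backend types fall back to a string field.
            'type'         => $typeMap[$attribute['backend_type']] ?? 'string',
            'isSearchable' => (bool) $attribute['is_searchable'],
            'isFilterable' => (bool) $attribute['is_filterable'],
            'searchWeight' => (int) $attribute['search_weight'],
        ];
    }

    return $fields;
}

// Rows as they could come out of the attribute tables (values are made up).
$fields = attributesToFields([
    ['attribute_code' => 'color', 'backend_type' => 'int', 'is_searchable' => '1', 'is_filterable' => '1', 'search_weight' => '1'],
    ['attribute_code' => 'description', 'backend_type' => 'text', 'is_searchable' => '1', 'is_filterable' => '0', 'search_weight' => '2'],
]);

print_r($fields);
```

Because the field list is computed at indexing time, an attribute added or reconfigured in the back office is picked up on the next reindex without touching any XML file.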
Analysis is the logic applied to field values when they are sent to Elasticsearch. It makes it possible to handle special characters, reduce words to their stem, and more.
Customising it requires solid knowledge of Elasticsearch (or Solr and Lucene) analyzers and filters.
You can read more about this topic in the Elasticsearch documentation.
The list of analyzers delivered by Elasticsuite is in the elasticsuite_analysis.xml file of the ElasticsuiteCore module.
Since it is an XML file, it can be extended in your own modules to fit your needs.
The default list of analyzers and filters is enough to have the engine working properly for many languages and field types.
An analyzer is basically a tokenizer combined with char_filters and filters. Let's see the standard analyzer:
<analyzer name="standard" tokenizer="whitespace" language="default">
    <filters>
        <filter ref="lowercase" />
        <filter ref="ascii_folding" />
        <filter ref="trim" />
        <filter ref="elision" />
        <filter ref="word_delimiter" />
        <filter ref="standard" />
    </filters>
    <char_filters>
        <char_filter ref="html_strip" />
    </char_filters>
</analyzer>
In elasticsuite_indices.xml you can define the defaultSearchAnalyzer of a field. Remember the SKU example:
<field name="sku" type="string">
    <isSearchable>1</isSearchable>
    <isUsedInSpellcheck>1</isUsedInSpellcheck>
    <defaultSearchAnalyzer>whitespace</defaultSearchAnalyzer>
</field>
The default analyzer for the SKU is whitespace: it allows exact matching on the SKU. Using the standard analyzer for the SKU previously resulted in Elasticsearch automatically splitting values that mix letters and numbers or contain dashes, which is often the case with SKUs.
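The difference can be illustrated with a rough emulation in plain PHP. This is not Elasticsearch's actual algorithm: `whitespaceTokens` only splits on whitespace, and `delimiterTokens` is a simplified stand-in for the kind of splitting a word_delimiter filter performs (breaking on non-alphanumeric characters and on letter/digit transitions). Both function names and the sample SKU are made up for the example.

```php
<?php
declare(strict_types=1);

/**
 * Whitespace-style tokenization: split on whitespace only.
 */
function whitespaceTokens(string $text): array
{
    return preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

/**
 * Rough emulation of word-delimiter splitting: break on non-alphanumeric
 * characters and on letter/digit transitions.
 */
function delimiterTokens(string $text): array
{
    preg_match_all('/[A-Za-z]+|[0-9]+/', $text, $matches);

    return $matches[0];
}

$sku = 'WS12-XL';

print_r(whitespaceTokens($sku)); // a single token: the whole SKU
print_r(delimiterTokens($sku));  // three tokens: WS, 12 and XL
```

When the SKU is kept as a single token, a search for the full SKU has to match it as a whole, which is the exact matching behaviour described above.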
We already have a module for indexing CMS pages, which is a good tutorial for learning how to index and query custom content in an external module.
This module is available here