Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHPORM-28 Add Scout engine to index into MongoDB Search #3205

Open
wants to merge 14 commits into
base: 5.x
Choose a base branch
from

Conversation

GromNaN
Copy link
Member

@GromNaN GromNaN commented Nov 13, 2024

Fix PHPORM-28

  • And "index" stands for a MongoDB collection with an Atlas Search index named "scout".
  • The collection name MUST be prefixed to prevent error of indexing in the same MongoDB Collection as the source documents.

Checklist

  • Add tests and ensure they pass

src/Scout/ScoutEngine.php Fixed Show fixed Hide fixed
src/Scout/ScoutEngine.php Fixed Show fixed Hide fixed
@GromNaN GromNaN changed the title PHPORM-28 Add Scout engined to index into MongoDB Search PHPORM-28 Add Scout engine to index into MongoDB Search Nov 14, 2024
src/Scout/ScoutEngine.php Fixed Show fixed Hide fixed
composer.json Outdated
@@ -70,7 +71,8 @@
"laravel": {
"providers": [
"MongoDB\\Laravel\\MongoDBServiceProvider",
"MongoDB\\Laravel\\MongoDBBusServiceProvider"
"MongoDB\\Laravel\\MongoDBBusServiceProvider",
"MongoDB\\Laravel\\Scout\\ScoutServiceProvider"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class could be placed at the root of the package and given a name consistent with the other classes.

Suggested change
"MongoDB\\Laravel\\Scout\\ScoutServiceProvider"
"MongoDB\\Laravel\\MongoDBScoutServiceProvider"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would "ScoutServiceProvider" likely conflict with other short class names? There looks to be redundancy in both suggestions here, but I'll defer to whatever you think is most consistent.

It's been a couple of weeks since I picked up the Telescope PR but IIRC I opted to use a namespace there as there would be multiple classes. Seemed preferable than keeping a service provider in a separate namespace from other classes associated with the feature.

Comment on lines 66 to 68
'fuzzy' => [
'maxEdits' => 2,
],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fuzzy is required to return results for partial matches (search for "lar" must return "Laravel Framework").

Comment on lines 63 to 65
'path' => [
'wildcard' => '*',
],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default we don't know on which fields to search.


return $model->getScoutModelsByIds($builder, $ids)
->filter(fn (Model $model) => in_array($model->getScoutKey(), $ids))
->sortBy(fn (Model $model) => array_search($model->getScoutKey(), $ids))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is suspicious.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I didn't understood why the index was retrieved from an array, but now I understand that it's intended to sort in the initial order of the results. It's clever. That's copilot that refactored this code copied from MeilisearchEngine: https://github.com/laravel/scout/blob/13cb6460eb9802ff95a28dc11e178b1abebdcc02/src/Engines/MeilisearchEngine.php#L290

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't flipping the array (i.e. each $id becomes a key) be more efficient to derive a sort order? That would replace repetitive array_search() calls with HashTable lookups.

*/
public function delete($models): void
{
assert($models instanceof Collection, new TypeError(sprintf('Argument #1 ($models) must be of type %s, %s given', Collection::class, get_debug_type($models))));
Copy link
Member Author

@GromNaN GromNaN Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert calls compensate for the lack of native types in the parent class.

src/Scout/ScoutEngine.php Fixed Show fixed Hide fixed
@GromNaN GromNaN marked this pull request as ready for review November 21, 2024 15:03
@GromNaN GromNaN requested a review from a team as a code owner November 21, 2024 15:03
@GromNaN GromNaN requested a review from jmikola November 21, 2024 15:03
@GromNaN
Copy link
Member Author

GromNaN commented Nov 21, 2024

Integration tests require an atlas or local-atlas cluster to run.
I need to find a way to test the result as the order is not the same for each reindexation.

Copy link
Member

@jmikola jmikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving an incomplete round of feedback. We'll need to schedule some time to actually go through this PR together if you need a meaningful review, as a lot of it is over my head.

tests/Models/SearchableModel.php Outdated Show resolved Hide resolved
}

/**
* This method must be overridden when the `getScoutKey` method is also overridden,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If getScoutKey() is overridden, isn't it possible that it no longer calls getScoutKeyName() at all? It's not clear to me why this was broken out to a separate method if it's only called from getScoutKey(), and a user could just as easily override that on its own.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getScoutKeyName is necessary for searching original model objects from the results of the search engine.
Example: if in a SQL database, the "Scout Key" is scout_key, a search query will return a list of objects with the scout_key property and the following query is executed: SELECT * FROM model WHERE scout_key IN (<ids>).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That answer is over my head. My question was more basic, as I was just looking for what might actually cal getScoutKeyName(). I think I found my answer in the Searchable trait, though.

It'd have been helpful if any overridden methods mentioned that they were overriding something from the trait (maybe using @see with a reference to the base method).

];
}

public function getAtlasSearchDefinition(): array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method called anywhere?


return $builder;
},
array_replace_recursive($defaultPipeline, [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but in this case you're actually just replacing $defaultPipeline[0] since there's an equality matching between key 0 in $defaultPipeline and the lone key in the replacement array.

Basing this on my read of array_replace_recursive.

Does this actually need to use the recursive variant if you're really just replacing an aggregation pipeline stage in its entirety?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm using array_replace_recursive to replace the numeric key (instead of append in cas of array_merge), and this adds the filter key-value to $defaultPipeline['$search']['compound']

tests/Scout/ScoutEngineTest.php Outdated Show resolved Hide resolved
], $results->pluck('name', 'id')->all());
}

public function testItCanUseBasicSearchToFetchKeys()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test seems redundant in light of testItCanUseBasicSearch()?

*/
private function wait(Closure $callback): void
{
$timeout = hrtime()[0] + self::WAIT_TIMEOUT_SEC;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is likely a very minor edge case, but hrtime() could be unsupported and return false:

But if static analysis doesn't care about this then perhaps you shouldn't either.

src/Scout/ScoutEngine.php Outdated Show resolved Hide resolved
$mapping['fields']['__soft_deleted'] ??= [
'type' => 'number',
'representation' => 'int64',
'indexDoubles' => false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted this pertains to https://www.mongodb.com/docs/atlas/atlas-search/field-types/number-type/#configure-fts-field-type-field-properties

Consider including a comment that links to some docs on this, as it's not clear what this mapping means otherwise.

src/Scout/ScoutEngine.php Outdated Show resolved Hide resolved
}

/**
* This method must be overridden when the `getScoutKey` method is also overridden,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That answer is over my head. My question was more basic, as I was just looking for what might actually cal getScoutKeyName(). I think I found my answer in the Searchable trait, though.

It'd have been helpful if any overridden methods mentioned that they were overriding something from the trait (maybe using @see with a reference to the base method).

public function toSearchableArray(): array
{
return [
'id' => (int) $this->id,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the $id property documented as a string in the class docblock, but cast to an integer here? Is it expected to be an ObjectId within the database that's represented as a string in a hydrated model instance?

*
* @param EloquentCollection $models
*
* @throws Exception
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is referring to the MongoDB\Exception\Exception interface, but it'd be helpful to alias that for clarity. Why would it be the general interface instead of a RuntimeException, though?

If this is implemented correctly, I expect we wouldn't get any non-runtime exceptions from PHPC.

$collection = $this->getIndexableCollection($models);

if ($this->softDelete && $this->usesSoftDelete($models)) {
$models->each->pushSoftDeleteMetadata();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted that this calls the method in the Searchable trait.

private function usesSoftDelete(Model|EloquentCollection $model): bool
{
if ($model instanceof EloquentCollection) {
$model = $model->first();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to assume that SoftDeletes is used simply because the first model has it? I assume the trait is applied on the model class and an EloquentCollection can be assumed to always contain the same class. I dunno if Eloquent supports anything like Doctrine's multi-collection inheritance, but that might make it possible for there to be various classes here (considering an edge case where only some children utilize the trait).

But even if that were the case, would a single Scout collection cover all of the classes in that hierarchy? Perhaps not.

/**
* Delete a "search index", i.e. a MongoDB collection.
*/
public function deleteIndex($name): void
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where this would be called, but should there be some validation that it's actually an index name and not just an arbitrary MongoDB collection?

One sanity check would be to ensure it starts with $this->prefix, although that's still a bit risky per my comment below on deleteAllIndexes().

Comment on lines 382 to 384
$collection = $this->mongodb->selectCollection($name);

$collection = $this->mongodb->selectCollection($name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$collection = $this->mongodb->selectCollection($name);
$collection = $this->mongodb->selectCollection($name);
$collection = $this->mongodb->selectCollection($name);

$collection = $this->mongodb->selectCollection($name);

$collection = $this->mongodb->selectCollection($name);
/** @todo accept configuration for the mapping */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this @todo referring to the undocumented searchableMapping method I asked about in getMapping()?

'typeMap' => ['root' => 'bson'],
]);

// Many indexes with the same name may exist on Atlas local
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would multiple indexes with the same name exist? createSearchIndexes says:

You cannot create multiple indexes with the same name on a single collection.

Copy link
Member Author

@GromNaN GromNaN Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a bug with atlas local previously, but it seems to be resolved. When dropping a database without dropping the search indexes, the old indexes with were still listed for the new database.

}

$objectIds = array_column($results, '_id');
$objectIdPositions = array_flip($objectIds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this was copied from another engine in the upstream library. Is this the more efficient approach to the array_search() algorithm I saw earlier (re: sortBy())?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants