-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHPORM-28 Add Scout engine to index into MongoDB Search #3205
base: 5.x
Are you sure you want to change the base?
Conversation
composer.json
Outdated
@@ -70,7 +71,8 @@ | |||
"laravel": { | |||
"providers": [ | |||
"MongoDB\\Laravel\\MongoDBServiceProvider", | |||
"MongoDB\\Laravel\\MongoDBBusServiceProvider" | |||
"MongoDB\\Laravel\\MongoDBBusServiceProvider", | |||
"MongoDB\\Laravel\\Scout\\ScoutServiceProvider" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class could be placed at the root of the package and given a name consistent with the other classes.
"MongoDB\\Laravel\\Scout\\ScoutServiceProvider" | |
"MongoDB\\Laravel\\MongoDBScoutServiceProvider" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would "ScoutServiceProvider" likely conflict with other short class names? There looks to be redundancy in both suggestions here, but I'll defer to whatever you think is most consistent.
It's been a couple of weeks since I picked up the Telescope PR but IIRC I opted to use a namespace there as there would be multiple classes. Seemed preferable than keeping a service provider in a separate namespace from other classes associated with the feature.
tests/Scout/ScoutEngineTest.php
Outdated
'fuzzy' => [ | ||
'maxEdits' => 2, | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fuzzy
is required to return results for partial matches (search for "lar" must return "Laravel Framework").
tests/Scout/ScoutEngineTest.php
Outdated
'path' => [ | ||
'wildcard' => '*', | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default we don't know on which fields to search.
src/Scout/ScoutEngine.php
Outdated
|
||
return $model->getScoutModelsByIds($builder, $ids) | ||
->filter(fn (Model $model) => in_array($model->getScoutKey(), $ids)) | ||
->sortBy(fn (Model $model) => array_search($model->getScoutKey(), $ids)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is suspicious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I didn't understood why the index was retrieved from an array, but now I understand that it's intended to sort in the initial order of the results. It's clever. That's copilot that refactored this code copied from MeilisearchEngine: https://github.com/laravel/scout/blob/13cb6460eb9802ff95a28dc11e178b1abebdcc02/src/Engines/MeilisearchEngine.php#L290
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't flipping the array (i.e. each $id
becomes a key) be more efficient to derive a sort order? That would replace repetitive array_search()
calls with HashTable lookups.
*/ | ||
public function delete($models): void | ||
{ | ||
assert($models instanceof Collection, new TypeError(sprintf('Argument #1 ($models) must be of type %s, %s given', Collection::class, get_debug_type($models)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert
calls compensate for the lack of native types in the parent class.
Integration tests require an atlas or local-atlas cluster to run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving an incomplete round of feedback. We'll need to schedule some time to actually go through this PR together if you need a meaningful review, as a lot of it is over my head.
} | ||
|
||
/** | ||
* This method must be overridden when the `getScoutKey` method is also overridden, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If getScoutKey()
is overridden, isn't it possible that it no longer calls getScoutKeyName()
at all? It's not clear to me why this was broken out to a separate method if it's only called from getScoutKey()
, and a user could just as easily override that on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getScoutKeyName
is necessary for searching original model objects from the results of the search engine.
Example: if in a SQL database, the "Scout Key" is scout_key
, a search query will return a list of objects with the scout_key
property and the following query is executed: SELECT * FROM model WHERE scout_key IN (<ids>)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That answer is over my head. My question was more basic, as I was just looking for what might actually cal getScoutKeyName()
. I think I found my answer in the Searchable trait, though.
It'd have been helpful if any overridden methods mentioned that they were overriding something from the trait (maybe using @see
with a reference to the base method).
tests/Models/User.php
Outdated
]; | ||
} | ||
|
||
public function getAtlasSearchDefinition(): array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this method called anywhere?
|
||
return $builder; | ||
}, | ||
array_replace_recursive($defaultPipeline, [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but in this case you're actually just replacing $defaultPipeline[0]
since there's an equality matching between key 0
in $defaultPipeline
and the lone key in the replacement array.
Basing this on my read of array_replace_recursive
.
Does this actually need to use the recursive variant if you're really just replacing an aggregation pipeline stage in its entirety?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm using array_replace_recursive
to replace the numeric key (instead of append in cas of array_merge
), and this adds the filter
key-value to $defaultPipeline['$search']['compound']
], $results->pluck('name', 'id')->all()); | ||
} | ||
|
||
public function testItCanUseBasicSearchToFetchKeys() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this test seems redundant in light of testItCanUseBasicSearch()
?
*/ | ||
private function wait(Closure $callback): void | ||
{ | ||
$timeout = hrtime()[0] + self::WAIT_TIMEOUT_SEC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely a very minor edge case, but hrtime()
could be unsupported and return false
:
- https://github.com/php/php-src/blob/php-8.4.1/ext/standard/basic_functions.stub.php#L2186
- https://github.com/php/php-src/blob/php-8.4.1/ext/standard/hrtime.c#L49
- https://github.com/php/php-src/blob/php-8.4.1/Zend/zend_hrtime.h#L57
But if static analysis doesn't care about this then perhaps you shouldn't either.
$mapping['fields']['__soft_deleted'] ??= [ | ||
'type' => 'number', | ||
'representation' => 'int64', | ||
'indexDoubles' => false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted this pertains to https://www.mongodb.com/docs/atlas/atlas-search/field-types/number-type/#configure-fts-field-type-field-properties
Consider including a comment that links to some docs on this, as it's not clear what this mapping means otherwise.
} | ||
|
||
/** | ||
* This method must be overridden when the `getScoutKey` method is also overridden, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That answer is over my head. My question was more basic, as I was just looking for what might actually cal getScoutKeyName()
. I think I found my answer in the Searchable trait, though.
It'd have been helpful if any overridden methods mentioned that they were overriding something from the trait (maybe using @see
with a reference to the base method).
public function toSearchableArray(): array | ||
{ | ||
return [ | ||
'id' => (int) $this->id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the $id
property documented as a string in the class docblock, but cast to an integer here? Is it expected to be an ObjectId within the database that's represented as a string in a hydrated model instance?
* | ||
* @param EloquentCollection $models | ||
* | ||
* @throws Exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is referring to the MongoDB\Exception\Exception
interface, but it'd be helpful to alias that for clarity. Why would it be the general interface instead of a RuntimeException, though?
If this is implemented correctly, I expect we wouldn't get any non-runtime exceptions from PHPC.
$collection = $this->getIndexableCollection($models); | ||
|
||
if ($this->softDelete && $this->usesSoftDelete($models)) { | ||
$models->each->pushSoftDeleteMetadata(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted that this calls the method in the Searchable trait.
private function usesSoftDelete(Model|EloquentCollection $model): bool | ||
{ | ||
if ($model instanceof EloquentCollection) { | ||
$model = $model->first(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to assume that SoftDeletes is used simply because the first model has it? I assume the trait is applied on the model class and an EloquentCollection can be assumed to always contain the same class. I dunno if Eloquent supports anything like Doctrine's multi-collection inheritance, but that might make it possible for there to be various classes here (considering an edge case where only some children utilize the trait).
But even if that were the case, would a single Scout collection cover all of the classes in that hierarchy? Perhaps not.
/** | ||
* Delete a "search index", i.e. a MongoDB collection. | ||
*/ | ||
public function deleteIndex($name): void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure where this would be called, but should there be some validation that it's actually an index name and not just an arbitrary MongoDB collection?
One sanity check would be to ensure it starts with $this->prefix
, although that's still a bit risky per my comment below on deleteAllIndexes()
.
src/Scout/ScoutEngine.php
Outdated
$collection = $this->mongodb->selectCollection($name); | ||
|
||
$collection = $this->mongodb->selectCollection($name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$collection = $this->mongodb->selectCollection($name); | |
$collection = $this->mongodb->selectCollection($name); | |
$collection = $this->mongodb->selectCollection($name); | |
src/Scout/ScoutEngine.php
Outdated
$collection = $this->mongodb->selectCollection($name); | ||
|
||
$collection = $this->mongodb->selectCollection($name); | ||
/** @todo accept configuration for the mapping */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this @todo
referring to the undocumented searchableMapping
method I asked about in getMapping()
?
src/Scout/ScoutEngine.php
Outdated
'typeMap' => ['root' => 'bson'], | ||
]); | ||
|
||
// Many indexes with the same name may exist on Atlas local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would multiple indexes with the same name exist? createSearchIndexes says:
You cannot create multiple indexes with the same name on a single collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a bug with atlas local previously, but it seems to be resolved. When dropping a database without dropping the search indexes, the old indexes with were still listed for the new database.
} | ||
|
||
$objectIds = array_column($results, '_id'); | ||
$objectIdPositions = array_flip($objectIds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this was copied from another engine in the upstream library. Is this the more efficient approach to the array_search()
algorithm I saw earlier (re: sortBy()
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
Fix PHPORM-28
Checklist