-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Force live object when adding to the index #91
Conversation
5e1579b
to
43c744b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not seen this issue, but code look good to me thank ❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested locally and works as expected 👍
@blueo Since we got 2 PRs for this module, I wonder which one should go to the finish line first and that might mean the other one will have to rebase as I also used the |
I think I'll update the title for this - after a bit of discussion with @ssmarco it looks like there is a better way to solve this. A possible fix is via the indexer eg if ($document instanceof DataObjectDocumentInterface) {
// Making sure we get the Live version of the DataObject before indexing the document
Versioned::withVersionedMode(static function () use ($document): void {
Versioned::set_stage(Versioned::LIVE);
$dataObject = $document->getDataObject();
$liveDataObject = DataObject::get($dataObject->ClassName)->byID($dataObject->ID);
$document->setDataObject($liveDataObject);
});
} could be added prior to adding a document in Another is to update the |
This corrects behaviour that potentially allows indexing of draft content. Queues are normally run via a "dev task" which is run in the DevelopmentAdmin context. This controller will set the reading mode to DRAFT. The upshot of this is that an Index job will unserialise a DataObject in DRAFT reading mode (saved as a Data Query param) and the subsequent `toArray` call could get draft content. This is often seen as a link indexed with `?stage=Stage` parameters. The impact is likely limited because an IndexJob is usually queued by a publish action and run shortly afterwards. In these cases it is unlikely that draft changes are present - but is isn't impossible. I checked the ReIndexJob run and it doesn't appear to have this issue because it sets the reading mode in the setup method. I have instead added this to the unserialise function as it seemed a safer method to ensure the reading mode was as expected.
6588580
to
e95d234
Compare
@blueo Have retested this again manually and working as expected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm struggling a little bit to replicate the original issue.
Note: I'm using Elasticsearch, not Enterprise Search (just in case that becomes relevant to my difficulties).
- Checkout our current
master
branch - Config applied in project (below),
Page
includes alink
field (which points to theLink()
method)
---
Name: app-search
---
SilverStripe\SearchService\Service\IndexConfiguration:
auto_dependency_tracking: true
crawl_page_content: false
common_fields: &common_fields
...
link:
property: Link
options:
type: keyword
indexes:
main:
includeClasses:
Page:
fields:
<<: *common_fields
...
- Publish a page
- Run the
Search service adding 1 documents
Job - Check index for added document
Expected result: link
field contains ?stage=Stage
Actual result: link
field does not contain ?stage=Stage
That said, the code changes themselves make a lot of sense to me.
// Dataobject is only in draft | ||
if ($dataObject->hasExtension(Versioned::class) && !$dataObject->isLiveVersion()) { | ||
// DataObject has no published version (or draft changes could cause a doc to be removed) | ||
if ($dataObject->hasExtension(Versioned::class) && !$dataObject->isPublished()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling that your fix below will solve this edge case. The check for isLiveVersion
does check the _Live
table, so now that you've made sure the DataObject
is consistently the LIVE record, I think this would work.
That said, it makes more practical sense to me for us to be checking isPublished()
, as I think even if you have a different LIVE version, you would still want to reindex (and you certainly wouldn't want to remove it from the index), so I'm more than happy to make this switch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change to the Before Add event will actually run after this check. So its possible that you can add a DO that is the latest version, but it is draft (eg you published and then saved some draft changes). In this case the most recent (read draft) DO will be fetched by the _unserialize function because dev tasks run in 'draft' reading mode. Then the isLiveVersion will fail (as its not the live version, it is a version after) and the indexer will remove it.
I don't think we can always make the attached dataobject Live because we need a record for deleted versions (or we'd need to rethink how deletions are done)
This change tries to make a consistent way of ensuring a dataobject can only be 'Live' when adding to the index but may be in another state for when it needs to be removed. It does this by using the BEFORE_ADD event to re-fetch a dataobject in "live" mode. This means a non-live dataobject can be given to DataObjectDocument for use when removing or otherwise checking if an object should be indexed - however when adding to an index, it will always use a live object.
Key Changes
shouldIndex
has been changed to useisPublished
instead ofisLiveVersion
. This means you can pass a DataObject that is not live and checks will still allow it to be indexed if there is a 'live' version available. There is a current edge case where you could have a published object but the 'draft' version is the most recent - in this case a dataobject will be removed from the index even though it has a live version.onAddToSearchIndexes
has had logic added to theBEFORE_ADD
event to refetch a document in 'live' mode. This ensures only published content will go to an index as this event is called immediately prior toprocessNode
in the indexer.Notes
Initially I thought this would allow indexing of draft content. Queues are normally run via a "dev task" which is run in the DevelopmentAdmin context. This controller will set the reading mode to DRAFT. The upshot of this is that an Index job will unserialise a DataObject in DRAFT reading mode (saved as a Data Query param) and the subsequent
toArray
call could get draft content. This is often seen as a link indexed with?stage=Stage
parameters. However becauseshouldIndex
was checking that it was the Live version - this content would always have been the same as the published object.