You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.
Describe the bug
If a snapshot action is applied via a policy, when that action runs, it will fail to work properly. The same error exists whether using a local FS or using S3, but it's easier to reproduce the issue running locally in docker-compose snapshotting to a local FS.
To Reproduce
Simple docker-compose file to create a cluster
Select Management -> Index Management -> Index Policies, and paste above policy in, and name it something like test_policy.
8. Apply the above policy to the existing test index.
Select Management -> Index Management -> Indices, and then select the sample test index (kibana_sample_data_flights), Click on Apply Policy, and then in the dropdown, select the policy you created and named in the prior step.
9. Wait for the policy to be applied to the index, initiate the snapshot action, and then fail.
Expected behavior
The snapshot should complete and not error out.
The log output from the snapshot failure is:
es-node2 | [2021-06-04T21:05:01,767][INFO ][c.a.o.i.i.ManagedIndexRunner] [es-node2] Executing attempt_snapshot for kibana_sample_data_flights
es-node2 | [2021-06-04T21:05:01,790][INFO ][o.e.s.SnapshotsService ] [es-node2] snapshot [local_repository:snapshot-2021.06.04-21:05:01.768/uKIFe5apSQuPWv6tgvJDHA] started
es-node2 | [2021-06-04T21:05:01,790][INFO ][c.a.o.i.i.ManagedIndexRunner] [es-node2] Finished executing attempt_snapshot for kibana_sample_data_flights
es-node2 | [2021-06-04T21:05:01,811][WARN ][o.e.s.SnapshotShardsService] [es-node2] [[kibana_sample_data_flights][0]][local_repository:snapshot-2021.06.04-21:05:01.768/uKIFe5apSQuPWv6tgvJDHA] failed to snapshot shard
es-node2 | java.nio.file.NoSuchFileException: /mnt/snapshots/indices/Bw2it5LsRzmdIXZs20ZvYw/0/index-PyEmzKxkQlqc62OsbfTmYA
es-node2 | at sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) ~[?:?]
es-node2 | at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
es-node2 | at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]
es-node2 | at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219) ~[?:?]
es-node2 | at java.nio.file.Files.newByteChannel(Files.java:375) ~[?:?]
es-node2 | at java.nio.file.Files.newByteChannel(Files.java:426) ~[?:?]
es-node2 | at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:420) ~[?:?]
es-node2 | at java.nio.file.Files.newInputStream(Files.java:160) ~[?:?]
es-node2 | at org.elasticsearch.common.blobstore.fs.FsBlobContainer.readBlob(FsBlobContainer.java:155) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:131) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.read(ChecksumBlobStoreFormat.java:117) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.repositories.blobstore.BlobStoreRepository.buildBlobStoreIndexShardSnapshots(BlobStoreRepository.java:2280) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.repositories.blobstore.BlobStoreRepository.snapshotShard(BlobStoreRepository.java:1811) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:340) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.snapshots.SnapshotShardsService.lambda$startNewShards$1(SnapshotShardsService.java:256) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:651) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
es-node2 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
es-node2 | at java.lang.Thread.run(Thread.java:832) [?:?]
es-node2 | [2021-06-04T21:05:01,942][WARN ][o.e.s.SnapshotsService ] [es-node2] [local_repository:snapshot-2021.06.04-21:05:01.768/uKIFe5apSQuPWv6tgvJDHA] failed to finalize snapshot
es-node2 | org.elasticsearch.snapshots.SnapshotException: [local_repository:snapshot-2021.06.04-21:05:01.768/uKIFe5apSQuPWv6tgvJDHA] failed to update snapshot in repository
es-node2 | at org.elasticsearch.repositories.blobstore.BlobStoreRepository.lambda$finalizeSnapshot$36(BlobStoreRepository.java:1019) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.action.ActionListener$1.onFailure(ActionListener.java:71) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.repositories.blobstore.BlobStoreRepository$8.onResponse(BlobStoreRepository.java:1663) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.repositories.blobstore.BlobStoreRepository$8.onResponse(BlobStoreRepository.java:1660) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.repositories.blobstore.BlobStoreRepository$4.clusterStateProcessed(BlobStoreRepository.java:1416) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.cluster.service.MasterService$SafeClusterStateTaskListener.clusterStateProcessed(MasterService.java:534) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.cluster.service.MasterService$TaskOutputs.lambda$processedDifferentClusterState$1(MasterService.java:421) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at java.util.ArrayList.forEach(ArrayList.java:1510) [?:?]
es-node2 | at org.elasticsearch.cluster.service.MasterService$TaskOutputs.processedDifferentClusterState(MasterService.java:421) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.cluster.service.MasterService.onPublicationSuccess(MasterService.java:281) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.cluster.service.MasterService.publish(MasterService.java:273) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:250) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:651) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
es-node2 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
es-node2 | at java.lang.Thread.run(Thread.java:832) [?:?]
es-node2 | Caused by: org.elasticsearch.repositories.RepositoryException: [local_repository] concurrent modification of the index-N file, expected current generation [3] but it was not found in the repository
es-node2 | at org.elasticsearch.repositories.blobstore.BlobStoreRepository.ensureSafeGenerationExists(BlobStoreRepository.java:1657) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.repositories.blobstore.BlobStoreRepository.lambda$writeIndexGen$54(BlobStoreRepository.java:1575) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:63) [elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.common.util.concurrent.ListenableFuture$1.doRun(ListenableFuture.java:112) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:226) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListener(ListenableFuture.java:106) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.common.util.concurrent.ListenableFuture.lambda$done$0(ListenableFuture.java:98) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at java.util.ArrayList.forEach(ArrayList.java:1510) ~[?:?]
es-node2 | at org.elasticsearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:98) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.common.util.concurrent.BaseFuture.set(BaseFuture.java:144) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.common.util.concurrent.ListenableFuture.onResponse(ListenableFuture.java:127) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.action.StepListener.innerOnResponse(StepListener.java:62) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.action.NotifyOnceListener.onResponse(NotifyOnceListener.java:40) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.repositories.blobstore.BlobStoreRepository.lambda$writeIndexGen$52(BlobStoreRepository.java:1564) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:710) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.9.1.jar:7.9.1]
es-node2 | ... 3 more
And, if you look in the folder, the file does not exist. This feels like a race between when the filename is generated, and when the contents are written, but I'm just guessing.
Note, as mentioned, almost identical errors are produced when the same steps are applied to an S3 repository. Manual snapshots work fine, but snapshots created as an action fail everytime.
Desktop (please complete the following information):
OS: MacOS (although, we're running in docker)
Chrome (shouldn't atter)
Version: 1.11.0 (latest version deployed in AWS)
Additional context
the docker-compose file is based on the examples provided below, and the steps used are copied almost verbatim from the examples provided.
Describe the bug
If a snapshot action is applied via a policy, when that action runs, it will fail to work properly. The same error exists whether using a local FS or using S3, but it's easier to reproduce the issue running locally in docker-compose snapshotting to a local FS.
To Reproduce
Simple docker-compose file to create a cluster
docker compose up -d
, and then wait for the cluster to startup.(It should return
{ "acknowledged" : true}
.)(It should return
{ "accepted" : true}
.)5. Verify the contents in the snapshots folder where docker-compose was run.
Select Management -> Index Management -> Index Policies, and paste above policy in, and name it something like test_policy.
8. Apply the above policy to the existing test index.
Select Management -> Index Management -> Indices, and then select the sample test index (
kibana_sample_data_flights
), Click onApply Policy
, and then in the dropdown, select the policy you created and named in the prior step.9. Wait for the policy to be applied to the index, initiate the
snapshot
action, and then fail.Expected behavior
The snapshot should complete and not error out.
The log output from the snapshot failure is:
And, if you look in the folder, the file does not exist. This feels like a race between when the filename is generated, and when the contents are written, but I'm just guessing.
Note, as mentioned, almost identical errors are produced when the same steps are applied to an S3 repository. Manual snapshots work fine, but snapshots created as an action fail everytime.
Desktop (please complete the following information):
Additional context
the docker-compose file is based on the examples provided below, and the steps used are copied almost verbatim from the examples provided.
The text was updated successfully, but these errors were encountered: