
HPO 659: Mix read/write with md5 checking halts all I/O on all metalLB IP's and does not recover unless noobaa pod restart #6934

Closed
MonicaLemay opened this issue Apr 8, 2022 · 51 comments

MonicaLemay commented Apr 8, 2022

Environment info

  • NooBaa Version:
    [root@c83f1-app1 ~]# noobaa status
    INFO[0000] CLI version: 5.9.2
    INFO[0000] noobaa-image: noobaa/noobaa-core:nsfs_backport_5.9-20220331
    INFO[0000] operator-image: quay.io/rhceph-dev/odf4-mcg-rhel8-operator@sha256:01a31a47a43f01c333981056526317dfec70d1072dbd335c8386e0b3f63ef052
    INFO[0000] noobaa-db-image: quay.io/rhceph-dev/rhel8-postgresql-12@sha256:98990a28bec6aa05b70411ea5bd9c332939aea02d9d61eedf7422a32cfa0be54
  • Platform:
    [root@c83f1-app1 ~]# oc get csv
    NAME DISPLAY VERSION REPLACES PHASE
    mcg-operator.v4.9.5 NooBaa Operator 4.9.5 mcg-operator.v4.9.4 Succeeded
    ocs-operator.v4.9.5 OpenShift Container Storage 4.9.5 ocs-operator.v4.9.4 Succeeded
    odf-operator.v4.9.5 OpenShift Data Foundation 4.9.5 odf-operator.v4.9.4 Succeeded

Actual behavior

This is not the same as issue 6930.

In the issue I am opening here, the node did remain in the Ready state, so I don't expect any IP failover. This defect is not about MetalLB IPs failing to fail over. In this defect, I/O was running to MetalLB IP 172.20.100.31, which is for node master1. On node master0, in the CNSA Scale core pod (namespace ibm-spectrum-scale), mmshutdown was issued for just that node. The other nodes remained active with the filesystem mounted. Master0 has MetalLB IP 172.20.100.30; there was no I/O going to that IP.

What was observed after mmshutdown on master0 was that all I/O going to 172.20.100.31 stopped. Because of issue 6930 there was no failover; that is fine and expected. What is not expected is for all I/O to stop.

sh-4.4# date; mmshutdown
Tue Apr  5 17:10:28 UTC 2022
Tue Apr  5 17:10:28 UTC 2022: mmshutdown: Starting force unmount of GPFS file systems
Tue Apr  5 17:10:34 UTC 2022: mmshutdown: Shutting down GPFS daemons
Shutting down!

When mmshutdown was issued, the only error in the NooBaa endpoint pods was "Stale file handle".

Logs show stale file handle

pod/noobaa-endpoint-7fdb5b75fd-t99nd/endpoint] Apr-5 17:12:26.930 [Endpoint/14] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/s5001b85</Resource><RequestId>l1mefzt6-3wj6yz-8x</RequestId></Error> PUT /s5001b85 {"host":"172.20.100.30","accept-encoding":"identity","user-agent":"aws-cli/2.3.2 Python/3.8.8 Linux/4.18.0-240.el8.x86_64 exe/x86_64.rhel.8 prompt/off command/s3.mb","x-amz-date":"20220405T171225Z","x-amz-content-sha256":"61d056dc66f1882c0f4053be381523a7a28d384abde04fcf5b0021c716bb0ea1","authorization":"AWS4-HMAC-SHA256 Credential=QzhyXj9wVDH9DvnK97L9/20220405/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=6d9fa5c22501bfed4f312ac47621b6cec691bf1cf8f719e8250fcdc0f61522f1","content-length":"154"} Error: Stale file handle
[pod/noobaa-endpoint-7fdb5b75fd-t99nd/endpoint] Apr-5 17:12:27.589 [Endpoint/14]    [L0] core.sdk.bucketspace_nb:: could not create underlying directory - nsfs, deleting bucket [Error: Stale file handle] { code: 'Unknown system error -116' }
[pod/noobaa-endpoint-7fdb5b75fd-t99nd/endpoint] Apr-5 17:12:27.793 [Endpoint/14] [ERROR] core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>InternalError</Code><Message>We encountered an internal error. Please try again.</Message><Resource>/s5001b85</Resource><RequestId>l1meg1if-7apxvy-1bas</RequestId></Error> PUT /s5001b85 {"host":"172.20.100.30","accept-encoding":"identity","user-agent":"aws-cli/2.3.2 Python/3.8.8 Linux/4.18.0-240.el8.x86_64 exe/x86_64.rhel.8 prompt/off command/s3.mb","x-amz-date":"20220405T171227Z","x-amz-content-sha256":"61d056dc66f1882c0f4053be381523a7a28d384abde04fcf5b0021c716bb0ea1","authorization":"AWS4-HMAC-SHA256 Credential=QzhyXj9wVDH9DvnK97L9/20220405/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=bb34c5aa7ed665ae73a2d172ab33d4056c2611ad2c09331ece80089cff46df05","content-length":"154"} Error: Stale file handle
[root@c83f1-app1 ~

This error is a bit odd because it is on the endpoint pod that was for master0. Master0's MetalLB IP was 172.20.100.30, and the Cosbench workload was only set up for 172.20.100.31.

An additional observation is that the s3 list command works but writes do not.

[root@c83f1-dan4 RW_workloads]# date; s5001_2_31 ls
Tue Apr  5 18:33:40 EDT 2022
2022-04-05 18:33:43 s5001b100
2022-04-05 18:33:43 s5001b63
2022-04-05 18:33:43 s5001b62
2022-04-05 18:33:43 s5001b61


[root@c83f1-dan4 RW_workloads]# date;  s5001_2_31 cp alias_commands s3://s5001b1
Tue Apr  5 18:35:36 EDT 2022
^C^Cfatal error:
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "concurrent/futures/thread.py", line 40, in _python_exit
  File "threading.py", line 1011, in join
  File "threading.py", line 1027, in _wait_for_tstate_lock
KeyboardInterrupt
[root@c83f1-dan4 RW_workloads]# date
Tue Apr  5 18:36:43 EDT 2022
[root@c83f1-dan4 RW_workloads]#

All subsequent PUTs to 172.20.100.31 and 172.20.100.32 time out (if I don't Ctrl-C), and the endpoint pods record "Error: Semaphore Timeout".
From .31 and .32 we can do GETs and we can read from the NooBaa database. If we rsh into the endpoint pods for IPs 172.20.100.31 and 172.20.100.32, we see that Spectrum Scale is still mounted in the correct place and we can write to it manually with touch (see the sketch below). So IPs .31 and .32 are still alive, the NooBaa DB is still online, and the Spectrum Scale filesystem is still mounted and writable. The timeout on the subsequent PUTs tells us that the client makes a connection but never gets a response.
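The manual checks looked roughly like this (a sketch only; the namespace, endpoint pod name and Scale mount path are placeholders, not the exact values from this cluster):

# Hypothetical sketch of the manual checks run while PUTs were hanging
oc -n <noobaa-namespace> rsh noobaa-endpoint-<id>      # namespace and pod name are placeholders
mount | grep -i gpfs                                   # confirm the Scale filesystem is still mounted inside the pod
touch <scale-mount-path>/manual-write-test             # mount path is a placeholder; success shows the FS is writable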

The endpoint pods never restarted and they still have their labels.

Also, in the Scale core pod we run mmhealth node show -N all and see that everything is HEALTHY, except of course the one node where we ran mmshutdown.

  sh-4.4# mmhealth node show

Node name:      master1-daemon
Node status:    HEALTHY
Status Change:  1 day ago

Component      Status        Status Change     Reasons & Notices
----------------------------------------------------------------
GPFS           HEALTHY       1 day ago         -
GUI            HEALTHY       1 day ago         -
NETWORK        HEALTHY       9 days ago        -
FILESYSTEM     HEALTHY       9 days ago        -
NOOBAA         HEALTHY       26 min. ago       -
PERFMON        HEALTHY       1 day ago         -
THRESHOLD      HEALTHY       9 days ago        -
sh-4.4# set -o vi
sh-4.4# mmhealth node show -N all

Node name:      master0-daemon
Node status:    FAILED
Status Change:  36 min. ago

Component      Status        Status Change     Reasons & Notices
--------------------------------------------------------------------------------
GPFS           FAILED        36 min. ago       gpfs_down, quorum_down
NETWORK        HEALTHY       1 day ago         -
FILESYSTEM     DEPEND        36 min. ago       unmounted_fs_check(remote-sample)
PERFMON        HEALTHY       1 day ago         -
THRESHOLD      HEALTHY       1 day ago         -

Node name:      master1-daemon
Node status:    HEALTHY
Status Change:  1 day ago

Component      Status        Status Change     Reasons & Notices
----------------------------------------------------------------
GPFS           HEALTHY       1 day ago         -
GUI            HEALTHY       1 day ago         -
NETWORK        HEALTHY       9 days ago        -
FILESYSTEM     HEALTHY       9 days ago        -
NOOBAA         HEALTHY       27 min. ago       -
PERFMON        HEALTHY       1 day ago         -
THRESHOLD      HEALTHY       9 days ago        -

Node name:      master2-daemon
Node status:    HEALTHY
Status Change:  1 day ago

Component       Status        Status Change     Reasons & Notices
-----------------------------------------------------------------
CALLHOME        HEALTHY       9 days ago        -
GPFS            HEALTHY       1 day ago         -
NETWORK         HEALTHY       9 days ago        -
FILESYSTEM      HEALTHY       9 days ago        -
GUI             HEALTHY       3 days ago        -
HEALTHCHECK     HEALTHY       9 days ago        -
PERFMON         HEALTHY       9 days ago        -
THRESHOLD       HEALTHY       9 days ago        -

Something is obviously hung in the PUT path, but the logs and NooBaa health don't point to anything.
When we issue mmstartup, the PUTs still fail. The only way to recover is to delete the NooBaa endpoint pods and have new ones generated.

I have been able to recreate this very easily, so if required I can set this up on my test stand.

Expected behavior

1. When doing mmshutdown on one node, it should not impact cluster-wide I/O capability; it should not be an outage. If an outage is indeed expected, then mmstartup should recover I/O capability.

Steps to reproduce

  1. Start a Cosbench run (I can provide the XML if needed). Once I/O is running, issue mmshutdown from within one CNSA IBM Spectrum Scale core pod, as in the sketch below.
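For reference, the shutdown step looks roughly like this (a sketch; the core pod name is a placeholder):

# Hypothetical sketch: shut down GPFS on a single node from inside its CNSA core pod
oc -n ibm-spectrum-scale rsh <scale-core-pod-on-master0>   # pod name is a placeholder
mmshutdown                                                 # stops GPFS on this node only
# and later, to bring it back:
mmstartup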

More information - Screenshots / Logs / Other output

Must gather and noobaa diagnose in https://ibm.ent.box.com/folder/145794528783?s=uueh7fp424vxs2bt4ndrnvh7uusgu6tocd

This issue started as HPO https://github.ibm.com/IBMSpectrumScale/hpo-core/issues/659. Screeners determined that it was a NooBaa issue. I have also Slacked the CNSA team for input but have not heard back.

MonicaLemay changed the title from "cnsa mmshutdown on one node halts all I/O on all metalLB IP's and does not recover on mmstartup but on noobaa pod restart" to "HPO 659: cnsa mmshutdown on one node halts all I/O on all metalLB IP's and does not recover on mmstartup but on noobaa pod restart" on Apr 8, 2022
@MonicaLemay (Author)

I need to amend the discovery of this defect. It has nothing to do with CNSA mmshutdown on a Scale core pod. The trigger is having the s3 service with MD5 enabled and a Cosbench workload with mixed read and write while using hashCheck=true.

[root@c83f1-dan4 ~]# cat /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5001_GB_random_size_num_of_objects.xml
<?xml version="1.0" encoding="UTF-8" ?>
<workload name="GBrandomsizenumobj" description="GBrandom read and write workload">
  <storage type="s3" config="accesskey=QzhyXj9wVDH9DvnK97L9;secretkey=xTwEhfGB/wRXZbk7Ymsq2fYwl52AriraQPVzrZKF;path_style_access=true;endpoint=http://metallb2:80"/>

  <workflow>

   <workstage name="prepare_rGB">
      <work name="prepare_rGB" workers="256" interval="10" rampup="0" type="prepare" ratio="100" config="cprefix=s5001b;oprefix=s5001o;osuffix=_rGB;containers=r(1,4);objects=r(1,20);sizes=r(2,3)GB;hashCheck=true"/>
    </workstage>

    <workstage name="rGB_write_read_delete">
      <work name="rGB_write_read_delete" workers="256" interval="10" rampup="0" runtime="300">
        <operation type="write" ratio="50" config="cprefix=s5001b;oprefix=s5001o;osuffix=_rGB;containers=r(5,6);objects=r(1,20);sizes=u(2,3)GB;hashCheck=true"/>
        <operation type="read" ratio="50" config="cprefix=s5001b;oprefix=s5001o;osuffix=_rGB;containers=r(1,2);objects=r(1,20);hashCheck=true"/>
      </work>
    </workstage>

  </workflow>
</workload>

I will see that the Cosbench workload fails. When it fails, a manual s3 cp times out, while s3 ls still works.

MonicaLemay commented Apr 11, 2022

Can we make this a sev 1 defect? I think this is a critical problem for us. Could this be a NooBaa DB deadlock? I don't know the commands to check for a NooBaa DB deadlock.

MonicaLemay changed the title from "HPO 659: cnsa mmshutdown on one node halts all I/O on all metalLB IP's and does not recover on mmstartup but on noobaa pod restart" to "HPO 659: Mix read/write with md5 checking halts all I/O on all metalLB IP's and does not recover unless noobaa pod restart" on Apr 11, 2022
@romayalon (Contributor)

@MonicaLemay
Are you able to curl to 172.20.100.31/172.20.100.32?

@MonicaLemay (Author)

Curl works while the condition exists; the IPs are reachable.

[root@c83f1-dan4 ~]# curl 172.20.100.31
<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><Resource>/</Resource><RequestId>l1w6sj05-996mt5-tt8</RequestId></Error>[root@c83f1-dan4 ~]# [

Also, s3 ls works and lists the buckets. The s3 cp times out, so no PUT works.

romayalon commented Apr 12, 2022

@MonicaLemay

A few observations:

  1. The endpoint pods that are attached to master0 are: noobaa-endpoint-7fdb5b75fd-t99nd, noobaa-endpoint-7fdb5b75fd-5k4c2.
  2. Stale file handle error - I searched for it in all the endpoint pod logs and could only find it in noobaa-endpoint-7fdb5b75fd-t99nd and noobaa-endpoint-7fdb5b75fd-5k4c2, which are the pods attached to master0. According to Google, a filehandle becomes stale whenever the file or directory referenced by the handle is removed by another host while your client still holds an active reference to the object. I feel this stale file handle error is expected, but maybe I'm missing something?
  3. I searched for alias_commands (the object that you tried to cp) in all endpoint pod logs and couldn't find it - I suspect NooBaa never got this request and that's why the response never returns and you get a timeout. The NSFS debug level would help with checking this.
  4. The fact that ls is working doesn't really say anything, because NooBaa throws SlowDown and it's not a timeout.
  5. Cosbench failing on PUT requests - on the functional endpoint pods I see that NooBaa returns SlowDown, which makes sense to me because there are now only 4 endpoint pods, each with 3Gi memory, and 0.75Gi is the limit for NSFS buffers. I see that you are uploading large files; can you try it with smaller files and check if you still get this SlowDown error?

MonicaLemay commented Apr 12, 2022

@romayalon Thanks Romy for the comment. I'm not as worried about the Stale file handle and the Cosbench slow down. What concerns me most is that after I hit this condition and kill the Cosbench workload, I cannot do an s3 cp. I can wait 12 hours and still cannot s3 cp a file. I have to restart the NooBaa pods; after I restart them, I can do s3 cp again.

@romayalon (Contributor)

@MonicaLemay So can you please check the cp command connection details again (specifically the endpoint address) and check whether the requests were received in the endpoint pods? A higher debug level is probably needed here.

MonicaLemay commented Apr 13, 2022

I did some follow-up investigation asked for by Romy. In the process, I have refined the trigger.
So, in summary, 5 things:

  1. Cosbench will give a Java error when the Cosbench XML file has hashCheck=true AND file sizes are 3 GB or bigger. In some ways, I don't consider this to be a NooBaa problem; it is probably something with Cosbench. When the Java error occurs the file does not get sent, so this can't be NooBaa's fault. I don't think we need to pursue the Java errors from Cosbench. When files are 3 GB and bigger but hashCheck is not set, everything works fine.

  2. Cosbench terminating the workload, or letting it complete, without both hashCheck=true AND 3 GB or bigger files is OK. s3 cp of a file works and there are no Java errors in the Cosbench logs.

  3. Cosbench with hashCheck=true AND files of 1-2 GB causes s3 cp of a file to time out. This is the error that this defect needs to pursue.

  4. I checked to make sure that Cosbench is not holding the port open or blocked. I stopped Cosbench on all drivers and still could not s3 cp a file.

  5. If I remove the read from the Cosbench workload XML file, then s3 cp of a file works. It appears NOT to be a function of file size but of doing both reads (GET) and writes (PUT) with hashCheck=true.

The best xml file to reproduce this quickly is:

[root@c83f1-dan4 RW_workloads]# cat /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5002_GB_random_size_num_of_objects.xml
<?xml version="1.0" encoding="UTF-8" ?>
<workload name="GBrandomsizenumobj" description="GBrandom read and write workload">
  <storage type="s3" config="accesskey=qiIImaITeJCdkdABWAsk;secretkey=c80yUC5QfG9sxK99XQI70zXjpin9cqIeSR54r47c;path_style_access=true;endpoint=http://metallb2:80"/>

  <workflow>

   <workstage name="prepare_rGB">
      <work name="prepare_rGB" workers="256" interval="10" rampup="0" type="prepare" ratio="100" config="cprefix=s5002b;oprefix=s5002o;osuffix=_rGB;containers=r(1,2);objects=r(1,3);sizes=r(1,2)GB;hashCheck=true"/>
    </workstage>

    <workstage name="rGB_write_read_delete">
      <work name="rGB_write_read_delete" workers="256" interval="10" rampup="0" runtime="300">
        <operation type="write" ratio="20" config="cprefix=s5002b;oprefix=s5002o;osuffix=_rGB;containers=r(5,6);objects=r(1,20);sizes=r(1,2)GB;hashCheck=true"/>
        <operation type="read" ratio="80" config="cprefix=s5002b;oprefix=s5002o;osuffix=_rGB;containers=r(1,2);objects=r(1,3);hashCheck=true"/>
      </work>
    </workstage>


  </workflow>
</workload>

One test was done with log level nsfs. NooBaa diagnose and must-gather were collected and uploaded to our Box note.

root@c83f1-dan4 RW_workloads]# date ; sh /root/cosbench/cli.sh submit /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5002_GB_random_size_num_of_objects.xml
Wed Apr 13 15:34:25 EDT 2022
Accepted with ID: w967
[root@c83f1-dan4 RW_workloads]#




The hang occurred around
Wed Apr 13 15:36:10 EDT 2022
Grabbed logs

Now try a new file name


[root@c83f1-dan4 RW_workloads]# date; s5001_2_31 cp good.xml s3://s5001b1
Wed Apr 13 15:38:33 EDT 2022

NOW LOG LEVEL ALL



[root@c83f1-dan4 RW_workloads]# date ; sh /root/cosbench/cli.sh submit /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5002_GB_random_size_num_of_objects.xml
Wed Apr 13 15:55:08 EDT 2022
Accepted with ID: w969

Now try to s3 cp a file

[root@c83f1-dan4 RW_workloads]# date; s5001_2_31 cp nohup.out s3://s5001b1
Wed Apr 13 15:56:12 EDT 2022
upload: ./nohup.out to s3://s5001b1/nohup.out
[root@c83f1-dan4 RW_workloads]# date; s5001_2_31 cp rand_read.xml s3://s5001b1
Wed Apr 13 15:57:02 EDT 2022

In all cases oc logs does not show any errors.

@guymguym (Member)

@MonicaLemay This is a great elimination!

I probably missed it mentioned above somewhere - how much time, or which actions, does it take to recover from this situation?

A few suggestions:

  • Use aws --debug s3 cp/ls ... - the --debug flag will produce very verbose logs from the client side, including tracing of all HTTP requests and responses it sends and receives (see the sketch after this list).

  • Check the metrics from the OpenShift monitoring dashboards (see this blog for guidance) - I would take a look at the basic CPU/memory/network stats of the pods to see if we can spot anything. It would be relevant to see which pods max out their CPU requests (endpoint/database perhaps) vs. which pods become idle and just wait for some event, and how things change over time until the issue is hit, and then recovered.

  • A few more things you can try once this issue occurs: 1. GET an object. 2. Try to access other buckets. 3. Read & write files directly to the filesystem (on that same bucket dir, ideally from the cluster mount point).
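For example, a minimal client-side trace of the stuck PUT vs. a working LIST could be captured like this (a sketch; the endpoint IP and bucket come from the runs above, and the credentials are assumed to already be configured in the aws CLI profile):

# Hypothetical sketch: capture verbose client-side traces with the aws CLI --debug flag
aws --debug --endpoint-url http://172.20.100.31 s3 cp ./good.xml s3://s5001b1/ 2> put-debug.log
aws --debug --endpoint-url http://172.20.100.31 s3 ls s3://s5001b1/ 2> ls-debug.log
# Compare the two logs to see whether the PUT's HTTP request is sent and never answered,
# or never leaves the client at all.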

@MonicaLemay (Author)

@guymguym @romy

  1. When the condition occurs, it is only on MetalLB IPs 172.20.100.31 and 172.20.100.32. The MetalLB IP address 172.20.100.33 does not exhibit the problem. The reason for this is that my Cosbench setup only has 2 drivers; I don't have 3. Driver 1 points to 172.20.100.31 and driver 2 points to 172.20.100.32. It appears it only occurs on IPs that had simultaneous read/write with hashCheck=true.
  • A file with the aws debug output of the cp is attached. It was going to account s5002, bucket 4, using 172.20.100.32.

  • When the condition occurs other accounts and buckets experience the same problem.

  • When the condition occurs, s3 mb and rb work. Only PUT and GET fail to the 2 IPs that were doing Cosbench simultaneous read/write with hashCheck=true.

  • I wrote 2 scripts. One does a curl PUT to the s5001 account and the other does a curl GET to the s5001 account. They use the aws access keys and generate the signature (see the sketch after this list). The GET and PUT work until the issue we are talking about occurs. When it occurs, the behaviour is just like cp of a file: both curl GET and curl PUT hang to the 2 addresses mentioned in this post.

  • Last week, I investigated CPU and memory to see if either was over-utilized.
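A minimal sketch of such a signed curl check (assuming curl 7.75+ with built-in SigV4 support; the keys, file, bucket and object names here are placeholders, not the exact scripts used):

# Hypothetical sketch of a signed PUT and GET using curl's built-in SigV4 signing
curl -sS -X PUT --data-binary @testfile \
  --aws-sigv4 "aws:amz:us-east-1:s3" \
  --user "ACCESS_KEY:SECRET_KEY" \
  http://172.20.100.31/s5001b1/testfile

curl -sS -o /dev/null -w "%{http_code}\n" \
  --aws-sigv4 "aws:amz:us-east-1:s3" \
  --user "ACCESS_KEY:SECRET_KEY" \
  http://172.20.100.31/s5001b1/testfile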

Screen Shot 2022-04-14 at 2 47 40 PM

aws-debug-32.docx

@MonicaLemay (Author)

I can provide IP addresses for the storage cluster, OpenShift cluster and application nodes if the NooBaa team would like to debug on my cluster.

@guymguym (Member)

@MonicaLemay Thanks!

  • You didn't mention how long it takes to recover or if you had to do something to recover from this issue.
  • About the dashboard screenshot - is it filtering only that single endpoint pod? Can you see all the namespace pods in the same graph to compare?
  • The CPU graph of the endpoint pod shows a burst of 3 minutes - is this the time of the entire Cosbench run, or just the time of the issue?

MonicaLemay commented Apr 15, 2022

@guymguym

  • In previous posts I mentioned that it never recovers. I have to delete the endpoint pods and have them restarted. I have waited over 12 hours to see if it recovers on its own; it does not.

  • I looked at all endpoint pods on the graph but only posted one for simplicity's sake. There were only 2 endpoint pods doing I/O because the recreate scenario is so simple; it is not a high-stress, I/O-heavy recreate. Only 2 endpoint pods (1 shown in the graph to keep it simple) were doing I/O of about 10 files.

  • The I/O scenario to recreate this is very short - about 3 minutes or less.

One more comment - I cannot recreate this with warp. I just did a warp run and did not see the problem. Warp and Cosbench have different ways of doing tests. I think it is an important data point, but not enough yet to blame Cosbench as being defective.

Please let me know if you need to meet with me. I can clarify all of your questions very quickly.

@romayalon (Contributor)

Hi @MonicaLemay
Can you please share the Cosbench log so I'll be able to see which object failed and what the error was?
If possible, please try to minimize the ops in the Cosbench XML so it still fails; for instance, if it fails on prepare, remove the rest of the script.

This is what I found regarding good.xml -
I see 3 retries of the good.xml upload, at 19:38:40, 19:39:40 and 19:40:40, and eventually it fails with a SlowDown error code.

[Endpoint/14][ERROR]core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>SlowDown</Code><Message>Reduce your request rate.</Message><Resource>/s5001b1/good.xml</Resource><RequestId>l1xz6wds-8ditak-w2m</RequestId></Error> PUT /s5001b1/good.xml {"host":"172.20.100.31","accept-encoding":"identity","content-type":"text/xml","user-agent":"aws-cli/2.3.2 Python/3.8.8 Linux/4.18.0-240.el8.x86_64 exe/x86_64.rhel.8 prompt/off command/s3.cp","content-md5":"925sPKSZMpLjKyetdX660g==","expect":"100-continue","x-amz-date":"20220413T193840Z","x-amz-content-sha256":"66b57d67a15890b6985c286b56f6e135daae065aa32ea33a06d2ece990c59909","authorization":"AWS4-HMAC-SHA256 Credential=QzhyXj9wVDH9DvnK97L9/20220413/us-east-1/s3/aws4_request, SignedHeaders=content-md5;content-type;host;x-amz-content-sha256;x-amz-date, Signature=a38ec935293473e1a995f9a4d5f8ce79c3b1e0a2ef9e80f63da4d356d22ec03b","content-length":"1429"} Error: Semaphore Timeout
    at Semaphore._on_timeout (/root/node_modules/noobaa-core/src/util/semaphore.js:248:25)
    at Timeout.<anonymous> (/root/node_modules/noobaa-core/src/util/semaphore.js:244:53)
    at listOnTimeout (internal/timers.js:557:17)
    at processTimers (internal/timers.js:500:7)

And the same for rand_read.xml - I see a SlowDown error:

[Endpoint/14][ERROR]core.endpoint.s3.s3_rest:: S3 ERROR <?xml version="1.0" encoding="UTF-8"?><Error><Code>SlowDown</Code><Message>Reduce your request rate.</Message><Resource>/s5001b1/rand_read.xml</Resource><RequestId>l1xzx9dk-6nkiq1-1c7f</RequestId></Error> PUT /s5001b1/rand_read.xml {"host":"172.20.100.31","accept-encoding":"identity","content-type":"text/xml","user-agent":"aws-cli/2.3.2 Python/3.8.8 Linux/4.18.0-240.el8.x86_64 exe/x86_64.rhel.8 prompt/off command/s3.cp","content-md5":"nC9U9rHx1mvrQLCpEzfLzg==","expect":"100-continue","x-amz-date":"20220413T195910Z","x-amz-content-sha256":"bb2b3ecd206acbf0a1ba55d0036bff465d40e3cdc7bdb0e0d9f5ce8e14b0c63c","authorization":"AWS4-HMAC-SHA256 Credential=QzhyXj9wVDH9DvnK97L9/20220413/us-east-1/s3/aws4_request, SignedHeaders=content-md5;content-type;host;x-amz-content-sha256;x-amz-date, Signature=2223257b97e704b739cbb6610fab5e8687d4de3f488afe477618c4208b4bd7d9","content-length":"1411"} Error: Semaphore Timeout
    at Semaphore._on_timeout (/root/node_modules/noobaa-core/src/util/semaphore.js:248:25)
    at Timeout.<anonymous> (/root/node_modules/noobaa-core/src/util/semaphore.js:244:53)
    at listOnTimeout (internal/timers.js:557:17)
    at processTimers (internal/timers.js:500:7)

@guymguym (Member)

@MonicaLemay Do we use any custom env overrides for NSFS_BUF_POOL_MEM_LIMIT? This is the initial value used by the buffers pool semaphore, and even if we don't set it explicitly it is calculated as 25% of the container memory limit. In previous tests with Silvana we used to experiment with this setting:

kubectl set env deployment/noobaa-endpoint CONFIG_JS_NSFS_BUF_POOL_MEM_LIMIT=2147483648

There is a log print of the semaphore and buffers pool initial values here so we can check the actual value used.
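A quick way to check whether such an override is present, and what the 25% default would be derived from (a sketch; the deployment name follows the command above and the namespace is a placeholder):

# Hypothetical sketch: list the env vars currently set on the endpoint deployment
kubectl -n <noobaa-namespace> set env deployment/noobaa-endpoint --list | grep NSFS_BUF_POOL_MEM_LIMIT
# And the container memory limit that the 25% default is calculated from
kubectl -n <noobaa-namespace> get deployment noobaa-endpoint \
  -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}{"\n"}'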

@romayalon I see we left the upload_object flow with an inaccurate update to the semaphore, because we don't have full visibility into the incoming HTTP request buffers being allocated by the underlying nodejs http library. So in our code we surround the upload with just a single 8MB acquired from the semaphore, on the assumption that this represents a rough estimate of the memory we use during the flow, but we know that nodejs streams can take more memory while flowing. I would try to create a stress test case that we can run locally and see if the same effect can be observed. Of course the root cause might be elsewhere still.

Here is where we acquire the semaphore on upload -

// TODO: Take up only as much as we need (requires fine-tune of the semaphore inside the _upload_stream)
// Currently we are taking config.NSFS_BUF_SIZE for any sized upload (1KB upload will take a full buffer from semaphore)
fs_xattr = await buffers_pool_sem.surround_count(
    config.NSFS_BUF_SIZE,
    async () => this._upload_stream(params, upload_path, fs_account_config, object_sdk.rpc_client, fs_xattr)
);

We might also want to look for the timeout warning prints that should give us the stack trace of the caller more accurately (we added those when we were debugging read flow semaphore cases) -

if (this._warning_timeout) {
    // Capture the stack trace of the caller before registering the timer to identify the code that called it
    const err = new Error('Warning stuck surround_count item');
    warning_timer = setTimeout(() => console.error(err.stack), this._warning_timeout).unref();
}

@romayalon (Contributor)

@guymguym Sure, I'll try to create a stress test.
Don't you think that the fact that it only happens when hashCheck=true is relevant information?

guymguym commented Apr 18, 2022

@romayalon If I understand this hashCheck=true mode correctly, it has no effect on the endpoint behavior; it only changes how the Cosbench workers behave, right? It might have a side effect on scheduling due to the workers having more CPU work between requests. But it is an interesting effect to analyze.

@MonicaLemay (Author)

@romayalon
I have uploaded the Cosbench logs. Apologies - you had asked for them and I forgot to bundle them with the NooBaa logs. They are in
https://ibm.ent.box.com/folder/145794528783?s=uueh7fp424vxs2bt4ndrnvh7uusgu6to

There are 2: one is from run 967, which had the NSFS debug level; the other is from run 969, which had the ALL debug level.

The XML that I posted in a thread above, where I give info about runs 967 and 969, is the shortest and best one. I'll copy it from above and put it here so that you don't have to fish for it.

[root@c83f1-dan4 RW_workloads]# cat /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5002_GB_random_size_num_of_objects.xml
<?xml version="1.0" encoding="UTF-8" ?>
<workload name="GBrandomsizenumobj" description="GBrandom read and write workload">
  <storage type="s3" config="accesskey=qiIImaITeJCdkdABWAsk;secretkey=c80yUC5QfG9sxK99XQI70zXjpin9cqIeSR54r47c;path_style_access=true;endpoint=http://metallb2:80"/>

  <workflow>

   <workstage name="prepare_rGB">
      <work name="prepare_rGB" workers="256" interval="10" rampup="0" type="prepare" ratio="100" config="cprefix=s5002b;oprefix=s5002o;osuffix=_rGB;containers=r(1,2);objects=r(1,3);sizes=r(1,2)GB;hashCheck=true"/>
    </workstage>

    <workstage name="rGB_write_read_delete">
      <work name="rGB_write_read_delete" workers="256" interval="10" rampup="0" runtime="300">
        <operation type="write" ratio="20" config="cprefix=s5002b;oprefix=s5002o;osuffix=_rGB;containers=r(5,6);objects=r(1,20);sizes=r(1,2)GB;hashCheck=true"/>
        <operation type="read" ratio="80" config="cprefix=s5002b;oprefix=s5002o;osuffix=_rGB;containers=r(1,2);objects=r(1,3);hashCheck=true"/>
      </work>
    </workstage>


  </workflow>
</workload>

@guymguym My testing is customer-level testing. I am not allowed to modify anything other than what is provided by our command line interface. In other words, I cannot modify memory pool buffers by editing deployments/pods; I have to use the default CPU and memory.

I hope I answered all of your questions.

@guymguym (Member)

Got you @MonicaLemay. Just to clarify, I didn't mean to use that as a valid workaround or anything funky like that, only as a debugging technique while we are still chasing the root cause of this issue. But I understand the concern with making these out-of-band changes to a long-running system, and whether undoing them would actually reset the configuration back correctly. I'll try to come up with more ideas on how to narrow down the problem from your side without making configuration changes, while we try to reproduce it on a dev environment. Thanks!

@guymguym (Member)

@MonicaLemay A couple more questions:

  1. Did you try this workload with file sizes lower than 1 GB, perhaps 128-256 MB? I saw what you wrote about >3 GB with hashCheck=true failing, but I didn't find whether you wrote anything about reproducing this issue with lower sizes.
  2. Can you share what that Java error from the >3 GB case is - maybe it gives us some clues?

@romayalon (Contributor)

@guymguym
These are just examples; they appear many times in the Cosbench logs.
I could find the following warning:

2022-04-13 15:34:59,884 [INFO] [NoneStorage] - performing GET at /s5002b1/s5002o2_rGB
2022-04-13 15:34:59,948 [WARN] [AbstractOperator] - No checksum embedded in s5002b1\s5002o3_rGB

And after a while read timeouts started happening:

2022-04-13 15:38:10,934 [WARN] [S3Storage] - below exception encountered when retrieving object s5002o1_rGB at bucket s5002b1: Unable to execute HTTP request: Read timed out
2022-04-13 15:38:10,950 [ERROR] [AbstractOperator] - worker 102 fail to perform operation s5002b1/s5002o1_rGB
java.lang.NullPointerException
	at com.intel.cosbench.driver.operator.Reader.validateChecksum(Reader.java:143)
	at com.intel.cosbench.driver.operator.Reader.doRead(Reader.java:95)
	at com.intel.cosbench.driver.operator.Reader.operate(Reader.java:69)
	at

And for s5002b2/s5002o3_rGB:

2022-04-13 15:36:02,618 [ERROR] [AbstractOperator] - worker 129 fail to perform operation s5002b2/s5002o3_rGB
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at

And for s5002b5/s5002o9_rGB (this time the error code is SlowDown and not EINVAL as we had in the previous bug):

2022-04-13 15:39:02,275 [WARN] [S3Storage] -below exception encountered when creating object s5002o9_rGB at s5002b5: Failed to reset the request input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
2022-04-13 15:38:04,878 [WARN] [S3Storage] -below exception encountered when retrieving object s5002o1_rGB at bucket s5002b2: Unable to execute HTTP request: Read timed out
2022-04-13 15:41:08,451 [WARN] [S3Storage] -below exception encountered when retrieving object s5002o2_rGB at bucket s5002b2: com.amazonaws.http.timers.client.SdkInterruptedException

@guymguym (Member)

@romayalon The first few are GET errors - perhaps I missed it, and hashCheck=true actually invokes another read after write, and that's what's creating more stress?

guymguym commented Apr 19, 2022

OK, so I see in the Cosbench code that hashCheck means:

  • Every object will be written with an appended hash at the end of its content, see Writer and RandomInputStream.
  • When reading the objects the reader will calculate hash while reading and when reaching the end of the object will extract the stored hash and compare, see Reader.
  • This is also explained in this short doc.

So the error you found - No checksum embedded in s5002b1\s5002o3_rGB comes from the Reader.validateChecksum() when not finding the expected guard (!!!!) around the hash at the end of the content, see here.

This seems weird indeed - I would try to look at this file from the FS in hex dump (xxd) - perhaps we can see what went wrong in its content. The appended hash should have a guard (!!!!) on both sides so the last bytes of each written object should have this: !!!!<32-hex-md5>!!!! - see HashUtil.
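For example, a quick way to inspect the tail of one of the objects on the filesystem (a sketch; the path is a placeholder for the actual export directory):

# Hypothetical sketch: dump the last 40 bytes of an object written with hashCheck=true.
# With the !!!!<32-hex-md5>!!!! trailer, the '!' guard bytes (0x21) should appear at both
# ends of the 40-byte tail.
tail -c 40 <fs-export-path>/s5002b1/s5002o3_rGB | xxd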

@guymguym (Member)

I wonder why this "no checksum embedded" log was written with "[NoneStorage]" and not "[S3Storage]" like the rest...

@romayalon (Contributor)

@guymguym yes, so according to the xml and the logs:

  1. Prepare stage - completed successfully: 3 objects (named s5002o1, s5002o2, s5002o3) were stored in containers s5002b1 and s5002b2 (in total 6 objects of sizes 1-2 GB).
  2. write_read_delete stage - contains only writes and reads (stage start timestamp 15:34:57,015):
    2.1. Write - a range of 20 objects is written to containers s5002b5, s5002b6 (sizes 1-2 GB). All writes were successful until 15:39:09,614, when we get the first reset (SlowDown) error. Ratio=20.
    2.2. Read - a range of 3 objects (s5002o1, s5002o2, s5002o3) is read from containers s5002b1 and s5002b2 (same objects and buckets from the prepare stage). Ratio=80.
    The first checksum error started at 15:34:57,560.

@MonicaLemay do you see the template Guy mentioned, !!!!<32-hex-md5>!!!!, in these objects?
I'm wondering why some GET ops are successful and some are not.

[NoneStorage] - most of the log messages are written with this [NoneStorage] tag in the log.

@MonicaLemay (Author)

When files are 3-4 GB in size, no files get completely written:

[root@c83f1-ems s5003]# find ./s500* -type f  | wc -l
29
[root@c83f1-ems s5003]# find ./s500* -type f
./s5003b1/.noobaa-nsfs_625d97a99e2004002a837889/uploads/9c126154-32de-4353-9b0f-abe54be1d09b
./s5003b2/.noobaa-nsfs_625d97b29e2004002a83788c/uploads/bcc7e25a-b0ee-441c-9ecc-2a410ae8f5d0
./s5003b2/.noobaa-nsfs_625d97b29e2004002a83788c/uploads/04daf916-1cee-46d1-842a-4edfae8413c6
./s5003b3/alias_commands
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/1067f77d-96fd-42e1-9f7a-367f76d3fa54
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/4ea1ea57-23f7-4188-a3b6-659c48c6d70d
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/d04bfa57-eb67-4cc4-8ff8-46269272c90d
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/3e26e3f4-6be8-42f7-aa63-ac8f7465b44d
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/e6f23316-8bad-44be-96ff-0e7a4a739a6c
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/715ec2bf-f681-4442-b38a-3e9582f83d0b
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/f9cc167f-6c18-4900-bceb-0c5f0d403747
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/3ad611f7-adeb-49db-aae6-3a4221283976
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/5478b862-1ee6-42e3-9b2a-a17e16e8339b
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/62dd5cae-03b4-425f-af14-aa1b117a73ab
./s5003b5/.noobaa-nsfs_625d97be9e2004002a837895/uploads/b3c2c524-b175-4f49-830c-fc1cabeba17a
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/27e34e18-8cba-44cc-ae60-2f10f3e49adf
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/e45d4c30-c1fc-4e0b-aada-788677f81f9c
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/9c19f8b5-e4c2-43c8-a136-f57f43d6c89d
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/d23fdefa-01c0-49d3-869b-9bd41f3b70cb
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/01bae4e3-65e7-4cd6-a7dd-cf2abaa67ff3
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/01a827ba-5f02-4bdf-bf3b-b1d3c185a6c8
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/928776f9-2787-4ab9-83c2-38c9c1f7a1a5
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/9f7b7aae-9c9c-4bcf-a5ca-208f194b9a61
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/b68acdbb-8b0d-4d20-be2f-01905b834163
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/be4e8952-8e02-4711-aafc-b2dc5866b92a
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/8a6ae8b8-78f2-4648-b77a-67820fd95575
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/8b718d29-d288-4d27-855c-d87e5af2dad4
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/925939f7-bf58-4cc3-9987-a1618f0deff1
./s5003b6/.noobaa-nsfs_625d97c19e2004002a837898/uploads/de49df27-eff6-4683-8d8b-43030c220912
[root@c83f1-ems s5003]#

The cosbench log:

================================================== stage: s1-prepare_rGB ==================================================
---------------------------------- mission: M4432CA8957, driver: dan4 ----------------------------------
2022-04-19 14:54:02,009 [INFO] [Log4jLogManager] - will append log to file /root/cosbench/log/mission/M4432CA8957.log
2022-04-19 14:54:02,277 [INFO] [NoneStorage] - performing PUT at /s5003b1/s5003o3_rGB
2022-04-19 14:54:02,277 [INFO] [NoneStorage] - performing PUT at /s5003b1/s5003o1_rGB
2022-04-19 14:54:02,278 [INFO] [NoneStorage] - performing PUT at /s5003b1/s5003o2_rGB
2022-04-19 14:54:02,311 [ERROR] [AbstractOperator] - worker 2 fail to perform operation s5003b1/s5003o2_rGB
java.lang.ArrayIndexOutOfBoundsException
	at com.intel.cosbench.driver.generator.RandomInputStream.processBytes(RandomInputStream.java:106)
	at org.apache.commons.io.input.NullInputStream.read(NullInputStream.java:232)
	at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:75)
	at org.apache.commons.io.input.CountingInputStream.read(CountingInputStream.java:74)
	at com.intel.cosbench.driver.generator.XferCountingInputStream.read(XferCountingInputStream.java:38)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:108)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.read(MD5DigestCalculatingInputStream.java:98)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:138)
	at com.amazonaws.http.RepeatableInputStreamRequestEntity.writeTo(RepeatableInputStreamRequestEntity.java:153)
	at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
	at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
	at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
	at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
	at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
	at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
	at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doSendRequest(SdkHttpRequestExecutor.java:63)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
	at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
	at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
	at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:837)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
	at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
	at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1405)
	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1270)
	at com.intel.cosbench.api.S3Stor.S3Storage.createObject(S3Storage.java:167)
	at com.intel.cosbench.driver.operator.Writer.doWrite(Writer.java:98)
	at com.intel.cosbench.driver.operator.Preparer.operate(Preparer.java:97)
	at com.intel.cosbench.driver.operator.AbstractOperator.operate(AbstractOperator.java:76)
	at com.intel.cosbench.driver.agent.WorkAgent.performOperation(WorkAgent.java:197)
	at com.intel.cosbench.driver.agent.WorkAgent.doWork(WorkAgent.java:177)
	at com.intel.cosbench.driver.agent.WorkAgent.execute(WorkAgent.java:134)
	at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:44)
	at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:1)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2022-04-19 14:54:02,315 [INFO] [NoneStorage] - performing PUT at /s5003b2/s5003o2_rGB
2022-04-19 14:54:02,315 [INFO] [NoneStorage] - performing PUT at /s5003b2/s5003o1_rGB
2022-04-19 14:54:02,315 [INFO] [NoneStorage] - performing PUT at /s5003b2/s5003o3_rGB
2022-04-19 14:54:02,318 [ERROR] [ErrorStatistics] - error code: N/A occurred 7 times, fail to operate: s5003b1/s5003o2_rGB, s5003b1/s5003o2_rGB, s5003b1/s5003o3_rGB, s5003b1/s5003o1_rGB, s5003b2/s5003o2_rGB, s5003b2/s5003o1_rGB, s5003b2/s5003o3_rGB
java.lang.ArrayIndexOutOfBoundsException
	at com.intel.cosbench.driver.generator.RandomInputStream.processBytes(RandomInputStream.java:106)
	at org.apache.commons.io.input.NullInputStream.read(NullInputStream.java:232)
	at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:75)
	at org.apache.commons.io.input.CountingInputStream.read(CountingInputStream.java:74)
	at com.intel.cosbench.driver.generator.XferCountingInputStream.read(XferCountingInputStream.java:38)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:108)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.read(MD5DigestCalculatingInputStream.java:98)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:138)
	at com.amazonaws.http.RepeatableInputStreamRequestEntity.writeTo(RepeatableInputStreamRequestEntity.java:153)
	at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
	at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
	at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
	at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
	at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
	at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
	at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doSendRequest(SdkHttpRequestExecutor.java:63)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
	at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
	at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
	at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:837)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
	at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
	at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1405)
	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1270)
	at com.intel.cosbench.api.S3Stor.S3Storage.createObject(S3Storage.java:167)
	at com.intel.cosbench.driver.operator.Writer.doWrite(Writer.java:98)
	at com.intel.cosbench.driver.operator.Preparer.operate(Preparer.java:97)
	at com.intel.cosbench.driver.operator.AbstractOperator.operate(AbstractOperator.java:76)
	at com.intel.cosbench.driver.agent.WorkAgent.performOperation(WorkAgent.java:197)
	at com.intel.cosbench.driver.agent.WorkAgent.doWork(WorkAgent.java:177)
	at com.intel.cosbench.driver.agent.WorkAgent.execute(WorkAgent.java:134)
	at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:44)
	at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:1)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
---------------------------------- mission: ME432CA896C, driver: app7 ----------------------------------
2022-04-19 14:54:02,007 [INFO] [Log4jLogManager] - will append log to file /root/cosbench/log/mission/ME432CA896C.log
================================================== stage: s2-rGB_write_read_delete ==================================================
---------------------------------- mission: MB432CD6741, driver: dan4 ----------------------------------
2022-04-19 14:54:14,007 [INFO] [Log4jLogManager] - will append log to file /root/cosbench/log/mission/MB432CD6741.log
2022-04-19 14:54:14,272 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,278 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o2_rGB
2022-04-19 14:54:14,279 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,277 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o9_rGB
2022-04-19 14:54:14,277 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,277 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,276 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o1_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,275 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,275 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,286 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o13_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,274 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,274 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,274 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o17_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,274 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o2_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o3_rGB
2022-04-19 14:54:14,274 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,272 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,292 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,272 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,270 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,272 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,292 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,292 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o4_rGB
2022-04-19 14:54:14,292 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,292 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,291 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,290 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o1_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o20_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o19_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o18_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,289 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,288 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o16_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o15_rGB
2022-04-19 14:54:14,287 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,286 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o14_rGB
2022-04-19 14:54:14,286 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,286 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,286 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o12_rGB
2022-04-19 14:54:14,284 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o11_rGB
2022-04-19 14:54:14,284 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,284 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,284 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,284 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o10_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,283 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,283 [ERROR] [AbstractOperator] - worker 18 fail to perform operation s5003b6/s5003o2_rGB
java.lang.ArrayIndexOutOfBoundsException
	at com.intel.cosbench.driver.generator.RandomInputStream.processBytes(RandomInputStream.java:106)
	at org.apache.commons.io.input.NullInputStream.read(NullInputStream.java:232)
	at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:75)
	at org.apache.commons.io.input.CountingInputStream.read(CountingInputStream.java:74)
	at com.intel.cosbench.driver.generator.XferCountingInputStream.read(XferCountingInputStream.java:38)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:108)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.read(MD5DigestCalculatingInputStream.java:98)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:138)
	at com.amazonaws.http.RepeatableInputStreamRequestEntity.writeTo(RepeatableInputStreamRequestEntity.java:153)
	at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
	at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
	at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
	at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
	at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
	at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
	at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doSendRequest(SdkHttpRequestExecutor.java:63)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
	at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
	at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
	at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:837)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
	at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
	at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1405)
	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1270)
	at com.intel.cosbench.api.S3Stor.S3Storage.createObject(S3Storage.java:167)
	at com.intel.cosbench.driver.operator.Writer.doWrite(Writer.java:98)
	at com.intel.cosbench.driver.operator.Writer.operate(Writer.java:79)
	at com.intel.cosbench.driver.operator.AbstractOperator.operate(AbstractOperator.java:76)
	at com.intel.cosbench.driver.agent.WorkAgent.performOperation(WorkAgent.java:197)
	at com.intel.cosbench.driver.agent.WorkAgent.doWork(WorkAgent.java:177)
	at com.intel.cosbench.driver.agent.WorkAgent.execute(WorkAgent.java:134)
	at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:44)
	at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:1)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2022-04-19 14:54:14,339 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o5_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o8_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,282 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o7_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o6_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o5_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,281 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o4_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,280 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o3_rGB
2022-04-19 14:54:14,347 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,346 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o8_rGB
2022-04-19 14:54:14,345 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o7_rGB
2022-04-19 14:54:14,343 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,341 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,352 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o6_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,340 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,339 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,339 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,339 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,339 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,339 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,354 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,352 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,351 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,350 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
---------------------------------- mission: M7432CD676E, driver: app7 ----------------------------------
2022-04-19 14:54:14,007 [INFO] [Log4jLogManager] - will append log to file /root/cosbench/log/mission/M7432CD676E.log
2022-04-19 14:54:14,244 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o5_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o7_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o8_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o13_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,246 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o6_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o10_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,245 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,245 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o9_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,251 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o8_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o7_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o5_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o4_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o3_rGB
2022-04-19 14:54:14,250 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,249 [ERROR] [AbstractOperator] - worker 189 fail to perform operation s5003b5/s5003o13_rGB
java.lang.ArrayIndexOutOfBoundsException
	at com.intel.cosbench.driver.generator.RandomInputStream.processBytes(RandomInputStream.java:106)
	at org.apache.commons.io.input.NullInputStream.read(NullInputStream.java:232)
	at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:75)
	at org.apache.commons.io.input.CountingInputStream.read(CountingInputStream.java:74)
	at com.intel.cosbench.driver.generator.XferCountingInputStream.read(XferCountingInputStream.java:38)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:108)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream.read(MD5DigestCalculatingInputStream.java:98)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
	at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
	at org.apache.http.entity.InputStreamEntity.writeTo(InputStreamEntity.java:138)
	at com.amazonaws.http.RepeatableInputStreamRequestEntity.writeTo(RepeatableInputStreamRequestEntity.java:153)
	at org.apache.http.entity.HttpEntityWrapper.writeTo(HttpEntityWrapper.java:89)
	at org.apache.http.impl.client.EntityEnclosingRequestWrapper$EntityWrapper.writeTo(EntityEnclosingRequestWrapper.java:108)
	at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:117)
	at org.apache.http.impl.AbstractHttpClientConnection.sendRequestEntity(AbstractHttpClientConnection.java:265)
	at org.apache.http.impl.conn.ManagedClientConnectionImpl.sendRequestEntity(ManagedClientConnectionImpl.java:203)
	at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
	at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doSendRequest(SdkHttpRequestExecutor.java:63)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
	at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
	at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
	at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:837)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
	at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
	at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1405)
	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1270)
	at com.intel.cosbench.api.S3Stor.S3Storage.createObject(S3Storage.java:167)
	at com.intel.cosbench.driver.operator.Writer.doWrite(Writer.java:98)
	at com.intel.cosbench.driver.operator.Writer.operate(Writer.java:79)
	at com.intel.cosbench.driver.operator.AbstractOperator.operate(AbstractOperator.java:76)
	at com.intel.cosbench.driver.agent.WorkAgent.performOperation(WorkAgent.java:197)
	at com.intel.cosbench.driver.agent.WorkAgent.doWork(WorkAgent.java:177)
	at com.intel.cosbench.driver.agent.WorkAgent.execute(WorkAgent.java:134)
	at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:44)
	at com.intel.cosbench.driver.agent.AbstractAgent.call(AbstractAgent.java:1)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o20_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o19_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o18_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o17_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,249 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o16_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o15_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o14_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o12_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o11_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o10_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o9_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,248 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o6_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o4_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o3_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o2_rGB
2022-04-19 14:54:14,247 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,258 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,256 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,256 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o16_rGB
2022-04-19 14:54:14,256 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,256 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,259 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,255 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,255 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o15_rGB
2022-04-19 14:54:14,255 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,255 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,254 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,254 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o14_rGB
2022-04-19 14:54:14,254 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,254 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o13_rGB
2022-04-19 14:54:14,254 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o12_rGB
2022-04-19 14:54:14,254 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o11_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,253 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,260 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o2_rGB
2022-04-19 14:54:14,260 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o1_rGB
2022-04-19 14:54:14,259 [INFO] [NoneStorage] - performing PUT at /s5003b6/s5003o18_rGB
2022-04-19 14:54:14,259 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o3_rGB
2022-04-19 14:54:14,259 [INFO] [NoneStorage] - performing PUT at /s5003b5/s5003o17_rGB
2022-04-19 14:54:14,259 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB
2022-04-19 14:54:14,258 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,260 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o3_rGB
2022-04-19 14:54:14,261 [INFO] [NoneStorage] - performing GET at /s5003b1/s5003o1_rGB
2022-04-19 14:54:14,261 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB

@MonicaLemay
Author

When file sizes are 100-200 MB, the condition does not occur and this defect does not reproduce.

@guymguym
Member

@MonicaLemay Are you running a special branch/fork of cosbench, a release, or the latest commit from its master branch?

@guymguym
Member

@MonicaLemay @romayalon Hmmm, this RandomInputStream code, where you see the ArrayIndexOutOfBoundsException, looks to me like the root cause of this entire thing. It mixes 32-bit signed integers with 64-bit values in its calculations, which is well known to be prone to bugs around the 2 GB offset (the maximum positive 32-bit signed value is 2^31 - 1 = 2,147,483,647). I'll try to find the exact problem, but I suspect both cases are caused by this code.

https://github.com/intel-cloud/cosbench/blob/68ffaa90e170f87a70c09b5ae187b640eb040264/dev/cosbench-driver/src/com/intel/cosbench/driver/generator/RandomInputStream.java#L81-L129
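As a rough illustration of that failure mode (a made-up sketch, not the actual cosbench code; partSize, partIndex and buf are invented names), multiplying two ints overflows as soon as the offset passes 2 GB:

public class OffsetOverflowSketch {
    public static void main(String[] args) {
        final int partSize = 1_100_000_000;             // hypothetical ~1 GB chunk size
        final int partIndex = 2;                        // third chunk starts beyond 2 GB

        int badOffset = partIndex * partSize;           // 32-bit multiply wraps negative: -2094967296
        long goodOffset = (long) partIndex * partSize;  // widening first keeps the true offset: 2200000000

        System.out.println("32-bit offset: " + badOffset);
        System.out.println("64-bit offset: " + goodOffset);

        byte[] buf = new byte[10];
        int pos = badOffset % buf.length;               // negative remainder (-6) in Java
        System.out.println(buf[pos]);                   // throws ArrayIndexOutOfBoundsException
    }
}

Any offset computed like badOffset above turns negative once the logical position crosses 2,147,483,647 bytes, which would line up with the >2 GB failures.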

@guymguym
Member

OK, I think I found it, and opened an issue and a PR to fix it in cosbench.
But since the last commit on cosbench is almost 4 years old, I don't think this will land anytime soon...
In any case, we might want to patch our cosbench with this fix to make sure it really solves our 1-2 GB case as well.
See the PR here - intel-cloud/cosbench#426

@MonicaLemay
Author

To follow up on the hex dump from the FS, I had to do another run; I had deleted the data from previous runs to do the requested MB and >3 GB tests.

New run:

 date ; sh /root/cosbench/cli.sh submit /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5003_GB_random_size_num_of_objects.xml
Tue Apr 19 15:11:41 EDT 2022
Accepted with ID: w988

From the cosbench log:

2022-04-19 15:13:35,841 [WARN] [AbstractOperator] - No checksum embedded in s5003b2\s5003o2_rGB
2022-04-19 15:13:35,842 [INFO] [NoneStorage] - performing GET at /s5003b2/s5003o2_rGB

From the FS


[root@c83f1-ems s5003b2]# pwd ; ls -ail
/gpfs/hpofs1/s5003/s5003b2
total 4882834
8406077 drwxr-x---.  4 5003 5003       8192 Apr 19 15:13 .
  69634 drwxr-xr-x. 52 5003 5003       4096 Mar  7 19:59 ..
4639305 drwxr-x---.  3 5003 5003       4096 Mar  3 17:20 .noobaa-nsfs_622115c8c8206400294c9c68
4200092 drwxr-x---.  3 5003 5003       4096 Apr 18 13:33 .noobaa-nsfs_625d97b29e2004002a83788c
8485394 -rw-r-----.  1 5003 5003 2000000000 Apr 19 15:13 s5003o1_rGB
8485396 -rw-r-----.  1 5003 5003 2000000000 Apr 19 15:13 s5003o2_rGB
8485395 -rw-r-----.  1 5003 5003 1000000000 Apr 19 15:12 s5003o3_rGB

I'm not sure what to look for in the xxd output. I am looking at file s5003o2_rGB in bucket s5003b2. The last few lines from xxd are:

[root@c83f1-ems ~]# tail -20 /tmp/out
773592c0: 716f 6968 637a 7878 7962 6b6d 7579 6a72  qoihczxxybkmuyjr
773592d0: 7476 7168 7871 666f 7a6c 7871 656a 7272  tvqhxqfozlxqejrr
773592e0: 6f64 777a 6d72 746a 6b61 7073 666b 796f  odwzmrtjkapsfkyo
773592f0: 6f70 646f 7973 7866 6b77 6661 6b66 6c61  opdoysxfkwfakfla
77359300: 6f78 6666 616a 6579 6666 6c64 6c7a 7567  oxffajeyffldlzug
77359310: 7264 6373 6462 7869 7561 7765 676a 6f74  rdcsdbxiuawegjot
77359320: 7269 7869 7371 6c76 6267 616e 6666 6b73  rixisqlvbganffks
77359330: 7964 7071 6f75 6971 6166 666b 7068 7270  ydpqouiqaffkphrp
77359340: 7766 726f 7a6c 6178 6a6c 7677 7077 7571  wfrozlaxjlvwpwuq
77359350: 6976 6b73 7662 7872 6566 7568 6667 6878  ivksvbxrefuhfghx
77359360: 6672 7771 6764 6b6e 7778 6273 6168 7876  frwqgdknwxbsahxv
77359370: 6b63 7468 6e77 796f 6162 6670 7276 7561  kcthnwyoabfprvua
77359380: 7069 6d71 7779 7366 776d 7468 646a 7573  pimqwysfwmthdjus
77359390: 7571 6362 7775 716f 6963 6b65 676f 6c66  uqcbwuqoickegolf
773593a0: 6365 6c70 6367 6f70 6e6b 7064 7162 6a73  celpcgopnkpdqbjs
773593b0: 6272 7272 6a6d 7575 636d 6b6b 7a71 6669  brrrjmuucmkkzqfi
773593c0: 6b6d 626e 7179 7877 6d6b 6e63 6562 706e  kmbnqyxwmkncebpn
773593d0: 7575 637a 626e 6463 2121 2121 3031 3534  uuczbndc!!!!0154
773593e0: 3239 6163 3135 6239 3339 6130 6663 3934  29ac15b939a0fc94
773593f0: 3863 3635 6532 3037 3861 6166 2121 2121  8c65e2078aaf!!!!
[root@c83f1-ems ~]#

@guymguym
Member

guymguym commented Apr 19, 2022

@MonicaLemay Thanks! What we can see in the hex dump is that the file ends with !!!! + 32 hex chars + !!!! as expected:

773593d0: 7575 637a 626e 6463 2121 2121 3031 3534  uuczbndc!!!!0154
773593e0: 3239 6163 3135 6239 3339 6130 6663 3934  29ac15b939a0fc94
773593f0: 3863 3635 6532 3037 3861 6166 2121 2121  8c65e2078aaf!!!!

This means that in this case the write itself actually finished fine, which makes sense: the size is 2,000,000,000 bytes, about 140 MiB below the 2 GiB (2,147,483,648 byte) boundary, so it shouldn't have hit the int overflow.
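For reference, here is a small standalone Java sketch (an illustration only, not cosbench's validator) that reads the last 40 bytes of a file and checks for that !!!! + 32 hex chars + !!!! trailer:

import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class TrailerCheck {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile f = new RandomAccessFile(args[0], "r")) {
            byte[] tail = new byte[40];                       // 4 + 32 + 4 bytes
            f.seek(f.length() - tail.length);                 // assumes the file is at least 40 bytes
            f.readFully(tail);
            String s = new String(tail, StandardCharsets.US_ASCII);
            boolean ok = s.startsWith("!!!!")
                    && s.endsWith("!!!!")
                    && s.substring(4, 36).matches("[0-9a-fA-F]{32}");
            System.out.println(ok ? "checksum trailer present: " + s.substring(4, 36)
                                  : "no checksum trailer found");
        }
    }
}

Running it against /gpfs/hpofs1/s5003/s5003b2/s5003o2_rGB should report the same 015429ac...8aaf digest that shows up in the xxd output.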

However, when I look at the Reader validateChecksum() code here:
https://github.com/intel-cloud/cosbench/blob/68ffaa90e170f87a70c09b5ae187b640eb040264/dev/cosbench-driver/src/com/intel/cosbench/driver/operator/Reader.java#L158-L163

I get a strong feeling that there is another bug, since the code ignores the bytes read into buf2.
I will see if I can prove that this is the case and submit an issue + PR to cosbench.
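To illustrate the kind of bug I mean (purely hypothetical code, not the actual Reader.validateChecksum() implementation): if a validator drains the stream into a second buffer but only ever scans the first one, a trailer that lands in the second buffer is silently missed and you get a false "No checksum embedded" warning:

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class IgnoredBufferSketch {
    public static void main(String[] args) throws Exception {
        // 100-byte body followed by a 40-byte "!!!!<32 hex>!!!!" trailer (Java 11+ for repeat()).
        String object = "x".repeat(100) + "!!!!" + "0".repeat(32) + "!!!!";
        InputStream in = new ByteArrayInputStream(object.getBytes(StandardCharsets.US_ASCII));

        byte[] buf1 = new byte[96];
        byte[] buf2 = new byte[96];
        int n1 = in.read(buf1);                               // 96 bytes of body
        int n2 = in.read(buf2);                               // remaining 44 bytes, including the trailer

        String scanned = new String(buf1, 0, n1, StandardCharsets.US_ASCII);  // buf2 never inspected
        System.out.println("bytes left unexamined: " + n2);
        if (!scanned.contains("!!!!")) {
            System.out.println("No checksum embedded in object");             // false negative
        }
    }
}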

@romayalon
Contributor

@guymguym Awesome! So you think that these failures cause retries and that's what creates more stress?

@MonicaLemay
Author

@guymguym @romayalon
Thanks so much for all of the work on this. I got a little lost in the discussion above. It sounds like the issue I found is not a noobaa bug and all of the problems are in cosbench? Is that correct? Is there still a noobaa problem being investigated? Please help me by clarifying. Thanks again.

@guymguym
Member

Hey, so I think it's still hard for me to determine.

  1. The first PR I opened against cosbench for RandomInputStream (the Writer path) can explain the missing checksum for files >2 GB and is definitely a cosbench issue (although I doubt that project will merge anything new after 4 years of silence).
  2. For the second case (<2 GB), I still haven't found an exact bug in the cosbench Reader that would produce those "No checksum embedded in bucket/key" warnings while the files actually have the trailer, although I did see some questionable cases in the cosbench code that might be causing it.
  3. I am not sure whether those cases are really causing the timeouts. I was looking into them hoping this is related to the change to hashCheck=true, but I'm not sure yet.

What I wanted to suggest was to try building cosbench with the fix in intel-cloud/cosbench#426 and see if that eliminates any of those cases.

The initial approach we discussed here was to add custom instrumentation to the noobaa endpoint code so we can pinpoint bottlenecks that could cause those timeouts.

Both directions would make positive progress - WDYT? Any preference?

@MonicaLemay
Author

@guymguym I looked at the instructions for building cosbench referenced from the PR: https://github.com/intel-cloud/cosbench/blob/master/BUILD.md

Unfortunately, our team may decide not to invest in cosbench as a system test tool, so we will not be building it. Should this defect proceed without the cosbench fix, or should we close it now if you think everything we are seeing is a cosbench problem?

@guymguym
Member

@MonicaLemay Thanks. I do believe that the fact that you had to restart the endpoints is still an issue here. The fact that we observed this no-service condition only with (potentially buggy) cosbench and hashCheck=true means there is a pathological case in the noobaa code. Perhaps we still have an issue with promises and the semaphore, as we had in #6848.

@nimrod-becker I would rename this issue title to reflect that this is about cosbench + hashCheck requiring a manual restart, and I would keep it open because we don't know why we had to restart the endpoints.

@romayalon I would look for the semaphore messages (like the semaphore "warning stuck" and the buffer pool "warning stuck" messages) in the endpoint logs and see if there's anything reported.

@akmithal

Hi @MonicaLemay, are you blocked on your story waiting for this fix, or can you still work on your story with md5 checking off?

@MonicaLemay
Author

@romayalon I am able to make progress on my user story; I can use warp. I reported this as a blocker because this defect is possibly a blocker for the MVP release; I think that part of the message got lost in the hand-off of status. If you do not agree that this is a blocker for MVP, please let me know. I made that assessment because of the statement above:

The fact that we observed this no-service issue only with (potentially buggy) cosbench with hashCheck=true, means there is a pathological case in the noobaa code. Perhaps we still have an issue with promises and the semaphore as we had in https://github.com/noobaa/noobaa-core/issues/6848.

Please let me know what you think.

@romayalon
Contributor

@MonicaLemay I didn't say whether it's a blocker for the MVP or not.
What I said is that if there are more stories to test that do not involve hashCheck=true, you can continue with those, right?
And I agree that there might be an issue here in NooBaa, but I do think we need to do the build changes to check whether it still occurs. Anyway, I'll try to find more info in the logs we currently have.

@MonicaLemay
Author

@romayalon Thanks for the clarification. I have 2 more stories to test that do not involve hashCheck=true. I can continue.

@MonicaLemay
Author

@romayalon Test results: with hashCheck=true removed from the read stage, the outage (the inability to cp a file) does NOT occur. Here is the xml file.

[root@c83f1-dan4 RW_workloads]# cat /root/cosbench/workloads/RW_workloads/rand_read_write_delete_GB_1hour/s5001_GB_random_size_num_of_objects.xml
<?xml version="1.0" encoding="UTF-8" ?>
<workload name="GBrandomsizenumobj" description="GBrandom read and write workload">
  <storage type="s3" config="accesskey=AMKwXLzOJEXzayywWPnO;secretkey=oUJLgVjRH7D0bvzHtoQ6zv8Bec6PQqLusVDlpz9D;path_style_access=true;endpoint=http://metallb:80"/>

  <workflow>

   <workstage name="prepare_rGB">
      <work name="prepare_rGB" workers="256" interval="10" rampup="0" type="prepare" ratio="100" config="cprefix=s5001b;oprefix=s5001o;osuffix=_rGB;containers=r(1,2);objects=r(1,3);sizes=r(1,2)GB;hashCheck=true"/>
    </workstage>

    <workstage name="rGB_write_read_delete">
      <work name="rGB_write_read_delete" workers="256" interval="10" rampup="0" runtime="300">
        <operation type="write" ratio="20" config="cprefix=s5001b;oprefix=s5001o;osuffix=_rGB;containers=r(5,6);objects=r(1,20);sizes=r(1,2)GB;hashCheck=true"/>
        <operation type="read" ratio="80" config="cprefix=s5001b;oprefix=s5001o;osuffix=_rGB;containers=r(1,2);objects=r(1,3)"/>
      </work>
    </workstage>


  </workflow>
</workload>

@rkomandu
Collaborator

rkomandu commented May 9, 2022

@romayalon Are you planning to have this in the 4.9.6 code?

@romayalon
Contributor

@rkomandu First I'll build these changes on the 4.9 branch so we can see that they solve Monica's issue, and then we can discuss whether they'll get into 4.9.6.

@romayalon
Contributor

@rkomandu @MonicaLemay
The fix can be found in noobaa/noobaa-core:moki-20220509, @MonicaLemay can you please try it?

@romayalon
Contributor

@MonicaLemay an updated image named noobaa/noobaa-core:moki-20220510 is now available.

@romayalon
Contributor

@MonicaLemay The latest image, noobaa/noobaa-core:moki-20220511, is now available.

@MonicaLemay
Author

I tested the patch today and it works. I no longer see the issue. Thank you.

@romayalon
Contributor

@MonicaLemay Is there something else we should do here, or can we close?

@MonicaLemay
Author

As far as I am concerned, it can be closed. I don't know the NooBaa team's procedures; some teams leave defects open until a fix or patch is in an official build. If this team does not, then it can be closed.

@romayalon
Contributor

According to the last comments, I will close this.
