Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

too much data to send from app to usercache? #443

Closed
PatGendre opened this issue Aug 21, 2019 · 14 comments
Closed

too much data to send from app to usercache? #443

PatGendre opened this issue Aug 21, 2019 · 14 comments

Comments

@PatGendre
Copy link
Contributor

PatGendre commented Aug 21, 2019

Hi @shankari , coming back from 4 weeks in Asia, I used very little the app as I had no data connection but e-mission continued tracking, and apparently there was a problem with the sync and I have a lot of trips in draft mode. I checked in mongo and no data has been sent to the database since July 27.
Now when I try to force sync it fails. I suppose this is due to the size of the logs (25MB sqlite, so I had to send them via GDrive instead of GMail) ?

I join the logs (only for today, and I removed the detailed list for the 2 log lines BuiltinUserCache : After clear complete, cache has xxx entries because the list are more than 10k entries!

It is not clear to me where the ForceSync is actually log, nevertheless:

  • I noticed that there is a 400 error (and not a 500 error like in issue)
229027,1566379754.724,2019-08-21T02:29:14.724000-07:00,CommunicationHelper : Got response org.apache.http.message.BasicHttpResponse@e29351a with statusHTTP/1.1 400 Bad Request
229028,1566379754.731,2019-08-21T02:29:14.731000-07:00,ServerSyncAdapter : IO Error java.io.IOException while posting converted trips to JSON

So I guess it is not the same utf-8 encoding than in issue 333 (but we are in August - août with an accent, like février in February!).

  • I also noticed that there is a query limited to 10000 entries and I suppose the sqlite db has much more entries than that
229401,1566392802.8539999,2019-08-21T06:06:42.854000-07:00,BuiltinUserCache : Query is SELECT * from userCache WHERE (type = 'message' OR type = 'rw-document' OR type = 'sensor-data') AND (write_ts <= 1.566377298952E9) ORDER BY write_ts LIMIT 10000
229402,1566392803.4029999,2019-08-21T06:06:43.403000-07:00,BuiltinUserCache : Result count = 10000

Do we need to go in debug mode with Yann and insert breakpoints in CommunicationHelper like we did with you for issue 333? Thanks in advance!

logs2108only.txt.gz

@PatGendre PatGendre changed the title too much data to sent from app to usercache? too much data to send from app to usercache? Aug 21, 2019
@shankari
Copy link
Contributor

shankari commented Aug 21, 2019

@PatGendre I checked in a fix to the UTF-8 issue (e-mission/cordova-server-communication#22) on May 8, so I am pretty sure that is not an issue. I think this is almost certainly related to the server ability to accept large data sizes since the definition of error code 400 is

400 Bad Request
The server cannot or will not process the request due to an apparent client error (e.g., malformed request syntax, size too large, invalid request message framing, or deceptive request routing)

However, I do set the incoming MEMFILE_MAX to 1GB
https://github.com/e-mission/e-mission-server/blob/master/emission/net/api/cfc_webapp.py#L72
and I haven't had issues with prior trips with spotty internet access. Although in my case, I was only missing internet access for 2-3 days at a time, not a month.

I also noticed that there is a query limited to 10000 entries and I suppose the sqlite db has much more entries than that

We do this to limit the size of the data sent to the server. The data is chunked and 10,000 entries are sent at a time. If this push was successful, the next push should push the next 10,000 entries and so on until we catch up.

Let me check the logs.

@shankari
Copy link
Contributor

shankari commented Aug 21, 2019

It looks like the server is up and auth + communication with the server is fine since we are able to get data from the server properly

229035,1566379756.303,2019-08-21T02:29:16.303000-07:00,CommunicationHelper : Got response org.apache.http.message.BasicHttpResponse@7c58e27 with status HTTP/1.1 200 OK
229036,1566379756.8679998,2019-08-21T02:29:16.868000-07:00,"CommunicationHelper : Result Summary JSON = {""server_to_phone"": [{""metadata"": {""write_ts"": 1566346800.6171312, ""type"": ""document"", ""key"": ""diary/trips-2019-07-28""}, ""data"": [{""type"": ""FeatureCollection"", ""properties"": {""source"": ""DwellSegmentat length 976211"
  • Do you see anything in the server error logs (e.g. .../emissionpy.err.log)?
  • Do you see anything in the webserver logs (e.g. .../webserver.log)?
  • Does just bumping up the MEMFILE_MAX to be larger (2GB/3GB depending on the size of your host) fix the problem?

You don't have a reverse proxy (e.g. nginx) in front of the server, do you?

@shankari
Copy link
Contributor

One caveat is that if the request entity is too large, we should see a 413 error
#283 (comment)
but you are seeing a 400 error. I wonder if something else is malforming the request on the way to the server.

Let's see what you find in the logs, otherwise, we may need to add additional logging on the server to figure out what is going on.

@PatGendre
Copy link
Contributor Author

PatGendre commented Aug 21, 2019

It looks like the server is up and auth + communication with the server is fine since we are able to get data from the server properly

Yes I think so

Do you see anything in the server error logs (e.g. .../emissionpy.err.log)?

I don't think we have such a file, where would it be located?
We have only a webserver-errors.log but it has no errors, except only always this same duration error:
2019-08-21 10:24:42,719:ERROR:140185425659648:old style duration 6.8045647144317 63 != timer based duration 0.0028424691408872604

Do you see anything in the webserver logs (e.g. .../webserver.log)?

Not really, I don't know where to look at... For instance I don't see much data when I grep usercache/get, only stats data.

Does just bumping up the MEMFILE_MAX to be larger (2GB/3GB depending on the size of your host) fix the problem?

I would just have to change the value[ here in bottle.py](MEMFILE_MAX = 102400)? and restart the web server I suppose?

You don't have a reverse proxy (e.g. nginx) in front of the server, do you?

No

@shankari
Copy link
Contributor

I don't think we have such a file, where would it be located?

Depends on your configuration for supervisord (in supervisord.conf). If you are not using supervisord, then this will be on the console

I would just have to change the value[ here in bottle.py](MEMFILE_MAX = 102400)? and restart the web server I suppose?

Yup

@PatGendre
Copy link
Contributor Author

PatGendre commented Aug 22, 2019

Hi @shankari
thanks so I put MEMFILE_MAX = 1024000 (10x more, but 1GB "only", the server has 4GB).
I looked at the console live, and when I click on force sync, i takes ca. 30 seconds before I see the usercache/get on the console, I suppose it's the time the app takes to prepare the data.

The console logs looks like this
2019-08-22 10:46:14,193:DEBUG:140125698770688:START POST /usercache/put 2019-08-22 10:46:14,195:DEBUG:140125698770688:Called userCache.put 2019-08-22 10:46:14,201:ERROR:140125698770688:old style duration 9.11538314819336 != timer based duration 0.007599020376801491 2019-08-22 10:46:14,206:DEBUG:140125698770688:insert called with entry of type <class 'emission.core.wrapper.entry.Entry'> 2019-08-22 10:46:14,207:DEBUG:140125698770688:entry was fine, no need to fix it 2019-08-22 10:46:14,208:DEBUG:140125698770688:Inserting entry Entry({'_id': ObjectId('5d5e56564e276deee1ce39d1'), 'user_id': '', 'metadata': Metadata({'key': 'stats/server_api_error', 'platform': 'server', 'write_ts': 1566463574.2050643, 'time_zone': 'America/Los_Angeles', 'write_local_dt': LocalDate({'year': 2019, 'month': 8, 'day': 22, 'hour': 1, 'minute': 46, 'second': 14, 'weekday': 3, 'timezone': 'America/Los_Angeles'}), 'write_fmt_time': '2019-08-22T01:46:14.205064-07:00'}), 'data': {'name': 'MISMATCH_POST_/usercache/put', 'ts': 1566463574.201437, 'reading': 9.107784127816558}}) into timeseries 2019-08-22 10:46:14,212:DEBUG:140125698770688:END POST /usercache/put 9.11538314819336

There is a MISMATCH_POST_/usercache/put data name so I suppose sthg went wrong?
In the e-mission-server code, the only place where I've found an uppercase MISMATCH is in the bottle.py code.

And for the moment the Force Sync still doesn't work, no data seems to be sent to the usercache in mongo

@PatGendre
Copy link
Contributor Author

PatGendre commented Aug 22, 2019

Just in case, I also checked the pipeline errors (as there was no webserver errors logged).

There are 2 kinds of errors (apparently unrelated, so may be this not relevant here and should be filed in another issue? ) :

  1. this errors occurs quite frequently
    2019-08-14 02:02:19,995:ERROR:139902233872128:Section counts = [161, 161, 161, 161, 161, 161, 161, 161, 161, 161, 161, 160, 160, 160, 160, 160, 160, 160], expecting 1 2019-08-14 02:02:20,197:ERROR:139902233872128:check start: [0.0, 0.03924761639671469, 4.990648078602423, 4.990681339097829, 5.242294002871493, 4.331063203503419, 3.4161164690810915, 2.96829309976264 58, 6.888797199766008, 9.661295334173385] != [0.0, 0.0, 4.990648078602423, 4.990681339097829, 5.242294002871493, 4.331063203503419, 3.4161164690810915, 2.9682930997626458, 6.888797199766008, 9.66129 5334173385]

  2. this error is why I mention the pipeline errors because this is a "document too large" error ; however it concerns 2 other users, not me:
    2019-08-22 02:12:00,344:ERROR:139911404447488:Storing views to cache failed for user 81770b1a-ba16-4216-b316-8061b4509977 Traceback (most recent call last): File "/var/emission/e-mission-server/emission/net/usercache/builtin_usercache_handler.py", line 109, in storeViewsToCache self.storeTimelineToCache(time_query) File "/var/emission/e-mission-server/emission/net/usercache/builtin_usercache_handler.py", line 199, in storeTimelineToCache uc.putDocument("diary/trips-%s"%day, day_gj_list) File "/var/emission/e-mission-server/emission/net/usercache/builtin_usercache.py", line 106, in putDocument upsert=True) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/pymongo/collection.py", line 2535, in update collation=collation) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/pymongo/collection.py", line 766, in _update codec_options=self.__write_response_codec_options).copy() File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/pymongo/pool.py", line 482, in command self._raise_connection_failure(error) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/pymongo/pool.py", line 610, in _raise_connection_failure raise error File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/pymongo/pool.py", line 477, in command collation=collation) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/pymongo/network.py", line 99, in command name, size, max_bson_size + message._COMMAND_OVERHEAD) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/pymongo/message.py", line 626, in _raise_document_too_large raise DocumentTooLarge("command document too large") pymongo.errors.DocumentTooLarge: command document too large

2019-08-22 02:14:30,214:ERROR:139911404447488:Storing views to cache failed for user f6356066-21a8-4113-8535-4b4df073b996 Traceback (most recent call last): File "/var/emission/e-mission-server/emission/net/usercache/builtin_usercache_handler.py", line 109, in storeViewsToCache self.storeTimelineToCache(time_query) ... File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/pymongo/network.py", line 99, in command name, size, max_bson_size + message._COMMAND_OVERHEAD) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/pymongo/message.py", line 626, in _raise_document_too_large raise DocumentTooLarge("command document too large") pymongo.errors.DocumentTooLarge: command document too large

@PatGendre
Copy link
Contributor Author

and I have at last (i takes to be partient!) an interesting error (timeout) in the console log 👍

START 2019-08-22 12:01:34.842994 POST /usercache/put old style duration 10.442207098007202 != timer based duration 0.004675650037825108 END 2019-08-22 12:01:45.291159 POST /usercache/put 10.442207098007202 START 2019-08-22 12:01:45.699741 POST /usercache/get END 2019-08-22 12:01:46.908423 POST /usercache/get bd8aeff5-ec66-43b0-8962-cef1ca23b86b 1.208510160446167 START 2019-08-22 12:03:06.699147 POST /usercache/get END 2019-08-22 12:03:15.086852 POST /usercache/get f6356066-21a8-4113-8535-4b4df073b996 8.387272357940674 socket.error 'The write operation timed out' Traceback (most recent call last): File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/server.py", line 1152, in communicate req.respond() File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/server.py", line 974, in respond self.server.gateway(self).respond() File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/wsgi.py", line 136, in respond self.write(chunk) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/wsgi.py", line 217, in write self.req.write(chunk) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/server.py", line 1026, in write self.conn.wfile.write(chunk) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/makefile.py", line 28, in write self._flush_unlocked() File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/makefile.py", line 37, in _flush_unlocked n = self.raw.write(bytes(self._write_buf)) File "/root/anaconda3/envs/emission/lib/python3.6/socket.py", line 604, in write return self._sock.send(b) File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 944, in send return self._sslobj.write(data) File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 642, in write return self._sslobj.write(data) socket.timeout: The write operation timed out

@PatGendre
Copy link
Contributor Author

PatGendre commented Aug 22, 2019

and I have at last (it's rewarding to be patient!) an interesting error (timeout) in the console log 👍

START 2019-08-22 12:01:34.842994 POST /usercache/put old style duration 10.442207098007202 != timer based duration 0.004675650037825108 END 2019-08-22 12:01:45.291159 POST /usercache/put 10.442207098007202 START 2019-08-22 12:01:45.699741 POST /usercache/get END 2019-08-22 12:01:46.908423 POST /usercache/get bd8aeff5-ec66-43b0-8962-cef1ca23b86b 1.208510160446167 START 2019-08-22 12:03:06.699147 POST /usercache/get END 2019-08-22 12:03:15.086852 POST /usercache/get f6356066-21a8-4113-8535-4b4df073b996 8.387272357940674 socket.error 'The write operation timed out' Traceback (most recent call last): File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/server.py", line 1152, in communicate req.respond() File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/server.py", line 974, in respond self.server.gateway(self).respond() File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/wsgi.py", line 136, in respond self.write(chunk) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/wsgi.py", line 217, in write self.req.write(chunk) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/server.py", line 1026, in write self.conn.wfile.write(chunk) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/makefile.py", line 28, in write self._flush_unlocked() File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/makefile.py", line 37, in _flush_unlocked n = self.raw.write(bytes(self._write_buf)) File "/root/anaconda3/envs/emission/lib/python3.6/socket.py", line 604, in write return self._sock.send(b) File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 944, in send return self._sslobj.write(data) File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 642, in write return self._sslobj.write(data) socket.timeout: The write operation timed out

@PatGendre
Copy link
Contributor Author

PatGendre commented Aug 22, 2019

The previous timeout error occurred many times today (4 times for write, 4 times for read).

There is also another handshake error occurring once;
Error in HTTPServer.tick Traceback (most recent call last): File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/server.py", line 1538, in start self.tick() File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/server.py", line 1622, in tick s, ssl_env = self.ssl_adapter.wrap(s) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/ssl/builtin.py", line 76, in wrap sock, do_handshake_on_connect=True, server_side=True) File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 407, in wrap_socket _context=self, _session=session) File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 817, in __init__ self.do_handshake() File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 1077, in do_handshake self._sslobj.do_handshake() File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 689, in do_handshake self._sslobj.do_handshake() OSError: [Errno 0] Error

and another type of error that occurred also once today (connection reset by peer)

START 2019-08-22 13:53:16.495629 POST /usercache/put
Traceback (most recent call last):
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 992, in _handle
    self.trigger_hook('before_request')
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 706, in trigger_hook
    return [hook(*args, **kwargs) for hook in self._hooks[__name][:]]
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 706, in <listcomp>
    return [hook(*args, **kwargs) for hook in self._hooks[__name][:]]
  File "emission/net/api/cfc_webapp.py", line 434, in before_request
    request.params.start_ts = time.time()
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 222, in __get__
    if key not in storage: storage[key] = self.getter(obj)
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 1264, in params
    for key, value in self.forms.allitems():
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 222, in __get__
    if key not in storage: storage[key] = self.getter(obj)
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 1252, in forms
    for name, item in self.POST.allitems():
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 222, in __get__
    if key not in storage: storage[key] = self.getter(obj)
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 1397, in POST
    pairs = _parse_qsl(tonat(self._get_body_string(), 'latin1'))
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 1363, in _get_body_string
    data = self.body.read(clen)
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 1375, in body
    self._body.seek(0)
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 222, in __get__
    if key not in storage: storage[key] = self.getter(obj)
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 1344, in _body
    for part in body_iter(read_func, self.MEMFILE_MAX):
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 1302, in _iter_body
    part = read(min(maxread, bufsize))
  File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/server.py", line 315, in read
    data = self.rfile.read(size)
  File "/root/anaconda3/envs/emission/lib/python3.6/_pyio.py", line 1001, in read
    return self._read_unlocked(size)
  File "/root/anaconda3/envs/emission/lib/python3.6/_pyio.py", line 1041, in _read_unlocked
    chunk = self.raw.read(wanted)
  File "/root/anaconda3/envs/emission/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 1012, in recv_into
    return self.read(nbytes, buffer)
  File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 874, in read
    return self._sslobj.read(len, buffer)
  File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 1012, in _handle
    self.trigger_hook('after_request')
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 706, in trigger_hook
    return [hook(*args, **kwargs) for hook in self._hooks[__name][:]]
  File "/var/emission/e-mission-server/emission/net/api/bottle.py", line 706, in <listcomp>
    return [hook(*args, **kwargs) for hook in self._hooks[__name][:]]
  File "emission/net/api/cfc_webapp.py", line 442, in after_request
    request.params.timer.__exit__()
AttributeError: 'str' object has no attribute '__exit__'

@PatGendre
Copy link
Contributor Author

@shankari
I tried again today and unsurpringly I get the same timeout error, so there are some parameters to tune on the web server.

@PatGendre
Copy link
Contributor Author

The timeout param default value is 10 seconds in cheroot/server.py
so I suppose I can try say 120 just to see if it works?

I will try and tell you if this works
~/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/server.py
timeout=120

There is again a handshake error? but the server still works.
Error in HTTPServer.tick Traceback (most recent call last): File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/server.py", line 1539, in start self.tick() File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/server.py", line 1623, in tick s, ssl_env = self.ssl_adapter.wrap(s) File "/root/anaconda3/envs/emission/lib/python3.6/site-packages/cheroot/ssl/builtin.py", line 76, in wrap sock, do_handshake_on_connect=True, server_side=True) File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 407, in wrap_socket _context=self, _session=session) File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 817, in __init__ self.do_handshake() File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 1077, in do_handshake self._sslobj.do_handshake() File "/root/anaconda3/envs/emission/lib/python3.6/ssl.py", line 689, in do_handshake self._sslobj.do_handshake() OSError: [Errno 0] Error

@shankari
Copy link
Contributor

FYI, if this recurs, and you want to change the app to reduce the batch size, it is here
https://github.com/e-mission/cordova-usercache/search?q=10000&unscoped_q=10000

Unfortunately, it is currently hardcoded, but I will open a new issue to make it configurable

@PatGendre
Copy link
Contributor Author

Ok, Thanks, this might be useful for another time. I will close the issue as finally we've found that it is was due to our code not taking into account the correction of issue 333 in the communication server plugin (and an accent in the French word for August...).
I will reopen questions about errors in the pipeline that I've mentioned in case they were related to the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants