Ensure that data is transmitted as UTF-8 #333

Closed
PatGendre opened this issue Mar 7, 2019 · 54 comments · Fixed by e-mission/cordova-server-communication#22
@PatGendre
Contributor

I could not find a similar issue, so I am opening this one:
I have recorded tracks for the last few days, but no data is visible in mongo, except stats/server_api_time. There is no background/location nor background/filtered_location, and therefore no analysis data either.
I checked the server log; for 2 other users there is some location data in mongo.
In the server log, I see two kinds of errors:

  • old style duration != timer based duration, but I don't know if this could be related to my problem; I don't even know whether the request concerns me or another user...
  • stats/server_api_error, each time for 'user_id': '
    I looked at the phone logs but I don't know what to look for; I am sending them to you (12MB) in case they are useful and you can have a look...
    Thanks if you see what could explain this problem!
    dblogs-7mars.tar.gz
@PatGendre
Contributor Author

I followed the FAQ https://github.com/e-mission/e-mission-docs/blob/master/docs/e-mission-server/troubleshooting_tips_FAQ.md#confirm-that-the-entries-are-really-in-the-database and found 0 for len(data_list), which confirms what I've seen in mongo.

@PatGendre
Contributor Author

I've run the intake pipeline; there are no apparent errors except at the beginning:

(emission) root@grfmapfabmob:/var/emission/e-mission-server# ./e-mission-py.bash bin/debug/intake_single_user.py -e patgendre
storage not configured, falling back to sample, default configuration
Connecting to database URL localhost
analysis.debug.conf.json not configured, falling back to sample, default configuration
google maps key not configured, falling back to nominatim
transit stops query not configured, falling back to default
ERROR:root:habitica not configured, game functions not supported

(for habitica I guess that's normal, since we don't use it)
The first messages saying "falling back to default conf" also seem normal? At least I remember that similar messages were displayed when the pipeline actually produced data, back at the end of January.

@PatGendre
Contributor Author

Another clue, perhaps: there is only one intake_single.log file and an empty intake_single_error.log file, no other intake_*.log files in /var/tmp (we use the default sample conf for intake logs), and there is no data after February 7. I've run the intake pipeline several times manually since Feb 7, but I can't find where the corresponding logs are...

@shankari
Contributor

shankari commented Mar 7, 2019

couple of quick checks:

  • do you see /usercache/put for the user in the logs? If the data is not getting to the server, then we really need to look at the phone logs.

another clue perhaps, there is only one intake_single.log file and an empty intake_single_error.log file,

  • how are you running the pipeline? Are you running the multi-process pipeline or the single user pipeline in debug? If you are running the single user pipeline, then there should be only one log file, all runs get appended to that log file. If you are running the multi-user pipeline, do you see anything on the command line?

@PatGendre
Contributor Author

PatGendre commented Mar 8, 2019

  • yes I see /usercache/put in the logs
    example
2019-03-04 15:06:21,988[DEBUG-140479756744448]{emission/net/api/cfc_webapp.py@427}:START POST /usercache/put
2019-03-04 15:06:21,989[DEBUG-140479756744448]{emission/net/api/cfc_webapp.py@252}:Called userCache.put
2019-03-04 15:06:21,990[ERROR-140479756744448]{emission/net/api/cfc_webapp.py@436}:old style duration 0.4203517436981201 != timer based duration 0.0014113839715719223
2019-03-04 15:06:21,990[ERROR-140479756744448]{emission/net/api/cfc_webapp.py@436}:old style duration 0.4203517436981201 != timer based duration 0.0014113839715719223
2019-03-04 15:06:21,991[DEBUG-140479756744448]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@337}:insert called with entry of type <class 'emission.core.wrapper.entry.Entry'>
2019-03-04 15:06:21,991[DEBUG-140479756744448]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@346}:entry was fine, no need to fix it
2019-03-04 15:06:21,992[DEBUG-140479756744448]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@348}:Inserting entry Entry({'_id': ObjectId('5c7d30dd4e276dc733886f93'), 'user_id': '', 'metadata': Metadata({'key': 'stats/server_api_error', 'platform': 'server', 'write_ts': 1551708381.9908416, 'time_zone': 'America/Los_Angeles', 'write_local_dt': LocalDate({'year': 2019, 'month': 3, 'day': 4, 'hour': 6, 'minute': 6, 'second': 21, 'weekday': 0, 'timezone': 'America/Los_Angeles'}), 'write_fmt_time': '2019-03-04T06:06:21.990842-08:00'}), 'data': {'name': 'MISMATCH_POST_/usercache/put', 'ts': 1551708381.9902549, 'reading': 0.4189403597265482}}) into timeseries
  • I've found where the logs of intake-multiprocess.py are; for whatever reason, they have been written to the same file as webserver.log!!
    The problem is that the only data uploaded seems to be stats/server_api_time.
    Example here; there seems to be no data to process:
2019-03-05 08:56:18,283[DEBUG-140619732752128]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@185}:curr_query = {'invalid': {'$exists': False}, 'user_id': UUID('91bc554a-4228-4afd-b2ca-be8eb7e33760'), '$or': [{'metadata.key': 'analysis/recreated_location'}], 'data.ts': {'$lte': 1548920763.074, '$gte': 1548919380.077}}, sort_key = data.ts
2019-03-05 08:56:18,283[DEBUG-140619732752128]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@188}:orig_ts_db_keys = [], analysis_ts_db_keys = ['analysis/recreated_location']
2019-03-05 08:56:18,298[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/analysis/intake/segmentation/trip_segmentation_methods/dwell_segmentation_time_filter.py@129}:len(last10PointsDistances) = 10, len(last5MinsDistances) = 54
2019-03-05 08:56:18,298[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/analysis/intake/segmentation/trip_segmentation_methods/dwell_segmentation_time_filter.py@131}:last5MinsTimes.max() = 298.0, time_threshold = 300
2019-03-05 08:56:18,298[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@185}:curr_query = {'invalid': {'$exists': False}, 'user_id': UUID('a224fa8a-3a86-4140-ae50-beab3f5cf70a'), '$or': [{'metadata.key': 'statemachine/transition'}], 'data.ts': {'$lte': 1551459107.0, '$gte': 1551459102.0}}, sort_key = data.ts
2019-03-05 08:56:18,298[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@188}:orig_ts_db_keys = ['statemachine/transition'], analysis_ts_db_keys = []
2019-03-05 08:56:18,298[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@232}:finished querying values for ['statemachine/transition'], count = 0
2019-03-05 08:56:18,299[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@232}:finished querying values for [], count = 0
2019-03-05 08:56:18,300[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@204}:orig_ts_db_matches = 0, analysis_ts_db_matches = 0
2019-03-05 08:56:18,303[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/storage/timeseries/builtin_timeseries.py@275}:Found 0 results
2019-03-05 08:56:18,303[DEBUG-140589322659584]{/var/emission/e-mission-server/emission/analysis/intake/segmentation/restart_checking.py@27}:In range 1551459102.0 -> 1551459107.0 found no transitions

@shankari
Contributor

shankari commented Mar 8, 2019

I've found where the logs of intake-multiprocess.py are; for whatever reason, they have been written to the same file as webserver.log!!

Are you sure you didn't change the settings in conf/logs/intake*? I can't think of any other reason why the intake logs would be in the webserver logs.

yes I see /usercache/put in the logs

Does the /usercache/put list actual entries, similar to the example in the troubleshooting guide?
I don't see that in your example

2017-10-08 17:45:59,146:DEBUG:139922601957120:Updated result for user = ea59084e-11d4-40 76-9252-3b9a29ce35e0, key = background/location, write_ts = 1507481639.3 = {'updatedExisting': False, u'nModified': 0, u'ok': 1, u'upserted': ObjectId('59da6457cb17471ac0bc8f48 '), u'n': 1}

@PatGendre
Contributor Author

Are you sure you didn't change the settings in conf/logs/intake*? I can't think of any other reason why the intake logs would be in the webserver logs.

Yes, I am sure; the reason is not a good one: the logging setup was modified by hand in intake-multiprocess.py to redirect the logs to another file. We will correct this.

Does the /usercache/put list actual entries, similar to the example in the troubleshooting guide?
I don't see that in your example

Yes, /usercache/put lists entries, but for other users; for instance:

2019-03-01 17:50:10,782[DEBUG-140479748351744]{/var/emission/e-mission-server/emission/net/api/usercache.py@58}:Updated result for user = a224fa8a-3a86-4140-ae50-beab3f5cf70a, key = stats/client_nav_event, write_ts = 1551428317.537 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('5c7962c232df604a3f5ca0e6'), 'ok': 1.0, 'updatedExisting': False}
2019-03-01 17:50:10,783[DEBUG-140479748351744]{/var/emission/e-mission-server/emission/net/api/usercache.py@58}:Updated result for user = a224fa8a-3a86-4140-ae50-beab3f5cf70a, key = stats/client_nav_event, write_ts = 1551428318.819 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('5c7962c232df604a3f5ca0e8'), 'ok': 1.0, 'updatedExisting': False}
...

But for my UUID, I cannot find the string "Updated result for user" anywhere in the log file...

@PatGendre
Contributor Author

I looked at the scripts in the bin directory, there is a debug/fix_usercache_processing.py script, do you think it is worthwhile to execute it?

Also, I wanted to execute the delete_user.py script, but it fails with "No module named 'get_database'"; I suspect the line from get_database import get_uuid_db, get_profile_db, get_moves_db should be corrected to something like from emission.core.get_database import get_uuid_db, get_profile_db, get_moves_db? Does this script still work correctly?

@shankari
Contributor

shankari commented Mar 8, 2019

I think you should first check whether your data is getting to the server correctly, because as I said, I saw connection errors in the phone logs that you sent earlier. Was the earlier usercache/put that you found for your UUID? What does it look like, if it doesn't have any Updated result for user entries?

@PatGendre
Contributor Author

PatGendre commented Mar 8, 2019

The usercache/put was not for my UUID. For my UUID, I've found no occurrence of Updated result for user in the log file. I have found occurrences of usercache/put though, such as the one cited above in #333 (comment).
But these lines are web server logs, I guess, not pipeline logs (though they are in the same file, as I told you, due to a conf mistake).

@PatGendre
Contributor Author

I checked again and can confirm: there is no occurrence of usercache/put for my UUID other than when the server receives stats/server_api_time data; no timeseries data is transmitted, as far as I can see in the log file.

@shankari
Contributor

shankari commented Mar 8, 2019

so then your data is not making it from the phone to the server. You can mail your phone logs to yourself and look at them, or browse the log directly on the phone (Profile -> Check log). If you "Force sync" and then look at the logs on the phone, you should get a quick sense of whether the sync is succeeding or not.

@PatGendre
Contributor Author

Ok, thanks!
I don't think the "force sync" works; I will try again.
I have around 50,000 records in the SQL db, only INFO and DEBUG (no ERROR, if that level exists).
Among them, only ca. 2,000 are background/location and background/filtered_location entries.

@shankari
Contributor

shankari commented Mar 8, 2019

Quick check: have you pulled the geofence changes and switched back to your own apk? If you are still on emTripLog with the interscity channel, then your data is coming to my server. Check the host that you are connected to (Profile -> Developer Zone -> Sync -> host).

Also, just to clarify, you can browse logs also in Profile -> Developer Zone -> Check Logs. I forgot to put "Developer Zone" in #333 (comment)

@PatGendre
Contributor Author

The host is http://145.239.101.212:8080, so I suppose it is the right one.
I can confirm that the force sync doesn't seem to work, because I don't see it in the webserver log.

@PatGendre
Contributor Author

In the phone log, there is an ERROR: serverSyncAdapter IO Error ... connection refused while posting converted trips to JSON

@shankari
Contributor

shankari commented Mar 8, 2019

so you are having some issues with connectivity?! are you able to access that webserver from (say) the browser on the phone?

@PatGendre
Contributor Author

PatGendre commented Mar 8, 2019 via email

@shankari
Contributor

shankari commented Mar 8, 2019

I would just follow standard debugging steps:

  • are you able to access the server? (sounds like yes)
  • are you able to access the server from the phone? (sounds like yes)
  • are you able to access the server from the app? (e.g. from the dashboard)
  • are you able to access the server from the sync code in the app? (sounds like no)

There are a lot of potential causes for the server being inaccessible discussed on Stack Overflow (e.g. no INTERNET permission in the app, no support for proxy...).
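The first two steps above can also be scripted instead of checked by hand. A minimal sketch (the host is the one mentioned in this thread; adjust for your deployment), treating any HTTP status code, even an error, as proof that the server is reachable:

```python
import urllib.request
import urllib.error

def server_reachable(url, timeout=5):
    """Return True if an HTTP server answers at `url` (any status counts)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # got an HTTP status line back, so the server is up
    except (urllib.error.URLError, OSError):
        return False  # DNS failure, connection refused, timeout, ...

# Host from this thread; adjust for your deployment.
print(server_reachable("http://145.239.101.212:8080"))
```

If this returns True from the phone's network but the app still fails, the problem is inside the app's sync code rather than basic connectivity.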

@PatGendre
Contributor Author

Ok, thank you! I will try to understand what is wrong.
Note that although I have this serverSyncAdapter IO Error in the phone log, I also have a DEBUG message saying "js : local and native values match, already sinced",
and I have some 200 HTTP INFO messages, as well as an INFO message "CommunicationHelper : Got response ... with status HTTP 500", so the server is not completely inaccessible; it may depend on the request and on the payload. I will try to have a deeper look and give you additional feedback by Monday.

@shankari
Contributor

shankari commented Mar 8, 2019

"js : local and native values match, already sinced"

That is just for the javascript local storage and the native database plugin

CommunicationHelper : Got response ... with status HTTP 500

Yeah this seems to indicate that you are connecting to the server but getting an error.
Since you are building the app, you can also run in a debugger. All communication with the server that is related to you personally (as opposed to the aggregate) should go through https://github.com/e-mission/cordova-server-communication/blob/master/src/android/CommunicationHelper.java#L76

As we already discussed, some aggregate calls go through $http, but none of the user calls do.

@PatGendre
Contributor Author

Thanks! I will try with Yann on Monday; he knows Android Studio, and so the debugger too, I guess.

@PatGendre
Contributor Author

Thanks to some interactive debugging and looking at the webserver logs, we now know the issue has to do with the encoding of "février" at some point along the collection and sync path.
The problem occurs in usercache/put: no data is sent because the JSON payload is not valid. In fact, there is an encoding error due to the character "\xe9", which appears in several places in the file in the field "fmt_time":"28 f\xe9vr. 2019 16:14:54" (the month is "févr." in French, with an accent).
Once the encoding is corrected in the app, the JSON payload should be put correctly to the server (the file is 1.4MB).
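As a server-side illustration of this diagnosis (using a minimal made-up payload, not the actual 1.4MB file): the bare 0xe9 byte is valid latin-1 but invalid UTF-8, so JSON parsing fails.

```python
import json

# Raw latin-1 payload: 0xe9 is "é" in ISO-8859-1, but an invalid
# continuation byte in UTF-8, which json assumes for bytes input.
latin1_payload = b'{"fmt_time": "28 f\xe9vr. 2019 16:14:54"}'

try:
    json.loads(latin1_payload)
except UnicodeDecodeError as e:
    print("parse failed:", e)

# The same content encoded as UTF-8 parses fine.
utf8_payload = '{"fmt_time": "28 févr. 2019 16:14:54"}'.encode("utf-8")
print(json.loads(utf8_payload)["fmt_time"])  # 28 févr. 2019 16:14:54
```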

@shankari
Contributor

let's try to debug where we are not using UTF-8 along the data collection and sync path. First, is the phone saving the data correctly as UTF-8? If you look at the userCacheDB, how is févr. encoded?

@PatGendre
Contributor Author

Hi, in the sqlite usercacheDB, I have for instance:
{"accuracy":95.787,"altitude":102.0,"bearing":0.0,"elapsedRealtimeNanos":619090000000,"filter":"time","fmt_time":"28 févr. 2019 16:14:54","latitude":48.8397591,"longitude":2.3368957,"sensed_speed":0.0,"ts":1.551366894194E9}

I don't know how to check the encoding, but the "é" character is shown (correctly) as "é", not as "\xe9".
The data field of the usercache table is of type TEXT.
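One way to check the on-disk encoding is to look at the raw bytes sqlite stores behind the TEXT value. This is a sketch against an in-memory stand-in for the plugin's userCacheDB (the table and column names are taken from this thread; the row is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userCache (key TEXT, data TEXT)")
conn.execute(
    "INSERT INTO userCache VALUES (?, ?)",
    ("background/location", '{"fmt_time":"28 févr. 2019 16:14:54"}'),
)

# CAST to BLOB exposes the raw bytes behind the TEXT value.
raw = conn.execute("SELECT CAST(data AS BLOB) FROM userCache").fetchone()[0]
print(b"\xc3\xa9" in raw)  # True  -> "é" is stored as the UTF-8 pair c3 a9
print(b"\xe9" in raw)      # False -> no bare latin-1 byte on disk
```

So if the phone database looks like this, the data at rest is fine and the corruption happens later, when the request body is built.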

@shankari
Contributor

So this seems overall consistent with the other things you have noticed: basically, the python implementation is not UTF-8 consistent, at least on your server. I am going to manually set the date/time and locale in the emulator and see if this is a general issue.

@shankari
Contributor

ok, so I tried to reproduce this in the emulator and it is not working, because the location points are sent to the emulator by the laptop, and the laptop is still set to March 26.

Screen Shot 2019-03-26 at 12 21 16 PM

sqlite> select data from userCache where key="background/filtered_location";
{"accuracy":20.0,"altitude":5.0,"bearing":0.0,"elapsedRealtimeNanos":115739073019,"filter":"time","fmt_time":"26 mars 2019 8:16:03 PM","latitude":38.1219983,"longitude":-131.5999983,"sensed_speed":0.0,"ts":1.553627763E9}
{"accuracy":20.0,"altitude":5.0,"bearing":0.0,"elapsedRealtimeNanos":178993731763,"filter":"time","fmt_time":"26 mars 2019 8:17:06 PM","latitude":38.322,"longitude":-131.8,"sensed_speed":0.0,"ts":1.553627826E9}
{"accuracy":20.0,"altitude":5.0,"bearing":0.0,"elapsedRealtimeNanos":210005651859,"filter":"time","fmt_time":"26 mars 2019 8:17:37 PM","latitude":38.322,"longitude":-131.8,"sensed_speed":0.0,"ts":1.553627857E9}

@shankari
Contributor

I really don't want to reset my laptop datetime to Feb. Let's try on a physical device instead...

@shankari
Contributor

If that still doesn't work, I'll have to try some other python scripts because it looks like we won't be able to reproduce this until August (Août)
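The reproduction window follows from which French month abbreviations contain accented characters at all. A quick check (the abbreviation list below is hand-written for illustration, not pulled from a locale database):

```python
# French month abbreviations; only the accented ones can trigger the bug,
# which is why "mars" dates sync fine but "févr." dates do not.
FR_ABBREV = ["janv.", "févr.", "mars", "avr.", "mai", "juin",
             "juil.", "août", "sept.", "oct.", "nov.", "déc."]

non_ascii = [m for m in FR_ABBREV if not m.isascii()]
print(non_ascii)  # ['févr.', 'août', 'déc.']
```

So after February, the next month whose French abbreviation is non-ASCII is indeed August (août).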

@shankari
Contributor

so in python3, JSON is supposed to be represented in UTF-*

https://docs.python.org/3.6/library/json.html#character-encodings

The RFC requires that JSON be represented using either UTF-8, UTF-16, or UTF-32, with UTF-8 being the recommended default for maximum interoperability.

other encodings are supported when bytes are converted to strings
https://docs.python.org/3.6/library/stdtypes.html#str

Note that the body is in fact in bytes (this is the informal string representation starting with b')

body = b'{"user":"test_french_feb","phone_to_server":[{"metadata":{"key":"stats\\/client_nav_event",

bottle does support python3
https://github.com/e-mission/e-mission-server/blob/master/emission/net/api/bottle.py#L113
so I am not quite sure exactly why this is not already handled

@shankari
Contributor

ok so b = self._get_body_string() is a byte string, which is why json_loads(b) fails for non-UTF-8. But why doesn't get_body_string() decode the body properly?

because JSON requests are supposed to be in UTF-8
https://stackoverflow.com/questions/29761905/default-encoding-of-http-post-request-with-json-body

Note that even the standard #333 (comment) says that we should use latin-1 only for text outputs.

So the real problem appears to be that the app is encoding the data in latin-1 (e.g. ISO-8859) instead of UTF-8.

There are two possible fixes:

  • the cheap fix: decode the incoming body as latin-1 on the server before parsing it as JSON
  • the principled fix: make the app encode the data, and declare it, as UTF-8

The principled fix is better because ISO-8859 doesn't even support the full latin alphabet (https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Languages_with_incomplete_coverage), let alone other languages.

@shankari
Contributor

I will do a quick verification that the cheap fix will in fact work, though, since we may run out of time for the principled fix.

@shankari
Contributor

Hm, I am a little confused. If I try to generate the iso-8859 string file by encoding the utf-8 one

In [25]: open("/tmp/utf-8.txt").read().encode("latin1")
Out[25]: b'{"metadata":{"key":"background\\\\/location","platform":"android","read_ts":0,"time_zone":"Europe\\\\/Brussels","type":"sensor-data","write_ts":1.549399287465E9},"data":{"accuracy":15.048,"altitude":-4.099999904632568,"bearing":0,"elapsedRealtimeNanos":2419284455000000,"filter":"time","fmt_time":"5 f\xe9vr. 2019 9:41:27 PM","latitude":37.3910475,"longitude":-122.0864541,"sensed_speed":0,"ts":1.549399287295E9}}\n'

In [26]: encoded_bytes = open("/tmp/utf-8.txt").read().encode("latin1")

In [28]: open("/tmp/iso-8859.txt", "wb").write(encoded_bytes)
Out[28]: 407

The file still shows the fmt_time as 5 févr. 2019 9:41:27 PM

@shankari
Contributor

ok, that's just vim being smart and figuring out the encoding automatically. If I use cat instead, I get

{"metadata":{"key":"background\\/location","platform":"android","read_ts":0,"time_zone":"Europe\\/Brussels","type":"sensor-data","write_ts":1.549399287465E9},"data":{"accuracy":15.048,"altitude":-4.099999904632568,"bearing":0,"elapsedRealtimeNanos":2419284455000000,"filter":"time","fmt_time":"5 f�vr. 2019 9:41:27 PM","latitude":37.3910475,"longitude":-122.0864541,"sensed_speed":0,"ts":1.549399287295E9}}

@shankari
Contributor

ok, so the previous iso-8859.txt file that I created by copy-pasting from the raw binary is actually wrong. This is the correct one.
iso-8859.txt

When I try to load it directly, I still get an error

In [30]: json.load(open("/tmp/iso-8859.txt"))
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-30-ef49a420b762> in <module>()
----> 1 json.load(open("/tmp/iso-8859.txt"))
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 297: invalid continuation byte

but if I try to load it with the proper encoding it works fine

In [31]: json.load(open("/tmp/iso-8859.txt", encoding='latin1'))
Out[31]:
{'data': {'accuracy': 15.048,
  'altitude': -4.099999904632568,
  'bearing': 0,
  'elapsedRealtimeNanos': 2419284455000000,
  'filter': 'time',
  'fmt_time': '5 févr. 2019 9:41:27 PM',
  'latitude': 37.3910475,
  'longitude': -122.0864541,
  'sensed_speed': 0,
  'ts': 1549399287.295},
 'metadata': {'key': 'background\\/location',
  'platform': 'android',
  'read_ts': 0,
  'time_zone': 'Europe\\/Brussels',
  'type': 'sensor-data',
  'write_ts': 1549399287.465}}

and the bottle trick of decoding and encoding also works

In [32]: json.loads(open("/tmp/iso-8859.txt", mode="rb").read().decode('latin1').encode(
    ...: 'utf-8'))
Out[32]:
{'data': {'accuracy': 15.048,
  'altitude': -4.099999904632568,
  'bearing': 0,
  'elapsedRealtimeNanos': 2419284455000000,
  'filter': 'time',
  'fmt_time': '5 févr. 2019 9:41:27 PM',
  'latitude': 37.3910475,
  'longitude': -122.0864541,
  'sensed_speed': 0,
  'ts': 1549399287.295},
 'metadata': {'key': 'background\\/location',
  'platform': 'android',
  'read_ts': 0,
  'time_zone': 'Europe\\/Brussels',
  'type': 'sensor-data',
  'write_ts': 1549399287.465}}

So the following one line fix is a cheap fix that we can fall back to...

-            b = self._get_body_string()
+            b = self._get_body_string().decode('latin1').encode('utf-8')
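One reason this fallback is safe to apply unconditionally: latin-1 maps every byte value 0x00 through 0xff to a code point, so decode('latin1') can never raise, and the re-encoded result is always valid UTF-8. A quick sanity check:

```python
# Every possible byte survives the latin1 -> utf-8 round trip losslessly:
# bytes 0x00-0x7f stay one byte, 0x80-0xff become two-byte UTF-8 sequences.
every_byte = bytes(range(256))
as_utf8 = every_byte.decode("latin1").encode("utf-8")
print(as_utf8.decode("utf-8").encode("latin1") == every_byte)  # True
```

The flip side is that if a client actually sends proper UTF-8, this workaround would mojibake its accented characters, which is why it is only a fallback and the app-side fix is preferred.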

@shankari
Contributor

Tried it with force sync.

/usercache/put was now successful.

START 2019-03-26 14:44:05.756223 POST /usercache/put
END 2019-03-26 14:44:05.800609 POST /usercache/put 39b0541a-79fa-4655-a6ec-d9fe09944912 0.044220924377441406
2019-03-26 14:44:05,759:DEBUG:123145509371904:START POST /usercache/put
2019-03-26 14:44:05,766:DEBUG:123145509371904:methodName = skip, returning <class 'emission.net.auth.skip.SkipMethod'>
2019-03-26 14:44:05,766:DEBUG:123145509371904:Using the skip method to verify id token test_french_feb of length 15
2019-03-26 14:44:05,768:DEBUG:123145509371904:retUUID = 39b0541a-79fa-4655-a6ec-d9fe09944912

and in fact, the data is in the database

In [2]: from uuid import UUID

In [3]: test_uuid = UUID("39b0541a-79fa-4655-a6ec-d9fe09944912")

In [4]: import emission.storage.timeseries.cache_series as esdc

In [5]: data_list = esdc.find_entries(test_uuid, time_query=None)

In [6]: len(data_list)
Out[6]: 68

In [8]: location_entries = [e for e in data_list if e["metadata"]["key"] == "background/
   ...: location"]


In [10]: [e["data"]["fmt_time"] for e in location_entries]
Out[10]:
['5 févr. 2019 9:41:27 PM',
 '5 févr. 2019 9:41:57 PM',
 '26 mars 2019 8:42:32 PM',
 '26 mars 2019 8:43:02 PM',
 '26 mars 2019 8:43:32 PM',
 '26 mars 2019 8:44:02 PM',
 '26 mars 2019 8:44:31 PM',
 '26 mars 2019 8:45:01 PM',
 '26 mars 2019 8:45:32 PM',
 '26 mars 2019 8:46:01 PM',
 '5 févr. 2019 9:46:28 PM']

@shankari
Contributor

ok, so the server communication is in (drum roll...) https://github.com/e-mission/cordova-server-communication.

Looking more closely at the code, the only method that is really called externally (from
https://github.com/e-mission/cordova-server-communication/blob/master/www/servercomm.js) is pushGetJSON

pushGetJSON uses the legacy apache HTTP library (https://github.com/e-mission/cordova-server-communication/blob/master/src/android/CommunicationHelper.java#L82, https://hc.apache.org/)

And yes, the default encoding is ISO-8859-1:

https://hc.apache.org/httpclient-legacy/charencodings.html

The request or response body can be any encoding, but by default is ISO-8859-1.

We can set the charset in the content-type header

The encoding may be specified in the Content-Type header, for example:
Content-Type: text/html; charset=UTF-8
You can set the content type header for a request with the addRequestHeader method in each method

But we currently don't.
https://github.com/e-mission/cordova-server-communication/blob/master/src/android/CommunicationHelper.java#L84

So it sounds like the simple fix is to just set it there :)
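For reference, this is what the corrected request should look like on the wire: the body encoded as UTF-8, and the charset declared in the Content-Type header. A Python sketch of the shape of the request (the URL and payload are illustrative; the plugin's actual code is Java, and nothing is sent here):

```python
import json
import urllib.request

payload = {"fmt_time": "28 févr. 2019 16:14:54"}
# ensure_ascii=False keeps "é" as a real character, then we encode as UTF-8.
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8080/usercache/put",  # illustrative URL
    data=body,
    headers={"Content-Type": "application/json; charset=UTF-8"},
)
print(req.get_header("Content-type"))  # application/json; charset=UTF-8
print(b"\xc3\xa9" in req.data)         # True: "é" goes out as UTF-8 bytes
```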

@shankari
Contributor

We should also seriously consider upgrading this plugin. Even the Apache HttpComponents line is currently at 4.5, the legacy version that we are using is EOL, and there are newer Android-specific options such as Volley.

However, this is not the biggest priority right now (as it wasn't 3 years ago either: e-mission/cordova-server-communication@0ff1f05#diff-b45f027217ab8fcefc52d843dfc2a781), so we will punt again.

But if it turns out that we touch this code more frequently (e.g. every 6 months or so), we should revisit that decision.

@shankari
Contributor

Retrying with the fix; again, we have a mix of entries, but we do have some with févr

sqlite> select data from userCache where key="background/filtered_location";            
{"accuracy":15.058,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2431578187000000,"filter":"time","fmt_time":"6 févr. 2019 1:06:21 AM","latitude":37.391058,"longitude":-122.086473,"sensed_speed":0.0,"ts":1.549411581027E9}
{"accuracy":14.898,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2431608375000000,"filter":"time","fmt_time":"6 févr. 2019 1:06:51 AM","latitude":37.3910585,"longitude":-122.0864731,"sensed_speed":0.0,"ts":1.549411611215E9}
{"accuracy":14.871,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2431638387000000,"filter":"time","fmt_time":"6 févr. 2019 1:07:21 AM","latitude":37.3910585,"longitude":-122.0864731,"sensed_speed":0.0,"ts":1.549411641227E9}
{"accuracy":14.983,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2431669951000000,"filter":"time","fmt_time":"6 févr. 2019 1:07:52 AM","latitude":37.3910586,"longitude":-122.0864732,"sensed_speed":0.0,"ts":1.549411672791E9}
{"accuracy":96.0,"altitude":21.0,"bearing":0.0,"elapsedRealtimeNanos":2431701595115718,"filter":"time","fmt_time":"27 mars 2019 12:08:25 AM","latitude":37.3910394,"longitude":-122.0865538,"sensed_speed":0.0,"ts":1.553641705E9}
{"accuracy":15.0,"altitude":26.0,"bearing":0.0,"elapsedRealtimeNanos":2431731591314977,"filter":"time","fmt_time":"27 mars 2019 12:08:55 AM","latitude":37.3910463,"longitude":-122.0865623,"sensed_speed":0.0,"ts":1.553641735E9}
{"accuracy":15.083,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2431760657000000,"filter":"time","fmt_time":"6 févr. 2019 1:09:23 AM","latitude":37.3910586,"longitude":-122.0864732,"sensed_speed":0.0,"ts":1.549411763497E9}
{"accuracy":30.0,"altitude":26.0,"bearing":0.0,"elapsedRealtimeNanos":2431792598220996,"filter":"time","fmt_time":"27 mars 2019 12:09:56 AM","latitude":37.3910553,"longitude":-122.0865681,"sensed_speed":0.0,"ts":1.553641796E9}
{"accuracy":24.0,"altitude":26.0,"bearing":0.0,"elapsedRealtimeNanos":2431822598976088,"filter":"time","fmt_time":"27 mars 2019 12:10:26 AM","latitude":37.3910554,"longitude":-122.0865628,"sensed_speed":0.0,"ts":1.553641826E9}
{"accuracy":19.0,"altitude":28.0,"bearing":0.0,"elapsedRealtimeNanos":2431850568434046,"filter":"time","fmt_time":"27 mars 2019 12:10:54 AM","latitude":37.3910573,"longitude":-122.0865693,"sensed_speed":0.0,"ts":1.553641854E9}
{"accuracy":15.267,"altitude":-4.099999904632568,"bearing":0.0,"elapsedRealtimeNanos":2431879920000000,"filter":"time","fmt_time":"6 févr. 2019 1:11:22 AM","latitude":37.3910578,"longitude":-122.086473,"sensed_speed":0.0,"ts":1.54941188276E9}

We revert the workaround

$ git checkout emission/net/api/bottle.py

and force sync aaaannnndddd... the fix didn't work.

on the phone

02-06 01:15:56.268 25619-25683/edu.berkeley.eecs.emission D/ConnectionSettings: in getConnectURL, returning http://10.0.0.7:8080
02-06 01:15:56.269 25619-25683/edu.berkeley.eecs.emission I/System.out: Posting data to http://10.0.0.7:8080/usercache/put
02-06 01:15:56.329 25619-25683/edu.berkeley.eecs.emission I/System.out: Got response org.apache.http.message.BasicHttpResponse@f6ff8d6 with status HTTP/1.1 400 Bad Request
02-06 01:15:56.335 25619-25683/edu.berkeley.eecs.emission I/CommunicationHelper: Got response org.apache.http.message.BasicHttpResponse@f6ff8d6 with statusHTTP/1.1 400 Bad Request
02-06 01:15:56.342 25619-25683/edu.berkeley.eecs.emission E/ServerSyncAdapter: IO Error java.io.IOException while posting converted trips to JSON

@shankari
Contributor

so it turns out that, in addition to the HTTP headers, we also need to set the charset of the string entity that we use to generate the string body of the HTTP request.
https://github.com/e-mission/cordova-server-communication/blob/master/src/android/CommunicationHelper.java#L89

Changing all the instances of msg.setEntity(new StringEntity(....toString())); to

StringEntity se = new StringEntity(....toString(), "UTF-8");
msg.setEntity(se);

seems to fix the problem.

On the phone

02-06 01:37:10.860 5140-5484/edu.berkeley.eecs.emission D/ConnectionSettings: in getConnectURL, connectionSettings = {"connectUrl":"http:\/\/10.0.0.7:8080","android":{"auth":{"method":"prompted-auth","clientID":"ignored"}},"ios":{"auth":{"method":"prompted-auth","clientID":"ignored"}}}
02-06 01:37:10.863 5140-5484/edu.berkeley.eecs.emission D/ConnectionSettings: in getConnectURL, returning http://10.0.0.7:8080
02-06 01:37:10.864 5140-5484/edu.berkeley.eecs.emission I/System.out: Posting data to http://10.0.0.7:8080/usercache/put
02-06 01:37:14.639 5140-5484/edu.berkeley.eecs.emission I/System.out: Got response org.apache.http.message.BasicHttpResponse@f40d92d with status HTTP/1.1 200 OK
02-06 01:37:14.650 5140-5484/edu.berkeley.eecs.emission I/CommunicationHelper: Got response org.apache.http.message.BasicHttpResponse@f40d92d with statusHTTP/1.1 200 OK

on the server

2019-03-26 16:37:15,299:DEBUG:123145562148864:START POST /usercache/put
2019-03-26 16:37:15,299:DEBUG:123145562148864:Called userCache.put
2019-03-26 16:37:15,302:DEBUG:123145562148864:methodName = skip, returning <class 'emission.net.auth.skip.SkipMethod'>
2019-03-26 16:37:15,303:DEBUG:123145562148864:Using the skip method to verify id token test_french_feb of length 15
2019-03-26 16:37:15,304:DEBUG:123145562148864:retUUID = 39b0541a-79fa-4655-a6ec-d9fe09944912
2019-03-26 16:37:15,304:DEBUG:123145562148864:user_uuid 39b0541a-79fa-4655-a6ec-d9fe09944912
2019-03-26 16:37:15,307:DEBUG:123145562148864:Updated result for user = 39b0541a-79fa-4655-a6ec-d9fe09944912, key = stats/client_nav_event, write_ts = 1549411452.059 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('5c9ab7ab40ff1ac9196893da'), 'ok': 1.0, 'updatedExisting': False}
...
2019-03-26 16:37:15,332:DEBUG:123145562148864:Updated result for user = 39b0541a-79fa-4655-a6ec-d9fe09944912, key = statemachine/transition, write_ts = 1549411883.789 = {'n': 1, 'nModified': 0, 'upserted': ObjectId('5c9ab7ab40ff1ac919689422'), 'ok': 1.0, 'updatedExisting': False}
2019-03-26 16:37:15,333:DEBUG:123145562148864:END POST /usercache/put 39b0541a-79fa-4655-a6ec-d9fe09944912 0.041570186614990234

@shankari
Contributor

shankari commented Mar 26, 2019

@PatGendre since you are building your own app, do you want to try out the fix first?
The patch is attached
utf-8-fix.patch.gz

Alternatively, I can just check this in and bump up the version number, so when you rebuild, you will get the fix. I'm going to wait a bit to push a new emTripLog release with this fix because the Indian team is starting their survey; they just finished the pilot testing and they are going to be paranoid about any change to the app.

And this fix doesn't really affect them.

Let me know if you have any concerns.

@shankari
Contributor

@PatGendre Great catch BTW. This was such a corner case if you consider English- and Spanish-speaking countries, but such an important component if you consider the rest of the world, especially countries with non-Latin alphabets.

@shankari shankari changed the title absence of location data in mongo Ensure that data is transmitted at UTF-8 Mar 27, 2019
@shankari shankari changed the title Ensure that data is transmitted at UTF-8 Ensure that data is transmitted as UTF-8 Mar 27, 2019
@shankari
Contributor

shankari commented Mar 27, 2019

Let's briefly check to see whether there is a similar issue with iOS. Here's where we communicate with the server in iOS.

https://github.com/e-mission/cordova-server-communication/blob/24309f837d0de238ce9b0f0ea58c416971d12ed7/src/ios/BEMCommunicationHelper.m#L159

and it's true that we don't set UTF-8 as the encoding.

    [request setValue:@"application/json"
        forHTTPHeaderField:@"Content-Type"];

But to fill in the data, we use

    NSData *jsonData = [NSJSONSerialization dataWithJSONObject:self.mJsonDict
                                                       options:kNilOptions
                                                         error:&parseError];

and that will return UTF-8
https://developer.apple.com/documentation/foundation/nsjsonserialization/1413636-datawithjsonobject

JSON data for obj, or nil if an internal error occurs. The resulting data is encoded in UTF-8.

So I should put in the charset=UTF-8 to be on the safe side, but I bet this works just fine on iOS.

@PatGendre are any of you testing with iPhone just to confirm?
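As an aside, this is why the explicit charset=UTF-8 parameter is the safe choice: a receiver that honors the Content-Type charset parameter otherwise falls back to some default (historically ISO-8859-1 for HTTP). A small stdlib sketch of how the parameter is parsed out of the header (the function name and fallback choice are illustrative, not e-mission code):

```python
from email.message import Message  # stdlib parser for MIME-style headers

def charset_of(content_type, default="ISO-8859-1"):
    """Extract the charset parameter from a Content-Type header value.

    `default` models the historical HTTP fallback when no charset is
    present; both the name and the fallback are illustrative.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default

# With the explicit parameter there is no ambiguity:
assert charset_of("application/json; charset=UTF-8") == "utf-8"
# Without it, behavior depends on the receiver's default:
assert charset_of("application/json") == "ISO-8859-1"
```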

@PatGendre
Contributor Author

Hi @shankari , great debug job :-) Internationalisation is definitely an art!
I will check with Yann for patching the plugin but as the next problematic month for us is August (août) we can safely wait for the next emtripLog release.
As for iOS, no, we don't currently test on iPhones. The FabMob association has just asked for an Apple account, so we may compile the app for iOS and test next week at the earliest.

@shankari
Contributor

Testing this assumption (#333 (comment)) on iOS before I finalize this fix:

So I should put in the charset=UTF-8 to be on the safe side, but I bet this works just fine on iOS.

@shankari
Contributor

I have to test this on a physical phone, since the iOS emulator doesn't appear to allow you to set the date and time. After setting the date and time, deploying to the phone, and manually starting a trip, I got locations whose dates were displayed in French while logging.

2019-02-16 16:01:13.260 emission[31002:4362240] Recieved location <+37.39097075,-122.08645784> +/- 10.00m (speed 0.66 mps / course 201.80) @ 16/02/2019 16:01:12 heure normale du Pacifique nord-américain, ongoing trip = true

However, when the trip ended and the data was synced, the fmt_time sent to the server did not have any words, so no accents would apply.

In [11]: edb.get_uuid_db().find_one({"uuid": test_uuid})
Out[11]:
{'_id': ObjectId('5cb901d4be04ee70f5724f5a'),
 'update_ts': datetime.datetime(2019, 4, 18, 16, 1, 40, 755000),
 'user_email': 'test_french_fevr',
 'uuid': UUID('bd030408-ac83-4f02-873c-61e1cbd51cf8')}

In [12]: for e in list(estcs.find_entries(test_uuid, ["background/location"])):
    ...:     print(e["data"]["fmt_time"])
    ...:
2019-02-16T16:01:08-0800
2019-02-16T16:01:12-0800
2019-04-05T07:11:14-0700

So I will set this to UTF-8 anyway, but I don't think it is an issue on iOS.
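The observation above — that ISO-8601 timestamps carry no locale-specific words — can be checked directly (a sketch; the French abbreviation is hard-coded rather than produced via the locale module, since an fr_FR locale may not be installed):

```python
# The fmt_time values iOS sends are ISO-8601 and therefore pure ASCII,
# so they are byte-identical in ISO-8859-1 and UTF-8:
fmt_time = "2019-02-16T16:01:08-0800"
assert fmt_time.encode("iso-8859-1") == fmt_time.encode("utf-8")

# A locale-formatted date is not: 'févr.' (the French abbreviation for
# February) differs between the two encodings, which is exactly where
# the Android path broke.
french_abbrev = "févr."  # hard-coded; locale.setlocale avoided on purpose
assert french_abbrev.encode("iso-8859-1") != french_abbrev.encode("utf-8")
```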

@shankari
Contributor

shankari commented Apr 19, 2019

@PatGendre were you able to test with the patch? I actually feel pretty comfortable releasing these changes based on my testing, and then you can just release the plugin. But once you have built with them, it would be good if you could reset your phone and test before August. If there is something subtly broken, I don't want to find out once it is already broken and then scramble to fix it.

@shankari
Contributor

shankari commented May 9, 2019

On iOS, changed the content header to be UTF-8 anyway and tested to ensure that there was no regression. There was not.

on the phone

2019-05-08 18:09:57.750433-0700 emission[3117:11140354] postToHost called with url = /usercache/get -- http://localhost:8080
2019-05-08 18:09:57.750614-0700 emission[3117:11140354] DEBUG: postToHost called with url = /usercache/get -- http://localhost:8080

on the server

START 2019-05-08 18:10:00.084269 POST /usercache/get
END 2019-05-08 18:10:00.149964 POST /usercache/get 49e72fc4-f1d8-4f4d-9666-57dc597d1133 0.06560707092285156

shankari added a commit to shankari/cordova-server-communication that referenced this issue May 9, 2019
On android, unless we explicitly serialize the data as UTF-8, it is serialized as ISO-8859-1.
On iOS, it is serialized as UTF-8 by default.

In both, it is probably best to explicitly mark the content as UTF-8 in the HTTP header

Bump up the version number to match

This fixes e-mission/e-mission-docs#333

Testing done:
- on android: set the locale to French, set the date to Fevr and synced data
    - before this change, it did not work
    - after this change, it works
- on iOS, set the locale to French, set the date to Fevr and synced data
    - the formatted data did not have any words so even before this fix, it worked
    - this did not cause a regression