Update time zone in South Korea adapter #1102

Merged: 3 commits, Apr 12, 2024


majesticio
Contributor

Fixes the timezone assignment in the air-korea adapter. Source datetimes were being treated as UTC, but the source reports them in the Asia/Seoul timezone.

This fix should be merged only after the datetimes of measurements already recorded have been corrected.
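To illustrate the kind of error described above, here is a minimal sketch using only the built-in `Date`. The helper names and the `YYYY-MM-DD HH:mm` input format are assumptions for illustration, not the adapter's actual code, and the adapter itself may well use a date library instead. The sketch relies on Korea Standard Time being a fixed UTC+09:00 offset with no daylight saving time.

```javascript
// Hypothetical helpers; the real adapter's parsing code is not shown in this thread.
// Assumed input format: "YYYY-MM-DD HH:mm" with no timezone information.
const NAIVE_DATETIME = /^(\d{4})-(\d{2})-(\d{2})[ T](\d{2}):(\d{2})$/;

function toIsoWithOffset(naive, offset) {
  const m = NAIVE_DATETIME.exec(naive);
  if (!m) throw new Error(`Unrecognized datetime: ${naive}`);
  const [, y, mo, d, h, mi] = m;
  // Build an ISO 8601 string with an explicit offset, then normalize to UTC.
  return new Date(`${y}-${mo}-${d}T${h}:${mi}:00${offset}`).toISOString();
}

// Incorrect interpretation: the source time is treated as if it were already UTC.
const asUtc = (naive) => toIsoWithOffset(naive, 'Z');

// Corrected interpretation: the source time is Asia/Seoul local time (UTC+09:00).
const fromSeoul = (naive) => toIsoWithOffset(naive, '+09:00');

console.log(asUtc('2024-04-11 01:00'));     // 2024-04-11T01:00:00.000Z
console.log(fromSeoul('2024-04-11 01:00')); // 2024-04-10T16:00:00.000Z
```

Under these assumptions, a measurement stamped 01:00 in Seoul lands nine hours too late when read as UTC, which is why previously stored datetimes need correcting as well.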

Collaborator

@caparker left a comment


OK. I think I have figured out the mystery of the 60% coverage.
Recall that the db says there are ~1300 locations,
and that when you run the adapter as is, it reports that it found about 1300 stations.
But when you look at the count for each parameter, you should notice that it only counts about 650 per parameter.
So where is the rest of the data? I think it comes down to this part:

    return {
        location: station.STATION_ADDR,
        city: '',
        coordinates: {
            latitude: parseFloat(station.DM_Y),
            longitude: parseFloat(station.DM_X),
        },

You can see that you are using STATION_ADDR as the location name. When we count the number of locations, we see 1300ish unique STATION_ADDR values. However, if you switch to using STATION_NAME, that number falls to about 650:

651 locations from 2024-04-10T16:00:00.000Z - 2024-04-10T16:00:00.000Z | Parameters for korea-air 
{"co":{"count":645,"errors":0,"max":1.26,"min":0,"nulls":0},
"no2":{"count":648,"errors":0,"max":0.0458,"min":0,"nulls":0},
"o3":{"count":645,"errors":0,"max":0.0797,"min":0,"nulls":0},
"pm10":{"count":646,"errors":0,"max":106,"min":0,"nulls":0},
"pm25":{"count":642,"errors":0,"max":39,"min":0,"nulls":0},
"so2":{"count":643,"errors":0,"max":0.0193,"min":0,"nulls":0}}

So my guess is that if we looked at the coordinates we would see the same thing: about 650 unique sets of coordinates. What is happening is that we are splitting the 650 data points across 1300 sensor nodes (locations) that were created from the addresses used here. But each time we ingest, we match locations on coordinates rather than on location name, so we are likely switching back and forth between roughly 2 different nodes for each unique set of coordinates.

@majesticio
Contributor Author

Your assessment appears to be correct:

  • 657 unique physical stations were identified
  • Each unique station has ~2 associated addresses

Examples of locations with more than one address:

Coordinates: 37.564639,126.975961
Names: 중구
Addresses: 
- 서울 중구 덕수궁길 15 시청서소문별관 3동
- 서울 중구 덕수궁길 15
Total Entries: 6
---
Coordinates: 37.549389,126.971519
Names: 한강대로
Addresses: 
- 서울 용산구 한강대로 405 (서울역 앞)
- 서울 용산구 한강대로 405
Total Entries: 6
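A check like the one summarized above could be sketched as follows. The sample records are built from the two stations quoted in this comment, the field names follow the adapter excerpt in the review, and `groupByCoordinates` is a hypothetical helper, not the script that was actually run.

```javascript
// Sample records from the two stations quoted above (two address variants each).
const stations = [
  { STATION_NAME: '중구', STATION_ADDR: '서울 중구 덕수궁길 15 시청서소문별관 3동', DM_Y: '37.564639', DM_X: '126.975961' },
  { STATION_NAME: '중구', STATION_ADDR: '서울 중구 덕수궁길 15', DM_Y: '37.564639', DM_X: '126.975961' },
  { STATION_NAME: '한강대로', STATION_ADDR: '서울 용산구 한강대로 405 (서울역 앞)', DM_Y: '37.549389', DM_X: '126.971519' },
  { STATION_NAME: '한강대로', STATION_ADDR: '서울 용산구 한강대로 405', DM_Y: '37.549389', DM_X: '126.971519' },
];

// Hypothetical helper: collect the distinct names and addresses seen at each coordinate pair.
function groupByCoordinates(records) {
  const groups = new Map();
  for (const s of records) {
    const key = `${parseFloat(s.DM_Y)},${parseFloat(s.DM_X)}`;
    if (!groups.has(key)) {
      groups.set(key, { names: new Set(), addresses: new Set() });
    }
    const group = groups.get(key);
    group.names.add(s.STATION_NAME);
    group.addresses.add(s.STATION_ADDR);
  }
  return groups;
}

const groups = groupByCoordinates(stations);
const uniqueAddresses = new Set(stations.map((s) => s.STATION_ADDR)).size;

console.log(`Unique coordinate pairs: ${groups.size}`);
console.log(`Unique STATION_ADDR values: ${uniqueAddresses}`);
for (const [coords, group] of groups) {
  if (group.addresses.size > 1) {
    console.log(`Coordinates: ${coords}`);
    console.log(`Names: ${[...group.names].join(', ')}`);
    for (const addr of group.addresses) console.log(`- ${addr}`);
  }
}
```

On this four-record sample, keying by address yields four locations while keying by coordinates or name yields two, which mirrors the roughly 2:1 ratio reported for the full data set.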

I have updated the adapter to use STATION_NAME as the location name, which returns the correct number of locations.
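The diff itself is not shown in this thread, so the following is only a sketch of what such a change might look like, assuming the same fields as the excerpt quoted in the review. The function name `toLocation` is hypothetical.

```javascript
// Hypothetical mapping function; only the `location` field differs from the reviewed excerpt.
function toLocation(station) {
  return {
    location: station.STATION_NAME, // the reviewed excerpt used station.STATION_ADDR here
    city: '',
    coordinates: {
      latitude: parseFloat(station.DM_Y),
      longitude: parseFloat(station.DM_X),
    },
  };
}

// Two address variants of the same physical station now map to one location name.
const variantA = toLocation({
  STATION_NAME: '중구',
  STATION_ADDR: '서울 중구 덕수궁길 15 시청서소문별관 3동',
  DM_Y: '37.564639',
  DM_X: '126.975961',
});
const variantB = toLocation({
  STATION_NAME: '중구',
  STATION_ADDR: '서울 중구 덕수궁길 15',
  DM_Y: '37.564639',
  DM_X: '126.975961',
});

console.log(variantA.location === variantB.location);
```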

@caparker merged commit 7674b65 into main on Apr 12, 2024
1 check passed