New data: Italy #712

Open
espenairmine opened this issue Apr 21, 2020 · 5 comments

Comments

@espenairmine
Contributor

Find the right sources, use them, and then close:
#670
#449
#420
#415
#303

Remove duplicates - #710

@espenairmine
Contributor Author

@magsyg

@magsyg
Contributor

magsyg commented Apr 29, 2020

Italy is currently sourced from Arpa, and only for some regions, so a lot of data is missing.
EEA covers almost all of Italy and includes all local sources (from Arpa) except Sicilia.
I propose adding EEA as a filler source for Italy, and adding more local sources (from Arpa) later, when we have time, for more precise data collection.
There are 3 PRs for local Arpa sources for Italy: #721, #720, #366 #716
which we should use as a first approach.
The current sources should be kept, because EEA can be a bit unstable at times.
Where there is overlap, the sources will be filtered according to the following process:
Current sources - keep all locations
EEA - remove locations currently loaded from Arpa
When adding additional Arpa sources, the EEA filter list needs to be updated, so that the Arpa sources take priority.
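
As a rough illustration of the filtering process above, here is a minimal sketch. It assumes each station carries a stable `id`, which has not been confirmed in this thread. The names `filterEea`, `combineSources`, and the station shape are hypothetical and are not taken from the existing adapters.

```javascript
// Hypothetical station shape: { id: string, source: string }.
// The filter list holds IDs of EEA stations that Arpa adapters already load.

// Drop every EEA station whose ID is on the filter list.
function filterEea(eeaStations, eeaFilterList) {
  const skip = new Set(eeaFilterList);
  return eeaStations.filter((station) => !skip.has(station.id));
}

// Current (Arpa) sources keep all locations; EEA only fills the gaps.
function combineSources(currentStations, eeaStations, eeaFilterList) {
  return [...currentStations, ...filterEea(eeaStations, eeaFilterList)];
}

// Made-up IDs, for illustration only.
const current = [{ id: 'A1', source: 'arpa' }];
const eea = [
  { id: 'A1', source: 'eea' },
  { id: 'B2', source: 'eea' },
];
const combined = combineSources(current, eea, ['A1']);
console.log(combined);
```

Under this sketch, adding a new Arpa adapter means appending its station IDs to the filter list, which is what gives Arpa priority over EEA.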

@sruti
Contributor

sruti commented May 1, 2020

Thanks for linking up all the remaining issues, and for looking into the various Italy adapters! I'm thinking we go with a different approach.

Using the script I detailed in #710, I dug into which stations are likely to be duplicates and whether we would lose any stations by disabling the current adapters and switching to EEA (this assumes the battuta issue is fixed; otherwise it's ~550 new stations):

New EEA Italy stations: 772 
Existing Italy stations: 104
Inactive stations: 6
Similar coordinates (diffThreshold: 0.00001): 14
[
  { new: '41.768188999999985,12.237048000000001', existing: '41.76819,12.23705' },
  { new: '41.73,13.338330000000001', existing: '41.73,13.33833' },
  { new: '41.77484900000001,12.223413', existing: '41.77485,12.22341' },
  { new: '42.137339999999995,11.79316', existing: '42.13734,11.79316' },
  { new: '42.159949999999995,11.74263', existing: '42.15995,11.74263' },
  { new: '42.102159999999984,11.784360000000001', existing: '42.10216,11.78436' },
  { new: '42.0989,11.81769', existing: '42.0989,11.81769' },
  { new: '42.081824999999995,11.809336', existing: '42.08183,11.80934' },
  { new: '42.07361,11.81592', existing: '42.07361,11.81592' },
  { new: '42.16096999999999,11.90002', existing: '42.16097,11.90002' },
  { new: '42.15223,11.93583', existing: '42.15223,11.93583' },
  { new: '42.26856,11.91091', existing: '42.26856,11.91091' },
  { new: '42.09704999999999,11.788350000000001', existing: '42.09705,11.78835' },
  { new: '42.086802999999996,11.806498000000001', existing: '42.0868,11.8065' }
]
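
For readers without access to #710, the comparison can be sketched roughly as follows. This is not the actual script. It assumes two stations count as "similar" when both the latitude difference and the longitude difference are within `diffThreshold`, which is consistent with the pairs listed in this comment. The function names are hypothetical.

```javascript
// Parse a "lat,lon" string into numbers.
function parseCoords(text) {
  const [lat, lon] = text.split(',').map(Number);
  return { lat, lon };
}

// Assumption: similarity is checked per axis, not as a straight-line distance.
function isSimilar(a, b, diffThreshold) {
  return (
    Math.abs(a.lat - b.lat) <= diffThreshold &&
    Math.abs(a.lon - b.lon) <= diffThreshold
  );
}

// Compare every new station against every existing one.
function findSimilar(newCoords, existingCoords, diffThreshold) {
  const matches = [];
  for (const n of newCoords) {
    for (const e of existingCoords) {
      if (isSimilar(parseCoords(n), parseCoords(e), diffThreshold)) {
        matches.push({ new: n, existing: e });
      }
    }
  }
  return matches;
}

// Two pairs taken from the lists in this comment.
const newStations = [
  '41.768188999999985,12.237048000000001',
  '44.63604999999999,10.90473',
];
const existingStations = ['41.76819,12.23705', '44.637,10.9057'];

const strictMatches = findSimilar(newStations, existingStations, 0.00001);
const looseMatches = findSimilar(newStations, existingStations, 0.001);
console.log(strictMatches.length, looseMatches.length);
```

The nested loop is quadratic, which is fine for a few hundred stations; a larger set would call for a spatial index.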

If we increase the diffThreshold to 0.001, the number of stations with similar coordinates increases to 80 (didn't list all of them here). Looking at the numbers, it seems likely most of these are the same station and would be grouped together by the unique ID:

  { new: '44.842499999999994,11.61306', existing: '44.8425,11.6131' },
  { new: '44.82389,9.830279999999998', existing: '44.8239,9.8304' },
  { new: '44.63604999999999,10.90473', existing: '44.637,10.9057' },
  { new: '41.94749999999999,12.46972', existing: '41.94745,12.46959' },
  { new: '41.88306,12.508890000000001', existing: '41.88306,12.50894' },
  { new: '42.42194,12.10917', existing: '42.42206,12.10913' },
  { new: '41.595278,12.653611', existing: '41.59534,12.65358' },
  { new: '42.40417,12.85833', existing: '42.40409,12.85822' },
  { new: '41.46388900000001,12.913056', existing: '41.46402,12.91304' },
  { new: '41.75,13.149721999999999', existing: '41.75,13.14968' },
  { new: '41.768188999999985,12.237048000000001', existing: '41.76819,12.23705' },
  { new: '41.57,13.33722', existing: '41.57,13.33719' },
  { new: '42.157778,11.908611000000002', existing: '42.15774,11.90874' },
  { new: '42.091666999999994,11.8025', existing: '42.09163,11.80247' },
  { new: '42.091667,11.8025', existing: '42.09163,11.80247' },
  { new: '41.99555999999999,12.72639', existing: '41.99568,12.72637' },
  { new: '41.730833,13.004444', existing: '41.73084,13.00435' },
  { new: '41.725,13.009444000000002', existing: '41.72501,13.00957' },
  { new: '44.48333,11.355000000000002', existing: '44.4836,11.355' },
  { new: '44.42861,12.18667', existing: '44.4278,12.1865' },
  { new: '44.51611099999999,10.733889', existing: '44.5162,10.7339' },
  { new: '41.88944399999999,12.266389', existing: '41.88944,12.2663' },
  { new: '41.93277799999999,12.506944', existing: '41.93287,12.50697' },
  { new: '41.85777799999999,12.568611000000002', existing: '41.85772,12.56866' },
  { new: '42.57249999999999,12.961944', existing: '42.57259,12.96198' },

To double-check, I mapped all the coordinates to see where there aren't overlaps, and it looks like EEA covers pretty much all of them.
Red - existing stations, Green - new EEA stations, purple - inactive stations:
[Three map screenshots attached: Screen Shot 2020-05-01 at 2 10 14 PM, 2 11 12 PM, and 2 14 56 PM]

@sruti
Contributor

sruti commented May 1, 2020

Based on that, I would say let's disable the current adapters and add Italy through EEA, then add in local sources if there are gaps. For Italy at least, EEA is more reliable than the current sources/adapters, and it's easier to manage one source instead of multiple adapters.

@espenairmine
Contributor Author

@sruti - Thanks for the good feedback!

Can you clarify ".. would be grouped together by the unique ID"?
Does that mean there is a one-to-one relationship between unique ID and lat/lon?

What happens if sources A and B both give a measurement for the same lat/lon and time (assuming A updates first, then B)?
1 - Update B will be discarded
2 - Update B will override value from A
3 - There will be two observations in the DB, having the same lat/lon/time

The answer will have implications for how we treat multiple sources with overlapping data for the same country.
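
To make the three options concrete, here is a small sketch of each one. It does not describe how the database actually behaves, which is the open question. The `ingest` function, the policy names, and the measurement shape are all hypothetical.

```javascript
// Hypothetical measurement shape: { source, lat, lon, time, value }.
function keyOf(m) {
  return `${m.lat},${m.lon},${m.time}`;
}

// policy: 'discard' (option 1), 'override' (option 2), 'keepBoth' (option 3).
function ingest(updates, policy) {
  if (!['discard', 'override', 'keepBoth'].includes(policy)) {
    throw new Error(`unknown policy: ${policy}`);
  }
  const byKey = new Map();
  for (const m of updates) {
    const key = keyOf(m);
    const existing = byKey.get(key);
    if (!existing) {
      byKey.set(key, [m]); // first observation for this lat/lon/time
    } else if (policy === 'override') {
      byKey.set(key, [m]); // the later update replaces the earlier one
    } else if (policy === 'keepBoth') {
      existing.push(m); // both observations are stored
    }
    // 'discard': the later update is ignored
  }
  return [...byKey.values()].flat();
}

// A updates first, then B, for the same lat/lon/time (made-up values).
const updates = [
  { source: 'A', lat: 42.0989, lon: 11.81769, time: '2020-05-01T12:00:00Z', value: 10 },
  { source: 'B', lat: 42.0989, lon: 11.81769, time: '2020-05-01T12:00:00Z', value: 12 },
];

const discarded = ingest(updates, 'discard');
const overridden = ingest(updates, 'override');
const both = ingest(updates, 'keepBoth');
console.log(discarded, overridden, both);
```

This sketch only matches exactly equal coordinates. Sources that report slightly different coordinates for the same station would not collide under any of the three policies.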


3 participants