As air quality expert, I would like to check if certain air quality data from the Province of Bolzano is correctly imported in the Open Data Hub #668
Comments
@dulvui today I was informed that the Data Provider (APPABZ) has fixed an issue; the data should now always be available as UTC+1. Can you please check? |
@rcavaliere For opendata I have now simplified the timestamp conversion, and it should be correct. The sync triggers every morning at 10, so let's see tomorrow whether the issues are solved. |
There is still missing data for some days. I will now set up some logging to understand better whether we lose the data or the API doesn't update it. |
@rcavaliere I just noticed that there might be a problem on the data provider's side: the values for Egna are -1, and such values are not valid and get ignored by the data collector. Here is an example response with -1 values, saved here because the response will change if called on another day. [{"DATE":"2024-06-23T12:00:00+01:00","SCODE":"ML5","MCODE":"CO","TYPE":"1","VALUE":-1},{"DATE":"2024-06-23T12:00:00+01:00","SCODE":"ML5","MCODE":"NO2","TYPE":"1","VALUE":-1}] I have now added specific logging for the case where the value is -1, so I can see for which stations/data types this happens. |
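The filtering and logging described above could look roughly like the following. This is only a minimal Python sketch, not the actual data collector code: the function name `split_valid` and the logger name are hypothetical, and the only facts taken from the thread are the response shape and the rule that -1 marks a value that is not valid.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("appabz-sketch")  # hypothetical logger name

INVALID = -1  # per the thread: -1 means the value is not valid


def split_valid(records):
    """Split records into (valid, invalid) and log every invalid one."""
    valid, invalid = [], []
    for rec in records:
        if rec["VALUE"] == INVALID:
            # Log station and data type so recurring gaps can be traced
            log.info("invalid value: station=%s type=%s date=%s",
                     rec["SCODE"], rec["MCODE"], rec["DATE"])
            invalid.append(rec)
        else:
            valid.append(rec)
    return valid, invalid


# Example response quoted in the comment above
raw = ('[{"DATE":"2024-06-23T12:00:00+01:00","SCODE":"ML5","MCODE":"CO","TYPE":"1","VALUE":-1},'
       '{"DATE":"2024-06-23T12:00:00+01:00","SCODE":"ML5","MCODE":"NO2","TYPE":"1","VALUE":-1}]')
valid, invalid = split_valid(json.loads(raw))
print(len(valid), len(invalid))  # 0 2
```

With this response both records end up in `invalid`, so nothing would be stored for that hour.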
@dulvui are there any further news here? |
@rcavaliere But this graph shows the data on a daily basis, while we are importing hourly data. I think that makes a difference, and if I check http://dati.retecivica.bz.it/services/airquality/sensors I can find many sensors with value -1. A possible problem could be that we fetch the data too early, before it is ready. I checked the cron job: it runs every day at 10:00 UTC, so 12:00 local time, and it could be that the data is not ready by then every day. We are using the following endpoints |
@dulvui I think that's the issue: we are making too few API calls, and I think the data is not there yet or not there anymore. Can we simply set the frequency to 10 minutes? For us it won't change anything: if there is no new data, we don't store anything. |
@rcavaliere yes I'll try that now |
@dulvui did you manage to find out in this example what happened? |
@rcavaliere no I still don't know why this happens |
@rcavaliere I'm trying to look into this. But right now for me too it looks like the API is just posting garbage: [
{
"DATE": "2024-09-19T16:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 32.504
},
{
"DATE": "2024-09-19T17:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 34.3841
},
{
"DATE": "2024-09-19T18:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 36.2961
},
{
"DATE": "2024-09-19T19:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 38.1125
},
{
"DATE": "2024-09-19T20:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 29.9547
},
{
"DATE": "2024-09-19T21:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 37.5708
},
{
"DATE": "2024-09-19T22:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 30.6239
},
{
"DATE": "2024-09-19T23:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 28.6163
},
{
"DATE": "2024-09-20T00:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 29.8591
},
{
"DATE": "2024-09-20T01:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 21.2551
},
{
"DATE": "2024-09-20T02:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 18.2277
},
{
"DATE": "2024-09-20T03:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 22.2748
},
{
"DATE": "2024-09-20T04:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 21.51
},
{
"DATE": "2024-09-20T05:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 28.7119
},
{
"DATE": "2024-09-20T06:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 32.6315
},
{
"DATE": "2024-09-20T07:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": 32.8864
},
{
"DATE": "2024-09-20T08:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T09:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T10:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T11:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T12:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T13:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T14:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T15:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
}
] You can clearly see that at 8 AM (as Simon already found) it just stops posting data (it's -1 which gets ignored). |
https://ambiente.provincia.bz.it/aria/misurazione-attuale-aria.asp?air_actn=4&air_station_code=ML5&air_type=1 |
@clezag this is how data can be (manually) invalidated by air quality experts, which is why it looks like this. But I remember that we have cases in which the data visualized there is present, while in the Open Data Hub it is not. I can look for these examples again, but unfortunately we cannot check what happened in the past... |
[
{
"DATE": "2024-09-20T12:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T13:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T14:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T15:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T16:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T17:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T18:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T19:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T20:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T21:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T22:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-20T23:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-21T00:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-21T01:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-21T02:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-21T03:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-21T04:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-21T05:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-21T06:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-21T07:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-21T08:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-21T09:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-21T10:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
},
{
"DATE": "2024-09-21T11:00:00+01:00",
"SCODE": "ML5",
"MCODE": "NO2",
"TYPE": "1",
"VALUE": -1
}
] It's still like this. |
@clezag yes, this is not surprising. We are speaking here of validated data, and the validation is done by humans. People don't work on weekends, so the next update for Friday's data will take place at the beginning of next week. I think that it is because of these delays that we somehow miss data... |
@rcavaliere I think this is our issue. The endpoint we are using only goes back 24h, so if they take longer than that to validate, we have a hole. I don't think we can solve this on our end, except with an additional endpoint or parameter that allows us to go further back than 24h. |
@clezag thanks for checking more! But what does "goes back 24h" mean? Shouldn't we consider the last available data record in the DB and then import the new data available, sorting by timestamp? |
@rcavaliere The endpoint we are using always returns the data of the last 24h. If the validation window is longer than that, we never receive the data that is older than 24h. |
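To make the hole concrete, here is a minimal Python sketch of the reasoning above. It assumes a fixed 24h window, as described in this thread, and uses hypothetical validation times; the function name `visible_when_validated` is also just for illustration.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=24)  # per the thread: the endpoint returns only the last 24h


def visible_when_validated(measured_at, validated_at, window=WINDOW):
    """True if the record is still inside the endpoint window once it is validated."""
    return validated_at - measured_at <= window


tz = timezone(timedelta(hours=1))  # the provider publishes timestamps as UTC+1
friday_8am = datetime(2024, 9, 20, 8, 0, tzinfo=tz)

# Hypothetical validation times, chosen only to illustrate the two cases
same_day = datetime(2024, 9, 20, 16, 0, tzinfo=tz)
monday = datetime(2024, 9, 23, 9, 0, tzinfo=tz)

print(visible_when_validated(friday_8am, same_day))  # True
print(visible_when_validated(friday_8am, monday))    # False
```

A Friday record validated only on Monday has already left the window, so a collector polling that endpoint can never see its validated value, no matter how often it polls.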
@clezag probably you are right, this could be the reason. Let me evaluate this with the Data Provider... |
Update: waiting for APPABZ to explain the data inconsistencies found |
@clezag APPABZ has now published on their end-point the data of the last 10 days (see e.g. https://dati.retecivica.bz.it/services/airquality/timeseries?station_code=ML5&meas_code=NO2&type=1). Let's see if this will solve the issue! |
There are still some issues: our graphs in analytics do not reflect the 10-day JSON we get today. The time series writer API only accepts records whose timestamp is newer than the last stored one. Depending on how data gets validated, this could be the reason (e.g. if they first release the updated data of Sunday, and afterwards Saturday's, we never store Saturday's data). I will look into it. |
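The suspected failure mode can be reproduced with a toy model. The following Python sketch is not the real time series writer API: `NewerOnlyWriter` is a hypothetical stand-in that only mimics the "accept only newer timestamps" rule described above, and the measurement values are made up.

```python
from datetime import datetime


class NewerOnlyWriter:
    """Toy stand-in: rejects any record that is not newer than the last stored one."""

    def __init__(self):
        self.stored = {}
        self.last = None

    def push(self, ts, value):
        if self.last is not None and ts <= self.last:
            return False  # silently dropped, like the late Saturday record
        self.stored[ts] = value
        self.last = ts
        return True


saturday = datetime(2024, 9, 21, 12, 0)
sunday = datetime(2024, 9, 22, 12, 0)

writer = NewerOnlyWriter()
first = writer.push(sunday, 30.0)     # Sunday's data is released first (made-up value)
second = writer.push(saturday, 25.0)  # Saturday's data is released afterwards

print(first, second)          # True False
print(sorted(writer.stored))  # only the Sunday timestamp is stored
```

If validation really releases days out of order, any rule based purely on "newer than the last record" leaves exactly this kind of hole.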
I've set up a data collector that polls that endpoint (station ML5) every hour. Having the full raw data history should help us find out what's going on. |
With the raw data available, I checked again, and it turns out it is indeed a bug on our side. |
The BrennerLEC partners have noticed that, specifically on the ML5 air quality station of the Province ("A22 Egna - A22, corsia sud km 103"), we have frequent data holes, which should not appear.
Check for example: link
Generally, the open data (period = 3600) seem to have timestamps in UTC+2, while they should be in UTC+1.
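As a quick check of what a correct conversion should produce, here is a minimal Python sketch. It assumes the provider's `DATE` strings carry an explicit `+01:00` offset, as in the responses quoted in the comments; the helper name `to_utc` is hypothetical and this is not the collector's actual conversion code.

```python
from datetime import datetime, timezone


def to_utc(date_str):
    """Parse an ISO 8601 timestamp with offset and normalize it to UTC."""
    return datetime.fromisoformat(date_str).astimezone(timezone.utc)


ts = to_utc("2024-06-23T12:00:00+01:00")
print(ts.isoformat())  # 2024-06-23T11:00:00+00:00
```

If the same reading were interpreted as UTC+2 instead, it would land at 10:00 UTC, one hour earlier than it should.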
Affected Data Collectors to check:
https://github.com/noi-techpark/bdp-commons/tree/main/data-collectors/environment-appa-bz-tenminutes
https://github.com/noi-techpark/bdp-commons/tree/main/data-collectors/environment-appa-bz-opendata