Datasets featuring global, high-level flight schedules per aircraft, extracted from ADS-B position reports.
Published per quarter, starting from 2024. Covers all flights within the reception coverage of the ADSBlol initiative.
- This project uses the ADS-B data from the ADSBlol initiative. Consider supporting their great project.
- This project uses validation data from vradarserver/Andrew Whewell to check extracted routes against additional route data (based on aircraft callsign). Again, consider supporting this initiative.
Each day, ADSBlol publishes ADS-B data in two versions: prod-0 and staging-0. The largest file (by file size) is selected for each day.
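The "largest file per day" rule can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `pick_largest` is hypothetical, and it assumes both daily versions have already been downloaded to local paths.

```python
import os


def pick_largest(paths):
    """Return the path whose file is largest on disk (on a tie, the first wins)."""
    return max(paths, key=os.path.getsize)
```

For one day, this would be called with the local paths of the prod-0 and staging-0 downloads.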
After extracting the data, only the 'full' ADS-B transmissions are retained (approximately 1 out of every 4 transmissions). This keeps processing of the cumulative data for each quarter feasible.
See the Releases section of this repository for a parquet file with the flights per aircraft, per quarter of a year.
The parquet file format has been selected to keep the flights data manageable in terms of size and processing/loading times. Each quarter features approx. 10-12+ million flights and ~500,000 aircraft, which in csv format would total approx. 3 GB. A parquet file stays far below 1 GB. Loading a parquet file is straightforward with Python:
import pandas
df = pandas.read_parquet('2024_Q1.parquet')
To inspect the parquet dataset without Python, you can use a GUI tool such as ParquetViewer, which can be installed on Windows as an exe.
The data is published per quarter. Consecutive quarters overlap slightly so that no flight is cut in half at a quarter boundary.
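Because of this overlap, combining several quarters can count the same flight twice. A minimal sketch of one way to handle that, assuming (this is not confirmed by the dataset documentation) that a flight in the overlap appears as an identical row in both files; `combine_quarters` is a hypothetical helper name:

```python
import pandas as pd


def combine_quarters(frames):
    """Concatenate quarterly DataFrames and drop rows that are exact duplicates."""
    combined = pd.concat(frames, ignore_index=True)
    # Assumption: overlapping flights are identical across files.
    # If they are not, pass subset=[...] with the columns that identify a flight.
    return combined.drop_duplicates(ignore_index=True)
```

Usage would look like `combine_quarters([pd.read_parquet('2024_Q1.parquet'), pd.read_parquet('2024_Q2.parquet')])`, where the second file name is assumed to follow the same pattern as the first.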
Because ADS-B reception coverage of the ADSBlol initiative is limited on certain continents, some aircraft tracks start after the airport of origin or end before the airport of destination. For those cases, the flights data has been enhanced by looking up the aircraft flight callsign and matching it with the open-source callsign vs route dataset of vradarserver/Andrew Whewell.
ADS-B transmissions simply broadcast the position the aircraft reports, so wrong position data caused by GPS spoofing is transmitted as well. The added column with the callsign vs route lookup makes it possible to filter out flights where the aircraft emitted wrong position data.
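A minimal sketch of such a filter is shown below. All four column names (`track_origin`, `track_destination`, `route_origin`, `route_destination`) are hypothetical placeholders, not the dataset's actual schema; substitute the real column names from the parquet file. A mismatch can also stem from a partial track rather than spoofing, so the sketch flags rows instead of deleting them.

```python
import pandas as pd


def flag_route_mismatch(df):
    """Add a boolean 'route_mismatch' column (hypothetical column names)."""
    # Only judge flights for which the callsign lookup returned a route.
    has_route = df["route_origin"].notna() & df["route_destination"].notna()
    # Track-derived airports disagree with the callsign-derived route.
    differs = (df["track_origin"] != df["route_origin"]) | (
        df["track_destination"] != df["route_destination"]
    )
    return df.assign(route_mismatch=has_route & differs)
```

Flights can then be filtered with `flagged[~flagged["route_mismatch"]]`.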
Status Q2 2024: number of receivers/antennas of the ADSBlol initiative (image above)
Aircraft coverage of the ADSBlol initiative, taken at ~13:00 UTC so that all continents show reasonable traffic and no major market is in its night-time lull (image above)
Because ADSBlol coverage improves regularly, validation of the extracted flights is a never-finished task, especially given the global scope.
At present, each quarter of extracted flights features approx. 10-13+ million flights and ~500,000 aircraft.
Validation Case Study - AMS/EHAM Reference Day 2024-06-14
- Number of flights (commercial only) extracted from ADS-B data vs. number of flights in the AMS schedule: close match, within a 5% error margin
- Airline representation: close match, within a 5% error margin
- Origin/destination of a flight is accurate 73% of the time based purely on ADS-B track data, improved to 95+% by using the callsign vs route lookup
Please use in line with the license defined in this repository. No guarantee, no liability, no warranty. All open-source.
This concerns high-level, approximate runway (RWY) times in UTC: lift-off time for departures and touchdown time for arrivals. This generally refers to the first 'ground' entry for arrivals and the last 'ground' entry for departures. However, with more limited ADS-B coverage, the track may not start or stop at the airport.
For those cases, the beginning/end of the track has been selected as the time of the flight. For further implications, see section below.
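The selection rule described here can be sketched as follows. This is an illustration under assumptions, not the project's implementation: the track is modelled as a time-ordered list of `(timestamp, on_ground)` tuples, and `runway_times` is a hypothetical name.

```python
def runway_times(track):
    """Return (departure_time, arrival_time) for one flight.

    track: time-ordered list of (timestamp, on_ground) tuples (assumed format).
    """
    airborne = [i for i, (_, on_ground) in enumerate(track) if not on_ground]
    if not airborne:
        return None, None  # never airborne, so there is no flight to time
    first_air, last_air = airborne[0], airborne[-1]
    # Departure: last 'ground' entry before lift-off, else the start of the track.
    departure = track[first_air - 1][0] if first_air > 0 else track[0][0]
    # Arrival: first 'ground' entry after the last airborne point, else the end of the track.
    arrival = track[last_air + 1][0] if last_air + 1 < len(track) else track[-1][0]
    return departure, arrival
```

With a complete track the result is the last ground timestamp before lift-off and the first one after landing; with a track that is airborne throughout, it falls back to the first and last timestamps.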
Similar to the section above, for those cases where the track does not start or stop at the airport, multiple airports in the vicinity of the first/last position of the ADS-B track have been listed as options. To nevertheless determine the plausible airport of origin/destination, validation data from vradarserver/Andrew Whewell has been included to match the aircraft flight callsign with external route data.
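A rough sketch of this two-step idea: list airports near the first/last track position, then let the callsign-based route pick among them. The function names, the search radius, and the rule "prefer the route airport only if it is among the candidates" are all assumptions for illustration; the airport coordinates in the usage note are approximate.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def candidate_airports(lat, lon, airports, radius_km):
    """Airport codes within radius_km of (lat, lon), nearest first."""
    hits = [
        (haversine_km(lat, lon, a_lat, a_lon), code)
        for code, (a_lat, a_lon) in airports.items()
    ]
    return [code for dist, code in sorted(hits) if dist <= radius_km]


def resolve_airport(candidates, route_airport):
    """Prefer the callsign-route airport if it is a candidate, else the nearest one."""
    if route_airport in candidates:
        return route_airport
    return candidates[0] if candidates else None
```

For example, with `airports = {"EHAM": (52.31, 4.76), "EHRD": (51.96, 4.44), "EHLE": (52.46, 5.53)}` and a track ending at `(52.10, 4.60)`, a 50 km radius yields EHRD and EHAM as candidates, and a callsign route ending at EHAM resolves the destination to EHAM.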
In case of go-arounds/touch-and-go/balked landings, only the final touchdown is counted as touchdown time of the flight (with commercial flights in mind).
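As an illustration of this rule, the sketch below collects every airborne-to-ground transition in a track and keeps only the last one. The track format (time-ordered `(timestamp, on_ground)` tuples) and the function names are assumptions, not the project's actual code.

```python
def touchdown_times(track):
    """Timestamps of all airborne -> ground transitions in a time-ordered track."""
    return [
        t
        for (_, prev_ground), (t, ground) in zip(track, track[1:])
        if ground and not prev_ground
    ]


def final_touchdown(track):
    """Timestamp of the last touchdown, or None if the track never lands."""
    touchdowns = touchdown_times(track)
    return touchdowns[-1] if touchdowns else None
```

A track with one touch-and-go followed by a full-stop landing has two entries in `touchdown_times`, and `final_touchdown` returns the second.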