Description of R package hbGPS

Note: This is not a practical guide on how to use hbGPS, but a description of every step in the algorithm intended as a high level summary of what hbGPS does.

(in function hbGPS.R)

Function hbGPS forms the interface to all hbGPS package functionality. A typical user is not expected to interact with the other functions in the package. Internally it loops over the available GPS files in csv format, loading one GPS file per iteration. Given the speed of the algorithm we have not yet worked on parallelisation, but that could be added in the future.

1. Loading and merging data

(in function load_and_tidy_up_GPS.R)

The participant ID is extracted either from the column ptid inside the GPS file, as relevant for some older files, or from the GPS file name as the character string before the first underscore _ or the first dot . when parameter idloc is set to 2 or 6, respectively. Column names are simplified and standardised, e.g. we try to avoid spaces and special characters. A timestamp column is generated by merging the date and time columns that are typically available in GPS files, using the timestamp format specified with parameter time_format. Further, columns that are typically part of GPS files but not used by hbGPS are dropped. If a column e/w exists and indicates that the coordinate was in the western hemisphere, values for longitude are flipped to negative. If a column n/s exists and indicates that the coordinate was in the southern hemisphere, values for latitude are flipped to negative.
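The ID extraction and the hemisphere sign flip can be illustrated with a minimal Python sketch. This is not the package's R code: the function names are hypothetical, and the encoding of the e/w and n/s columns as the letters E, W, N and S is an assumption.

```python
def extract_id(filename: str, idloc: int) -> str:
    """Return the file name part before the first '_' (idloc=2) or '.' (idloc=6)."""
    separator = {2: "_", 6: "."}[idloc]
    return filename.split(separator, 1)[0]


def signed_coordinate(value: float, hemisphere: str) -> float:
    """Make a coordinate negative when its hemisphere flag is W or S.

    Assumes the flag is stored as a single letter (N, S, E or W).
    """
    if hemisphere.strip().upper() in ("W", "S"):
        return -abs(value)
    return value
```

For a hypothetical file name `123_walk.csv`, `extract_id` would give `123` with idloc = 2 and `123_walk` with idloc = 6.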

(in function mergeGGIR.R)

Accelerometer data is assumed to have been processed with R package GGIR configured to output time series, which typically include columns for: timestamps, magnitude of acceleration, indicator of behavioural class (e.g. sleep, MVPA, LIPA, etc), indicator of whether data was invalid such as sensor not worn, and indicator of whether it was part of waking hours of a day. To merge the GGIR generated time series with the GPS data, hbGPS goes through the following steps:

  1. Files are matched based on a check whether the extracted recording ID from the GPS file occurs in the GGIR time series filename.
  2. The range in time from both time series is compared and used to derive the time window for which both time series have data plus 60 minutes before and after to avoid losing possibly relevant context.
  3. At this point the timestamps of GPS and accelerometer data may not exactly match, as the two may be collected at a different resolution and with a different offset relative to the full minute. Therefore, hbGPS resamples the accelerometer time series using linear interpolation to match the timestamps of the GPS time series.
  4. Behavioural classes, which GGIR stores as numeric codes, are converted to factor values to ease interpretation.

During the merge the code keeps track of merge success and prints its observations to the console: (V) successful merge or (X) followed by a description of why the merge was not successful. A merge can fail because: GPS ID not found in accelerometer files, accelerometer class dictionary not identified, problems with file paths, accelerometer recording does not overlap with GPS data, or matching accelerometer data did not include valid data points.
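Steps 2 and 3 of the merge can be sketched in Python as follows. This is an illustration under assumptions rather than the package's implementation: timestamps are taken to be sorted numbers in seconds, the function names are hypothetical, and values outside the accelerometer range are clamped to the nearest end point. The sketch covers a single numeric column only.

```python
from bisect import bisect_right


def overlap_window(gps_times, acc_times, margin_s=60 * 60):
    """Time window shared by both series, widened by a margin on each side."""
    start = max(gps_times[0], acc_times[0]) - margin_s
    end = min(gps_times[-1], acc_times[-1]) + margin_s
    return start, end


def interpolate_linear(x_new, x, y):
    """Linearly interpolate y (sampled at sorted x) onto the points x_new."""
    out = []
    for t in x_new:
        if t <= x[0]:
            out.append(y[0])  # clamp before the first sample (assumption)
        elif t >= x[-1]:
            out.append(y[-1])  # clamp after the last sample (assumption)
        else:
            i = bisect_right(x, t)  # x[i - 1] <= t < x[i]
            frac = (t - x[i - 1]) / (x[i] - x[i - 1])
            out.append(y[i - 1] + frac * (y[i] - y[i - 1]))
    return out
```

Here `x` and `y` would hold the accelerometer timestamps and values, and `x_new` the GPS timestamps.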

2. Preparing data for trip detection

(in function deriveVars.R)

In preparation for the trip detection we need to derive distance and speed between successive coordinates. Distance is calculated with function geodesicDistance.R and defaults to the Spherical Law of Cosines formula [NOTE: Needs references and motivation]. The function also implements the Haversine formula, which is currently not used by hbGPS. Additionally, the code extracts inclination, bearing and change in bearing between successive coordinates.
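For reference, both distance formulas can be written down in a few lines. The Python sketch below is a generic textbook version, not a copy of geodesicDistance.R. The Earth radius value, the function names, and the clamping of the cosine to [-1, 1] (to guard against rounding errors for nearly identical points) are assumptions of this sketch.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed value)


def distance_cosines(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres using the Spherical Law of Cosines."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return EARTH_RADIUS_M * math.acos(cos_angle)


def distance_haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres using the Haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_kmh(distance_m, seconds):
    """Speed in km/h from a distance in metres and a duration in seconds."""
    return distance_m / seconds * 3.6
```

Both formulas describe the same great-circle distance and differ mainly in numerical behaviour for points that are very close together.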

(in function removeOutliers.R)

The trip detection currently uses only the GPS data, so we only need to pay attention to removing outliers or suspicious values in the GPS data: (1) missing values in the speed estimates, (2) speeds larger than 130 km/h that are preceded or followed by a speed less than 30 km/h, and (3) elevation changes between successive values of more than 1000 metres.

If outliers are found we ignore not only their timepoints but also the timepoint before and after, because their distance and speed estimates are also likely to be affected. With the outliers and surrounding data points removed, the remaining distance and speed estimates are no longer valid. Therefore, we rederive variables such as speed and distance with function deriveVars.R as described above.
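The three outlier rules and the removal of neighbouring points can be sketched as below. This Python sketch is illustrative only: the function name is hypothetical, missing speeds are represented as `None`, and attributing an elevation jump to the later of the two points is an assumption.

```python
def flag_outliers(speed_kmh, elevation_m):
    """Return a list of booleans marking the points to drop."""
    n = len(speed_kmh)
    outlier = [False] * n
    for i in range(n):
        s = speed_kmh[i]
        if s is None:
            outlier[i] = True  # rule 1: missing speed estimate
            continue
        neighbours = [speed_kmh[j] for j in (i - 1, i + 1) if 0 <= j < n]
        slow_neighbour = any(v is not None and v < 30 for v in neighbours)
        if s > 130 and slow_neighbour:
            outlier[i] = True  # rule 2: implausible speed spike
        if i > 0 and abs(elevation_m[i] - elevation_m[i - 1]) > 1000:
            outlier[i] = True  # rule 3: implausible elevation jump
    # Also drop the point before and after each outlier.
    drop = [False] * n
    for i, is_outlier in enumerate(outlier):
        if is_outlier:
            for j in (i - 1, i, i + 1):
                if 0 <= j < n:
                    drop[j] = True
    return drop
```

After dropping the flagged points, distance and speed would be recomputed on the remaining points.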

(in function initialStateClassification.R)

Signal to noise ratio (SNR) variables (extracted with function signalNoiseRation.R) are now added to our time series. In short, SNR variables are traditionally used to assess whether a person is indoor (high noise level), in a vehicle (medium noise level) or outdoor (low noise level). More specifically the code calculates snr and snr_ratio [NOTE: Needs references and motivation]. When snr is equal to or below the threshold specified with parameter threshold_snr, or when snr_ratio is below the threshold specified with parameter threshold_snr_ratio, the corresponding time points are classified as indoor. Further, the code classifies whether the person could be in a vehicle, defined as the combination of being classified as indoor, and both a speed > 20 km/h and an absolute bearing change of less than 90 degrees in two successive points in time.
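The indoor and vehicle rules can be sketched in Python as follows. The function names are hypothetical, and the reading of "two successive points in time" as "the speed and bearing criteria hold at the current and the previous point" is an assumption of this sketch, not something this description confirms.

```python
def classify_indoor(snr, snr_ratio, threshold_snr, threshold_snr_ratio):
    """Indoor when snr <= threshold_snr or snr_ratio < threshold_snr_ratio."""
    return [s <= threshold_snr or r < threshold_snr_ratio
            for s, r in zip(snr, snr_ratio)]


def classify_vehicle(indoor, speed_kmh, bearing_change_deg):
    """Vehicle when indoor and the movement criteria hold at two successive points."""
    moving = [v > 20 and abs(b) < 90
              for v, b in zip(speed_kmh, bearing_change_deg)]
    vehicle = [False] * len(indoor)
    for i in range(1, len(indoor)):
        # Assumption: both the current and the previous point must qualify.
        if indoor[i] and moving[i] and moving[i - 1]:
            vehicle[i] = True
    return vehicle
```

The threshold values passed to `classify_indoor` correspond to parameters threshold_snr and threshold_snr_ratio.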

3. Trip detection

As a first step to detect trips hbGPS classifies the time series in behaviour states, which is an iterative process:

  1. The default state for all time points is 2.
  2. First, any sequence of 3 time points with a speed equal to or higher than 1 km/h is classified as state 1.
  3. If there is a time gap relative to the previous time point larger than 30 seconds, then the label state 1 is removed.
  4. Next, all state 1 occurrences are relabelled as: State 4 if the speed is in the interval [1, 7) km/h, State 5 if the speed is in the interval [7, 10) km/h, State 6 if the speed is in the interval [10, 15) km/h, State 7 if the speed is in the interval [15, 35) km/h, and State 8 if the speed is in the interval [35, ∞) km/h. The distinction between these five states is not used for the trip detection but used for descriptive purposes. So, technically these could be collapsed into a single state.
  5. Note that state 3 is not assigned yet, this will follow later as it relates to breaks in trips.

In summary, the state for each data point is now one of 2, 4, 5, 6, 7 or 8.
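The steps above can be sketched in Python as follows. This is a minimal illustration, not the package's code: the function name is hypothetical, time is given in seconds, "sequence of 3 time points" is interpreted as at least 3 consecutive points, and removing the state 1 label is implemented as falling back to the default state 2.

```python
def classify_states(speed_kmh, time_s):
    """Assign an initial behaviour state (2, 4, 5, 6, 7 or 8) to each time point."""
    n = len(speed_kmh)
    state = [2] * n  # step 1: default state
    moving = [v >= 1 for v in speed_kmh]
    # Step 2: runs of at least 3 consecutive moving points become state 1.
    for i in range(n - 2):
        if moving[i] and moving[i + 1] and moving[i + 2]:
            state[i] = state[i + 1] = state[i + 2] = 1
    # Step 3: remove the label after a time gap of more than 30 seconds.
    for i in range(1, n):
        if time_s[i] - time_s[i - 1] > 30:
            state[i] = 2
    # Step 4: relabel state 1 by speed interval [low, high).
    bins = [(1, 7, 4), (7, 10, 5), (10, 15, 6), (15, 35, 7),
            (35, float("inf"), 8)]
    for i in range(n):
        if state[i] == 1:
            for low, high, label in bins:
                if low <= speed_kmh[i] < high:
                    state[i] = label
                    break
    return state
```

State 3 is deliberately absent here, as it is only assigned in the next stage.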

(in function hbGPS.R)

So far, the code has only considered time points and changes between time points. However, to detect trips we need to understand the longer temporal patterns in the data. First, breaks in trips are defined as time periods where the state is 2, lasting less than a user defined duration as specified with parameter maxBreakLengthSecond. These trip breaks are labelled as state 3. Next, all time points with state not equal to 3 and classified as indoor are labelled as a new state 1. In other words, the state for each data point can now be any integer from 1 to 8. This means that trips (excluding trip pause points) performed indoors are not considered, because in these conditions we do not trust the GPS data for speed assessment.
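The relabelling of breaks and indoor points can be sketched as below. This Python sketch follows the description literally and makes two assumptions: the duration of a period is measured from its first to its last timestamp, and any sufficiently short run of state 2 counts as a break. The function name is hypothetical.

```python
from itertools import groupby


def label_breaks_and_indoor(state, time_s, indoor, max_break_s):
    """Relabel short runs of state 2 as state 3, then indoor points as state 1."""
    out = list(state)
    i = 0  # index of the first point of the current run
    for value, run in groupby(state):
        length = len(list(run))
        if value == 2:
            # Assumption: duration runs from first to last point of the run.
            duration = time_s[i + length - 1] - time_s[i]
            if duration < max_break_s:
                for j in range(i, i + length):
                    out[j] = 3
        i += length
    # Indoor points that are not trip breaks become state 1.
    return [1 if (s != 3 and is_in) else s for s, is_in in zip(out, indoor)]
```

The argument `max_break_s` plays the role of parameter maxBreakLengthSecond.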

An initial classification of trips now exists: all (sequences of) time points for which the state number is higher than 2.

(in function deriveSegments.R and deriveTrips.R)

However, at this point it is only a time series without a description of the trips, and we may like to know the characteristics of each trip. Further, we may like to ignore trips that are short in duration or in length. Therefore, the code keeps only trips that last longer than the duration specified with parameter minTripDur and cover a distance longer than specified with parameter minTripDist_m.
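Extracting candidate trips and filtering them by duration and distance can be sketched in Python as below. The function name is hypothetical, and the sketch assumes that time is in seconds, that `distance_m[i]` holds the distance from point i - 1 to point i, and that the minimum duration is expressed in seconds.

```python
def find_trips(state, time_s, distance_m, min_trip_dur_s, min_trip_dist_m):
    """Return (start, end) index pairs of trips that pass both filters."""
    trips = []
    start = None
    for i in range(len(state) + 1):
        in_trip = i < len(state) and state[i] > 2
        if in_trip and start is None:
            start = i  # a new trip begins
        elif not in_trip and start is not None:
            end = i - 1  # the trip ended at the previous point
            duration = time_s[end] - time_s[start]
            length = sum(distance_m[start + 1:end + 1])
            if duration > min_trip_dur_s and length > min_trip_dist_m:
                trips.append((start, end))
            start = None
    return trips
```

The two threshold arguments correspond to parameters minTripDur and minTripDist_m.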

4. Describing trips and their segments

(in function deriveSegments.R called from within deriveTrips)

First the code segments the data based on changes in state. In function deriveSegments.R we store per segment: segment number, average speed, 90th percentile of speed, whether the majority of time was indoor, whether the majority of time was vehicle, segment duration, segment distance, average state, start time numeric, end time numeric, start time, end time, average snr, average snr_ratio, longitude and latitude start, and longitude and latitude end.

(in function deriveTrips.R)

The segment summaries are used to identify the mode of transport (mot) in line with Carlson et al. 2015 MSSE 'Validity of PALMS GPS Scoring of Active and Passive Travel Compared with SenseCam': mot = 3 represents a 90th percentile of the speed equal to or larger than 35 km/h, or a segment that is classified as vehicle. From the remaining data hbGPS classifies mot = 2 if the 90th percentile of speed is larger than or equal to 10 km/h. All other data is classified as mot = 1. Further, it classifies iov (indoor outdoor vehicle) as: iov = 3 when mot equals 3. From the remaining data iov = 2 when indoor is FALSE, and iov = 1 otherwise. Both mot and iov are stored as columns in the output and are not used for the trip detection. Next, about a dozen trip level descriptives such as trip duration and trip distance are added to the time series.
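The mot and iov rules translate directly into two small functions. The Python sketch below is illustrative, with hypothetical function and argument names, and takes the per-segment summaries as inputs.

```python
def mode_of_transport(speed_p90_kmh, is_vehicle):
    """Return mot: 3 for fast or vehicle segments, 2 for medium speed, else 1."""
    if speed_p90_kmh >= 35 or is_vehicle:
        return 3
    if speed_p90_kmh >= 10:
        return 2
    return 1


def indoor_outdoor_vehicle(mot, is_indoor):
    """Return iov: 3 when mot is 3, otherwise 2 for outdoor and 1 for indoor."""
    if mot == 3:
        return 3
    return 1 if is_indoor else 2
```

For example, a segment with a 90th percentile speed of 12 km/h that is not classified as vehicle and not indoor would get mot = 2 and iov = 2.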

(in function hbGPS.R)

Additionally, the numeric state codes are stored in the time series as factor variables with labels to ease interpretation.

(in function imitatePALMSformat.R)

If the user specifies parameter outputFormat = "PALMS" (default), then hbGPS will attempt to imitate the PALMS output format. To do so, it removes all columns from the time series that PALMS output does not normally include.

(in function hbGPS.R)

Finally, the time series is stored as one csv file per GPS input file, and all output csv files are additionally appended into a single file named “combined.csv”.