Traffic Noise Prediction Using Machine Learning: A Mobile Monitoring Approach

Data Processing. A flow chart of the data processing steps is shown in Figure 2. First, Strava GPS data, recorded every 2-5 s depending on the satellite connection, were temporally interpolated to 1 s intervals to match the traffic noise measurements, resulting in 112,820 geo-referenced points. We then mapped the noise points onto the corresponding road network using a map-matching algorithm.37 After averaging over the three runs of each route, all of the noise points were snapped to the road network and spatially aggregated into 10 m segments. To reduce the effect of stochastic traffic conditions and nontraffic noise sources, we applied a running average over seven measurements: each measurement together with the three measurements before it and the three after it. After data processing, we had 6647 traffic noise observations, and for each one we extracted all of the aforementioned input data based on its latitude and longitude.

Model Training and Validation. We trained four models: linear regression (LR), random forest (RF), extreme gradient boosting (XGB), and a neural network (NN). Each model used the processed noise data and a total of 44 input (predictor) variables. Because some of the variables are potentially multicollinear, we performed best-subset (all-subsets) regression by exhaustive search rather than fitting LR directly on all variables. The adjusted R2 and Mallows' Cp were used to select the best model among all possible subset models containing 1 through 44 predictor variables. Compared to LR, the machine learning methods are less sensitive to multicollinearity and can capture potentially complex nonlinear relationships between predictors (see Figure S5, Supporting Information). Briefly, RF averages many regression trees, each grown on a bootstrap sample of the data with a random subset of candidate predictors considered at each split, so that the trees are decorrelated. Because RF can grow very deep decision trees, it can be prone to overfitting. XGB, which has not previously been used for noise prediction, is an efficient and scalable implementation of gradient boosting that also uses an ensemble of decision trees.
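The temporal interpolation and seven-measurement running average from the data-processing description can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions, not the study's pipeline: GPS fixes are linearly interpolated onto a 1 s grid, the seven-measurement window is centered and simply truncated at the ends of a route, and LAeq values are averaged arithmetically (an energy-based average of decibel values would be an alternative). The function names and toy inputs are hypothetical, and map matching and snapping to 10 m segments are omitted.

```python
import numpy as np

def interpolate_gps_to_1s(t_gps, lat, lon):
    """Linearly interpolate irregular GPS fixes onto a 1 s time grid.

    Assumes t_gps is increasing and expressed in whole seconds.
    """
    t_1s = np.arange(t_gps[0], t_gps[-1] + 1)  # one timestamp per second
    return t_1s, np.interp(t_1s, t_gps, lat), np.interp(t_1s, t_gps, lon)

def centered_running_mean(values, half_window=3):
    """Average each reading with up to `half_window` neighbors on each side.

    half_window=3 gives a seven-measurement window; near the ends of a route
    the window is truncated (an assumption made for this sketch).
    """
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        out[i] = values[lo:hi].mean()  # arithmetic mean of dB values (assumption)
    return out

# Toy example: GPS fixes every 2-4 s, noise readings every 1 s.
t_gps = np.array([0, 3, 5, 9])
lat = np.array([0.0, 3.0, 5.0, 9.0])  # chosen so the interpolated lat equals t
lon = np.zeros(4)
t_1s, lat_1s, lon_1s = interpolate_gps_to_1s(t_gps, lat, lon)
laeq = np.arange(1.0, 11.0)  # ten hypothetical 1 s LAeq readings
smoothed = centered_running_mean(laeq)
```

Four GPS fixes spanning 9 s become ten 1 s points, and each smoothed reading in the interior of the route is the mean of seven consecutive readings.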
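The two criteria used for best-subset selection can be illustrated with a brute-force search on a toy problem. The sketch below assumes ordinary least squares with an intercept and computes, for every subset of k predictors, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1) and Mallows' Cp = SSE_k / s2 - n + 2(k + 1), where s2 is the error variance estimated from the full model. Enumerating every subset is only practical for a handful of predictors: with 44 predictors there are 2^44 - 1 (about 1.8 × 10^13) candidate subsets, so exhaustive searches at that scale typically rely on branch-and-bound (leaps-and-bounds) algorithms instead. `best_subsets` and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def best_subsets(X, y):
    """Fit OLS (with intercept) on every non-empty subset of the columns of X.

    Returns (subset, adjusted R^2, Mallows' Cp) tuples. Brute force: toy scale only.
    """
    n, p = X.shape

    def sse(cols):
        A = np.column_stack([np.ones(n), X[:, list(cols)]])  # intercept + predictors
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        return float(resid @ resid)

    sst = float(((y - y.mean()) ** 2).sum())
    sigma2_full = sse(range(p)) / (n - p - 1)  # error variance from the full model
    results = []
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            s = sse(cols)
            adj_r2 = 1 - (s / sst) * (n - 1) / (n - k - 1)  # uses 1 - R^2 = SSE/SST
            cp = s / sigma2_full - n + 2 * (k + 1)  # k predictors + intercept
            results.append((cols, adj_r2, cp))
    return results

# Synthetic data: only predictors 0 and 1 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
results = best_subsets(X, y)
best_cp = min(results, key=lambda r: r[2])[0]
best_adj_r2 = max(results, key=lambda r: r[1])[0]
```

By construction the full model always has Cp = p + 1 (here 5), while subsets with Cp close to k + 1 and a high adjusted R2 are preferred; in this toy example both criteria select subsets that contain predictors 0 and 1.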
Whereas RF trains its trees independently of one another, XGB builds its ensemble sequentially from weak learners (typically shallow trees): each newly added tree is trained to correct the errors of the ensemble trained so far. NN, in contrast, does not use decision trees; it learns the relationship between predictors and noise levels through layers of interconnected neurons whose weights are fitted during training. Additional technical details on RF, XGB, and NN have been published previously.38-40 The hyperparameters of each machine learning algorithm were tuned by a randomized grid search with 5-fold cross-validation (CV), using measured LAeq from all routes pooled together (Table S1, Supporting Information). After tuning, we trained each model on 70% of the data and predicted LAeq on the remaining 30%, repeating this process five times to obtain robust performance estimates. For the NN, we additionally applied a 20% dropout rate to the second and third layers to avoid overfitting: during training, each neuron in these layers had its output set to zero with probability 0.2 (on average, one in five neurons), removing its contribution to the next layer. We also performed leave-one-route-out (LORO) CV for all four models, in which one of the 16 routes was held out as the test set and the model was trained on the remaining 15; this was repeated 16 times so that each route was held out once, yielding 16 pairs of R2 and RMSE values. This method has been shown to provide spatially resolved comparisons of predictive performance.41 Lastly, to compare our mobile monitoring method with a more traditional fixed-site measurement approach, we simulated a fixed-site monitoring campaign by randomly selecting, for each road in our study area, one noise reading from its 3-run-averaged 10 m segments (135 noise points in total), and reran all of the models on these data. We extracted the most important variables for all of the trained models. LR variable importance was determined by the absolute value of the t-statistic of each parameter.
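The contrast between RF and XGB drawn above can be made concrete with a toy version of gradient boosting. The sketch below assumes squared-error loss, a single predictor, and depth-1 trees (stumps); it adds weak learners one at a time, each fit to the residuals of the current ensemble and scaled by a learning rate. It is generic gradient boosting, not XGBoost, which additionally uses a regularized objective, second-order gradient information, and extensive engineering optimizations; `fit_stump` and `gradient_boost` are hypothetical names.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression tree (one split on x) to the residuals r."""
    best = None
    for thr in np.unique(x)[:-1]:  # candidate thresholds leave both sides non-empty
        left = x <= thr
        left_val, right_val = r[left].mean(), r[~left].mean()
        sse = np.sum((r - np.where(left, left_val, right_val)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, thr, left_val, right_val)
    _, thr, left_val, right_val = best
    return lambda x_new: np.where(x_new <= thr, left_val, right_val)

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-error gradient boosting with stumps, added one at a time."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of 0.5 * (y - pred)**2
        stump = fit_stump(x, residual)  # the new weak learner targets current errors
        pred = pred + learning_rate * stump(x)
        stumps.append(stump)
    return lambda x_new: base + learning_rate * sum(s(x_new) for s in stumps)

x = np.linspace(0.0, 10.0, 101)
y = np.where(x > 5.0, 10.0, 0.0)  # toy step-shaped response
model = gradient_boost(x, y)
train_mse = np.mean((y - model(x)) ** 2)
```

An RF, by contrast, would grow each tree independently on a bootstrap sample and average their predictions; here each stump only makes sense relative to the ensemble that precedes it.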
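The dropout step described above can be illustrated on a single layer's outputs. The sketch below assumes the "inverted dropout" convention, in which surviving outputs are rescaled by 1/(1 - rate) during training so that no rescaling is needed at prediction time; this convention and the function name `dropout` are assumptions for illustration.

```python
import numpy as np

def dropout(activations, rate=0.2, training=True, rng=None):
    """Inverted dropout applied to one layer's outputs.

    During training each unit's output is zeroed with probability `rate`, and the
    surviving outputs are scaled by 1 / (1 - rate) so the expected input to the
    next layer is unchanged. At inference time the layer passes values through.
    """
    if not training or rate == 0.0:
        return activations
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(activations.shape) >= rate  # True with probability 1 - rate
    return activations * keep / (1.0 - rate)

# A batch of 1000 samples with 64 hidden units, all activations equal to 1.
h = np.ones((1000, 64))
h_train = dropout(h, rate=0.2, rng=np.random.default_rng(0))
h_eval = dropout(h, rate=0.2, training=False)
```

About 20% of the entries of `h_train` are zero and the rest equal 1.25, so the average input to the next layer stays close to 1; `h_eval` is returned unchanged.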
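The LORO procedure can be sketched as a loop over route identifiers. In the sketch below, which assumes NumPy arrays and synthetic data, `fit_ols` is a hypothetical stand-in for any of the four tuned models (any function that takes training data and returns a prediction function would work), and `leave_one_route_out` returns one (R2, RMSE) pair per held-out route. Unlike a random 70/30 split, no segment from the test route appears in training, so correlation between neighboring segments on the same route cannot inflate the test scores.

```python
import numpy as np

def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error."""
    resid = y_true - y_pred
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return r2, np.sqrt(np.mean(resid ** 2))

def fit_ols(X, y):
    """Placeholder model: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta

def leave_one_route_out(X, y, route_ids, fit=fit_ols):
    """Hold out each route in turn; train on the others and score the held-out route."""
    scores = {}
    for route in np.unique(route_ids):
        test = route_ids == route
        predict = fit(X[~test], y[~test])  # the held-out route is never seen in training
        scores[int(route)] = r2_rmse(y[test], predict(X[test]))
    return scores

# Toy data: 16 routes with 50 segments each.
rng = np.random.default_rng(1)
route_ids = np.repeat(np.arange(16), 50)
X = rng.normal(size=(800, 3))
y = 65.0 + 3.0 * X[:, 0] + rng.normal(size=800)
scores = leave_one_route_out(X, y, route_ids)  # 16 (R2, RMSE) pairs
```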
For RF and XGB, variable importance was determined by the total decrease in squared error: for each variable, the reduction in squared error achieved by every split on that variable was summed over all trees built during training.42 Importance for the NN was calculated by permutation importance, which randomly shuffles the values of one predictor column and measures how much the mean squared error increases as a result.43 We present all variable importance results as the relative contribution of each variable, normalized so that the contributions sum to one across all variables.
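Permutation importance and the normalization to relative contributions can be sketched model-agnostically: any function that maps a predictor matrix to predictions can be plugged in as `predict`. The sketch below is an illustration under assumptions rather than the study's code: the number of shuffles per column (`n_repeats`), clipping of small negative importances to zero before normalizing, and the toy model are all hypothetical choices. The same `to_relative` step could equally be applied to absolute t-statistics for LR or to split-based importances for RF and XGB.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, rng=None):
    """Mean increase in MSE when one column of X is shuffled at a time."""
    rng = np.random.default_rng() if rng is None else rng
    base_mse = np.mean((y - predict(X)) ** 2)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link between column j and y
            increases.append(np.mean((y - predict(X_perm)) ** 2) - base_mse)
        importance[j] = np.mean(increases)
    return importance

def to_relative(importance):
    """Rescale importances so they sum to one (negative values clipped to zero)."""
    imp = np.clip(np.asarray(importance, dtype=float), 0.0, None)
    return imp / imp.sum()

def predict(X_):
    """Toy 'fitted model': strong on column 0, weak on column 1, ignores column 2."""
    return 3.0 * X_[:, 0] + X_[:, 1]

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + X[:, 1]
relative = to_relative(permutation_importance(predict, X, y, rng=rng))
```

Because the toy model ignores column 2, shuffling it leaves the error unchanged and its relative importance is exactly zero, while column 0 receives the largest share.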