Abstract
Accurate vehicle traffic prediction supports transportation management and urban planning. In this paper, we combine real-time data from vehicle-detection Internet of Things (IoT) devices with external variables from Google Trends. Integrating such heterogeneous, complex data streams is challenging for traditional machine learning models, which struggle to capture the dynamics of traffic patterns shaped by multiple interdependent factors. To model these factors effectively, we introduce the Granger-Causal Transformer (GCT), a transformer-based architecture for traffic prediction that integrates an LSTM network with a modified multi-head attention mechanism. This mechanism extends Granger causality to the spatio-temporal domain, analyzing all causal relations between features consistently while capturing long-range dependencies and temporal patterns. Before applying GCT, we generate lagged versions of the Google Trends time series to capture lead and lag effects: tourists typically search for information about their destination weeks before traveling, so peaks in search interest precede peaks in weekly traffic volume. Lagging aligns the predictors with weekly traffic volume and lets the model use past searches to predict future traffic. We semantically validate the Google Trends terms by comparing each term against a reference string describing the study area, using a language model aligned with the data's linguistic context. We then apply a dual filtering process, comprising Granger non-causality and correlation tests, to minimize noise and redundancy. We evaluate the proposed methodology against classical statistical models, deep learning models, large foundation models, and transformers across two case studies. The results demonstrate consistently superior performance and generalizability, with GCT achieving improvements between 47% and 68% over the best-performing baselines in both settings, alongside substantial reductions in MAE and MSE.
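The lag-generation and dual-filtering steps described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the lag range, correlation threshold, F-statistic cutoff, and synthetic data are assumptions introduced here for clarity, and the Granger test is reduced to a single-lag F-test.

```python
import numpy as np

def make_lags(series, max_lag):
    """Return {k: series shifted back by k weeks}, trimmed so all lags align."""
    return {k: np.roll(series, k)[max_lag:] for k in range(1, max_lag + 1)}

def granger_f_stat(target, predictor, lag):
    """F-statistic of a minimal single-lag Granger non-causality test:
    does adding the predictor's past improve an AR model of the target?"""
    y = target[lag:]
    X_r = np.column_stack([np.ones_like(y), target[:-lag]])  # target's own past only
    X_f = np.column_stack([X_r, predictor[:-lag]])           # plus the candidate predictor
    rss = lambda X: ((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2).sum()
    rss_r, rss_f = rss(X_r), rss(X_f)
    dof = len(y) - X_f.shape[1]
    return (rss_r - rss_f) / (rss_f / dof)

# Synthetic example (illustrative): search interest leads traffic by 2 weeks.
rng = np.random.default_rng(0)
latent = rng.normal(size=124)
traffic = latent[:120] + 0.1 * rng.normal(size=120)  # traffic echoes searches 2 weeks later
searches = latent[2:122]                             # stand-in for a Google Trends series

lags = make_lags(searches, max_lag=4)
target = traffic[4:]

# Dual filter: keep lags passing both a correlation and a Granger-style test.
# The thresholds 0.3 and 4.0 are illustrative, not from the paper.
kept = [k for k, v in lags.items()
        if abs(np.corrcoef(v, target)[0, 1]) > 0.3
        and granger_f_stat(traffic, searches, lag=k) > 4.0]
```

In this toy setup the lag-2 copy of the search series lines up with traffic, so it survives both filters while the misaligned lags are discarded, mirroring how lagged predictors are retained only when they both correlate with and Granger-cause weekly traffic volume.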