Abstract
We present a scalable machine learning approach for early and continuous TCP flow-length prediction, enabling Software-Defined Networking controllers to make proactive, latency-aware routing decisions. Unlike traditional elephant-flow versus mice-flow classification, which depends on static thresholds and delayed observation, our method estimates flow length by regression using only the first 400 ms of traffic. We aggregate IP packets through tokenization to preserve temporal dynamics while reducing monitoring overhead. An ensemble of Long Short-Term Memory layers extracts temporal features, which are fused and processed by an uncertainty-modelling Mixture Density Network to predict the total flow length. Experiments on the real-world CAIDA and MAWI datasets show that our approach reduces the mean absolute error to 1.74 s, nearly halving the error of state-of-the-art baselines.
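
To make the packet-aggregation step concrete, the following is a minimal sketch of one possible tokenization, not the scheme used in this work. It assumes that packets arrive as (timestamp in ms, size in bytes) pairs, that a token is a fixed-width time bin summarized by packet count and byte count, and that the bin width is 50 ms. The function name `tokenize_packets`, the bin width, and the two per-token features are all hypothetical; only the 400 ms observation window comes from the text above.

```python
from typing import List, Tuple

Packet = Tuple[float, int]  # (timestamp in ms, size in bytes); assumed input format
Token = Tuple[int, int]     # (packet count, byte count); assumed token features


def tokenize_packets(
    packets: List[Packet],
    window_ms: float = 400.0,  # observation window stated in the abstract
    bin_ms: float = 50.0,      # hypothetical bin width, chosen for illustration
) -> List[Token]:
    """Aggregate the packets of one flow into fixed-width time bins.

    Packets later than `window_ms` after the first packet are ignored.
    Empty bins are kept so that the token sequence preserves timing.
    """
    n_bins = int(window_ms // bin_ms)
    counts = [0] * n_bins
    sizes = [0] * n_bins
    if not packets:
        return list(zip(counts, sizes))

    start = min(ts for ts, _ in packets)
    for ts, size in packets:
        offset = ts - start
        if offset >= window_ms:
            continue  # outside the early-observation window
        idx = int(offset // bin_ms)
        counts[idx] += 1
        sizes[idx] += size
    return list(zip(counts, sizes))


if __name__ == "__main__":
    flow = [(0.0, 100), (10.0, 200), (60.0, 1500), (399.0, 40), (400.0, 999)]
    for i, token in enumerate(tokenize_packets(flow)):
        print(i, token)
```

Under these assumptions a flow is reduced to eight tokens regardless of how many packets it carries, which is one way an aggregation step can lower monitoring overhead while keeping the order and spacing of traffic bursts.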