Abstract
Current Web and data indexing and search mechanisms are mainly tailored to text-based data and are limited in addressing the intrinsic characteristics of distributed, large-scale and dynamic Internet of Things (IoT) data networks. The IoT demands novel indexing solutions for large-scale data to create an ecosystem of systems; however, IoT data are often numerical, multi-modal and heterogeneous. We propose a distributed and adaptable mechanism that allows indexing and discovery of real-world data in IoT networks. Compared with state-of-the-art approaches, our model does not require any prior knowledge about the data or their distributions. We address the problem of distributed, efficient indexing and discovery for voluminous IoT data by applying an unsupervised machine learning algorithm. The proposed solution aggregates and distributes the indexes in hierarchical networks. We have evaluated our distributed solution on a large-scale dataset, and the results show that the proposed indexing scheme can efficiently index and enable discovery of IoT data, with 71% to 92% better response time than a centralised approach.