Abstract
This paper addresses service clustering and builds probabilistic models from service descriptions for that purpose. We discuss how service descriptions can be enriched with machine-interpretable semantics, and then investigate how these descriptions can be grouped into clusters to make discovery, ranking, and recommendation faster and more effective. We propose using Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA), two machine learning techniques used in Information Retrieval, to learn latent factors from the corpus of service descriptions and to group services according to those factors. Introducing an intermediate layer of latent factors between the services and their descriptions reduces the dimensionality of the model and allows services to be searched and linked using probabilistic methods in the latent space. The model can cluster a newly added service by direct calculation, without recalculating the latent variables or retraining the model.
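
To make the idea concrete, the following is a minimal sketch of PLSA-based clustering with "fold-in" for a new service, written in plain Python. It is an illustration under stated assumptions, not the implementation used in this paper: the toy service descriptions are invented, the function names (`train_plsa`, `fold_in`, `cluster_of`) are hypothetical, and fold-in is assumed here as one common way to place an unseen description in latent space while keeping the learned word distributions fixed.

```python
import random
from collections import Counter

def normalize(values):
    """Scale a list of positive numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]

def train_plsa(docs, n_topics, n_iter=50, seed=0):
    """Fit PLSA with EM. docs is a list of token lists.
    Returns (word index, P(w|z) per topic, P(z|d) per document)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = [Counter(index[w] for w in doc) for doc in docs]
    p_w_z = [normalize([rng.random() + 0.1 for _ in vocab])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in docs]
    for _ in range(n_iter):
        # Small floor keeps every probability strictly positive.
        new_w_z = [[1e-9] * len(vocab) for _ in range(n_topics)]
        for d, doc_counts in enumerate(counts):
            new_z = [1e-9] * n_topics
            for w, n in doc_counts.items():
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d).
                post = normalize([p_w_z[z][w] * p_z_d[d][z]
                                  for z in range(n_topics)])
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z[z] += n * post[z]
            p_z_d[d] = normalize(new_z)          # M-step for P(z|d)
        p_w_z = [normalize(row) for row in new_w_z]  # M-step for P(w|z)
    return index, p_w_z, p_z_d

def fold_in(tokens, index, p_w_z, n_iter=20):
    """Estimate P(z|d_new) for an unseen description.
    P(w|z) stays fixed, so the trained model is not modified."""
    n_topics = len(p_w_z)
    counts = Counter(index[w] for w in tokens if w in index)
    p_z = [1.0 / n_topics] * n_topics
    for _ in range(n_iter):
        new_z = [1e-9] * n_topics
        for w, n in counts.items():
            post = normalize([p_w_z[z][w] * p_z[z] for z in range(n_topics)])
            for z in range(n_topics):
                new_z[z] += n * post[z]
        p_z = normalize(new_z)
    return p_z

def cluster_of(p_z):
    """Assign a service to its most probable latent factor."""
    return max(range(len(p_z)), key=p_z.__getitem__)

# Invented toy corpus of tokenized service descriptions.
docs = [
    "weather forecast temperature city".split(),
    "temperature humidity forecast location".split(),
    "payment credit card invoice".split(),
    "invoice billing payment currency".split(),
]
index, p_w_z, p_z_d = train_plsa(docs, n_topics=2)
clusters = [cluster_of(p) for p in p_z_d]

# A newly added service is placed in latent space without retraining.
new_mix = fold_in("city weather humidity".split(), index, p_w_z)
new_cluster = cluster_of(new_mix)
print(clusters, new_cluster)
```

Because `fold_in` only re-estimates the topic mixture of the new description, its cost depends on the length of that description and the number of latent factors, not on the size of the existing corpus. Words that were not seen during training are simply ignored in this sketch.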