Abstract: Semi-supervised learning (SSL) methods aim to reduce the amount of labeled training data required by learning from both labeled and unlabeled instances. Macskassy and Provost [1] proposed the weighted-vote relational neighbor classifier (wvRN) as a simple yet effective baseline for semi-supervised learning on network data. It is similar to many recent graph-based SSL methods (e.g., [2], [3]), has been shown to be essentially the same as the Gaussian-field classifier proposed by Zhu et al. [4], and is very effective on some benchmark network datasets. We describe another simple and intuitive semi-supervised learning method, based on random graph walks, that outperforms wvRN by a large margin on several benchmark datasets when very few labels are available. Additionally, we show that using authoritative instances as training seeds (instances that arguably cost much less to label) dramatically reduces the amount of labeled data required to achieve the same classification accuracy. For some existing state-of-the-art semi-supervised learning methods, the labeled data needed is reduced by a factor of 50.

Keywords: Semi-Supervised Learning (SSL), Clustering Algorithm, Weighted-Vote Relational Neighbor Classifier (wvRN), Network Traffic, Semi-Supervised Data.