Krishnakant Saboo

Parallel and Distributed approaches for Graph based semisupervised learning


Guide:
Prof. V.S. Borkar, IIT Bombay
Dr. K. Avrachenkov, INRIA, Sophia Antipolis

Background

Consider a multi-class graph where each node belongs to only one class. We are given a few nodes from each class. Can we find out the class of all the other nodes? There have been attempts to solve this problem using collaborative filtering and then finding latent features. We propose an algorithm which uses the structure of the graph to come up with a label for each node. Recommendation systems would be one, among many, applications of this.

Approach

A node belonging to a certain class is highly likely to have majority of neighbors belonging to the same class. Motivated by this principle, we perform a Markov Chain Monte Carlo walk on the graph and calculate the "belongingness" of the node to each of the classes. The class for which the node has the highest belongingness is then assigned to that node. Since we are doing a walk on the graph, the method is also able to deal with arrival and departure of nodes.

We studied the algorithm's performance on a real world dataset, Gaussian Mixture graphs and Stochastic Block model graphs modelled as M/M/K/K queue. The classification error for these is within 7%. Error occurs for nodes which have a majority of neighbors belonging to a different class.

Fig1: Gaussian Mixture graph with 3 classes and 500 nodes. The color of the node represents its true class.

Fig2: This shows classification after running the algorithm. Node color represents the detected class of the node. Wrongly classified nodes are shown in yellow.