Machine Learning

In my time at Microsoft Research and at IIT Kanpur, I have had the chance to explore problems in Machine Learning that are both practical and exciting.

  • Leveraging Distributional Semantics for Multi-Label Learning
    In this project, I worked on designing embedding-based multi-label learning algorithms that use distributional semantics and have more efficient training procedures. We extended the approach to naturally incorporate other sources of side information, in particular the label-label co-occurrence matrix. (Joint work with Rahul Wadbude (IIT Kanpur), Prof. Piyush Rai (IIT Kanpur), Dr. Nagarajan Natarajan (Microsoft Research), Prof. Harish Karnick (IIT Kanpur) and Dr. Prateek Jain (Microsoft Research).)

  • Efficient Estimation of Generalization Error and Bias-Variance Components of Ensembles
    In this project, I worked on efficiently estimating the generalization error of ensembles under a normality assumption on classification scores. We worked on efficiently predicting the accuracy, the ensemble parameters, and the bias and variance of the generalization error using a minimal number of ensembles. (Work done as part of a summer internship under Dr. Sundararajan Sellamanickam (Microsoft Research).)

  • Bayes Optimal Classification for Hierarchy
    In this project, I worked on finding the Bayes optimal classifier for hierarchical classification under an asymmetric loss. We showed that, under a reasonable assumption on the hierarchy, the Bayes optimal classification for this asymmetric loss can be determined in O(log(n)). We are currently extending the consistency of the hierarchical classification algorithm on the asymmetric tree-distance loss using calibrated surrogates. (Joint work with Dheeraj Mekala (IIT Kanpur), Prof. Purushottam Kar (IIT Kanpur) and Prof. Harish Karnick (IIT Kanpur).)

  • Resource Constrained Semi-Supervised Learning
    In this project, I worked in the semi-supervised setting, where only a small subset of labeled training examples is provided along with an abundance of unlabeled data. We used label propagation, which applies a nearest-neighbour algorithm on a distance-based graph between instances to propagate labels. Our objective was to develop a K-Nearest Neighbors algorithm that learns models with few, sparse candidate points in each class, called prototypes. This work extends recent work done by our group, EdgeML. It has applications in agriculture, industry, and healthcare, which will benefit from inexpensive smart edge devices.

  • Predictive Maintenance using Machine Learning in Industrial Setting
    In this project, I worked on predicting downtime in industrial machines using Machine Learning. We developed multiple models on AzureML to predict and reduce machine downtime. Our overall objective was to detect significant anomalies (those with larger downtime) while keeping false alarms within a limited budget. Our model was demoed to top leadership and multiple customers. We are currently looking for a cost-sensitive learning algorithm, where the cost accounts for downtime and lookahead. We plan to submit this work to the Machine Learning, Analytics and Data Science (MLADS 2018) conference.