Ph.D. Thesis
Inference and Reasoning for Semi-structured Tables
Advisor: Prof. Vivek Srikumar

Abstract
Semi-structured tabular data, such as that found in e-commerce product descriptions, annual financial reports, sports score statistics, and scientific articles, is ubiquitous in real-world applications. This dissertation investigates how machines understand and reason about such data. Processing it requires understanding the meaning of text fragments and the implicit connections between them. We introduce the InfoTabS dataset, which poses a challenge for traditional modeling techniques due to its semi-structured, multi-domain, and heterogeneous nature. To overcome these challenges, we propose effective ways of incorporating knowledge into reasoning models, using simple pre-processing strategies and leveraging structured data knowledge graphs. Additionally, to address the challenge of multilingual tabular inference, we propose a cost-effective pipeline for translating tables, which allows us to extend InfoTabS to a multilingual version, XInfoTabS. Through systematic probing, we observed that existing models did not reliably reason with tabular facts despite making accurate predictions. We therefore proposed a trustworthy tabular inference approach consisting of two stages: evidence extraction followed by inference prediction. We investigated semi-automatic data augmentation techniques and introduced the Auto-TNLI dataset to improve reasoning on the InfoTabS dataset. To enhance model robustness, we introduced a prompt-based learning approach that extracts knowledge from semi-structured tables, improving performance on adversarial tests. This work opens up several new directions for future research on reasoning over dynamic, multilingual, and multi-modal semi-structured tabular information.

Relevant publications:
1. Vivek Gupta, Maitrey Mehta, Pegah Nokhiz, Vivek Srikumar; “InfoTabS: Inference on Tables as Semi-structured Data”; ACL 2020
* represents equal contribution

Other related publications:
1. Nupur Jain, Vivek Gupta, Anshul Rai, Gaurav Kumar; “TabPert: An Effective Platform for Tabular Perturbation”; EMNLP 2021 (Demo Track)
* represents equal contribution