Ph.D. Thesis

Inference and Reasoning for Semi-structured Tables

Advisor: Prof. Vivek Srikumar
Committee: Prof. Vivek Srikumar, Prof. Jeff Phillips, Prof. Ellen Riloff, Prof. William Wang, Prof. Mohit Bansal
Supported by the Bloomberg Data Science Fellowship (2021-23)

Abstract

Semi-structured tabular data, such as the tables found in e-commerce product descriptions, annual financial reports, sports statistics, and scientific articles, are ubiquitous in real-world applications. This dissertation investigates how machines can understand and reason about such data. Processing it requires understanding both the meaning of individual text fragments and the implicit connections between them.

We introduce the InfoTabS dataset, whose semi-structured, multi-domain, and heterogeneous nature poses a challenge for traditional modeling techniques. To overcome these challenges, we propose effective ways of incorporating knowledge into reasoning models, combining simple pre-processing strategies with knowledge drawn from structured knowledge graphs. Additionally, to address multilingual tabular inference, we propose a cost-effective pipeline for translating tables, which we use to extend InfoTabS into a multilingual version, XInfoTabS.

Through systematic probing, we observe that existing models often fail to reason correctly with tabular facts, even when their predictions are accurate. We therefore propose a trustworthy tabular inference approach with two stages: evidence extraction followed by inference prediction. We also investigate semi-automatic data augmentation techniques and introduce the Auto-TNLI dataset to improve reasoning on InfoTabS. Finally, to enhance model robustness, we introduce a prompt-based learning approach that extracts knowledge from semi-structured tables, improving both performance and robustness on adversarial test sets.

This work opens up several new directions for future research on reasoning over dynamic, multilingual, and multi-modal semi-structured tabular information.

Relevant publications:

1. Vivek Gupta, Maitrey Mehta, Pegah Nokhiz, Vivek Srikumar; “InfoTabS: Inference on Tables as Semi-structured Data”; ACL 2020

2. J. Neeraja*, Vivek Gupta*, Vivek Srikumar; “Incorporating External Knowledge to Enhance Tabular Reasoning”; NAACL 2021

3. Yerram Varun*, Aayush Sharma*, Vivek Gupta*; “Trans-KBLSTM: An External Knowledge Enhanced Transformer BiLSTM model for Tabular Reasoning”; DeeLIO 2022 @ACL 2022 (Best Paper Award)

4. Vivek Gupta, Riyaz A. Bhat, Atreya Ghosal, Manish Shrivastava, Maneesh Singh, Vivek Srikumar; “Is My Model Using The Right Evidence? Systematic Probes for Examining Evidence-Based Tabular Reasoning”; TACL 2022

5. Vivek Gupta, Shuo Zhang, Alakananda Vempala, Yujie He, Temma Choji, Vivek Srikumar; “Right for the Right Reason: Evidence Extraction for Trustworthy Tabular Reasoning”; ACL 2022

6. Dibyakanti Kumar*, Vivek Gupta*, Soumya Sharma, Shuo Zhang; “Efficient Realistic Data Generation Framework for Semi-Structured Tabular Inference”; EMNLP 2022 Findings and SUKI 2022 @ACL 2022 (non-archival)

7. Abhilash Shankarampeta*, Vivek Gupta*, Shuo Zhang; “Enhancing Tabular Reasoning with Pattern Exploiting Training”; AACL 2022

8. Bhavnick Minhas*, Anant Shankhdhar*, Vivek Gupta*, Divyanshu Aggarwal, Shuo Zhang; “XInfoTabS: Evaluating Multilingual Tabular Natural Language Inference”; MML 2022 (non-archival) and FEVER 2022 (archival) at ACL 2022

* denotes equal contribution

Other related publications:

1. Nupur Jain, Vivek Gupta, Anshul Rai, Gaurav Kumar; “TabPert: An Effective Platform for Tabular Perturbation”; EMNLP 2021 (Demo Track)

2. Aashna Jena*, Vivek Gupta*, Manish Shrivastava, Julian Martin Eisenschlos; “Leveraging Data Recasting to Enhance Tabular Reasoning”; EMNLP 2022 Findings and SUKI 2022 @ACL 2022 (non-archival)

3. Chaitanya Agarwal*, Vivek Gupta*, Anoop Kunchukuttan, Manish Shrivastava; “Bilingual Tabular Inference: A Case Study on Indic Languages”; NAACL 2022

4. Vivek Gupta, Akshat Shrivastava, Adithya Sagar, Armen Aghajanyan, Denis Savenkov; “RetroNLU: Retrieval Augmented Task Oriented Semantic Parsing”; Spa-NLP 2022 (non-archival) and NLP4ConvAI-2022 at ACL 2022 (Outstanding Paper Award at NLP4ConvAI-2022)

* denotes equal contribution

Download Links

[Thesis Document] [Defence Slides]