TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools

NeurIPS 2024
Google Research, Google DeepMind


We introduce TACT - Text And Calculations through Tables, a dataset designed to assess the reasoning capabilities of Large Language Models (LLMs) through complex, aggregative instructions. TACT features numerical instructions that require processing and integrating information dispersed across one or more texts to produce the correct answer. It also includes intermediate artifacts that are useful for analyzing the reasoning process of LLMs and for designing our IE as a tool method. Evaluating LLMs on TACT shows that models still struggle with such aggregative instructions, which require complex reasoning and information consolidation skills.

The TACT Annotation Process

TACT was created by NLP and data science experts, who employed a rigorous annotation process to transform instances from the InstructIE dataset into a format suitable for aggregative instruction following. Creating each instance involves assessing the text and its table, formulating a query in natural language, translating the query into a Pandas command, and finally executing that command on the table to obtain the answer, as sketched below.
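
The following is a minimal, hypothetical sketch of the final annotation steps: a small table, a natural-language query, and the equivalent Pandas command executed to produce the gold answer. The table contents and the query are invented for illustration and are not taken from the dataset.

import pandas as pd

# Table assembled by the annotator from information scattered in the text
# (invented values, for illustration only)
table = pd.DataFrame({
    "country": ["France", "Germany", "Italy"],
    "year": [2019, 2019, 2020],
    "exports_busd": [570.0, 1489.0, 496.0],
})

# Natural-language query formulated by the annotator
query = "What is the total export volume (in billion USD) reported for 2019?"

# The query translated into a Pandas command and executed on the table
answer = table[table["year"] == 2019]["exports_busd"].sum()
print(answer)  # 2059.0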


The TACT Properties

TACT requires models to solve multiple implicit sub-tasks that demand deep text comprehension and complex reasoning. It also offers a diverse set of Pandas commands, modeling multiple computational and aggregative challenges over texts. The Pandas commands in TACT vary in length, independently of the size of their associated tables, as illustrated by the sketch below.
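
To illustrate the range of command complexity (with an invented table and invented queries, not actual dataset instances), a short command may be a single aggregation, while a longer one may combine filtering, alignment, and arithmetic:

import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "season": [2021, 2022, 2021, 2022],
    "wins": [10, 12, 8, 15],
})

# Short command: a single aggregation
total_wins = df["wins"].sum()

# Longer command: filtering, alignment by team, and arithmetic combined
win_growth = (
    df[df["season"] == 2022].set_index("team")["wins"]
    - df[df["season"] == 2021].set_index("team")["wins"]
).max()

print(total_wins, win_growth)  # 45 7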



IE as a Tool

To address the challenges in TACT, we broke down the problem into manageable tasks: table-generation, Pandas command-generation, and command-execution. By analyzing LLM performance with TACT's ground-truth data, we discovered opportunities to enhance model performance using targeted few-shot prompting. This led to the development of the IE as a tool framework, which uses few-shot prompted LLMs to handle each task independently, with each step building on the outputs of the previous ones. This modular approach improves the performance of LLMs on TACT, achieving up to a 12% improvement over traditional prompting techniques; a schematic sketch of the pipeline follows.
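
The sketch below outlines the modular pipeline under simplifying assumptions: call_llm is a hypothetical few-shot prompting wrapper around an LLM API, and the prompt templates are placeholders rather than the prompts used in the paper.

from io import StringIO
import pandas as pd

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper that sends a few-shot prompt to an LLM and returns its completion."""
    raise NotImplementedError("plug in an LLM client here")

def ie_as_a_tool(text: str, instruction: str) -> str:
    # Step 1: table generation -- extract the relevant information from the text as a CSV table.
    table_csv = call_llm(
        f"Extract a CSV table with the information needed for the instruction.\n"
        f"Text: {text}\nInstruction: {instruction}\nTable:"
    )
    table = pd.read_csv(StringIO(table_csv))

    # Step 2: Pandas command generation -- condition on the instruction and the generated table's columns.
    command = call_llm(
        f"Write a single Pandas expression over `table` "
        f"(columns: {list(table.columns)}) that answers: {instruction}\nCommand:"
    )

    # Step 3: command execution -- evaluate the generated command on the generated table.
    return str(eval(command, {"table": table, "pd": pd}))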


Evaluating LLMs on TACT

We evaluate several LLMs on our TACT benchmark. The results reveal that our new prompting approach, termed IE as a tool, consistently outperforms other few-shot prompting techniques, yet there is still room for improvement on this challenging task.


Conclusions

We introduced TACT - Text And Calculations through Tables, a dataset crafted to evaluate the reasoning capabilities of LLMs through complex, aggregative instructions. Utilizing the InstructIE dataset, experts converted text into a format conducive to aggregative reasoning, creating challenges in table-generation, Pandas command-generation, and command-execution. We discovered that tailored few-shot prompting significantly improves model performance on these tasks. Consequently, we developed the IE as a tool framework, which applies task-specific few-shot prompting to each step and shows significant improvements over conventional methods.

BibTeX

@misc{caciularu2024tact,
      title={TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools},
      author={Avi Caciularu and Alon Jacovi and Eyal Ben David and Sasha Goldshtein and Tal Schuster and Jonathan Herzig and Gal Elidan and Amir Globerson},
      year={2024},
      eprint={2406.03618},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}