Large Codebase Question Answering

A project aimed at developing techniques for answering questions about large software systems using multi-agent and graph-based approaches.

Project Description

Large programs are notoriously complex due to their sheer size, the number of files they span, and the intricate interdependencies between their components. This complexity makes it difficult for both humans and large language models (LLMs) to maintain a coherent picture of the context and relationships within the codebase.

Beyond the source code, valuable contextual knowledge is often embedded in other development artifacts such as version history, GitHub issues, discussions, and documentation. While LLMs perform well on small or isolated code snippets, studies show they struggle significantly when applied to large-scale software systems.

This limitation is not merely due to context window size. Even frontier models with million-token windows (e.g., Gemini, Claude) fail to reason consistently about entire programs. This highlights the need for more structured and interactive solutions.

To address this, our project explores hybrid approaches that combine Retrieval-Augmented Generation (RAG) with multi-agent systems. These techniques aim to enable scalable exploration and reasoning over large codebases by:

  • Structuring code and metadata using graph representations (a minimal sketch follows this list).
  • Coordinating specialized agents to analyze different artifacts (e.g., source code, Git history, documentation).
  • Supporting fine-grained reasoning about specific code sections while maintaining awareness of the broader architecture.
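
The first of these ideas can be made concrete with a short sketch. The code below builds a small call graph from the Python files in a repository, using the standard ast module and the networkx library. It is a minimal illustration under simplifying assumptions, not the project's implementation: calls are matched by bare function name only, and methods, imports, and nested scopes are ignored.

    import ast
    from pathlib import Path

    import networkx as nx  # assumed dependency; any graph structure would do


    def build_call_graph(repo_root: str) -> nx.DiGraph:
        """Build a file/function graph for the .py files under repo_root.

        Nodes are files and function definitions; edges are CONTAINS and
        CALLS relations. Calls are resolved by bare name only -- a
        simplifying assumption, not a full static analysis.
        """
        graph = nx.DiGraph()
        definitions = {}      # function name -> node id (naive resolution)
        pending_calls = []    # (caller node id, callee name)

        for path in Path(repo_root).rglob("*.py"):
            file_id = str(path)
            graph.add_node(file_id, kind="file")
            try:
                tree = ast.parse(path.read_text(encoding="utf-8"))
            except SyntaxError:
                continue  # skip files that do not parse
            for node in ast.walk(tree):
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    func_id = f"{file_id}::{node.name}"
                    graph.add_node(func_id, kind="function", line=node.lineno)
                    graph.add_edge(file_id, func_id, kind="CONTAINS")
                    definitions[node.name] = func_id
                    for inner in ast.walk(node):
                        if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                            pending_calls.append((func_id, inner.func.id))

        # Resolve calls once all definitions have been seen.
        for caller, callee_name in pending_calls:
            if callee_name in definitions:
                graph.add_edge(caller, definitions[callee_name], kind="CALLS")
        return graph

A graph like this can back the retrieval step of a RAG pipeline: given a question about one function, its CONTAINS and CALLS neighborhood supplies the surrounding context that is handed to the LLM.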

By integrating graph-based RAG with agentic coordination, this project aims to develop systems that can accurately, efficiently, and transparently answer developer questions about large codebases.
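
As a sketch of what agentic coordination could look like, independent of any particular framework, the following plain-Python example routes a developer question to specialized agents for source code, Git history, and documentation, then merges their findings. Every name here (Agent, answer_question, the stub agents) is a hypothetical illustration rather than the project's actual design.

    from dataclasses import dataclass
    from typing import Callable, List

    # An "agent" here is just a named function that inspects one artifact
    # type and returns textual evidence. In practice each would wrap an
    # LLM plus a retriever (e.g. a code graph, git log, or docs index).
    @dataclass
    class Agent:
        name: str
        artifact: str  # e.g. "source code", "git history", "docs"
        investigate: Callable[[str], str]


    def answer_question(question: str, agents: List[Agent],
                        synthesize: Callable[[str, List[str]], str]) -> str:
        """Fan the question out to each specialized agent, then synthesize.

        `synthesize` stands in for a final LLM call that combines the
        per-artifact evidence into one answer (hypothetical interface).
        """
        evidence = []
        for agent in agents:
            finding = agent.investigate(question)
            evidence.append(f"[{agent.name} / {agent.artifact}] {finding}")
        return synthesize(question, evidence)


    # Stub agents so the sketch runs end to end without external services.
    agents = [
        Agent("code", "source code", lambda q: "relevant functions: parse(), load()"),
        Agent("history", "git history", lambda q: "last touched in commit abc123"),
        Agent("docs", "documentation", lambda q: "described in ARCHITECTURE.md"),
    ]

    print(answer_question(
        "Where is configuration loading implemented?",
        agents,
        synthesize=lambda q, ev: f"Q: {q}\n" + "\n".join(ev),
    ))

Frameworks such as AutoGen or LangChain (listed under Technologies below) would replace the stub investigate functions with real LLM-backed agents and add conversation management on top.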


Technologies

  • Programming Language: Python
  • LLMs: Claude, GPT-4, LLaMA, DeepSeek, Mistral
  • Frameworks: AutoGen, LangChain
  • Tools: Neo4j, SQLite
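
To show how these pieces might fit together, here is a minimal sketch of a retrieval query against a Neo4j code graph from Python, using the official neo4j driver. The connection URI, the credentials, and the (:Function)-[:CALLS]->(:Function) schema are placeholder assumptions, not the project's fixed data model.

    from neo4j import GraphDatabase  # official Neo4j Python driver

    # Placeholder connection details for a local Neo4j instance.
    driver = GraphDatabase.driver("bolt://localhost:7687",
                                  auth=("neo4j", "password"))

    def callers_of(function_name: str) -> list[str]:
        """Return the names of functions that call `function_name`."""
        query = (
            "MATCH (caller:Function)-[:CALLS]->(callee:Function {name: $name}) "
            "RETURN caller.name AS name"
        )
        with driver.session() as session:
            return [record["name"] for record in session.run(query, name=function_name)]

    print(callers_of("load_config"))  # hypothetical function name
    driver.close()

Keeping retrieval in the graph database lets an agent ask precise structural questions (callers, callees, containing file) instead of relying on embedding similarity alone.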