Paper Reviews

LAMBDA: A Large Model Based Data Agent

LAMBDA is an open-source, code-free multi-agent data analysis system designed to make data analysis accessible to users without programming experience. The system has several key objectives:

  • Enabling code-free data analysis by automatically generating programming code
  • Seamlessly integrating human domain knowledge with AI capabilities
  • Supporting data science education through interactive learning
  • Automatically generating comprehensive analysis reports and exportable code

Problem Statement & Motivation

Existing research has not adequately addressed the high degree of flexibility required in real-world data analysis scenarios, particularly when it comes to incorporating custom algorithms or statistical models based on user preferences. Additionally, traditional function-calling approaches face significant challenges in statistical and data science applications:

  • The sheer volume of APIs/functions, their complex interrelationships, and extensive documentation often exceed the capacity of an LLM's context window
  • As the number of available APIs increases, the model's ability to accurately select appropriate functions deteriorates

System Architecture

LAMBDA employs a dual-agent architecture consisting of:

  1. Data Scientist (Programmer) Agent
     • Primary role: code generation and user interaction
     • Guided by system prompts defining its role, context, and I/O formats
     • Workflow:
       • Writes code based on user/inspector instructions
       • Executes code through the kernel
       • Generates comprehensive responses, including a results summary and next-step suggestions
  2. Inspector Agent
     • Primary role: error detection and correction
     • Analyzes execution errors in the programmer's code
     • Provides actionable revision suggestions for code improvement
     • Works iteratively until the code executes successfully or the maximum number of attempts is reached
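The programmer–inspector loop can be sketched in a few lines. Everything below is a toy stand-in: `programmer`, `inspector`, and `run_in_kernel` are hypothetical helpers replacing the LLM calls and IPython kernel that LAMBDA actually uses.

```python
# Hedged sketch of LAMBDA's programmer-inspector loop (assumptions:
# all three helper functions stand in for LLM / kernel calls).

MAX_ATTEMPTS = 3

def run_in_kernel(code):
    """Toy stand-in for kernel execution: returns (ok, output_or_error)."""
    try:
        env = {}
        exec(code, env)                     # the real system uses an IPython kernel
        return True, env.get("result")
    except Exception as exc:
        return False, str(exc)

def programmer(instruction, feedback=None):
    """Toy code generator; a real system would call an LLM here."""
    if feedback:                            # revise based on inspector advice
        return "result = sum([1, 2, 3])"
    return "result = sum(1, 2, 3)"          # first draft contains a bug

def inspector(code, error):
    """Toy inspector; a real system would ask an LLM for a fix suggestion."""
    return f"Error was: {error}. Pass the numbers as a single list."

def solve(instruction):
    feedback = None
    for _ in range(MAX_ATTEMPTS):
        code = programmer(instruction, feedback)
        ok, out = run_in_kernel(code)
        if ok:
            return out
        feedback = inspector(code, out)     # iterate until success or cap
    raise RuntimeError("maximum attempts reached")

print(solve("add the numbers 1, 2, 3"))     # → 6
```

The first draft fails, the inspector's feedback drives one revision, and the loop exits on success — the same control flow the two agents implement with LLM calls.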

Figure: Overview of LAMBDA showing the interaction between the programmer agent for code generation and the inspector agent for error evaluation. The system supports human intervention when needed.

Key Features

Knowledge Integration Mechanism
  • Implements a Key-Value (KV) knowledge structure:
    • Key: resource descriptions (e.g., function docstrings)
    • Value: corresponding code implementations
  • Enables domain-specific task execution
  • Provides flexibility for complex analysis challenges
  • Facilitates easy incorporation of user resources into the agent system
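A minimal sketch of such a KV store: keys are natural-language descriptions, values are code snippets. The keyword-overlap retrieval and the two stored snippets here are illustrative assumptions; the paper's system would match queries with an LLM or embeddings instead.

```python
# Toy Key-Value knowledge store: description -> code implementation.
# Retrieval by naive word overlap (assumption; stands in for semantic matching).

knowledge = {
    "compute the interquartile range of a numeric column":
        "q1, q3 = df[col].quantile([0.25, 0.75]); iqr = q3 - q1",
    "standardize a numeric column to zero mean and unit variance":
        "df[col] = (df[col] - df[col].mean()) / df[col].std()",
}

def retrieve(query):
    """Return the stored code whose key shares the most words with the query."""
    words = set(query.lower().split())
    best = max(knowledge, key=lambda k: len(words & set(k.lower().split())))
    return knowledge[best]

print(retrieve("standardize column"))
```

Because keys are plain descriptions, a user can drop in a custom algorithm simply by adding one more entry — which is what makes the mechanism easy to extend with domain knowledge.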
Technical Implementation
  • Uses IPython as the system kernel for sequential data processing
  • Supports comprehensive report generation, including:
    • Data processing steps
    • Data visualizations
    • Model descriptions
    • Evaluation results
  • Enables code export functionality
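What a persistent kernel buys the system is that each generated snippet sees the variables defined by earlier ones. The sketch below uses the standard library's `code.InteractiveInterpreter` as a stand-in for a real IPython kernel (an assumption; the paper uses IPython itself).

```python
# Sequential, stateful execution: each snippet runs in the same namespace,
# so later steps can reuse earlier results -- the property LAMBDA gets
# from its IPython kernel. `code.InteractiveInterpreter` is a stdlib stand-in.
import code

kernel = code.InteractiveInterpreter()

steps = [
    "data = [4, 8, 15, 16, 23, 42]",
    "mean = sum(data) / len(data)",   # reuses `data` from the previous step
    "print(round(mean, 1))",
]

for snippet in steps:
    kernel.runsource(snippet)         # state persists across snippets
```

Running snippets in one long-lived namespace is also what makes code export straightforward: concatenating the successful snippets reproduces the whole analysis.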
User Interface & Interaction
  • Chat-based interface for natural interaction
  • Step-by-step guided prompting
  • Human-in-the-loop design allowing direct code modification
  • Extensive prompt templates for various tasks:
    • Data analysis
    • Dataset handling
    • Error resolution
    • Knowledge integration
    • Code debugging

Figure: Detailed view of the collaborative process between the programmer and inspector agents.

Resources

Can Large Language Models Serve as Data Analysts? A Multi-Agent Assisted Approach for Qualitative Data Analysis

Qualitative research is a type of research that focuses on collecting and analyzing non-numerical data (e.g., text, video, or audio) to understand concepts, opinions, or experiences.

This paper explores using LLMs to automate and expedite qualitative data analysis processes. The model adeptly interprets massive volumes of textual data and interview transcripts to autonomously perform the chosen approach for qualitative data analysis.

In the framework, each agent in the system is a specialized instance of an LLM, trained to handle different aspects of qualitative data analysis.

Figure: A workflow overview of the proposed system for automating qualitative data analysis.

Resource

TaskWeaver: A Code-First Agent Framework

TaskWeaver is a framework that converts user requests into executable code and treats user-defined plugins as callable functions. It provides support for rich data structures, flexible plugin usage, and dynamic plugin selection, leveraging LLM coding capabilities for complex logic.

TaskWeaver addresses key limitations of existing frameworks:

  • Lack of native support for handling rich data structures
  • Limited configuration options for incorporating domain knowledge
  • Inability to meet diverse user requirements

The core architecture consists of three main components:

  1. Planner: the system entry point, which:
     • Breaks down user requests into subtasks and manages execution with self-reflection
     • Transforms execution results into human-readable responses
  2. Code Interpreter (CI): contains two sub-components:
     • Code Generator (CG): generates code for subtasks from the Planner using available plugins
     • Code Executor (CE): executes generated code and maintains execution state
  3. Memory Module: centralizes chat history between the user and internal roles

Figure: An overview of TaskWeaver.

A key use case is anomaly detection in databases, which uses a two-layer planning process:

  1. The Planner generates high-level steps to fulfill the request
  2. The CI devises detailed execution plans with chain-of-thought reasoning and code generation

The workflow begins with the Planner receiving a user query along with CI descriptions (plugin/function documentation). The CG receives comprehensive plugin definitions, including function names, descriptions, arguments, and return values. Execution results flow back to the Planner to determine the next steps.

Figure: The workflow of the anomaly detection use case.

Component Details

Planner

As the system controller, the Planner:

  • Receives and decomposes user queries into sub-tasks
  • Generates initial plans based on LLM knowledge and domain examples
  • Refines plans considering sub-task dependencies
  • Assigns tasks to the CI for code generation
  • Updates plans based on execution results, following the ReAct pattern
  • Manages the process until completion

Code Generator (CG)

The CG synthesizes Python code by:

  • Combining plugin system and code interpreter capabilities
  • Leveraging user-customized plugins and examples
  • Using plugin schemas to understand capabilities
  • Ensuring plugin implementations match their schemas

Code Executor (CE)

The CE handles code execution by:

  • Running code generated by the CG
  • Managing dependent modules and plugins
  • Preserving context and logs
  • Returning results to the Planner
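The Planner → CG → CE loop can be sketched end to end on the anomaly-detection use case. All three roles below are toy stand-ins for LLM calls, and the `detect_outliers` plugin is a hypothetical example, not from the paper.

```python
# Hedged sketch of TaskWeaver's Planner -> Code Generator -> Code Executor
# loop. The plugin and all role functions are illustrative assumptions.

def detect_outliers(values, threshold=2.0):
    """Example plugin: flag values more than `threshold` std devs from the mean."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [v for v in values if abs(v - mean) > threshold * std]

PLUGINS = {"detect_outliers": detect_outliers}

def planner(request):
    """Decompose the request into sub-tasks (an LLM call in TaskWeaver)."""
    return ["load data", "run detect_outliers on the data"]

def code_generator(subtask):
    """Emit code for a sub-task using available plugins (an LLM call)."""
    if "detect_outliers" in subtask:
        return "result = detect_outliers(data)"
    return "data = [10, 12, 11, 13, 250, 12]"

def code_executor(snippet, state):
    """Run generated code; `state` preserves variables across sub-tasks."""
    exec(snippet, {**PLUGINS}, state)
    return state

state = {}
for subtask in planner("find anomalies in the readings"):
    code_executor(code_generator(subtask), state)

print(state["result"])                     # → [250]
```

The shared `state` dict plays the role of the CE's preserved execution context: the second sub-task's code reads the `data` variable produced by the first.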

Resource

TradingAgents: Multi-Agents LLM Financial Trading Framework

Resource

SciAgents: Automating Scientific Discovery through Multi-Agent Intelligent Graph Reasoning

Published: 2024-09-09

Objective: developing AI systems that can not only explore and exploit existing knowledge to make significant scientific discoveries, but also automate and replicate the broader research process, including acquiring relevant knowledge and data. SciAgents is an approach that leverages three core concepts: (1) large-scale ontological knowledge graphs to organize and interconnect diverse scientific concepts, (2) a suite of large language models (LLMs) and data retrieval tools, and (3) multi-agent systems with in-situ learning capabilities.

Resource