II-Agent is an open-source intelligent assistant framework designed to simplify and enhance workflows across multiple domains, capable of independently executing complex tasks.
II-Agent Project Detailed Introduction
Project Overview
II-Agent is an open-source intelligent assistant designed to simplify and enhance workflows across multiple domains. It aims to move beyond passive tools toward an intelligent system that can independently execute complex, multi-step tasks.
Project Address: https://github.com/Intelligent-Internet/ii-agent
Core Features
II-Agent is built around providing an agentic interface to Anthropic Claude models and offers the following ways to use it:
- CLI Interface: Direct command-line interaction
- WebSocket Server: Support for modern React frontends
- Google Cloud Vertex AI Integration: Access to Anthropic models through the Vertex AI API
Application Areas and Functions
Domain | II-Agent Functionality |
---|---|
Research & Fact-Checking | Multi-step web searches, triangulation of information sources, structured note-taking, rapid summarization |
Content Generation | Blog and article drafts, lesson plans, creative essays, technical manuals, website creation |
Data Analysis & Visualization | Data cleaning, statistical analysis, trend detection, chart creation, automated report generation |
Software Development | Code synthesis, refactoring, debugging, test writing, multi-language step-by-step tutorials |
Workflow Automation | Script generation, browser automation, file management, process optimization |
Problem Solving | Problem decomposition, alternative path exploration, step-by-step guidance, troubleshooting |
System Architecture
The II-Agent system combines the following core methods to build a versatile AI agent:
1. Core Agent Architecture and LLM Interaction
- Dynamically customized system prompts
- Comprehensive interaction history management
- Intelligent context management to handle token limits
- Systematic LLM invocation and tool selection
- Iterative optimization through execution cycles
2. Planning and Reflection
- Structured reasoning for complex problem-solving
- Problem decomposition and sequential thinking
- Transparent decision-making processes
- Hypothesis formation and testing
3. Execution Capabilities
- File system operations with intelligent code editing
- Command-line execution in a secure environment
- Advanced web interaction and browser automation
- Task completion and reporting
- Specialized functions for various modalities (experimental): PDF, audio, image, video, slides
- Deep research integration
4. Context Management
- Token usage estimation and optimization
- Strategic truncation for long interactions
- File-based archiving for large outputs
5. Real-time Communication
- WebSocket-based interactive interface
- Isolated agent instances per client
- Streaming operation events for a responsive user experience
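The execution cycle outlined above (call the LLM, select a tool, run it, feed the result back, and keep the context within a token budget) can be illustrated with a minimal Python sketch. It is not II-Agent's implementation: Message, estimate_tokens, truncate_history, run_agent and the action dictionaries are hypothetical names, the 4-characters-per-token rule is a rough assumption, and the truncation strategy shown (keep the original task plus the most recent messages) is just one simple choice. A scripted fake_model stands in for a real LLM so the loop runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    role: str  # "user", "assistant", or "tool"
    content: str


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def truncate_history(history: List[Message], budget: int) -> List[Message]:
    """Keep the first message (the task) plus the most recent messages that fit the budget."""
    if not history:
        return []
    first, rest = history[0], history[1:]
    used = estimate_tokens(first.content)
    kept: List[Message] = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(msg.content)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return [first] + kept[::-1]  # restore chronological order


# A "model" is any callable mapping the context to an action dict:
# {"type": "tool", "tool": name, "input": str} or {"type": "final", "content": str}.
ModelFn = Callable[[List[Message]], dict]


def run_agent(task: str, model: ModelFn, tools: Dict[str, Callable[[str], str]],
              max_turns: int = 10, token_budget: int = 2000) -> str:
    history = [Message("user", task)]
    for _ in range(max_turns):
        action = model(truncate_history(history, token_budget))
        if action["type"] == "final":
            return action["content"]
        # Execute the selected tool and feed its output back into the history.
        output = tools[action["tool"]](action["input"])
        history.append(Message("assistant", f"call {action['tool']}({action['input']})"))
        history.append(Message("tool", output))
    return "stopped: turn limit reached"


def fake_model(context: List[Message]) -> dict:
    # Scripted stand-in for an LLM: request one tool call, then answer with its output.
    if context[-1].role == "user":
        return {"type": "tool", "tool": "upper", "input": "hello"}
    return {"type": "final", "content": context[-1].content}


print(run_agent("shout hello", fake_model, {"upper": str.upper}))  # HELLO
```

Capping the loop with max_turns keeps a misbehaving model from running forever, and because truncation happens before every model call, the history itself can grow while the context sent to the LLM stays bounded.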
Performance Evaluation
II-Agent has been evaluated on the GAIA benchmark, which assesses LLM-based agents operating in real-world scenarios, covering multiple dimensions including multi-modal processing, tool utilization, and web search.
Several issues were identified with the GAIA benchmark during the evaluation process:
- Annotation Errors: Several incorrect annotations in the dataset
- Outdated Information: Some questions referenced websites or content that were no longer accessible
- Linguistic Ambiguity: Unclear wording leading to different interpretations of the questions
Despite these challenges, II-Agent performed well in the benchmark, particularly in areas requiring complex reasoning, tool use, and multi-step planning.
Installation and Configuration
System Requirements
- Python 3.10+
- Node.js 18+ (for the frontend)
- Either a Google Cloud project with the Vertex AI API enabled, or an Anthropic API key
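Before installing, the version requirements above can be checked with a small standard-library script. This is a hypothetical helper, not part of the repository; it assumes node --version prints a string of the form vMAJOR.MINOR.PATCH.

```python
import shutil
import subprocess
import sys
from typing import Optional


def python_ok(min_version=(3, 10)) -> bool:
    """True if the running interpreter is at least min_version."""
    return sys.version_info[:2] >= min_version


def node_major_version() -> Optional[int]:
    """Major version of the `node` on PATH, or None if Node.js is not installed."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=True)
    # `node --version` prints a string such as "v18.19.0".
    return int(result.stdout.strip().lstrip("v").split(".")[0])


if __name__ == "__main__":
    print("Python 3.10+:", "ok" if python_ok() else "too old")
    major = node_major_version()
    if major is None:
        print("Node.js: not found (only needed for the frontend)")
    else:
        print("Node.js 18+:", "ok" if major >= 18 else f"too old (v{major})")
```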
Environment Configuration
Create a .env file in the root directory:
# Image and video generation tools
OPENAI_API_KEY=your_openai_key
OPENAI_AZURE_ENDPOINT=your_azure_endpoint
# Search providers
TAVILY_API_KEY=your_tavily_key
#JINA_API_KEY=your_jina_key
#FIRECRAWL_API_KEY=your_firecrawl_key
# For image search and better search results, use SerpAPI
#SERPAPI_API_KEY=your_serpapi_key
STATIC_FILE_BASE_URL=http://localhost:8000/
# If using Anthropic client
ANTHROPIC_API_KEY=
# If using Google Vertex (recommended, extra throughput if you have permissions)
#GOOGLE_APPLICATION_CREDENTIALS=
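The .env file uses plain KEY=value lines, and lines starting with # (such as #JINA_API_KEY) are treated as disabled. The sketch below, a hypothetical helper rather than II-Agent's own loader, parses that format and confirms that at least one LLM credential (ANTHROPIC_API_KEY or GOOGLE_APPLICATION_CREDENTIALS) has a value. It deliberately ignores quoting and variable expansion, which full .env loaders such as python-dotenv handle.

```python
from typing import Dict, List

# Hypothetical list of keys that can authenticate the LLM backend.
LLM_CREDENTIAL_KEYS = ["ANTHROPIC_API_KEY", "GOOGLE_APPLICATION_CREDENTIALS"]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines; blank lines and '#' comments are skipped."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def configured_llm_credentials(env: Dict[str, str]) -> List[str]:
    """Return which LLM credential keys have non-empty values; fail if none do."""
    found = [key for key in LLM_CREDENTIAL_KEYS if env.get(key)]
    if not found:
        raise ValueError("Set ANTHROPIC_API_KEY or GOOGLE_APPLICATION_CREDENTIALS in .env")
    return found


sample = """
# If using Anthropic client
ANTHROPIC_API_KEY=sk-test
#GOOGLE_APPLICATION_CREDENTIALS=
"""
print(configured_llm_credentials(parse_env(sample)))  # ['ANTHROPIC_API_KEY']
```

In practice you would pass it the real file contents, for example parse_env(open(".env").read()). Note that an empty assignment such as ANTHROPIC_API_KEY= counts as not configured.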
For the frontend, create a .env file in the frontend directory:
NEXT_PUBLIC_API_URL=http://localhost:8000
Installation Steps
- Clone the repository:
git clone https://github.com/Intelligent-Internet/ii-agent.git
cd ii-agent
- Set up the Python environment:
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e .
- Set up the frontend (optional):
cd frontend
npm install
Usage
CLI Usage
Using the Anthropic client:
python cli.py
Using Vertex:
python cli.py --project-id YOUR_PROJECT_ID --region YOUR_REGION
CLI Options:
- --project-id: Google Cloud project ID
- --region: Google Cloud region (e.g., us-east5)
- --workspace: Workspace directory path (default: ./workspace)
- --needs-permission: Require permission before executing commands
- --minimize-stdout-logs: Reduce the amount of logs printed to stdout
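To make the option semantics concrete, the following argparse parser mirrors the documented flags. It is an illustrative reconstruction, not the project's actual cli.py: treating --needs-permission and --minimize-stdout-logs as boolean switches, and leaving --region without a default, are assumptions. The hypothetical uses_vertex helper reflects the usage shown above, where passing --project-id and --region selects Vertex.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented cli.py options (not the project's code)."""
    parser = argparse.ArgumentParser(description="II-Agent CLI (illustrative)")
    parser.add_argument("--project-id", help="Google Cloud project ID (Vertex only)")
    parser.add_argument("--region", help="Google Cloud region, e.g. us-east5")
    parser.add_argument("--workspace", default="./workspace", help="Workspace directory path")
    # Assumed to be on/off switches, since the docs describe them without values.
    parser.add_argument("--needs-permission", action="store_true",
                        help="Require permission before executing commands")
    parser.add_argument("--minimize-stdout-logs", action="store_true",
                        help="Reduce the amount of logs printed to stdout")
    return parser


def uses_vertex(args: argparse.Namespace) -> bool:
    # Hypothetical rule: Vertex is used when both a project ID and a region are given.
    return bool(args.project_id and args.region)


args = build_parser().parse_args(
    ["--project-id", "my-proj", "--region", "us-east5", "--needs-permission"]
)
print(args.workspace, uses_vertex(args))  # ./workspace True
```

argparse converts dashes to underscores, so --needs-permission is read back as args.needs_permission.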
Web Interface Usage
- Start the WebSocket server:
Using the Anthropic client:
export STATIC_FILE_BASE_URL=http://localhost:8000
python ws_server.py --port 8000
Using Vertex:
export STATIC_FILE_BASE_URL=http://localhost:8000
python ws_server.py --port 8000 --project-id YOUR_PROJECT_ID --region YOUR_REGION
- Start the frontend (in a separate terminal):
cd frontend
npm run dev
- Open your browser and visit http://localhost:3000
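If the page at http://localhost:3000 does not load, a quick way to see which process is missing is to test whether anything is listening on the two ports used above. This hypothetical helper only checks TCP reachability; it does not perform a WebSocket handshake or verify that the frontend can reach NEXT_PUBLIC_API_URL.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports follow the commands above: WebSocket server on 8000, frontend dev server on 3000.
    for name, port in [("WebSocket server", 8000), ("frontend", 3000)]:
        status = "up" if is_listening("localhost", port) else "not reachable"
        print(f"{name} (localhost:{port}): {status}")
```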
Project Structure
- cli.py: Command-line interface
- ws_server.py: WebSocket server for the frontend
- src/ii_agent/: Core agent implementation
  - agents/: Agent implementations
  - llm/: LLM client interfaces
  - tools/: Tool implementations
  - utils/: Utility functions
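The tools/ directory holds the capabilities the LLM can select from. A common way to structure such a package is a small base class plus a registry that exposes tool schemas to the model and dispatches calls by name; the sketch below shows that pattern. Tool, WordCountTool and ToolRegistry are hypothetical names, not II-Agent's classes; only the name/description/input_schema shape is borrowed from Anthropic's tool-use format.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Tool(ABC):
    """Hypothetical tool interface: a name, a description, and a JSON Schema for inputs."""

    name: str
    description: str
    input_schema: Dict[str, Any]

    @abstractmethod
    def run(self, **kwargs: Any) -> str:
        """Execute the tool and return a text result for the LLM."""


class WordCountTool(Tool):
    name = "word_count"
    description = "Count the words in a piece of text."
    input_schema = {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    }

    def run(self, **kwargs: Any) -> str:
        return str(len(kwargs["text"].split()))


class ToolRegistry:
    def __init__(self, tools: List[Tool]) -> None:
        self._tools = {tool.name: tool for tool in tools}

    def schemas(self) -> List[Dict[str, Any]]:
        # Same name/description/input_schema shape as Anthropic tool definitions.
        return [
            {"name": t.name, "description": t.description, "input_schema": t.input_schema}
            for t in self._tools.values()
        ]

    def call(self, name: str, arguments: Dict[str, Any]) -> str:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].run(**arguments)


registry = ToolRegistry([WordCountTool()])
print(registry.call("word_count", {"text": "plan, act, reflect"}))  # 3
```

Keeping each tool self-describing means adding a capability only requires a new subclass; the registry and the agent loop stay unchanged.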
Technical Features
The II-Agent framework is built around the reasoning capabilities of large language models such as Claude 3.7 Sonnet. By combining a capable LLM with a broad set of execution tools, explicit planning and reflection, and context management strategies, II-Agent can handle a wide range of complex, multi-step tasks.
Summary
II-Agent's open-source, extensible design provides a foundation for continued research and development in the rapidly evolving field of agentic AI. With its multi-domain capabilities and its technical architecture, it offers users a comprehensive, easy-to-use intelligent assistant platform.