Firecrawl Project Detailed Introduction
Project Overview
Firecrawl is an API service that receives a URL, crawls it, and converts it into clean markdown or structured data. It crawls all accessible subpages, providing clean data for each page. No sitemap is required.
Core Features
1. Web Scraping
- Scrapes a single URL and retrieves content in an LLM-ready format.
- Supports multiple output formats: markdown, structured data, screenshots, HTML.
- Extracts structured data via LLM.
2. Website Crawling
- Crawls all URLs of a website and returns content in an LLM-ready format.
- Discovers all accessible subpages without a sitemap.
- Supports custom crawl depth and exclusion rules.
3. Website Mapping
- Takes a website as input and retrieves all of its URLs, extremely fast.
- Supports searching for specific URL patterns.
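A map request can be illustrated with a minimal body builder. Note that the endpoint name (`/v1/map`) and the `search` filter field are assumptions inferred from the feature description above, not quoted from the API reference:

```python
import json

# Sketch of a map request body. The "/v1/map" endpoint name and the
# "search" field are assumptions; verify them against the API reference.
def build_map_request(url, search=None):
    """JSON body for a site-mapping request."""
    body = {"url": url}
    if search is not None:
        body["search"] = search  # only return URLs matching this pattern
    return body

print(json.dumps(build_map_request("https://firecrawl.dev", search="docs")))
```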
4. Web Search
- Searches the web and retrieves full content from the results.
- Customizable search parameters (language, country, etc.).
- Option to retrieve various content formats from search results.
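A search request might combine the query with locale parameters and optional scrape formats. This is a sketch only; the `/v1/search` endpoint name and the field names (`query`, `lang`, `country`, `scrapeOptions`) are assumptions to verify against the API reference:

```python
import json

# Sketch of a search request body; endpoint and field names are assumptions.
def build_search_request(query, limit=5, lang="en", country="us", formats=None):
    """JSON body for a web search, optionally scraping each result's content."""
    body = {"query": query, "limit": limit, "lang": lang, "country": country}
    if formats:
        # also retrieve each result's content in these formats
        body["scrapeOptions"] = {"formats": list(formats)}
    return body

print(json.dumps(build_search_request("firecrawl", formats=["markdown"])))
```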
5. Data Extraction
- Uses AI to extract structured data from single pages, multiple pages, or entire websites.
- Supports defining extraction rules via prompts and JSON schemas.
- Supports wildcard URL patterns.
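Putting these three ingredients together, an extraction request could pair a wildcard URL pattern with a prompt and a JSON schema. The field names below are assumptions modeled on the structured data extraction example later in this document:

```python
import json

# One extraction request combining a wildcard URL pattern, a prompt, and a
# JSON schema. Field names are assumptions modeled on this document's
# structured-extraction example, not quoted from the API reference.
extract_body = {
    "urls": ["https://docs.firecrawl.dev/*"],  # wildcard: every page under the site
    "prompt": "Extract the page title and a one-sentence summary.",
    "schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "summary": {"type": "string"},
        },
        "required": ["title", "summary"],
    },
}
print(json.dumps(extract_body, indent=2))
```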
6. Batching
- New asynchronous endpoint for scraping thousands of URLs simultaneously.
- Submitting a batch scraping job returns a job ID that can be used to check its status.
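The batch flow can be sketched end to end: submit a list of URLs, receive a job ID, then poll for status. The `/v1/batch/scrape` path and the `id` response field are assumptions to confirm against the API reference:

```python
import json
import os
import urllib.request

# End-to-end sketch of the batch flow. The "/v1/batch/scrape" path and the
# "id" response field are assumptions; check the API reference before use.
API = "https://api.firecrawl.dev/v1/batch/scrape"

def build_batch_request(urls, formats=("markdown",)):
    """JSON body for a batch scrape submission."""
    return {"urls": list(urls), "formats": list(formats)}

def submit_batch(urls):
    """POST the batch job and return its job ID for later status checks."""
    req = urllib.request.Request(
        API,
        data=json.dumps(build_batch_request(urls)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]

# Example (needs FIRECRAWL_API_KEY set and network access):
# job_id = submit_batch(["https://firecrawl.dev", "https://docs.firecrawl.dev"])
# The status would then be checked at f"{API}/{job_id}".
```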
Technical Features
LLM-Ready Formats
- Markdown: Clean document format.
- Structured Data: Extracted data in JSON format.
- Screenshots: Visual capture of the page.
- HTML: Raw HTML content.
- Links and Metadata: Page information extraction.
Handling Complex Situations
- Proxies and Anti-Bot Mechanisms: Bypasses access restrictions.
- Dynamic Content: Handles JavaScript-rendered content.
- Output Parsing: Intelligent content parsing.
- Orchestration: Complex process management.
Customization Capabilities
- Exclude Tags: Filters unwanted content.
- Authenticated Crawling: Crawls content requiring authentication using custom headers.
- Maximum Crawl Depth: Controls the crawl scope.
- Media Parsing: Supports PDF, DOCX, images.
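The customization options above can be combined in a single crawl request body. The field names used below (`maxDepth`, `excludeTags`, `headers`) are assumptions modeled on the crawl example later in this document; verify them against the API reference:

```python
import json

# One crawl body combining the customization knobs listed above.
# Field names (maxDepth, excludeTags, headers) are assumptions.
crawl_body = {
    "url": "https://docs.firecrawl.dev",
    "maxDepth": 2,  # limit the crawl to two link hops from the start URL
    "scrapeOptions": {
        "formats": ["markdown"],
        "excludeTags": ["nav", "footer"],  # filter out boilerplate elements
        "headers": {"Cookie": "session=YOUR_SESSION"},  # authenticated crawling
    },
}
print(json.dumps(crawl_body, indent=2))
```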
Interactive Features (Actions)
Firecrawl can perform actions on a page before scraping its content:
- Click: Clicks on page elements.
- Scroll: Page scrolling operations.
- Input: Text input.
- Wait: Waits for page loading.
- Press Key: Keyboard operations.
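A scrape request with a pre-scrape action sequence might look like the sketch below; the exact shape of each action object is an assumption based on the feature list above, not quoted from the API reference:

```python
import json

# A scrape request that performs browser actions before capturing content.
# The shape of each action object is an assumption based on the feature list.
scrape_body = {
    "url": "https://example.com/login",
    "formats": ["markdown"],
    "actions": [
        {"type": "write", "text": "user@example.com"},  # input text
        {"type": "press", "key": "Tab"},                # keyboard operation
        {"type": "click", "selector": "#submit"},       # click a page element
        {"type": "wait", "milliseconds": 2000},         # wait for loading
        {"type": "scroll", "direction": "down"},        # scroll the page
    ],
}
print(json.dumps(scrape_body))
```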
API Usage Examples
Crawl Website
curl -X POST https://api.firecrawl.dev/v1/crawl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "url": "https://docs.firecrawl.dev",
    "limit": 10,
    "scrapeOptions": {
      "formats": ["markdown", "html"]
    }
  }'
Scrape Single Page
curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "url": "https://docs.firecrawl.dev",
    "formats": ["markdown", "html"]
  }'
Structured Data Extraction
curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{
    "url": "https://www.mendable.ai/",
    "formats": ["json"],
    "jsonOptions": {
      "schema": {
        "type": "object",
        "properties": {
          "company_mission": {"type": "string"},
          "supports_sso": {"type": "boolean"},
          "is_open_source": {"type": "boolean"},
          "is_in_yc": {"type": "boolean"}
        },
        "required": ["company_mission", "supports_sso", "is_open_source", "is_in_yc"]
      }
    }
  }'
SDK Support
Python SDK
pip install firecrawl-py
from firecrawl import FirecrawlApp, ScrapeOptions

app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Scrape a single URL
scrape_status = app.scrape_url(
    'https://firecrawl.dev',
    formats=["markdown", "html"]
)
print(scrape_status)

# Crawl a website
crawl_status = app.crawl_url(
    'https://firecrawl.dev',
    limit=100,
    scrape_options=ScrapeOptions(formats=["markdown", "html"]),
    poll_interval=30
)
print(crawl_status)
Node.js SDK
npm install @mendable/firecrawl-js
import FirecrawlApp, { CrawlParams, CrawlStatusResponse } from '@mendable/firecrawl-js';

const app = new FirecrawlApp({ apiKey: "fc-YOUR_API_KEY" });

// Scrape a single URL
const scrapeResponse = await app.scrapeUrl('https://firecrawl.dev', {
  formats: ['markdown', 'html'],
});
if (scrapeResponse) {
  console.log(scrapeResponse);
}

// Crawl a website
const crawlResponse = await app.crawlUrl('https://firecrawl.dev', {
  limit: 100,
  scrapeOptions: {
    formats: ['markdown', 'html'],
  },
} as CrawlParams, true, 30) as CrawlStatusResponse;
Integration Support
LLM Framework Integration
- Langchain: Python and JavaScript versions.
- Llama Index: Data connector.
- Crew.ai: AI agent framework.
- Composio: Tool integration.
- PraisonAI: AI orchestration.
- Superinterface: Assistant features.
- Vectorize: Vectorization integration.
Low-Code Frameworks
- Dify: AI application building platform.
- Langflow: Visual AI flows.
- Flowise AI: No-code AI building.
- Cargo: Data integration.
- Pipedream: Workflow automation.
Other Integrations
- Zapier: Automated workflows.
- Pabbly Connect: Application integration.
License and Deployment
Open Source License
- Primarily uses GNU Affero General Public License v3.0 (AGPL-3.0).
- SDK and some UI components use the MIT license.
Hosted Service
- A hosted version is available at firecrawl.dev.
- Cloud solutions provide additional features and enterprise-level support.
Self-Hosting
- Supports local deployment.
- Still under development; the team is integrating custom modules into a monolithic repository.
- Can be run locally, but is not yet fully ready for self-hosted production deployment.
Use Cases
- AI Data Preparation: Provides clean training data for LLMs.
- Content Aggregation: Collects and organizes content from multiple websites.
- Competitive Analysis: Monitors competitor website changes.
- SEO Research: Analyzes website structure and content.
- Data Mining: Extracts structured information from websites.
- Document Generation: Converts website content into document formats.
Usage Notes
Users are responsible for complying with website policies when using Firecrawl for scraping, searching, and crawling. Review the target website's privacy policy and terms of use before initiating any scraping activity. By default, Firecrawl respects the directives in a website's robots.txt file when crawling.
Project Status
The project is currently under active development, with the team integrating custom modules into a monolithic repository. While not yet fully ready for self-hosted deployment, it can be run locally for development and testing. The project has an active community and continuous updates, making it a leading solution in the field of web data extraction.
