Generate AI-Ready llms.txt Files from Screaming Frog Crawls

Automatically generate llms.txt content index files from CSV data exported by Screaming Frog website crawls, with multilingual support and optional AI-powered categorization.

23 Nodes · Categories: AI & ML, SEO optimization, AI integration, content management

Workflow Overview

This workflow automatically generates AI-ready llms.txt files from data exported by the Screaming Frog website crawler. llms.txt is a proposed standard: a Markdown file, typically served at a site's root, that helps Large Language Models (LLMs) discover and understand website content. The workflow accepts a Screaming Frog CSV export via a form, processes it through data extraction, field mapping, URL filtering, and optional AI classification, and outputs a downloadable llms.txt file.

Core Features

This workflow implements the following core functionalities:

  1. Form-based Data Collection: Receives website name, description, and Screaming Frog export file via a web form
  2. CSV Data Parsing: Extracts structured data from the uploaded CSV file
  3. Multilingual Support: Automatically recognizes and handles Screaming Frog exports in English, French, German, Italian, and Spanish
  4. Intelligent Filtering: Filters URLs based on status code, indexability, content type, and other criteria
  5. AI Classification (Optional): Uses OpenAI models to intelligently classify content and identify high-quality pages
  6. Formatted Output: Generates files compliant with the llms.txt standard format

Detailed Workflow Nodes

1. Trigger Node

Form - Screaming frog internal_html.csv upload

  • Type: Form Trigger (formTrigger)
  • Function: Provides a user interface to collect the following information:
    • Website name
    • Short website description (must be in the target language of the website)
    • Screaming Frog CSV export file (internal_html.csv or internal_all.csv)
  • Trigger: Automatically initiates the workflow upon form submission

2. Data Extraction Node

Extract data from Screaming Frog file

  • Type: File Extraction Node (extractFromFile)
  • Function: Parses the uploaded CSV file and extracts data
  • Input: Binary file data received from the form
  • Output: Structured JSON array of data
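
In n8n this step is handled by the Extract From File node, but the transformation it performs can be sketched in plain JavaScript (a minimal example; real Screaming Frog exports contain quoted fields with embedded commas, which this sketch handles):

```javascript
// Minimal CSV parser sketch: handles quoted fields, escaped quotes and
// embedded commas, then maps the header row onto each data row to
// produce JSON objects, approximating the Extract From File node's output.
function parseCsv(text) {
  const rows = [];
  let row = [], field = '', inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"') {
        if (text[i + 1] === '"') { field += '"'; i++; } // escaped quote
        else inQuotes = false;
      } else field += c;
    } else if (c === '"') inQuotes = true;
    else if (c === ',') { row.push(field); field = ''; }
    else if (c === '\n') { row.push(field); rows.push(row); row = []; field = ''; }
    else if (c !== '\r') field += c;
  }
  if (field.length || row.length) { row.push(field); rows.push(row); }
  return rows;
}

function csvToObjects(text) {
  const [header, ...data] = parseCsv(text);
  return data.map(r => Object.fromEntries(header.map((h, i) => [h, r[i] ?? ''])));
}
```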

3. Field Mapping Node

Set useful fields

  • Type: Set Node
  • Function: Extracts and maps key fields from Screaming Frog export data
  • Extracted fields:
    • url: Page URL
    • title: Page title
    • description: Meta description
    • status: HTTP status code
    • indexability: Indexability status
    • content_type: Content type
    • word_count: Word count
  • Multilingual field mapping: Supports column names in English, French, German, Italian, and Spanish
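
A sketch of the mapping logic: each canonical field is resolved from a list of localized column-name aliases. The localized names below are illustrative examples, not a verified list of Screaming Frog's translations:

```javascript
// Map localized Screaming Frog column names onto canonical field names.
// Alias lists are illustrative; the actual workflow's Set-node expressions
// cover English, French, German, Italian and Spanish exports.
const FIELD_ALIASES = {
  url:          ['Address', 'Adresse', 'Indirizzo'],
  title:        ['Title 1', 'Titre 1', 'Titel 1', 'Titolo 1'],
  description:  ['Meta Description 1'],
  status:       ['Status Code', 'Statuscode'],
  indexability: ['Indexability', 'Indexierbarkeit'],
  content_type: ['Content Type', 'Type de contenu'],
  word_count:   ['Word Count', 'Nombre de mots'],
};

function mapFields(row) {
  const out = {};
  for (const [field, aliases] of Object.entries(FIELD_ALIASES)) {
    const match = aliases.find(name => name in row);
    out[field] = match !== undefined ? row[match] : '';
  }
  return out;
}
```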

4. URL Filtering Node

Filter URLs

  • Type: Filter Node
  • Function: Filters URLs based on predefined conditions
  • Filtering criteria:
    • Status code = 200 (accessible)
    • Indexability = indexable (search-engine indexable)
    • Content type contains "text/html" (HTML pages)
  • Extensibility: Users can add additional filtering conditions (e.g., word count, URL path, meta description, etc.)
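
The filter conditions above can be sketched as a single predicate; the stricter variant with a word-count floor illustrates the kind of extension the workflow's notes suggest (the threshold is an illustrative choice):

```javascript
// Keep only pages that are reachable, indexable HTML documents.
// Field names follow the canonical names set in the mapping step.
function keepUrl(page) {
  return Number(page.status) === 200 &&
    String(page.indexability).toLowerCase() === 'indexable' &&
    String(page.content_type).includes('text/html');
}

// Extended variant adding an optional minimum word count.
function keepUrlStrict(page, minWords = 100) {
  return keepUrl(page) && Number(page.word_count) >= minWords;
}
```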

5. AI Classification Node (Disabled by Default)

Text Classifier

  • Type: Text Classifier (textClassifier)
  • Status: Disabled by default
  • Function: Uses an AI model to intelligently assess page content quality
  • Classification categories:
    • useful_content: High-quality content suitable for inclusion in llms.txt
    • other_content: Low-value content or paginated pages that should be excluded
  • Input data: URL, title, description, word count
  • Connected AI model: OpenAI Chat Model

6. AI Model Node

OpenAI Chat Model

  • Type: OpenAI Chat Model (lmChatOpenAi)
  • Model: gpt-4o-mini
  • Function: Provides AI inference capabilities for the text classifier
  • Requirement: OpenAI API credentials

7. Data Processing Node

Set Field - llms.txt Row

  • Type: Set Node
  • Function: Formats each URL into the standard llms.txt format
  • Output format: - [Page Title](URL): Description
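
The per-row formatting can be sketched as a small function (the fallback behavior for pages without a description is an assumption, not taken from the workflow):

```javascript
// Format one filtered page as an llms.txt link line:
//   - [Page Title](URL): Description
// Pages without a meta description get a bare link entry.
function formatRow(page) {
  const title = String(page.title || page.url).trim();
  const desc = String(page.description || '').trim();
  return desc ? `- [${title}](${page.url}): ${desc}` : `- [${title}](${page.url})`;
}
```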

8. Data Aggregation Node

Summarize - Concatenate

  • Type: Aggregate Node
  • Function: Combines all formatted rows into a single text string
  • Operation: Joins all records using line breaks

9. Content Assembly Node

Set Fields - llms.txt Content

  • Type: Set Node
  • Function: Assembles the complete llms.txt file content
  • Includes:
    • Website name
    • Website description
    • List of all filtered URLs
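
The aggregation and assembly steps together reduce the formatted rows to a single document. One common llms.txt layout (an H1 title, a blockquoted summary, then the link list) can be sketched as:

```javascript
// Join the per-page lines and prepend the site header. The layout
// follows the llms.txt convention of an H1 title and a blockquote
// summary before the link list.
function buildLlmsTxt(siteName, siteDescription, rows) {
  return [
    `# ${siteName}`,
    '',
    `> ${siteDescription}`,
    '',
    rows.join('\n'),
    '',
  ].join('\n');
}
```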

10. File Generation Node

Generate llms.txt file

  • Type: Convert to File (convertToFile)
  • Function: Converts text content into a downloadable file
  • Filename: llms.txt
  • Encoding: UTF-8
  • Output: File directly downloadable from the n8n UI

11. Auxiliary Node

No Operation, do nothing

  • Type: No Operation Node (noOp)
  • Function: Handles the data branch marked as "other_content" by the AI classifier

12. Annotation Nodes

The workflow includes multiple Sticky Note nodes providing detailed usage instructions and tips:

  • Main notes: Overall workflow introduction and usage steps
  • Form notes: Detailed explanations of input fields
  • Data extraction notes: Important considerations for CSV file processing
  • Field mapping notes: Details about multilingual support
  • Filtering notes: Filtering conditions and extensibility suggestions

Workflow Execution Flow

  1. User Input → User submits website info and CSV file via form
  2. Data Extraction → Parses CSV file to obtain raw data
  3. Field Mapping → Extracts key fields and standardizes field names
  4. URL Filtering → Filters URLs based on status, indexability, and content type
  5. AI Classification (Optional) → Uses AI to further filter high-quality content
  6. Format Conversion → Converts each URL into llms.txt format
  7. Data Aggregation → Merges all rows
  8. Content Assembly → Adds website header information
  9. File Generation → Produces a downloadable llms.txt file

Technical Features

Multilingual Support

The workflow recognizes the localized column headers used by Screaming Frog exports in different languages, supporting:

  • English
  • French (Français)
  • German (Deutsch)
  • Italian (Italiano)
  • Spanish (Español)

Flexibility

  • Supports both internal_html.csv and internal_all.csv export formats
  • Filtering conditions can be customized and extended as needed
  • AI classifier can be enabled or disabled on demand
  • Easily extendable with additional nodes (e.g., upload to Google Drive, OneDrive, etc.)

User-Friendly

  • Clear form interface
  • Detailed annotation notes
  • Direct download of result files from the n8n UI
  • Recommended to use the "Test Workflow" feature directly in the n8n UI

Use Cases

This workflow is suitable for the following scenarios:

  1. SEO Optimization: Creating AI-friendly content indexes for websites
  2. Content Management: Batch organizing indexable website pages
  3. AI Integration: Helping LLMs better understand website structure and content
  4. Website Auditing: Filtering and categorizing website pages
  5. Multilingual Websites: Uniformly processing website data across different language versions

Prerequisites

  1. Screaming Frog SEO Spider: For crawling websites and exporting data
  2. n8n Platform: To run the workflow
  3. OpenAI API (optional): Required only when enabling AI classification

Output

The generated llms.txt file includes:

  • Website name and description (header information)
  • Filtered list of pages, each formatted as: - [Page Title](URL): Page Description
  • UTF-8 encoding to ensure multilingual compatibility
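
For illustration, a generated file might look like this (site name, description, and pages are made up):

```
# Example Site

> A short description of the site, in the site's own language.

- [Home](https://example.com/): Overview of products and services
- [Blog](https://example.com/blog/): Articles and announcements
```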

Extension Suggestions

  1. Automated Deployment: Add nodes to automatically upload the generated file to the website root directory
  2. Scheduled Updates: Combine with schedule triggers for periodic regeneration
  3. Multi-source Integration: Enrich llms.txt content by integrating other data sources
  4. Quality Control: Add more filtering conditions (e.g., minimum word count, mandatory description, etc.)
  5. Notification Mechanism: Add email or Slack notification nodes to alert upon completion

Notes

  1. Uploaded files must be in Screaming Frog's standard export format; otherwise, subsequent steps may fail
  2. The AI classifier is disabled by default to save costs and must be manually enabled when needed
  3. The file must be manually downloaded from the last node in the n8n UI
  4. Using the AI classification feature requires valid OpenAI API credentials
  5. It is recommended to use internal_html.csv exports, though internal_all.csv also works

Summary

This is a well-designed n8n workflow that combines SEO tools (Screaming Frog) with AI technology to automate the generation of website content index files compliant with modern LLM standards. The workflow offers excellent user experience, robust multilingual support, and flexible extensibility, making it suitable for websites of all sizes.