Generate AI-Ready llms.txt Files from Screaming Frog Crawls

Automatically generate llms.txt content index files from CSV data exported by Screaming Frog website crawls, with multilingual support and optional AI-powered categorization.

23 Nodes · Categories: AI & ML, SEO optimization, AI integration, content management

Workflow Overview

This workflow automatically generates AI-ready llms.txt files from data exported by the Screaming Frog website crawler. llms.txt is a proposed standard: a Markdown file, typically served at a site's root, that helps Large Language Models (LLMs) discover and understand website content. The workflow accepts a Screaming Frog CSV export via a form, processes it through data extraction, field mapping, URL filtering, and optional AI classification, and outputs a downloadable llms.txt file.

Core Features

This workflow implements the following core functionalities:

  1. Form-based Data Collection: Receives website name, description, and Screaming Frog export file via a web form
  2. CSV Data Parsing: Extracts structured data from the uploaded CSV file
  3. Multilingual Support: Automatically recognizes and handles Screaming Frog exports in English, French, German, Italian, and Spanish
  4. Intelligent Filtering: Filters URLs based on status code, indexability, content type, and other criteria
  5. AI Classification (Optional): Uses OpenAI models to intelligently classify content and identify high-quality pages
  6. Formatted Output: Generates files compliant with the llms.txt standard format

Detailed Workflow Nodes

1. Trigger Node

Form - Screaming frog internal_html.csv upload

  • Type: Form Trigger (formTrigger)
  • Function: Provides a user interface to collect the following information:
    • Website name
    • Short website description (must be in the target language of the website)
    • Screaming Frog CSV export file (internal_html.csv or internal_all.csv)
  • Trigger: Automatically initiates the workflow upon form submission

2. Data Extraction Node

Extract data from Screaming Frog file

  • Type: File Extraction Node (extractFromFile)
  • Function: Parses the uploaded CSV file and extracts data
  • Input: Binary file data received from the form
  • Output: Structured JSON array of data
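
In n8n this step is handled by the Extract From File node, but the transformation it performs can be sketched in plain JavaScript (a minimal example; real Screaming Frog exports contain quoted fields with embedded commas, which this sketch handles):

```javascript
// Minimal CSV parser sketch: handles quoted fields, escaped quotes and
// embedded commas, then maps the header row onto each data row to
// produce JSON objects, approximating the Extract From File node's output.
function parseCsv(text) {
  const rows = [];
  let row = [], field = '', inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"') {
        if (text[i + 1] === '"') { field += '"'; i++; } // escaped quote
        else inQuotes = false;
      } else field += c;
    } else if (c === '"') inQuotes = true;
    else if (c === ',') { row.push(field); field = ''; }
    else if (c === '\n') { row.push(field); rows.push(row); row = []; field = ''; }
    else if (c !== '\r') field += c;
  }
  if (field.length || row.length) { row.push(field); rows.push(row); }
  return rows;
}

function csvToObjects(text) {
  const [header, ...data] = parseCsv(text);
  return data.map(r => Object.fromEntries(header.map((h, i) => [h, r[i] ?? ''])));
}
```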

3. Field Mapping Node

Set useful fields

  • Type: Set Node
  • Function: Extracts and maps key fields from Screaming Frog export data
  • Extracted fields:
    • url: Page URL
    • title: Page title
    • description: Meta description
    • status: HTTP status code
    • indexability: Indexability status
    • content_type: Content type
    • word_count: Word count
  • Multilingual field mapping: Supports column names in English, French, German, Italian, and Spanish
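
A sketch of the mapping logic: each canonical field is resolved from a list of localized column-name aliases. The localized names below are illustrative examples, not a verified list of Screaming Frog's translations:

```javascript
// Map localized Screaming Frog column names onto canonical field names.
// Alias lists are illustrative; the actual workflow's Set-node expressions
// cover English, French, German, Italian and Spanish exports.
const FIELD_ALIASES = {
  url:          ['Address', 'Adresse', 'Indirizzo'],
  title:        ['Title 1', 'Titre 1', 'Titel 1', 'Titolo 1'],
  description:  ['Meta Description 1'],
  status:       ['Status Code', 'Statuscode'],
  indexability: ['Indexability', 'Indexierbarkeit'],
  content_type: ['Content Type', 'Type de contenu'],
  word_count:   ['Word Count', 'Nombre de mots'],
};

function mapFields(row) {
  const out = {};
  for (const [field, aliases] of Object.entries(FIELD_ALIASES)) {
    const match = aliases.find(name => name in row);
    out[field] = match !== undefined ? row[match] : '';
  }
  return out;
}
```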

4. URL Filtering Node

Filter URLs

  • Type: Filter Node
  • Function: Filters URLs based on predefined conditions
  • Filtering criteria:
    • Status code = 200 (accessible)
    • Indexability = indexable (search-engine indexable)
    • Content type contains "text/html" (HTML pages)
  • Extensibility: Users can add additional filtering conditions (e.g., word count, URL path, meta description, etc.)
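
The filter conditions above can be sketched as a single predicate; the stricter variant with a word-count floor illustrates the kind of extension the workflow's notes suggest (the threshold is an illustrative choice):

```javascript
// Keep only pages that are reachable, indexable HTML documents.
// Field names follow the canonical names set in the mapping step.
function keepUrl(page) {
  return Number(page.status) === 200 &&
    String(page.indexability).toLowerCase() === 'indexable' &&
    String(page.content_type).includes('text/html');
}

// Extended variant adding an optional minimum word count.
function keepUrlStrict(page, minWords = 100) {
  return keepUrl(page) && Number(page.word_count) >= minWords;
}
```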

5. AI Classification Node (Disabled by Default)

Text Classifier

  • Type: Text Classifier (textClassifier)
  • Status: Disabled by default
  • Function: Uses an AI model to intelligently assess page content quality
  • Classification categories:
    • useful_content: High-quality content suitable for inclusion in llms.txt
    • other_content: Low-value content or paginated pages that should be excluded
  • Input data: URL, title, description, word count
  • Connected AI model: OpenAI Chat Model

6. AI Model Node

OpenAI Chat Model

  • Type: OpenAI Chat Model (lmChatOpenAi)
  • Model: gpt-4o-mini
  • Function: Provides AI inference capabilities for the text classifier
  • Requirement: OpenAI API credentials

7. Data Processing Node

Set Field - llms.txt Row

  • Type: Set Node
  • Function: Formats each URL into the standard llms.txt format
  • Output format: - [Page Title](URL): Description
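
The per-row formatting can be sketched as a small function (the fallback behavior for pages without a description is an assumption, not taken from the workflow):

```javascript
// Format one filtered page as an llms.txt link line:
//   - [Page Title](URL): Description
// Pages without a meta description get a bare link entry.
function formatRow(page) {
  const title = String(page.title || page.url).trim();
  const desc = String(page.description || '').trim();
  return desc ? `- [${title}](${page.url}): ${desc}` : `- [${title}](${page.url})`;
}
```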

8. Data Aggregation Node

Summarize - Concatenate

  • Type: Aggregate Node
  • Function: Combines all formatted rows into a single text string
  • Operation: Joins all records using line breaks

9. Content Assembly Node

Set Fields - llms.txt Content

  • Type: Set Node
  • Function: Assembles the complete llms.txt file content
  • Includes:
    • Website name
    • Website description
    • List of all filtered URLs
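
The aggregation and assembly steps together reduce the formatted rows to a single document. One common llms.txt layout (an H1 title, a blockquoted summary, then the link list) can be sketched as:

```javascript
// Join the per-page lines and prepend the site header. The layout
// follows the llms.txt convention of an H1 title and a blockquote
// summary before the link list.
function buildLlmsTxt(siteName, siteDescription, rows) {
  return [
    `# ${siteName}`,
    '',
    `> ${siteDescription}`,
    '',
    rows.join('\n'),
    '',
  ].join('\n');
}
```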

10. File Generation Node

Generate llms.txt file

  • Type: Convert to File (convertToFile)
  • Function: Converts text content into a downloadable file
  • Filename: llms.txt
  • Encoding: UTF-8
  • Output: File directly downloadable from the n8n UI

11. Auxiliary Node

No Operation, do nothing

  • Type: No Operation Node (noOp)
  • Function: Handles the data branch marked as "other_content" by the AI classifier

12. Annotation Nodes

The workflow includes multiple Sticky Note nodes providing detailed usage instructions and tips:

  • Main notes: Overall workflow introduction and usage steps
  • Form notes: Detailed explanations of input fields
  • Data extraction notes: Important considerations for CSV file processing
  • Field mapping notes: Details about multilingual support
  • Filtering notes: Filtering conditions and extensibility suggestions

Workflow Execution Flow

  1. User Input → User submits website info and CSV file via form
  2. Data Extraction → Parses CSV file to obtain raw data
  3. Field Mapping → Extracts key fields and standardizes field names
  4. URL Filtering → Filters URLs based on status, indexability, and content type
  5. AI Classification (Optional) → Uses AI to further filter high-quality content
  6. Format Conversion → Converts each URL into llms.txt format
  7. Data Aggregation → Merges all rows
  8. Content Assembly → Adds website header information
  9. File Generation → Produces a downloadable llms.txt file

Technical Features

Multilingual Support

The workflow recognizes the localized column headers used by Screaming Frog exports in different languages, supporting:

  • English
  • French (Français)
  • German (Deutsch)
  • Italian (Italiano)
  • Spanish (Español)

Flexibility

  • Supports both internal_html.csv and internal_all.csv export formats
  • Filtering conditions can be customized and extended as needed
  • AI classifier can be enabled or disabled on demand
  • Easily extendable with additional nodes (e.g., upload to Google Drive, OneDrive, etc.)

User-Friendly

  • Clear form interface
  • Detailed annotation notes
  • Direct download of result files from the n8n UI
  • Recommended to use the "Test Workflow" feature directly in the n8n UI

Use Cases

This workflow is suitable for the following scenarios:

  1. SEO Optimization: Creating AI-friendly content indexes for websites
  2. Content Management: Batch organizing indexable website pages
  3. AI Integration: Helping LLMs better understand website structure and content
  4. Website Auditing: Filtering and categorizing website pages
  5. Multilingual Websites: Uniformly processing website data across different language versions

Prerequisites

  1. Screaming Frog SEO Spider: For crawling websites and exporting data
  2. n8n Platform: To run the workflow
  3. OpenAI API (optional): Required only when enabling AI classification

Output

The generated llms.txt file includes:

  • Website name and description (header information)
  • Filtered list of pages, each formatted as: - [Page Title](URL): Page Description
  • UTF-8 encoding to ensure multilingual compatibility
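
For illustration, a generated file might look like this (site name, description, and pages are made up):

```
# Example Site

> A short description of the site, in the site's own language.

- [Home](https://example.com/): Overview of products and services
- [Blog](https://example.com/blog/): Articles and announcements
```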

Extension Suggestions

  1. Automated Deployment: Add nodes to automatically upload the generated file to the website root directory
  2. Scheduled Updates: Combine with schedule triggers for periodic regeneration
  3. Multi-source Integration: Enrich llms.txt content by integrating other data sources
  4. Quality Control: Add more filtering conditions (e.g., minimum word count, mandatory description, etc.)
  5. Notification Mechanism: Add email or Slack notification nodes to alert upon completion

Notes

  1. Uploaded files must be in Screaming Frog's standard export format; otherwise, subsequent steps may fail
  2. The AI classifier is disabled by default to save costs and must be manually enabled when needed
  3. The file must be manually downloaded from the last node in the n8n UI
  4. Using the AI classification feature requires valid OpenAI API credentials
  5. It is recommended to use internal_html.csv exports, though internal_all.csv also works

Summary

This is a well-designed n8n workflow that combines SEO tools (Screaming Frog) with AI technology to automate the generation of website content index files compliant with modern LLM standards. The workflow offers excellent user experience, robust multilingual support, and flexible extensibility, making it suitable for websites of all sizes.