N8N Workflow for Generating CSV Files Using GPT-4

Prepare CSV files with GPT-4

An automated workflow that leverages OpenAI GPT-4 to generate fictional user data and export it in bulk as CSV files.

11 nodes · Tags: AI & ML, AI data generation, CSV export, automated testing

Workflow Overview

This is an n8n automation workflow that uses GPT-4 to generate random user data and export it as CSV files. The workflow calls the OpenAI API to create fictional user information, processes this data, and saves it as multiple CSV files to the local disk.

Detailed Workflow Steps

1. Trigger Phase

  • Node: When clicking "Execute Workflow" (Manual Trigger)
  • Function: Starts the workflow when triggered manually from the editor
  • Position: Starting node of the workflow

2. Data Generation Phase

  • Node: OpenAI
  • Type: n8n-nodes-base.openAi
  • Configuration:
    • Model used: GPT-4
    • Number of generations: 3 (three completions per execution, each yielding one user list)
    • Max tokens: 2500
    • Temperature: 1 (increases randomness)
  • Prompt Content: Instructs GPT-4 to generate a JSON array of 10 random users containing the following fields:
    • user_name: Fictional character name (first and last name start with the same letter)
    • user_email: Email address
    • subscribed: Subscription status (boolean)
    • date_subscribed: Subscription date (a random date before 2023-10-01 if subscribed)
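
A single record in the generated array might look like the following (the field names come from the prompt above; the concrete values are hypothetical):

```json
{
  "user_name": "Bella Brown",
  "user_email": "bella.brown@example.com",
  "subscribed": true,
  "date_subscribed": "2023-04-12"
}
```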

3. Batch Processing Phase

  • Node: Split In Batches
  • Function: Splits the 3 OpenAI responses into individual batches for processing
  • Batch Size: 1
  • Purpose: Ensures each generated user list is processed separately and saved as an independent CSV file
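
The batching behavior can be sketched in plain JavaScript (illustrative only; n8n's Split In Batches node does this internally):

```javascript
// Three OpenAI responses, processed one at a time (batch size 1).
const responses = ["response-1", "response-2", "response-3"];
const batchSize = 1;

// Slice the input into batches, mirroring what Split In Batches does.
const batches = [];
for (let i = 0; i < responses.length; i += batchSize) {
  batches.push(responses.slice(i, i + batchSize));
}
// Each batch then flows through the rest of the workflow on its own,
// producing one CSV file per batch.
```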

4. Data Parsing Phase

  • Node: Parse JSON
  • Type: Set Node
  • Function: Parses the JSON string returned by OpenAI into an actual JSON object
  • Operation: Extracts the message.content field and parses it into an array
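
What the Set node's expression does can be sketched as follows (the sample response object is hypothetical, shaped like the `message.content` field described above):

```javascript
// Hypothetical OpenAI node output: the generated users arrive as a
// JSON *string* under message.content.
const response = {
  message: {
    content:
      '[{"user_name":"Bella Brown","user_email":"bella.brown@example.com",' +
      '"subscribed":true,"date_subscribed":"2023-04-12"}]',
  },
};

// JSON.parse turns the raw string into a usable array of user objects.
const users = JSON.parse(response.message.content);
```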

5. Data Expansion Phase

  • Node: Make JSON Table
  • Type: Item Lists Node
  • Function: Expands each user object in the JSON array into individual data items
  • Field: Expands the "content" field
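
The expansion step can be sketched like this, assuming the parsed array sits in a field named "content":

```javascript
// One workflow item holding the whole parsed array.
const item = {
  content: [
    { user_name: "Anna Adams" },
    { user_name: "Bob Baker" },
  ],
};

// The Item Lists node turns each array element into its own item,
// wrapped in the { json: ... } shape n8n uses internally.
const items = item.content.map((user) => ({ json: user }));
```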

6. CSV Conversion Phase

  • Node: Convert to CSV
  • Type: Spreadsheet File Node
  • Configuration:
    • Output format: CSV
    • Filename: funny_names_[index].csv (index starts from 1)
    • Include header row: Yes
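
A minimal sketch of the CSV conversion (the real Spreadsheet File node also handles quoting, escaping, and type coercion):

```javascript
// Parsed user records, using the field names from the prompt.
const users = [
  {
    user_name: "Anna Adams",
    user_email: "anna@example.com",
    subscribed: true,
    date_subscribed: "2023-03-01",
  },
];

// Header row from the object keys, then one comma-joined row per user.
const header = Object.keys(users[0]).join(",");
const rows = users.map((u) => Object.values(u).join(","));
const csv = [header, ...rows].join("\n");
```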

7. Data Cleanup Phase

7.1 Remove BOM Bytes

  • Node: Strip UTF BOM bytes
  • Type: Move Binary Data Node
  • Function: Removes UTF-8 BOM (Byte Order Mark) bytes
  • Importance: Prevents encoding issues when reading CSV files
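
Stripping the BOM amounts to dropping the three-byte sequence EF BB BF when it leads the file, e.g.:

```javascript
// The UTF-8 byte order mark.
const BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Return the buffer without a leading BOM; leave it untouched otherwise.
function stripBom(buf) {
  return buf.subarray(0, 3).equals(BOM) ? buf.subarray(3) : buf;
}

const withBom = Buffer.concat([BOM, Buffer.from("user_name,user_email\n")]);
const clean = stripBom(withBom);
```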

7.2 Create Valid Binary

  • Node: Create valid binary
  • Type: Move Binary Data Node
  • Configuration:
    • Mode: JSON to binary
    • Encoding: UTF-8
    • MIME type: text/csv
    • Do not add BOM
  • Function: Converts the processed data into a properly formatted binary file

8. File Saving Phase

  • Node: Save to Disk
  • Type: Write Binary File Node
  • Path: ./.n8n/funny_names_[index].csv
  • Function: Saves the generated CSV files to the n8n working directory
  • Loop: After saving, returns to the Split In Batches node to process the next batch

Workflow Features

Advantages

  1. High Automation: Generates multiple CSV files filled with random data with a single click
  2. Standardized Data Format: Produces clearly structured data suitable for real-world business scenarios
  3. Batch Processing Capability: Generates multiple distinct datasets in a single run
  4. Robust Encoding Handling: Specifically addresses BOM byte issues to ensure file compatibility

Use Cases

  • Test data generation
  • Populating development environments
  • Demonstrations and training purposes
  • Testing CSV file processing workflows

Pinned Data Notes

The workflow ships with 3 sets of pre-generated test data (pinData), each containing 10 user records. This data is pinned to the OpenAI node, so the workflow can be tested and demonstrated without making repeated API calls.

Important Notes

  1. A valid OpenAI API credential must be configured
  2. Ensure the .n8n directory exists and has write permissions
  3. BOM byte handling is critical for cross-platform CSV file compatibility
  4. Generated data is fictional and intended solely for testing purposes