MediaCrawler is a powerful multi-platform social media crawler tool.

NOASSERTION · Python · 23.7k · NanmiCoder · Last Updated: 2025-06-20

MediaCrawler - Multi-Platform Social Media Crawler

Project Overview

MediaCrawler is a multi-platform social media crawler developed and maintained by NanmiCoder. It is built on Playwright and crawls public information, including posts and their comments, from multiple mainstream social media platforms.

Technical Architecture

Core Technologies

  • Playwright: A browser automation framework, used here to keep the logged-in browser environment available.
  • Python: The primary development language, requiring version 3.9.6+.
  • JavaScript Execution: Obtains encrypted parameters by executing JS expressions.
  • Node.js: Requires version 16+.

Working Principle

The project uses Playwright as a bridge: after a successful login it keeps the browser context alive and obtains the required encrypted parameters by executing JavaScript expressions inside that context. This avoids having to reproduce the platforms' core encryption JS code and greatly reduces the reverse-engineering effort.
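The approach described above can be sketched roughly as follows. This is a minimal illustration using Playwright's synchronous Python API, not the project's actual code: the helper name `eval_in_logged_in_page`, the profile directory `browser_data`, the URL, and the JS expression are all hypothetical placeholders.

```python
from typing import Any, Protocol


class SupportsEvaluate(Protocol):
    """Anything with a Playwright-style evaluate() method, e.g. a Page."""

    def evaluate(self, expression: str, arg: Any = None) -> Any: ...


def eval_in_logged_in_page(page: SupportsEvaluate, expression: str, arg: Any = None) -> Any:
    """Run a JS expression inside the page and return its result.

    Because the expression runs in the real (logged-in) page, it can call the
    site's own signing code instead of a re-implementation of it.
    """
    return page.evaluate(expression, arg)


def demo() -> None:
    # Imported lazily so the helper above can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # A persistent context keeps cookies and local storage in the given
        # directory, so a login performed once is retained across runs.
        context = p.chromium.launch_persistent_context("browser_data", headless=False)
        page = context.new_page()
        page.goto("https://example.com")  # placeholder URL
        # Placeholder expression; a real crawler would call a site-specific function.
        print(eval_in_logged_in_page(page, "() => document.title"))
        context.close()
```

Calling `demo()` requires Playwright and its browsers to be installed; the helper itself only needs an object that exposes `evaluate`.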

Environment Requirements

System Requirements

  • Python 3.9.6+
  • Node.js 16+
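A quick way to verify these requirements before installing is a small check script. The sketch below is not part of the project; the function names are hypothetical, and it assumes `node --version` prints a string such as `v16.20.2`.

```python
import shutil
import subprocess
import sys
from typing import Optional, Tuple

MIN_PYTHON = (3, 9, 6)
MIN_NODE_MAJOR = 16


def parse_node_version(text: str) -> Tuple[int, ...]:
    """Turn `node --version` output such as 'v16.20.2' into (16, 20, 2)."""
    core = text.strip().lstrip("v").split("-")[0]  # drop any pre-release suffix
    return tuple(int(part) for part in core.split("."))


def python_ok(version: Tuple[int, ...] = tuple(sys.version_info[:3])) -> bool:
    """True if the given (or running) Python version is at least 3.9.6."""
    return version >= MIN_PYTHON


def node_version() -> Optional[Tuple[int, ...]]:
    """Return the installed Node.js version, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=True)
    return parse_node_version(result.stdout)


def node_ok(version: Optional[Tuple[int, ...]]) -> bool:
    """True if Node.js is present and its major version is at least 16."""
    return version is not None and version[0] >= MIN_NODE_MAJOR
```

For example, `python_ok() and node_ok(node_version())` evaluates to `True` only when both requirements are met on the current machine.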

Dependency Management

The project supports uv for dependency management. You can use uv in place of pip to install dependencies, which is more convenient and faster.

Installation and Deployment

Basic Installation Steps

# Enter the project root directory
cd MediaCrawler

# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# macOS & Linux
source venv/bin/activate
# Windows
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install playwright browser
playwright install

Database Initialization (Optional)

# Initialize the database (first run only)
python db.py

Usage

Basic Commands

# Keyword search crawling
python main.py --platform xhs --lt qrcode --type search

# Crawl by specifying post ID
python main.py --platform xhs --lt qrcode --type detail

# View help information
python main.py --help
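To make the flag structure of these commands concrete, here is a minimal `argparse` sketch that accepts the same three options. It is an illustration only, not the project's real parser: the help texts are assumptions (in particular, `--lt` is presumed to mean the login type), and `crawl_type` is a hypothetical attribute name.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags used in the commands above."""
    parser = argparse.ArgumentParser(prog="main.py", description="Crawler CLI sketch")
    parser.add_argument("--platform", help="platform code, e.g. xhs")
    parser.add_argument("--lt", help="login type (assumed meaning), e.g. qrcode")
    # dest avoids an attribute called `type`, which would shadow the builtin name.
    parser.add_argument("--type", dest="crawl_type", help="crawl mode, e.g. search or detail")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:] when None)."""
    return build_parser().parse_args(argv)
```

With this sketch, `parse_args(["--platform", "xhs", "--lt", "qrcode", "--type", "detail"])` yields a namespace whose `crawl_type` attribute is `"detail"`.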

Configuration Instructions

  • Comment crawling is disabled by default.
  • To crawl comments, change the ENABLE_GET_COMMENTS variable in config/base_config.py.
  • Other options are also defined in config/base_config.py; each is documented with comments in Chinese.
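As an illustration of how such a switch typically works, the sketch below shows a module-level flag gating an optional crawl step. Only the name `ENABLE_GET_COMMENTS` comes from the project; `plan_tasks` and the task names are hypothetical, and the real config file may be organized differently.

```python
# Hypothetical excerpt in the style of config/base_config.py.
# Set to True to enable comment crawling (it is off by default).
ENABLE_GET_COMMENTS = False


def plan_tasks(enable_comments: bool = ENABLE_GET_COMMENTS) -> list:
    """Return the crawl steps to run for each post, given the flag."""
    tasks = ["content"]
    if enable_comments:
        tasks.append("comments")
    return tasks
```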

Data Storage

Supported Storage Methods

  1. MySQL Database: Supports relational database storage (requires pre-creation of the database).
  2. CSV File: Saved to CSV format files in the data/ directory.
  3. JSON File: Saved to JSON format files in the data/ directory.
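The two file-based options can be sketched with the Python standard library alone. This is a minimal illustration, not the project's storage layer: the function names and the record fields (`note_id`, `title`) are hypothetical.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List


def save_csv(records: List[Dict[str, str]], path: Path) -> None:
    """Write records to a CSV file, using the first record's keys as the header."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = list(records[0].keys()) if records else []
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def save_json(records: List[Dict[str, str]], path: Path) -> None:
    """Write records to a JSON file, keeping non-ASCII text readable."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

A call such as `save_json(rows, Path("data") / "notes.json")` would place the output under a `data/` directory, matching the layout described above; `ensure_ascii=False` is used so that Chinese text is stored as-is rather than as escape sequences.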

Pro Version Advantages

The project also provides the MediaCrawlerPro version, which has the following advantages over the open-source version:

  • Multi-account and IP proxy support (key feature)
  • No Playwright dependency, making it easier to use
  • Support for Linux environments
  • Refactored code that is easier to read and maintain
  • Decoupled JS signature logic for higher code quality
  • An improved architecture that is easier to extend
  • A Social Media Video Downloader desktop application
  • Support for multi-platform homepage feed recommendations (HomeFeed)

Legal Disclaimer

Disclaimer

  • This project is for learning and research purposes only; commercial use is prohibited.
  • It is strictly forbidden to use it for illegal purposes or to infringe upon the legitimate rights and interests of others.
  • Users must abide by relevant laws and regulations and bear their own legal responsibilities.
  • The developer is not responsible for any legal liability arising from the use of this project.

Project Value

MediaCrawler is not just a crawler tool, but also an excellent learning project:

  1. Architecture Design Learning: The project has a mature architectural design, which is worth learning from.
  2. Technical Practice: Covers the comprehensive application of various technology stacks.
  3. Engineering Thinking: Complete engineering practice from code organization to deployment.
  4. Anti-Crawling Technology: Learn about solutions to modern anti-crawling technologies.