Stage 3: Data and Feature Engineering
A curated list of feature engineering techniques and resources for machine learning, covering methods and tools for numerical, text, image, categorical, and time series data.
Awesome Feature Engineering Project Introduction
Project Overview
Awesome Feature Engineering is a curated list of technical resources on feature engineering for machine learning. The project is maintained by Andrei Khobnia and is licensed under the Creative Commons Attribution-Noncommercial-ShareAlike 3.0 Unported License.
This project provides a comprehensive repository of feature engineering techniques for machine learning practitioners, covering methods and tools for various data types.
Main Content Categories
1. Numeric Data
Data Transformation:
- Box-Cox Transformation: scipy.stats.boxcox
- Log Transformation: np.log(x + const)
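The two transformations listed above can be sketched as follows. This is a minimal illustration, not code from the project itself: the synthetic log-normal sample, the seed, and the offset `const = 1.0` are assumptions chosen only to give right-skewed, strictly positive data.

```python
import numpy as np
from scipy import stats

# Assumed example data: a right-skewed, strictly positive sample.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

# Box-Cox requires strictly positive input. When lmbda is not given,
# scipy fits it by maximum likelihood and returns it with the data.
x_boxcox, fitted_lambda = stats.boxcox(x)

# Log transform with a constant offset, so values near zero stay finite.
const = 1.0  # assumed offset; choose it to suit the data's scale
x_log = np.log(x + const)

print("skew raw:    ", stats.skew(x))
print("skew box-cox:", stats.skew(x_boxcox))
print("skew log:    ", stats.skew(x_log))
```

Both transforms aim to reduce skew. Box-Cox tunes its exponent to the data, while the log transform is a fixed choice that is easier to invert and explain.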
Automated Feature Engineering:
- Featuretools: For automated feature engineering
Feature Interaction:
- sklearn.preprocessing.PolynomialFeatures: Polynomial feature generation
- Division operations
- Other interaction features
2. Textual Data
Bag-of-words Model:
- Bag-of-words model
- A Gentle Introduction to the Bag-of-Words Model
- sklearn.feature_extraction.text.CountVectorizer
- sklearn.feature_extraction.DictVectorizer
- sklearn.feature_extraction.FeatureHasher
Word Embedding Techniques:
Feature Extraction Techniques:
3. Image Data
Traditional Feature Extraction:
Deep Learning Feature Extraction:
4. Categorical Data
One-Hot Encoding:
- Why One-Hot Encode Data in Machine Learning?
- How to One Hot Encode Sequence Data in Python
- sklearn.preprocessing.OneHotEncoder
- Keras: to_categorical
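As a minimal sketch of one-hot encoding with the scikit-learn class listed above, the example below encodes an assumed toy column of color names. The final lines use plain NumPy as a stand-in for what Keras `to_categorical` does with integer class labels; they do not call Keras.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Assumed toy data: one categorical column, four samples.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# handle_unknown="ignore" encodes unseen categories as all zeros
# instead of raising an error at transform time.
enc = OneHotEncoder(handle_unknown="ignore")
onehot = enc.fit_transform(colors).toarray()
print(enc.categories_)   # categories are sorted: blue, green, red
print(onehot)

# A category not seen during fit maps to an all-zero row.
unseen = enc.transform(np.array([["purple"]])).toarray()
print(unseen)

# NumPy stand-in for one-hot encoding integer class labels.
labels = np.array([0, 2, 1])
onehot_labels = np.eye(3)[labels]
print(onehot_labels)
```

One-hot encoding adds one column per category, so for high-cardinality columns the alternatives listed under this section are often considered instead.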
Target Encoding:
Feature Hashing:
5. Time Series Data
- Automatic Feature Extraction:
6. Geospatial Data
- Includes feature engineering techniques related to geographical location.
Project Features
- Comprehensiveness: Covers major data types in machine learning and their corresponding feature engineering techniques.
- Practicality: Provides specific tool libraries and code implementations.
- Open Source: Released under a Creative Commons license (noncommercial, share-alike), welcoming community contributions.
- Authoritativeness: Links to authoritative documentation, tutorials, and academic resources.
- Actionability: Offers specific Python libraries and function call methods.
Value Proposition
This project is particularly valuable for the following groups:
- Machine Learning Engineers
- Data Scientists
- Feature Engineering Researchers
- Machine Learning Beginners
- Practitioners looking to improve model performance
Contribution Methods
The project encourages community contributions: add new resources or improve existing content by opening a pull request.
Summary
The Awesome Feature Engineering project provides a comprehensive and practical resource library for machine learning feature engineering, serving as an important reference for learning and applying feature engineering techniques. Through systematic categorization and rich resource links, it helps practitioners quickly find suitable feature engineering methods for specific data types.