One of the most important aspects of building any supervised learning model on numeric data is understanding the features well. Looking at a model's partial dependence plots shows how its output changes with each feature. This kind of feature exploration helps with the following (a minimal code sketch follows the list):
- Feature understanding
- Identifying noisy features (the most interesting part!)
- Feature engineering
- Feature importance
- Feature debugging
- Leakage detection and understanding
- Model monitoring
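For concreteness, here is a minimal sketch of the kind of binned feature-vs-target trend view described above. The function name, column names, bin count, and the `train_df` DataFrame in the usage comment are placeholder assumptions, not something taken from the original write-up.

```python
# Minimal sketch: bin a numeric feature and look at the mean target per bin,
# which is the univariate trend view discussed above.
import pandas as pd
import matplotlib.pyplot as plt

def feature_trend(df: pd.DataFrame, feature: str, target: str, bins: int = 10) -> pd.DataFrame:
    """Return mean target and population per quantile bin of `feature`."""
    binned = pd.qcut(df[feature], q=bins, duplicates="drop")
    return df.groupby(binned, observed=True).agg(
        mean_target=(target, "mean"),   # trend of the target across the feature
        population=(target, "size"),    # how many rows fall in each bin
    )

# Hypothetical usage: train_df with a numeric feature 'age' and a binary 'target'
# trend = feature_trend(train_df, "age", "target")
# trend["mean_target"].plot(marker="o"); plt.xlabel("age bins"); plt.show()
```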
We can see that the higher the trend-correlation threshold used to drop features, the higher the leaderboard (LB) AUC.
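As a hedged illustration of how such a trend-correlation filter could work (not necessarily the exact computation behind the quoted result), the sketch below bins each feature with train-based edges, computes the mean-target trend on two splits, and keeps only features whose trends correlate above a threshold. The DataFrames, feature list, and the 0.95 cutoff are placeholder assumptions.

```python
# Sketch of a trend-correlation filter between two data splits.
import numpy as np
import pandas as pd

def trend_correlation(train: pd.DataFrame, other: pd.DataFrame,
                      feature: str, target: str, bins: int = 10) -> float:
    """Correlate the binned mean-target trend of `feature` on two splits."""
    # Bin edges come from the training split so both trends use the same bins.
    edges = np.unique(np.quantile(train[feature].dropna(), np.linspace(0, 1, bins + 1)))
    t1 = train.groupby(pd.cut(train[feature], edges, include_lowest=True), observed=True)[target].mean()
    t2 = other.groupby(pd.cut(other[feature], edges, include_lowest=True), observed=True)[target].mean()
    aligned = pd.concat([t1.rename("train"), t2.rename("other")], axis=1).dropna()
    return aligned["train"].corr(aligned["other"])

# Hypothetical usage: drop features whose train/validation trends disagree.
# keep = [f for f in numeric_features
#         if trend_correlation(train_df, valid_df, f, "target") > 0.95]
```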
Understanding some of the concepts
For example, feature debugging:
- Checking whether the feature's population distribution looks right. I've personally run into extreme cases like the one above numerous times because of minor bugs (a quick sanity-check sketch follows this list).
- Always hypothesize what a feature's trend will look like before looking at these plots. A trend that doesn't match your expectation may hint at a problem. And frankly, this process of hypothesizing trends makes building ML models much more fun!
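As referenced in the first bullet, a handful of summary numbers is often enough to expose a distribution that "looks wrong". The sketch below is a generic sanity check; the DataFrame and column name in the usage comment are hypothetical.

```python
# Quick distribution sanity checks for a suspect feature.
import pandas as pd

def describe_feature(df: pd.DataFrame, feature: str) -> pd.Series:
    """A few summary numbers that frequently expose pipeline bugs."""
    col = df[feature]
    return pd.Series({
        "n_missing": col.isna().sum(),
        "pct_zero": float((col == 0).mean()),  # a spike at 0 often means a bad fill value
        "n_unique": col.nunique(),
        "min": col.min(),
        "median": col.median(),
        "max": col.max(),                      # absurd extremes hint at unit or join errors
    })

# Hypothetical usage:
# print(describe_feature(train_df, "days_since_last_purchase"))
```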
The leakage detection part
Data leakage from the target into the features leads to overfitting. Leaky features show high feature importance.
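One simple, hedged way to act on this observation is to fit a quick model and look at which features dominate the importance ranking; a single overwhelmingly important feature is a natural leakage suspect. The model choice and column names below are assumptions for illustration.

```python
# Sketch: rank feature importances to spot suspiciously dominant features.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def top_importances(df: pd.DataFrame, target: str, n: int = 10) -> pd.Series:
    """Fit a quick model and return the n largest feature importances."""
    X = df.drop(columns=[target]).select_dtypes("number").fillna(0)
    y = df[target]
    model = RandomForestClassifier(n_estimators=100, random_state=0, n_jobs=-1)
    model.fit(X, y)
    return (pd.Series(model.feature_importances_, index=X.columns)
              .sort_values(ascending=False).head(n))

# Hypothetical usage: a feature that towers over the rest deserves a leakage check.
# print(top_importances(train_df, "target"))
```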
The supervised learning part: exploring the features.