DevilKing's blog



ETL pipeline in Python

Link to the original article

Pre-Processing steps in NLP:

  1. Normalisation.
  2. Remove stop words, punctuation and HTML.
  3. Tokenisation.
  4. Lemmatisation.
  5. TF-IDF.
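
Steps 1 to 4 above can be sketched with the standard library alone. This is a minimal illustration, not the original article's code: `STOP_WORDS` and `LEMMAS` are tiny hypothetical tables standing in for a real stop-word list and a real lemmatiser, and `preprocess` is a made-up name. Stop words are removed after tokenisation here, because the filter needs individual tokens to work on.

```python
import html
import re
import string

# Hypothetical, deliberately tiny resources for illustration only.
# A real pipeline would use a full stop-word list and a proper lemmatiser.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "of", "in", "on", "to"}
LEMMAS = {"cats": "cat", "running": "run", "mice": "mouse", "were": "be"}

TAG_RE = re.compile(r"<[^>]+>")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text: str) -> list[str]:
    """Return cleaned, lemmatised tokens for one raw document."""
    text = TAG_RE.sub(" ", text)        # remove HTML tags
    text = html.unescape(text)          # decode entities such as &amp;
    text = text.lower()                 # normalisation: case folding
    text = text.translate(PUNCT_TABLE)  # remove punctuation
    tokens = text.split()               # tokenisation on whitespace
    tokens = [t for t in tokens if t not in STOP_WORDS]  # drop stop words
    return [LEMMAS.get(t, t) for t in tokens]  # lemmatisation by table lookup


print(preprocess("<p>The cats are RUNNING, &amp; the mice were hiding!</p>"))
# → ['cat', 'run', 'mouse', 'be', 'hiding']
```

The lookup-table lemmatiser only handles the words it lists; anything else passes through unchanged, which is why "hiding" survives as-is.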

TF-IDF (term frequency-inverse document frequency) is a widely used technique for quantifying what a document is about, and it tends to be used with algorithms such as Gaussian Mixture Models (GMM), K-means or Latent Dirichlet Allocation (LDA).
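
To make the TF-IDF step concrete, here is a minimal sketch of the textbook weighting, tf × log(N / df), over already-tokenised documents. The function name `tf_idf` and the sample documents are assumptions for illustration; library implementations usually add smoothing and normalisation, so their numbers will differ from this plain variant.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute plain TF-IDF weights for a list of tokenised documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [["cat", "run"], ["cat", "sleep"], ["dog", "run", "run"]]
for row in tf_idf(docs):
    print(row)
```

A term that occurs in every document gets weight 0, since log(N / N) = 0; rare terms that dominate a single document get the highest weights. The resulting per-document weight dictionaries are the kind of features that would then be passed to a clustering or topic model.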

It is billed as ETL, but in practice it is just some simple processing.