gqlxj1987's Blog

ETL pipeline in Python


Original article link

Pre-processing steps in NLP:

  1. Normalisation.
  2. Remove stop words, punctuation and HTML.
  3. Tokenisation.
  4. Lemmatisation.
  5. TF-IDF.
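
Steps 1 to 4 can be sketched with the standard library alone. This is a minimal illustration, not the original article's code: the stop-word list is a tiny hypothetical one, the HTML stripping is a simple regex, and `naive_lemmatise` is crude suffix stripping standing in for a real lemmatiser (which would normally come from a library such as NLTK or spaCy). Stop words are filtered after tokenisation here, because that is easiest to do on a token list.

```python
import html
import re
import string
import unicodedata

# Hypothetical, deliberately tiny stop-word list; real projects use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "is", "in", "of", "on", "the", "to"}

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalise(text: str) -> str:
    """Step 1: Unicode-normalise and lowercase the text."""
    return unicodedata.normalize("NFKC", text).lower()


def strip_html(text: str) -> str:
    """Step 2 (HTML): drop tags with a simple regex, then decode entities."""
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def strip_punctuation(text: str) -> str:
    """Step 2 (punctuation): replace ASCII punctuation with spaces."""
    return text.translate(_PUNCT_TABLE)


def tokenise(text: str) -> list[str]:
    """Step 3: split on whitespace."""
    return text.split()


def naive_lemmatise(token: str) -> str:
    """Step 4 stand-in: crude plural stripping, NOT real lemmatisation."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def preprocess(text: str) -> list[str]:
    """Run steps 1 to 4 and return the cleaned tokens."""
    cleaned = strip_punctuation(strip_html(normalise(text)))
    tokens = [t for t in tokenise(cleaned) if t not in STOP_WORDS]
    return [naive_lemmatise(t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("<p>The Cats are sitting on the mats!</p>"))
```

The token lists returned by `preprocess` are the input that a TF-IDF step would then weight.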

TF-IDF is a widely used technique for quantifying what a document is about, and it tends to be used with algorithms such as Gaussian Mixture Models (GMM), K-means or Latent Dirichlet Allocation (LDA).
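
As a rough illustration of the TF-IDF step, the sketch below computes weights for already-tokenised documents using the textbook form: term frequency (count divided by document length) times `log(N / df)`. The function name `tf_idf` and the sample documents are assumptions for this example, and libraries often use smoothed or normalised variants, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    sample = [["cat", "mat"], ["cat", "dog"]]
    for doc_weights in tf_idf(sample):
        print(doc_weights)
```

A term that appears in every document (here `cat`) gets weight zero, while terms unique to one document get the highest weights. The resulting per-document weights are what a clustering or topic-modelling algorithm would take as features.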

It is billed as ETL, but in practice it is only some simple processing.

