Data Augmentation in NLP

 

Word Substitution

 

  1. Synonym-based substitution

 

  1. Word embedding substitution

  1. Masked language model

  1. TF-IDF-based word substitution

The basic idea is that words with a low TF-IDF score carry little information, so they can be replaced without affecting the true label of the sentence.
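The TF-IDF idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the toy corpus, the `threshold` value, and the function names `idf_table` and `tfidf_substitute` are all assumptions made for the example. The source does not say what the low-scoring words should be replaced with, so this sketch assumes they are swapped for other words that are equally uninformative (IDF of zero in the toy corpus).

```python
import math
import random
from collections import Counter


def idf_table(corpus):
    """Inverse document frequency, log(N / df), for each word in a tokenized corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def tfidf_substitute(tokens, idf, threshold, pool, rng):
    """Swap tokens whose TF-IDF score is below `threshold` for a word from `pool`."""
    tf = Counter(tokens)
    augmented = []
    for token in tokens:
        # Words never seen in the corpus get an infinite score, so they are kept.
        score = (tf[token] / len(tokens)) * idf.get(token, float("inf"))
        choices = [w for w in pool if w != token]
        if score < threshold and choices:
            augmented.append(rng.choice(choices))
        else:
            augmented.append(token)
    return augmented


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    "the movie was a great film".split(),
    "the plot was a boring mess".split(),
    "the acting was a pleasant surprise".split(),
]
idf = idf_table(corpus)

# Assumed replacement pool: words that occur in every document (IDF == 0).
pool = sorted(w for w, v in idf.items() if v == 0.0)

print(tfidf_substitute(corpus[0], idf, threshold=0.1, pool=pool, rng=random.Random(0)))
```

In this toy setup only "the", "was", and "a" fall below the threshold, so the label-bearing words ("movie", "great", "film") are left untouched. On a real dataset the threshold and the replacement pool would need tuning.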

 

Back Translation

 

Text Surface Transformation

 

Random Noise Injection

 

  1. Misspelling injection

  1. QWERTY keyboard error injection

  1. Empty noise injection

  1. Random injection

Choose a random word in the sentence that is not a stop word. Then find one of its synonyms and insert it at a random position in the sentence.

  1. Sentence reorganization
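The random injection step described in the list above (pick a non-stop word, look up a synonym, insert it at a random position) might look like the following sketch. The `STOP_WORDS` set and the `SYNONYMS` table are tiny hypothetical stand-ins so that the example runs without external data; a real pipeline would more likely use a full stop-word list and a thesaurus such as WordNet. The function name `random_insertion` and the `n_inserts` parameter are likewise assumptions of this example.

```python
import random

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "was", "of", "and", "to", "in"}

# Hypothetical synonym table standing in for a real thesaurus lookup.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "dog": ["hound"],
}


def random_insertion(tokens, n_inserts, rng):
    """Insert `n_inserts` synonyms of random non-stop words at random positions."""
    # Only words that are not stop words and have a known synonym can be chosen.
    candidates = [
        t for t in tokens
        if t.lower() not in STOP_WORDS and t.lower() in SYNONYMS
    ]
    augmented = list(tokens)
    if not candidates:
        return augmented
    for _ in range(n_inserts):
        word = rng.choice(candidates)
        synonym = rng.choice(SYNONYMS[word.lower()])
        # randint is inclusive on both ends, so the synonym may also be appended.
        position = rng.randint(0, len(augmented))
        augmented.insert(position, synonym)
    return augmented


sentence = "the quick dog was happy".split()
print(random_insertion(sentence, n_inserts=2, rng=random.Random(0)))
```

The original tokens keep their relative order; only the inserted synonyms are new. Passing a seeded `random.Random` makes the augmentation reproducible.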

 

Syntax Tree

 

Reference

https://blog.csdn.net/lqfarmer/article/details/107006551
