Data Augmentation in NLP
Word Substitution
- Synonym-based substitution

- Word embedding substitution


- Masked language model

- TF-IDF-based word substitution
The basic idea is that words with a low TF-IDF score carry little information, so they can be replaced without changing the true label of the sentence.
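
The TF-IDF idea above can be illustrated with a minimal sketch. It assumes pre-tokenized documents, a plain `tf * log(N / df)` weighting, and an arbitrary score threshold; the function names `tfidf_scores` and `tfidf_substitute` and the parameters `threshold`, `p`, and `seed` are hypothetical and chosen only for this example. Words whose score falls at or below the threshold are swapped, with probability `p`, for another low-scoring word from the corpus, while high-scoring words are left alone.

```python
import math
import random
from collections import Counter


def tfidf_scores(docs):
    """Return one {word: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return scores


def tfidf_substitute(docs, index, threshold=0.05, p=0.5, seed=0):
    """Replace low-TF-IDF words of docs[index] with other low-scoring words."""
    rng = random.Random(seed)
    scores = tfidf_scores(docs)
    # Replacement pool: every word that scores low in at least one document.
    pool = sorted({w for s in scores for w, v in s.items() if v <= threshold})
    augmented = []
    for word in docs[index]:
        if pool and scores[index][word] <= threshold and rng.random() < p:
            augmented.append(rng.choice(pool))
        else:
            # Informative (high-scoring) words are kept untouched.
            augmented.append(word)
    return augmented


corpus = [
    ["the", "movie", "was", "great"],
    ["the", "food", "was", "awful"],
    ["the", "weather", "was", "nice"],
]
print(tfidf_substitute(corpus, index=0, p=1.0))
```

In this toy corpus "the" and "was" occur in every document, so their score is zero and only they are eligible for replacement. A real setup would likely compute the scores over the full training set and tune the threshold.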

Back Translation

Text Surface Transformation

Random Noise Injection
- Misspelling injection

- QWERTY keyboard error injection

- Empty noise injection
- Random injection
Choose a random word in the sentence that is not a stop word. Then find one of its synonyms and insert it at a random position in the sentence.

- Sentence reorganization
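
The random injection step from the list above can be sketched as follows. This is a minimal illustration, not a reference implementation: `STOP_WORDS` and `SYNONYMS` are tiny hand-written stand-ins (a real pipeline would likely use a full stop-word list and a thesaurus such as WordNet), and the function name `random_insertion` and its parameters `n` and `seed` are hypothetical.

```python
import random

# Toy resources for illustration only (assumptions, not real lexicons).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to"}
SYNONYMS = {
    "movie": ["film", "picture"],
    "great": ["excellent", "superb"],
}


def random_insertion(tokens, n=1, seed=0):
    """Insert n synonyms of random non-stop words at random positions."""
    rng = random.Random(seed)
    augmented = list(tokens)
    # Only words that are not stop words and have a known synonym qualify.
    candidates = [t for t in tokens if t not in STOP_WORDS and t in SYNONYMS]
    if not candidates:
        return augmented
    for _ in range(n):
        word = rng.choice(candidates)
        synonym = rng.choice(SYNONYMS[word])
        # randint is inclusive on both ends, so the end of the list is allowed.
        position = rng.randint(0, len(augmented))
        augmented.insert(position, synonym)
    return augmented


print(random_insertion(["the", "movie", "was", "great"], n=2))
```

The original tokens keep their relative order; the sentence only grows by `n` inserted synonyms, which is why the label is expected to stay the same.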

Syntax Tree
