Data Augmentation in NLP
Word Substitution
- Synonym-based substitution
- Word embedding substitution
- Masked language model
- TF-IDF-based word substitution
The basic idea is that words with a low TF-IDF score carry little information, so they can be replaced without changing the ground-truth label of the sentence.
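The TF-IDF idea above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not a reference implementation: the function names `idf_scores` and `tfidf_replace`, the smoothed IDF formula, the `threshold` value, and the `pool` of replacement words are all hypothetical choices made for the example.

```python
import math
import random
from collections import Counter


def idf_scores(corpus):
    """Smoothed inverse document frequency for each token in a tokenized corpus."""
    n_docs = len(corpus)
    # Count in how many documents each token appears at least once.
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf_replace(tokens, idf, pool, threshold, rng):
    """Replace tokens whose TF-IDF score is below `threshold` with a word from `pool`.

    `pool` is an assumed list of uninformative replacement words;
    tokens scoring at or above the threshold are kept unchanged.
    """
    counts = Counter(tokens)
    augmented = []
    for tok in tokens:
        tf = counts[tok] / len(tokens)
        score = tf * idf.get(tok, 0.0)
        augmented.append(rng.choice(pool) if score < threshold else tok)
    return augmented


corpus = [
    ["the", "movie", "was", "great"],
    ["the", "plot", "was", "dull"],
    ["the", "acting", "was", "superb"],
]
idf = idf_scores(corpus)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(tfidf_replace(corpus[0], idf, ["a", "this", "is"], 0.3, rng))
```

In this toy corpus, "the" and "was" occur in every document, so they score below the threshold and get replaced, while "movie" and "great" are kept. In practice the threshold and the replacement pool would need tuning on the actual dataset.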
Back Translation
Text Surface Transformation
Random Noise Injection
- Misspelling injection
- QWERTY keyboard error injection
- Empty noise injection
- Random injection
Choose a random word in the sentence that is not a stop word. Then find one of its synonyms and insert it at a random position in the sentence.
- Sentence reorganization
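The random injection step described in the list above could look roughly like the following. It is a hedged sketch: `STOP_WORDS` and `SYNONYMS` are tiny hand-written stand-ins (assumptions for illustration) for a real stop-word list and a thesaurus such as WordNet, and `random_insertion` is a hypothetical helper name.

```python
import random

# Assumed toy resources; a real pipeline would load a full stop-word list
# and look synonyms up in a thesaurus.
STOP_WORDS = {"the", "a", "an", "is", "was", "of", "and", "to", "in"}
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
}


def random_insertion(tokens, synonyms, stop_words, rng):
    """Insert a synonym of one random non-stop word at a random position."""
    # Only words that are not stop words and have a known synonym qualify.
    candidates = [t for t in tokens if t not in stop_words and synonyms.get(t)]
    if not candidates:
        return list(tokens)  # nothing to augment; return an unchanged copy
    word = rng.choice(candidates)
    synonym = rng.choice(synonyms[word])
    # randint is inclusive on both ends, so len(tokens) means "append at the end".
    position = rng.randint(0, len(tokens))
    return tokens[:position] + [synonym] + tokens[position:]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sentence = ["the", "quick", "dog", "is", "happy"]
print(random_insertion(sentence, SYNONYMS, STOP_WORDS, rng))
```

The original tokens keep their order and exactly one synonym is added, so the sentence label is expected to stay the same. Calling the function several times with different seeds yields several augmented variants of one sentence.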
Syntax Tree