General and Domain-adaptive Chinese Spelling Check with Error-consistent Pretraining


Publisher
Association for Computing Machinery
Copyright
Copyright © 2023 Association for Computing Machinery.
ISSN
2375-4699
eISSN
2375-4702
DOI
10.1145/3564271

Abstract

The lack of labeled data is one of the major bottlenecks for Chinese Spelling Check (CSC). Existing research expands the supervised corpus by automatically generating training examples from unlabeled data. However, there is a large gap between real input scenarios and such automatically generated corpora. We therefore develop a competitive general speller, ECSpell, which adopts an error-consistent masking strategy to create pretraining data. This strategy constrains the error types of the automatically generated sentences to be consistent with those observed in real scenes. Experimental results indicate that our model outperforms previous state-of-the-art models on the general benchmark. Moreover, spellers often work within a particular domain in real life. Experiments on the domain-specific datasets we build show that general models perform poorly there, largely because of uncommon domain terms. Inspired by the common practice of input methods, we propose adding an alterable user dictionary to handle the zero-shot domain adaptation problem. Specifically, we attach a User Dictionary guided inference module (UD) to a general token classification based speller. Our experiments demonstrate that ECSpellUD, namely ECSpell combined with UD, surpasses all the other baselines by a wide margin, even approaching its performance on the general benchmark.
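
As a rough illustration of the error-consistent idea described in the abstract, the Python sketch below corrupts clean sentences so that the synthetic error types follow a target distribution. The confusion tables, the mixture weights, and the function name are illustrative assumptions, not values or code from the paper.

import random

# Minimal sketch of error-consistent corruption. The confusion sets and the
# error-type mixture below are placeholders; a real system would derive them
# from pinyin tables, glyph-similarity resources, and observed error statistics.
PHONETIC_CONFUSIONS = {"的": ["地", "得"], "在": ["再"], "做": ["作"]}
VISUAL_CONFUSIONS = {"己": ["已", "巳"], "未": ["末"]}
ERROR_TYPE_PROBS = {"phonetic": 0.8, "visual": 0.15, "keep": 0.05}

def corrupt_sentence(sentence, corruption_rate=0.15, seed=None):
    """Replace some characters with confusable ones, sampling the error type
    so the synthetic noise roughly matches a real error-type distribution."""
    rng = random.Random(seed)
    chars = list(sentence)
    for i, ch in enumerate(chars):
        if rng.random() >= corruption_rate:
            continue
        error_type = rng.choices(list(ERROR_TYPE_PROBS),
                                 weights=list(ERROR_TYPE_PROBS.values()))[0]
        if error_type == "phonetic" and ch in PHONETIC_CONFUSIONS:
            chars[i] = rng.choice(PHONETIC_CONFUSIONS[ch])
        elif error_type == "visual" and ch in VISUAL_CONFUSIONS:
            chars[i] = rng.choice(VISUAL_CONFUSIONS[ch])
        # "keep" (or a character with no confusion entry) is left unchanged.
    return "".join(chars)

if __name__ == "__main__":
    clean = "他已经在图书馆做完作业了"
    print(clean, "->", corrupt_sentence(clean, corruption_rate=0.3, seed=0))

Each (corrupted, original) pair could then serve as a pretraining example for a token-classification speller; the actual ECSpell data construction may differ in its confusion sources and sampling details.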

Journal

ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Association for Computing Machinery

Published: May 9, 2023

Keywords: Chinese spelling check
