[1] Sebastian Spiegler. Statistics of the Common Crawl Corpus 2012, June 2013. URL https://docs.google.com/file/d/1_9698uglerxB9nAglvaHkEgU-iZNm1TvVGuCW7245-WGvZq47teNpb_uL5N9.
R. Smith. An Overview of the Tesseract OCR Engine. In Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2, pages 629-633, Curitiba, Parana, Brazil, September 2007. IEEE. ISBN 978-0-7695-2822-9. doi: 10.1109/ICDAR.2007.4376991. URL http://ieeexplore.ieee.org/document/4376991/.
Kyle Lo, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Daniel Weld. S2ORC: The Semantic Scholar Open Research Corpus. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4969-4983, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.447. URL https://aclanthology.org/2020.acl-main.447.
[7] Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, and Furu Wei. TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models, September 2022. URL http://arxiv.org/abs/2109.10282. arXiv:2109.10282 [cs].
[8] Daniel Hernandez Diaz, Siyang Qin, Reeve Ingle, Yasuhisa Fujii, and Alessandro Bissacco. Rethinking Text Line Recognition Models, April 2021. URL http://arxiv.org/abs/2104.07787. arXiv:2104.07787 [cs].
Scott MacLean and George Labahn. A new approach for recognizing handwritten mathematics using relational grammars and fuzzy sets. International Journal on Document Analysis and Recognition (IJDAR), 16(2):139-163, June 2013. ISSN 1433-2825. doi: 10.1007/s10032-012-0184-x. URL https://doi.org/10.1007/s10032-012-0184-x.
Ahmad-Montaser Awal, Harold Mouchère, and Christian Viard-Gaudin. A global learning approach for an online handwritten mathematical expression recognition system. Pattern Recognition Letters, 35(C):68-77, January 2014. ISSN 0167-8655.
Francisco Álvaro, Joan-Andreu Sánchez, and José-Miguel Benedí. Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models. Pattern Recognition Letters, 35:58-67, January 2014. ISSN 0167-8655. doi: 10.1016/j.patrec.2012.09.023. URL https://www.sciencedirect.com/science/article/pii/S016786551200308X.
Anh Duc Le and Masaki Nakagawa. Training an End-to-End System for Handwritten Mathematical Expression Recognition by Generated Patterns. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), volume 01, pages 1056-1061, November 2017. doi: 10.1109/ICDAR.2017.175. ISSN: 2379-2140.
Sumeet S. Singh. Teaching Machines to Code: Neural Markup Generation with Visual Attention, June 2018. URL http://arxiv.org/abs/1802.05415. arXiv:1802.05415 [cs].
Jianshu Zhang, Jun Du, and Lirong Dai. Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition, January 2018. URL http://arxiv.org/abs/1801.03530. arXiv:1801.03530 [cs].
[21] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention Is All You Need, December 2017. URL http://arxiv.org/abs/1706.03762. arXiv:1706.03762 [cs].
[23] Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, and Lidong Zhou. LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding, January 2022. URL http://arxiv.org/abs/2012.14740. arXiv:2012.14740 [cs].
[26] Srikar Appalaraju, Bhavan Jasani, Bhargava Urala Kota, Yusheng Xie, and R. Manmatha. DocFormer: End-to-End Transformer for Document Understanding, September 2021. URL http://arxiv.org/abs/2106.11539. arXiv:2106.11539 [cs].
Bodhisattwa Prasad Majumder, Navneet Potti, Sandeep Tata, James Bradley Wendt, Qi Zhao, and Marc Najork. Representation Learning for Information Extraction from Form-like Documents. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6495-6504, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.580. URL https://aclanthology.org/2020.acl-main.580.
[28] Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, and Seunghyun Park. OCR-free Document Understanding Transformer, October 2022. URL http://arxiv.org/abs/2111.15664. arXiv:2111.15664 [cs].
Brian Davis, Bryan Morse, Brian Price, Chris Tensmeyer, Curtis Wigington, and Vlad Morariu. End-to-end Document Recognition and Understanding with Dessurt, June 2022. URL http://arxiv.org/abs/2203.16618. arXiv:2203.16618 [cs].
[30] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, August 2021. URL http://arxiv.org/abs/2103.14030. arXiv:2103.14030 [cs].
[31] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, June 2021. URL http://arxiv.org/abs/2010.11929. arXiv:2010.11929 [cs].
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, October 2019. URL http://arxiv.org/abs/1910.13461. arXiv:1910.13461 [cs, stat].
Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic. Galactica: A Large Language Model for Science, November 2022. URL http://arxiv.org/abs/2211.09085. arXiv:2211.09085 [cs, stat].
[34] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization, January 2019. URL http://arxiv.org/abs/1711.05101. arXiv:1711.05101 [cs, math] version: 3.
[36] Alexander Buslaev, Vladimir I. Iglovikov, Eugene Khvedchenya, Alex Parinov, Mikhail Druzhinin, and Alexandr A. Kalinin. Albumentations: Fast and Flexible Image Augmentations. Information, 11(2):125, February 2020. ISSN 2078-2489. doi: 10.3390/info11020125. URL https://www.mdpi.com/2078-2489/11/2/125.
Ali Furkan Biten, Rubèn Tito, Lluis Gomez, Ernest Valveny, and Dimosthenis Karatzas. OCR-IDL: OCR Annotations for Industry Document Library Dataset, February 2022. URL http://arxiv.org/abs/2202.12985. arXiv:2202.12985 [cs].
Christopher Clark and Santosh Divvala. PDFFigures 2.0: Mining Figures from Research Papers. In Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 143-152, Newark, New Jersey, USA, June 2016. ACM. ISBN 978-1-4503-4229-2. doi: 10.1145/2910896.2910904. URL https://dl.acm.org/doi/10.1145/2910896.2910904.
V. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 1965. URL https://www.semanticscholar.org/paper/Binary-codes-capable-of-correcting-deletions%2C-and-Levenshtein/b2f8876482c97e804bb50a5e2433881ae31d0cdd.
Zellig S. Harris. Distributional Structure. WORD, 10(2-3):146-162, 1954. doi: 10.1080/00437956.1954.11659520. URL https://doi.org/10.1080/00437956.1954.11659520. Publisher: Routledge.
[41] Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, and Ari S. Morcos. Beyond neural scaling laws: beating power law scaling via data pruning, November 2022. URL http://arxiv.org/abs/2206.14486. arXiv:2206.14486 [cs, stat].