Zeyu Han, Chao Gao, Jinyang Liu, Jeff (Jun) Zhang, and Sai Qian Zhang — Northeastern University, University of California, Riverside, Arizona State University, New York University. {han.zeyu,liu.jinyan}@northeastern.edu, cgao037@ucr.edu, jeffzhang@asu.edu, sai.zhang@nyu.edu
Image Classification: Image classification on a target visual dataset is a very common need with a wide range of applications, and several methods leverage PEFT techniques for efficient model adaptation [186], [182], [187], [188]. For example, AdaptFormer [187] inserts adapter modules in parallel with the FFN of the original ViT model for visual recognition tasks. VPT (Visual Prompt Tuning) [186] prepends a small number of task-specific parameters to the input sequence of each Transformer layer. When applying a ViT to downstream tasks, only these added parameters and the classification head are set to be trainable. The work in [189] observes that VPT often underperforms on self-supervised ViTs compared with supervised ViTs. Further analysis shows that different pre-training methods and downstream tasks depend to varying degrees on Transformer blocks at different positions. To address this, the study introduces tunable gates for the ViT blocks. These gates dynamically modulate the contribution of the prompt tokens to each ViT block, enabling a more targeted adaptation of the model to the task at hand.
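The core mechanism of VPT-style methods can be sketched in a few lines: learnable prompt tokens are concatenated in front of the frozen patch-token sequence, and only those prompts (plus the head) receive gradients. This is an illustrative numpy sketch, not the paper's implementation; the shapes (ViT-Base, 14×14 patches, 10 prompts) are assumptions for the example.

```python
import numpy as np

def prepend_prompts(patch_tokens, prompt_tokens):
    """VPT-style input construction (illustrative sketch).

    patch_tokens:  (seq_len, d) patch embeddings from the frozen ViT stem.
    prompt_tokens: (n_prompts, d) learnable, task-specific parameters --
                   the only part of the input updated during fine-tuning.
    """
    return np.concatenate([prompt_tokens, patch_tokens], axis=0)

rng = np.random.default_rng(0)
d = 768                                 # assumed ViT-Base hidden size
patches = rng.normal(size=(196, d))     # 14x14 patch tokens (frozen backbone)
prompts = np.zeros((10, d))             # 10 trainable prompt tokens
x = prepend_prompts(patches, prompts)
print(x.shape)                          # (206, 768): prompts + patches
```

The extended sequence is then processed by the unchanged Transformer layers, so the trainable parameter count is only `n_prompts × d` per layer plus the classification head.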
Video Recognition: Several works consider the more challenging adaptation problem of transferring a ViT to downstream tasks with a larger domain gap. For example, ST-Adapter (Spatio-Temporal Adapter) [190] and AIM [191] both insert adapter layers into pre-trained ViT blocks. Their main goal is to model spatio-temporal information, enabling efficient adaptation of ViTs from image models to video tasks. Notably, both methods outperform conventional full-model fine-tuning.
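The adapters used by such methods are small bottleneck modules inserted into each frozen block. A minimal sketch of the generic bottleneck form (down-project, nonlinearity, up-project, residual) is shown below; it is a simplified stand-in, not the exact ST-Adapter/AIM code (ST-Adapter additionally uses a depthwise 3D convolution in the bottleneck), and the dimensions are assumptions for illustration.

```python
import numpy as np

class BottleneckAdapter:
    """Generic bottleneck adapter (illustrative sketch).

    Down-projects hidden states to a small rank r, applies a ReLU,
    up-projects back to d, and adds a residual connection. With the
    up-projection zero-initialized, the adapter is an identity map at
    the start of training, so tuning begins from the pre-trained model.
    """
    def __init__(self, d, r, rng):
        self.w_down = rng.normal(scale=0.02, size=(d, r))  # d*r params
        self.w_up = np.zeros((r, d))                       # zero-init

    def __call__(self, h):
        return h + np.maximum(h @ self.w_down, 0.0) @ self.w_up

rng = np.random.default_rng(0)
adapter = BottleneckAdapter(d=768, r=8, rng=rng)  # ~12K trainable params
h = rng.normal(size=(196, 768))                   # token representations
out = adapter(h)                                  # identity at initialization
```

Only the `w_down`/`w_up` matrices are trained, which is why the per-block budget stays tiny compared with the roughly 7M parameters of a ViT-Base block.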
Diffusion models [219], [220] are a class of generative models that learn to generate data by transforming random noise into structured outputs through a progressive denoising process. During training, a diffusion model uses a denoising network to learn to reverse the noise added to the training data; during inference, it starts from noise and iteratively applies the denoising network to create data that follows the same distribution as the training examples. Diffusion models have a wide range of applications [221], [222], [223], [224], [225], the most prominent being Stable Diffusion [226], which bridges text and images with its strong ability to generate coherent and contextually relevant images directly from text descriptions. Many studies leverage PEFT techniques to adapt pre-trained diffusion models to downstream tasks, including accelerating sampling [227], [228], text-to-video adaptation [229], [230], and text-to-3D adaptation [231]. This section focuses on two scenarios: integrating additional input modalities beyond text-based conditioning, and customized content generation based on pre-trained diffusion models.
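The forward (noising) half of this process has a closed form, which is what the denoising network is trained against. The sketch below shows the standard DDPM-style noising step; the schedule values (1000 steps, betas from 1e-4 to 0.02) are common defaults assumed for illustration, and the denoiser itself is omitted since PEFT methods would fine-tune only small modules of that pre-trained network.

```python
import numpy as np

def add_noise(x0, t, alphas_bar, rng):
    """DDPM-style forward process (sketch):
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps,  eps ~ N(0, I).

    Training teaches the denoising network to predict eps from (x_t, t);
    inference starts from pure noise and applies the network iteratively.
    """
    eps = rng.normal(size=x0.shape)
    a = alphas_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps, eps

betas = np.linspace(1e-4, 0.02, 1000)      # assumed linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)       # cumulative signal retention
rng = np.random.default_rng(0)
x0 = rng.normal(size=(3, 64, 64))          # stand-in for a training image
xt, eps = add_noise(x0, 999, alphas_bar, rng)  # nearly pure noise at t=999
```

Because `alphas_bar` decays toward zero, late-timestep samples are almost pure noise, which is the starting point of the iterative generation loop described above.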
[1] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., "Language models are few-shot learners," Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
[2] Y. Zhuang, Y. Yu, K. Wang, H. Sun, and C. Zhang, "Toolqa: A dataset for question answering with external tools," arXiv preprint arXiv:2306.13304, 2023.
[3] W. Zhu, H. Liu, Q. Dong, J. Xu, L. Kong, J. Chen, L. Li, and S. Huang, "Multilingual machine translation with large language models: Empirical results and analysis," arXiv preprint arXiv:2304.04675, 2023.
[4] M. U. Hadi, R. Qureshi, A. Shah, M. Irfan, A. Zafar, M. Shaikh, N. Akhtar, J. Wu, and S. Mirjalili, "A survey on large language models: Applications, challenges, limitations, and practical usage," TechRxiv, 2023.
[5] B. Xu, X. Liu, H. Shen, Z. Han, Y. Li, M. Yue, Z. Peng, Y. Liu, Z. Yao, and D. Xu, "Gentopia: A collaborative platform for tool-augmented llms," arXiv preprint arXiv:2308.04030, 2023.
[6] G. Li, H. A. A. K. Hammoud, H. Itani, D. Khizbullin, and B. Ghanem, "Camel: Communicative agents for 'mind' exploration of large language model society," in Thirty-seventh Conference on Neural Information Processing Systems, 2023.
[8] H. Zhang, X. Liu, and J. Zhang, "Summit: Iterative text summarization via chatgpt," arXiv preprint arXiv:2305.14835, 2023.
[9] B. Zhang and R. Sennrich, "Root mean square layer normalization," Advances in Neural Information Processing Systems, vol. 32, 2019.
[10] J. Su, Y. Lu, S. Pan, A. Murtadha, B. Wen, and Y. Liu, "Roformer: Enhanced transformer with rotary position embedding," arXiv preprint arXiv:2104.09864, 2021.
[11] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman, "Glue: A multi-task benchmark and analysis platform for natural language understanding," arXiv preprint arXiv:1804.07461, 2018.
[12] T. Mihaylov, P. Clark, T. Khot, and A. Sabharwal, "Can a suit of armor conduct electricity? a new dataset for open book question answering," in EMNLP, 2018.
[13] Y. Bisk, R. Zellers, R. L. Bras, J. Gao, and Y. Choi, "Piqa: Reasoning about physical commonsense in natural language," in Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020.
[14] M. Sap, H. Rashkin, D. Chen, R. LeBras, and Y. Choi, "Socialiqa: Commonsense reasoning about social interactions," arXiv preprint arXiv:1904.09728, 2019.
[15] R. Zellers, A. Holtzman, Y. Bisk, A. Farhadi, and Y. Choi, "Hellaswag: Can a machine really finish your sentence?" in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019.
[16] C. Clark et al., "Boolq: Exploring the surprising difficulty of natural yes/no questions," in NAACL, 2019.
[17] K. Sakaguchi, R. L. Bras, C. Bhagavatula, and Y. Choi, "Winogrande: An adversarial winograd schema challenge at scale," Communications of the ACM, vol. 64, no. 9, pp. 99–106, 2021.
[18] P. Clark, I. Cowhey, O. Etzioni, T. Khot, A. Sabharwal, C. Schoenick, and O. Tafjord, "Think you have solved question answering? try arc, the ai2 reasoning challenge," arXiv:1803.05457v1, 2018.
[21] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, "Hmdb: a large video database for human motion recognition," in 2011 International Conference on Computer Vision. IEEE, 2011.
[22] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft coco: Common objects in context," in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 13. Springer, 2014.
[23] B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso, and A. Torralba, "Scene parsing through ade20k dataset," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[24] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The pascal visual object classes (voc) challenge," International Journal of Computer Vision, vol. 88, pp. 303–338, 2010.
[25] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, "Parameter-efficient transfer learning for nlp," in International Conference on Machine Learning. PMLR, 2019.
[26] J. He, C. Zhou, X. Ma, T. Berg-Kirkpatrick, and G. Neubig, "Towards a unified view of parameter-efficient transfer learning," arXiv preprint arXiv:2110.04366, 2021.
[27] Y. Zhu, J. Feng, C. Zhao, M. Wang, and L. Li, "Counter-interference adapter for multilingual machine translation," arXiv preprint arXiv:2104.08154, 2021.
[28] T. Lei, J. Bai, S. Brahma, J. Ainslie, K. Lee, Y. Zhou, N. Du, V. Y. Zhao, Y. Wu, B. Li et al., "Conditional adapters: Parameter-efficient transfer learning with fast inference," arXiv preprint arXiv:2304.04947, 2023.
[29] J. Pfeiffer, A. Kamath, A. Rücklé, K. Cho, and I. Gurevych, "Adapterfusion: Non-destructive task composition for transfer learning," arXiv preprint arXiv:2005.00247, 2020.
[30] Y. Wang, S. Mukherjee, X. Liu, J. Gao, A. H. Awadallah, and J. Gao, "Adamix: Mixture-of-adapter for parameter-efficient tuning of large language models," arXiv preprint arXiv:2205.12410, vol. 1, no. 2, p. 4, 2022.
[31] H. Zhao, J. Fu, and Z. He, "Prototype-based hyperadapter for sample-efficient multi-task tuning," arXiv preprint arXiv:2310.11670, 2023.
[32] A. Chronopoulou, M. E. Peters, A. Fraser, and J. Dodge, "Adaptersoup: Weight averaging to improve generalization of pretrained language models," arXiv preprint arXiv:2302.07027, 2023.
[33] S. He, R.-Z. Fan, L. Ding, L. Shen, T. Zhou, and D. Tao, "Mera: Merging pretrained adapters for few-shot learning," arXiv preprint arXiv:2308.15982, 2023.
[34] R. K. Mahabadi, S. Ruder, M. Dehghani, and J. Henderson, "Parameter-efficient multi-task fine-tuning for transformers via shared hypernetworks," arXiv preprint arXiv:2106.04489, 2021.
[35] X. L. Li and P. Liang, "Prefix-tuning: Optimizing continuous prompts for generation," arXiv preprint arXiv:2101.00190, 2021.
[36] J. Li, W. Aitken, R. Bhambhoria, and X. Zhu, "Prefix propagation: Parameter-efficient tuning for long sequences," arXiv preprint arXiv:2305.12086, 2023.
[37] X. Liu, K. Ji, Y. Fu, W. L. Tam, Z. Du, Z. Yang, and J. Tang, "P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks," arXiv preprint arXiv:2110.07602, 2021.
[38] Z.-R. Zhang, C. Tan, H. Xu, C. Wang, J. Huang, and S. Huang, "Towards adaptive prefix tuning for parameter-efficient language model fine-tuning," arXiv preprint arXiv:2305.15212, 2023.
[39] X. Liu, Y. Zheng, Z. Du, M. Ding, Y. Qian, Z. Yang, and J. Tang, "Gpt understands, too," arXiv preprint arXiv:2103.10385, 2021.
[40] B. Lester, R. Al-Rfou, and N. Constant, "The power of scale for parameter-efficient prompt tuning," arXiv preprint arXiv:2104.08691, 2021.
[41] F. Ma, C. Zhang, L. Ren, J. Wang, Q. Wang, W. Wu, X. Quan, and D. Song, "Xprompt: Exploring the extreme of prompt tuning," arXiv preprint arXiv:2210.04457, 2022.
[42] Z. Wu, S. Wang, J. Gu, R. Hou, Y. Dong, V. Vydiswaran, and H. Ma, "Idpg: An instance-dependent prompt generation method," arXiv preprint arXiv:2204.04497, 2022.
[43] X. Liu, T. Sun, X. Huang, and X. Qiu, "Late prompt tuning: A late prompt could be better than many prompts," arXiv preprint arXiv:2210.11292, 2022.
[44] W. Zhu and M. Tan, "Spt: Learning to selectively insert prompts for better prompt tuning," in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023.
[46] T. Vu, B. Lester, N. Constant, R. Al-Rfou, and D. Cer, "Spot: Better frozen model adaptation through soft prompt transfer," arXiv preprint arXiv:2110.07904, 2021.
[47] Y. Su, X. Wang, Y. Qin, C.-M. Chan, Y. Lin, H. Wang, K. Wen, Z. Liu, P. Li, J. Li et al., "On transferability of prompt tuning for natural language processing," arXiv preprint arXiv:2111.06719, 2021.
[48] J. Wu, T. Yu, R. Wang, Z. Song, R. Zhang, H. Zhao, C. Lu, S. Li, and R. Henao, "Infoprompt: Information-theoretic soft prompt tuning for natural language understanding," arXiv preprint arXiv:2306.04933, 2023.
[49] L. Chen, H. Huang, and M. Cheng, "Ptp: Boosting stability and performance of prompt tuning with perturbation-based regularizer," arXiv preprint arXiv:2305.02423, 2023.
[51] J.-Y. Choi, J. Kim, J.-H. Park, W.-L. Mok, and S. Lee, "Smop: Towards efficient and effective prompt tuning with sparse mixture-of-prompts," in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023.
[52] Z. Shi and A. Lipani, "Dept: Decomposed prompt tuning for parameter-efficient fine-tuning," arXiv preprint arXiv:2309.05173, 2023.
[53] H. Liu, D. Tam, M. Muqeeth, J. Mohta, T. Huang, M. Bansal, and C. A. Raffel, "Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning," Advances in Neural Information Processing Systems, vol. 35, pp.
[54] T. Zadouri, A. Üstün, A. Ahmadian, B. Ermiş, A. Locatelli, and S. Hooker, "Pushing mixture of experts to the limit: Extremely parameter efficient moe for instruction tuning," arXiv preprint arXiv:2309.05444, 2023.
[55] D. Lian, D. Zhou, J. Feng, and X. Wang, "Scaling & shifting your features: A new baseline for efficient model tuning," Advances in Neural Information Processing Systems, vol. 35, 2022.
[57] D. Guo, A. M. Rush, and Y. Kim, "Parameter-efficient transfer learning with diff pruning," arXiv preprint arXiv:2012.07463, 2020.
[58] N. Lawton, A. Kumar, G. Thattai, A. Galstyan, and G. V. Steeg, "Neural architecture search for parameter-efficient fine-tuning of large pre-trained language models," arXiv preprint arXiv:2305.16597, 2023.
[59] B. Liao, Y. Meng, and C. Monz, "Parameter-efficient fine-tuning without introducing new latency," arXiv preprint arXiv:2305.16742, 2023.
[60] Y.-L. Sung, V. Nair, and C. A. Raffel, "Training neural networks with fixed sparse masks," Advances in Neural Information Processing Systems, vol. 34, pp. 24193–24205, 2021.
[61] S. S. S. Das, R. H. Zhang, P. Shi, W. Yin, and R. Zhang, "Unified low-resource sequence labeling by sample-aware dynamic sparse finetuning," arXiv preprint arXiv:2311.03748, 2023.
[62] A. Ansell, E. M. Ponti, A. Korhonen, and I. Vulić, "Composable sparse fine-tuning for cross-lingual transfer," arXiv preprint arXiv:2110.07560, 2021.
[63] Z. Fu, H. Yang, A. M.-C. So, W. Lam, L. Bing, and N. Collier, "On the effectiveness of parameter-efficient fine-tuning," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 11, 2023.
[64] R. Xu, F. Luo, Z. Zhang, C. Tan, B. Chang, S. Huang, and F. Huang, "Raise a child in large language model: Towards effective and generalizable fine-tuning," arXiv preprint arXiv:2109.05687, 2021.
[65] D. Vucetic, M. Tayaranian, M. Ziaeefard, J. J. Clark, B. H. Meyer, and W. J. Gross, "Efficient fine-tuning of bert models on the edge," in 2022 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2022.
[66] E. B. Zaken, S. Ravfogel, and Y. Goldberg, "Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models," arXiv preprint arXiv:2106.10199, 2021.
[67] M. Gheini, X. Ren, and J. May, "Cross-attention is all you need: Adapting pretrained transformers for machine translation," arXiv preprint arXiv:2104.08771, 2021.
[68] H. He, J. Cai, J. Zhang, D. Tao, and B. Zhuang, "Sensitivity-aware visual parameter-efficient fine-tuning," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
[69] A. Aghajanyan, L. Zettlemoyer, and S. Gupta, "Intrinsic dimensionality explains the effectiveness of language model fine-tuning," arXiv preprint arXiv:2012.13255, 2020.
[70] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "Lora: Low-rank adaptation of large language models," arXiv preprint arXiv:2106.09685, 2021.
[71] R. Karimi Mahabadi, J. Henderson, and S. Ruder, "Compacter: Efficient low-rank hypercomplex adapter layers," Advances in Neural Information Processing Systems, vol. 34, 2021.
[72] A. Edalati, M. Tahaei, I. Kobyzev, V. P. Nia, J. J. Clark, and M. Rezagholizadeh, "Krona: Parameter efficient tuning with kronecker adapter," arXiv preprint arXiv:2212.10650, 2022.
[73] X. He, C. Li, P. Zhang, J. Yang, and X. E. Wang, "Parameter-efficient model adaptation for vision transformers," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 1, 2023.
[74] D. J. Kopiczko, T. Blankevoort, and Y. M. Asano, "Vera: Vector-based random matrix adaptation," arXiv preprint arXiv:2310.11454, 2023.
[75] S.-Y. Liu, C.-Y. Wang, H. Yin, P. Molchanov, Y.-C. F. Wang, K.-T. Cheng, and M.-H. Chen, "Dora: Weight-decomposed low-rank adaptation," arXiv preprint arXiv:2402.09353, 2024.
[76] M. Valipour, M. Rezagholizadeh, I. Kobyzev, and A. Ghodsi, "Dylora: Parameter efficient tuning of pre-trained models using dynamic search-free low-rank adaptation," arXiv preprint arXiv:2210.07558, 2022.
[77] Q. Zhang, M. Chen, A. Bukharin, P. He, Y. Cheng, W. Chen, and T. Zhao, "Adaptive budget allocation for parameter-efficient finetuning," arXiv preprint arXiv:2303.10512, 2023.
[78] N. Ding, X. Lv, Q. Wang, Y. Chen, B. Zhou, Z. Liu, and M. Sun, "Sparse low-rank adaptation of pre-trained language models," arXiv preprint arXiv:2311.11696, 2023.
[79] S. Haobo, H. Zhao, S. Majumder, and T. Lin, "Increasing model capacity for free: A simple strategy for parameter efficient fine-tuning," in The Twelfth International Conference on Learning Representations, 2023.
[80] R. Zhang, R. Qiang, S. A. Somayajula, and P. Xie, "Autolora: Automatically tuning matrix ranks in low-rank adaptation based on meta learning," arXiv preprint arXiv:2403.09113, 2024.
[81] A. X. Yang, M. Robeyns, X. Wang, and L. Aitchison, "Bayesian lowrank adaptation for large language models," arXiv preprint
[82] Y. Lin, X. Ma, X. Chu, Y. Jin, Z. Yang, Y. Wang, and H. Mei, "Lora dropout as a sparsity regularizer for overfitting control," arXiv preprint arXiv:2404.09610, 2024
[84] S. Hayou, N. Ghosh, and B. Yu, "Lora+: Efficient low rank adaptation of large models," arXiv preprint arXiv:2402.12354, 2024.
[85] C. Huang, Q. Liu, B. Y. Lin, T. Pang, C. Du, and M. Lin, "Lorahub: Efficient cross-task generalization via dynamic lora composition," arXiv preprint arXiv:2307.13269, 2023.
[87] W. Feng, C. Hao, Y. Zhang, Y. Han, and H. Wang, "Mixture-of-loras: An efficient multitask tuning for large language models," arXiv preprint arXiv:2403.03432, 2024.
[88] X. Wu, S. Huang, and F. Wei, "Mixture of lora experts," arXiv preprint arXiv:2404.13628, 2024.
[89] D. Li, Y. Ma, N. Wang, Z. Cheng, L. Duan, J. Zuo, C. Yang, and M. Tang, "Mixlora: Enhancing large language models fine-tuning with lora-based mixture of experts," arXiv preprint arXiv:2404.15159, 2024.
[90] Y. Mao, L. Mathias, R. Hou, A. Almahairi, H. Ma, J. Han, W.-t. Yih, and M. Khabsa, "Unipelt: A unified framework for parameter-efficient language model tuning," arXiv preprint arXiv:2110.07577, 2021.
[91] J. Chen, A. Zhang, X. Shi, M. Li, A. Smola, and D. Yang, "Parameterefficient fine-tuning design spaces," arXiv preprint arXiv:2301.01821, 2023.
[92] Y. Zhang, K. Zhou, and Z. Liu, "Neural prompt search," 2022.
[93] H. Zhou, X. Wan, I. Vulić, and A. Korhonen, "Autopeft: Automatic configuration search for parameter-efficient fine-tuning," arXiv preprint arXiv:2301.12132, 2023.
[94] Z. Hu, Y. Lan, L. Wang, W. Xu, E.-P. Lim, R. K.-W. Lee, L. Bing, and S. Poria, "Llm-adapters: An adapter family for parameter-efficient fine-tuning of large language models," arXiv preprint arXiv:2304.01933, 2023.
[95] S. Hu, Z. Zhang, N. Ding, Y. Wang, Y. Wang, Z. Liu, and M. Sun, "Sparse structure search for parameter-efficient tuning," arXiv preprint arXiv:2206.07382, 2022.
[96] A. Petrov, P. H. Torr, and A. Bibi, "When do prompting and prefixtuning work? a theory of capabilities and limitations," arXiv preprint arXiv:2310.19698, 2023.
[97] Y. Wang, J. Chauhan, W. Wang, and C.-J. Hsieh, "Universality and limitations of prompt tuning," arXiv preprint arXiv:2305.18787, 2023.
[98] Y. Choi and J.-H. Lee, "Codeprompt: Task-discriminative prefix-tuning for program and language generation," in Findings of the Association for Computational Linguistics: ACL 2023, 2023, pp. 5282–5297.
[99] H. Wu and X. Shi, "Adversarial soft prompt tuning for cross-domain sentiment analysis," in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022.
[100] J. Frankle and M. Carbin, "The lottery ticket hypothesis: Finding sparse, trainable neural networks," arXiv preprint arXiv:1803.03635, 2018.
[101] E. Malach, G. Yehudai, S. Shalev-Shwartz, and O. Shamir, "Proving the lottery ticket hypothesis: Pruning is all you need," in International Conference on Machine Learning. PMLR, 2020.
[102] V. Fomenko, H. Yu, J. Lee, S. Hsieh, and W. Chen, "A note on lora," arXiv preprint arXiv:2404.05086, 2024.
[103] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM journal on imaging sciences, vol. 2, no. 1, pp.
[104] A. Chambolle, R. A. De Vore, N.-Y. Lee, and B. J. Lucier, "Nonlinear wavelet image processing: variational problems, compression, and noise removal through wavelet shrinkage," IEEE Transactions on Image Processing, vol. 7, no. 3, 1998.
[105] D. J. MacKay, "A practical bayesian framework for backpropagation networks," Neural Computation, vol. 4, no. 3, 1992.
[106] J. Antorán, D. Janz, J. U. Allingham, E. Daxberger, R. R. Barbano, E. Nalisnick, and J. M. Hernández-Lobato, "Adapting the linearised laplace model evidence for modern deep learning," in International Conference on Machine Learning. PMLR, 2022, pp. 796–821.
[107] J. Liu, A. Moreau, M. Preuss, J. Rapin, B. Roziere, F. Teytaud, and O. Teytaud, "Versatile black-box optimization," in Proceedings of the 2020 Genetic and Evolutionary Computation Conference, 2020.
[108] M. Chen, H. Peng, J. Fu, and H. Ling, "Autoformer: Searching transformers for visual recognition," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 12270–12280.
[109] P. I. Frazier, "A tutorial on bayesian optimization," arXiv preprint arXiv:1807.02811, 2018.
[110] A. Rücklé, G. Geigle, M. Glockner, T. Beck, J. Pfeiffer, N. Reimers, and I. Gurevych, "Adapterdrop: On the efficiency of adapters in transformers," arXiv preprint arXiv:2010.11918, 2020.
[111] S. He, L. Ding, D. Dong, J. Zhang, and D. Tao, "SparseAdapter: An easy approach for improving the parameter-efficiency of adapters," in Findings of the Association for Computational Linguistics: EMNLP 2022. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics, Dec. 2022, pp. 2184–2190. [Online]. Available: https://aclanthology.org/2022.findings-emnlp.160
[112] L. Hedegaard, A. Alok, J. Jose, and A. Iosifidis, "Structured pruning adapters," arXiv preprint arXiv:2211.10155, 2022.
[114] G. Zeng, P. Zhang, and W. Lu, "One network, many masks: Towards more parameter-efficient transfer learning," arXiv preprint arXiv:2305.17682, 2023.
[115] S. Jie, H. Wang, and Z.-H. Deng, "Revisiting the parameter efficiency of adapters from the perspective of precision redundancy," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
[116] J. Kim, J. H. Lee, S. Kim, J. Park, K. M. Yoo, S. J. Kwon, and D. Lee, "Memory-efficient fine-tuning of compressed large language models via sub-4-bit integer quantization," arXiv preprint arXiv:2305.14152, 2023.
[117] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, "Qlora: Efficient finetuning of quantized llms," arXiv preprint arXiv:2305.14314, 2023.
[118] Y. Li, Y. Yu, C. Liang, P. He, N. Karampatziakis, W. Chen, and T. Zhao, "Loftq: Lora-fine-tuning-aware quantization for large language models," arXiv preprint arXiv:2310.08659, 2023.
[119] H. Guo, P. Greengard, E. P. Xing, and Y. Kim, "Lq-lora: Low-rank plus quantized matrix decomposition for efficient language model finetuning," arXiv preprint arXiv:2311.12023, 2023.
[120] Y. Xu, L. Xie, X. Gu, X. Chen, H. Chang, H. Zhang, Z. Chen, X. Zhang, and Q. Tian, "Qa-lora: Quantization-aware low-rank adaptation of large language models," arXiv preprint arXiv:2309.14717, 2023.
[121] Y. Chai, J. Gkountouras, G. G. Ko, D. Brooks, and G.-Y. Wei, "Int2.1: Towards fine-tunable quantized large language models with error correction through low-rank adaptation," arXiv preprint arXiv:2306.08162, 2023.
[122] H. Rajabzadeh, M. Valipour, T. Zhu, M. Tahaei, H. J. Kwon, A. Ghodsi, B. Chen, and M. Rezagholizadeh, "Qdylora: Quantized dynamic low-rank adaptation for efficient large language model tuning," arXiv preprint arXiv:2402.10462, 2024.
[123] J. Liu, G. Xiao, K. Li, J. D. Lee, S. Han, T. Dao, and T. Cai, "Bitdelta: Your fine-tune may only be worth one bit," arXiv preprint arXiv:2402.10193, 2024.
[124] J. O. Zhang, A. Sax, A. Zamir, L. Guibas, and J. Malik, "Side-tuning: a baseline for network adaptation via additive side networks," in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16. Springer, 2020.
[125] Y.-L. Sung, J. Cho, and M. Bansal, "Lst: Ladder side-tuning for parameter and memory efficient transfer learning," Advances in Neural Information Processing Systems, vol. 35, 2022.
[126] Z. Jiang, C. Mao, Z. Huang, A. Ma, Y. Lv, Y. Shen, D. Zhao, and J. Zhou, "Res-tuning: A flexible and efficient tuning paradigm via unbinding tuner from the backbone," arXiv preprint arXiv:2310.19859, 2023.
[127] B. Liao, S. Tan, and C. Monz, "Make your pre-trained model reversible: From parameter to memory efficient fine-tuning," arXiv preprint arXiv:2306.00477, 2023.
[128] L. Zhang, L. Zhang, S. Shi, X. Chu, and B. Li, "Lora-fa: Memory-efficient low-rank adaptation for large language models fine-tuning," arXiv preprint arXiv:2308.03303, 2023.
[129] J. Phang, Y. Mao, P. He, and W. Chen, "Hypertuning: Toward adapting large language models without back-propagation," in International Conference on Machine Learning. PMLR, 2023.
[130] F. Jin, J. Zhang, and C. Zong, "Parameter-efficient tuning for large language model without calculating its gradients," in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023.
[131] S. Malladi, T. Gao, E. Nichani, A. Damian, J. D. Lee, D. Chen, and S. Arora, "Fine-tuning language models with just forward passes," arXiv preprint arXiv:2305.17333, 2023.
[132] J. Zhao, Z. Zhang, B. Chen, Z. Wang, A. Anandkumar, and Y. Tian, "Galore: Memory-efficient llm training by gradient low-rank projection," arXiv preprint arXiv:2403.03507, 2024.
[133] W. Kwon, Z. Li, S. Zhuang, Y. Sheng, L. Zheng, C. H. Yu, J. Gonzalez, H. Zhang, and I. Stoica, "Efficient memory management for large language model serving with pagedattention," in Proceedings of the 29th Symposium on Operating Systems Principles, 2023, pp. 611–626.
[134] Y. Sheng, L. Zheng, B. Yuan, Z. Li, M. Ryabinin, B. Chen, P. Liang, C. Ré, I. Stoica, and C. Zhang, "Flexgen: High-throughput generative inference of large language models with a single gpu," in International Conference on Machine Learning. PMLR, 2023.
[135] T. Zhou and D. Tao, "Godec: Randomized low-rank & sparse matrix decomposition in noisy case," in Proceedings of the 28th International Conference on Machine Learning, ICML 2011, 2011.
[136] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma, "Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization," Advances in Neural Information Processing Systems, vol. 22, 2009.
[137] A. N. Gomez, M. Ren, R. Urtasun, and R. B. Grosse, "The reversible residual network: Backpropagation without storing activations," Advances in Neural Information Processing Systems, vol. 30, 2017.
[138] Y. Huang, Y. Li, Y. Xu, L. Zhang, R. Gan, J. Zhang, and L. Wang, "Mvp-tuning: Multi-view knowledge retrieval with prompt tuning for commonsense reasoning," in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 13417–13432.
[139] Z. Zhao, L. Hu, H. Zhao, Y. Shao, and Y. Wang, "Knowledgeable parameter efficient tuning network for commonsense question answering," in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023.
[140] H. Zhao, R. He, M. Xiao, and J. Xu, "Infusing hierarchical guidance into prompt tuning: A parameter-efficient framework for multi-level implicit discourse relation recognition," in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 6477–6492.
[141] Y. Ouyang, Y. Cao, Y. Gao, Z. Wu, J. Zhang, and X. Dai, "On prefix-tuning for lightweight out-of-distribution detection," in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023.
[142] M. S. Ozdayi, C. Peris, J. Fitzgerald, C. Dupuy, J. Majmudar, H. Khan, R. Parikh, and R. Gupta, "Controlling the extraction of memorized data from large language models via prompt-tuning," arXiv preprint arXiv:2305.11759, 2023.
[143] G. Xiao, J. Lin, and S. Han, "Offsite-tuning: Transfer learning without full model," arXiv preprint arXiv:2302.04870, 2023.
[144] T. Che, J. Liu, Y. Zhou, J. Ren, J. Zhou, V. S. Sheng, H. Dai, and D. Dou, "Federated learning of large language models with parameter-efficient prompt tuning and adaptive optimization," arXiv preprint arXiv:2310.15080, 2023.
[145] Y. Li, M. Du, X. Wang, and Y. Wang, "Prompt tuning pushed farther, contrastive learning pulls closer: A two-stage approach to mitigate social biases," arXiv preprint arXiv:2307.01595, 2023.
[146] J. Cho, J. Lei, H. Tan, and M. Bansal, "Unifying vision-and-language tasks via text generation," in International Conference on Machine Learning. PMLR, 2021, pp. 1931–1942.
[147] D. Zhu, J. Chen, X. Shen, X. Li, and M. Elhoseiny, "Minigpt-4: Enhancing vision-language understanding with advanced large language models," arXiv preprint arXiv:2304.10592, 2023.
[148] H. Liu, C. Li, Q. Wu, and Y. J. Lee, "Visual instruction tuning," arXiv preprint arXiv:2304.08485, 2023.
[149] S. J. Rennie, E. Marcheret, Y. Mroueh, J. Ross, and V. Goel, "Self-critical sequence training for image captioning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[150] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo, "Image captioning with semantic attention," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[151] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: Lessons learned from the 2015 mscoco image captioning challenge," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4.
[152] M. Z. Hossain, F. Sohel, M. F. Shiratuddin, and H. Laga, "A comprehensive survey of deep learning for image captioning," ACM Computing Surveys (CsUR), vol. 51, no. 6, pp. 1–36, 2019.
[153] P. Wang, Q. Wu, C. Shen, A. Dick, and A. Van Den Hengel, "Fvqa: Fact-based visual question answering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 10, pp. 2413–2427, 2017.
[154] Q. Wu, D. Teney, P. Wang, C. Shen, A. Dick, and A. Van Den Hengel, "Visual question answering: A survey of methods and datasets," Computer Vision and Image Understanding, vol. 163, pp. 21–40, 2017.
[155] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh, "Vqa: Visual question answering," in Proceedings of the IEEE International Conference on Computer Vision, 2015.
[156] Y.-L. Sung, J. Cho, and M. Bansal, "Vl-adapter: Parameter-efficient transfer learning for vision-and-language tasks," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
[157] Z.-Y. Hu, Y. Li, M. R. Lyu, and L. Wang, "Vl-pet: Vision-and-language parameter-efficient tuning via granularity control," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
[158] R. Zhang, J. Han, A. Zhou, X. Hu, S. Yan, P. Lu, H. Li, P. Gao, and Y. Qiao, "Llama-adapter: Efficient fine-tuning of language models with zero-init attention," arXiv preprint arXiv:2303.16199, 2023.
[160] B. Zhao, H. Tu, C. Wei, J. Mei, and C. Xie, "Tuning layernorm in attention: Towards efficient multi-modal llm finetuning," arXiv preprint arXiv:2312.11420, 2023.
[161] S. Lee, "Toward continual learning for conversational agents," arXiv preprint arXiv:1712.09943, 2017.
[162] C.-H. Chang, M. Kayed, M. R. Girgis, and K. F. Shaalan, "A survey of web information extraction systems," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, 2006.
[163] W. Yang, Y. Xie, A. Lin, X. Li, L. Tan, K. Xiong, M. Li, and J. Lin, "End-to-end open-domain question answering with bertserini," arXiv preprint arXiv:1902.01718, 2019.
[164] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska et al., "Overcoming catastrophic forgetting in neural networks," Proceedings of the National Academy of Sciences, vol. 114, no. 13, 2017.
[165] A. Madotto, Z. Lin, Z. Zhou, S. Moon, P. Crook, B. Liu, Z. Yu, E. Cho, and Z. Wang, "Continual learning in task-oriented dialogue systems," arXiv preprint arXiv:2012.15504, 2020.
[166] Q. Zhu, B. Li, F. Mi, X. Zhu, and M. Huang, "Continual prompt tuning for dialog state tracking," arXiv preprint arXiv:2203.06654, 2022.
[167] Y. Dai, H. Lang, Y. Zheng, F. Huang, L. Si, and Y. Li, "Lifelong learning for question answering with hierarchical prompts," arXiv preprint arXiv:2208.14602, 2022.
[168] Z. Liang, F. Wei, Y. Jie, Y. Qian, Z. Hao, and B. Han, "Prompts can play lottery tickets well: Achieving lifelong information extraction via lottery prompt tuning," in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023, pp. 277–292.
[169] X. Wang, T. Chen, Q. Ge, H. Xia, R. Bao, R. Zheng, Q. Zhang, T. Gui, and X. Huang, "Orthogonal subspace learning for language model continual learning," arXiv preprint arXiv:2310.14152, 2023.
[170] S. Chen, S. Wong, L. Chen, and Y. Tian, "Extending context window of large language models via positional interpolation," arXiv preprint arXiv:2306.15595, 2023.
[171] Y. Chen, S. Qian, H. Tang, X. Lai, Z. Liu, S. Han, and J. Jia, "Longlora: Efficient fine-tuning of long-context large language models," arXiv preprint arXiv:2309.12307, 2023.
[172] J. Yang, "Longqlora: Efficient and effective method to extend context length of large language models," arXiv preprint arXiv:2311.04879, 2023.
[173] S. Tan, X. Li, S. Patil, Z. Wu, T. Zhang, K. Keutzer, J. E. Gonzalez, and R. A. Popa, "Lloco: Learning long contexts offline," arXiv preprint arXiv:2404.07979, 2024.
[175] T. Dettmers, M. Lewis, Y. Belkada, and L. Zettlemoyer, "Gpt3.int8(): 8-bit matrix multiplication for transformers at scale," Advances in Neural Information Processing Systems, vol. 35, 2022.
[176] H. Kang, Q. Zhang, S. Kundu, G. Jeong, Z. Liu, T. Krishna, and T. Zhao, "Gear: An efficient kv cache compression recipe for near-lossless generative inference of llm," arXiv preprint arXiv:2403.05527, 2024.
[178] A. Steiner, A. Kolesnikov, X. Zhai, R. Wightman, J. Uszkoreit, and L. Beyer, "How to train your vit? data, augmentation, and regularization in vision transformers," arXiv preprint arXiv:2106.10270, 2021.
[179] X. Chen, S. Xie, and K. He, "An empirical study of training selfsupervised vision transformers," in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp.
[180] K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, "Masked autoencoders are scalable vision learners," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. .
[181] M. Dehghani, J. Djolonga, B. Mustafa, P. Padlewski, J. Heek, J. Gilmer, A. P. Steiner, M. Caron, R. Geirhos, I. Alabdulmohsin et al., "Scaling vision transformers to 22 billion parameters," in International Conference on Machine Learning. PMLR, 2023, pp. 7480–7512.
[182] Z. Chen, Y. Duan, W. Wang, J. He, T. Lu, J. Dai, and Y. Qiao, "Vision transformer adapter for dense predictions," arXiv preprint arXiv:2205.08534, 2022.
[183] Z. Wang, Z. Zhang, C.-Y. Lee, H. Zhang, R. Sun, X. Ren, G. Su, V. Perot, J. Dy, and T. Pfister, "Learning to prompt for continual learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
[184] Q. Gao, C. Zhao, Y. Sun, T. Xi, G. Zhang, B. Ghanem, and J. Zhang, "A unified continual learning framework with general parameter-efficient tuning," arXiv preprint arXiv:2303.10070, 2023.
[185] L. Ren, C. Chen, L. Wang, and K. Hua, "Learning semantic proxies from visual prompts for parameter-efficient fine-tuning in deep metric learning," arXiv preprint arXiv:2402.02340, 2024.
[186] M. Jia, L. Tang, B.-C. Chen, C. Cardie, S. Belongie, B. Hariharan, and S.-N.Chen、C. Cardie、S. Belongie、B. Hariharan 和 S.-N. Lim,"视觉提示调谐",欧洲计算机视觉会议。Lim,"视觉提示调整",欧洲计算机视觉会议。Springer, 2022, pp.
[187] S. Chen、C. Ge、Z. Tong、J. Wang、Y. Song、J. Wang 和 P. Luo,"Adaptformer:Adapting vision transformers for scalable visual recognition," Advances in Neural Information Processing Systems, vol. 35, pp. .
[188] S. Jie 和 Z. -H.Deng, "Convolutional bypasses are better vision transformer adapters," arXiv preprint arXiv:2207.07039, 2022.
[189] S. Yoo、E. Kim、D. Jung、J. Lee 和 S. Yoon,《改进自监督视觉转换器的视觉提示调整》,arXiv 预印本 arXiv:2306.05067, 2023。
[190] J. Pan, Z. Lin, X. Zhu, J. Shao, and H. Li, "St-adapter:Parameterefficient image-to-video transfer learning," Advances in Neural Information Processing Systems, vol. 35, pp.
[191] T. Yang, Y. Zhu, Y. Xie, A. Zhang, C. Chen, and M. Li, "Aim: Adapting image models for efficient video action recognition," arXiv preprint arXiv:2302.03024, 2023
[192] A. Radford、J. W. Kim、C. Hallacy、A. Ramesh、G. Goh、S. Agarwal、G. Sastry、A. Askell、P. Mishkin、J. Clark 等人,"从自然语言监督中学习可转移的视觉模型",机器学习国际会议。PMLR, 2021, pp.
[193] C. Jia, Y. Yang, Y. Xia, Y.-T. Chen, Z. Parekh, H. Pham, Q. Le, Y.-H.Chen, Z. Parekh, H. Pham, Q. Le, Y.-H. Sung, Z. Li, and T. Duerig, "Scaling up visual and vision-language representation learning with noisy.Duerig, "Scaling up visual and vision-language representation learning with noisy text supervision," in International conference on machine learning.PMLR,2021 年,第 4904-4916 页。
[194] Y. Li, F. Liang, L. Zhao, Y. Cui, W. Ouyang, J. Shao, F. Yu, and J. Yan, "Supervision exists everywhere:A data efficient contrastive languageimage pre-training paradigm," arXiv preprint arXiv:2110.05208, 2021.
[195] A. Singh、R. Hu、V. Goswami、G. Couairon、W. Galuba、M. Rohrbach 和 D. Kiela,"Flava:A foundational language and vision alignment model," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. .
[196] M. Xu, Z. Zhang, F. Wei, H. Hu, and X. Bai, "Side adapter network for open-vocabulary semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 2945-2954.
[197] Q. Yu, J. He, X. Deng, X. Shen, and L.-C. Chen, "Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional clip," arXiv preprint arXiv:2308.02487, 2023.
[198] Z. Xu, Z. Chen, Y. Zhang, Y. Song, X. Wan, and G. Li, "Bridging vision and language encoders: Parameter-efficient tuning for referring image segmentation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 17503-17512.
[199] R. Zhang, Z. Guo, W. Zhang, K. Li, X. Miao, B. Cui, Y. Qiao, P. Gao, and H. Li, "Pointclip: Point cloud understanding by clip," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8552-8562.
[200] X. Zhu, R. Zhang, B. He, Z. Guo, Z. Zeng, Z. Qin, S. Zhang, and P. Gao, "Pointclip v2: Prompting clip and gpt for powerful 3d open-world learning," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
[201] Z. Wang, X. Yu, Y. Rao, J. Zhou, and J. Lu, "P2p: Tuning pre-trained image models for point cloud analysis with point-to-pixel prompting," Advances in Neural Information Processing Systems, vol. 35, 2022.
[202] T. Huang, B. Dong, Y. Yang, X. Huang, R. W. Lau, W. Ouyang, and W. Zuo, "Clip2point: Transfer clip to point cloud classification with image-depth pre-training," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 22157-22167.
[203] C. Ju, T. Han, K. Zheng, Y. Zhang, and W. Xie, "Prompting visual-language models for efficient video understanding," in European Conference on Computer Vision. Springer, 2022.
[204] B. Ni, H. Peng, M. Chen, S. Zhang, G. Meng, J. Fu, S. Xiang, and H. Ling, "Expanding language-image pretrained models for general video recognition," in European Conference on Computer Vision. Springer, 2022.
[205] Z. Lin, S. Geng, R. Zhang, P. Gao, G. de Melo, X. Wang, J. Dai, Y. Qiao, and H. Li, "Frozen clip models are efficient video learners," in European Conference on Computer Vision. Springer, 2022.
[206] Z. Han, F. Zhu, Q. Lao, and H. Jiang, "Zero-shot referring expression comprehension via structural similarity between images and captions," arXiv preprint arXiv:2311.17048, 2023.
[207] S. Doveh, A. Arbelle, S. Harary, E. Schwartz, R. Herzig, R. Giryes, R. Feris, R. Panda, S. Ullman, and L. Karlinsky, "Teaching structured vision & language concepts to vision & language models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 2657-2668.
[208] S. Nag, X. Zhu, Y.-Z. Song, and T. Xiang, "Zero-shot temporal action detection via vision-language prompting," in European Conference on Computer Vision. Springer, 2022.
[209] K. Zhou, J. Yang, C. C. Loy, and Z. Liu, "Learning to prompt for vision-language models," International Journal of Computer Vision, vol. 130, no. 9, 2022.
[211] B. Zhu, Y. Niu, Y. Han, Y. Wu, and H. Zhang, "Prompt-aligned gradient for prompt tuning," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 15659-15669.
[212] M. U. Khattak, H. Rasheed, M. Maaz, S. Khan, and F. S. Khan, "Maple: Multi-modal prompt learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 19113-19122.
[213] M. Shu, W. Nie, D.-A. Huang, Z. Yu, T. Goldstein, A. Anandkumar, and C. Xiao, "Test-time prompt tuning for zero-shot generalization in vision-language models," Advances in Neural Information Processing Systems, vol. 35, 2022.
[214] C.-M. Feng, K. Yu, Y. Liu, S. Khan, and W. Zuo, "Diverse data augmentation with diffusions for effective test-time prompt tuning," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 2704-2714.
[215] P. Gao, S. Geng, R. Zhang, T. Ma, R. Fang, Y. Zhang, H. Li, and Y. Qiao, "Clip-adapter: Better vision-language models with feature adapters," International Journal of Computer Vision, pp. 1-15, 2023.
[216] R. Zhang, R. Fang, W. Zhang, P. Gao, K. Li, J. Dai, Y. Qiao, and H. Li, "Tip-adapter: Training-free clip-adapter for better vision-language modeling," arXiv preprint arXiv:2111.03930, 2021.
[217] E. Orhan, "A simple cache model for image recognition," Advances in Neural Information Processing Systems, vol. 31, 2018.
[218] E. Grave, M. M. Cisse, and A. Joulin, "Unbounded cache model for online language modeling with open vocabulary," Advances in Neural Information Processing Systems, vol. 30, 2017.
[219] J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," Advances in Neural Information Processing Systems, vol. 33, 2020.
[220] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, "Deep unsupervised learning using nonequilibrium thermodynamics," in International Conference on Machine Learning. PMLR, 2015.
[221] Z. Han, Y. Wang, L. Zhou, P. Wang, B. Yan, J. Zhou, Y. Wang, and D. Shen, "Contrastive diffusion model with auxiliary guidance for coarse-to-fine pet reconstruction," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2023.
[222] L. Yang, Z. Zhang, Y. Song, S. Hong, R. Xu, Y. Zhao, W. Zhang, B. Cui, and M.-H. Yang, "Diffusion models: A comprehensive survey of methods and applications," ACM Computing Surveys, vol. 56, no. 4, 2023.
[223] F.-A. Croitoru, V. Hondru, R. T. Ionescu, and M. Shah, "Diffusion models in vision: A survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
[224] P. Dhariwal and A. Nichol, "Diffusion models beat gans on image synthesis," Advances in Neural Information Processing Systems, vol. 34, pp. 8780-8794, 2021.
[225] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, "Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22500-22510.
[226] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
[227] S. Luo, Y. Tan, S. Patil, D. Gu, P. von Platen, A. Passos, L. Huang, J. Li, and H. Zhao, "Lcm-lora: A universal stable-diffusion acceleration module," arXiv preprint arXiv:2311.05556, 2023.
[228] W. Chai, D. Zheng, J. Cao, Z. Chen, C. Wang, and C. Ma, "Speedupnet: A plug-and-play hyper-network for accelerating text-to-image diffusion models," arXiv preprint arXiv:2312.08887, 2023.
[229] J. Z. Wu, Y. Ge, X. Wang, S. W. Lei, Y. Gu, Y. Shi, W. Hsu, Y. Shan, X. Qie, and M. Z. Shou, "Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
[230] Z. Xing, Q. Dai, H. Hu, Z. Wu, and Y.-G. Jiang, "Simda: Simple diffusion adapter for efficient video generation," arXiv preprint arXiv:2308.09710, 2023.
[231] B. Zeng, S. Li, Y. Feng, H. Li, S. Gao, J. Liu, H. Li, X. Tang, J. Liu, and B. Zhang, "Ipdreamer: Appearance-controllable 3d object generation with image prompts," arXiv preprint arXiv:2310.05375, 2023.
[232] J.-B. Alayrac, J. Donahue, P. Luc, A. Miech, I. Barr, Y. Hasson, K. Lenc, A. Mensch, K. Millican, M. Reynolds et al., "Flamingo: a visual language model for few-shot learning," Advances in Neural Information Processing Systems, vol. 35, 2022.
[233] L. Zhang, A. Rao, and M. Agrawala, "Adding conditional control to text-to-image diffusion models," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
[234] R. Gandikota, J. Materzynska, T. Zhou, A. Torralba, and D. Bau, "Concept sliders: Lora adaptors for precise control in diffusion models," arXiv preprint arXiv:2311.12092, 2023.
[236] R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. Cohen-Or, "An image is worth one word: Personalizing text-to-image generation using textual inversion," arXiv preprint arXiv:2208.01618, 2022.
[237] N. Kumari, B. Zhang, R. Zhang, E. Shechtman, and J.-Y. Zhu, "Multi-concept customization of text-to-image diffusion," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
[238] H. Ye, J. Zhang, S. Liu, X. Han, and W. Yang, "Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models," arXiv preprint arXiv:2308.06721, 2023.
[240] G. Team, R. Anil, S. Borgeaud, Y. Wu, J.-B. Alayrac, J. Yu, R. Soricut, J. Schalkwyk, A. M. Dai, A. Hauth et al., "Gemini: A family of highly capable multimodal models," arXiv preprint arXiv:2312.11805, 2023.
[241] C. Gao and S. Q. Zhang, "Dlora: Distributed parameter-efficient fine-tuning solution for large language model," arXiv preprint arXiv:2404.05182, 2024.
[242] G. Xiao, J. Lin, and S. Han, "Offsite-tuning: Transfer learning without full model," arXiv preprint arXiv:2302.04870, 2023.
[243] Z. Zhou, X. Wei, J. Zhang, and G. Sun, "PetS: A unified framework for parameter-efficient transformers serving," in 2022 USENIX Annual Technical Conference (USENIX ATC 22), 2022.
[245] L. Chen, Z. Ye, Y. Wu, D. Zhuo, L. Ceze, and A. Krishnamurthy, "Punica: Multi-tenant lora serving," arXiv preprint arXiv:2310.18547, 2023.
[246] S. Mangrulkar, S. Gugger, L. Debut, Y. Belkada, S. Paul, and B. Bossan, "Peft: State-of-the-art parameter-efficient fine-tuning methods," https://github.com/huggingface/peft, 2022.
[247] C. Poth, H. Sterz, I. Paul, S. Purkayastha, L. Engländer, T. Imhof, I. Vulić, S. Ruder, I. Gurevych, and J. Pfeiffer, "Adapters: A unified library for parameter-efficient and modular transfer learning," 2023.
[248] K. Chen, J. Wang, J. Pang, Y. Cao, Y. Xiong, X. Li, S. Sun, W. Feng, Z. Liu, J. Xu, Z. Zhang, D. Cheng, C. Zhu, T. Cheng, Q. Zhao, B. Li, X. Lu, R. Zhu, Y. Wu, J. Dai, J. Wang, J. Shi, W. Ouyang, C. C. Loy, and D. Lin, "MMDetection: Open mmlab detection toolbox and benchmark," arXiv preprint arXiv:1906.07155, 2019.
[249] S. Q. Zhang, T. Tambe, N. Cuevas, G.-Y. Wei, and D. Brooks, "Camel: Co-designing ai models and embedded drams for efficient on-device learning," arXiv preprint arXiv:2305.03148, 2023.
[250] T. Brooks, B. Peebles, C. Holmes, W. DePue, Y. Guo, L. Jing, D. Schnurr, J. Taylor, T. Luhman, E. Luhman, C. Ng, R. Wang, and A. Ramesh, "Video generation models as world simulators," 2024. [Online]. Available: https://openai.com/research/video-generation-models-as-world-simulators
[251] A. Gu and T. Dao, "Mamba: Linear-time sequence modeling with selective state spaces," arXiv preprint arXiv:2312.00752, 2023.
[252] Y. Bai, X. Geng, K. Mangalam, A. Bar, A. Yuille, T. Darrell, J. Malik, and A. A. Efros, "Sequential modeling enables scalable learning for large vision models," arXiv preprint arXiv:2312.00785, 2023.
[253] A. Dosovitskiy and T. Brox, "Inverting visual representations with convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4829-4837.
[254] Z. He, T. Zhang, and R. B. Lee, "Model inversion attacks against collaborative inference," in Proceedings of the 35th Annual Computer Security Applications Conference, 2019.