
AIGC Newsletter

AIGC Weekly

AIGC Weekly | #62

AIGC Top Papers and AI news of the week

pxiaoer
Apr 08, 2024

Top Papers of the week (April 1 - April 7)

1.) Mixture-of-Depths: Dynamically allocating compute in transformer-based language models ( paper )

Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific positions in a sequence, optimising the allocation along the sequence for different layers across the model depth. Our method enforces a total compute budget by capping the number of tokens (k) that can participate in the self-attention and MLP computations at a given layer.
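The capacity-capping idea can be illustrated with a small NumPy sketch: a learned scalar router scores each token, only the top-k tokens pass through the layer's attention+MLP block, and the rest skip it via the residual path. This is a minimal sketch of the routing mechanism, not the paper's implementation; the function names and the stand-in block are mine.

```python
import numpy as np

def mixture_of_depths_layer(x, router_w, block_fn, k):
    """Route only the top-k tokens (by router score) through the block;
    the remaining tokens pass through unchanged via the residual path.

    x: (seq_len, d) token activations; k: per-layer token capacity."""
    scores = x @ router_w                       # (seq_len,) one scalar score per token
    topk = np.argsort(scores)[-k:]              # indices of the k highest-scoring tokens
    out = x.copy()                              # skipped tokens keep their residual value
    # Scaling the block output by the router score keeps routing differentiable.
    out[topk] = x[topk] + scores[topk, None] * block_fn(x[topk])
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))                    # 16 tokens, width 8
router_w = rng.normal(size=8)
block = lambda h: np.tanh(h)                    # stand-in for the attention + MLP block
y = mixture_of_depths_layer(x, router_w, block, k=4)
```

With k=4 of 16 tokens, this layer spends roughly a quarter of the block FLOPs of a dense layer, while the sequence length and output shape are unchanged.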

2.) Video Interpolation with Diffusion Models( webpage | paper )

We present VIDIM, a generative model for video interpolation, which creates short videos given a start and end frame. In order to achieve high fidelity and generate motions unseen in the input data, VIDIM uses cascaded diffusion models to first generate the target video at low resolution, and then generate the high-resolution video conditioned on the low-resolution generated video.
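The cascade structure (low-resolution generation, then super-resolution conditioned on it) can be sketched with placeholder stages. These stand-in functions are purely illustrative of the two-stage data flow, not VIDIM's actual diffusion models; the interpolation and nearest-neighbor upsampling are my stand-ins.

```python
import numpy as np

def base_model(start, end, frames=9):
    """Stage 1 stand-in: produce a low-resolution video between two keyframes
    (here, simple linear interpolation in place of a diffusion sampler)."""
    t = np.linspace(0.0, 1.0, frames)[:, None, None]
    return (1 - t) * start + t * end            # (frames, h, w)

def super_res_model(low_res_video, scale=4):
    """Stage 2 stand-in: upsample, conditioned on the stage-1 output
    (nearest-neighbor repeat in place of a conditional diffusion model)."""
    return low_res_video.repeat(scale, axis=1).repeat(scale, axis=2)

start, end = np.zeros((16, 16)), np.ones((16, 16))  # the two input keyframes
low = base_model(start, end)                        # (9, 16, 16)
high = super_res_model(low)                         # (9, 64, 64)
```

The point of the cascade is that the hard generative work (motion) happens at low resolution, where sampling is cheap, and the second stage only has to add detail consistent with the first.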

3.) Many-shot jailbreaking ( webpage | paper )

We investigate a family of simple long-context attacks on large language models: prompting with hundreds of demonstrations of undesirable behavior. This is newly feasible with the larger context windows recently deployed by Anthropic, OpenAI and Google DeepMind. We find that in diverse, realistic circumstances, the effectiveness of this attack follows a power law, up to hundreds of shots. We demonstrate the success of this attack on the most widely used state-of-the-art closed-weight models, and across various tasks. Our results suggest very long contexts present a rich new attack surface for LLMs.
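A power law p ≈ a·n^b is a straight line in log-log space, so the claimed relationship between shot count and attack success can be checked with a simple linear fit. The numbers below are invented for illustration, not measurements from the paper.

```python
import numpy as np

# Hypothetical (invented) measurements: attack success rate vs. number of shots.
shots = np.array([8, 16, 32, 64, 128, 256])
success = np.array([0.02, 0.05, 0.11, 0.24, 0.48, 0.85])  # illustrative only

# A power law p = a * n^b is linear in log-log space: log p = log a + b * log n,
# so an ordinary least-squares fit on the logs recovers the exponent b.
b, log_a = np.polyfit(np.log(shots), np.log(success), 1)
print(f"fitted exponent b = {b:.2f}")
```

If the relationship really is a power law, doubling the number of shots multiplies the success rate by a fixed factor (2^b), which is what makes very long contexts such an effective lever for this attack.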

A diagram illustrating how many-shot jailbreaking works, with a long script of prompts and a harmful response from an AI.

4.) AI and the Problem of Knowledge Collapse ( paper )

While artificial intelligence has the potential to process vast amounts of data, generate new insights, and unlock greater productivity, its widespread adoption may entail unforeseen consequences. We identify conditions under which AI, by reducing the cost of access to certain modes of knowledge, can paradoxically harm public understanding. While large language models are trained on vast amounts of diverse data, they naturally generate output towards the 'center' of the distribution. This is generally useful, but widespread reliance on recursive AI systems could lead to a process we define as "knowledge collapse", and argue this could harm innovation and the richness of human understanding and culture. However, unlike AI models that cannot choose what data they are trained on, humans may strategically seek out diverse forms of knowledge if they perceive them to be worthwhile.

5.) sDPO: Don't Use Your Data All at Once ( paper )

As development of large language models (LLM) progresses, aligning them with human preferences has become increasingly important. We propose stepwise DPO (sDPO), an extension of the recently popularized direct preference optimization (DPO) for alignment tuning. This approach involves dividing the available preference datasets and utilizing them in a stepwise manner, rather than employing them all at once. We demonstrate that this method facilitates the use of more precisely aligned reference models within the DPO training framework. Furthermore, sDPO trains the final model to be more performant, even outperforming other popular LLMs with more parameters.
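The stepwise loop is the essence of the method: train on one chunk of preference data at a time, and promote each step's output model to be the reference model for the next chunk. A minimal sketch, assuming a `dpo_step` training callable; the toy "model" below is just a number, standing in for real weights.

```python
from typing import Callable, Sequence

def sdpo(initial_model, chunks: Sequence, dpo_step: Callable):
    """Stepwise DPO: consume preference-data chunks one at a time,
    using the model trained on chunk i as the reference for chunk i+1."""
    model = initial_model
    reference = initial_model
    for chunk in chunks:
        model = dpo_step(model, reference, chunk)  # one round of DPO on this chunk
        reference = model                          # the better-aligned model becomes the anchor
    return model

# Toy stand-in: each "DPO step" pulls the model toward the chunk mean,
# anchored halfway to the current reference.
toy_step = lambda m, ref, chunk: 0.5 * ref + 0.5 * (sum(chunk) / len(chunk))
final = sdpo(0.0, [[1.0, 1.0], [2.0, 2.0]], toy_step)
```

Contrast with vanilla DPO, which would use `initial_model` as the reference for all of the data at once; sDPO's claim is that the progressively refreshed reference gives a more precise alignment anchor at each step.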

6.) Bigger is not Always Better: Scaling Properties of Latent Diffusion Models ( paper )

We study the scaling properties of latent diffusion models (LDMs) with an emphasis on their sampling efficiency. While improved network architecture and inference algorithms have shown to effectively boost sampling efficiency of diffusion models, the role of model size -- a critical determinant of sampling efficiency -- has not been thoroughly examined.

7.) InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation ( webpage | paper )

Tuning-free diffusion-based models have demonstrated significant potential in the realm of image personalization and customization. However, despite this notable progress, current models continue to grapple with several complex challenges in producing style-consistent image generation.

8.) PointInfinity: Resolution-Invariant Point Diffusion Models ( paper )

We present PointInfinity, an efficient family of point cloud diffusion models. Our core idea is to use a transformer-based architecture with a fixed-size, resolution-invariant latent representation. This enables efficient training with low-resolution point clouds, while allowing high-resolution point clouds to be generated during inference.
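The resolution-invariant trick is that a fixed set of M learned latents cross-attends to however many input points there are, so the latent representation has the same shape at any input resolution. A NumPy sketch of that cross-attention read, with all weight names and sizes invented for illustration:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def encode_to_fixed_latent(points, latents, w_q, w_k, w_v):
    """Cross-attention from M learned latents (queries) to N input points
    (keys/values). points: (N, 3); latents: (M, d). Output is (M, d) for any N."""
    q = latents @ w_q                                # (M, d)
    k = points @ w_k                                 # (N, d)
    v = points @ w_v                                 # (N, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (M, N) attention weights
    return latents + attn @ v                        # residual update, still (M, d)

rng = np.random.default_rng(0)
d, M = 16, 32
latents = rng.normal(size=(M, d))                    # fixed-size learned latent array
w_q = rng.normal(size=(d, d))
w_k = rng.normal(size=(3, d))
w_v = rng.normal(size=(3, d))

low_res  = encode_to_fixed_latent(rng.normal(size=(128, 3)),  latents, w_q, w_k, w_v)
high_res = encode_to_fixed_latent(rng.normal(size=(4096, 3)), latents, w_q, w_k, w_v)
```

Because the latent is (M, d) regardless of N, the expensive diffusion modeling happens in a space whose cost does not grow with point-cloud resolution, which is what allows training at low resolution and sampling at high resolution.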

9.) RL for Consistency Models: Faster Reward Guided Text-to-Image Generation ( webpage | paper )

Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction following capabilities. However, the resulting generative policies inherit the same iterative sampling process of diffusion models that causes slow generation. To overcome this limitation, consistency models proposed learning a new class of generative models that directly map noise to data, resulting in a model that can generate an image in as few as one sampling iteration. In this work, to optimize text-to-image generative models for task specific rewards and enable fast training and inference, we propose a framework for fine-tuning consistency models via RL.

10.) Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model ( paper | model)

In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs. Uniquely initiated from scratch, CT-LLM diverges from the conventional methodology by primarily incorporating Chinese textual data, utilizing an extensive corpus of 1,200 billion tokens, including 800 billion Chinese tokens, 300 billion English tokens, and 100 billion code tokens. This strategic composition facilitates the model's exceptional proficiency in understanding and processing Chinese, a capability further enhanced through alignment techniques.
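The corpus composition quoted above implies a roughly 2:1 Chinese-to-English token ratio; a quick arithmetic check of the mix:

```python
# Token counts quoted in the abstract (1,200B total).
corpus = {"chinese": 800e9, "english": 300e9, "code": 100e9}
total = sum(corpus.values())
shares = {name: count / total for name, count in corpus.items()}
# Chinese tokens make up two thirds of the 1.2T-token corpus.
print({name: f"{share:.1%}" for name, share in shares.items()})
```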

AIGC News of the week (April 1 - April 7)

1.) SWE-agent: takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It solves 12.29% of bugs in the SWE-bench evaluation set and takes just 1.5 minutes to run.( webpage | repo )

2.) llm-colosseum: Evaluate LLMs in real time with Street Fighter III ( repo )

3.) VAR: Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction" ( repo )

4.) ViTamin: Designing Scalable Vision Models in the Vision-language Era ( repo )

5.) AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent ( repo )

More AIGC News: AINews

AIGC Newsletter is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.


© 2024 pxiaoer