Summary

Today's AI news covers the latest developments in a range of AI technologies and their applications. OpenAI announced that it has disrupted covert influence operations that misused its AI models, and Google introduced new capabilities for its Gemini 1.5 Pro and 1.5 Flash models. IEIT-Yuan released the Yuan2.0-M32 model, and Google presented CodecLM, a framework for aligning language models with tailored synthetic data. OpenAI introduced ChatGPT Edu for educational institutions, and Anthropic made tool use for its Claude models generally available. Finally, Tencent AI Lab published research on portrait video generation using the V-Express method.

OpenAI Announces Disruption of Covert Influence Operations

Link | May 30, 2024

  • OpenAI announced that it has blocked AI models from being used in covert influence operations
  • Five covert operations were disrupted over the past three months
  • The operations originated mainly in Russia, China, Iran, and Israel, and involved generating content in multiple languages and activity on social media
  • Topics included political issues such as Russia's invasion of Ukraine, the conflict in Gaza, and the Indian elections
  • OpenAI reports that the attackers' engagement and reach did not meaningfully increase as a result of its services
  • Safety-minded model design and efficient AI tooling made it possible to disrupt these operations

Google Announces Gemini 1.5 Pro and 1.5 Flash Models

Link | May 30, 2024

  • Stable release and billing announced for the Gemini 1.5 Pro and 1.5 Flash models
  • 1.5 Flash emphasizes speed and cost efficiency; its rate limit was raised to 1,000 requests per minute and the per-day request limit removed
  • Tuning support for 1.5 Flash is rolling out, along with JSON schema mode and mobile support and light mode in Google AI Studio
  • Both models are free to use in Google AI Studio; enabling billing unlocks higher API rate limits

IEIT-Yuan Releases the Yuan2.0-M32 Model

Link | May 28, 2024

  • Yuan2.0-M32 is a Mixture-of-Experts (MoE) language model with 32 experts, of which 2 are active per token
  • Introduces a new Attention Router network for more efficient expert selection
  • Trained from scratch on 2,000B tokens; training computation is only 9.25% of that of a dense model of the same parameter scale
  • Surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks
  • 16K sequence length and 40B total parameters (3.7B active)

Google Announces CodecLM

Link | May 30, 2024

  • CodecLM is a framework that aligns LLMs (large language models) to specific downstream tasks using high-quality synthetic data
  • Introduces Self-Rubrics and Contrastive Filtering, two strategies that raise the quality of the synthetic data
  • Effectiveness demonstrated with PaLM 2 LLMs on several open-domain instruction-following benchmarks
  • Generates tailored synthetic data that substantially improves the performance of the target LLM

OpenAI Announces ChatGPT Edu

Link | May 30, 2024

  • ChatGPT Edu launched for universities; powered by GPT-4o, it can reason across text and vision
  • Includes advanced capabilities such as data analysis, web browsing, and document summarization
  • Provides robust security, data privacy, and administrative controls
  • An affordable option for universities to bring AI to students, faculty, and researchers

Anthropic Makes Claude Tool Use Generally Available

Link | May 31, 2024

  • The Claude 3 model family can now interact with external tools and APIs
  • Enables extracting data from unstructured text, converting natural-language requests into structured API calls, and answering questions accurately by searching databases
  • Supports real-time responses via streaming and the use of image inputs
  • Customer examples: StudyFetch, Intuned, and Hebbia use Claude's tool use to improve AI learning platforms and data extraction

Tencent AI Lab Publishes V-Express Research

Link | May 28, 2024

  • V-Express proposes a method for generating portrait videos from a single image
  • Balances control signals of varying strengths (text, audio, image reference, pose, etc.)
  • Uses a progressive drop method so that the weak audio signal can exert effective control
  • Effectively generates portrait videos controlled by audio

Figma Automation Using GPT-4o

YouTube Link

  • Figma design automation using GPT-4o
  • Designs are generated automatically from a PRD (product requirements document)

Today's AI news covers the latest technologies and their applications across many areas of AI, offering a deeper look at how AI is bringing innovation to a wide range of industries and research.

Sources
###
https://openai.com/index/disrupting-deceptive-uses-of-AI-by-covert-influence-operations/
May 30, 2024

Disrupting deceptive uses of AI by covert influence operations
We’ve terminated accounts linked to covert influence operations; no significant audience increase due to our services.

OpenAI is committed to enforcing policies that prevent abuse and to improving transparency around AI-generated content. That is especially true with respect to detecting and disrupting covert influence operations (IO), which attempt to manipulate public opinion or influence political outcomes without revealing the true identity or intentions of the actors behind them.

In the last three months, we have disrupted five covert IO that sought to use our models in support of deceptive activity across the internet. As of May 2024, these campaigns do not appear to have meaningfully increased their audience engagement or reach as a result of our services.

This blog describes the threat actors we disrupted, attacker trends we identified, and important defensive trends - including how designing AI models with safety in mind in many cases prevented the threat actors from generating the content they desired, and how AI tools have made our own investigations more efficient. Alongside this blog, we are publishing a trend analysis that describes the behavior of these malicious actors in detail.

Read the full report

Threat actors work across the internet. So do we. By collaborating with industry, civil society, and government we tackle the creation, distribution, and impact of IO content. Our investigations and disruptions were made possible in part because there’s been so much detailed threat reporting over the years by distribution platforms and the open-source community. OpenAI is publishing these findings, as other tech companies do, to promote information sharing and best practices amongst the broader community of stakeholders.

Disruption of covert influence operations
Over the last three months, our work against IO actors has disrupted covert influence operations that sought to use AI models for a range of tasks, such as generating short comments and longer articles in a range of languages, making up names and bios for social media accounts, conducting open-source research, debugging simple code, and translating and proofreading texts.

Specifically, we disrupted:

A previously unreported operation from Russia, which we dubbed Bad Grammar, operating mainly on Telegram and targeting Ukraine, Moldova, the Baltic States and the United States. The people behind Bad Grammar used our models to debug code for running a Telegram bot and to create short, political comments in Russian and English that were then posted on Telegram.

An operation originating in Russia known as Doppelganger. People acting on behalf of Doppelganger used our models to generate comments in English, French, German, Italian and Polish that were posted on X and 9GAG; translate and edit articles in English and French that were posted on websites linked to this operation; generate headlines; and convert news articles into Facebook posts.

A Chinese network known as Spamouflage, which used our models to research public social media activity, generate texts in languages including Chinese, English, Japanese and Korean that were then posted across platforms including X, Medium and Blogspot, and debug code for managing databases and websites, including a previously unreported domain, revealscum[.]com.

An Iranian operation known as the International Union of Virtual Media (IUVM), which used our models to generate and translate long-form articles, headlines and website tags that were then published on a website linked to this Iranian threat actor, iuvmpress[.]co;

Activity by a commercial company in Israel called STOIC; strictly speaking, we disrupted the activity, not the company. We nicknamed this operation Zero Zeno, after the founder of the Stoic school of philosophy. The people behind Zero Zeno used our models to generate articles and comments that were then posted across multiple platforms, notably Instagram, Facebook, X, and websites associated with this operation.

The content posted by these various operations focused on a wide range of issues, including Russia’s invasion of Ukraine, the conflict in Gaza, the Indian elections, politics in Europe and the United States, and criticisms of the Chinese government by Chinese dissidents and foreign governments.

So far, these operations do not appear to have benefited from meaningfully increased audience engagement or reach as a result of our services. Using Brookings' Breakout Scale, which assesses the impact of covert IO on a scale from 1 (lowest) to 6 (highest), none of the five operations included in our case studies scored higher than a 2 (activity on multiple platforms, but no breakout into authentic communities).

Attacker trends
Based on the investigations into influence operations detailed in our report, and the work of the open-source community, we have identified the following trends in how covert influence operations have recently used artificial intelligence models like ours.

Content generation: All these threat actors used our services to generate text (and occasionally images) in greater volumes, and with fewer language errors than would have been possible for the human operators alone.

Mixing old and new: All of these operations used AI to some degree, but none used it exclusively. Instead, AI-generated material was just one of many types of content they posted, alongside more traditional formats, such as manually written texts or memes copied from across the internet.

Faking engagement: Some of the networks we disrupted used our services to help create the appearance of engagement across social media - for example, by generating replies to their own posts. This is distinct from attracting authentic engagement, which none of the networks we describe here managed to do to a meaningful degree.

Productivity gains: Many of the threat actors that we identified and disrupted used our services in an attempt to enhance productivity, such as summarizing social media posts or debugging code.

Defensive trends
While much of the public debate so far has focused on the potential or actual use of AI by attackers, it is important to remember the advantages that AI offers to defenders. Our investigations also benefit from industry sharing and open-source research.

Defensive design: We impose friction on threat actors through our safety systems, which reflect our approach to responsibly deploying AI. For example, we repeatedly observed cases where our models refused to generate the text or images that the actors asked for.

AI-enhanced investigation: Similar to our approach to using GPT-4 for content moderation and cyber defense, we have built our own AI-powered tools to make our detection and analysis more effective. The investigations described in the accompanying report took days, rather than weeks or months, thanks to our tooling. As our models improve, we’ll continue leveraging their capabilities to improve our investigations too.

Distribution matters: Like traditional forms of content, AI-generated material must be distributed if it is to reach an audience. The IO posted across a wide range of different platforms, including X, Telegram, Facebook, Medium, Blogspot, and smaller forums, but none managed to engage a substantial audience.

Importance of industry sharing: To increase the impact of our disruptions on these actors, we have shared detailed threat indicators with industry peers. Our own investigations benefited from years of open-source analysis conducted by the wider research community.

The human element: AI can change the toolkit that human operators use, but it does not change the operators themselves. Our investigations showed that these actors were as prone to human error as previous generations have been - for example, publishing refusal messages from our models on social media and their websites. While it is important to be aware of the changing tools that threat actors use, we should not lose sight of the human limitations that can affect their operations and decision making.

We are committed to developing safe and responsible AI, which involves designing our models with safety in mind and proactively intervening against malicious use. Detecting and disrupting multi-platform abuses such as covert influence operations can be challenging because we do not always know how content generated by our products is distributed. But we are dedicated to finding and mitigating this abuse at scale by harnessing the power of generative AI.

Authors: OpenAI

###
https://developers.googleblog.com/en/gemini-15-pro-and-15-flash-now-available/
Gemini 1.5 Pro and 1.5 Flash GA, 1.5 Flash tuning support, higher rate limits, and more API updates
MAY 30, 2024
Logan Kilpatrick
Senior Product Manager
Gemini API and Google AI Studio
Shrestha Basu Mallick
Group Product Manager
Gemini API

Building on the momentum from Google I/O, we're announcing important updates to the Gemini API and Google AI Studio, including:

Gemini 1.5 Flash and 1.5 Pro stable release and billing
Higher rate limits on Gemini 1.5 Flash
Gemini 1.5 Flash tuning
JSON schema mode
Mobile support and light mode in Google AI Studio
We’re incredibly excited to see what you build with these new models and are committed to building toward a world-class developer experience. You can get started with Gemini 1.5 Flash and 1.5 Pro free of charge in Google AI Studio.


Gemini 1.5 Flash updates
Gemini 1.5 Flash was purpose-built as our fastest, most cost-efficient model yet for high volume tasks, at scale, to address developers’ feedback asking for lower latency and cost. Today, we are increasing the rate limit for 1.5 Flash to 1000 requests per minute (RPM) and removing the request per day limit. The 1.5 Pro rate limit will not be changed at this time, but if you need even higher limits to scale or have feedback, please reach out to us.

Customizing models can help you reach the performance threshold needed to take AI models into production. To support that, we will also be rolling out tuning support for Gemini 1.5 Flash on June 17th. Tuning will be supported in both Google AI Studio and the Gemini API directly. Currently, tuning jobs are free of charge, and using a tuned model does not incur any additional per-token costs. You can learn more about tuning in the Gemini API docs.


Gemini API billing
In addition to the free tier, starting today, developers can unlock higher API rate limits by turning on a billing account in Google AI Studio.

Set up billing in Google AI Studio
You can learn more about the Gemini 1.5 model pricing on ai.google.dev/pricing. If you run into any issues setting up billing, please let us know on our developer forum. For developers looking to scale with enterprise-grade features, the same models are available via Vertex AI, our enterprise-ready AI platform.


JSON schema mode
We launched JSON mode in the Gemini API and Google AI Studio earlier this year to give you more control over model output. Starting today, you can specify the desired JSON schema for the model to respond with, which unlocks many new use cases where you need the model to conform to certain output constraints like following a predefined structure or only outputting specific text. You can read more about JSON schema mode in the Gemini API docs.
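
As a rough illustration of JSON schema mode, the sketch below asks Gemini 1.5 Flash for output constrained to a simple schema. It assumes the google-generativeai Python SDK with an API key in GOOGLE_API_KEY; the NewsItem schema is hypothetical, and exact parameter names and per-model support should be checked against the Gemini API docs, since they have changed across SDK versions.

```python
# Minimal sketch: schema-constrained JSON output from Gemini 1.5 Flash.
# Assumes a recent google-generativeai SDK; NewsItem is a hypothetical schema.
import os
import typing_extensions as typing
import google.generativeai as genai


class NewsItem(typing.TypedDict):
    title: str
    date: str


genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "List three recent AI announcements.",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",  # JSON mode
        response_schema=list[NewsItem],         # constrain output to a list of NewsItem objects
    ),
)
print(response.text)  # JSON text conforming to the schema above
```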


Light mode and mobile support
To give developers more flexibility in AI Studio, you can now choose your preferred UI mode (light vs. dark) or use your system default in the settings pane. We also rolled out our first set of mobile improvements for Google AI Studio to allow you to quickly test multimodal prompts on the go.


As we continue to improve our developer experience, please share your feedback on our Developer Forum. Happy building!

###
https://github.com/IEIT-Yuan/Yuan2.0-M32/tree/main
Yuan2.0-M32: Mixture of Experts with Attention Router
=====================================================


👾 [ModelScope](https://www.modelscope.cn/profile/YuanLLM) - 🤗 [Hugging Face](https://huggingface.co/IEITYuan) - 💬 [WeChat](https://github.com/IEIT-Yuan/Yuan-2.0/blob/main/images/%E6%BA%90%E5%85%AC%E4%BC%97%E5%8F%B7%E4%BA%8C%E7%BB%B4%E7%A0%81.png)- 📎 [Yuan2.0-M32 Paper](https://arxiv.org/abs/2405.17976)



English | [简体中文](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/README_CN.md)


* * * * *

0. Latest News 🎉🎉
--------------------

- [2024-05-28] Yuan2.0-M32 released

1. Introduction
----------------

Yuan2.0-M32 is a Mixture-of-Experts (MoE) language model with 32 experts, of which 2 are active. A new router network, Attention Router, is proposed and has been adopted for more efficient expert selection, boosting accuracy by 3.8% over models using a classical router network. Yuan 2.0-M32 is trained from scratch with 2000B tokens, and its training computation is only 9.25% of that required by a dense model of the same parameter scale. Demonstrating competitive capabilities in coding, math, and various specialized fields, Yuan2.0-M32 operates with only 3.7B active parameters out of a total 40B, and a forward computation of 7.4 GFLOPS per token, which is just 1/19th of Llama3-70B's requirement. Yuan 2.0-M32 has surpassed Llama3-70B on the MATH and ARC-Challenge benchmarks, achieving accuracies of 55.9% and 95.8%, respectively. The basic information of the Yuan2.0-M32 model is as follows:

- Total Parameters : 40B

- Experts: 32

- Active Experts: 2

- Active Parameters: 3.7B

- Pretrained Tokens: 2000B tokens

- Sequence Length: 16K

The technical report for the Yuan2.0-M32 model has been released, and you can find more detailed technical information and evaluation results by referring to the [paper](https://arxiv.org/abs/2405.17976).
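
As a rough, illustrative sketch (not the authors' implementation), the snippet below shows what an attention-style router selecting the top 2 of 32 experts per token could look like in PyTorch. The way Yuan2.0-M32 actually forms attention over expert representations is specified in the paper and differs in detail.

```python
# Illustrative PyTorch sketch of an attention-style MoE router picking the
# top-2 of 32 experts per token. NOT the Yuan2.0-M32 implementation; see the
# paper (arXiv:2405.17976) for the actual Attention Router design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionStyleRouter(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 32, top_k: int = 2):
        super().__init__()
        self.expert_emb = nn.Parameter(torch.randn(n_experts, d_model))  # one "key" per expert
        self.query = nn.Linear(d_model, d_model, bias=False)             # token -> query
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        # x: (batch, seq, d_model)
        q = self.query(x)                                        # (B, S, D)
        scores = q @ self.expert_emb.t() / x.shape[-1] ** 0.5    # attention-like logits (B, S, E)
        weights, idx = scores.topk(self.top_k, dim=-1)           # keep the 2 best experts per token
        weights = F.softmax(weights, dim=-1)                     # normalize over selected experts
        return weights, idx                                      # mixing weights and expert indices


router = AttentionStyleRouter(d_model=64)
w, i = router(torch.randn(1, 8, 64))
print(w.shape, i.shape)  # torch.Size([1, 8, 2]) torch.Size([1, 8, 2])
```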

[![](https://github.com/IEIT-Yuan/Yuan2.0-M32/raw/main/docs/Yuan2.0-M32-Architecture.jpg)](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/Yuan2.0-M32-Architecture.jpg)

Fig.1: Yuan 2.0-M32 Architecture

2. Model Downloads
-------------------

| Model | Sequence Length | Type | Download |
| :-: | :-: | :-: | :-: |
| Yuan2.0-M32 | 16K | Megatron | [ModelScope](https://modelscope.cn/models/YuanLLM/Yuan2-M32/) · [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32) · [Netdisk](https://pan.baidu.com/s/1K0LVU5NxeEujtYczF_T-Rg?pwd=cupw) |
| Yuan2.0-M32-HF | 16K | HuggingFace | [ModelScope](https://modelscope.cn/models/YuanLLM/Yuan2-M32-hf) · [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32-hf) · [Netdisk](https://pan.baidu.com/s/1FrbVKji7IrhpwABYSIsV-A?pwd=q6uh) |
| Yuan2.0-M32-GGUF | 16K | GGUF | [ModelScope](https://modelscope.cn/models/YuanLLM/Yuan2-M32-gguf/summary) · [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32-gguf) · [Netdisk](https://pan.baidu.com/s/1BWQaz-jeZ1Fe69CqYtjS9A?pwd=f4qc) |
| Yuan2.0-M32-GGUF-INT4 | 16K | GGUF | [ModelScope](https://modelscope.cn/models/YuanLLM/Yuan2-M32-gguf-int4/summary) · [HuggingFace](https://huggingface.co/IEITYuan/Yuan2-M32-gguf-int4) · [Netdisk](https://pan.baidu.com/s/1FM8xPpkhOrRcAfe7-zUgWQ?pwd=e6ag) |

3. Evaluation
--------------

3.1 Benchmarks 🏆

We conducted a thorough evaluation of the Yuan2.0-M32 model across a range of benchmarks, including HumanEval, GSM8K, MMLU, MATH, and ARC-Challenge. These benchmarks are designed to test the model's proficiency in key areas such as natural language understanding, knowledge acquisition, mathematical computation and reasoning, and code generation. Yuan2.0-M32 has shown a consistent and significant advantage over other models like Llama3-8B and Mistral-8×7B, excelling in all evaluated tasks. Remarkably, its overall performance is on par with the much larger Llama3-70B model. The detailed evaluation results are outlined in the table below.

- We provided evaluation scripts for [HumanEval](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/eval_humaneval.md), [GSM8K](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/eval_gsm8k.md), [MMLU](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/eval_mmlu.md), [Math](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/eval_math.md) and [ARC-C](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/eval_arc.md) to support the replication of our evaluation results.

| Model | HumanEval | GSM8K | MMLU | Math | ARC-C* |
| --- | :-: | :-: | :-: | :-: | :-: |
| Llama3-70B | 81.7% | 93% | 80.3% | 50.4% | 93.3% |
| Llama3-8B | 62.2% | 79.6% | 68.4% | 30% | 78.6% |
| Phi-3-medium | 62.2% | 91.0% | 78.0% | - | 91.6% |
| Phi-3-small | 61% | 89.6% | 75.7% | - | 90.7% |
| Phi-3-mini | 58.5% | 82.5% | 68.8% | - | 84.9% |
| Mistral-8*22B | 45.1% | 78.6% | 77.8% | 41.8% | 91.3% |
| Mistral-8*7B | 40.2% | 58.4% | 70.86% | 28.4% | 85.9% |
| Yuan2.0-M32 | 74.4% | 92.7% | 72.2% | 55.9% | 95.8% |

* *ARC-C*: the Challenge set of the AI2 Reasoning Challenge (ARC) benchmark, containing the more complex questions that require further reasoning.

* * * * *

3.2 Computational Utilization for Model

| Model | Params (B) | Active Params (B) | GFLOPs/token (Inference) | GFLOPs/token (Fine-tune) | Mean Accuracy | Mean Accuracy / GFLOPs per token (Inference) |
| --- | :-: | :-: | :-: | :-: | :-: | :-: |
| Llama3-70B | 70 | 70 | 140 | 420 | 79.25 | 0.57 |
| Llama3-8B | 8 | 8 | 16 | 48 | 64.15 | 4.00 |
| Mistral-8*22B | 141 | 39 | 78 | 234 | 72.38 | 0.93 |
| Mistral-8*7B | 47 | 12.9 | 25.8 | 77.3 | 60.83 | 2.36 |
| Yuan2.0-M32 | 40 | 3.7 | 7.4 | 22.2 | 79.15 | 10.69 |

4. Quick Start
---------------

4.1 Environment Config

We strongly recommend using the latest release of the Yuan2.0-M32 Docker images. You can launch an instance of the Yuan 2.0 container with the Docker commands provided in the repository documentation.


4.2 Data Preprocess

We have provided the data preprocess script. See documentation [here](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/data_process.md).

4.3 Model Pretrain

We've provided several scripts for pretraining in the [`example`](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/examples). The details can be seen from documentation [here](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/docs/pretrain.md).

4.4 Inference Service

For a detailed deployment plan, please refer to [vllm](https://github.com/IEIT-Yuan/Yuan2.0-M32/edit/main/vllm/README_Yuan_vllm.md).
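
Besides the vLLM deployment above, a quick functional test can load the Hugging Face checkpoint listed in the downloads table directly with transformers. The sketch below assumes the IEITYuan/Yuan2-M32-hf repo supports the standard AutoModel path with trust_remote_code; the official model card should be followed for exact tokenizer and generation settings.

```python
# Minimal sketch of a quick local test with the Hugging Face checkpoint.
# Assumes a GPU with enough memory; settings may differ from the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "IEITYuan/Yuan2-M32-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

inputs = tokenizer("Write a Python function that reverses a string.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```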

5. Statement of Agreement
--------------------------

The use of the source code in this repository requires compliance with the Apache 2.0 open source license agreement. The Yuan2.0 model supports commercial use and does not require authorization. Please understand and comply with the [《Yuan2.0 Model License Agreement》](https://github.com/IEIT-Yuan/Yuan2.0-M32/blob/main/LICENSE-Yuan). Do not use the open source model and code, or derivatives generated from open source projects, for any purposes that may cause harm to the country and society, or for any services that have not undergone security assessment and filing. Although we have taken measures to ensure the compliance and accuracy of the training data, the model has a huge number of parameters and is affected by probability and randomness factors, so we cannot guarantee the accuracy of its outputs, and the model can easily be misled by input instructions. This project does not assume any risks or responsibilities for data security or public opinion issues, or for the open-source models and code being misled, abused, disseminated, or improperly utilized. You will be solely responsible for the risks and consequences arising from the use, copying, distribution, and modification of the model in this open source project.

###
https://research.google/blog/codeclm-aligning-language-models-with-tailored-synthetic-data/
CodecLM: Aligning language models with tailored synthetic data
May 30, 2024

Zifeng Wang and Chen-Yu Lee, Research Scientists, Cloud AI Research Team

We propose CodecLM, an end-to-end data synthesis framework that tailors high-quality data to align LLMs for different downstream tasks without human annotation.

Instruction tuning is a critical step in LLM alignment, i.e., shaping the behavior of large language models (LLMs) to better align with the intended objective. It involves fine-tuning a pre-trained LLM on a varied set of instructions, each paired with a desired output. This process enables the model to generalize across various tasks and formats, ultimately improving its performance in understanding and responding to user instructions. In essence, instruction tuning empowers LLMs to follow instructions more effectively, thereby making them more useful and reliable tools for a wide range of applications. Recent progress in instruction tuning highlights the critical role of high-quality data in enhancing LLMs' instruction-following capabilities. However, acquiring such data through human annotation remains cost-prohibitive and difficult to scale, hindering further progress.

Alternatively, recent work explores synthesizing instruction–response pairs for LLM alignment by prompting models with example data and iteratively refining the results. While these methods are effective at generating varied instructions for LLM alignment broadly, real-world applications often prioritize tailoring the LLM to specific downstream tasks such as individual enterprise applications or personal assistant agents, which often involve different instruction distributions. This need for task-specific alignment brings us to a core question for data synthesis: how can we tailor synthetic data to align LLMs for different instruction-following tasks?

In “CodecLM: Aligning Language Models with Tailored Synthetic Data”, presented at NAACL 2024, we present a novel framework, CodecLM, that systematically generates tailored high-quality data to align LLMs for specific downstream tasks. Inspired by the principles of the encode-decode process, we leverage a strong LLM (i.e., an LLM that has strong instruction-following capability for data synthesis, such as Gemini Pro or text-unicorn) as a codec, to encode seed instructions from our target task into instruction metadata (keywords that capture the use case of the instruction, and the skills required for an LLM to respond to the instruction). We then decode the metadata into tailored synthetic instructions. In the decoding process, we propose two complementary strategies, Self-Rubrics and Contrastive Filtering, to enhance synthetic data quality. Self-Rubrics leverages the strong LLM to generate rubrics and actions to make synthetic instruction more challenging. Contrastive Filtering further selects the instructions to which the target LLM (the LLM to be aligned) fails to respond well. CodecLM achieves state-of-the-art performance on open-domain instruction-following benchmarks with various LLMs, demonstrating its effectiveness in LLM alignment for varied instruction distributions.

Overview of CodecLM. We first encode seed instructions into metadata to capture the underlying distribution of instructions. This metadata is then decoded through two complementary strategies, Self-Rubrics and Contrastive Filtering, to tailor high-quality synthetic instructions that are aligned with the target instruction distribution. Intermediate instructions and responses are omitted in the figure for clarity.

CodecLM
The core idea of CodecLM is to customize synthetic data for different downstream tasks, which can then be used to fine-tune an LLM for the tasks of interest. To achieve this goal, we need to make sure 1) the synthetic data’s distribution is similar to that of the real downstream data, and 2) the quality of synthetic data is high enough to improve the target LLM to be tuned.

First, the strong LLM encodes the seed instruction into instruction metadata, specifying its use case and the skills required for responses. Next, the strong LLM decodes metadata into basic instructions. Meanwhile, Self-Rubrics (more below) leverages the strong LLM to generate rubrics and actions to improve the basic instructions, tailoring them for the downstream task. Finally, Contrastive Filtering (more below) uses a scoring function to compare answers from both the strong and target LLMs. The most effective pairs are selected for aligning the LLM, while less effective instructions are sent back for further improvement. In the workflow illustrated below, the strong LLM's (refined) answer wins against the target LLM's (simplistic) answer, indicating that the improved synthetic instruction is challenging enough for the target LLM; hence, we select the corresponding pair for instruction tuning the target LLM.

Detailed workflow of CodecLM.

Encoding instructions via metadata
To capture the underlying instruction distribution from the downstream task, we extract a word-level abstraction of the input instruction distribution through instruction metadata. We define the metadata as encompassing two key aspects: use case and skills. The use case describes the intended task (e.g., question answering or creative writing), while the skills are the knowledge the LLM must have to successfully respond to the given instruction (e.g., algorithms or communication). With the metadata from the seed instruction, we can readily prompt the strong LLM to generate synthetic instructions based on the extracted metadata.

Tailoring instructions via Self-Rubrics
With the above method, however, the quality of the synthetic instructions generated by simply prompting the LLM with the metadata may not be high. A recent study found that tuning LLMs with more complex instructions can improve performance, indicating that complex instructions are often considered high quality. A common practice is to work with human experts to craft general guidance to complicate instructions, such as “add reasoning steps” (more below). However, this strategy falls short for tailoring guidance to different tasks, like solving calculus problems versus writing news articles. Therefore, we introduce Self-Rubrics, which leverages the strong LLM to tailor instructions by adjusting their complexity according to the extracted metadata.

Self-Rubrics first guides the LLM to generate distinct rubrics for assessing the instruction complexity of each metadatum. Then, informed by these rubrics, the LLM generates a corresponding set of actions to enhance the instruction’s complexity. Such actions generated by Self-Rubrics are domain-specific and unambiguous — for example, for the use case of “business plan development” and skills of “market research and planning”, generic rules like “add reasoning steps” are vague. On the contrary, Self-Rubrics is able to generate actions like “add SWOT analysis” and “include comparison with market competitors” to complicate the instruction. With these instructions, one can iteratively prompt the strong LLM to tailor higher quality instructions.
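
As a rough sketch of the Self-Rubrics idea (the actual prompts used by CodecLM are given in the paper), one might prompt the strong LLM along the following lines; the prompt wording and helper names here are hypothetical.

```python
# Rough illustration of Self-Rubrics: ask the strong LLM for rubrics and
# complicating actions tailored to instruction metadata, then apply an action.
# The prompt text is hypothetical; CodecLM's actual prompts are in the paper.

def self_rubrics_prompt(use_case: str, skills: list[str]) -> str:
    return (
        f"The instructions below target the use case '{use_case}' and require the "
        f"skills: {', '.join(skills)}.\n"
        "1. Write rubrics for judging how challenging such an instruction is.\n"
        "2. Based on the rubrics, list concrete, domain-specific, unambiguous "
        "actions that would make an instruction of this kind more challenging."
    )


def complicate_prompt(instruction: str, action: str) -> str:
    return (
        f"Rewrite the instruction below by applying this action: {action}\n"
        f"Instruction: {instruction}\n"
        "Return only the rewritten instruction."
    )


print(self_rubrics_prompt("business plan development", ["market research", "planning"]))
```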

Selecting instructions via Contrastive Filtering
While Self-Rubrics tailors complex instructions based on instruction metadata, not all instructions, regardless of their complexity, are equally effective for instruction tuning. Intuitively, identifying instructions an LLM finds challenging can expose opportunities for improvement. We therefore introduce Contrastive Filtering, a method to select the instructions that can enhance the target LLM.

Given an input instruction, we obtain two responses from the strong LLM (the one used for data synthesis) and the target LLM (the one we target for tuning), respectively. We then measure the quality gap between the two responses using LLM-as-a-Judge: we prompt the strong LLM to generate numerical scores (e.g., from 1 to 10) reflecting each response’s quality, and define the absolute difference between two scores as the quality gap. Intuitively, a larger gap often means the target LLM produces a worse response than the strong LLM. In this case, we add the instruction and the higher-scoring response to our final pool of high-quality synthetic data. On the other hand, a smaller quality gap indicates that such instructions are unlikely to improve performance. We then save such instructions for the next iteration of Self-Rubrics for further improvement.
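
The selection loop can be summarized with the sketch below, where strong_llm, target_llm, and judge_score are hypothetical placeholders for the actual model calls; the 1-10 scoring and the gap threshold only loosely follow the description above.

```python
# Sketch of Contrastive Filtering: score responses from the strong and target LLMs
# with an LLM judge, keep pairs with a large quality gap, and recycle the rest.

def contrastive_filter(instructions, strong_llm, target_llm, judge_score, gap_threshold=3):
    selected, recycled = [], []
    for inst in instructions:
        strong_resp = strong_llm(inst)
        target_resp = target_llm(inst)
        s_strong = judge_score(inst, strong_resp)   # e.g. a 1-10 score from the strong LLM judge
        s_target = judge_score(inst, target_resp)
        gap = abs(s_strong - s_target)
        if gap >= gap_threshold:
            # The target LLM struggles here: keep the instruction with the better response.
            best = strong_resp if s_strong >= s_target else target_resp
            selected.append((inst, best))
        else:
            # Not informative enough; send back for another round of Self-Rubrics.
            recycled.append(inst)
    return selected, recycled
```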

Effectiveness of CodecLM
We demonstrate the effectiveness of CodecLM with PaLM 2 LLMs. In particular, we use text-unicorn as the strong LLM for data synthesis, and text-bison as the target LLM for instruction tuning. We conduct experiments on multiple widely used open-domain instruction-following benchmarks, which contain instructions of various forms and complexities to test LLMs' instruction-following ability. Here we focus on the results on the Vicuna (Benchmark 1) and Evol-Instruct (Benchmark 2) test sets. We compare CodecLM with representative baselines, including Alpagasus and WizardLM+ (an enhanced version of WizardLM). Inspired by the LLM-as-a-Judge approach, we conduct LLM-based pairwise comparisons between the instruction-tuned target LLM and the strong LLM to measure how much capacity the target LLM recovers from the strong LLM. We name this metric the capacity recovery ratio (CRR), where 100% CRR means the tuned target LLM performs as well as the strong LLM on the specific test set.

Consistently better performance
CodecLM outperforms comparable methods consistently on all benchmarks, highlighting its generalizability to different downstream instruction distributions. Note that common data synthesis approaches do not take the downstream instruction distribution into account, while CodecLM is able to tailor instructions for different downstream tasks, thanks to the synergy between instruction metadata, Self-Rubrics and Contrastive Filtering. Our paper has more results and in-depth analysis.

Results with PaLM 2–based target models on two open-domain instruction-following benchmarks. Each method trains a target model with synthetic data based on text-bison, and compares against the strong model, text-unicorn. Larger CRR means better performance.

Conclusion
Our proposed CodecLM is able to generate synthetic instruction-tuning data that is tailored to specific domains. We show that CodecLM effectively captures the underlying instruction distribution via instruction metadata, and further tailors the most effective instruction-response pairs through the novel strategies of Self-Rubrics and Contrastive Filtering. CodecLM provides a potent solution towards adapting LLMs for customized uses, without the necessity of human annotation. We believe CodecLM serves as a general framework for targeted LLM alignment, which opens the door to multiple promising research directions within the framework, such as richer metadata definition, better prompt design, and more reliable LLM-based scorers.

Acknowledgments
This research was conducted by Zifeng Wang, Chun-Liang Li, Vincent Perot, Long T. Le, Jin Miao, Zizhao Zhang, Chen-Yu Lee, Tomas Pfister. Thanks to Chih-Kuan Yeh and Sergey Ioffe for their valuable feedback.

###
https://openai.com/index/introducing-chatgpt-edu/
May 30, 2024

Introducing ChatGPT Edu
An affordable offering for universities to responsibly bring AI to campus.

We're announcing ChatGPT Edu, a version of ChatGPT built for universities to responsibly deploy AI to students, faculty, researchers, and campus operations. Powered by GPT-4o, ChatGPT Edu can reason across text and vision and use advanced tools such as data analysis. This new offering includes enterprise-level security and controls and is affordable for educational institutions.

We built ChatGPT Edu because we saw the success universities like the University of Oxford, Wharton School of the University of Pennsylvania, University of Texas at Austin, Arizona State University, and Columbia University in the City of New York were having with ChatGPT Enterprise.

How campuses use ChatGPT today
ChatGPT can help with various tasks across campus, such as providing personalized tutoring for students and reviewing their resumes, helping researchers write grant applications, and assisting faculty with grading and feedback. Our university partners have found innovative ways to make AI accessible to students, faculty, researchers, and campus operations. A few examples include:

Professor Nabila El-Bassel at Columbia University is leading an initiative to integrate AI into community-based strategies to reduce overdose fatalities. Her team built a GPT that analyzes and synthesizes large datasets to inform interventions, reducing weeks of research work into seconds.

Undergraduates and MBA students in Professor Ethan Mollick’s courses at Wharton completed their final reflection assignments through discussions with a GPT trained on course materials, reporting that ChatGPT got them to think more deeply about what they’ve learned.

Christiane Reves, an assistant professor at Arizona State University, is developing a custom Language Buddies GPT for students to engage in German conversations suited to their language level while receiving tailored feedback. The GPT will help students build communication skills and save faculty time on assessments.

Bringing AI into the new school year
To build on these applications, we designed ChatGPT Edu as an accessible option for universities to bring AI to their campuses at scale.

ChatGPT Edu includes:

Access to GPT-4o, our flagship model, excelling in text interpretation, coding, and mathematics

Advanced capabilities such as data analytics, web browsing, and document summarization

The ability to build GPTs, custom versions of ChatGPT, and share them within university workspaces

Significantly higher message limits than the free version of ChatGPT

Improved language capabilities across quality and speed, with over 50 languages supported

Robust security, data privacy, and administrative controls such as group permissions, SSO, SCIM 1, and GPT management

Conversations and data are not used to train OpenAI models

“Integrating OpenAI's technology into our educational and operational frameworks accelerates transformation at ASU. We're collaborating across our community to harness these tools, extending our learnings as a scalable model for other institutions.”
—Kyle Bowen, Deputy CIO at Arizona State University
ChatGPT Edu is designed for schools that want to deploy AI more broadly to students and their campus communities. Contact our team to learn more.

Footnotes
1Coming soon to ChatGPT Edu and ChatGPT Enterprise

Author
OpenAI

###
https://www.anthropic.com/news/tool-use-ga
Claude can now use tools
May 31, 2024

Tool use, which enables Claude to interact with external tools and APIs, is now generally available across the entire Claude 3 model family on the Anthropic Messages API, Amazon Bedrock, and Google Cloud's Vertex AI. With tool use, Claude can perform tasks, manipulate data, and provide more dynamic—and accurate—responses.

Tool use
Define a toolset for Claude and specify your request in natural language. Claude will then select the appropriate tool to fulfill the task and, when appropriate, execute the corresponding action:

Extract structured data from unstructured text: Pull names, dates, and amounts from invoices to reduce manual data entry.
Convert natural language requests into structured API calls: Enable teams to self-serve common actions (e.g., "cancel subscription") with simple commands.
Answer questions by searching databases or using web APIs: Provide instant, accurate responses to customer inquiries in support chatbots.
Automate simple tasks through software APIs: Save time and minimize errors in data entry or file management.
Orchestrate multiple fast Claude subagents for granular tasks: Automatically find the optimal meeting time based on attendee availability.
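
A minimal sketch of such a request with the Anthropic Python SDK is shown below; the tool definition is illustrative, and the model name is one of the Claude 3 identifiers current at the time of writing.

```python
# Minimal sketch of tool use with the Anthropic Messages API (Python SDK).
# The get_invoice_fields tool is an illustrative example, not a built-in tool.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "get_invoice_fields",
    "description": "Extract the customer name, date, and total amount from invoice text.",
    "input_schema": {
        "type": "object",
        "properties": {
            "customer": {"type": "string"},
            "date": {"type": "string"},
            "total": {"type": "number"},
        },
        "required": ["customer", "date", "total"],
    },
}]

message = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Invoice: ACME Corp, 2024-05-31, total $1,234.56"}],
)

# When Claude decides to call the tool, the response contains a tool_use block
# whose `input` holds the structured fields defined in the schema above.
for block in message.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```

The forced tool use option described under "Improved developer experience" below corresponds to additionally passing a tool_choice argument to the same call, per the documentation.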

Improved developer experience
To make it easier to leverage the intelligence of the Claude 3 models with tools, we’ve also built in features that help developers further customize the end-user experience.

Tool use with streaming reduces wait times to create more engaging interactions: Streaming enables real-time responses in applications like customer support chatbots for smoother, more natural conversations.
Forced tool use allows developers to instruct Claude on tool selection: Developers can specify which tools Claude should use or leave the choice with Claude, helping create more targeted and efficient applications.
Tools also work with images: Claude can incorporate image inputs in live applications.
During our beta, many developers used Opus to build sophisticated user-facing assistants. To further enhance this experience, Opus will now include <thinking> tags in its outputs, clarifying Claude’s reasoning and simplifying the debugging process for developers. Our Claude 3 models are currently unable to support parallel tool calls.

Customer spotlight: StudyFetch
AI-native learning platform StudyFetch uses Claude's tool use capabilities to power its personalized AI tutor, Spark.E. By integrating tools to track student progress, navigate course materials and lectures, and create interactive user interfaces, StudyFetch has created a more engaging educational environment for students globally.

"Claude with tool use is accurate and cost-effective, and now powers our live voice-enabled AI tutoring sessions. Within just a few days, we integrated tools into our platform,” said Ryan Trattner, CTO and Co-Founder at StudyFetch. “As a result, our AI tutor, Spark.E, acts agentically—displaying interactive UIs, tracking student progress in context, and navigating through lectures and materials. Since implementing Claude with tool use, we've observed a 42% increase in positive human feedback."

Customer spotlight: Intuned
Intuned, the browser automation platform, uses Claude to power data extraction within their cloud platform. With AI-powered data extraction, Intuned is able to drastically improve the developer experience in building and executing more reliable browser automations.

"Claude 3 Haiku with tool use has been a game changer for us. After accessing the model and running our benchmarks on it, we realized the quality, speed, and price combination is unmatched,” said Faisal Ilaiwi, Co-Founder at Intuned. “Haiku is helping us scale our customers' data extraction tasks to a completely new level."

Customer spotlight: Hebbia
Hebbia is building the AI knowledge worker for leading financial and legal services firms. They use Claude 3 Haiku to help power several complex, multi-step customer workflows.

"We leverage Claude 3 Haiku for generating live suggestions, automating prompt writing, and extracting key metadata from long documents,” shared Divya Mehta, Product Manager at Hebbia. “Claude 3 Haiku's tool use feature has unlocked capabilities and speed for our platform to generate reliable suggestions and prompts in real-time."

Get started
You can get started with tool use today on the Anthropic Messages API, Amazon Bedrock, and Google Cloud's Vertex AI. To learn more, explore our documentation and Anthropic Cookbooks on tool use.

###
https://tenvence.github.io/p/v-express/
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation
Cong Wang1*, Kuan Tian2*, Jun Zhang2†, Yonghang Guan2, Feng Luo2, Fei Shen2,
Zhiwei Jiang1†, Qing Gu1, Xiao Han2, Wei Yang2
1 Nanjing University, 2 Tencent AI Lab
* Equal Contribution, † Corresponding Authors


Abstract
In the field of portrait video generation, the use of single images to generate portrait videos has become increasingly prevalent. A common approach involves leveraging generative models to enhance adapters for controlled generation. However, control signals can vary in strength, including text, audio, image reference, pose, depth map, etc. Among these, weaker conditions often struggle to be effective due to interference from stronger conditions, posing a challenge in balancing these conditions. In our work on portrait video generation, we identified audio signals as particularly weak, often overshadowed by stronger signals such as pose and original image. However, direct training with weak signals often leads to difficulties in convergence. To address this, we propose V-Express, a simple method that balances different control signals through a series of progressive drop operations. Our method gradually enables effective control by weak conditions, thereby achieving generation capabilities that simultaneously take into account pose, input image, and audio. The experimental results demonstrate that our method can effectively generate portrait videos controlled by audio. Furthermore, our method provides a potential solution for the simultaneous and effective use of conditions of varying strengths.
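
As an illustration of the conditional, progressive drop idea described in the abstract (not the authors' implementation), strong conditions can be dropped with increasing probability during training so that the weak audio condition is forced to carry signal; the probabilities and schedule below are invented for illustration only.

```python
# Illustrative sketch of conditional dropout over control signals of different
# strengths: strong conditions (pose, reference image) are randomly dropped so
# the weak audio condition learns to drive generation. Values are NOT V-Express's.
import random

def drop_conditions(conditions: dict, step: int, total_steps: int) -> dict:
    """conditions maps names ('pose', 'ref_image', 'audio') to features or None."""
    progress = step / total_steps
    drop_prob = {
        "pose": 0.1 + 0.4 * progress,       # drop strong signals more often as training proceeds
        "ref_image": 0.1 + 0.3 * progress,
        "audio": 0.05,                      # keep the weak signal almost always
    }
    return {
        name: (None if random.random() < drop_prob.get(name, 0.0) else feat)
        for name, feat in conditions.items()
    }

# Example: late in training the pose condition is frequently dropped, so the
# generator must rely on audio to drive lip and head motion.
batch_conditions = {"pose": "pose_feat", "ref_image": "img_feat", "audio": "audio_feat"}
print(drop_conditions(batch_conditions, step=9000, total_steps=10000))
```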

###
https://www.youtube.com/watch?v=AzqKLiPQD6g&ab_channel=jarkkomoilanen
Figma - Automation powered by GPT-4o generates Figma designs based on a PRD.