Summary

OpenAI에서는 데이터 인덱싱과 쿼리 기능을 제공하는 실시간 분석 데이터베이스인 Rockset을 인수하여 자사의 검색 인프라를 강화할 예정입니다. Arcee.ai에서는 새로운 Qwen2 7B 기반의 커스텀 모델 Arcee-Spark를 출시하여 AGIEval과 MT-Bench 등에서 우수한 성능을 보였으며, Nous Research는 Llama-3 Instruct와 통합한 Hermes-2 Theta 70B 모델을 발표하여 다양한 벤치마크에서 뛰어난 성능을 입증했습니다. BBC는 AI가 인간의 일자리에 미치는 영향을 보도하며, AI 자동화로 인한 해고 사례를 조명했습니다. 또한 GenQA는 다양한 주제에 대해 자동으로 질문과 답변을 생성하는 데이터셋을 공개했습니다. MOFA-Video는 컨트롤 가능한 이미지 애니메이션 생성 기술을 선보였으며, MARS5 TTS는 뛰어난 프로소디 제어 기능을 갖춘 오픈 소스 음성 합성 모델을 발표했습니다.

OpenAI, Rockset 인수

OpenAI, Rockset 인수

링크, 2024년 6월 21일,
OpenAI

  • OpenAI는 Rockset을 인수하여 자사의 검색 인프라를 강화할 계획
  • Rockset은 실시간 데이터 인덱싱 및 쿼리 기능을 제공하는 분석 데이터베이스
  • Rockset의 기술은 OpenAI 제품의 검색 인프라에 통합될 예정
  • Rockset 팀의 일부 멤버들이 OpenAI에 합류
  • Brad Lightcap, OpenAI COO는 Rockset의 인프라가 기업들이 데이터를 실행 가능한 인텔리전스로 변환하는 데 도움을 줄 것이라고 발표
  • Venkat Venkataramani, Rockset CEO는 OpenAI와의 협력을 통해 사용자, 기업, 개발자들이 데이터를 최대한 활용할 수 있게 될 것이라고 발표

Arcee.ai, Arcee-Spark 출시

Arcee-Spark 출시

링크, 2024년 6월,
Arcee.ai

  • Qwen2 7B 기반의 커스텀 모델 Arcee-Spark 출시
  • 180만 개 샘플로 미세 조정 후 mergekit으로 Qwen2-7B-Instruct와 병합
  • Direct Preference Optimization (DPO)로 추가 훈련
  • AGIEval 51.11, MT-Bench 8.46, BigBenchHard 45.78, EQ-Bench 71.4 점수 달성
  • 작은 크기에도 불구하고 뛰어난 성능 제공
  • 실시간 애플리케이션, 엣지 컴퓨팅, 비용 효율적인 스케일링 등에 이상적
  • GPT-3.5보다 많은 작업에서 우수한 성능을 보임
  • 많은 대화 턴이 필요한 작업이나 대량의 텍스트 처리에 적합한 128k 토큰 컨텍스트 길이 제공

Nous Research, Hermes-2 Theta 70B 발표

Hermes-2 Theta 70B 발표

링크, 2024년 6월,
Nous Research

  • Hermes-2 Θ (Theta) 70B 모델 발표
  • Hermes 2 Pro 모델과 Meta의 Llama-3 Instruct 모델을 통합하여 개발
  • 강화 학습을 통해 성능 향상
  • 다양한 벤치마크에서 Llama-3 Instruct 70B보다 우수한 성능을 입증
  • Nous Research와 Charles Goddard, Arcee AI 팀의 협력으로 개발

AI가 인간의 일자리에 미치는 영향

AI가 인간의 일자리에 미치는 영향

링크, 2024년 6월 16일,
BBC

  • AI 자동화 도입 후 60명 중 59명 해고 사례 보고
  • 마지막 남은 한 명도 나중에 해고됨
  • AI가 작성한 문서를 인간이 수정하는 작업 증가
  • Benjamin Miller의 사례를 통해 AI 도입으로 인한 일자리 감소 사례 소개
  • AI와 인간의 협업이 새로운 일자리 창출 가능성 제시
  • 초기 단계의 AI 도입으로 인해 인간의 일자리가 감소했으나, 향후 협업의 가능성도 존재
  • 저임금으로 AI가 작성한 글을 수정하는 새로운 직업 등장

GenQA: 다양한 주제에 대한 자동 질문 생성

GenQA 데이터셋 공개

링크, 2024년 6월 15일,
GenQA

  • 1,000만 개 이상의 정제 및 중복 제거된 명령어 데이터셋 공개
  • 다양한 주제에 대해 자동으로 질문과 답변 생성
  • Gemini Pro 1.0을 사용하여 데이터 생성
  • AlpacaEval 2.0과 MT-Bench에서 UltraChat과 WizardLM보다 우수한 성능 달성
  • 데이터셋, 생성기 프롬프트 및 모델 체크포인트 공개
  • 주제 다양성을 높이기 위해 “be creative”, “be smart” 등의 접미사를 추가하여 데이터 생성

MOFA-Video: 컨트롤 가능한 이미지 애니메이션

MOFA-Video 발표

링크, 2024년 6월 2일,
Muyao Niu 외

  • MOFA-Video는 주어진 이미지에서 다양한 추가 신호를 사용하여 비디오를 생성하는 기술 발표
  • 인간 랜드마크 참조, 수동 경로 및 다른 제공된 비디오 등의 신호를 사용하여 비디오 생성 가능
  • 다양한 모션 도메인에서 작동하며 강력한 제어 기능 제공
  • MOFA-Adapter를 사용하여 비디오 생성 파이프라인의 생성 모션을 제어
  • 수동 경로용과 인간 랜드마크용 두 개의 모션 어댑터를 개별 훈련
  • 훈련된 MOFA-Adapter들은 서로 다른 도메인에서 함께 작동 가능

MARS5 TTS: 고도의 프로소디 제어 음성 합성

MARS5 TTS 발표

링크, 2024년 6월,
CAMB.AI

  • MARS5 TTS는 뛰어난 프로소디 제어 기능을 갖춘 오픈 소스 텍스트 음성 변환(TTS) 모델 발표
  • 5초 이하의 음성으로 음성 클로닝 가능
  • 이중 단계 Auto-Regressive(750M) + Non-Auto Regressive(450M) 모델 아키텍처
  • 구두점, 멈춤 등을 제어할 수 있는 BPE 토크나이저 사용
  • AR 모델이 L0 코스 토큰을 예측하고, NAR DDPM 모델이 이를 세밀하게 조정한 후 보코더를 통해 최종 오디오 생성
  • 텍스트와 참조 오디오를 함께 사용하여 자연스러운 발음 및 억양 제어 가능
  • 스포츠 해설, 애니메이션 등 다양한 시나리오에서 뛰어난 성능 발휘
Sources

This GPT assists users by creating a detailed daily newspaper in Korean based on provided links. It follows these steps: read the content, summarize each content with detailed points, and write a report. The report format is:

(today’s date in 년 월 일) AI 소식,

Summary

(overall short summary, make summary with good details. for Summary section, explain the details starting with company name, e.g. OpenAI에서는 ~~~를 발표하였습니다.)

Title,

한글제목

링크, date,
company name

  • detailed summary1, (개조식 문체 사용)
  • detailed summary2, (개조식 문체 사용)
  • detailed summary N, (개조식 문체 사용)

Title,

한글제목

링크, date,
company name

  • detailed summary1, (개조식 문체 사용)
  • detailed summary2, (개조식 문체 사용)
  • detailed summary N, (개조식 문체 사용)
###
https://openai.com/index/openai-acquires-rockset/
June 21, 2024

OpenAI acquires Rockset
Enhancing our retrieval infrastructure to make AI more helpful

AI has the opportunity to transform how people and organizations leverage their own data. That’s why we’ve acquired Rockset, a leading real-time analytics database that provides world-class data indexing and querying capabilities.

Rockset enables users, developers, and enterprises to better leverage their own data and access real-time information as they use AI products and build more intelligent applications.

We will integrate Rockset’s technology to power our retrieval infrastructure across products, and members of Rockset’s world-class team will join OpenAI.

“Rockset’s infrastructure empowers companies to transform their data into actionable intelligence. We’re excited to bring these benefits to our customers by integrating Rockset’s foundation into OpenAI products,” said Brad Lightcap, OpenAI COO.

“We’re excited to be joining OpenAI to empower users, enterprises and developers to fully leverage their data by bringing powerful retrieval to AI,” said Venkat Venkataramani, CEO of Rockset.

Stay tuned for more updates as we get to work integrating Rockset’s capabilities.

###
https://huggingface.co/arcee-ai/Arcee-Spark
Qwen2 has a lot of potential! 👀 Arcee.ai released Arcee-Spark, their first Qwen2 7B-based custom model, outperforming Meta Llama 3 8B Instruct on AGIEval and OpenAI GPT-3.5 on MT-Bench.
> Fine-tuned Qwen2 Base on 1.8 million samples
> Merged with Qwen2-7B-Instruct using mergekit
> Further post trained using DPO
> AGIEval 51.11; MT-Bench 8.46; BigBenchHard 45.78; EQ-Bench: 71.4
> Apache 2.0 license
Arcee Spark
Arcee Spark is a powerful 7B parameter language model that punches well above its weight class. Initialized from Qwen2, this model underwent a sophisticated training process:

Fine-tuned on 1.8 million samples
Merged with Qwen2-7B-Instruct using Arcee's mergekit
Further refined using Direct Preference Optimization (DPO)
This meticulous process results in exceptional performance, with Arcee Spark achieving the highest score on MT-Bench for models of its size, outperforming even GPT-3.5 on many tasks.

Key Features
7B parameters
State-of-the-art performance for its size
Initialized from Qwen2
Advanced training process including fine-tuning, merging, and DPO
Highest MT-Bench score in the 7B class
Outperforms GPT-3.5 on many tasks
Has a context length of 128k tokens, making it ideal for tasks requiring many conversation turns or working with large amounts of text.
Business Use Cases
Arcee Spark offers a compelling solution for businesses looking to leverage advanced AI capabilities without the hefty computational requirements of larger models. Its unique combination of small size and high performance makes it ideal for:

Real-time applications: Deploy Arcee Spark for chatbots, customer service automation, and interactive systems where low latency is crucial.

Edge computing: Run sophisticated AI tasks on edge devices or in resource-constrained environments.

Cost-effective scaling: Implement advanced language AI across your organization without breaking the bank on infrastructure or API costs.

Rapid prototyping: Quickly develop and iterate on AI-powered features and products.

On-premise deployment: Easily host Arcee Spark on local infrastructure for enhanced data privacy and security.

Performance and Efficiency
Arcee Spark demonstrates that bigger isn't always better in the world of language models. By leveraging advanced training techniques and architectural optimizations, it delivers:

Speed: Blazing fast inference times, often 10-100x faster than larger models.
Efficiency: Significantly lower computational requirements, reducing both costs and environmental impact.
Flexibility: Easy to fine-tune or adapt for specific domains or tasks.
Despite its compact size, Arcee Spark offers deep reasoning capabilities, making it suitable for a wide range of complex tasks including:

Advanced text generation
Detailed question answering
Nuanced sentiment analysis
Complex problem-solving
Code generation and analysis
Model Availability
Quants: Arcee Spark GGUF
FP32: For those looking to squeeze every bit of performance out of the model, we offer an FP32 version that scores slightly higher on all benchmarks.


###
https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-70B
Hermes 2 Theta Llama-3 70B Model Card
Introducing Hermes 2 Theta 70B!

Hermes 2 Theta is smarter, more creative, and capable of more than ever before.

It takes a strong lead over Llama-3 Instruct 70B across a wide variety of benchmarks, and is a continuation of our collaboration with Charles Goddard and the Arcee AI team.
Model Description
Hermes-2 Θ (Theta) 70B is the continuation of our experimental merged model released by Nous Research, in collaboration with Charles Goddard and Arcee AI, the team behind MergeKit.

Hermes-2 Θ is a merged and then further RLHF'ed version of our excellent Hermes 2 Pro model and Meta's Llama-3 Instruct model, forming a new model, Hermes-2 Θ, that combines the best of both.


###
https://www.bbc.com/future/article/20240612-the-people-making-ai-sound-more-human
AI took their jobs. Now they get paid to make it sound human
16 June 2024
By Thomas Germain,


Hands typing on a typewriter (Credit: Serenity Strull/BBC/Getty Images)
If you're worried about how AI will affect your job, the world of copywriters may offer a glimpse of the future.

Writer Benjamin Miller – not his real name – was thriving in early 2023. He led a team of more than 60 writers and editors, publishing blog posts and articles to promote a tech company that packages and resells data on everything from real estate to used cars. "It was really engaging work," Miller says, a chance to flex his creativity and collaborate with experts on a variety of subjects. But one day, Miller's manager told him about a new project. "They wanted to use AI to cut down on costs," he says. (Miller signed a non-disclosure agreement, and asked the BBC to withhold his and the company's name.)
A month later, the business introduced an automated system. Miller's manager would plug a headline for an article into an online form, an AI model would generate an outline based on that title, and Miller would get an alert on his computer. Instead of coming up with their own ideas, his writers would create articles around those outlines, and Miller would do a final edit before the stories were published. Miller only had a few months to adapt before he got news of a second layer of automation. Going forward, ChatGPT would write the articles in their entirety, and most of his team was fired. The few people remaining were left with an even less creative task: editing ChatGPT's subpar text to make it sound more human.
By 2024, the company laid off the rest of Miller's team, and he was alone. "All of a sudden I was just doing everyone's job," Miller says. Every day, he'd open the AI-written documents to fix the robot's formulaic mistakes, churning out the work that used to employ dozens of people.
In numerous industries, AI is being used to produce work that was once the exclusive domain of the human mind
"Mostly, it was just about cleaning things up and making the writing sound less awkward, cutting out weirdly formal or over-enthusiastic language," Miller says. "It was more editing than I had to do with human writers, but it was always the exact same kinds of edits. The real problem was it was just so repetitive and boring. It started to feel like I was the robot."
Miller's experience reflects a broader shift. In numerous industries, AI is being used to produce work that was once the exclusive domain of the human mind. AI is often less expensive than a person, but early adopters are quick to learn it can't always perform on the same level. Now, people like Miller are finding themselves being asked to team up with the same robots that are stealing their jobs to give the algorithms a bit of humanity – a hidden army making AI seem better than it really is.
If AI gets dramatically more effective, this will be a temporary solution. If it doesn't, Miller's story could be a preview of what's coming to other professions.
Copywriters are at the forefront of a new line of work: human-AI collaboration (Credit: Serenity Strull/BBC/Getty Images)
Will AI steal your job? It's hard to say. We're at an unsettling crossroads, where some experts warn that super intelligent robots will soon replace most human work, while others believe the technology may never even approach that point. There are also some who argue we are heading towards a future of AI and human collaboration rather than competition.
But on a much smaller scale, some workers already face distressing consequences. If there's one thing the large language models powered by generative AI can do, it's string together words and paragraphs, putting some writers on the frontline.
The fear of losing work to AI-powered writing tools was one of the main issues that led to the screen writers strike in the US last year. And other creative industries face similar concerns about their future with the arrival of AI tools capable of generating images, audio and video from scratch.
We're adding the 'human touch', but that often requires a deep, developmental edit on a piece of writing – Catrina Cowart
The impact is already being felt among copywriters – the people who write marketing material and other content for businesses. In some corners of the copywriting business, AI is a blessing. It can be a useful tool that speeds up work and enhances creativity. But other copywriters, especially those early in their careers, say AI is making it harder to find jobs.
But some have also noticed a new type of gig is emerging, one that pays a lot less: fixing the robots' shoddy writing.
"We're adding the human touch, but that often requires a deep, developmental edit on a piece of writing," says Catrina Cowart, a copywriter based in Lexington, Kentucky, US, who's done work editing AI text. "The grammar and word choice just sound weird. You're always cutting out flowery words like 'therefore' and 'nevertheless' that don't fit in casual writing. Plus, you have to fact-check the whole thing because AI just makes things up, which takes forever because it's not just big ideas. AI hallucinates these flippant little things in throwaway lines that you'd never notice."
Cowart says the AI-humanising often takes longer than writing a piece from scratch, but the pay is worse. "On the job platforms where you find this work, it usually maxes out around 10 cents (£0.08) a word. But that's when you're writing. This is considered an editing job, so typically you're only getting one to five cents (£0.008-£0.04) a word," she says.
"It's tedious, horrible work, and they pay you next to nothing for it," Cowart says.
Other industries have seen similar examples of lower-paid human beings quietly powering the machines, from stepping in to help with automated ordering systems to labelling the images used to train AI vision systems in the first place.
It's been an incredible co-creative partner – Rebecca Dugas
But for some in the copywriting world, whether the arrival of AI is a good or bad thing depends on how people approach it, and how far along people are in their careers. Some writers say working the tools into their creative process can even improve their work.
The American Writers and Artists Institute (AWAI), an organisation that offers training and resources for freelance writers, hosts a variety of courses on artificial intelligence for its members. AWAI president Rebecca Matter says AI classes are now the institute's most popular offering by far. "It's an incredible tool," Matter says. "For people who make copywriting a career, the risk isn't AI taking their jobs, it's that they have to adapt. That can be uncomfortable, but I think it's a huge opportunity."
Matter says the transition to the AI world has been smooth for most of the writers she knows. In fact, it's become such an inherent part of the copywriting process that many writers now add personal "AI policies" to their professional websites to explain how they use the technology.
Rebecca Dugas, a copywriter with nine years of experience, says AI has been a "godsend" that lets her turn out the same high-quality work in a fraction of the time.
"I use AI whenever my clients are comfortable with it," she says. "Whether it's brainstorming, market research, reworking paragraphs when I'm banging my head against the wall, it's been an incredible co-creative partner."
AI makes life easier for some writers, but for others, it adds insult to injury (Serenity Strull/BBC/Getty Images)
But Dugas understands that clients may have reservations about the technology. Her own AI policy explains that Dugas is happy to forgo AI for those who prefer it – but you can expect to pay more. The extra time and mental energy required means her AI-free projects come with a higher price tag.
As AI gets better, Dugas expects that some businesses will turn to ChatGPT and other tools for their writing needs instead of hiring human beings. "But I think even now we're getting to the point where companies are realising that if you don't understand copywriting, you can't judge the effectiveness of what the AI produces," she says. According to Dugas, that means there will always be well-paying work for talented, established writers.
Miller's time humanising AI ended abruptly
But copywriters on the lower end of the career spectrum may not be so lucky. Today, many in that position find themselves in the middle of a distinctly modern set of contradictions.
A great deal of copywriting work comes from website owners who want articles that will generate more traffic from Google. However, Google made a number of dramatic announcements in the last year about its effort to remove "unhelpful" content from search results. That sparked fears that the tech giant may penalise websites that host AI-generated content. Google maintains that AI-writing is fine if the content is high quality, but these reassurances haven't dissuaded concerns.
As a result, it's become a common practice in some parts of the copywriting world to run text through AI detection software. Over the last year, a wave of writers even say they've lost jobs over false accusations from AI detectors.
According to Cowart, many of the same freelance writing platforms that have AI detection software in place are simultaneously hiring people to edit content produced by chatbots. That means in some corners of the copywriting ecosystem, almost everything revolves around efforts to avoid the appearance of artificial intelligence.
"They're selling AI content and paying you to fix it, and at the same time they're sending you emails about how to write like a human so you don't trigger their AI detector," Cowart says. "It's so insulting." Worse, the detectors are regularly updated to keep up with ongoing changes from the companies who make AI chatbots, which means the rules about what might get your writing flagged as AI constantly shift. "It's frustrating, because there are a million ways to say the same thing in English, but which one is more human? I don't like the guessing," she says.
Miller's time humanising AI ended abruptly. After months of repetitive editing work, he was called into an unexpected meeting. On 5 April 2024, the same day a historic earthquake shook his hometown of New York, he was laid off. The company decided that Miller was just another unnecessary layer of human intervention.
"I more or less got automated out of a job," Miller says.
You might also like:

• This is what happens when you ask an algorithm for relationship advice

• How AI is testing the boundaries of human intelligence

• The chatbots that say they can feel emotions

Fortunately, it wasn't long before Miller found a new, if rather ironic, opportunity. He got a job at Undetectable AI, a technology company that builds software to make AI writing harder to identify. In other words, Miller is helping a company that's using AI to do the work he was forced into after AI took his job in the first place.
Bars Juhasz, chief technology officer of Undetectable AI, says tools like the ones his company produces are certain to have some negative effects on the labour market, but he's optimistic about the future of work. "When the automobile was first introduced in an era of horses and carts, people reacted like this was the end of days. But society always adapts," Juhasz says. "I think we're going to see a lot of jobs being replaced, and freelancers will be the hardest hit. I do feel for them. But these people who are getting paid to humanise AI are fantastic opportunists. Sure, it's not a great job, but they have effectively recognised a new seat at a moment when we're redefining the idea of productivity. People who can learn to work with the technology are going to be OK."
Miller doesn't look back fondly on his time in the AI-humanisation mines. "I contributed to a lot of the garbage that's filling the internet and destroying it," he says. "Nobody was even reading this stuff by the time I left because it's just trash." Ultimately, Miller assumes the company will just take down the AI articles he worked on. "It'll be like it never even happened."

###
https://huggingface.co/papers/2406.10323
New instruction dataset! GenQA consists of over 10M cleaned and deduplicated instructions. GenQA uses generator prompts to create a diverse list of topics ("Generate 30 topics on X") and then randomly selects one to generate question-answer or dialogue pairs. It doesn't require any human oversight. 👀
TL;DR;
💻 10M samples split into 9 domains including, code, math, writing…
⚖️ Rebalanced version with 6.47M samples, performs better than raw (10M)
🤖 Used Gemini Pro 1.0 for data generation
🏆 Outperforms UltraChat and WizardLM on AlpacaEval 2.0 and MT-Bench
📄 Paper explores best ways to create a diverse set of topics
✨ Adding suffix “be creative”, “be smart” increased diversity
🔓 Dataset, generator prompts, and model checkpoints released
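The generator-prompt loop described above is simple to sketch. In this minimal Python version, `fake_llm` stands in for the actual model (the paper used Gemini Pro 1.0), and the prompt wording is illustrative, not the paper's:

```python
import random

def genqa_sample(llm, subject, suffix="be creative"):
    """One GenQA-style generation step: ask for a topic list, pick one
    at random, then ask for a question-answer pair on that topic.
    `llm` is a hypothetical callable mapping a prompt string to text."""
    topics = llm(f"Generate 30 topics on {subject}.").splitlines()
    topic = random.choice([t for t in topics if t.strip()])
    qa = llm(f"Write a question and a detailed answer about: {topic}. "
             f"Please {suffix}.")
    return topic, qa

# Stub LLM so the sketch runs end-to-end without an API
def fake_llm(prompt):
    if prompt.startswith("Generate"):
        return "linear algebra\ngraph theory\nprobability"
    return "Q: ... A: ..."

topic, qa = genqa_sample(fake_llm, "mathematics")
print(topic, "->", qa)
```

The random topic selection is what drives diversity without human curation; the "be creative" suffix is the same trick the bullet above credits for increasing topic spread.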

GenQA: Generating Millions of Instructions from a Handful of Prompts
Published on Jun 15
Authors:
Jiuhai Chen
,
Rifaa Qadri
,
Yuxin Wen
,

Neel Jain
,
John Kirchenbauer
,
Tianyi Zhou
,
Tom Goldstein
Abstract
Most public instruction finetuning datasets are relatively small compared to the closed source datasets used to train industry models. To study questions about finetuning at scale, such as curricula and learning rate cooldown schedules, there is a need for industrial-scale datasets. However, this scale necessitates a data generation process that is almost entirely automated. In this work, we study methods for generating large instruction datasets from a single prompt. With little human oversight, we get LLMs to write diverse sets of instruction examples ranging from simple completion tasks to complex multi-turn dialogs across a variety of subject areas. When finetuning a Llama-3 8B base model, our dataset meets or exceeds both WizardLM and Ultrachat on both knowledge-intensive leaderboard tasks as well as conversational evaluations. We release our dataset, the "generator" prompts that created it, and our finetuned model checkpoints.

###
https://huggingface.co/papers/2402.03300
What is Group Relative Policy Optimization (GRPO)? Deepseek Coder v2 is the best open Code LLM rivaling GPT-4 on coding tasks. As part of the technical report, GRPO is mentioned as RLHF method, but what is it? 🤔
GRPO was introduced in the DeepSeekMath paper earlier this year and is a method designed to improve mathematical reasoning capabilities with less memory consumption.
Implementation
1️⃣ Generate multiple outputs for each input question using the current Policy
2️⃣ Score these outputs using a reward model
3️⃣ Average the rewards and use it as a baseline to compute the advantages
4️⃣ Update the Policy to maximize the GRPO objective, which includes the advantages and a KL term
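The core of step 3 is just a group-relative baseline: each sampled reward is measured against its group's mean (DeepSeekMath also normalizes by the group's standard deviation), so no separate value-function model is needed. A minimal sketch:

```python
def grpo_advantages(rewards):
    """Step 3: use the group's mean reward as the baseline and normalize
    by the group's standard deviation, removing the need for a critic."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # guard against a zero-spread group
    return [(r - mean) / std for r in rewards]

# Four sampled completions for one question, scored by a reward model
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = grpo_advantages(rewards)
print(advantages)  # centered on 0: best sample positive, worst negative
```

These per-sample advantages then enter the clipped policy-gradient objective in step 4, with the KL term added directly to the loss.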
Insights
💡 GRPO doesn't need a value-function model, reducing memory and complexity
🔗 GRPO adds the KL term directly to the loss rather than to the reward
📈 GRPO improved GSM8K and MATH by ~5%
👉 GRPO looks similar to the RLOO method (available in TRL)
🔁 Used Iterative Approach to train new Reward Models
📊 RL data consisted of 144k CoT prompts from SFT dataset
🧠 Reward Model was trained using “Math-Shepherd” process
RL is “boosting the correct response from TopK rather than the enhancement of fundamental capabilities.”

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Published on Feb 6
·
Submitted by
akhaliq
on Feb 6
#1 Paper of the day
Authors:

Zhihong Shao
,

Peiyi Wang
,

Qihao Zhu
,
Runxin Xu
,

Junxiao Song
,
Mingchuan Zhang
,
Y. K. Li
,
Y. Wu
,

Daya Guo
Abstract
Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO), that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.


###
https://huggingface.co/papers/2406.13542
Generate verifiable instruction-following data with AutoIF! AutoIF validates instructions by executing the generated verification code to check their correctness. In self-alignment and strong-to-weak distillation settings, it can improve models by up to 15% on IFEval 👀
Implementation
1️⃣ Create a set of hand-written seed instructions with single atomic constraints.
2️⃣ Perform self-instruct to generate more instructions.
3️⃣ Generate verification functions and test cases for each instruction using LLM.
4️⃣ Back-translate verification functions into instructions to ensure semantic consistency.
5️⃣ Augment queries by concatenating with ShareGPT samples.
6️⃣ Generate multiple responses for each query & verify responses using functions.
7️⃣ Score instructions, queries, and responses and filter out low-scoring samples.
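The verification in steps 3 and 6 boils down to compiling an LLM-generated checker, validating the checker against its own test cases, and then running it on each response. A minimal sketch, assuming a `check(text) -> bool` contract (the paper's actual prompt and function format may differ):

```python
def passes_verification(response, verifier_src, test_cases):
    """Execution feedback: run the LLM-generated verification function and
    keep a response only if the function accepts it. `verifier_src` must
    define `check(text) -> bool` (a hypothetical contract for this sketch)."""
    scope = {}
    try:
        exec(verifier_src, scope)           # define the generated checker
        check = scope["check"]
        # First validate the checker itself on known-good/bad test cases
        if not all(check(t) == label for t, label in test_cases):
            return False
        return bool(check(response))
    except Exception:
        return False                        # broken checkers reject the sample

verifier = "def check(text):\n    return text.endswith('!')"
cases = [("wow!", True), ("plain", False)]
print(passes_verification("Follow me!", verifier, cases))  # True
```

Responses that fail are rejected (rejection sampling), and the surviving pairs feed SFT or DPO training as in step 7.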
Insights
🚀 Using GPT-4 as supervision improves performance ~15% on IFEval for Qwen2 7B.
📈 On-policy Learning is more effective: Online DPO > Offline DPO.
📊 Larger models relatively improve more.
🔍 Used n-gram probing for IFEval decontamination.
🌟 Llama 3 70B first open LLM to achieve 90% on loose instruction in IFEval.
😔 Code and scripts released, dataset not.

Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
Published on Jun 19
·
Submitted by
davanstrien
on Jun 21
Authors:

Guanting Dong
,

Keming Lu
,

Chengpeng Li
,
Tingyu Xia
,

Bowen Yu
,
Chang Zhou
,

Jingren Zhou
Abstract
One core capability of large language models (LLMs) is to follow natural language instructions. However, the issue of automatically constructing high-quality training data to enhance the complex instruction-following abilities of LLMs without manual annotation remains unresolved. In this paper, we introduce AutoIF, the first scalable and reliable method for automatically generating instruction-following training data. AutoIF transforms the validation of instruction-following data quality into code verification, requiring LLMs to generate instructions, the corresponding code to check the correctness of the instruction responses, and unit test samples to verify the code's correctness. Then, execution feedback-based rejection sampling can generate data for Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) training. AutoIF achieves significant improvements across three training algorithms, SFT, Offline DPO, and Online DPO, when applied to the top open-source LLMs, Qwen2 and LLaMA3, in self-alignment and strong-to-weak distillation settings. Our code is publicly available at https://github.com/QwenLM/AutoIF.

###
https://myniuuu.github.io/MOFA_Video/
[Submitted on 30 May 2024 (v1), last revised 2 Jun 2024 (this version, v2)]
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
Muyao Niu, Xiaodong Cun, Xintao Wang, Yong Zhang, Ying Shan, Yinqiang Zheng
We present MOFA-Video, an advanced controllable image animation method that generates video from the given image using various additional controllable signals (such as human landmarks reference, manual trajectories, and another even provided video) or their combinations. This is different from previous methods which can only work on a specific motion domain or show weak control abilities with diffusion prior. To achieve our goal, we design several domain-aware motion field adapters (i.e., MOFA-Adapters) to control the generated motions in the video generation pipeline. For MOFA-Adapters, we consider the temporal motion consistency of the video and generate the dense motion flow from the given sparse control conditions first, and then, the multi-scale features of the given image are warped as a guided feature for stable video diffusion generation. We naively train two motion adapters for the manual trajectories and the human landmarks individually since they both contain sparse information about the control. After training, the MOFA-Adapters in different domains can also work together for more controllable video generation. Project Page: this https URL

###
https://github.com/Camb-ai/MARS5-TTS
MARS5 TTS: Open Source Text to Speech with insane prosodic control! 🔥
> Voice cloning with less than 5 seconds of audio
> Two stage Auto-Regressive (750M) + Non-Auto Regressive (450M) model architecture
> Used BPE tokenizer to enable control over punctuations, pauses, stops etc.
> AR model predicts L0 coarse tokens, refined further by the NAR DDPM model followed by the vocoder

Approach
This is the repo for the MARS5 English speech model (TTS) from CAMB.AI.

The model follows a two-stage AR-NAR pipeline with a distinctively novel NAR component (see more info in the Architecture).

With just 5 seconds of audio and a snippet of text, MARS5 can generate speech even for prosodically hard and diverse scenarios like sports commentary, anime and more. Check out our demo:

intro_vid_camb.mp4
Watch full video here: Youtube

Mars 5 simplified diagram

Figure: The high-level architecture flow of MARS5. Given text and a reference audio, coarse (L0) encodec speech features are obtained through an autoregressive transformer model. Then, the text, reference, and coarse features are refined in a multinomial DDPM model to produce the remaining encodec codebook values. The output of the DDPM is then vocoded to produce the final audio.

Because the model is trained on raw audio together with byte-pair-encoded text, it can be steered with things like punctuation and capitalization. E.g. To add a pause, add a comma to that part in the transcript. Or, to emphasize a word, put it in capital letters in the transcript. This enables a fairly natural way for guiding the prosody of the generated output.
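The steering described above is plain transcript preprocessing: the edited text is what the model reads. A small illustrative helper (not part of the MARS5 API) showing the idea:

```python
def steer_prosody(transcript, pauses_after=(), emphasize=()):
    """Illustrates MARS5's punctuation-based prosody steering: adding a
    comma after a word inserts a pause, capitalizing a word adds emphasis.
    The helper only edits the transcript string fed to the model."""
    out = []
    for w in transcript.split():
        bare = w.strip(",.")
        if bare.lower() in {e.lower() for e in emphasize}:
            w = w.replace(bare, bare.upper())        # emphasis via capitals
        if bare.lower() in {p.lower() for p in pauses_after} \
                and not w.endswith(","):
            w += ","                                 # pause via comma
        out.append(w)
    return " ".join(out)

print(steer_prosody("What a goal that was",
                    pauses_after=["goal"], emphasize=["what"]))
# → WHAT a goal, that was
```

The returned string would then be passed to the TTS call in place of the raw transcript, giving coarse prosody control without any model-side parameters.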

Speaker identity is specified using an audio reference file between 2-12 seconds, with lengths around 6s giving optimal results. Further, by providing the transcript of the reference, MARS5 enables one to do a 'deep clone' which improves the quality of the cloning and output, at the cost of taking a bit longer to produce the audio. For more details on this and other performance and model details, please see the