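OpenAI published the system card for GPT-4o, describing the model's safety evaluations and how potential risks are managed. Zico Kolter also joined OpenAI's board of directors, and DALL·E 3 image generation was rolled out to ChatGPT Free users. Alibaba's Qwen team released Qwen2-Math, a new math-specialized model series, and reports that it outperforms GPT-4o and Claude 3.5 on mathematical problem solving. Other news includes Parler TTS's release of high-quality TTS models, Mistral AI's new model customization and agent features, the Whisper Medusa fast speech recognition model, and recent research results from SENSE and RAGFoundry.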

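OpenAI, GPT-4o System Card Released

Link, August 8, 2024

  • OpenAI published the system card for the GPT-4o model.
  • GPT-4o is a multimodal model that accepts any combination of text, audio, image, and video as input and generates text, audio, and image outputs; all inputs and outputs are processed by the same neural network.
  • GPT-4o responds to audio input in as little as 232 ms, with an average of 320 ms, which is similar to human response time in conversation.
  • Pre-training data runs up to October 2023 and consists of select publicly available data (mostly from industry-standard machine learning datasets and web crawls) and proprietary data from data partnerships.
  • GPT-4o matches GPT-4 Turbo on English text and code, improves significantly on non-English text, and is especially better at vision and audio understanding.
  • Key risk areas evaluated include unauthorized voice generation, speaker identification, ungrounded inference, sensitive trait attribution, and generation of disallowed audio content; safeguards against these risks were implemented at both the model and system levels.
  • In the Preparedness Framework evaluation, the cybersecurity, biological threats, and model autonomy categories were rated low risk, and the persuasion category was rated (borderline) medium risk.
  • Before deploying GPT-4o, OpenAI ran safety evaluations and external red teaming, and it is sharing the Preparedness Framework results together with the system card to provide an end-to-end assessment of GPT-4o's safety and potential risks.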

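OpenAI, Zico Kolter Appointed to the Board

Link, August 8, 2024

  • Zico Kolter joins OpenAI's board of directors, bringing deep expertise in AI safety and robustness.
  • Kolter is a professor and director of the Machine Learning Department at Carnegie Mellon University; his research covers AI safety, robustness, and the influence of data on models, and he has developed new deep network architectures and methods for evaluating model robustness.
  • Kolter pioneered techniques for embedding hard constraints into deep learning models and has used automated optimization techniques to probe the vulnerabilities of AI models.
  • In 2023 his team developed methods for automatically assessing the safety of large language models (LLMs); drawing on this background, he will also join the board's Safety and Security Committee, which makes recommendations on critical safety and security decisions.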

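Alibaba, Qwen2-Math Models Released

Link, August 8, 2024

  • Alibaba's Qwen team released the Qwen2-Math model series, specialized for mathematical problem solving.
  • The series comes in 1.5B, 7B, and 72B parameter sizes; the flagship Qwen2-Math-72B-Instruct is reported to outperform recent models such as GPT-4o and Claude 3.5 on math benchmarks.
  • Strong results on math benchmarks including Olympiad Bench, College Math, and MMLU STEM; the 72B model in particular reaches state-of-the-art results on Olympiad Bench.
  • Built on the Qwen2 architecture and further pre-trained on math-specific data; instruction-tuned variants are then built with SFT followed by GRPO.
  • The models are evaluated with chain-of-thought prompting and perform well on complex, multi-step math problems.
  • The training corpus includes high-quality mathematical web text, books, code, and exam questions, plus synthetic data generated by Qwen2.
  • Strict filtering is applied to keep benchmark test data out of the training sets: exact matching and 13-gram deduplication are used to prevent contamination.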

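Parler TTS, High-Quality TTS Models Released

Link, August 8, 2024

  • The Parler TTS project released Parler TTS v1, a high-quality text-to-speech (TTS) model.
  • Available in two sizes (885M and 2.2B parameters), trained on 45,000 hours of open speech data.
  • Up to 4x faster generation than the previous release thanks to torch compile and a static KV cache.
  • Parler TTS Mini was trained with a larger text encoder, and Parler TTS Large with both a larger text encoder and a larger decoder.
  • The codebase, weights, and datasets are all released under the Apache 2.0 license, so the open-source community can use them freely.
  • The models offer better speaker consistency and a choice of speakers, and users can fine-tune them on their own data; only a couple of hours of data is enough for additional training.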

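Mistral AI, New Model Customization and Agent Features

Link, August 7, 2024

  • Mistral AI announced model customization on La Plateforme.
  • Users can fine-tune flagship and specialist models such as Mistral Large 2 and Codestral on their own datasets.
  • Models can be customized with a base prompt, few-shot prompting, or fine-tuning, which makes it possible to build AI applications that reflect specific domain knowledge, context, or tone.
  • An early (alpha) version of Agents was also announced to help users build more complex workflows; workflows can combine multiple agents that are easy to share within an organization.
  • Version 1.0 of the mistralai library was released for Python and Typescript, with significantly improved usability and consistency.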

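Whisper Medusa, Fast Speech Recognition Model Released

Link, August 8, 2024

  • Whisper Medusa is a fast speech recognition and translation model built on the existing Whisper model.
  • Medusa heads predict multiple tokens per iteration, which improves speed with only a small degradation in WER.
  • The model was trained on the LibriSpeech dataset and is optimized for English audio.
  • It applies Medusa heads, previously used in large language models (LLMs), to ASR (Automatic Speech Recognition), and is reported to generate 150% faster than Whisper.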

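SENSE, Research on Text-to-SQL Data Synthesis

Link, August 6, 2024

  • SENSE is a specialized model that reaches state-of-the-art performance on text-to-SQL query generation.
  • The paper proposes combining synthetic data from larger (strong) models with erroneous data from smaller (weak) models to increase data diversity, and learning from execution feedback.
  • Preference learning is used so that the model learns from both correct and incorrect samples.
  • State-of-the-art results on the SPIDER and BIRD benchmarks narrow the performance gap between open-source models and methods that rely on closed-source models.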

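RAGFoundry, Open-Source Framework for RAG

Link, August 5, 2024

  • RAGFoundry is a unified framework for Retrieval-Augmented Generation (RAG) systems.
  • The framework integrates data creation, training, inference, and evaluation into a single workflow, making it possible to create data-augmented datasets for training and evaluating LLMs.
  • It supports rapid prototyping and experimentation with various RAG techniques to improve LLM performance.
  • Augmenting and fine-tuning Llama-3 and Phi-3 models with RAGFoundry yields consistent improvements on knowledge-intensive datasets.
  • The code is released as open source, so researchers and developers can use it freely.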
Sources

This GPT assists users by creating a detailed daily newspaper in Korean based on provided links. It follows these steps: read the content, summarize each content with detailed points, and write a report. The report format is:

(today’s date in 년 월 일) AI 소식,

Summary

(overall short summary, make summary with good details. for Summary section, explain the details starting with company name, e.g. OpenAI에서는 ~~~를 발표하였습니다.)

company name, Title

링크, date

  • detailed summary1, (개조식 문체 사용)
  • detailed summary2, (개조식 문체 사용)
  • detailed summary N, (개조식 문체 사용)

company name, Title

링크, date

링크, date,

  • detailed summary1, (개조식 문체 사용)
  • detailed summary2, (개조식 문체 사용)
  • detailed summary N, (개조식 문체 사용)
###
https://openai.com/index/gpt-4o-system-card/
August 8, 2024

GPT-4o System Card
This report outlines the safety work carried out prior to releasing GPT-4o including external red teaming, frontier risk evaluations according to our Preparedness Framework, and an overview of the mitigations we built in to address key risk areas.

GPT-4o Scorecard
Key Areas of Risk Evaluation & Mitigation

Unauthorized voice generation
Speaker identification
Ungrounded inference & sensitive trait attribution
Generating disallowed audio content
Generating erotic & violent speech
Preparedness Framework Scorecard

Cybersecurity: Low
Biological Threats: Low
Persuasion: Medium
Model Autonomy: Low

Scorecard ratings: Low, Medium, High, Critical
Only models with a post-mitigation score of "medium" or below can be deployed.
Only models with a post-mitigation score of "high" or below can be developed further.
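
The two gating rules above can be expressed as a small check. This is only an illustrative sketch, not OpenAI tooling: the function names are hypothetical, and it assumes that a model's overall rating is the highest rating among its categories.

```python
# Rating order taken from the scorecard legend.
LEVELS = ["low", "medium", "high", "critical"]

# Post-mitigation ratings reported for GPT-4o in the scorecard.
GPT4O_SCORES = {
    "cybersecurity": "low",
    "biological_threats": "low",
    "persuasion": "medium",
    "model_autonomy": "low",
}


def overall_score(scores):
    """Assumption: the overall rating is the highest category rating."""
    return max(scores.values(), key=LEVELS.index)


def can_deploy(scores):
    """Deployable only if the overall rating is 'medium' or below."""
    return LEVELS.index(overall_score(scores)) <= LEVELS.index("medium")


def can_develop_further(scores):
    """Further development only if the overall rating is 'high' or below."""
    return LEVELS.index(overall_score(scores)) <= LEVELS.index("high")
```

Under that assumption, the GPT-4o ratings give an overall rating of medium, which passes both checks.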

We thoroughly evaluate new models for potential risks and build in appropriate safeguards before deploying them in ChatGPT or the API. We’re publishing the model System Card together with the Preparedness Framework scorecard to provide an end-to-end safety assessment of GPT-4o, including what we’ve done to track and address today’s safety challenges as well as frontier risks.

Building on the safety evaluations and mitigations we developed for GPT-4 and GPT-4V, we’ve focused additional efforts on GPT-4o's audio capabilities, which present novel risks, while also evaluating its text and vision capabilities.

Some of the risks we evaluated include speaker identification, unauthorized voice generation, the potential generation of copyrighted content, ungrounded inference, and disallowed content. Based on these evaluations, we’ve implemented safeguards at both the model- and system-levels to mitigate these risks.

Our findings indicate that GPT-4o’s voice modality doesn’t meaningfully increase Preparedness risks. Three of the four Preparedness Framework categories scored low, with persuasion scoring borderline medium. The Safety Advisory Group reviewed our Preparedness evaluations and mitigations as part of our safe deployment process. We invite you to read the details of this work in the report below.

Introduction
GPT-4o is an autoregressive omni model, which accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It’s trained end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network.

GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models.

In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o’s capabilities, limitations, and safety evaluations across multiple categories, with a focus on speech-to-speech (voice) while also evaluating text and image capabilities, and the measures we’ve taken to enhance safety and alignment. We also include third party assessments on general autonomous capabilities, as well as discussion of potential societal impacts of GPT-4o text and vision capabilities.

Model data & training
GPT-4o's capabilities were pre-trained using data up to October 2023, sourced from a wide variety of materials including:

Select publicly available data, mostly collected from industry-standard machine learning datasets and web crawls.

Proprietary data from data partnerships. We form partnerships to access non-publicly available data, such as pay-walled content, archives, and metadata. For example, we partnered with Shutterstock on building and delivering AI-generated images.

The key dataset components that contribute to GPT-4o’s capabilities are:

Web Data – Data from public web pages provides a rich and diverse range of information, ensuring the model learns from a wide variety of perspectives and topics.

Code and math – Including code and math data in training helps the model develop robust reasoning skills by exposing it to structured logic and problem-solving processes.

Multimodal data – Our dataset includes images, audio, and video to teach the LLMs how to interpret and generate non-textual input and output. From this data, the model learns how to interpret visual images, actions and sequences in real-world contexts, language patterns, and speech nuances.

Prior to deployment, OpenAI assesses and mitigates potential risks that may stem from generative models, such as information harms, bias and discrimination, or other content that violates our safety policies. We use a combination of methods, spanning all stages of development across pre-training, post-training, product development, and policy. For example, during post-training, we align the model to human preferences; we red team the resulting models and add product-level mitigations such as monitoring and enforcement; and we provide moderation tools and transparency reports to our users.

We find that the majority of effective testing and mitigations are done after the pre-training stage because filtering pre-trained data alone cannot address nuanced and context-specific harms. At the same time, certain pre-training filtering mitigations can provide an additional layer of defense that, along with other safety mitigations, helps exclude unwanted and harmful information from our datasets:

We use our Moderation API and safety classifiers to filter out data that could contribute to harmful content or information hazards, including CSAM, hateful content, violence, and CBRN.

As with our previous image generation systems, we filter our image generation datasets for explicit content such as graphic sexual material and CSAM.

We use advanced data filtering processes to reduce personal information from training data.

Upon releasing DALL·E 3, we piloted a new approach to give users the power to opt images out of training. To respect those opt-outs, we fingerprinted the images and used the fingerprints to remove all instances of the images from the training dataset for the GPT-4o series of models.


We’re sharing the GPT-4o System Card, an end-to-end safety assessment that outlines what we’ve done to track and address safety challenges, including frontier model risks in accordance with our Preparedness Framework.
To ensure people could use this technology safely, we tested this model internally and with 100+ external red teamers across 45 languages. Our Preparedness evaluations were reviewed by our Safety Advisory Group before deploying the model.
The System Card focuses on evaluating the novel risks presented by GPT-4o's audio capabilities as well as the guardrails we implemented to prevent the generation of harmful, biased, or copyrighted content, and to ensure the model only generates audio in one of the preset voices.
We care deeply about the impact our technology has on the people who use it, and will continue to assess, calibrate, and share our learnings to ensure everyone can enjoy the benefits of AI.

###
https://openai.com/index/zico-kolter-joins-openais-board-of-directors/
OpenAI
August 8, 2024

Zico Kolter Joins OpenAI’s Board of Directors
We’re strengthening our governance with expertise in AI safety and alignment. Zico will also join the Safety & Security Committee.

We’re announcing the appointment of Zico Kolter to OpenAI’s Board of Directors. As a professor and Director of the Machine Learning Department at Carnegie Mellon University, Zico’s work predominantly focuses on AI safety, alignment, and the robustness of machine learning classifiers. His research and expertise span new deep network architectures, innovative methodologies for understanding the influence of data on models, and automated methods for evaluating AI model robustness, making him an invaluable technical director for our governance.

Zico will also join the Board’s Safety and Security Committee alongside directors Bret Taylor, Adam D’Angelo, Paul Nakasone, Nicole Seligman and Sam Altman (CEO) and OpenAI technical experts. The committee is responsible for making recommendations on critical safety and security decisions for all OpenAI projects.

In welcoming Zico to the board, Bret Taylor, Chairman of the Board, remarked, “Zico adds deep technical understanding and perspective in AI safety and robustness that will help us ensure general artificial intelligence benefits all of humanity.”

Zico Kolter is a Professor of Computer Science and the head of the Machine Learning Department at Carnegie Mellon University, where he has been a key figure for 12 years. Zico completed his Ph.D. in computer science at Stanford University in 2010, followed by a postdoctoral fellowship at MIT from 2010 to 2012. Throughout his career, he has made significant contributions to the field of machine learning, authoring numerous award-winning papers at prestigious conferences such as NeurIPS, ICML, and AISTATS.

Zico's research includes developing the first methods for creating deep learning models with guaranteed robustness. He pioneered techniques for embedding hard constraints into AI models using classical optimization within neural network layers. More recently, in 2023, his team developed innovative methods for automatically assessing the safety of large language models (LLMs), demonstrating the potential to bypass existing model safeguards through automated optimization techniques. Alongside his academic pursuits, Zico has worked closely within the industry throughout his career, formerly as Chief Data Scientist at C3.ai, and currently as Chief Expert at Bosch and Chief Technical Advisor at Gray Swan, a startup specializing in AI safety and security.

###
8/8/24
OpenAI
We’re rolling out the ability for ChatGPT Free users to create up to two images per day with DALL·E 3. Just ask ChatGPT to create an image for a slide deck, personalize a card for a friend, or show you what something looks like.

###
https://qwenlm.github.io/blog/qwen2-math/
Alibaba
Wow! Qwen released Qwen2-Math - a 1.5B, 7B & 72B models - beats GPT4o, Claude 3.5 on AIME 24/ AMC 23 🔥
> 84 (72B), 75 (7B), 69.4 (1.5B) on MATH
> 72B SoTA in Olympiad Bench, College Math, MMLU STEM
> Release both base and instruct models
> Apache 2.0 license for 1.5B & 7B, 72B released under Qianwen license
> Based on the same Qwen 2 Architecture
> Pretrained further on Math specific data (they don't describe what) + synthetic data generated by Qwen 2
> Construct the SFT data with RM + rejection sampling
> Perform GRPO after SFT
> They decontaminate pre-training plus instruct datasets with exact match and 13-gram dedupe
> Integrated with Transformers! 🤗
Kudos to Qwen team on yet another stellar release 🐐

Introducing Qwen2-Math
August 8, 2024
Qwen Team

🚨 This model mainly supports English. We will release bilingual (English and Chinese) math models soon.
Introduction
Over the past year, we have dedicated significant effort to researching and enhancing the reasoning capabilities of large language models, with a particular focus on their ability to solve arithmetic and mathematical problems. Today, we are delighted to introduce a series of math-specific large language models of our Qwen2 series, Qwen2-Math and Qwen2-Math-Instruct-1.5B/7B/72B. Qwen2-Math is a series of specialized math language models built upon the Qwen2 LLMs, which significantly outperforms the mathematical capabilities of open-source models and even closed-source models (e.g., GPT-4o). We hope that Qwen2-Math can contribute to the community for solving complex mathematical problems.

We evaluate our math-specific models on a series of math benchmarks. The results below demonstrate that our largest math-specific model Qwen2-Math-72B-Instruct outperforms the state-of-the-art models, including GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro, and Llama-3.1-405B.


Qwen2-Math: Base Models
The base models of Qwen2-Math are initialized with Qwen2-1.5B/7B/72B, and then pretrained on a meticulously designed Mathematics-specific Corpus. This corpus contains large-scale high-quality mathematical web texts, books, codes, exam questions, and mathematical pre-training data synthesized by Qwen2.

We evaluate our Qwen2-Math base models on three widely used English math benchmarks GSM8K, Math, and MMLU-STEM. In addition, we also evaluate three Chinese math benchmarks CMATH, GaoKao Math Cloze, and GaoKao Math QA. All evaluations are tested with few-shot chain-of-thought prompting.


Qwen2-Math-Instruct: Instruction-Tuned Models
We first trained a math-specific reward model based on Qwen2-Math-72B. We then combined this dense reward signal with a binary signal indicating whether the model answered correctly. This combined signal is used as supervision for constructing the SFT data through Rejection Sampling and also in the reinforcement learning with Group Relative Policy Optimization (GRPO) after SFT.
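
As a rough illustration of the rejection-sampling step described above, the sketch below filters sampled solutions by the binary correctness signal and then ranks the survivors by the reward-model score. The post does not say how the two signals are actually combined, so the filter-then-rank rule, the `Candidate` type, and the function name are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    solution: str        # full worked solution sampled from the model
    final_answer: str    # answer extracted from the solution
    rm_score: float      # dense score from a (hypothetical) reward model


def select_sft_examples(candidates, reference_answer, keep=1):
    """Keep correct candidates only, then take the best by reward score.

    Assumed combination rule: the binary signal acts as a hard filter and
    the dense reward breaks ties among correct solutions.
    """
    correct = [c for c in candidates if c.final_answer == reference_answer]
    correct.sort(key=lambda c: c.rm_score, reverse=True)
    return correct[:keep]
```

If no sampled solution is correct, the function returns an empty list and the problem simply contributes no SFT example in this sketch.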

We evaluate Qwen2-Math-Instruct on mathematical benchmarks in both English and Chinese. In addition to the widely-used benchmarks, such as GSM8K and Math, we also include exams that are much more challenging, such as OlympiadBench, CollegeMath, GaoKao, AIME2024, and AMC2023, to fully inspect the capabilities of Qwen2-Math-Instruct. For Chinese mathematical benchmarks, we use CMATH, Gaokao (Chinese college entrance examination 2024), and CN Middle School 24 (China High School Entrance Examination 2024).

We report greedy, Maj@8 and RM@8 performance on all benchmarks in the zero-shot setting, except for the multi-choice benchmarks (including MMLU STEM and multiple-choice problems in GaoKao and CN Middle School 24) with a 5-shot setting. Qwen2-Math-Instruct achieves the best performance among models of the same size, with RM@8 outperforming Maj@8, particularly in the 1.5B and 7B models. This demonstrates the effectiveness of our math reward model.
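
For readers unfamiliar with the two metrics, here is a minimal sketch of how Maj@N and RM@N answers are typically chosen from N sampled solutions. It reflects the usual meaning of these terms (majority vote over final answers versus picking the sample with the highest reward-model score), not the Qwen team's evaluation code; the function names are illustrative.

```python
from collections import Counter


def maj_at_n(answers):
    """Maj@N: return the most frequent final answer among N samples."""
    return Counter(answers).most_common(1)[0][0]


def rm_at_n(answers, rm_scores):
    """RM@N: return the answer of the sample the reward model scores highest."""
    best = max(range(len(answers)), key=lambda i: rm_scores[i])
    return answers[best]
```

The two can disagree: when a wrong answer is sampled most often but the reward model ranks a correct minority sample highest, RM@N is right and Maj@N is wrong, which is the situation in which RM@N outperforms Maj@N.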


In more complex mathematical competition evaluations such as AIME 2024 and AMC 2023, Qwen2-Math-Instruct also performs well across various settings, including Greedy, Maj@64, RM@64, and RM@256.


Case Study
Here we list some test cases, which include some IMO math problems. From the experimental results and case study, we find that Qwen2-Math is capable of solving simple math competition problems.

All the solutions are generated by our model without modification. Please note that we do not guarantee the correctness of the claims in the process.

Problem From IMO Shortlist 2002
Problem From IMO Shortlist 2022
Problem From IMO 2022
Problem from International Zhautykov Olympiad 2020
Problem From Baltic Way 2023
Problem From Lusophon Mathematical Olympiad 2023
Problem From Balkan MO 2023
Problem From Math Odyssey
Problem from USAMO 2010
Problem from JBMO Shortlist 2011
Decontamination
We apply decontamination to both our pretraining and post-training datasets. Specifically, for pretraining data, we target math datasets, including GSM8K and MATH, and remove samples that have significant overlaps with the test sets. We use exact match to remove identical samples and further apply 13-gram deduplication (with the condition that the ratio of the longest common sequence should be larger than 0.6) to remove more samples that might cause contamination. For post-training data, we remove more positive contaminated samples that overlap with GSM8K, MATH, Aqua, SAT Math, OlympiadBench, College Math, AIME24, AMC23, etc., using the same filtering method.
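
A minimal sketch of this kind of filter is shown below. The post does not give the tokenization or how the longest-common-sequence ratio is normalized, so whitespace tokenization, a longest common subsequence, and normalization by the shorter text are assumptions, and the function names are illustrative.

```python
def ngrams(tokens, n=13):
    """Set of all contiguous n-token windows."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def is_contaminated(sample, test_item, n=13, min_ratio=0.6):
    s, t = sample.lower().split(), test_item.lower().split()
    if s == t:
        return True                       # exact match
    if not (ngrams(s, n) & ngrams(t, n)):
        return False                      # no shared 13-gram
    # Assumption: ratio is LCS length over the shorter text's length.
    return lcs_length(s, t) / min(len(s), len(t)) > min_ratio


def decontaminate(train_samples, test_items):
    """Drop every training sample flagged against any test item."""
    return [s for s in train_samples
            if not any(is_contaminated(s, t) for t in test_items)]
```

The ratio condition keeps a long training document that merely quotes a short fragment of a benchmark problem, while still removing near-copies.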

Summary
This time, we’re releasing a new model series focused on mathematical capabilities, Qwen2-Math, built upon the Qwen2 foundation. Our flagship model, Qwen2-Math-72B-Instruct, outperforms proprietary models such as GPT-4o and Claude 3.5 in math-related tasks. Given the current limitation of English-only support, we plan to release bilingual models that support both English and Chinese shortly, with the development of multilingual models also in the pipeline. Moreover, we will continue to enhance our models’ ability to solve complex and challenging mathematical problems.

###
https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-66164ad285ba03e8ffde214c
8/8/24
parler-tts
Introducing Parler TTS v1 🔉 - 885M (Mini) & 2.2B (Large) - fully open-source Text-to-Speech models! 🤙
> Trained on 45,000 hours of open speech (datasets released as well)
> Up to 4x faster generation thanks to torch compile & static KV cache (compared to previous v0.1 release)
> Mini trained on a larger text encoder, large trained on both larger text & decoder
> Also supports SDPA & Flash Attention 2 for an added speed boost
> In-built streaming, we provide a dedicated streaming class optimised for time to the first audio
> Better speaker consistency, more than a dozen speakers to choose from or create a speaker description prompt and use that
> Not convinced with a speaker? You can fine-tune the model on your dataset (only a couple of hours would do)
Apache 2.0 licensed codebase, weights and datasets! 🐐
Can't wait to see what y'all would build with this! 🤗
Parler-TTS Large v1
Parler-TTS Large v1 is a 2.2B-parameter text-to-speech (TTS) model, trained on 45K hours of audio data, that can generate high-quality, natural-sounding speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation).

With Parler-TTS Mini v1, this is the second set of models published as part of the Parler-TTS project, which aims to provide the community with TTS training resources and dataset pre-processing code.

###
https://mistral.ai/news/build-tweak-repeat/
Build, tweak, repeat
Making it easier to develop and share generative AI applications.

August 7, 2024 Mistral AI team
Detailed benchmarks
Language models are changing the way we build software, serving as a flexible orchestrator in between knowledge sources and user interfaces. Building such software comes with new challenges to improve quality, reduce latency, and prototype quickly. Today, we’re announcing various advancements in this direction.

Simpler, more efficient model customization
Because large language models are rapidly finding newer and more specialised use cases, it is critical that developers are able to quickly and efficiently tailor frontier models to their specific applications. To that end, we’re announcing the ability to customise any of our flagship and specialist models on La Plateforme, including Mistral Large 2 and Codestral.

Models can be customised using a base prompt, few-shot prompting, or fine-tuning, and you can bring your own dataset. Crucially, model customization follows the techniques developed by the Mistral AI science team for making strong reference models, so you can expect similar performance from your fine-tuned models. Developers can use model customization to integrate generative AI capabilities into their application with specific domain knowledge, context, or tone.



We expect fine-tuning on our highly capable models to unlock a wealth of groundbreaking applications, and are eager to see what will be built with it. Check out our fine-tuning documentation, and try model customization on La Plateforme.

Alpha release of Agents
We’re also introducing an early version of Agents, that wraps models with additional context and instruction, for exposure on Le Chat or API. Agents help you create custom behaviour and workflows with a simple set of instructions and examples. With the advanced reasoning capabilities of Mistral Large 2, you can layer on increasingly complex workflows with multiple agents that are easy to share within your organisation. We’re working on connecting Agents to tools and data sources and are looking forward to your feedback on it.


Learn more about Agents.

Stable version of our client SDK
We have made significant updates to the mistralai library to improve its usability and consistency, and today we are releasing mistralai 1.0, available for both Python and Typescript. Learn more about our new SDK and check out the migration guide.

###
https://huggingface.co/aiola/whisper-medusa-v1
8/8/24
Whisper Medusa
Whisper is an advanced encoder-decoder model for speech transcription and translation, processing audio through encoding and decoding stages. Given its large size and slow inference speed, various optimization strategies like Faster-Whisper and Speculative Decoding have been proposed to enhance performance. Our Medusa model builds on Whisper by predicting multiple tokens per iteration, which significantly improves speed with small degradation in WER. We train and evaluate our model on the LibriSpeech dataset, demonstrating speed improvements.
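
To make "predicting multiple tokens per iteration" concrete, here is a toy sketch of a Medusa-style decode loop. It is not aiola's implementation: `base_next` and `heads_propose` are hypothetical stand-ins for the decoder's own next-token prediction and for the extra heads' guesses, and the verification is done token by token for clarity, whereas real systems verify the guesses inside a batched forward pass.

```python
def medusa_decode(base_next, heads_propose, prompt, max_len):
    """Greedy decoding that accepts extra-head guesses when they match the base model.

    base_next(seq)     -> next token according to the base model (stand-in)
    heads_propose(seq) -> list of speculative follow-up tokens (stand-in)
    Returns the decoded sequence and the number of decoding iterations used.
    """
    out = list(prompt)
    iterations = 0
    while len(out) < max_len:
        iterations += 1
        out.append(base_next(out))            # regular next token
        for guess in heads_propose(out):      # speculative tokens
            if len(out) >= max_len or guess != base_next(out):
                break                         # reject this guess and the rest
            out.append(guess)
    return out, iterations


# Toy setup: the "base model" reproduces a fixed target sequence, and the
# heads guess the next two tokens but are wrong at some positions.
TARGET = list(range(10))


def toy_base_next(seq):
    return TARGET[len(seq)]


def toy_heads(seq):
    start = len(seq)
    guesses = TARGET[start:start + 2]
    if start % 4 == 0 and guesses:
        guesses = [-1] + guesses[1:]          # simulate a wrong guess
    return guesses
```

Because a guess is kept only when it equals what the base model would have produced, the output in this sketch is identical to plain greedy decoding; the saving comes from needing fewer iterations whenever guesses are accepted.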

Training Details
aiola/whisper-medusa-v1 was trained on the LibriSpeech dataset to perform audio translation. The Medusa heads were optimized for English, so for optimal performance and speed improvements, please use English audio only.


150% faster Whisper generations w/ medusa heads! 🔥
Built on top of Transformers with minimal drop in accuracy.
Quite exciting area of research, Medusa heads are proven to be incredibly fast for LLMs, seeing them now make way into ASR makes me hopeful!

###
https://arxiv.org/abs/2408.03256
[Submitted on 6 Aug 2024]
Synthesizing Text-to-SQL Data from Weak and Strong LLMs
Proposes integrated synthetic data to build a highly specialized SoTA text-to-SQL model called SENSE.
The synthetic data from strong models enhances data diversity, while valuable erroneous data from weaker models is combined with an executor to learn from execution feedback. Preference learning is used to instruction-tune LLMs to learn from both correct and incorrect samples.
SENSE achieves state-of-the-art results on the SPIDER and BIRD benchmarks, which bridges the performance gap between open-source models and methods that use closed-source models.
Overall, this is an interesting framework where the outputs of weaker and smaller aligned models can be cleverly integrated to achieve more generalized systems.

Jiaxi Yang, Binyuan Hui, Min Yang, Jian Yang, Junyang Lin, Chang Zhou
The capability gap between open-source and closed-source large language models (LLMs) remains a challenge in text-to-SQL tasks. In this paper, we introduce a synthetic data approach that combines data produced by larger, more powerful models (strong models) with error information data generated by smaller, not well-aligned models (weak models). The method not only enhances the domain generalization of text-to-SQL models but also explores the potential of error data supervision through preference learning. Furthermore, we employ the synthetic data approach for instruction tuning on open-source LLMs, resulting in SENSE, a specialized text-to-SQL model. The effectiveness of SENSE is demonstrated through state-of-the-art results on the SPIDER and BIRD benchmarks, bridging the performance gap between open-source models and methods prompted by closed-source models.
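
One way to picture the "error data plus executor" idea is to run candidate queries against a database and turn the outcomes into preference pairs. The sketch below does this with Python's built-in `sqlite3`; it is an illustration under assumptions, not the paper's pipeline. The pair format (`prompt`/`chosen`/`rejected`), the function names, and the rule that a candidate is correct when its result rows match the gold query's rows are all chosen for this example.

```python
import sqlite3
from collections import Counter


def run(conn, sql):
    """Execute a query; return its rows as a multiset, or None on error."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def build_preference_pairs(conn, question, gold_sql, candidates):
    """Pair every candidate that matches the gold result with every one that does not."""
    gold = run(conn, gold_sql)
    if gold is None:
        raise ValueError("gold SQL failed to execute")
    results = [(sql, run(conn, sql)) for sql in candidates]
    good = [sql for sql, rows in results if rows == gold]
    bad = [sql for sql, rows in results if rows != gold]  # wrong rows or execution error
    return [{"prompt": question, "chosen": g, "rejected": b}
            for g in good for b in bad]
```

Pairs in this shape are what preference-learning methods generally consume, so both the correct samples and the weak model's mistakes end up contributing training signal.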

###
https://arxiv.org/abs/2408.02545
[Submitted on 5 Aug 2024]
Enhancing LLMs for RAG
Introduces RAGFoundry, an open-source framework for augmented LLMs for RAG use cases.
It supports data creation, training, inference, and evaluation.
One useful application is the creation of data-augmented datasets for tuning and evaluating LLMs in RAG settings.
After checking out their GitHub repo, I like how easy they make it to build and test quick prototypes with several RAG configurations and enhancements. It has a lot of potential even as just a learning tool.
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation
Daniel Fleischer, Moshe Berchansky, Moshe Wasserblat, Peter Izsak
Implementing Retrieval-Augmented Generation (RAG) systems is inherently complex, requiring deep understanding of data, use cases, and intricate design decisions. Additionally, evaluating these systems presents significant challenges, necessitating assessment of both retrieval accuracy and generative quality through a multi-faceted approach. We introduce RAG Foundry, an open-source framework for augmenting large language models for RAG use cases. RAG Foundry integrates data creation, training, inference and evaluation into a single workflow, facilitating the creation of data-augmented datasets for training and evaluating large language models in RAG settings. This integration enables rapid prototyping and experimentation with various RAG techniques, allowing users to easily generate datasets and train RAG models using internal or specialized knowledge sources. We demonstrate the framework's effectiveness by augmenting and fine-tuning Llama-3 and Phi-3 models with diverse RAG configurations, showcasing consistent improvements across three knowledge-intensive datasets. Code is released as open source.
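
As a small illustration of what a "data-augmented dataset" can look like, the sketch below retrieves the best-matching passages for each question and stores a prompt that includes them. It does not use the RAG Foundry API; the word-overlap retriever, the prompt template, and the function names are assumptions made for this example.

```python
def overlap(query, passage):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def retrieve(query, corpus, k=2):
    """Return the k passages with the highest overlap score."""
    return sorted(corpus, key=lambda p: overlap(query, p), reverse=True)[:k]


def augment(example, corpus, k=2):
    """Attach retrieved context to a question/answer example."""
    context = "\n".join(retrieve(example["question"], corpus, k))
    prompt = f"Context:\n{context}\n\nQuestion: {example['question']}\nAnswer:"
    return {**example, "prompt": prompt}
```

Applying `augment` to every example yields a dataset whose prompts already contain retrieved context, which could then be used for fine-tuning or for evaluation in a RAG setting.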