Summary

Google Research has released Gemma 2, available in 9B and 27B parameter sizes and trained on 8 trillion and 13 trillion tokens respectively. The 27B model approaches the performance of Meta Llama 3 70B. Meta has released the Meta LLM Compiler, a family of models built on Code Llama with additional code optimization and compiler capabilities. OpenAI announced a strategic content partnership with TIME, and Anthropic added a Projects feature to Claude.ai. In addition, Hugging Face launched a new Open LLM Leaderboard and published a paper on the FineWeb datasets, and Infiniflow open-sourced RAGFlow, a RAG engine built on deep document understanding.

Google, Gemma 2 Released

Link, June 28, 2024,
Google Research

  • Gemma 2 released in 9B and 27B parameter sizes
  • Trained on 8T tokens (9B) and 13T tokens (27B)
  • The 27B model is competitive with Meta Llama 3 70B
  • First Chatbot Arena evals place Gemma 2 27B around Anthropic Claude 3 Sonnet, Llama 3 70B, and OpenAI GPT-4
  • 9B scores 71.3 MMLU, 52.8 AGIEval, 40.2 HumanEval
  • 27B scores 75.2 MMLU, 55.1 AGIEval, 51.8 HumanEval
  • Commercial use allowed; available on Hugging Face (loading sketch below)
  • Trained on Google TPUv5e; efficient inference performance
  • Sliding window attention, logit soft-capping, and grouped-query attention (GQA)
  • 1-click deployment to Google Cloud
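
Since the checkpoints are on Hugging Face, a minimal loading sketch is shown below. It assumes a recent transformers release with Gemma 2 support and access to the gated google/gemma-2-9b-it checkpoint; the prompt is just an example.

```python
# Minimal sketch (assumptions: a transformers release with Gemma 2 support,
# access to the gated "google/gemma-2-9b-it" checkpoint on Hugging Face).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 9B model on a single modern GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain grouped-query attention briefly."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```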

Meta, Meta LLM Compiler Released

Link, June 28, 2024,
META

  • Meta LLM Compiler released; state-of-the-art results on code-size optimization and disassembly tasks (a hedged usage sketch follows this list)
  • Beats GPT-4 on code-size improvement and disassembly
  • Two model types offered: LLM Compiler and LLM Compiler FTD
  • LLM Compiler: pre-trained on over 500B tokens of LLVM-IR, x86_64, ARM, and CUDA assembly code
  • LLM Compiler FTD: further fine-tuned to predict the best optimizations for LLVM assembly (reducing code size) and to disassemble assembly back to LLVM-IR
  • Permissive license for both research and commercial use
  • Emulates the compiler perfectly 20% of the time; reaches 77% of the optimizing potential of an autotuning search
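
A hedged sketch of prompting the released checkpoint follows. It assumes access to the gated facebook/llm-compiler-7b repository on Hugging Face; the instruction-style prompt is illustrative, not Meta's documented template.

```python
# Hedged sketch (assumptions: the gated "facebook/llm-compiler-7b" checkpoint
# on Hugging Face; the [INST] prompt shape below is illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/llm-compiler-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

llvm_ir = """
define i32 @square(i32 %x) {
entry:
  %mul = mul nsw i32 %x, %x
  ret i32 %mul
}
"""
prompt = f"[INST] Suggest opt passes that minimize code size for:\n{llvm_ir}[/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```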

OpenAI, Strategic Content Partnership with TIME

Link, June 27, 2024,
OpenAI

  • Brings TIME's trusted journalism to OpenAI products, including ChatGPT
  • Covers current and historic content from TIME's 101-year archive
  • OpenAI products will cite the content and link back to the original source on Time.com
  • TIME gains access to OpenAI's technology to develop new products for its audiences
  • OpenAI will use TIME's feedback to refine how journalism is delivered in its products

Anthropic, Projects Feature Added to Claude.ai

Link, June 26, 2024,
Anthropic

  • Projects now available to Claude.ai Pro and Team users
  • Each project includes a 200K context window (roughly a 500-page book) for adding relevant documents, code, and insights
  • Custom instructions can be set per project, e.g. using a more formal tone (a rough API analogue is sketched below)
  • Artifacts show generated content with live previews in a dedicated window alongside the conversation
  • Shared activity feeds within teams strengthen collaboration
  • Projects ground Claude's outputs in a team's internal knowledge
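
Projects is a Claude.ai UI feature rather than an API, but a rough analogue with the official anthropic Python SDK is to pack project knowledge and custom instructions into the system prompt, as sketched below; the file name and instructions are hypothetical.

```python
# Rough analogue of a Project via the anthropic SDK: custom instructions plus
# a reference document packed into the system prompt. "style_guide.md" and the
# instructions are hypothetical; the key is read from ANTHROPIC_API_KEY.
import anthropic

client = anthropic.Anthropic()

style_guide = open("style_guide.md").read()  # hypothetical project document

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system=(
        "Use a formal tone and follow the attached style guide.\n\n"
        f"<style_guide>\n{style_guide}\n</style_guide>"
    ),
    messages=[{"role": "user", "content": "Draft a launch announcement."}],
)
print(response.content[0].text)
```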

Open LLM Leaderboard 2 Released

Link, June 28, 2024,
Hugging Face

  • New benchmarks introduced: MMLU-Pro, GPQA, MuSR, MATH, IFEval, and BBH
  • Improved ranking via scores normalized against each benchmark's random baseline, plus a new Gradio component (normalization sketch below)
  • Qwen2 72B Instruct tops the ranking, ahead of Meta Llama 3 70B Instruct and Cohere Command R+
  • Community voting system introduced
  • Enhanced reproducibility with support for delta weights and chat templates
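
The baseline normalization can be illustrated with a small sketch: a raw accuracy is linearly rescaled so random guessing maps to 0 and a perfect score to 100. The per-benchmark baseline used below is an illustrative assumption, not the leaderboard's exact configuration.

```python
# Sketch of baseline normalization: raw accuracy is rescaled so the random
# baseline maps to 0 and a perfect score to 100.
def normalize(raw: float, baseline: float, maximum: float = 1.0) -> float:
    """Rescale linearly; scores at or below the baseline clamp to 0."""
    if raw <= baseline:
        return 0.0
    return 100.0 * (raw - baseline) / (maximum - baseline)

# Example: a 4-choice benchmark has a 25% random-guess baseline.
print(normalize(0.752, baseline=0.25))  # ~66.9 rather than 75.2
```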

FineWeb Datasets Published

Link, June 25, 2024,
Hugging Face

  • FineWeb: a 15-trillion-token dataset derived from 96 Common Crawl snapshots
  • FineWeb-Edu: a 1.3-trillion-token collection of educational text filtered from FineWeb
  • Produces better-performing LLMs than other open pretraining datasets across public benchmarks
  • Datasets and the data-curation codebase released publicly (loading sketch below)
  • Includes in-depth investigations of deduplication and filtering strategies
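
Both datasets are on the Hugging Face Hub, so a streaming peek is straightforward. The sketch below assumes the datasets library and the "sample-10BT" configuration name used for the small published samples.

```python
# Hedged sketch: stream a few FineWeb-Edu rows without downloading the full
# 1.3T-token dataset.
from datasets import load_dataset

fw_edu = load_dataset(
    "HuggingFaceFW/fineweb-edu",
    name="sample-10BT",   # small sample config; full snapshots are much larger
    split="train",
    streaming=True,       # iterate lazily instead of materializing the dataset
)

for i, row in enumerate(fw_edu):
    print(row["text"][:200])  # each row carries raw text plus crawl metadata
    if i == 2:
        break
```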

Infiniflow, RAGFlow Released

Link, June 25, 2024,
Infiniflow

  • Open-source RAG engine with knowledge extraction based on deep document understanding
  • Supports unstructured data in a wide range of formats (Word, slides, Excel, txt, images, scanned copies, web pages, and more)
  • "Quality in, quality out" question answering backed by grounded, traceable citations
  • Streamlined RAG workflow suited to both individuals and large businesses
  • Multiple recall paired with fused re-ranking (a generic fusion sketch follows)
  • Template-based chunking with visualization that allows human intervention
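
RAGFlow's README does not spell out its fusion algorithm, but a standard way to combine candidates from several retrievers is reciprocal rank fusion (RRF); the sketch below is a generic illustration of "multiple recall + fused re-ranking", not RAGFlow's internal code.

```python
# Generic illustration of "multiple recall + fused re-ranking": reciprocal
# rank fusion (RRF) merges ranked lists from several retrievers, e.g. BM25
# and dense vectors. Not RAGFlow's internal implementation.
from collections import defaultdict

def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked doc-id lists; larger k dampens the reward for top ranks."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]
dense_hits = ["doc1", "doc5", "doc3"]
print(rrf([bm25_hits, dense_hits]))  # doc1 and doc3 rise to the top
```
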
Sources

This GPT assists users by creating a detailed daily newspaper in Korean based on provided links. It follows these steps: read the content, summarize each item with detailed points, and write a report. The report format is:

(today's date, in the Korean 년 월 일 format) AI News,

Summary

(overall short summary with good detail; in the Summary section, lead each item with the company name, e.g. "OpenAI announced ...")

Title,

Korean title

Link, date,
company name

  • detailed summary 1 (use concise, point-form style)
  • detailed summary 2 (use concise, point-form style)
  • detailed summary N (use concise, point-form style)

Title,

Korean title

Link, date,
company name

  • detailed summary 1 (use concise, point-form style)
  • detailed summary 2 (use concise, point-form style)
  • detailed summary N (use concise, point-form style)
###
https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
June 28, 2024
Google Research
Gemma 2 released! Google just released the next iteration of its open LLM! Gemma 2 comes in two sizes, 9B & 27B, trained on 13T tokens. Gemma 2 27B approaches Meta Llama 3 70B performance! First Chatbot Arena evals place Gemma2 27B around Anthropic Claude 3 Sonnet, Llama 3 70B, and OpenAI GPT-4. 🤯
What's new with Gemma 2:
🧮 9B & 27B Instruction and base version with 8192 context window
🔠 Trained on 13T tokens (27B) and 8T tokens (9B)
🆕 Sliding window attention, logit soft-capping and Grouped-Query Attention (GQA)
🥇 9B scores 71.3 MMLU; 52.8 AGIEval; 40.2 HumanEval
🏆 27B scores 75.2 MMLU; 55.1 AGIEval; 51.8 HumanEval
✅ Commercial use allowed
🧬 Used SFT, Distillation, RLHF & Model Merging.
🧠 Trained on Google TPUv5e
🤗 Available on Hugging Face
🔜 1-click deployment to Google Cloud from Hugging Face

Gemma 2 is now available to researchers and developers
Jun 27, 2024

Gemma 2 offers best-in-class performance, runs at incredible speed across different hardware and easily integrates with other AI tools.

Clement Farabet, VP of Research, Google DeepMind
Tris Warkentin, Director, Google DeepMind

AI has the potential to address some of humanity's most pressing problems — but only if everyone has the tools to build with it. That's why earlier this year we introduced Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. We’ve continued to grow the Gemma family with CodeGemma, RecurrentGemma and PaliGemma — each offering unique capabilities for different AI tasks and easily accessible through integrations with partners like Hugging Face, NVIDIA and Ollama.

Now we’re officially releasing Gemma 2 to researchers and developers globally. Available in both 9 billion (9B) and 27 billion (27B) parameter sizes, Gemma 2 is higher-performing and more efficient at inference than the first generation, with significant safety advancements built in. In fact, at 27B, it offers competitive alternatives to models more than twice its size, delivering the kind of performance that was only possible with proprietary models as recently as December. And that’s now achievable on a single NVIDIA H100 Tensor Core GPU or TPU host, significantly reducing deployment costs.

A new open model standard for efficiency and performance
We built Gemma 2 on a redesigned architecture, engineered for both exceptional performance and inference efficiency. Here’s what makes it stand out:

Outsized performance: At 27B, Gemma 2 delivers the best performance for its size class, and even offers competitive alternatives to models more than twice its size. The 9B Gemma 2 model also delivers class-leading performance, outperforming Llama 3 8B and other open models in its size category. For detailed performance breakdowns, check out the technical report.
Unmatched efficiency and cost savings: The 27B Gemma 2 model is designed to run inference efficiently at full precision on a single Google Cloud TPU host, NVIDIA A100 80GB Tensor Core GPU, or NVIDIA H100 Tensor Core GPU, significantly reducing costs while maintaining high performance. This allows for more accessible and budget-friendly AI deployments.
Blazing fast inference across hardware: Gemma 2 is optimized to run at incredible speed across a range of hardware, from powerful gaming laptops and high-end desktops, to cloud-based setups. Try Gemma 2 at full precision in Google AI Studio, unlock local performance with the quantized version with Gemma.cpp on your CPU, or try it on your home computer with an NVIDIA RTX or GeForce RTX via Hugging Face Transformers.
[Chart: Gemma 2 performance benchmarks]
Built for developers and researchers
Gemma 2 is not only more powerful, it's designed to more easily integrate into your workflows:

Open and accessible: Just like the original Gemma models, Gemma 2 is available under our commercially-friendly Gemma license, giving developers and researchers the ability to share and commercialize their innovations.
Broad framework compatibility: Easily use Gemma 2 with your preferred tools and workflows thanks to its compatibility with major AI frameworks like Hugging Face Transformers, and JAX, PyTorch and TensorFlow via native Keras 3.0, vLLM, Gemma.cpp, Llama.cpp and Ollama. In addition, Gemma is optimized with NVIDIA TensorRT-LLM to run on NVIDIA-accelerated infrastructure or as an NVIDIA NIM inference microservice, with optimization for NVIDIA’s NeMo to come. You can fine-tune today with Keras and Hugging Face. We are actively working to enable additional parameter-efficient fine-tuning options.
Effortless deployment: Starting next month, Google Cloud customers will be able to easily deploy and manage Gemma 2 on Vertex AI.
Explore the new Gemma Cookbook, a collection of practical examples and recipes to guide you through building your own applications and fine-tuning Gemma 2 models for specific tasks. Discover how to easily use Gemma with your tooling of choice, including for common tasks like retrieval-augmented generation.

Responsible AI development
We're committed to providing developers and researchers with the resources they need to build and deploy AI responsibly, including through our Responsible Generative AI Toolkit. The recently open-sourced LLM Comparator helps developers and researchers with in-depth evaluation of language models. Starting today, you can use the companion Python library to run comparative evaluations with your model and data, and visualize the results in the app. Additionally, we’re actively working on open sourcing our text watermarking technology, SynthID, for Gemma models.

When training Gemma 2, we followed our robust internal safety processes, filtering pre-training data and performing rigorous testing and evaluation against a comprehensive set of metrics to identify and mitigate potential biases and risks. We publish our results on a large set of public benchmarks related to safety and representational harms.

[Chart: Gemma 2 safety evaluations]
Projects built with Gemma
Our first Gemma launch led to more than 10 million downloads and countless inspiring projects. Navarasa, for instance, used Gemma to create a model rooted in India’s linguistic diversity.

[Video: Developing for Indic languages: Gemma and Navarasa]
Now, Gemma 2 will help developers get even more ambitious projects off the ground, unlocking new levels of performance and potential in their AI creations. We'll continue to explore new architectures and develop specialized Gemma variants to tackle a wider range of AI tasks and challenges. This includes an upcoming 2.6B parameter Gemma 2 model, designed to further bridge the gap between lightweight accessibility and powerful performance. You can learn more about this upcoming release in the technical report.

###
https://huggingface.co/collections/facebook/llm-compiler-667c5b05557fe99a9edd25cb
June 28, 2024
META
Today we’re releasing Meta LLM Compiler, a family of models built on Meta Code Llama with additional code optimization and compiler capabilities. The models achieve state-of-the-art results on optimization of code size and disassembly tasks.


LLM Compiler can emulate the compiler, predict optimal passes for code size, and disassemble code. It can be fine-tuned for new optimizations and compiler tasks. This work shows that AI is learning to optimize code and can assist compiler experts in identifying opportunities to apply optimizations. We believe this work could have an impact ranging from use in optimization for individual developer environments to inclusion in a compiler such as LLVM.
We’re releasing LLM Compiler 7B & 13B models under a permissive license for both research and commercial use in the hopes of making it easier for developers and researchers alike to leverage this in their work and carry forward new research in this highly impactful space.

WAIT, it's not over; Meta just dropped the LLM Compiler! 🧑‍💻
> Beats GPT-4 on code size improvement and disassembly
> Achieves 77% of the optimising potential of an autotuning search and 45% disassembly round trip 🔥
> Built on top of CodeLLaMa with improved code optimisation and compiler reasoning.
> Allows commercial use
Two model types:
> LLM Compiler: the foundational models, pre-trained on over 500B tokens of LLVM-IR, x86_64, ARM, and CUDA assembly codes and trained to predict the effect of LLVM optimisations
> LLM Compiler FTD, which is further fine-tuned to predict the best optimisations for code in LLVM assembly to reduce code size and disassemble assembly code to LLVM-IR
> Perfectly emulating the compiler 20% of the time ⚡

###
https://openai.com/index/strategic-content-partnership-with-time/
June 27, 2024
OpenAI
Strategic Content Partnership with TIME
New access to current and historic content from TIME's extensive archives from the last 101 years to enhance OpenAI products and display in response to user inquiries.

Today, TIME and OpenAI announced a multi-year content deal and strategic partnership to bring TIME's trusted journalism to OpenAI’s products, including ChatGPT.

Through this collaboration, OpenAI will gain access to current and historic content from TIME's extensive archives from the last 101 years to enhance its products and display in response to user inquiries—featuring a citation and link back to the original source on Time.com. The new partnership furthers TIME’s commitment to expanding global access to accurate and trusted information.

"Throughout our 101-year history, TIME has embraced innovation to ensure that the delivery of our trusted journalism evolves alongside technology," said TIME Chief Operating Officer Mark Howard. "This partnership with OpenAI advances our mission to expand access to trusted information globally as we continue to embrace innovative new ways of bringing TIME’s journalism to audiences globally.”

“We’re partnering with TIME to make it easier for people to access news content through our AI tools, and to support reputable journalism by providing proper attribution to original sources,” said Brad Lightcap, Chief Operating Officer of OpenAI.

The partnership will also enable TIME to gain access to OpenAI's technology to develop new products for its audiences, along with the opportunity to provide vital feedback and share practical applications to refine and enhance the delivery of journalism in ChatGPT and other OpenAI products and shape the future of news experiences.

###
OpenAI
June 26, 2024
We're sharing an update on the advanced Voice Mode we demoed during our Spring Update, which we remain very excited about:
We had planned to start rolling this out in alpha to a small group of ChatGPT Plus users in late June, but need one more month to reach our bar to launch. For example, we’re improving the model’s ability to detect and refuse certain content. We’re also working on improving the user experience and preparing our infrastructure to scale to millions while maintaining real-time responses.
As part of our iterative deployment strategy, we'll start the alpha with a small group of users to gather feedback and expand based on what we learn. We are planning for all Plus users to have access in the fall. Exact timelines depend on meeting our high safety and reliability bar. We are also working on rolling out the new video and screen sharing capabilities we demoed separately, and will keep you posted on that timeline.
ChatGPT’s advanced Voice Mode can understand and respond with emotions and non-verbal cues, moving us closer to real-time, natural conversations with AI. Our mission is to bring these new experiences to you thoughtfully.

###
https://www.anthropic.com/news/projects
Collaborate with Claude on Projects
June 26, 2024
Anthropic

[Illustration: individuals collaborating around the Claude logo]
Our vision for Claude has always been to create AI systems that work alongside people and meaningfully enhance their workflows. As a step in this direction, Claude.ai Pro and Team users can now organize their chats into Projects, bringing together curated sets of knowledge and chat activity in one place—with the ability to make their best chats with Claude viewable by teammates. With this new functionality, Claude can enable idea generation, more strategic decision-making, and exceptional results.

Projects are available on Claude.ai for all Pro and Team customers, and can be powered by Claude 3.5 Sonnet, our latest release which outperforms its peers on a wide variety of benchmarks. Each project includes a 200K context window, the equivalent of a 500-page book, so users can add all of the relevant documents, code, and insights to enhance Claude’s effectiveness.

Avoid the cold start problem
Projects allow you to ground Claude’s outputs in your internal knowledge—be it style guides, codebases, interview transcripts, or past work. This added context enables Claude to provide expert assistance across tasks, from writing emails like your marketing team to writing SQL queries like a data analyst.

[App screen: a user uploading docs to Claude.ai]
In addition, you can define custom instructions for each Project to further tailor Claude’s responses, including instructing Claude to use a more formal tone or answer questions from the perspective of a specific role or industry. With Projects, you can get started much faster and extend your skills further for any task.

[App screen: custom instructions]
Create side-by-side with Claude
Artifacts help you better work with Claude by helping you see, edit, and build with Claude. Simply ask Claude to generate content like code snippets, text documents, graphics, diagrams, or website designs, and Artifacts appear in a dedicated window alongside your conversation.

Artifacts especially enhance Claude’s coding capabilities for developers, offering a larger code window and live previews for frontends that streamline reviews. Join the feature preview for Artifacts in Claude.ai via the account menu on the left-side panel.

[App screen: the Artifacts panel alongside the user chat]
Spark inspiration through sharing
Claude Team users can also share snapshots of their best conversations with Claude into your team’s shared project activity feed. Activity feeds help each teammate get inspired around different ways to work with Claude, and helps the entire team uplevel their skills working with AI.

[App screen: shared chats within a Project]
Sharing work products that were co-created with Claude can improve innovation in areas like product development and research, where bringing together organizational knowledge from across the company can produce higher-quality outputs.


Customer spotlight: North Highland
At North Highland, a leading change and transformation consultancy, hundreds of employees across consulting, business development, and marketing teams use Claude to work better. From writing proposals to analyzing complex documents like 10-Ks, teams use Claude to enhance and scale their expert services.

The Claude Team plan is transforming our way of working at North Highland. Claude is a truly exceptional writer that has helped our team complete content creation and analysis tasks up to 5x faster than before—turning what was once two weeks of writing and research into minutes of work. With Claude, we’re future-proofing our workforce, finding more excitement in daily challenges, and leaping into the future of AI-assisted collaboration and creativity.
Luka Anic, Senior Director of Technical AI Program and Product Manager at North Highland

The future of work with Claude
These latest features around shared knowledge and collaboration integrate Claude into your existing team processes, enabling you to save time and elevate your work. By harnessing Claude’s accuracy and advanced coding and writing capabilities, Projects can amplify your team’s potential. Additionally, as part of our commitment to user privacy, any data or chats shared within Projects will not be used to train our generative models without a user’s explicit consent.

In the coming months, we’ll continue making Claude easier to use while expanding the types of project knowledge you can bring to Claude via native integrations with popular applications and tools. We’re excited to see how your team works with Claude.

###
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
Open LLM Leaderboard 2 released! Evaluating LLMs is not easy. Finding new ways to compare LLM fairly, transparently, and reproducibly is important! Benchmarks are not perfect, but they give us a first understanding of how well models perform and where their strengths are.
What's new?!
📈 New benchmarks with MMLU-Pro, GPQA, MuSR, MATH, IFEval and BBH.
📊 Improved ranking with normalized scores adjusted to baselines
🏆 Qwen2 72B Instruct > Meta Llama 3 70B Instruct > Cohere Command R+
⚡ Faster, simpler Interface with a new Gradio component.
🛠️ Enhanced reproducibility with support for delta weights and chat templates
⭐ Introduction of "maintainer's highlight" and “community voting system”

###
https://huggingface.co/papers/2406.17557
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Published on Jun 25 · Submitted by philschmid on Jun 26 · #1 Paper of the day
Authors: Guilherme Penedo, Hynek Kydlíček, Loubna Ben allal, Anton Lozhkov, Margaret Mitchell, Colin Raffel, Leandro Von Werra, Thomas Wolf

Abstract
The performance of a large language model (LLM) depends heavily on the quality and size of its pretraining dataset. However, the pretraining datasets for state-of-the-art open LLMs like Llama 3 and Mixtral are not publicly available and very little is known about how they were created. In this work, we introduce FineWeb, a 15-trillion token dataset derived from 96 Common Crawl snapshots that produces better-performing LLMs than other open pretraining datasets. To advance the understanding of how best to curate high-quality pretraining datasets, we carefully document and ablate all of the design choices used in FineWeb, including in-depth investigations of deduplication and filtering strategies. In addition, we introduce FineWeb-Edu, a 1.3-trillion token collection of educational text filtered from FineWeb. LLMs pretrained on FineWeb-Edu exhibit dramatically better performance on knowledge- and reasoning-intensive benchmarks like MMLU and ARC. Along with our datasets, we publicly release our data curation codebase and all of the models trained during our ablation experiments.

###
https://github.com/infiniflow/ragflow
💡 What is RAGFlow?
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale, combining LLM (Large Language Models) to provide truthful question-answering capabilities, backed by well-founded citations from various complex formatted data.

🌟 Key Features
🍭 "Quality in, quality out"
Deep document understanding-based knowledge extraction from unstructured data with complicated formats.
Finds "needle in a data haystack" of literally unlimited tokens.
🍱 Template-based chunking
Intelligent and explainable.
Plenty of template options to choose from.
🌱 Grounded citations with reduced hallucinations
Visualization of text chunking to allow human intervention.
Quick view of the key references and traceable citations to support grounded answers.
🍔 Compatibility with heterogeneous data sources
Supports Word, slides, excel, txt, images, scanned copies, structured data, web pages, and more.
🛀 Automated and effortless RAG workflow
Streamlined RAG orchestration catered to both personal and large businesses.
Configurable LLMs as well as embedding models.
Multiple recall paired with fused re-ranking.
Intuitive APIs for seamless integration with business.