Cohere announced a new model, C4AI Command R+ 08-2024, offering advanced capabilities including multi-step tool use and multilingual support. Qwen released Qwen2-VL, strengthening complex image and video understanding, multilingual support, and real-time conversation. Salesforce published its xLAM series of Large Action Models, and NVIDIA reclaimed the #1 spot on the MTEB leaderboard with NV-Embed-v2. Several research papers offered important technical insights into the evaluation methods and performance of large language models, and Gartner published an analysis of the key technologies that will transform sales.

Cohere, Model Card for C4AI Command R+ 08-2024

Link, 8/30/24

  • C4AI Command R+ 08-2024 is a large language model with 104 billion parameters, optimized for automating complex tasks with support for Retrieval Augmented Generation (RAG) and multi-step tool use.
  • Uses Grouped Query Attention (GQA) to deliver up to 2x higher throughput and lower latency.
  • Trained on 23 languages and evaluated in 10, including Korean, making it usable in diverse multilingual settings.
  • Supports context lengths of up to 128K, maintaining strong performance on long-form text.
  • Compatible with the Transformers library for easy use; model checkpoints are available on the Hugging Face Hub.
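
The GQA bullet above is mechanical enough to sketch: several query heads share one key/value head, which shrinks the KV cache and raises throughput. A minimal NumPy illustration of the idea (toy sizes; the real model's head counts and attention kernel differ):

```python
import numpy as np

def gqa(q, k, v, n_groups):
    """Grouped Query Attention: h query heads share n_groups KV heads."""
    h, t, d = q.shape                    # (query heads, seq len, head dim)
    per_group = h // n_groups            # query heads per shared KV head
    out = np.empty_like(q)
    for i in range(h):
        g = i // per_group               # KV head assigned to this query head
        scores = q[i] @ k[g].T / np.sqrt(d)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)   # softmax over key positions
        out[i] = w @ v[g]
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4, 16))          # 8 query heads
k = rng.normal(size=(2, 4, 16))          # only 2 KV heads -> 4x smaller KV cache
v = rng.normal(size=(2, 4, 16))
print(gqa(q, k, v, n_groups=2).shape)    # (8, 4, 16)
```

With 2 KV heads serving 8 query heads, the cached K/V tensors are a quarter of the multi-head-attention size, which is where the latency and throughput gains come from.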

Qwen, Qwen2-VL: To See the World More Clearly

Link, 8/29/24

  • Qwen2-VL is the latest version of the vision-language models built on Qwen2, with state-of-the-art (SoTA) performance on images of diverse resolutions and aspect ratios.
  • Greatly improved video understanding: can analyze videos over 20 minutes long, answer questions about them, and sustain real-time conversation.
  • Supports Naive Dynamic Resolution to handle images of varying resolution dynamically, and introduces Multimodal Rotary Position Embedding (M-ROPE) to concurrently capture and integrate positional information for 1D text, 2D images, and 3D video.
  • The 7B model supports image, multi-image, and video inputs while remaining cost-effective, performing strongly across diverse tasks.
  • A smaller 2B model, optimized with mobile deployment in mind, also performs well on video-related tasks, document understanding, and general-scenario question answering.
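
The M-ROPE bullet can be made concrete. Instead of one 1D position per token, each token carries a (temporal, height, width) triple: for plain text the three components advance together, while for image patches the temporal component is frozen and height/width follow the patch grid. A rough sketch of just the position-ID bookkeeping (my simplification of the idea, not Qwen's exact indexing):

```python
def mrope_positions(n_text, grid_h, grid_w):
    """Position triples (temporal, height, width) for text tokens followed
    by one image's patch tokens."""
    pos = [(t, t, t) for t in range(n_text)]   # text: components move together
    t0 = n_text                                # image: temporal index frozen
    for h in range(grid_h):
        for w in range(grid_w):
            pos.append((t0, t0 + h, t0 + w))
    return pos

ids = mrope_positions(n_text=3, grid_h=2, grid_w=2)
print(ids[:3])   # [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
print(ids[3:])   # 4 patch triples, all with temporal index 3
```

Because the rotary embedding is split across these three axes, the same machinery can address a character's position in a sentence, a patch's position in an image, or a frame's position in a video.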

Salesforce, Large Action Models xLAM-7B

Link, 8/29/24

  • Salesforce's xLAM series comprises Large Action Models that translate AI-agent decision-making and user intent into executable actions, available in multiple parameter sizes: 7B, 8x7B, and 8x22B.
  • The models can autonomously plan and execute tasks to satisfy complex user requests.
  • Support context lengths of up to 64K, maintaining strong performance in long conversations and complex tasks.
  • Integrated with the Transformers library for easy use in building AI agents.
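
The intent-to-action loop described above is typically realized as function calling: the model emits a structured action and an agent runtime executes it. A schematic sketch with a stubbed model (the tool names and JSON action format here are invented for illustration, not xLAM's actual interface):

```python
import json

# Hypothetical tool registry an agent runtime might expose.
TOOLS = {
    "get_weather": lambda city: f"22C and sunny in {city}",
    "send_email": lambda to, body: f"email to {to} queued",
}

def model_emit_action(user_msg):
    # Stub: a Large Action Model would generate this JSON from the user's intent.
    return json.dumps({"tool": "get_weather", "args": {"city": "Seoul"}})

def run_agent(user_msg):
    action = json.loads(model_emit_action(user_msg))   # model output -> action
    return TOOLS[action["tool"]](**action["args"])     # execute the action

print(run_agent("What's the weather like in Seoul?"))  # 22C and sunny in Seoul
```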

NVIDIA, NV-Embed-v2

Link, 9/1/24

  • NV-Embed-v2 achieved a record-breaking score of 72.31 on the Massive Text Embedding Benchmark (MTEB), taking first place across 56 text embedding/retrieval tasks.
  • Uses latent-attention pooling to improve embedding output, and hard-negative mining to remove false negatives and boost performance.
  • Built on a Mistral-7B-v0.1 decoder-only LLM, NV-Embed-v2 produces 4096-dimensional embeddings and excels at retrieval tasks, which are essential to RAG development.
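
The hard-negative mining in the second bullet can be sketched: candidate negatives that score almost as high as the labeled positive are likely false negatives, so they are dropped before training. A toy version (the cosine scoring and 95% threshold are illustrative assumptions, not NVIDIA's published recipe):

```python
import numpy as np

def mine_hard_negatives(query, positive, candidates, margin=0.95):
    """Rank candidate negatives, dropping likely false negatives whose
    score comes too close to the positive's score."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    pos_score = cos(query, positive)
    scored = [(cos(query, c), i) for i, c in enumerate(candidates)]
    kept = [(s, i) for s, i in scored if s < margin * pos_score]
    return [i for s, i in sorted(kept, reverse=True)]  # hardest first

query = np.array([1.0, 0.0, 0.0, 0.0])
positive = np.array([0.99, 0.1, 0.0, 0.0])     # labeled relevant passage
lookalike = np.array([0.98, 0.0, 0.15, 0.0])   # unlabeled but clearly relevant
random_doc = np.array([0.2, 1.0, 0.5, -0.3])   # genuinely irrelevant
print(mine_hard_negatives(query, positive, [lookalike, random_doc]))  # [1]
```

The lookalike scores nearly as high as the positive and is filtered out as a probable false negative; only the genuinely irrelevant document survives as a training negative.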

Gartner, Gartner Hype Cycle Reveals Top Technologies That Will Transform Sales In the Next Decade

Link, 8/28/24

  • Through the 2024 hype cycle for sales technology, Gartner announced 25 key technologies that will transform sales over the next decade; the technologies fall into four trends: autonomous AI, developer productivity, total experience, and human-centric security and privacy.
  • Autonomous AI:
    • Development is accelerating toward AI systems that learn on their own and make effective decisions in complex environments without human oversight.
    • Includes multi-agent systems, large action models, and autonomous agents.
  • Developer productivity:
    • Technologies such as AI-augmented software engineering, cloud native, and prompt engineering maximize developer productivity.
  • Total experience:
    • Connects customer experience, employee experience, and more to create superior shared experiences, improving trust, satisfaction, and loyalty.
  • Human-centric security and privacy:
    • Trust-based security technologies such as AI TRiSM and digital immune systems strengthen organizational security and offer ways to address privacy concerns.
  • Emotion AI:
    • Analyzes a user's emotional state, helping sales teams empathize more deeply with customers.
    • Adoption still requires resolving privacy and ethics challenges.
  • Digital Twin of a Customer (DToC):
    • A virtual model that simulates and predicts customer behavior, supporting personalized service and improved customer experience.
    • Adoption requires advanced machine-learning algorithms and data-science staff.
  • Machine Sellers:
    • Nonhuman agents automate selling, greatly increasing efficiency, especially in recurring-revenue models.
    • Impact is expected to vary by industry, geography, and business model.

Research paper, Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models

Link, 8/5/24

  • The study confirms that requiring large language models (LLMs) to output in structured formats (JSON, XML, etc.) can degrade performance.
  • Format restrictions particularly hurt reasoning ability, and stricter constraints produce larger degradation.
  • Gemini 1.5 Flash showed the best consistency across formats.
  • Analysis was performed on a range of datasets, including GSM8K, Last Letter, and DDXPlus.
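
One mitigation the paper examines, NL-to-Format, keeps reasoning unconstrained: the model first answers in free-form natural language, and only a second pass converts the finished answer into JSON. A minimal sketch with stubbed model calls (a real pipeline would invoke an LLM in both steps):

```python
import json
import re

def answer_freeform(question):
    # Stub for step 1: an unconstrained LLM call that reasons in plain language.
    return "Tom has 3 apples and buys 4 more, so he has 7 apples."

def to_format(nl_answer):
    # Stub for step 2: convert the finished NL answer into strict JSON.
    # (A real pipeline would prompt an LLM; here we just grab the last number.)
    numbers = re.findall(r"\d+", nl_answer)
    return json.dumps({"answer": int(numbers[-1])})

question = "Tom has 3 apples and buys 4 more. How many does he have?"
nl = answer_freeform(question)   # reasoning happens with no format limits
print(to_format(nl))             # {"answer": 7}
```

Separating the two steps is exactly what the paper credits for recovering most of the accuracy lost under direct JSON-constrained generation.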

Research paper, Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators

Link, 3/26/24

  • Introduces Pairwise-preference Search (PairS) to address the misalignment between LLM evaluators and human judgment, substantially improving evaluation accuracy and consistency.
  • PairS ranks candidates through pairwise comparisons between texts; it is more efficient than win-loss rates or ELO rating systems, achieving similar performance with only about 30% of the comparisons.
  • Outperforms G-Eval, win-loss rate, and ELO rating on Spearman correlation, with code and examples released to ensure reproducibility.
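
The pairwise-ranking idea behind PairS can be sketched as a merge sort whose comparator is an LLM preference call, costing O(n log n) comparisons rather than the n(n-1)/2 needed to compare every pair. A toy version with a stubbed judge (a real judge would prompt an LLM to pick the better text):

```python
def judge(a, b):
    # Stub: a real judge would prompt an LLM with "which text is better?".
    return len(a) >= len(b)       # longer text "wins", purely for demonstration

def pairwise_rank(cands, prefer, count):
    """Merge sort driven by pairwise preferences; best candidate first."""
    if len(cands) <= 1:
        return list(cands)
    mid = len(cands) // 2
    left = pairwise_rank(cands[:mid], prefer, count)
    right = pairwise_rank(cands[mid:], prefer, count)
    merged = []
    while left and right:
        count[0] += 1             # one (simulated) LLM comparison
        merged.append(left.pop(0) if prefer(left[0], right[0]) else right.pop(0))
    return merged + left + right

calls = [0]
print(pairwise_rank(["bb", "a", "dddd", "ccc"], judge, calls))
# ['dddd', 'ccc', 'bb', 'a']
print(calls[0], "comparisons vs", 4 * 3 // 2, "for every pair")  # 4 vs 6
```

The savings grow with the candidate count: the paper's example of 16 candidates needs 120 exhaustive comparisons but only around 36 with the greedy search.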

Sources

This GPT assists users by creating a detailed daily newspaper in Korean based on provided links. It follows these steps: read the content, summarize each item with detailed points, and write a report. The report format is:

(today’s date in 년 월 일) AI 소식,

Summary

(overall short summary with good detail; in the Summary section, explain the details starting with the company name, e.g. OpenAI에서는 ~~~를 발표하였습니다.)

company name, Title

링크, date

  • detailed summary1, (use concise 개조식 bullet style)
  • detailed summary2, (use concise 개조식 bullet style)
  • detailed summary N, (use concise 개조식 bullet style)

company name, Title

링크, date

  • detailed summary1, (use concise 개조식 bullet style)
  • detailed summary2, (use concise 개조식 bullet style)
  • detailed summary N, (use concise 개조식 bullet style)
###
https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024
8/30/24
Cohere

Model Card for C4AI Command R+ 08-2024
Model Summary
C4AI Command R+ 08-2024 is an open weights research release of a 104 billion parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. The tool use in this model generation enables multi-step tool use, which allows the model to combine multiple tools over multiple steps to accomplish difficult tasks. C4AI Command R+ 08-2024 is a multilingual model trained on 23 languages and evaluated in 10 languages. Command R+ 08-2024 is optimized for a variety of use cases including reasoning, summarization, and question answering.

C4AI Command R+ 08-2024 is part of a family of open weight releases from Cohere For AI and Cohere. Our smaller companion model is C4AI Command R 08-2024.

Point of Contact: Cohere For AI: cohere.for.ai
License: CC-BY-NC; also requires adhering to C4AI's Acceptable Use Policy
Model: c4ai-command-r-plus-08-2024
Model Size: 104 billion parameters
Context length: 128K

Cohere just dropped updated Command R plus & Command R - built for RAG and tool use, multilingual (23 languages), grounded generation, 128K content and much more! 🔥
> Command R Plus - 104B param, Command R - 35B param
> up-to 2x higher throughput & 2x lower latency
> Uses Grouped Query Attention (GQA)
> SFT + preference tuned model
> Massive 128K context window
> Trained in 23 languages, evaluated on 10
> Capable of code rewrites, explanations and snippets.
> Supports citation, tool execution, and structured outputs for grounded agentic use
> Model checkpoint available on the hub
> Works out of the box with transformers 🤗
This is a massive improvement from the last iteration of Command R+ (which is still one of my favourites on Hugging Chat).

###
https://qwenlm.github.io/blog/qwen2-vl/
Qwen2-VL: To See the World More Clearly
August 29, 2024 · Qwen Team

After a year’s relentless efforts, today we are thrilled to release Qwen2-VL! Qwen2-VL is the latest version of the vision language models based on Qwen2 in the Qwen model family. Compared with Qwen-VL, Qwen2-VL has the capabilities of:

SoTA understanding of images of various resolution & ratio: Qwen2-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.

Understanding videos of 20min+: Qwen2-VL can understand videos over 20 minutes for high-quality video-based question answering, dialog, content creation, etc.

Agent that can operate your mobiles, robots, etc.: with the abilities of complex reasoning and decision making, Qwen2-VL can be integrated with devices like mobile phones, robots, etc., for automatic operation based on visual environment and text instructions.

Multilingual Support: to serve global users, besides English and Chinese, Qwen2-VL now supports the understanding of texts in different languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.


We opensource Qwen2-VL-2B and Qwen2-VL-7B with Apache 2.0 license, and we release the API of Qwen2-VL-72B! The open-source models are integrated into Hugging Face Transformers, vLLM, and other third-party frameworks. Hope you enjoy!

Performance
We evaluate our model’s visual capabilities across six key dimensions: complex college-level problem-solving, mathematical abilities, document and table comprehension, multilingual text-image understanding, general scenario question-answering, video comprehension, and agent-based interactions. Overall, our 72B model showcases top-tier performance across most metrics, often surpassing even closed-source models like GPT-4o and Claude 3.5-Sonnet. Notably, it demonstrates a significant edge in document understanding.


At the 7B scale, we’ve managed to retain support for image, multi-image, and video inputs, delivering competitive performance in a more cost-effective model size. Specifically, our model excels in document understanding tasks such as DocVQA and in multilingual text understanding from images, as assessed by MTVQA, establishing state-of-the-art performance.


Additionally, we’re excited to introduce a smaller 2B model, optimized for potential mobile deployment. Despite its compact size, this model boasts strong performance in image, video, and multilingual comprehension. It particularly shines in video-related tasks, document understanding, and general scenario question-answering when compared to other models of similar scale.


Model Capabilities
1. Enhanced Recognition Capabilities
Qwen2-VL now boasts improved object recognition, extending beyond plants and landmarks to comprehend complex relationships between multiple objects in a scene. We’ve also significantly boosted the model’s ability to recognize handwritten text and multiple languages within images, making it more accessible to users worldwide.
2. Visual Reasoning: Solving Real-World Problems
In this iteration, we have significantly enhanced Qwen2-VL’s mathematical and coding proficiencies. The model is not only capable of solving problems by analyzing pictures but can also interpret and solve complex mathematical problems through chart analysis. Extremely aspect-ratio-distorted images can also be correctly interpreted. Additionally, we have reinforced the model’s capability to extract information from real-world images and charts and improved its instruction-following skills. This fusion of visual perception and logical reasoning empowers the model to tackle practical issues, bridging the gap between abstract concepts and tangible solutions.

3. Video Understanding and Live Chat
Beyond static images, Qwen2-VL extends its prowess to video content analysis. It can summarize video content, answer questions related to it, and maintain a continuous flow of conversation in real-time, offering live chat support. This functionality allows it to act as a personal assistant, helping users by providing insights and information drawn directly from video content.
4. Visual Agent Capabilities: Function Calling and Visual Interactions.
Qwen2-VL demonstrates strong potential as a visual agent, facilitating interactions similar to human perceptions of the world.

The model facilitates Function Calling, enabling it to harness external tools for real-time data retrieval – be it flight statuses, weather forecasts, or package tracking – by deciphering visual cues. This integration of visual interpretation with functional execution elevates its utility, making it a powerful tool for information management and decision-making.

Visual Interactions represent a significant stride towards mimicking human perception. By allowing the model to engage with visual stimuli akin to human senses, we’re pushing the boundaries of AI’s ability to perceive and respond to its environment. This capability paves the way for more intuitive and immersive interactions, where Qwen2-VL acts not just as an observer, but an active participant in our visual experiences.
Limitations: the model is unable to extract audio from videos, and its knowledge is only up to date as of June 2023. Additionally, the model cannot guarantee complete accuracy when processing complex instructions or scenarios, and it is relatively weak in tasks involving counting, character recognition, and 3D spatial awareness.

Model Architecture
Overall, we’ve continued with the Qwen-VL architecture, which leverages a Vision Transformer (ViT) model and Qwen2 language models. For all these variants, we utilized a ViT with approximately 600M parameters, designed to handle both image and video inputs seamlessly. To further enhance the model’s ability to effectively perceive and comprehend visual information in videos, we introduced several key upgrades:

A key architectural improvement in Qwen2-VL is the implementation of Naive Dynamic Resolution support. Unlike its predecessor, Qwen2-VL can handle arbitrary image resolutions, mapping them into a dynamic number of visual tokens, thereby ensuring consistency between the model input and the inherent information in images. This approach more closely mimics human visual perception, allowing the model to process images of any clarity or size.

Another key architectural enhancement is the innovation of Multimodal Rotary Position Embedding (M-ROPE). By deconstructing the original rotary embedding into three parts representing temporal and spatial (height and width) information, M-ROPE enables the LLM to concurrently capture and integrate 1D textual, 2D visual, and 3D video positional information.

License
Both the opensource Qwen2-VL-2B and Qwen2-VL-7B are under Apache 2.0.

Qwen 2VL 7B & 2B are here - Apache 2.0 licensed smol Vision Language Models competitive with GPT 4o mini - w/ video understanding, function calling and more! 🔥
> 72B (to be released later) beats 3.5 Sonnet & GPT 4o
> Can understand up to 20 min of video
> Handles arbitrary image resolutions
> Multimodal RoPE to capture 1D, 2D & 3D information
> Enhanced Recognition capabilities - can understand complex relationships b/w objects
> Better Visual Reasoning & video understanding w/ live chat
> Function calling for tools + data access
> Integrated with Transformers! 🤗
> Model checkpoints on the Hub
Kudos to the Alibaba Qwen group; I'm a huge fan of the Qwen series, especially their multilingual capabilities! Looking forward to the 72B ⚡

###
https://huggingface.co/Salesforce/xLAM-7b-r
8/29/24
Salesforce

Let's go.. Salesforce released Large Action Models xLAM - 7B, 8x7B, 8x22B, up to 64K context length primed for AI agents use-cases! 🔥
LAMs are designed to enhance decision-making and translate user intentions into executable actions.
Integrated with Transformers 🤗
Welcome to the xLAM model family! Large Action Models (LAMs) are advanced large language models designed to enhance decision-making and translate user intentions into executable actions that interact with the world. LAMs autonomously plan and execute tasks to achieve specific goals, serving as the brains of AI agents. They have the potential to automate workflow processes across various domains, making them invaluable for a wide range of applications. The model release is exclusively for research purposes. A new and enhanced version of xLAM will soon be available exclusively to customers on our Platform.


###
https://huggingface.co/nvidia/NV-Embed-v2
NVIDIA
9/1/24
We have reclaimed #1 on the MTEB Leaderboard 🏆 Our NV-Embed-v2, has achieved a record-breaking score of 72.31 across 56 text embedding/retrieval tasks, reclaiming the top spot on the Massive Text Embedding Benchmark (MTEB) leaderboard. It also holds the No. 1 in the retrieval sub-category (15 tasks) in the leaderboard, which is essential to the development of RAG technology.
We present NV-Embed-v2, a generalist embedding model that ranks No. 1 on the Massive Text Embedding Benchmark (MTEB benchmark)(as of Aug 30, 2024) with a score of 72.31 across 56 text embedding tasks. It also holds the No. 1 in the retrieval sub-category (a score of 62.65 across 15 tasks) in the leaderboard, which is essential to the development of RAG technology.

NV-Embed-v2 presents several new designs, including having the LLM attend to latent vectors for better pooled embedding output, and demonstrating a two-staged instruction tuning method to enhance the accuracy of both retrieval and non-retrieval tasks. Additionally, NV-Embed-v2 incorporates a novel hard-negative mining method that takes into account the positive relevance score for better false-negative removal.

For more technical details, refer to our paper: NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models.

Model Details
Base Decoder-only LLM: Mistral-7B-v0.1
Pooling Type: Latent-Attention
Embedding Dimension: 4096

###
https://huggingface.co/papers/2408.02442
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models
Published on Aug 5
Authors: Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, Yun-Nung Chen
Abstract
Structured generation, the process of producing content in standardized formats like JSON and XML, is widely utilized in real-world applications to extract key output information from large language models (LLMs). This study investigates whether such constraints on generation space impact LLMs' abilities, including reasoning and domain knowledge comprehension. Specifically, we evaluate LLMs' performance when restricted to adhere to structured formats versus generating free-form responses across various common tasks. Surprisingly, we observe a significant decline in LLMs' reasoning abilities under format restrictions. Furthermore, we find that stricter format constraints generally lead to greater performance degradation in reasoning tasks.
Structured Prompting is a key requirement for building real-world LLM applications or agents, but does it harm the performance and ability to reason? 🤔 ”Let Me Speak Freely” studies the impact of structured formats (JSON, XML, YAML) versus generating free-form responses across various common tasks. 👀
Methods: Constrained Decoding (JSON-Mode), Format-Restricting Instructions (FRI, ”respond in JSON”), NL-to-Format (2-step: first NL answer → convert to JSON)
Datasets: GSM8K, Last Letter, Shuffled Objects, DDXPlus, Sports, Task280, and MultiFin
Models: Gemini 1.5 Flash, Claude 3.5 Haiku, GPT-3.5-Turbo, Llama 3 8B or Gemma 2 9B
Insights
🧠 Reasoning abilities can decline under format restrictions
🚀 Gemini 1.5 Flash achieved the highest consistency across formats
📄 Gemini, Llama 3, Gemma 2 work best with JSON format
📉 Claude 3 Haiku showed a big performance drop for JSON but not for XML
⚠️ Adding concrete schema constraints can decrease performance
🔍 Classification: JSON-Mode performs equally, if not better
📊 Classification: JSON-Mode performs better than FRI (instructed) for open models
⬇️ Reasoning: JSON-Mode performs worse than other methods
💪 Reasoning: NL-to-Format has the strongest results after NL.

###
https://huggingface.co/papers/2403.16950
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Published on Mar 26
Authors: Yinhong Liu, Han Zhou, Zhijiang Guo, Ehsan Shareghi, Ivan Vulić, Anna Korhonen, Nigel Collier
Abstract
Large Language Models (LLMs) have demonstrated promising capabilities as automatic evaluators in assessing the quality of generated natural language. However, LLMs still exhibit biases in evaluation and often struggle to generate coherent evaluations that align with human assessments. In this work, we first conduct a systematic study of the misalignment between LLM evaluators and human judgement, revealing that existing calibration methods aimed at mitigating biases are insufficient for effectively aligning LLM evaluators. Inspired by the use of preference data in RLHF, we formulate the evaluation as a ranking problem and introduce Pairwise-preference Search (PairS), an uncertainty-guided search method that employs LLMs to conduct pairwise comparisons and efficiently ranks candidate texts. PairS achieves state-of-the-art performance on representative evaluation tasks and demonstrates significant improvements over direct scoring. Furthermore, we provide insights into the role of pairwise preference in quantifying the transitivity of LLMs and demonstrate how PairS benefits from calibration.
How can we improve the reliability of LLM Evaluators? “The Role of Pairwise Preference in Large Language Model Evaluators” compares existing evaluation approaches and shows how to perform robustness evaluations using Pairwise-preference Search (PAIRS)! 👀
PAIRS compares multiple preference pairs of generated texts and then ranks candidates globally, like a tournament or tree, to find the best one. It improves the robustness of evaluation and is also more efficient than win-loss or ELO rating systems. “PAIRS-greedy typically requires only about 30% of the comparisons to achieve performance similar to ELO rating.”
⚖️ PAIRS trades compute/cost for more precision and robustness
🧮 Example: 16 candidates per data point lead to 16 * 15 / 2 = 120 comparisons, or if greedy is used, then 36.
🚀 PAIRS outperforms G-Eval, Win-loss rate, and ELO rating on Spearman correlations
🧑🏻‍💻 Code and examples available
📝 Used simple generic prompts for comparison.

###
https://www.gartner.com/en/newsroom/press-releases/2024-08-27-gartner-hype-cycle-reveals-top-technologies-that-will-transform-sales-in-the-next-decade
STAMFORD, Conn., August 28, 2024
Gartner

Gartner Hype Cycle Reveals Top Technologies That Will Transform Sales In the Next Decade
Emotion AI, Digital Twin of a Customer and Machine Sellers Will Have the Biggest Impact on Sales Organizations
Emotion AI, machine sellers and digital twin of a customer will have a transformative impact on the sales function in the next decade, according to Gartner, Inc. However, sales leaders must closely navigate the hype to evaluate when these technologies will be appropriate for their organization to implement.

The Gartner Hype Cycle for Revenue and Sales Technology, 2024 distills key insights that Gartner profiles each year into a succinct set of “must-know” emerging technologies. These technologies have potential to deliver transformational benefits over the next two to 10 years (see Figure 1).

“The common theme of these three technologies is their ability to predict, interpret and serve buyers’ needs and behaviors and to streamline and automate sales fulfillment, releasing sellers to focus on developing high-value client relationships,” said Guy Wood, Senior Director Analyst in the Gartner Sales practice. “In order to gain a competitive advantage, sales operations leaders need to be looking on the horizon for technologies not currently in the mainstream.”

Figure 1: Hype Cycle for Revenue and Sales Technology, 2024

Source: (Gartner, August 2024)

Emotion AI

Emotion AI, which is at the Peak of Inflated Expectations, uses AI and software techniques to analyze the emotional state of a user via computer vision, audio/voice input, sensors and/or software logic. Emotion AI turns human behavioral attributes into data. By enabling sales teams to utilize data to actively learn from and empathize with the customer, emotion AI is poised to dramatically change the sales function.

“Emotion AI has already been widely adopted in contact centers, but the sales function has yet to fully realize the technology’s potential,” said Wood. “However, CSOs must navigate privacy concerns and bias, which may be a barrier to successful adoption. For example, privacy and ethics challenges surround psychological profiling, especially when applied to consumers, recruitment prospects or protected individuals like minors.”

Digital Twin of a Customer

A digital twin of a customer (DToC) is a dynamic virtual mirror representation of a customer that organizations can use to simulate, emulate and anticipate behavior. DToCs, currently at the Innovation Trigger, help organizations better understand their customers and provide a personalized, empathetic service to customers, many of whose buying habits repeatedly change.

“DToCs can transform the way organizations sell products or services and provide customers with better experiences, which will result in increased revenue and lasting customer relationships,” said Wood. “DToC can be an engine of transformation and disruption. Organizations need competency in machine learning algorithms and staff with data science skills to build or manage DToCs.”

Machine Sellers

Machine sellers, at the early stage of the Innovation Trigger, are nonhuman agents that automate end-to-end selling actions on behalf of human sellers, or a sales organization, to sell products and services in exchange for payment. Currently, machine sellers can be used to facilitate simple and transactional sales.

“Sales organizations deploying machine sellers will gain a competitive advantage by satisfying buyer preferences for seamless purchases and ‘locking-in’ recurring revenue. Organizations that do not adopt machine sellers will risk wasting resources, decreasing efficiency and missing revenue goals,” said Wood. “However, the impact of machine sellers will not be evenly distributed; it will vary by vertical industry, geography and business model.”

Gartner clients can read more in “Hype Cycle for Revenue and Sales Technology, 2024”

About Gartner for Sales Leaders

Gartner for Sales Leaders provides heads of sales and their teams with the insights, advice and tools they need to address mission-critical priorities amid mounting pressures to drive growth through new and existing customers. With extensive qualitative and quantitative research, Gartner for Sales Leaders helps sales teams combat commoditization and price-based purchasing, develop critical manager and seller skills, elevate the value of sales interactions, unlock existing growth potential, and optimize sales force enablement. Follow news and updates from the Gartner Sales practice on X and LinkedIn using #GartnerSales. Members of the media can find additional information and insights in the Gartner Sales Newsroom.
Gartner Announces 2024 Hype Cycle for Emerging Technologies... “The Rise of Autonomous AI Is Accelerating”

In its Hype Cycle for Emerging Technologies, 2024 report, Gartner announced 25 innovative technologies to watch. These technologies fall into four key trends: ▲autonomous AI ▲developer productivity ▲total experience ▲human-centric security and privacy programs.

Arun Chandrasekaran, VP Analyst at Gartner, predicted: “Business focus is shifting from the excitement around foundation models to use cases that deliver ROI. Generative AI has moved past the Peak of Inflated Expectations and is accelerating the rise of autonomous AI. Current AI models lack agentic capability. AI labs are racing to release AI agents that can interact to achieve goals, but development will proceed incrementally.”

Chandrasekaran added: “With AI continuing to draw attention, CIOs and IT leaders should examine emerging technologies with transformative potential for development, security, and customer and employee experience, and should strategically align how they govern and adopt unproven technologies with their organization's capabilities.”

The Hype Cycle for Emerging Technologies stands out among Gartner's hype cycles because it distills key insights from the more than 2,000 technologies and applied frameworks Gartner profiles each year into a concise set of must-know emerging technologies. These technologies are judged to have the potential to deliver transformational benefits over the next two to ten years.

The four trends in 2024's emerging technologies are as follows.

Autonomous AI:

Rapid advances in AI are giving rise to autonomous AI systems that can operate and improve themselves with minimal human oversight and make effective decisions even in complex environments. Such advanced AI systems, able to perform any task a human can, are slowly moving from science fiction into reality.

The technologies include ▲multi-agent systems ▲large action models ▲machine customers ▲humanoid working robots ▲autonomous agents ▲reinforcement learning.

Improving developer productivity:

Developer productivity means more than writing code quickly. It is shaped by effective communication and collaboration, focus, and satisfaction. Emerging technologies that improve developer productivity include ▲AI-augmented software engineering ▲cloud native ▲GitOps ▲internal developer portals ▲prompt engineering ▲WebAssembly.

Chandrasekaran said: “Technology is transforming how developers design and deliver software, driving higher productivity than ever before. By improving developer satisfaction, collaboration, and flow, organizations can maximize the benefits while rapidly delivering high-quality products.”

Empowerment through total experience:

Total experience is a strategy that connects customer experience, employee experience, multiexperience, and user experience to create superior shared experiences. It uses technology to address critical interactions and empower both customers and employees, with the goal of improving trust, satisfaction, loyalty, and advocacy. Technologies evaluated include ▲digital twins ▲spatial computing ▲superapps ▲6G.

Delivering human-centric security and privacy:

Enterprises can become more resilient by using security and privacy technologies that foster a culture of mutual trust and a shared awareness of risk across teams. Emerging technologies supporting human-centric security and privacy include ▲AI TRiSM ▲cybersecurity mesh architecture ▲digital immune systems ▲disinformation security ▲federated machine learning ▲homomorphic encryption.

Chandrasekaran said: “Security practices often rely on the assumption that they will operate in a way that is sufficiently safe and reliable. But when forced to choose between security and business delivery, many organizations prioritize business delivery and work around overly strict security measures. Human-centric security and privacy weaves a tight security and privacy fabric into the organization's digital design.”