Summary

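Today's news covers the TimeGPT-1, LANISTR, and Mistral-7B-Instruct models. TimeGPT-1 is an innovative model for time series forecasting and anomaly detection, and LANISTR is a framework that enables multimodal learning by fusing structured and unstructured data. Mistral-7B-Instruct-v0.3 is a large language model with an extended vocabulary and support for function calling.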

Nixtla: TimeGPT-1

Nixtla: TimeGPT-1 - May 23, 2024

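  • TimeGPT-1 is a production-ready generative pretrained transformer that forecasts time series and detects anomalies across a variety of domains.
  • Lets users produce accurate forecasts in domains such as electricity, finance, and IoT with just a few lines of code.
  • Provides comprehensive documentation, including installation and quick start guides and API usage.
  • Zero-shot inference lets TimeGPT be applied immediately to a variety of time series without separate training.
  • Supports features such as custom loss functions, cross-validation, and prediction intervals.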

Google: LANISTR - Multimodal learning from structured and unstructured data

Google: LANISTR - May 22, 2024

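  • LANISTR is a new framework that enables multimodal learning by fusing language, image, and structured data.
  • Improves prediction and classification accuracy by fusing structured and unstructured data.
  • Demonstrates strong performance on the MIMIC-IV medical dataset and the Amazon Review dataset.
  • Uses a masking-based training strategy that is robust to data missing from several modalities.
  • Innovative architecture comprising modality-specific encoders and a multimodal encoder-decoder module.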

Hugging Face: Mistral-7B-Instruct-v0.3

Hugging Face: Mistral-7B-Instruct-v0.3 - May 23, 2024

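  • Mistral-7B-Instruct-v0.3 is the instruction-tuned version of Mistral-7B-v0.3, with an extended vocabulary and support for function calling.
  • Provides model installation and download steps, plus examples for chat, instruction following, and function calling.
  • Text generation is available through Hugging Face's transformers library.
  • The model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance; it has no moderation mechanisms, and the authors look forward to working with the community on suitable guardrails.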
###
https://github.com/Nixtla/nixtla
# Nixtla   [![Tweet](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/intent/tweet?text=Statistical%20Forecasting%20Algorithms%20by%20Nixtla%20&url=https://github.com/Nixtla/neuralforecast&via=nixtlainc&hashtags=StatisticalModels,TimeSeries,Forecasting)  [![Slack](https://img.shields.io/badge/Slack-4A154B?&logo=slack&logoColor=white)](https://join.slack.com/t/nixtlacommunity/shared_invite/zt-1pmhan9j5-F54XR20edHk0UtYAPcW4KQ)

<div align="center">
<img src="https://raw.githubusercontent.com/Nixtla/neuralforecast/main/nbs/imgs_indx/logo_new.png">
<h1 align="center">TimeGPT-1 </h1>
<h3 align="center">The first foundation model for forecasting and anomaly detection</h3>

[![CI](https://github.com/Nixtla/nixtla/actions/workflows/ci.yaml/badge.svg?branch=main)](https://github.com/Nixtla/nixtla/actions/workflows/ci.yaml)
[![PyPi](https://img.shields.io/pypi/v/nixtla?color=blue)](https://pypi.org/project/nixtla/)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/Nixtla/nixtla/blob/main/LICENSE)
[![docs](https://img.shields.io/website-up-down-green-red/http/docs.nixtla.io/.svg?label=docs)](https://docs.nixtla.io)
[![Downloads](https://pepy.tech/badge/nixtla)](https://pepy.tech/project/nixtla)
[![Downloads](https://pepy.tech/badge/nixtla/month)](https://pepy.tech/project/nixtla)
[![Downloads](https://pepy.tech/badge/nixtla/week)](https://pepy.tech/project/nixtla)
[![fern shield](https://img.shields.io/badge/%F0%9F%8C%BF-SDK%20generated%20by%20Fern-brightgreen)](https://buildwithfern.com/?utm_source=nixtla/nixtla/readme)

**TimeGPT** is a production-ready, generative pretrained transformer for time series. It can produce accurate forecasts across domains such as retail, electricity, finance, and IoT with just a few lines of code 🚀. </div>

## 📖 Table of Contents
- [Quick Start](#-quick-start)
- [Installation](#install-nixtlas-sdk)
- [Forecasting with TimeGPT](#forecast-using-timegpt-in-3-easy-steps)
- [Anomaly Detection](#anomaly-detection-using-timegpt-in-3-easy-steps)
- [Zero-shot Results](#️-zero-shot-results)
- [How to Cite](#-how-to-cite)
- [Features and Mentions](#-features-and-mentions)
- [License](#-license)
- [Get in Touch](#-get-in-touch)


## 🚀 Quick Start

https://github.com/Nixtla/nixtla/assets/4086186/163ad9e6-7a16-44e1-b2e9-dab8a0b7b6b6



### Install nixtla's SDK
python
pip install "nixtla>=0.5.1"


### Import libraries and load data
python
import pandas as pd
from nixtla import NixtlaClient

### Forecast using TimeGPT in 3 easy steps
python
# Get your API Key at dashboard.nixtla.io

# 1. Instantiate the NixtlaClient
nixtla_client = NixtlaClient(api_key = 'YOUR API KEY HERE')

# 2. Read historic electricity demand data
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv')

# 3. Forecast the next 24 hours
fcst_df = nixtla_client.forecast(df, h=24, level=[80, 90])

# 4. Plot your results (optional)
# Reuses fcst_df from step 3 and the same default column names as the forecast call
nixtla_client.plot(df, fcst_df, level=[80, 90])


![Forecast Results](./nbs/img/forecast_readme.png)
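
The `level=[80, 90]` argument in the forecast call requests 80% and 90% prediction intervals. As a rough illustration of what such intervals mean, the sketch below builds empirical intervals around a one-step-ahead naive forecast from past residual quantiles. This is not how TimeGPT computes its intervals; `naive_prediction_interval` and the sample `demand` values are hypothetical and only show that a higher level yields a band at least as wide.

```python
import statistics


def naive_prediction_interval(history, level):
    """Return (lower, point, upper) for a one-step-ahead naive forecast.

    `level` is the nominal coverage in whole percent, for example 80 or 90.
    The interval is the point forecast plus empirical residual quantiles.
    """
    if not isinstance(level, int) or not 1 <= level <= 99:
        raise ValueError("level must be an integer between 1 and 99")
    if len(history) < 3:
        raise ValueError("need at least three observations")

    point = history[-1]  # naive forecast: repeat the last observed value
    residuals = [b - a for a, b in zip(history, history[1:])]

    # n=200 gives cut points every 0.5%, so cuts[k - 1] is the (k / 2)% quantile.
    cuts = statistics.quantiles(residuals, n=200, method="inclusive")
    lower_pct = (100 - level) / 2
    upper_pct = 100 - lower_pct
    lower = point + cuts[round(lower_pct * 2) - 1]
    upper = point + cuts[round(upper_pct * 2) - 1]
    return lower, point, upper


# Hypothetical demand values, used only for illustration.
demand = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]
interval_80 = naive_prediction_interval(demand, 80)
interval_90 = naive_prediction_interval(demand, 90)
print(interval_80, interval_90)
```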

### Anomaly detection using TimeGPT in 3 easy steps
python
# Get your API Key at dashboard.nixtla.io

# 1. Instantiate the NixtlaClient
nixtla_client = NixtlaClient(api_key = 'YOUR API KEY HERE')

# 2. Read data: Wikipedia visits of an NFL star
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/peyton_manning.csv')


# 3. Detect Anomalies
anomalies_df = nixtla_client.detect_anomalies(df, time_col='timestamp', target_col='value', freq='D')

# 4. Plot your results (optional)
nixtla_client.plot(df, anomalies_df, time_col='timestamp', target_col='value')

![AnomalyDetection](nbs/img/anomaly.png)
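
For intuition about what an anomaly flag represents, here is a minimal offline sketch that marks points far from the series median using a robust z-score (median absolute deviation). It is a stand-in technique chosen for illustration, not the method behind `detect_anomalies`; the function name, the 3.5 threshold, and the sample `visits` values are assumptions.

```python
import statistics


def flag_anomalies(values, threshold=3.5):
    """Return one boolean per value, True where the robust z-score exceeds threshold."""
    median = statistics.median(values)
    abs_dev = [abs(v - median) for v in values]
    mad = statistics.median(abs_dev)
    if mad == 0:
        # No spread around the median: nothing can be scored, so flag nothing.
        return [False] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [0.6745 * dev / mad > threshold for dev in abs_dev]


# Hypothetical daily page visits with one obvious spike.
visits = [10, 11, 9, 10, 12, 10, 50, 11, 9, 10]
flags = flag_anomalies(visits)
print([i for i, is_anomaly in enumerate(flags) if is_anomaly])
```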

## 🤓 API support for other languages
Explore our [API Reference](https://docs.nixtla.io) to discover how to leverage TimeGPT across various programming languages including JavaScript, Go, and more.

## 🔥 Features and Capabilities

- **Zero-shot Inference**: TimeGPT can generate forecasts and detect anomalies straight out of the box, requiring no prior training data. This allows for immediate deployment and quick insights from any time series data.

- **Fine-tuning**: Enhance TimeGPT's capabilities by fine-tuning the model on your specific datasets, enabling the model to adapt to the nuances of your unique time series data and improving performance on tailored tasks.

- **API Access**: Integrate TimeGPT seamlessly into your applications via our robust API. Upcoming support for Azure Studio will provide even more flexible integration options. Alternatively, deploy TimeGPT on your own infrastructure to maintain full control over your data and workflows.

- **Add Exogenous Variables**: Incorporate additional variables that might influence your predictions to enhance forecast accuracy. (E.g. Special Dates, events or prices)

- **Multiple Series Forecasting**: Simultaneously forecast multiple time series data, optimizing workflows and resources.

- **Custom Loss Function**: Tailor the fine-tuning process with a custom loss function to meet specific performance metrics.

- **Cross Validation**: Implement out of the box cross-validation techniques to ensure model robustness and generalizability.

- **Prediction Intervals**: Provide intervals in your predictions to quantify uncertainty effectively.

- **Irregular Timestamps**: Handle data with irregular timestamps, accommodating non-uniform interval series without preprocessing.
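
To make the cross-validation feature above more concrete, the following sketch shows rolling-origin (time series) cross-validation in plain Python: each window trains on everything before a cutoff and scores the next `horizon` points. It does not call the Nixtla SDK; `rolling_origin_splits`, `cross_validate`, and `last_value_forecaster` are hypothetical helpers, and the baseline forecaster is a placeholder for any model.

```python
def rolling_origin_splits(n_obs, horizon, n_windows):
    """Return (train_indices, test_indices) pairs, oldest window first."""
    splits = []
    for w in range(n_windows):
        test_end = n_obs - (n_windows - 1 - w) * horizon
        test_start = test_end - horizon
        if test_start <= 0:
            raise ValueError("not enough observations for the requested windows")
        splits.append((range(0, test_start), range(test_start, test_end)))
    return splits


def last_value_forecaster(train, horizon):
    """Baseline model: repeat the last training value for every future step."""
    return [train[-1]] * horizon


def cross_validate(series, forecaster, horizon, n_windows):
    """Return the mean absolute error of the forecaster in each window."""
    errors = []
    for train_idx, test_idx in rolling_origin_splits(len(series), horizon, n_windows):
        train = [series[i] for i in train_idx]
        actual = [series[i] for i in test_idx]
        predicted = forecaster(train, horizon)
        errors.append(sum(abs(a - p) for a, p in zip(actual, predicted)) / horizon)
    return errors


series = list(range(20))  # a simple upward trend, for illustration only
window_errors = cross_validate(series, last_value_forecaster, horizon=3, n_windows=2)
print(window_errors)
```

Splitting by time rather than at random keeps every test point strictly after its training data, which is the property that makes the error estimate meaningful for forecasting.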

## 📚 Documentation with examples and use cases

Dive into our [comprehensive documentation](https://docs.nixtla.io/docs/getting-started-timegpt_quickstart) to discover examples and practical use cases for TimeGPT. Our documentation covers a wide range of topics, including:

- **Getting Started**: Begin with our user-friendly [Quickstart Guide](https://docs.nixtla.io/docs/getting-started-timegpt_quickstart) and learn how to [set up your API key](https://docs.nixtla.io/docs/getting-started-setting_up_your_api_key) effortlessly.

- **Advanced Techniques**: Master advanced forecasting methods and learn how to enhance model accuracy with our tutorials on [anomaly detection](https://docs.nixtla.io/docs/tutorials-anomaly_detection), fine-tuning models using specific loss functions, and scaling computations across distributed frameworks such as [Spark, Dask, and Ray](https://docs.nixtla.io/docs/tutorials-computing_at_scale).

- **Specialized Topics**: Explore specialized topics like [handling exogenous variables](https://docs.nixtla.io/docs/tutorials-holidays_and_special_dates), model validation through [cross-validation](https://docs.nixtla.io/docs/tutorials-cross_validation), and strategies for [forecasting under uncertainty](https://docs.nixtla.io/docs/tutorials-uncertainty_quantification).

- **Real-World Applications**: Uncover how TimeGPT is applied in real-world scenarios through case studies on [forecasting web traffic](https://docs.nixtla.io/docs/use-cases-forecasting_web_traffic) and [predicting Bitcoin prices](https://docs.nixtla.io/docs/use-cases-bitcoin_price_prediction).



## 🗞️ TimeGPT-1: Revolutionizing Forecasting and Anomaly Detection

Time series data is pivotal across various sectors, including finance, healthcare, meteorology, and social sciences. Whether it's monitoring ocean tides or tracking the Dow Jones's daily closing values, time series data is crucial for forecasting and decision-making.

Traditional analysis methods such as ARIMA, ETS, MSTL, Theta, CES, machine learning models like XGBoost and LightGBM, and deep learning approaches have been standard tools for analysts. However, TimeGPT introduces a paradigm shift with its standout performance, efficiency, and simplicity. Thanks to its zero-shot inference capability, TimeGPT streamlines the analytical process, making it accessible even to users with minimal coding experience.

TimeGPT is user-friendly and low-code, enabling users to upload their time series data and either generate forecasts or detect anomalies with just a single line of code. As the only foundation model for time series analysis out of the box, TimeGPT can be integrated via our public APIs, through Azure Studio (coming soon), or deployed on your own infrastructure.

## ⚙️ TimeGPT's Architecture
Self-attention, the revolutionary concept introduced by the paper “Attention Is All You Need”, is the basis of this foundation model. TimeGPT is not based on any existing large language model (LLM). It is trained independently, as a large transformer model, on a vast time series dataset and is designed to minimize forecasting error.

The architecture consists of an encoder-decoder structure with multiple layers, each with residual connections and layer normalization. Finally, a linear layer maps the decoder’s output to the forecasting window dimension. The general intuition is that attention-based mechanisms are able to capture the diversity of past events and correctly extrapolate potential future distributions.
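
As a small illustration of the self-attention idea, the sketch below computes scaled dot-product self-attention over a short sequence in plain Python. It is a simplification under stated assumptions: queries, keys, and values are the raw inputs (no learned projections, no multiple heads), and it says nothing about TimeGPT's actual, closed-source implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    x is a list of T vectors of equal width d. Returns (output, weights),
    where weights[i][j] is how much position i attends to position j.
    """
    d = len(x[0])
    scores = [
        [sum(q_i * k_i for q_i, k_i in zip(q, k)) / math.sqrt(d) for k in x]
        for q in x
    ]
    weights = [softmax(row) for row in scores]
    output = [
        [sum(w * v[j] for w, v in zip(w_row, x)) for j in range(d)]
        for w_row in weights
    ]
    return output, weights


# Three time steps with two features each (illustrative values).
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attention_weights = self_attention(sequence)
print(attention_weights[0])
```

Each output position is a weighted average of every position in the sequence, which is how attention lets a forecast draw on any past event rather than only the most recent ones.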

![Architecture](nbs/img/forecast.png)


TimeGPT was trained on, to our knowledge, the largest collection of publicly available time series, collectively encompassing over 100 billion data points. This training set incorporates time series from a broad array of domains, including finance, economics, demographics, healthcare, weather, IoT sensor data, energy, web traffic, sales, transport, and banking. Due to this diverse set of domains, the training dataset contains time series with a wide range of characteristics.


## ⚡️ Zero-shot Results
### Accuracy:
TimeGPT's zero-shot inference capabilities have been tested on more than 300K unique series, meaning the model was used without additional fine-tuning on the test dataset. TimeGPT outperforms a comprehensive range of well-established statistical and cutting-edge deep learning models, consistently ranking among the top three performers across various frequencies.

### Ease of use:
TimeGPT also excels by offering simple and rapid predictions using a pre-trained model. This stands in stark contrast to other models that typically require an extensive training and prediction pipeline.

![Results](nbs/img/results.jpg)

### Efficiency and Speed:
For zero-shot inference, our internal tests recorded an average GPU inference speed of 0.6 milliseconds per series for TimeGPT, which nearly mirrors that of the simple Seasonal Naive.
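
For readers unfamiliar with the Seasonal Naive baseline mentioned above, a minimal sketch follows: the forecast simply repeats the last observed season. The function name and the sample values are illustrative assumptions.

```python
def seasonal_naive(history, horizon, season_length):
    """Forecast by repeating the most recent full season of observations."""
    if season_length <= 0 or len(history) < season_length:
        raise ValueError("history must contain at least one full season")
    last_season = history[-season_length:]
    return [last_season[step % season_length] for step in range(horizon)]


# Two seasons of length 4; the forecast cycles through the latest one.
observations = [1, 2, 3, 4, 1, 2, 3, 5]
forecast = seasonal_naive(observations, horizon=6, season_length=4)
print(forecast)
```

Because it needs no training at all, this baseline is a natural reference point for inference speed.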

## 📝 How to cite?

If you find TimeGPT useful for your research, please consider citing the associated [paper](https://arxiv.org/abs/2310.03589):


@misc{garza2023timegpt1,
title={TimeGPT-1},
author={Azul Garza and Max Mergenthaler-Canseco},
year={2023},
eprint={2310.03589},
archivePrefix={arXiv},
primaryClass={cs.LG}
}


## 🎉 Features and Mentions
TimeGPT has been featured in many publications and has been recognized for its innovative approach to time series forecasting. Here are some of the features and mentions:

- [TimeGPT Revolutionizing Time Series Forecasting](https://www.analyticsvidhya.com/blog/2024/02/timegpt-revolutionizing-time-series-forecasting/)
- [TimeGPT: The First Foundation Model for Time Series Forecasting](https://towardsdatascience.com/timegpt-the-first-foundation-model-for-time-series-forecasting-bf0a75e63b3a)
- [TimeGPT: Revolutionising Time Series Forecasting with Generative Models](https://medium.com/@22meera99/timegpt-revolutionising-time-series-forecasting-with-generative-models-86be6c09fa51)
- [TimeGPT on Turing Post](https://www.turingpost.com/p/timegpt)
- [TimeGPT Presentation at AWS Events](https://www.youtube.com/watch?v=5pYkT0rTCfE&ab_channel=AWSEvents)
- [TimeGPT: Machine Learning for Time Series Made Accessible - Podcast](https://podcasts.apple.com/bg/podcast/timegpt-machine-learning-for-time-series-made-accessible/id1487704458?i=1000638551991)
- [TimeGPT on The Data Exchange](https://thedataexchange.media/timegpt/)
- [How TimeGPT Transforms Predictive Analytics with AI](https://hackernoon.com/how-timegpt-transforms-predictive-analytics-with-ai)
- [TimeGPT: The First Foundation Model - AI Horizon Forecast](https://aihorizonforecast.substack.com/p/timegpt-the-first-foundation-model)


## 🔖 License
TimeGPT is closed source. However, this SDK is open source and available under the Apache 2.0 License. Feel free to contribute.


###
https://research.google/blog/lanistr-multimodal-learning-from-structured-and-unstructured-data/
LANISTR: Multimodal learning from structured and unstructured data
May 22, 2024

Sayna Ebrahimi, Research Scientist, and Yihe Dong, Software Engineer, Cloud AI Team

LANISTR is a new framework that enables multimodal learning by ingesting unstructured (image, text) and structured (time series, tabular) data, performing alignment and fusion, and ultimately generating class predictions.

Recent multimodal learning breakthroughs have predominantly focused on unstructured data, spanning vision, language, video, and audio modalities (Flamingo, PaLI, CLIP, VATT, etc.). However, learning joint representations with structured data, including tabular or time-series formats, remains relatively underexplored, despite structured data being the prevalent data type in the real world. Real-world scenarios often demand the integration of structured and unstructured data, for example, in healthcare diagnostics or retail demand forecasting. This highlights the need to learn two seemingly disparate data types together in a multimodal fashion, using a unified architecture and unique pretraining strategies that align structured and unstructured modalities.

Unlocking the potential benefits of multimodal learning with structured and unstructured data requires addressing two challenges that become increasingly prominent as the number of modalities, input size, and data heterogeneity increase. First, as the input feature dimensionality and heterogeneity increase, deep neural networks can become susceptible to overfitting and suboptimal generalization, particularly when trained on datasets of limited scale. This challenge is exacerbated when using unstructured and structured data together, such as time series data that often exhibit non-stationary behavior (fashion trends, sensory measurements, etc.), which, unlike other more independent and identically distributed (i.i.d.) modalities, makes it difficult to build well-generalisable models. Similarly, tabular data often include numerous columns (features) containing minimal information, leading to overfitting to spurious correlations. Second, problems caused by the absence of some modalities become more pronounced in multimodal data with more than two modalities (e.g., image+text+tabular+time series), where each sample may not include some modalities. To the best of our knowledge, a systematic study addressing these challenges in learning from unstructured and structured data remains absent from current literature.

To address these challenges, in “LANISTR: Multimodal Learning from Structured and Unstructured Data”, we introduce a novel framework to learn from LANguage, Image, and STRuctured data. LANISTR enables multimodal learning by ingesting unstructured (image, text) and structured (time series, tabular) data, performing alignment and fusion, and ultimately generating predictions. Using two publicly available healthcare and retail datasets, LANISTR demonstrates remarkable improvements when fine-tuned with 0.1% and 0.01% of labeled data, respectively. Notably, these improvements are observed even with a very high ratio of samples (35.7% and 99.8%, respectively) that don’t contain all modalities, underlining the robustness of LANISTR to practical missing modality challenges.

Model architecture
LANISTR’s architecture is composed of modality-specific encoders and a multimodal encoder-decoder module, which acts as the fusion mechanism. First, raw inputs are encoded with a language encoder, an image encoder, and a structured data encoder. Depending on the dataset, we can have two separate structured data encoders, one for tabular data and one for time-series data. These modality-specific encoders are all chosen to be attention-based architectures.

After the inputs are encoded, we project them using modality-specific encoders with a single layer projection head and concatenate their embeddings together before feeding them into the multimodal fusion module.
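
The projection and concatenation step can be sketched as follows. This is a toy illustration, not LANISTR's code: the encoder widths, the shared width, and the random weights are hypothetical, and concatenation is assumed to happen along the sequence axis so that each modality contributes one embedding to the fusion input.

```python
import random


def make_projection_head(in_dim, out_dim, rng):
    """Create a single linear layer (weights only, no bias) with small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(head, embedding):
    """Apply the linear layer: one dot product per output dimension."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in head]


def build_fusion_input(encoded, heads):
    """Project every modality to the shared width and stack the results as a sequence."""
    return [project(heads[name], embedding) for name, embedding in encoded.items()]


rng = random.Random(0)
SHARED_DIM = 5  # hypothetical shared embedding width
encoder_dims = {"text": 4, "image": 6, "tabular": 3}  # hypothetical encoder output widths

heads = {name: make_projection_head(dim, SHARED_DIM, rng) for name, dim in encoder_dims.items()}
encoded = {name: [rng.random() for _ in range(dim)] for name, dim in encoder_dims.items()}

fusion_input = build_fusion_input(encoded, heads)
print(len(fusion_input), len(fusion_input[0]))
```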

A common bottleneck when working with multimodal data is extracting meaningful representations that reflect cross-modal interactions between individual modalities. We leverage cross-attention, which has been predominantly used to capture cross-modal relationships, when creating a fusion encoder with six Transformer layers.

The figure below illustrates the LANISTR architecture using a toy example from a retail application. The goal is to predict the star rating a product will receive. In this example, the product is a can of dog food (image), accompanied by a user review (text), numerical and categorical specifications (tabular features), and the user's purchase history (time sequence). LANISTR integrates these different modalities to produce a star rating prediction.

LANISTR enables multimodal learning by ingesting unstructured (image, text) and structured (time series, tabular) data, performing alignment and fusion, and ultimately generating predictions.

The core of LANISTR's methodology is rooted in masking-based training applied across both unimodal and multimodal levels. LANISTR is pre-trained with two types of objectives:

Unimodal masking objectives.
We use masked language, image, time series, and tabular feature modeling as a general self-supervised learning strategy for all the unimodal encoders in LANISTR. This allows data with missing modalities to be used for the unimodal encoders: because masked inputs are fed to the encoders, a form of reconstruction or prediction task can be used for training.
Similarity-based multimodal masking loss.
Prior work on multimodal learning with vision and language, such as FLAVA, focuses on reconstructing one modality (e.g., text) or both image and text modalities from the masked multimodal inputs. In this work, we propose a novel masked multimodal learning loss that maximizes the similarities between masked and unmasked multimodal data representations. This objective resembles an idea that originated from Siamese networks, where the goal is to maximize the similarity between two augmented versions of an image. However, in our framework, the goal is to maximize the similarity between the embeddings generated by a masked and a non-masked input. As shown below, this objective encourages the model to learn cross-modal relations, such that the cosine similarity between the embeddings of a masked and a non-masked data is maximized.
Illustration of similarity-based multimodal masking objective in LANISTR. The goal is to maximize the similarity between the embeddings of a masked and a non-masked input.
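
A minimal sketch of such a similarity objective is shown below. It assumes the loss is written as one minus the cosine similarity, averaged over a batch, so that minimizing it maximizes similarity; the post does not give the exact formula, and the function names and sample embeddings are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def masked_similarity_loss(masked_embeddings, unmasked_embeddings):
    """Average of (1 - cosine similarity) over paired embeddings in a batch."""
    pairs = list(zip(masked_embeddings, unmasked_embeddings))
    return sum(1.0 - cosine_similarity(m, u) for m, u in pairs) / len(pairs)


# Hypothetical embeddings: identical pairs give zero loss, orthogonal pairs give one.
loss_identical = masked_similarity_loss([[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]])
loss_orthogonal = masked_similarity_loss([[1.0, 0.0]], [[0.0, 1.0]])
print(loss_identical, loss_orthogonal)
```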

After pre-training, we use pre-trained weights to initialize both the unimodal encoders and the multimodal encoder. A multi-layer classification module is then attached to the multimodal encoder for the downstream task. The LANISTR model comprises 300M parameters. During fine-tuning, we maintain the unimodal encoders in a frozen state while concentrating on training the multimodal encoder and the classification module. This accounts for training approximately 15% of the entire architecture. It's worth noting that LANISTR’s versatility extends to other tasks, such as regression or retrieval, by incorporating suitable heads and objective functions, provided labeled data is accessible.
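
Taking the two figures above at face value (300M parameters in total, roughly 15% of them trained during fine-tuning), the split works out as in this short calculation. Both inputs are the rounded values quoted in the post, so the results are approximate.

```python
TOTAL_PARAMS = 300_000_000   # total size quoted in the post
TRAINABLE_FRACTION = 0.15    # approximate share trained during fine-tuning

trainable_params = round(TOTAL_PARAMS * TRAINABLE_FRACTION)
frozen_params = TOTAL_PARAMS - trainable_params

print(f"trainable: about {trainable_params / 1e6:.0f}M")
print(f"frozen:    about {frozen_params / 1e6:.0f}M")
```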

Results
We compare LANISTR’s performance against various competitive baselines, including AutoGluon, ALBEF, and MedFuse, using MIMIC-IV (a widely-used medical dataset for clinical prediction tasks) and Amazon Review Data. With its novel architecture and objective functions, LANISTR achieves state-of-the-art results on several challenging tasks.

The plot below highlights the results for mortality prediction using the MIMIC-IV dataset. LANISTR achieves 87.37% in area under the receiver operating characteristic curve (AUROC) on average, significantly outperforming baseline models FLAVA and CoCa, which can only use image and text, and the MedFuse model, which only uses image and time series modalities. The late fusion baseline is a simple fusion mechanism that concatenates all three modality embeddings.

AUROC for in-hospital mortality prediction using the MIMIC-IV dataset.
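
For reference, AUROC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the labels and scores are made-up illustrative values, not results from the post.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative toy data: three of the four positive/negative pairs are ranked correctly.
example_auroc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example_auroc)
```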

For predicting product ratings using the Amazon Review dataset, we pre-train methods that can use unlabeled data (LANISTR and ALBEF) from the office products category and fine-tune them using the beauty products category. LANISTR outperforms competitive baselines by a significant margin, achieving an average of 76.27% accuracy. Notably, even without pre-training, LANISTR's unique fusion mechanism surpasses both late fusion and AutoGluon, neither of which supports pre-training. For ALBEF, we explored a "Tab2Txt" approach that incorporates tabular features as additional text input, while the original ALBEF baseline only utilized image and text modalities. We demonstrate that both are significantly outperformed by LANISTR. Our results confirm the importance of learning structured and unstructured data using unlabeled and labeled data together.

Ablation studies and the particular challenges of these tasks illustrate LANISTR’s ability to actively ingest all modalities as they are, take advantage of large quantities of unlabeled data during unsupervised pre-training, and handle missing modalities seamlessly.

Results using the Amazon Review dataset for star rating prediction tasks on the beauty products category.

Conclusion
LANISTR is a novel framework for language, image, and structured data (tabular and time series). With its unimodal and novel similarity-based multimodal masking strategy, LANISTR tackles challenges including missing modalities and limited labeled data, and achieves state-of-the-art performance across diverse domains.

###
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
---
license: apache-2.0
---
# Model Card for Mistral-7B-Instruct-v0.3

The Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.3.

Mistral-7B-v0.3 has the following changes compared to [Mistral-7B-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/edit/main/README.md)
- Extended vocabulary to 32768
- Supports v3 Tokenizer
- Supports function calling

## Installation

It is recommended to use `mistralai/Mistral-7B-Instruct-v0.3` with [mistral-inference](https://github.com/mistralai/mistral-inference). For HF transformers code snippets, please keep scrolling.


pip install mistral_inference


## Download

py
from huggingface_hub import snapshot_download
from pathlib import Path
mistral_models_path = Path.home().joinpath('mistral_models', '7B-Instruct-v0.3')
mistral_models_path.mkdir(parents=True, exist_ok=True)
snapshot_download(repo_id="mistralai/Mistral-7B-Instruct-v0.3", allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"], local_dir=mistral_models_path)


### Chat

After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment. You can chat with the model using


mistral-chat $HOME/mistral_models/7B-Instruct-v0.3 --instruct --max_tokens 256


### Instruct following

py
from mistral_inference.model import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)
completion_request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)


### Function calling

py
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.model import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)
completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the users location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
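
Once the model has produced a tool call, the application still has to run the matching function itself. The sketch below shows one way that dispatch step might look. It assumes the decoded `result` is a JSON array of objects with `name` and `arguments` keys, which should be checked against the actual output; `dispatch_tool_calls`, the registry, and the canned weather values are hypothetical.

```python
import json


def get_current_weather(location, format):
    """Hypothetical stand-in for a real weather lookup; returns canned values."""
    temperature = 22 if format == "celsius" else 72
    return {"location": location, "temperature": temperature, "unit": format}


# Maps tool names declared in the request to local Python callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def dispatch_tool_calls(decoded_text, registry):
    """Parse the assumed JSON tool-call format and run each requested function."""
    calls = json.loads(decoded_text)
    results = []
    for call in calls:
        name = call["name"]
        if name not in registry:
            raise KeyError(f"model requested an unknown tool: {name}")
        results.append(registry[name](**call["arguments"]))
    return results


# Example text in the assumed format, standing in for the decoded model output.
example_output = '[{"name": "get_current_weather", "arguments": {"location": "Paris", "format": "celsius"}}]'
tool_results = dispatch_tool_calls(example_output, TOOL_REGISTRY)
print(tool_results)
```

Looking tools up in an explicit registry keeps the model from triggering arbitrary functions: only names the application has registered can run.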


## Generate with `transformers`

If you want to use Hugging Face `transformers` to generate text, you can do something like this.

py
from transformers import pipeline
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
chatbot = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.3")
chatbot(messages)


## Limitations

The Mistral 7B Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance.
It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to
make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.