2024 Open Source AI Models Analysis: Llama, Qwen, Mistral AI, DeepSeek
This article surveys the major open-source model families released in 2024: Qwen, Llama, DeepSeek, and Mistral AI. The Qwen series (Qwen 1.5, Qwen 2, and Qwen 2.5) was released in stages across a range of model scales, with steady progress in performance, multilingual capability, context length, and safety; Qwen has also published specialized models for vision-language understanding, multimodal reasoning, and audio processing, such as Qwen2-VL, QVQ-72B-Preview, and Qwen2-Audio, further extending the family’s application scope. The Llama series progressed from Llama 3 through Llama 3.1, Llama 3.2, and Llama 3.3, repeatedly pushing forward parameter scale, context length, and performance, with Llama 3.1 405B ranking among the largest open-source large language models. The DeepSeek series, including DeepSeek LLM, DeepSeek-Coder, DeepSeekMath, DeepSeek-VL, DeepSeek-V2, DeepSeek-Coder-V2, DeepSeek-VL2, and DeepSeek-V3, demonstrates strong capabilities in multilingual processing, code generation, mathematical reasoning, and vision-language understanding, with notable gains in both performance and efficiency. The Mistral AI lineup, including Mistral Large, Mistral Small, Pixtral Large, Mixtral 8x22B, Mistral NeMo, Codestral Mamba, and Mathstral, excels at multilingual reasoning, multimodal processing, programming, and mathematical reasoning while balancing cost and performance.
Qwen
Qwen 1.5 Series Models
Release Date: February 4, 2024
Model Variants:
- Qwen1.5-0.5B
- Qwen1.5-1.8B
- Qwen1.5-4B
- Qwen1.5-7B
- Qwen1.5-14B
- Qwen1.5-32B
- Qwen1.5-72B
- Qwen1.5-110B
- Qwen1.5-MoE
Detailed Information:
The Qwen 1.5 series models were released during the Lunar New Year, aiming to provide enhanced model performance and improve developer experience. This series includes multiple Base and Chat models of various scales, along with an MoE model. All models support a context length of 32K tokens. Furthermore, the Qwen 1.5 series models demonstrate excellent multilingual capabilities, supporting multiple languages including Arabic, Spanish, French, Japanese, Korean, and Thai.
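The Qwen 1.5 checkpoints are published with standard chat interfaces, so they can be tried with a few lines of Hugging Face Transformers code. A minimal sketch, assuming the Qwen/Qwen1.5-7B-Chat checkpoint name and a recent transformers release (verify both against the model card):

```python
# Minimal sketch: chatting with a Qwen1.5 model via Hugging Face Transformers.
# The checkpoint name below is an assumption; check the official model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-7B-Chat"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Qwen 1.5 release in one sentence."},
]
# Build the prompt with the tokenizer's built-in chat template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```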
Qwen 2 Series Models
Release Date: June 7, 2024
Model Variants:
- Qwen2-0.5B
- Qwen2-1.5B
- Qwen2-7B
- Qwen2-57B-A14B
- Qwen2-72B
- Qwen2-VL (capable of understanding long videos for video-based Q&A, dialogue, and content creation)
- Qwen2-Audio (accepts audio and text inputs, generating text outputs with features like voice chat without requiring ASR modules)
Detailed Information:
The Qwen 2 series is a major upgrade over Qwen 1.5, adding support for 27 additional languages beyond English and Chinese and achieving leading results across multiple evaluation benchmarks. The series shows significant improvements in code and mathematical capabilities, with context lengths of up to 128K tokens (Qwen2-72B-Instruct). Qwen 2 models also demonstrate safety performance comparable to GPT-4, significantly ahead of many other open models.
Qwen 2.5 Series Models
Release Date: September 19, 2024
Model Variants:
- Qwen2.5-0.5B
- Qwen2.5-1.5B
- Qwen2.5-3B
- Qwen2.5-7B
- Qwen2.5-14B
- Qwen2.5-32B
- Qwen2.5-72B
- Qwen2.5-Coder (1.5B, 7B, 32B)
- Qwen2.5-Math (1.5B, 7B, 72B)
Detailed Information:
The Qwen 2.5 series surpasses comparable Llama models in performance, particularly in programming (HumanEval 85+) and mathematics (MATH 80+). The models support context lengths of up to 128K tokens and can generate outputs of up to 8K tokens. Notably, Qwen2.5-72B outperforms the much larger Llama 3.1 405B on multiple core tasks. The series also includes dedicated sub-series for programming and mathematical tasks: Qwen2.5-Coder and Qwen2.5-Math.
Extended Open Source Models
Qwen2-VL-72B Vision-Language Model:
- Release Date: August 29, 2024
- Model Variant: Qwen2-VL-72B
- Detailed Information: Qwen2-VL-72B is a vision-language model that can recognize images of varying resolutions and aspect ratios, understand videos longer than 20 minutes, and act as a visual agent capable of operating phones and robots. It became the highest-scoring open-source model on the LMSYS Chatbot Arena Leaderboard.
QVQ-72B-Preview Multimodal Reasoning Model:
- Release Date: December 25, 2024
- Model Variant: QVQ-72B-Preview
- Detailed Information: QVQ-72B-Preview is an open-source multimodal reasoning model that achieved significant breakthroughs in AI visual understanding and complex problem-solving. It scored an impressive 70.3 on the MMMU evaluation and showed significant improvements in mathematics-related benchmarks compared to Qwen2-VL-72B-Instruct.
Qwen2-Audio:
- Release Date: August 9, 2024
- Model Variants: Qwen2-Audio-7B and Qwen2-Audio-7B-Instruct
- Detailed Information: This model accepts audio and text inputs, generating text outputs. Key features include:
- Voice chat capabilities without requiring ASR modules
- Audio analysis of speech, sound, and music based on text instructions
- Support for over 8 languages and dialects, including Chinese, English, Cantonese, French, Italian, Spanish, German, and Japanese
Llama
Llama 3 8B/70B
Release Date: April 18, 2024
Model Variants: Llama 3 8B and Llama 3 70B
Detailed Information:
The Llama 3 series includes pre-trained and instruction-tuned text generation models with 8B and 70B parameters. These models significantly surpass Llama 2 technically, redefining performance standards for large language models.
Llama 3.1 8B/70B/405B
Release Date: July 23, 2024
Model Variants: Llama 3.1 8B, Llama 3.1 70B, and Llama 3.1 405B
Detailed Information:
The Llama 3.1 series includes models with 8B, 70B, and 405B parameters, with the maximum context window increased to 128K tokens. Llama 3.1 405B was the largest openly available large language model at the time of its release, excelling in multilingual support, reasoning, and complex mathematical problem-solving.
Llama 3.2 1B/3B/11B/90B
Release Date: September 26, 2024
Model Variants: Llama 3.2 1B, Llama 3.2 3B, Llama 3.2 11B, and Llama 3.2 90B
Detailed Information:
Llama 3.2 introduced small and medium-sized vision LLMs (11B and 90B) along with lightweight text-only models (1B and 3B) suited to edge and mobile devices, each in pre-trained and instruction-tuned versions. The models support a 128K context length, are optimized for Arm processors, and are well suited to local summarization, instruction following, and rewriting tasks.
Llama 3.3 70B
Release Date: December 7, 2024
Model Variant: Llama 3.3 70B
Detailed Information:
Llama 3.3 is the latest release in the Llama series, further improving model efficiency and performance. It makes significant progress in multilingual capabilities, code generation, and complex mathematical problem-solving, and despite its 70B parameter count it approaches the performance of the much larger Llama 3.1 405B, matching or surpassing other leading models on many benchmarks.
These open-source model releases reflect Meta’s commitment to an open AI ecosystem, providing powerful tools for researchers and developers, driving the advancement of artificial intelligence technology.
DeepSeek
DeepSeek LLM
Release Date: January 5, 2024
Model Variants: DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat
Detailed Information:
DeepSeek LLM is DeepSeek’s first large language model, released in 7B and 67B sizes and trained from scratch on a 2-trillion-token dataset covering Chinese and English. DeepSeek LLM 67B Base surpasses Llama 2 70B Base in reasoning, coding, mathematics, and Chinese comprehension. DeepSeek LLM 67B Chat excels in coding and mathematics and generalizes well, scoring 65 on the Hungarian National High School Exam; it also performs strongly in Chinese, outperforming GPT-3.5.
DeepSeek-Coder
Release Date: January 25, 2024
Model Variants: DeepSeek Coder versions from 1B to 33B
Detailed Information:
DeepSeek Coder is a family of code language models, each trained from scratch on 2 trillion tokens, with a dataset comprising 87% code and 13% English and Chinese natural language. Model sizes range from 1B to 33B. Each model was pre-trained on project-level code corpora with a 16K window and an additional fill-in-the-middle objective, supporting project-level code completion and infilling. DeepSeek Coder achieves state-of-the-art performance among open-source code models across multiple programming languages and benchmarks.
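Project-level code completion with a fill-in-the-middle objective means the model is shown the code before and after a gap and asked to generate the missing span. A schematic sketch of how such a prompt is assembled; the sentinel strings here are placeholders, not the model's actual special tokens, which should be taken from the DeepSeek Coder model card:

```python
# Schematic sketch of a fill-in-the-middle (FIM) prompt for a code model.
# The sentinel strings are illustrative placeholders, not DeepSeek Coder's actual tokens.
FIM_BEGIN = "<|fim_begin|>"  # marks the start of the prefix (placeholder)
FIM_HOLE = "<|fim_hole|>"    # marks the span to be filled in (placeholder)
FIM_END = "<|fim_end|>"      # marks the start of the suffix (placeholder)

prefix = "def average(values):\n    total = sum(values)\n"
suffix = "    return result\n"

# The model receives both the prefix and the suffix and generates the missing middle.
prompt = f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"
print(prompt)
# A plausible completion: "    result = total / len(values)\n"
```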
DeepSeekMath
Release Date: February 5, 2024
Model Variant: DeepSeekMath 7B
Detailed Information:
DeepSeekMath builds on DeepSeek-Coder-v1.5 7B, continuing pre-training on mathematics-related tokens extracted from Common Crawl together with natural language and code data, for a total of 500 billion training tokens. DeepSeekMath 7B achieves 51.7% on the competition-level MATH benchmark without relying on external toolkits or voting techniques, approaching the performance of Gemini Ultra and GPT-4.
DeepSeek-VL
Release Date: March 11, 2024
Model Variants: DeepSeek-VL 1.3B and 7B models
Detailed Information:
DeepSeek-VL is an open-source vision-language (VL) model that employs a hybrid visual encoder, efficiently processing high-resolution images (1024 x 1024) within a fixed token budget while maintaining relatively low computational overhead. This design ensures the model’s ability to capture key semantic and detailed information across various visual tasks. The DeepSeek-VL series achieves state-of-the-art or competitive performance on a wide range of vision-language benchmarks at the same model size.
DeepSeek-V2
Release Date: May 7, 2024
Model Variant: DeepSeek-V2
Detailed Information:
DeepSeek-V2 is a powerful Mixture of Experts (MoE) language model characterized by cost-effective training and inference. It contains 236 billion total parameters, with 21 billion parameters activated per token. Compared to DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% in training costs, reducing KV cache by 93.3%, and increasing maximum generation throughput by 5.76 times. DeepSeek-V2 was pre-trained on a diverse and high-quality corpus of 8.1 trillion tokens.
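The gap between 236B total and 21B activated parameters comes from sparse expert routing: each token is dispatched to only a few experts, so only those experts' weights take part in the forward pass. A minimal, framework-free sketch of top-k gating; it illustrates the general MoE idea, not DeepSeek-V2's exact routing scheme:

```python
# Minimal sketch of top-k expert routing in a Mixture-of-Experts (MoE) layer.
# Illustrative only; DeepSeek-V2's actual routing scheme differs in detail.
import numpy as np

num_experts, top_k, hidden = 8, 2, 16
rng = np.random.default_rng(0)

router_w = rng.normal(size=(hidden, num_experts))            # router projection
experts = [rng.normal(size=(hidden, hidden)) for _ in range(num_experts)]

def moe_forward(x):
    scores = x @ router_w                                     # token-to-expert affinities
    top = np.argsort(scores)[-top_k:]                         # keep only the k best experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()   # normalize gates over the selection
    # Only the selected experts' weights are used here, which is why the number of
    # "activated" parameters per token is a small fraction of the total parameter count.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.normal(size=hidden)
print(moe_forward(token).shape)  # (16,)
```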
DeepSeek-Coder-V2
Release Date: June 17, 2024
Model Variant: DeepSeek-Coder-V2
Detailed Information:
DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4 Turbo on code-specific tasks. Starting from an intermediate checkpoint of DeepSeek-V2, it was further pre-trained on an additional 6 trillion tokens, substantially strengthening coding and mathematical reasoning while maintaining comparable performance on general language tasks. It also expands the supported programming languages from 86 to 338 and extends the context length from 16K to 128K.
DeepSeek-VL2
Release Date: December 13, 2024
Model Variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2
Detailed Information:
DeepSeek-VL2 is an advanced large Mixture of Experts (MoE) vision-language model series, showing significant improvements over its predecessor DeepSeek-VL. DeepSeek-VL2 demonstrates excellent capabilities in various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. The model series consists of three variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2, with 1B, 2.8B, and 4.5B activated parameters respectively. Compared to existing open-source dense models and MoE-based models, DeepSeek-VL2 achieves competitive or state-of-the-art performance with similar or fewer activated parameters.
DeepSeek-V3
Release Date: December 26, 2024
Model Variant: DeepSeek-V3
Detailed Information:
DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token. To achieve efficient inference and cost-effective training, it adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, both well validated in DeepSeek-V2. DeepSeek-V3 also pioneers an auxiliary-loss-free load balancing strategy and adopts a multi-token prediction training objective to enhance performance. The team pre-trained DeepSeek-V3 on 14.8 trillion diverse, high-quality tokens, followed by supervised fine-tuning and reinforcement learning to fully unlock its potential. Comprehensive evaluations indicate that DeepSeek-V3 surpasses other open-source models and achieves performance comparable to leading closed-source models.
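The auxiliary-loss-free strategy keeps expert load balanced without adding a balancing loss term: each expert carries a bias that is added to its routing score only when selecting the top-k experts, and the bias is nudged down for overloaded experts and up for underloaded ones as training proceeds. A schematic sketch of that idea; the update rule and constants here are simplifications, not the exact procedure from the DeepSeek-V3 report:

```python
# Schematic sketch of auxiliary-loss-free load balancing for MoE routing.
# Simplified illustration; constants and the update rule are assumptions.
import numpy as np

num_experts, top_k, gamma = 8, 2, 0.01   # gamma: bias adjustment step (assumed value)
bias = np.zeros(num_experts)             # per-expert bias, used for selection only
rng = np.random.default_rng(0)

def route(batch_scores):
    """Pick top-k experts per token; the bias shifts selection but not gating weights."""
    return np.argsort(batch_scores + bias, axis=-1)[:, -top_k:]

for step in range(100):
    scores = rng.normal(size=(32, num_experts))      # stand-in router scores for 32 tokens
    load = np.bincount(route(scores).ravel(), minlength=num_experts)
    # Overloaded experts become less attractive, underloaded ones more attractive.
    bias -= gamma * np.sign(load - load.mean())

print(np.bincount(route(rng.normal(size=(1024, num_experts))).ravel(), minlength=num_experts))
```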
Mistral AI
Mistral Large
Release Date: February 26, 2024
Model Variant: Mistral Large
Detailed Information:
Mistral Large is Mistral AI’s flagship model, offering strong reasoning for complex multilingual tasks, including text understanding, transformation, and code generation. It achieved strong results on common benchmarks, ranking at release as the second-best model available through an API (after GPT-4). Mistral Large features:
- Multilingual Capabilities: Mastery of English, French, Spanish, German, and Italian, with detailed understanding of grammar and cultural context
- 32K tokens context window: Enabling precise information recall from large documents
- Precise instruction following: Allowing developers to design their content moderation policies
- Native function calling capabilities: Enabling large-scale application development and technology-stack modernization when combined with the constrained output mode implemented on la Plateforme (see the sketch below)
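Function calling is exposed through the chat completions endpoint: the request carries a list of tool schemas, and the model may answer with a structured tool call instead of plain text. A minimal sketch, assuming the https://api.mistral.ai/v1/chat/completions endpoint, the mistral-large-latest model alias, a MISTRAL_API_KEY environment variable, and a hypothetical get_order_status tool; field names should be checked against the current API reference:

```python
# Minimal sketch of native function calling against the Mistral chat completions API.
# Endpoint, model alias, and response fields are assumptions to verify in the API docs.
import json
import os

import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical tool, for illustration only
        "description": "Look up the status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-large-latest",
        "messages": [{"role": "user", "content": "What's the status of order 42?"}],
        "tools": tools,
    },
)
message = resp.json()["choices"][0]["message"]
# When the model decides to call the tool, the arguments arrive as a JSON string.
for call in message.get("tool_calls") or []:
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```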
Mistral Small
Release Date: September 18, 2024
Model Variant: Mistral Small
Detailed Information:
Mistral Small is optimized for latency and cost. It outperforms Mixtral 8x7B while delivering lower latency, making it a refined middle ground between open-weight offerings and the flagship models. Mistral Small features:
- 22B parameters: Providing a middle ground between Mistral NeMo 12B and Mistral Large 2
- 32,768 vocabulary size: Supporting richer language expression
- Function calling support: Enabling more complex interactions with internal code, APIs, or databases
- 128k sequence length: Allowing processing of longer text inputs
Pixtral Large
Release Date: November 18, 2024
Model Variant: Pixtral Large
Detailed Information:
Pixtral Large is a 124B open weights multimodal model built on Mistral Large 2. Pixtral Large demonstrates frontier-level image understanding capabilities, able to comprehend documents, charts, and natural images while maintaining Mistral Large 2’s leading pure text understanding capabilities. Pixtral Large features:
- Frontier-level multimodal performance: Achieving SOTA on tasks like MathVista, DocVQA, VQAv2
- 123B parameter multimodal decoder, 1B parameter visual encoder: Providing powerful image and text processing capabilities
- 128K context window length: Accommodating at least 30 high-resolution images
- Multilingual OCR and reasoning capabilities: Able to process multilingual text and perform complex reasoning tasks
Mixtral 8x22B
Release Date: April 17, 2024
Model Variant: Mixtral 8x22B
Detailed Information:
Mixtral 8x22B is a sparse Mixture-of-Experts (SMoE) model with 141 billion total parameters, of which only 39 billion are active per token. It excels at multilingual, mathematics, and coding tasks, supports native function calling, and offers a 64K-token context window. Mixtral 8x22B is released under the Apache 2.0 license, permitting broad use.
Mistral NeMo
Release Date: July 19, 2024
Model Variant: Mistral NeMo
Detailed Information:
Mistral NeMo is a 12B model developed in collaboration between Mistral AI and NVIDIA, featuring a 128k tokens context window and supporting FP8 inference. Mistral NeMo excels in multilingual support, mathematical reasoning, and code generation, with notable improvements in Chinese language processing.
Codestral Mamba
Release Date: July 17, 2024
Model Variant: Codestral Mamba
Detailed Information:
Codestral Mamba is an open-source code model based on the Mamba architecture, capable of handling inputs of up to 256,000 tokens with markedly better speed and efficiency. It excels at programming tasks, supports many programming languages, and can perform complex code generation and reasoning.
Mathstral
Release Date: July 17, 2024
Model Variant: Mathstral
Detailed Information:
Mathstral is a 7B model focused on mathematical reasoning and scientific discovery, designed to solve advanced mathematical problems requiring complex, multi-step logical reasoning. Built on Mistral 7B, Mathstral supports STEM subjects and performs excellently across multiple industry-standard benchmarks.