2024 Open Source AI Models Analysis: Llama, Qwen, Mistral AI, DeepSeek
This article surveys the major open-source model families released in 2024: Qwen, Llama, DeepSeek, and Mistral AI. The Qwen series (Qwen 1.5, Qwen 2, and Qwen 2.5) was released in stages across a range of model scales, with steady progress in performance, multilingual capability, context length, and safety; Qwen has also published specialized models for vision-language understanding, multimodal reasoning, and audio processing, such as Qwen2-VL, QVQ-72B-Preview, and Qwen2-Audio, further extending the family’s application scope. The Llama series progressed from Llama 3 through Llama 3.1, Llama 3.2, and Llama 3.3, repeatedly pushing forward parameter scale, context length, and performance, with Llama 3.1 405B ranking among the largest open-source large language models. The DeepSeek series, including DeepSeek LLM, DeepSeek-Coder, DeepSeekMath, DeepSeek-VL, DeepSeek-V2, DeepSeek-Coder-V2, DeepSeek-VL2, and DeepSeek-V3, demonstrates strong capabilities in multilingual processing, code generation, mathematical reasoning, and vision-language understanding, with notable gains in both performance and efficiency. The Mistral AI lineup, including Mistral Large, Mistral Small, Pixtral Large, Mixtral 8x22B, Mistral NeMo, Codestral Mamba, and Mathstral, excels at multilingual reasoning, multimodal processing, programming, and mathematical reasoning while balancing cost and performance.
Qwen
Qwen 1.5 Series Models
Release Date: February 4, 2024
Model Variants:
- Qwen1.5-0.5B
- Qwen1.5-1.8B
- Qwen1.5-4B
- Qwen1.5-7B
- Qwen1.5-14B
- Qwen1.5-32B
- Qwen1.5-72B
- Qwen1.5-110B
- Qwen1.5-MoE
Detailed Information:
The Qwen 1.5 series models were released during the Lunar New Year, aiming to provide enhanced model performance and improve developer experience. This series includes multiple Base and Chat models of various scales, along with an MoE model. All models support a context length of 32K tokens. Furthermore, the Qwen 1.5 series models demonstrate excellent multilingual capabilities, supporting multiple languages including Arabic, Spanish, French, Japanese, Korean, and Thai.
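The Qwen 1.5 checkpoints are published with standard chat interfaces, so they can be tried with a few lines of Hugging Face Transformers code. A minimal sketch, assuming the Qwen/Qwen1.5-7B-Chat checkpoint name and a recent transformers release (verify both against the model card):

```python
# Minimal sketch: chatting with a Qwen1.5 model via Hugging Face Transformers.
# The checkpoint name below is an assumption; check the official model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-7B-Chat"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Qwen 1.5 release in one sentence."},
]
# Build the prompt with the tokenizer's built-in chat template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```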
Qwen 2 Series Models
Release Date: June 7, 2024
Model Variants:
- Qwen2-0.5B
- Qwen2-1.5B
- Qwen2-7B
- Qwen2-57B-A14B
- Qwen2-72B
- Qwen2-VL (capable of understanding long videos for video-based Q&A, dialogue, and content creation)
- Qwen2-Audio (accepts audio and text inputs, generating text outputs with features like voice chat without requiring ASR modules)
Detailed Information:
The Qwen 2 series is a major upgrade over Qwen 1.5, adding support for 27 additional languages beyond English and Chinese and achieving leading results across multiple evaluation benchmarks. The series shows significant improvements in code and mathematical capabilities, with context lengths of up to 128K tokens (Qwen2-72B-Instruct). Qwen 2 models also demonstrate safety performance comparable to GPT-4, significantly ahead of many other open models.
Qwen 2.5 Series Models
Release Date: September 19, 2024
Model Variants:
- Qwen2.5-0.5B
- Qwen2.5-1.5B
- Qwen2.5-3B
- Qwen2.5-7B
- Qwen2.5-14B
- Qwen2.5-32B
- Qwen2.5-72B
- Qwen2.5-Coder (1.5B, 7B, 32B)
- Qwen2.5-Math (1.5B, 7B, 72B)
Detailed Information:
The Qwen 2.5 series surpasses comparable Llama models in performance, particularly in programming (HumanEval 85+) and mathematics (MATH 80+). The models support context lengths of up to 128K tokens and can generate outputs of up to 8K tokens. Notably, Qwen2.5-72B outperforms the much larger Llama 3.1 405B on multiple core tasks. The series also includes dedicated sub-series for programming and mathematical tasks: Qwen2.5-Coder and Qwen2.5-Math.
Extended Open Source Models
Qwen2-VL-72B Vision-Language Model:
- Release Date: August 29, 2024
- Model Variant: Qwen2-VL-72B
- Detailed Information: Qwen2-VL-72B is a vision-language model that can recognize images of varying resolutions and aspect ratios, understand videos longer than 20 minutes, and act as a visual agent capable of operating phones and robots. It became the highest-scoring open-source model on the LMSYS Chatbot Arena Leaderboard.
QVQ-72B-Preview Multimodal Reasoning Model:
- Release Date: December 25, 2024
- Model Variant: QVQ-72B-Preview
- Detailed Information: QVQ-72B-Preview is an open-source multimodal reasoning model that achieved significant breakthroughs in AI visual understanding and complex problem-solving. It scored an impressive 70.3 on the MMMU evaluation and showed significant improvements in mathematics-related benchmarks compared to Qwen2-VL-72B-Instruct.
Qwen2-Audio:
- Release Date: August 9, 2024
- Model Variants: Qwen2-Audio-7B and Qwen2-Audio-7B-Instruct
- Detailed Information: This model accepts audio and text inputs, generating text outputs. Key features include:
- Voice chat capabilities without requiring ASR modules
- Audio analysis of speech, sound, and music based on text instructions
- Support for over 8 languages and dialects, including Chinese, English, Cantonese, French, Italian, Spanish, German, and Japanese
Llama
Llama 3 8B/70B
Release Date: April 18, 2024
Model Variants: Llama 3 8B and Llama 3 70B
Detailed Information:
The Llama 3 series includes pre-trained and instruction-tuned text generation models with 8B and 70B parameters. These models significantly surpass Llama 2 technically, redefining performance standards for large language models.
Llama 3.1 8B/70B/405B
Release Date: July 23, 2024
Model Variants: Llama 3.1 8B, Llama 3.1 70B, and Llama 3.1 405B
Detailed Information:
The Llama 3.1 series includes models with 8B, 70B, and 405B parameters, with the maximum context window increased to 128K tokens. Llama 3.1 405B was the largest openly available large language model at the time of its release, excelling in multilingual support, reasoning, and complex mathematical problem-solving.
Llama 3.2 1B/3B/11B/90B
Release Date: September 26, 2024
Model Variants: Llama 3.2 1B, Llama 3.2 3B, Llama 3.2 11B, and Llama 3.2 90B
Detailed Information:
Llama 3.2 introduced small and medium-sized vision LLMs (11B and 90B) along with lightweight text-only models (1B and 3B) suited to edge and mobile devices, each in pre-trained and instruction-tuned versions. The models support a 128K context length, are optimized for Arm processors, and are well suited to local summarization, instruction following, and rewriting tasks.
Llama 3.3 70B
Release Date: December 7, 2024
Model Variant: Llama 3.3 70B
Detailed Information:
Llama 3.3 is the latest release in the Llama series, further improving model efficiency and performance. It makes significant progress in multilingual capabilities, code generation, and complex mathematical problem-solving, and despite its 70B parameter count it approaches the performance of the much larger Llama 3.1 405B, matching or surpassing other leading models on many benchmarks.
These open-source model releases reflect Meta’s commitment to an open AI ecosystem, providing powerful tools for researchers and developers, driving the advancement of artificial intelligence technology.
DeepSeek
DeepSeek LLM
Release Date: January 5, 2024
Model Variants: DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat
Detailed Information:
DeepSeek LLM is DeepSeek’s first large language model, released in 7B and 67B sizes and trained from scratch on a 2-trillion-token dataset covering Chinese and English. DeepSeek LLM 67B Base surpasses Llama 2 70B Base in reasoning, coding, mathematics, and Chinese comprehension. DeepSeek LLM 67B Chat excels in coding and mathematics and generalizes well, scoring 65 on the Hungarian National High School Exam; it also performs strongly in Chinese, outperforming GPT-3.5.
DeepSeek-Coder
Release Date: January 25, 2024
Model Variants: DeepSeek Coder versions from 1B to 33B
Detailed Information:
DeepSeek Coder is a family of code language models, each trained from scratch on 2 trillion tokens, with a dataset comprising 87% code and 13% English and Chinese natural language. Model sizes range from 1B to 33B. Each model was pre-trained on project-level code corpora with a 16K window and an additional fill-in-the-middle objective, supporting project-level code completion and infilling. DeepSeek Coder achieves state-of-the-art performance among open-source code models across multiple programming languages and benchmarks.
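Project-level code completion with a fill-in-the-middle objective means the model is shown the code before and after a gap and asked to generate the missing span. A schematic sketch of how such a prompt is assembled; the sentinel strings here are placeholders, not the model's actual special tokens, which should be taken from the DeepSeek Coder model card:

```python
# Schematic sketch of a fill-in-the-middle (FIM) prompt for a code model.
# The sentinel strings are illustrative placeholders, not DeepSeek Coder's actual tokens.
FIM_BEGIN = "<|fim_begin|>"  # marks the start of the prefix (placeholder)
FIM_HOLE = "<|fim_hole|>"    # marks the span to be filled in (placeholder)
FIM_END = "<|fim_end|>"      # marks the start of the suffix (placeholder)

prefix = "def average(values):\n    total = sum(values)\n"
suffix = "    return result\n"

# The model receives both the prefix and the suffix and generates the missing middle.
prompt = f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"
print(prompt)
# A plausible completion: "    result = total / len(values)\n"
```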
DeepSeekMath
Release Date: February 5, 2024
Model Variant: DeepSeekMath 7B
Detailed Information:
DeepSeekMath builds on DeepSeek-Coder-v1.5 7B, continuing pre-training on mathematics-related tokens extracted from Common Crawl together with natural language and code data, for a total of 500 billion training tokens. DeepSeekMath 7B achieves 51.7% on the competition-level MATH benchmark without relying on external toolkits or voting techniques, approaching the performance of Gemini Ultra and GPT-4.
DeepSeek-VL
Release Date: March 11, 2024
Model Variants: DeepSeek-VL 1.3B and 7B models
Detailed Information:
DeepSeek-VL is an open-source vision-language (VL) model that employs a hybrid visual encoder, efficiently processing high-resolution images (1024 x 1024) within a fixed token budget while maintaining relatively low computational overhead. This design ensures the model’s ability to capture key semantic and detailed information across various visual tasks. The DeepSeek-VL series achieves state-of-the-art or competitive performance on a wide range of vision-language benchmarks at the same model size.
DeepSeek-V2
Release Date: May 7, 2024
Model Variant: DeepSeek-V2
Detailed Information:
DeepSeek-V2 is a powerful Mixture of Experts (MoE) language model characterized by cost-effective training and inference. It contains 236 billion total parameters, with 21 billion parameters activated per token. Compared to DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% in training costs, reducing KV cache by 93.3%, and increasing maximum generation throughput by 5.76 times. DeepSeek-V2 was pre-trained on a diverse and high-quality corpus of 8.1 trillion tokens.
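The gap between 236B total and 21B activated parameters comes from sparse expert routing: each token is dispatched to only a few experts, so only those experts' weights take part in the forward pass. A minimal, framework-free sketch of top-k gating; it illustrates the general MoE idea, not DeepSeek-V2's exact routing scheme:

```python
# Minimal sketch of top-k expert routing in a Mixture-of-Experts (MoE) layer.
# Illustrative only; DeepSeek-V2's actual routing scheme differs in detail.
import numpy as np

num_experts, top_k, hidden = 8, 2, 16
rng = np.random.default_rng(0)

router_w = rng.normal(size=(hidden, num_experts))            # router projection
experts = [rng.normal(size=(hidden, hidden)) for _ in range(num_experts)]

def moe_forward(x):
    scores = x @ router_w                                     # token-to-expert affinities
    top = np.argsort(scores)[-top_k:]                         # keep only the k best experts
    gates = np.exp(scores[top]) / np.exp(scores[top]).sum()   # normalize gates over the selection
    # Only the selected experts' weights are used here, which is why the number of
    # "activated" parameters per token is a small fraction of the total parameter count.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.normal(size=hidden)
print(moe_forward(token).shape)  # (16,)
```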
DeepSeek-Coder-V2
Release Date: June 17, 2024
Model Variant: DeepSeek-Coder-V2
Detailed Information:
DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4 Turbo on code-specific tasks. Starting from an intermediate checkpoint of DeepSeek-V2, it was further pre-trained on an additional 6 trillion tokens, substantially strengthening coding and mathematical reasoning while maintaining comparable performance on general language tasks. It also expands the supported programming languages from 86 to 338 and extends the context length from 16K to 128K.
DeepSeek-VL2
Release Date: December 13, 2024
Model Variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2
Detailed Information:
DeepSeek-VL2 is an advanced large Mixture of Experts (MoE) vision-language model series, showing significant improvements over its predecessor DeepSeek-VL. DeepSeek-VL2 demonstrates excellent capabilities in various tasks, including but not limited to visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. The model series consists of three variants: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2, with 1B, 2.8B, and 4.5B activated parameters respectively. Compared to existing open-source dense models and MoE-based models, DeepSeek-VL2 achieves competitive or state-of-the-art performance with similar or fewer activated parameters.
DeepSeek-V3
Release Date: December 26, 2024
Model Variant: DeepSeek-V3
Detailed Information:
DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token. To achieve efficient inference and cost-effective training, it adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, both well validated in DeepSeek-V2. DeepSeek-V3 also pioneers an auxiliary-loss-free load balancing strategy and adopts a multi-token prediction training objective to enhance performance. The team pre-trained DeepSeek-V3 on 14.8 trillion diverse, high-quality tokens, followed by supervised fine-tuning and reinforcement learning to fully unlock its potential. Comprehensive evaluations indicate that DeepSeek-V3 surpasses other open-source models and achieves performance comparable to leading closed-source models.
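The auxiliary-loss-free strategy keeps expert load balanced without adding a balancing loss term: each expert carries a bias that is added to its routing score only when selecting the top-k experts, and the bias is nudged down for overloaded experts and up for underloaded ones as training proceeds. A schematic sketch of that idea; the update rule and constants here are simplifications, not the exact procedure from the DeepSeek-V3 report:

```python
# Schematic sketch of auxiliary-loss-free load balancing for MoE routing.
# Simplified illustration; constants and the update rule are assumptions.
import numpy as np

num_experts, top_k, gamma = 8, 2, 0.01   # gamma: bias adjustment step (assumed value)
bias = np.zeros(num_experts)             # per-expert bias, used for selection only
rng = np.random.default_rng(0)

def route(batch_scores):
    """Pick top-k experts per token; the bias shifts selection but not gating weights."""
    return np.argsort(batch_scores + bias, axis=-1)[:, -top_k:]

for step in range(100):
    scores = rng.normal(size=(32, num_experts))      # stand-in router scores for 32 tokens
    load = np.bincount(route(scores).ravel(), minlength=num_experts)
    # Overloaded experts become less attractive, underloaded ones more attractive.
    bias -= gamma * np.sign(load - load.mean())

print(np.bincount(route(rng.normal(size=(1024, num_experts))).ravel(), minlength=num_experts))
```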
Mistral AI
Mistral Large
Release Date: February 26, 2024
Model Variant: Mistral Large
Detailed Information:
Mistral Large is Mistral AI’s flagship model, offering strong reasoning for complex multilingual tasks, including text understanding, transformation, and code generation. It achieved strong results on common benchmarks, ranking at release as the second-best model available through an API (after GPT-4). Mistral Large features:
- Multilingual Capabilities: Mastery of English, French, Spanish, German, and Italian, with detailed understanding of grammar and cultural context
- 32K tokens context window: Enabling precise information recall from large documents
- Precise instruction following: Allowing developers to design their content moderation policies
- Native function calling capabilities: Enabling large-scale application development and technology-stack modernization when combined with the constrained output mode implemented on la Plateforme (see the sketch below)
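Function calling is exposed through the chat completions endpoint: the request carries a list of tool schemas, and the model may answer with a structured tool call instead of plain text. A minimal sketch, assuming the https://api.mistral.ai/v1/chat/completions endpoint, the mistral-large-latest model alias, a MISTRAL_API_KEY environment variable, and a hypothetical get_order_status tool; field names should be checked against the current API reference:

```python
# Minimal sketch of native function calling against the Mistral chat completions API.
# Endpoint, model alias, and response fields are assumptions to verify in the API docs.
import json
import os

import requests

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical tool, for illustration only
        "description": "Look up the status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-large-latest",
        "messages": [{"role": "user", "content": "What's the status of order 42?"}],
        "tools": tools,
    },
)
message = resp.json()["choices"][0]["message"]
# When the model decides to call the tool, the arguments arrive as a JSON string.
for call in message.get("tool_calls") or []:
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```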
Mistral Small
Release Date: September 18, 2024
Model Variant: Mistral Small
Detailed Information:
Mistral Small is optimized for latency and cost. It outperforms Mixtral 8x7B while delivering lower latency, making it a refined middle ground between open-weight offerings and the flagship models. Mistral Small features:
- 22B parameters: Providing a middle ground between Mistral NeMo 12B and Mistral Large 2
- 32,768 vocabulary size: Supporting richer language expression
- Function calling support: Enabling more complex interactions with internal code, APIs, or databases
- 128k sequence length: Allowing processing of longer text inputs
Pixtral Large
Release Date: November 18, 2024
Model Variant: Pixtral Large
Detailed Information:
Pixtral Large is a 124B open weights multimodal model built on Mistral Large 2. Pixtral Large demonstrates frontier-level image understanding capabilities, able to comprehend documents, charts, and natural images while maintaining Mistral Large 2’s leading pure text understanding capabilities. Pixtral Large features:
- Frontier-level multimodal performance: Achieving SOTA on tasks like MathVista, DocVQA, VQAv2
- 123B parameter multimodal decoder, 1B parameter visual encoder: Providing powerful image and text processing capabilities
- 128K context window length: Accommodating at least 30 high-resolution images
- Multilingual OCR and reasoning capabilities: Able to process multilingual text and perform complex reasoning tasks
Mixtral 8x22B
Release Date: April 17, 2024
Model Variant: Mixtral 8x22B
Detailed Information:
Mixtral 8x22B is a sparse Mixture-of-Experts (SMoE) model with 141 billion total parameters, of which only 39 billion are active per token. It excels at multilingual, mathematics, and coding tasks, supports native function calling, and offers a 64K-token context window. Mixtral 8x22B is released under the Apache 2.0 license, permitting broad use.
Mistral NeMo
Release Date: July 19, 2024
Model Variant: Mistral NeMo
Detailed Information:
Mistral NeMo is a 12B model developed in collaboration between Mistral AI and NVIDIA, featuring a 128k tokens context window and supporting FP8 inference. Mistral NeMo excels in multilingual support, mathematical reasoning, and code generation, with notable improvements in Chinese language processing.
Codestral Mamba
Release Date: July 17, 2024
Model Variant: Codestral Mamba
Detailed Information:
Codestral Mamba is an open-source code model based on the Mamba architecture, capable of handling inputs of up to 256,000 tokens with markedly better speed and efficiency. It excels at programming tasks, supports many programming languages, and can perform complex code generation and reasoning.
Mathstral
Release Date: July 17, 2024
Model Variant: Mathstral
Detailed Information:
Mathstral is a 7B model focused on mathematical reasoning and scientific discovery, designed to solve advanced mathematical problems requiring complex, multi-step logical reasoning. Built on Mistral 7B, Mathstral supports STEM subjects and performs excellently across multiple industry-standard benchmarks.