AI Models

    Open-source models you can fine-tune with Ertas.

    Ant Group Ling / Ring

    Reasoning

    Ant Group (inclusionAI)

    Ant Group's trillion-parameter open-weight family — Ling-2.5-1T (non-thinking, 1M context) and Ring-2.5-1T (the world's first hybrid-linear-architecture thinking model, gold-tier on both IMO 2025 with a 35/42 score and CMO 2025) — plus the April 2026 Ling-2.6-1T update.

    1T (Ling/Ring 2.5) · 1T (Ling 2.6)

    Arcee Trinity Large

    Reasoning

    Arcee AI

    Arcee AI's January 2026 release — a 400-billion parameter mixture-of-experts with 13B active parameters and 256 experts (4 active per token), trained on 17 trillion tokens over 30-33 days on 2,048 NVIDIA B300 chips. One of the few US-made frontier open-weight models in 2026, alongside OLMo 3 and GPT-OSS.

    400B-A13B
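
    The active-parameter figure in mixture-of-experts entries like this one follows from simple arithmetic: each token's forward pass touches the shared weights plus only its routed experts. A minimal Python sketch; the shared/expert split below is an assumption chosen to reproduce the quoted figures, not Arcee's published breakdown.

        # Back-of-envelope MoE arithmetic for a 400B-A13B model.
        # The shared/expert split is assumed, not Arcee's published numbers.
        NUM_EXPERTS = 256      # experts per MoE layer
        ACTIVE_EXPERTS = 4     # experts routed to each token
        EXPERT_PARAMS = 393e9  # assumed: parameters living in expert FFNs
        SHARED_PARAMS = 7e9    # assumed: attention, embeddings, router

        total = SHARED_PARAMS + EXPERT_PARAMS
        active = SHARED_PARAMS + EXPERT_PARAMS * ACTIVE_EXPERTS / NUM_EXPERTS
        print(f"total  ~ {total / 1e9:.0f}B")   # ~ 400B
        print(f"active ~ {active / 1e9:.1f}B")  # ~ 13.1B per token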

    Code Llama

    Code

    Meta

    Meta's specialized code generation model family built on Llama 2, available in 7B, 13B, 34B, and 70B sizes with variants optimized for code completion, instruction following, and Python development.

    7B · 13B · 34B

    Command R

    General

    Cohere

    Cohere's enterprise-focused model family in 35B and 104B sizes, purpose-built for retrieval-augmented generation (RAG) with native citation support, tool use, and multilingual capability across 10+ languages.

    35B · 104B

    DeepSeek V3.2

    Reasoning

    DeepSeek

    DeepSeek's late-2025 release that introduced DeepSeek Sparse Attention (DSA) — a learned sparse attention mechanism enabling efficient long-context inference, paired with a unified thinking mode toggle. Direct predecessor to DeepSeek V4. MIT-style license.

    671B-A37B
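
    DSA's internals are DeepSeek's own; the sketch below shows only the core idea behind sparse attention: score every key cheaply, keep the top-k, and run softmax attention over the survivors, so per-query cost scales with k rather than with sequence length. All names here are illustrative.

        import numpy as np

        def topk_sparse_attention(q, K, V, k=16):
            """Generic top-k sparse attention for a single query vector.
            Illustrative only; NOT DeepSeek's actual DSA, whose key
            selection is learned rather than a raw score cutoff."""
            scores = K @ q / np.sqrt(q.shape[0])    # (seq_len,)
            idx = np.argpartition(scores, -k)[-k:]  # top-k key positions
            w = np.exp(scores[idx] - scores[idx].max())
            w /= w.sum()                            # softmax over survivors
            return w @ V[idx]                       # mix only k values

        rng = np.random.default_rng(0)
        d, seq_len = 64, 1024
        out = topk_sparse_attention(rng.normal(size=d),
                                    rng.normal(size=(seq_len, d)),
                                    rng.normal(size=(seq_len, d)))
        print(out.shape)  # (64,)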

    DeepSeek V4

    Reasoning

    DeepSeek

    DeepSeek's April 2026 flagship — a 1.6 trillion parameter mixture-of-experts model with 49B active parameters and 1M token context, currently leading composite open-weight intelligence benchmarks and reportedly closing the gap with frontier closed-source models.

    284B-A13B (Flash) · 1.6T-A49B (Pro)

    DeepSeek-R1

    Reasoning

    DeepSeek

    DeepSeek's dedicated reasoning model trained with reinforcement learning to perform extended chain-of-thought reasoning, available in distilled sizes from 1.5B to 70B and the full 671B mixture-of-experts architecture.

    1.5B · 7B · 8B

    DeepSeek-V3

    General

    DeepSeek

    DeepSeek's flagship 671-billion parameter mixture-of-experts model with 37B active parameters per token, delivering frontier-level general performance at remarkably efficient inference costs.

    671B-A37B

    Devstral 2

    Code

    Mistral AI

    Mistral AI's coding-specialized open-weight family — Devstral 2 (123B) and Devstral Small 2 (24B), with the 123B variant scoring 72.2% on SWE-Bench Verified and the 24B running on consumer hardware. Released as a coding specialist line before being absorbed into Mistral Small 4's unified architecture in March 2026.

    24B (Small 2) · 123B

    Falcon

    General

    TII Abu Dhabi

    The Technology Innovation Institute's open-weight model family in 7B, 40B, and 180B sizes, trained on the massive RefinedWeb dataset and pioneering the use of high-quality filtered web data for LLM training.

    7B · 40B · 180B

    Falcon H1R-7B

    Reasoning

    TII

    TII's January 2026 hybrid Mamba+Transformer architecture — a 7-billion parameter model with 256K context that scores 83.1% on AIME 2025, outperforming reasoning models up to 7× its size on math benchmarks.

    7B

    Falcon-H1 Arabic

    Multilingual

    TII

    Technology Innovation Institute's January 2026 Arabic-specialized release — three sizes (3B, 7B, 34B) with hybrid Mamba+Transformer architecture, leading the Open Arabic LLM Leaderboard. The 34B variant beats Llama 3.3 70B at less than half the parameter count on Arabic-specific benchmarks.

    3B · 7B · 34B

    Falcon-H1-Tiny

    Small

    TII

    Technology Innovation Institute's January 2026 ultra-small model collection — 15 variants under 100M parameters plus a 600M reasoning model (Falcon-H1-Tiny-R-0.6B), all using the hybrid Mamba+Transformer architecture; among the smallest viable LLMs of 2026, aimed at browser and microcontroller deployment.

    ~50M · ~135M · ~360M

    Gemma 3

    General

    Google

    Google's third-generation open-weight model family built on Gemini technology, available in 1B, 4B, 12B, and 27B sizes with native multimodal vision-language capabilities and a 128K token context window.

    1B · 4B · 12B

    Gemma 4

    General

    Google

    Google's April 2026 open-weight model family — the first Gemma generation released under Apache 2.0, spanning a dense 31B flagship, a 26B-A3.8B mixture-of-experts variant, and edge-optimized 4B and 2B models, all with native multimodal capabilities.

    2B (e2b) · 4B (e4b) · 26B-A3.8B

    GLM-4.5

    General

    Z.ai

    Z.ai's July 2025 mixture-of-experts release — 355 billion total parameters with 32 billion active per token, designed to run on 8× NVIDIA H20 chips. The workhorse predecessor to the GLM-5 flagship.

    355B-A32B

    GLM-4.6

    General

    Z.ai

    Z.ai's late-2025 mid-tier release — a 355-billion parameter mixture-of-experts with 200K context, coding performance near parity with Claude Sonnet 4, and ~15% fewer tokens consumed per task than its predecessor. Companion vision variants GLM-4.6V (106B and 9B) extend the family to multimodal use cases.

    355B-A32B

    GLM-4.7

    Code

    Z.ai

    Z.ai's December 2025 coding-focused release — a 400-billion parameter mixture-of-experts with 'Preserved Thinking' multi-turn reasoning, plus a smaller GLM-4.7 Flash variant for production serving. Topped Code Arena among open-weight models at release before being succeeded by the GLM-5 series.

    ~400B (Flagship) · Flash (smaller)

    GLM-5

    Reasoning

    Z.ai

    Z.ai's February 2026 flagship — a 745-billion parameter model trained on Huawei Ascend chips, the foundation of the GLM-5 series before the April 2026 GLM-5.1 update added substantial post-training improvements. Z.ai went public on the Hong Kong Stock Exchange in January 2026.

    745B

    GLM-5.1

    Reasoning

    Z.ai

    Z.ai's April 8, 2026 update to GLM-5 — same 745-billion parameter base with refined post-training, delivering a 28% coding improvement, 8-hour autonomous run capability, and a SWE-Bench Pro lead that briefly placed an open-weight model ahead of GPT-5.4 and Claude Opus 4.6.

    745B

    GPT-OSS

    General

    OpenAI

    OpenAI's first open-weight model release since GPT-2 — a mixture-of-experts family with the 117B/5.1B-active GPT-OSS-120B flagship and a smaller 21B/3.6B-active GPT-OSS-20B variant, released August 2025 under Apache 2.0.

    21B-A3.6B (20b) · 117B-A5.1B (120b)

    Hermes 4

    Reasoning

    Nous Research

    Nous Research's August 2025 model family — Llama-3.1-based fine-tunes in 14B, 70B, and 405B sizes featuring hybrid reasoning via explicit thinking tokens and neutrally-aligned post-training, trained on ~60B tokens with the Atropos reinforcement learning system and ~1,000 task-specific verifiers.

    14B · 70B · 405B
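
    Hybrid reasoning via explicit thinking tokens means the model brackets a reasoning trace in delimiter tokens before its visible answer, and the serving layer strips the trace. A minimal sketch of the pattern; the <think> tag and format here are illustrative assumptions, not a confirmed Hermes 4 template.

        import re

        # Toy completion in the thinking-token pattern: hidden reasoning
        # first, visible answer after the closing tag.
        raw_completion = (
            "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>"
            "17 * 24 = 408."
        )

        def split_thinking(text: str) -> tuple[str, str]:
            """Separate the reasoning trace from the visible answer."""
            m = re.match(r"<think>(.*?)</think>(.*)", text, flags=re.DOTALL)
            if m:
                return m.group(1).strip(), m.group(2).strip()
            return "", text.strip()

        trace, answer = split_thinking(raw_completion)
        print("reasoning:", trace)
        print("answer:", answer)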

    IBM Granite 4.1

    General

    IBM

    IBM's enterprise-focused April 29, 2026 release — a family of dense models in 3B, 8B, and 30B sizes plus an Embedding R2 model and a 2B Speech variant. The 8B Instruct matches the previous-generation Granite 4.0 32B MoE on benchmarks. Apache 2.0 with 12+ language coverage.

    3B · 8B · 30B

    InternLM

    Multilingual

    Shanghai AI Lab

    Shanghai AI Laboratory's multilingual model series in 7B and 20B sizes, featuring strong Chinese-English capabilities, long-context support, and excellent performance on reasoning and tool-use benchmarks.

    7B · 20B

    Kimi K2

    Reasoning

    Moonshot AI

    Moonshot AI's original 2025 trillion-parameter mixture-of-experts model — the foundation of the Kimi K2 series, with K2.5 setting the open-weight HumanEval record at 99.0 and K2.6 introducing Agent Swarm orchestration. Modified MIT license.

    1T-A32B

    Kimi K2.5

    Reasoning

    Moonshot AI

    Moonshot AI's January 2026 release — the first multimodal Kimi model, adding the MoonViT-3D vision encoder to the K2 lineage's 1T-parameter mixture-of-experts architecture. Set the open-weight HumanEval record at 99.0 and introduced the original 100-agent swarm runtime that K2.6 later scaled to 300.

    1T-A32B

    Kimi K2.6

    Reasoning

    Moonshot AI

    Moonshot AI's April 2026 release: a 1 trillion parameter mixture-of-experts model with 32B active parameters, native vision support, and the standout Agent Swarm capability that scales to 300 coordinated sub-agents over 4,000 steps for long-horizon coding and research tasks.

    1T-A32B

    Llama 3

    General

    Meta

    Meta's third-generation open-weight large language model family, delivering state-of-the-art performance across reasoning, code generation, and multilingual tasks in 8B, 70B, and 405B parameter configurations.

    8B · 70B · 405B

    Llama 4

    General

    Meta

    Meta's fourth-generation open-weight model family featuring a mixture-of-experts architecture, with Scout (109B total, 17B active) for efficient deployment and Maverick (400B total, 17B active) for high-capability tasks.

    Scout 109B (17B active) · Maverick 400B (17B active)

    Magistral

    Reasoning

    Mistral AI

    Mistral AI's dedicated reasoning model line — Magistral Medium 1.2 (magistral-medium-2509) and Magistral Small 1.2 (magistral-small-2509) — focused on extended chain-of-thought capability before the lineage was unified into Mistral Small 4.

    Small · Medium

    MiMo V2.5

    Code

    Xiaomi

    Xiaomi's April 28, 2026 mid-tier release — a 310-billion parameter mixture-of-experts with 15B active parameters, MIT-licensed and released alongside the larger MiMo V2.5 Pro flagship. The deployable mid-tier of the MiMo family for teams that don't need full Pro infrastructure.

    310B-A15B

    MiMo V2.5 Pro

    Code

    Xiaomi

    Xiaomi's April 2026 flagship — a 1.02 trillion parameter mixture-of-experts model with 42B active parameters, 1M token context, MIT license, and benchmark scores reportedly beating Claude Opus 4.6 on SWE-Bench Pro for agentic coding tasks.

    1T-A42B

    MiniMax M2.5

    Code

    MiniMax

    MiniMax's flagship coding model — leader on SWE-Bench Verified among open-weight models at 80.2% upon release, designed for agentic coding workloads. The M2.7 successor continues to extend the line.

    456B-A45B

    MiniMax M2.7

    Reasoning

    MiniMax

    MiniMax's March 2026 self-evolving release — improved through 100+ rounds of autonomous reinforcement learning, with native reasoning, 205K context, and the ability to perform 30-50% of an RL research workflow autonomously. The successor to M2.5 (the prior SWE-Bench Verified leader at 80.2%).

    456B-A45B

    Mistral 7B

    General

    Mistral AI

    Mistral AI's foundational 7-billion parameter model that punches well above its weight class, featuring sliding window attention and grouped-query attention for efficient long-context inference.

    7B
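
    Sliding window attention caps how far back each position can look: token i attends only to the previous W tokens, so attention cost grows linearly with sequence length rather than quadratically. A minimal mask sketch; Mistral 7B's published window is 4096, and the tiny window below is just so the output prints.

        import numpy as np

        def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
            """Causal sliding-window mask: position i may attend to
            positions j with i - window < j <= i."""
            i = np.arange(seq_len)[:, None]
            j = np.arange(seq_len)[None, :]
            return (j <= i) & (j > i - window)

        # Mistral 7B uses window=4096; 3 keeps the demo readable.
        print(sliding_window_mask(6, 3).astype(int))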

    Mistral Small 4

    General

    Mistral AI

    Mistral's March 2026 release that unifies the previously separate Magistral (reasoning), Devstral (coding agents), and Mistral Small (instruction-tuned) lineages into a single 119B mixture-of-experts model with 6B active parameters, released under Apache 2.0.

    119B-A6B

    Mixtral

    General

    Mistral AI

    Mistral AI's mixture-of-experts models route each token through 2 of 8 expert networks; the 8x7B variant delivers 70B-class performance at roughly the inference cost of a 13B dense model. The routing is sketched below.

    8x7B · 8x22B
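
    The routing works roughly as follows: a small linear gate scores all eight experts for each token, the two highest-scoring experts run, and their outputs are mixed by renormalized gate weights. A minimal sketch with illustrative shapes and random stand-in weights.

        import numpy as np

        def top2_route(x, W_gate, experts):
            """Mixtral-style top-2 routing for a single token embedding."""
            logits = W_gate @ x             # one score per expert
            top2 = np.argsort(logits)[-2:]  # indices of the 2 best experts
            w = np.exp(logits[top2] - logits[top2].max())
            w /= w.sum()                    # softmax over the chosen pair
            return sum(wi * experts[i](x) for wi, i in zip(w, top2))

        rng = np.random.default_rng(0)
        d, num_experts = 16, 8
        # Stand-in experts: random linear maps instead of real FFN blocks.
        experts = [lambda x, W=rng.normal(size=(d, d)): W @ x
                   for _ in range(num_experts)]
        y = top2_route(rng.normal(size=d),
                       rng.normal(size=(num_experts, d)), experts)
        print(y.shape)  # (16,)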

    Nemotron 3 Nano Omni

    Multilingual

    NVIDIA

    NVIDIA's April 29, 2026 omni-modal release — a 30-billion parameter mixture-of-experts with 3B active parameters per token, unified text/vision/audio/image processing, 9× the throughput of other open omni models on video workloads, and deployment within 25 GB of RAM. Production adopters at release: Foxconn, Palantir, Oracle, DocuSign.

    30B-A3B

    Neural Chat

    General

    Intel

    Intel's 7-billion parameter conversational model fine-tuned from Mistral 7B, optimized for Intel hardware and demonstrating strong chat performance with particular focus on CPU inference efficiency.

    7B

    OLMo

    General

    Allen AI

    Allen Institute for AI's fully open language model family in 1B, 7B, and 13B sizes, with completely open training data, code, weights, and evaluation — setting the standard for reproducible AI research.

    1B · 7B · 13B

    OpenChat

    General

    OpenChat

    A 7-billion parameter model fine-tuned from Mistral 7B using Conditioned Reinforcement Learning Fine-Tuning (C-RLFT), achieving GPT-3.5-level performance through a novel mixed-quality data training approach.

    7B

    Phi-3

    Small

    Microsoft

    Microsoft's family of compact yet capable language models available in 3.8B, 7B, and 14B sizes, designed for on-device and edge deployment with surprisingly strong performance on reasoning and instruction-following tasks.

    3.8B · 7B · 14B

    Phi-4

    Small

    Microsoft

    Microsoft's 14-billion parameter small language model that emphasizes reasoning quality through synthetic data training, achieving performance competitive with models several times its size on math and logic benchmarks.

    14B

    Qwen 2.5

    Multilingual

    Alibaba

    Alibaba's comprehensive open-weight model family spanning seven sizes from 0.5B to 72B parameters, with particularly strong multilingual and coding capabilities across 29+ languages.

    0.5B · 1.5B · 3B

    Qwen 3

    Multilingual

    Alibaba

    Alibaba's latest-generation model family featuring both dense and mixture-of-experts architectures, with sizes from 0.6B to 235B and built-in hybrid thinking modes for adaptive reasoning depth.

    0.6B · 1.7B · 4B

    Qwen 3.5

    Reasoning

    Alibaba

    Alibaba's February 2026 flagship reasoning release — a 397B-A17B mixture-of-experts model that currently leads the open-weight GPQA Diamond benchmark at 88.4, with sibling variants from 0.8B through 122B-A10B. Apache 2.0.

    0.8B · 2B · 4B

    Qwen 3.6

    Multilingual

    Alibaba

    Alibaba's April 2026 flagship release combining a fully dense 27B variant that beats the previous-generation 397B reasoning model on coding, alongside a 35B-A3B mixture-of-experts variant for ultra-efficient inference, all under Apache 2.0.

    27B · 35B-A3B

    Qwen3-Coder

    Code

    Alibaba

    Alibaba's specialized coding model line — including the 480B-A35B Qwen3-Coder flagship with 256K-1M context and the 80B-A3B Qwen3-Coder-Next, both designed natively for agentic coding CLIs such as Claude Code, Cline, and Qwen Code. Apache 2.0.

    30B-A3B · 80B-A3B (Next) · 480B-A35B

    Qwen3-Coder-Next

    Code

    Alibaba

    Alibaba's February 2026 small-giant release — an 80-billion parameter mixture-of-experts model with only 3B active parameters per token, outperforming DeepSeek V3.2 (37B active), Kimi K2.5 and GLM-4.7 (32B active each) on coding benchmarks while activating 10× fewer parameters. Apache 2.0 with 256K context.

    80B-A3B

    Qwen3-Omni

    Multilingual

    Alibaba

    Alibaba's omni-modal model — accepting text, image, audio, and video input and producing text plus realtime speech output in a single 30B-A3B mixture-of-experts checkpoint. Apache 2.0.

    30B-A3B

    Qwen3.5-Omni

    Multilingual

    Alibaba

    Alibaba's March 30, 2026 omni-modal release — Plus, Flash, and Light variants supporting 113 speech-input languages, 256K context (10 hours of audio or 400 seconds of 720p video), and beating Gemini 3.1 Pro on audio benchmarks. The architectural and capability successor to Qwen3-Omni.

    Light (edge) · Flash (latency) · Plus (flagship)

    SmolLM

    Small

    HuggingFace

    HuggingFace's family of ultra-compact language models in 135M, 360M, and 1.7B sizes, trained on the SmolLM-Corpus (including the high-quality Cosmopedia synthetic dataset) and designed for on-device AI applications with minimal resource requirements.

    135M · 360M · 1.7B

    SOLAR

    General

    Upstage

    Upstage's 10.7-billion parameter model created through depth up-scaling, a novel technique that merges and extends a pretrained model's layers to achieve larger-model quality at efficient inference cost.

    10.7B
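
    Mechanically, depth up-scaling duplicates the pretrained layer stack, trims the overlap at the seam, concatenates, and continues pretraining to heal the join. A sketch of the layer bookkeeping, using the 32-layer base and 8-layer trim described for SOLAR.

        def depth_upscale(layers, m=8):
            """Duplicate a layer stack, dropping the last m layers from one
            copy and the first m from the other, then concatenate."""
            return layers[:-m] + layers[m:]

        base = [f"layer_{i}" for i in range(32)]  # stand-in transformer blocks
        scaled = depth_upscale(base)
        print(len(base), "->", len(scaled))       # 32 -> 48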

    StarCoder

    Code

    BigCode / HuggingFace

    An open-access code generation model trained on permissively licensed source code, available in 3B, 7B, and 15B sizes with transparent training data governance and strong multi-language programming support.

    3B · 7B · 15B

    StepFun Step-3.5-Flash

    Reasoning

    StepFun

    StepFun's February 2026 small-giant release — a 196-billion parameter mixture-of-experts with 11B active parameters, outperforming Kimi K2.5 (1T) and DeepSeek V3.2 (671B) on agentic, reasoning, and coding benchmarks at 3-5× smaller scale. Apache 2.0 with 100 tok/sec at 128K context on Hopper GPUs.

    196B-A11B

    Tencent Hy3 (Hunyuan 3) Preview

    Reasoning

    Tencent

    Tencent's April 23, 2026 comeback release — a 295-billion parameter mixture-of-experts with 21B active parameters plus a 3.8B Multi-Token Prediction module, built in 90 days under former OpenAI researcher Shunyu Yao after a complete Hunyuan infrastructure rebuild. 256K context with strong math, code, and multilingual performance.

    295B-A21B + 3.8B MTP

    TinyLlama

    Small

    TinyLlama Team

    A compact 1.1-billion parameter model trained on 3 trillion tokens — far more data than typical for its size — delivering surprisingly capable performance for edge deployment, mobile applications, and resource-constrained environments.

    1.1B
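
    How far past the compute-optimal point is that? Chinchilla-style scaling suggests roughly 20 training tokens per parameter; TinyLlama's ratio is two orders of magnitude higher, trading extra training compute for a stronger model at a fixed inference budget.

        # Tokens-per-parameter ratio vs the ~20 suggested by
        # Chinchilla-style compute-optimal scaling.
        params, tokens = 1.1e9, 3e12
        print(f"{tokens / params:.0f} tokens per parameter")  # ~2727 vs ~20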

    Vicuna

    General

    LMSYS

    LMSYS's instruction-tuned model family in 7B, 13B, and 33B sizes, fine-tuned from Llama on ShareGPT conversations and widely recognized for pioneering open-source chatbot evaluation methodology.

    7B · 13B · 33B

    Yi

    Multilingual

    01.AI

    01.AI's bilingual Chinese-English model family available in 6B, 9B, and 34B sizes, known for strong performance on both Chinese and English benchmarks with excellent instruction-following capabilities.

    6B · 9B · 34B

    Zephyr

    General

    HuggingFace

    HuggingFace's 7-billion parameter model fine-tuned from Mistral 7B using distilled direct preference optimization (dDPO), demonstrating that alignment techniques can produce highly capable chat models without human preference data.

    7B
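
    At the core of dDPO is the standard direct preference optimization loss, applied to preference pairs labeled by a teacher model rather than by humans. A minimal per-pair sketch; the log-probabilities below are toy numbers.

        import math

        def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected,
                     beta=0.1):
            """DPO loss for one preference pair, from summed log-probs of
            the chosen/rejected responses under the policy (pi_*) and the
            frozen reference model (ref_*)."""
            margin = beta * ((pi_chosen - ref_chosen)
                             - (pi_rejected - ref_rejected))
            return -math.log(1.0 / (1.0 + math.exp(-margin)))

        # Toy log-probs: the policy already slightly prefers the chosen
        # response relative to the reference model.
        print(round(dpo_loss(-12.0, -15.0, -13.0, -14.5), 2))  # 0.62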