AI Models

    Open-source models you can fine-tune with Ertas.

    Ant Group Ling / Ring

    Reasoning

    Ant Group (inclusionAI)

    Ant Group's trillion-parameter open-weight family — Ling-2.5-1T (non-thinking, 1M context) and Ring-2.5-1T (the world's first hybrid-linear-architecture thinking model, gold-tier on both IMO 2025 with a 35/42 score and CMO 2025) — plus the April 2026 Ling-2.6-1T update.

    1T (Ling/Ring 2.5) · 1T (Ling 2.6)

    Arcee Trinity Large

    Reasoning

    Arcee AI

    Arcee AI's January 2026 release — a 400-billion parameter mixture-of-experts with 13B active parameters and 256 experts (4 active per token), trained on 17 trillion tokens over 30-33 days on 2,048 NVIDIA B300 chips. One of the few US-made frontier open-weight models in 2026, alongside OLMo 3 and GPT-OSS.

    400B-A13B
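
    The active-parameter figure in mixture-of-experts entries like this one follows from simple arithmetic: each token's forward pass touches the shared weights plus only its routed experts. A minimal Python sketch; the shared/expert split below is an assumption chosen to reproduce the quoted figures, not Arcee's published breakdown.

        # Back-of-envelope MoE arithmetic for a 400B-A13B model.
        # The shared/expert split is assumed, not Arcee's published numbers.
        NUM_EXPERTS = 256      # experts per MoE layer
        ACTIVE_EXPERTS = 4     # experts routed to each token
        EXPERT_PARAMS = 393e9  # assumed: parameters living in expert FFNs
        SHARED_PARAMS = 7e9    # assumed: attention, embeddings, router

        total = SHARED_PARAMS + EXPERT_PARAMS
        active = SHARED_PARAMS + EXPERT_PARAMS * ACTIVE_EXPERTS / NUM_EXPERTS
        print(f"total  ~ {total / 1e9:.0f}B")   # ~ 400B
        print(f"active ~ {active / 1e9:.1f}B")  # ~ 13.1B per token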

    Code Llama

    Code

    Meta

    Meta's specialized code generation model family built on Llama 2, available in 7B, 13B, 34B, and 70B sizes with variants optimized for code completion, instruction following, and Python development.

    7B · 13B · 34B

    Command R

    General

    Cohere

    Cohere's enterprise-focused model family in 35B and 104B sizes, purpose-built for retrieval-augmented generation (RAG) with native citation support, tool use, and multilingual capability across 10+ languages.

    35B · 104B

    DeepSeek V3.2

    Reasoning

    DeepSeek

    DeepSeek's late-2025 release that introduced DeepSeek Sparse Attention (DSA) — a learned sparse attention mechanism enabling efficient long-context inference, paired with a unified thinking mode toggle. Direct predecessor to DeepSeek V4. MIT-style license.

    671B-A37B
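
    DSA's internals are DeepSeek's own; the sketch below shows only the core idea behind sparse attention: score every key cheaply, keep the top-k, and run softmax attention over the survivors, so per-query cost scales with k rather than with sequence length. All names here are illustrative.

        import numpy as np

        def topk_sparse_attention(q, K, V, k=16):
            """Generic top-k sparse attention for a single query vector.
            Illustrative only; NOT DeepSeek's actual DSA, whose key
            selection is learned rather than a raw score cutoff."""
            scores = K @ q / np.sqrt(q.shape[0])    # (seq_len,)
            idx = np.argpartition(scores, -k)[-k:]  # top-k key positions
            w = np.exp(scores[idx] - scores[idx].max())
            w /= w.sum()                            # softmax over survivors
            return w @ V[idx]                       # mix only k values

        rng = np.random.default_rng(0)
        d, seq_len = 64, 1024
        out = topk_sparse_attention(rng.normal(size=d),
                                    rng.normal(size=(seq_len, d)),
                                    rng.normal(size=(seq_len, d)))
        print(out.shape)  # (64,)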

    DeepSeek V4

    Reasoning

    DeepSeek

    DeepSeek's April 2026 flagship — a 1.6 trillion parameter mixture-of-experts model with 49B active parameters and 1M token context, currently leading composite open-weight intelligence benchmarks and reportedly closing the gap with frontier closed-source models.

    284B-A13B (Flash) · 1.6T-A49B (Pro)

    DeepSeek-R1

    Reasoning

    DeepSeek

    DeepSeek's dedicated reasoning model trained with reinforcement learning to perform extended chain-of-thought reasoning, available in distilled sizes from 1.5B to 70B and the full 671B mixture-of-experts architecture.

    1.5B · 7B · 8B

    DeepSeek-V3

    General

    DeepSeek

    DeepSeek's flagship 671-billion parameter mixture-of-experts model with 37B active parameters per token, delivering frontier-level general performance at remarkably efficient inference costs.

    671B-A37B

    Devstral 2

    Code

    Mistral AI

    Mistral AI's coding-specialized open-weight family — Devstral 2 (123B) and Devstral Small 2 (24B), with the 123B variant scoring 72.2% on SWE-Bench Verified and the 24B running on consumer hardware. Released as a coding specialist line before being absorbed into Mistral Small 4's unified architecture in March 2026.

    24B (Small 2) · 123B

    Falcon

    General

    TII Abu Dhabi

    The Technology Innovation Institute's open-weight model family in 7B, 40B, and 180B sizes, trained on the massive RefinedWeb dataset and pioneering the use of high-quality filtered web data for LLM training.

    7B · 40B · 180B

    Falcon H1R-7B

    Reasoning

    TII

    TII's January 2026 hybrid Mamba+Transformer architecture — a 7-billion parameter model with 256K context that scores 83.1% on AIME 2025, outperforming reasoning models up to 7× its size on math benchmarks.

    7B

    Falcon-H1 Arabic

    Multilingual

    TII

    Technology Innovation Institute's January 2026 Arabic-specialized release — three sizes (3B, 7B, 34B) with hybrid Mamba+Transformer architecture, leading the Open Arabic LLM Leaderboard. The 34B variant beats Llama 3.3 70B at less than half the parameter count on Arabic-specific benchmarks.

    3B · 7B · 34B

    Falcon-H1-Tiny

    Small

    TII

    Technology Innovation Institute's January 2026 ultra-small model collection — 15 variants under 100M parameters plus a 600M reasoning model (Falcon-H1-Tiny-R-0.6B), all using the hybrid Mamba+Transformer architecture; among the smallest viable LLMs of 2026, aimed at browser and microcontroller deployment.

    ~50M · ~135M · ~360M

    Gemma 3

    General

    Google

    Google's third-generation open-weight model family built on Gemini technology, available in 1B, 4B, 12B, and 27B sizes with native multimodal vision-language capabilities and a 128K token context window.

    1B · 4B · 12B

    Gemma 4

    General

    Google

    Google's April 2026 open-weight model family — the first Gemma generation released under Apache 2.0, spanning a dense 31B flagship, a 26B-A3.8B mixture-of-experts variant, and edge-optimized 4B and 2B models, all with native multimodal capabilities.

    2B (e2b) · 4B (e4b) · 26B-A3.8B

    GLM-4.5

    General

    Z.ai

    Z.ai's July 2025 mixture-of-experts release — 355 billion total parameters with 32 billion active per token, designed to run on 8× NVIDIA H20 chips. The workhorse predecessor to the GLM-5 flagship.

    355B-A32B

    GLM-4.6

    General

    Z.ai

    Z.ai's late-2025 mid-tier release — a 355-billion parameter mixture-of-experts with 200K context, coding performance near parity with Claude Sonnet 4, and ~15% fewer tokens consumed per task than its predecessor. Companion vision variants GLM-4.6V (106B and 9B) extend the family to multimodal use cases.

    355B-A32B

    GLM-4.7

    Code

    Z.ai

    Z.ai's December 2025 coding-focused release — a 400-billion parameter mixture-of-experts with 'Preserved Thinking' multi-turn reasoning, plus a smaller GLM-4.7 Flash variant for production serving. Topped Code Arena among open-weight models at release before being succeeded by the GLM-5 series.

    ~400B (Flagship) · Flash (smaller)

    GLM-5

    Reasoning

    Z.ai

    Z.ai's February 2026 flagship — a 745-billion parameter model trained on Huawei Ascend chips, the foundation of the GLM-5 series before the April 2026 GLM-5.1 update added substantial post-training improvements. Z.ai went public on the Hong Kong Stock Exchange in January 2026.

    745B

    GLM-5.1

    Reasoning

    Z.ai

    Z.ai's April 8, 2026 update to GLM-5 — same 745-billion parameter base with refined post-training, delivering a 28% coding improvement, 8-hour autonomous run capability, and a SWE-Bench Pro lead that briefly placed an open-weight model ahead of GPT-5.4 and Claude Opus 4.6.

    745B

    GPT-OSS

    General

    OpenAI

    OpenAI's first open-weight model release since GPT-2 — a mixture-of-experts family with the 117B/5.1B-active GPT-OSS-120B flagship and a smaller 21B/3.6B-active GPT-OSS-20B variant, released August 2025 under Apache 2.0.

    21B-A3.6B (20b) · 117B-A5.1B (120b)

    Hermes 4

    Reasoning

    Nous Research

    Nous Research's August 2025 model family — Llama-3.1-based fine-tunes in 14B, 70B, and 405B sizes featuring hybrid reasoning via explicit thinking tokens and neutrally-aligned post-training, trained on ~60B tokens with the Atropos reinforcement learning system and ~1,000 task-specific verifiers.

    14B · 70B · 405B
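
    Hybrid reasoning via explicit thinking tokens means the model brackets a reasoning trace in delimiter tokens before its visible answer, and the serving layer strips the trace. A minimal sketch of the pattern; the <think> tag and format here are illustrative assumptions, not a confirmed Hermes 4 template.

        import re

        # Toy completion in the thinking-token pattern: hidden reasoning
        # first, visible answer after the closing tag.
        raw_completion = (
            "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>"
            "17 * 24 = 408."
        )

        def split_thinking(text: str) -> tuple[str, str]:
            """Separate the reasoning trace from the visible answer."""
            m = re.match(r"<think>(.*?)</think>(.*)", text, flags=re.DOTALL)
            if m:
                return m.group(1).strip(), m.group(2).strip()
            return "", text.strip()

        trace, answer = split_thinking(raw_completion)
        print("reasoning:", trace)
        print("answer:", answer)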

    IBM Granite 4.1

    General

    IBM

    IBM's enterprise-focused April 29, 2026 release — a family of dense models in 3B, 8B, and 30B sizes plus an Embedding R2 model and a 2B Speech variant. The 8B Instruct matches the previous-generation Granite 4.0 32B MoE on benchmarks. Apache 2.0 with 12+ language coverage.

    3B · 8B · 30B

    InternLM

    Multilingual

    Shanghai AI Lab

    Shanghai AI Laboratory's multilingual model series in 7B and 20B sizes, featuring strong Chinese-English capabilities, long-context support, and excellent performance on reasoning and tool-use benchmarks.

    7B · 20B

    Kimi K2

    Reasoning

    Moonshot AI

    Moonshot AI's original 2025 trillion-parameter mixture-of-experts model — the foundation of the Kimi K2 series, with K2.5 setting the open-weight HumanEval record at 99.0 and K2.6 introducing Agent Swarm orchestration. Modified MIT license.

    1T-A32B

    Kimi K2.5

    Reasoning

    Moonshot AI

    Moonshot AI's January 2026 release — the first multimodal Kimi model, adding the MoonViT-3D vision encoder to the K2 lineage's 1T-parameter mixture-of-experts architecture. Set the open-weight HumanEval record at 99.0 and introduced the original 100-agent swarm runtime that K2.6 later scaled to 300.

    1T-A32B

    Kimi K2.6

    Reasoning

    Moonshot AI

    Moonshot AI's April 2026 release: a 1 trillion parameter mixture-of-experts model with 32B active parameters, native vision support, and the standout Agent Swarm capability that scales to 300 coordinated sub-agents over 4,000 steps for long-horizon coding and research tasks.

    1T-A32B

    Llama 3

    General

    Meta

    Meta's third-generation open-weight large language model family, delivering state-of-the-art performance across reasoning, code generation, and multilingual tasks in 8B, 70B, and 405B parameter configurations.

    8B · 70B · 405B

    Llama 4

    General

    Meta

    Meta's fourth-generation open-weight model family featuring a mixture-of-experts architecture, with Scout (109B total, 17B active) for efficient deployment and Maverick (400B total, 17B active) for high-capability tasks.

    Scout 109B (17B active) · Maverick 400B (17B active)

    Magistral

    Reasoning

    Mistral AI

    Mistral AI's dedicated reasoning model line — Magistral Medium 1.2 (magistral-medium-2509) and Magistral Small 1.2 (magistral-small-2509) — focused on extended chain-of-thought capability before the lineage was unified into Mistral Small 4.

    Small · Medium

    MiMo V2.5

    Code

    Xiaomi

    Xiaomi's April 28, 2026 mid-tier release — a 310-billion parameter mixture-of-experts with 15B active parameters, MIT-licensed and released alongside the larger MiMo V2.5 Pro flagship. The deployable mid-tier of the MiMo family for teams that don't need full Pro infrastructure.

    310B-A15B

    MiMo V2.5 Pro

    Code

    Xiaomi

    Xiaomi's April 2026 flagship — a 1.02 trillion parameter mixture-of-experts model with 42B active parameters, 1M token context, MIT license, and benchmark scores reportedly beating Claude Opus 4.6 on SWE-Bench Pro for agentic coding tasks.

    1T-A42B

    MiniMax M2.5

    Code

    MiniMax

    MiniMax's flagship coding model — leader on SWE-Bench Verified among open-weight models at 80.2% upon release, designed for agentic coding workloads. The M2.7 successor continues to extend the line.

    456B-A45B

    MiniMax M2.7

    Reasoning

    MiniMax

    MiniMax's March 2026 self-evolving release — improved through 100+ rounds of autonomous reinforcement learning, with native reasoning, 205K context, and the ability to perform 30-50% of an RL research workflow autonomously. The successor to M2.5 (the prior SWE-Bench Verified leader at 80.2%).

    456B-A45B

    Mistral 7B

    General

    Mistral AI

    Mistral AI's foundational 7-billion parameter model that punches well above its weight class, featuring sliding window attention and grouped-query attention for efficient long-context inference.

    7B
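
    Sliding window attention caps how far back each position can look: token i attends only to the previous W tokens, so attention cost grows linearly with sequence length rather than quadratically. A minimal mask sketch; Mistral 7B's published window is 4096, and the tiny window below is just so the output prints.

        import numpy as np

        def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
            """Causal sliding-window mask: position i may attend to
            positions j with i - window < j <= i."""
            i = np.arange(seq_len)[:, None]
            j = np.arange(seq_len)[None, :]
            return (j <= i) & (j > i - window)

        # Mistral 7B uses window=4096; 3 keeps the demo readable.
        print(sliding_window_mask(6, 3).astype(int))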

    Mistral Small 4

    General

    Mistral AI

    Mistral's March 2026 release that unifies the previously separate Magistral (reasoning), Devstral (coding agents), and Mistral Small (instruction-tuned) lineages into a single 119B mixture-of-experts model with 6B active parameters, released under Apache 2.0.

    119B-A6B

    Mixtral

    General

    Mistral AI

    Mistral AI's mixture-of-experts models route each token through 2 of 8 expert networks; the 8x7B variant delivers 70B-class performance at roughly the inference cost of a 13B dense model. The routing is sketched below.

    8x7B · 8x22B
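
    The routing works roughly as follows: a small linear gate scores all eight experts for each token, the two highest-scoring experts run, and their outputs are mixed by renormalized gate weights. A minimal sketch with illustrative shapes and random stand-in weights.

        import numpy as np

        def top2_route(x, W_gate, experts):
            """Mixtral-style top-2 routing for a single token embedding."""
            logits = W_gate @ x             # one score per expert
            top2 = np.argsort(logits)[-2:]  # indices of the 2 best experts
            w = np.exp(logits[top2] - logits[top2].max())
            w /= w.sum()                    # softmax over the chosen pair
            return sum(wi * experts[i](x) for wi, i in zip(w, top2))

        rng = np.random.default_rng(0)
        d, num_experts = 16, 8
        # Stand-in experts: random linear maps instead of real FFN blocks.
        experts = [lambda x, W=rng.normal(size=(d, d)): W @ x
                   for _ in range(num_experts)]
        y = top2_route(rng.normal(size=d),
                       rng.normal(size=(num_experts, d)), experts)
        print(y.shape)  # (16,)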

    Nemotron 3 Nano Omni

    Multilingual

    NVIDIA

    NVIDIA's April 29, 2026 omni-modal release — a 30-billion parameter mixture-of-experts with 3B active parameters per token, unified text/vision/audio/image processing, 9× the throughput of other open omni models on video workloads, and deployment within 25 GB of RAM. Production adopters at release: Foxconn, Palantir, Oracle, DocuSign.

    30B-A3B

    Neural Chat

    General

    Intel

    Intel's 7-billion parameter conversational model fine-tuned from Mistral 7B, optimized for Intel hardware and demonstrating strong chat performance with particular focus on CPU inference efficiency.

    7B

    OLMo

    General

    Allen AI

    Allen Institute for AI's fully open language model family in 1B, 7B, and 13B sizes, with completely open training data, code, weights, and evaluation — setting the standard for reproducible AI research.

    1B · 7B · 13B

    OpenChat

    General

    OpenChat

    A 7-billion parameter model fine-tuned from Mistral 7B using Conditioned Reinforcement Learning Fine-Tuning (C-RLFT), achieving GPT-3.5-level performance through a novel mixed-quality data training approach.

    7B

    Phi-3

    Small

    Microsoft

    Microsoft's family of compact yet capable language models available in 3.8B, 7B, and 14B sizes, designed for on-device and edge deployment with surprisingly strong performance on reasoning and instruction-following tasks.

    3.8B · 7B · 14B

    Phi-4

    Small

    Microsoft

    Microsoft's 14-billion parameter small language model that emphasizes reasoning quality through synthetic data training, achieving performance competitive with models several times its size on math and logic benchmarks.

    14B

    Qwen 2.5

    Multilingual

    Alibaba

    Alibaba's comprehensive open-weight model family spanning seven sizes from 0.5B to 72B parameters, with particularly strong multilingual and coding capabilities across 29+ languages.

    0.5B · 1.5B · 3B

    Qwen 3

    Multilingual

    Alibaba

    Alibaba's latest-generation model family featuring both dense and mixture-of-experts architectures, with sizes from 0.6B to 235B and built-in hybrid thinking modes for adaptive reasoning depth.

    0.6B · 1.7B · 4B

    Qwen 3.5

    Reasoning

    Alibaba

    Alibaba's February 2026 flagship reasoning release — a 397B-A17B mixture-of-experts model that currently leads the open-weight GPQA Diamond benchmark at 88.4, with sibling variants from 0.8B through 122B-A10B. Apache 2.0.

    0.8B · 2B · 4B

    Qwen 3.6

    Multilingual

    Alibaba

    Alibaba's April 2026 flagship release combining a fully dense 27B variant that beats the previous-generation 397B reasoning model on coding, alongside a 35B-A3B mixture-of-experts variant for ultra-efficient inference, all under Apache 2.0.

    27B · 35B-A3B

    Qwen3-Coder

    Code

    Alibaba

    Alibaba's specialized coding model line — including the 480B-A35B Qwen3-Coder flagship with 256K-1M context and the 80B-A3B Qwen3-Coder-Next, both designed natively for agentic coding CLIs such as Claude Code, Cline, and Qwen Code. Apache 2.0.

    30B-A3B · 80B-A3B (Next) · 480B-A35B

    Qwen3-Coder-Next

    Code

    Alibaba

    Alibaba's February 2026 small-giant release — an 80-billion parameter mixture-of-experts model with only 3B active parameters per token, outperforming DeepSeek V3.2 (37B active), Kimi K2.5 and GLM-4.7 (32B active each) on coding benchmarks while activating 10× fewer parameters. Apache 2.0 with 256K context.

    80B-A3B

    Qwen3-Omni

    Multilingual

    Alibaba

    Alibaba's omni-modal model — accepting text, image, audio, and video input and producing text plus realtime speech output in a single 30B-A3B mixture-of-experts checkpoint. Apache 2.0.

    30B-A3B

    Qwen3.5-Omni

    Multilingual

    Alibaba

    Alibaba's March 30, 2026 omni-modal release — Plus, Flash, and Light variants supporting 113 speech-input languages, 256K context (10 hours of audio or 400 seconds of 720p video), and beating Gemini 3.1 Pro on audio benchmarks. The architectural and capability successor to Qwen3-Omni.

    Light (edge) · Flash (latency) · Plus (flagship)

    SmolLM

    Small

    HuggingFace

    HuggingFace's family of ultra-compact language models in 135M, 360M, and 1.7B sizes, trained on the SmolLM-Corpus (including the high-quality Cosmopedia synthetic dataset) and designed for on-device AI applications with minimal resource requirements.

    135M · 360M · 1.7B

    SOLAR

    General

    Upstage

    Upstage's 10.7-billion parameter model created through depth up-scaling, a novel technique that merges and extends a pretrained model's layers to achieve larger-model quality at efficient inference cost.

    10.7B
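
    Mechanically, depth up-scaling duplicates the pretrained layer stack, trims the overlap at the seam, concatenates, and continues pretraining to heal the join. A sketch of the layer bookkeeping, using the 32-layer base and 8-layer trim described for SOLAR.

        def depth_upscale(layers, m=8):
            """Duplicate a layer stack, dropping the last m layers from one
            copy and the first m from the other, then concatenate."""
            return layers[:-m] + layers[m:]

        base = [f"layer_{i}" for i in range(32)]  # stand-in transformer blocks
        scaled = depth_upscale(base)
        print(len(base), "->", len(scaled))       # 32 -> 48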

    StarCoder

    Code

    BigCode / HuggingFace

    An open-access code generation model trained on permissively licensed source code, available in 3B, 7B, and 15B sizes with transparent training data governance and strong multi-language programming support.

    3B · 7B · 15B

    StepFun Step-3.5-Flash

    Reasoning

    StepFun

    StepFun's February 2026 small-giant release — a 196-billion parameter mixture-of-experts with 11B active parameters, outperforming Kimi K2.5 (1T) and DeepSeek V3.2 (671B) on agentic, reasoning, and coding benchmarks at 3-5× smaller scale. Apache 2.0 with 100 tok/sec at 128K context on Hopper GPUs.

    196B-A11B

    Tencent Hy3 (Hunyuan 3) Preview

    Reasoning

    Tencent

    Tencent's April 23, 2026 comeback release — a 295-billion parameter mixture-of-experts with 21B active parameters plus a 3.8B Multi-Token Prediction module, built in 90 days under former OpenAI researcher Shunyu Yao after a complete Hunyuan infrastructure rebuild. 256K context with strong math, code, and multilingual performance.

    295B-A21B + 3.8B MTP

    TinyLlama

    Small

    TinyLlama Team

    A compact 1.1-billion parameter model trained on 3 trillion tokens — far more data than typical for its size — delivering surprisingly capable performance for edge deployment, mobile applications, and resource-constrained environments.

    1.1B
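
    How far past the compute-optimal point is that? Chinchilla-style scaling suggests roughly 20 training tokens per parameter; TinyLlama's ratio is two orders of magnitude higher, trading extra training compute for a stronger model at a fixed inference budget.

        # Tokens-per-parameter ratio vs the ~20 suggested by
        # Chinchilla-style compute-optimal scaling.
        params, tokens = 1.1e9, 3e12
        print(f"{tokens / params:.0f} tokens per parameter")  # ~2727 vs ~20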

    Vicuna

    General

    LMSYS

    LMSYS's instruction-tuned model family in 7B, 13B, and 33B sizes, fine-tuned from Llama on ShareGPT conversations and widely recognized for pioneering open-source chatbot evaluation methodology.

    7B · 13B · 33B

    Yi

    Multilingual

    01.AI

    01.AI's bilingual Chinese-English model family available in 6B, 9B, and 34B sizes, known for strong performance on both Chinese and English benchmarks with excellent instruction-following capabilities.

    6B · 9B · 34B

    Zephyr

    General

    HuggingFace

    HuggingFace's 7-billion parameter model fine-tuned from Mistral 7B using distilled direct preference optimization (dDPO), demonstrating that alignment techniques can produce highly capable chat models without human preference data.

    7B
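
    At the core of dDPO is the standard direct preference optimization loss, applied to preference pairs labeled by a teacher model rather than by humans. A minimal per-pair sketch; the log-probabilities below are toy numbers.

        import math

        def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected,
                     beta=0.1):
            """DPO loss for one preference pair, from summed log-probs of
            the chosen/rejected responses under the policy (pi_*) and the
            frozen reference model (ref_*)."""
            margin = beta * ((pi_chosen - ref_chosen)
                             - (pi_rejected - ref_rejected))
            return -math.log(1.0 / (1.0 + math.exp(-margin)))

        # Toy log-probs: the policy already slightly prefers the chosen
        # response relative to the reference model.
        print(round(dpo_loss(-12.0, -15.0, -13.0, -14.5), 2))  # 0.62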