Mein Newsfeed — Mistral Small 4

Mein Newsfeed — Mistral Small 4 https://newsfeed.avintaris.com News zum Thema Mistral Small 4 de Fri, 22 May 2026 21:22:05 +0000 Mistral Small 4: reasoning_effort Parameter nachträglich in Mistral API dokumentiert https://simonwillison.net/2026/Mar/16/mistral-small-4/ https://simonwillison.net/2026/Mar/16/mistral-small-4/ Mon, 23 Mar 2026 12:00:00 +0000 Mistral Small 4 Bei Launch am 16.03. fehlte Parameter in offizieller API-Doku. Am 23.03. nachgereicht — Werte: 'none' (schnelle Antworten, kein Chain-of-Thought) und 'high' (verbose Reasoning vergleichbar mit Magistral). Pro Request entscheiden ob Latenz oder Reasoning-Qualität priorisiert wird — ähnlich OpenAI reasoning.effort oder Anthropic Extended Thinking. Für razzfazz.ai: ein Modell für beide Workloads. Mistral Small 4: Mistral Small 4 released — 119B MoE, 6B aktive Parameter, Apache 2.0 https://mistral.ai/news/mistral-small-4 https://mistral.ai/news/mistral-small-4 Mon, 16 Mar 2026 12:00:00 +0000 Mistral Small 4 Erstes unified MoE-Modell. 119B Total, 128 Experten, 4 aktive pro Token (~6B aktiv, 8B inkl. Embedding/Output), 256k Context, multimodal Text+Image. Function Calling und JSON-Output nativ. Modell-IDs HF: Mistral-Small-4-119B-2603 (242 GB) und -NVFP4 (4-bit für NVIDIA Blackwell). Inference: vLLM (FLASH_ATTN_MLA), llama.cpp, LM Studio, SGLang, Transformers. Benchmarks (Mistral): LiveCodeBench > GPT-OSS 120B bei 20% weniger Output, GPQA Diamond 71.2%, AIME 2025 auf GPT-OSS-120B-Niveau. Latency: 40% schneller als Small 3. Throughput: 3× mehr Req/s. API $0.15/1M Input. Apache 2.0 — kommerziell uneingeschränkt. Mistral Small 4: Mistral Small 4 NVFP4-Variante & Speculative-Decoding 'Eagle' auf HF https://huggingface.co/collections/mistralai/mistral-small-4 https://huggingface.co/collections/mistralai/mistral-small-4 Mon, 16 Mar 2026 12:00:00 +0000 Mistral Small 4 Zusätzlich zum Hauptmodell: Mistral-Small-4-119B-2603-NVFP4 (NVIDIA FP4-Quantisierung für Blackwell, ~25% Original-Größe bei minimalem Qualitätsverlust) und -eagle (Eagle-Speculative-Decoding-Heads für 2-3× schnelleren Inference auf vLLM). Mistral deckt gesamtes Deployment-Spektrum ab — Datacenter (Blackwell + NVFP4) bis Consumer-GPUs via community GGUF-Quants. Für AMD/Strix Halo: NVFP4 irrelevant, llama.cpp Q-Quants Vulkan-kompatibel.