Ivonar: Native 1-Bit LLM Family.

Ivonar is a native 1-Bit LLM family. It trains real W1.58A8 BitLinear models (ternary weights at about 1.58 bits each, 8-bit activations) in PyTorch, using ternary-weight STE, INT8-activation STE, Mamba-3 mixers, MLA attention, and dense-to-sparse scaling across four tiers.


About Ivonar

Ivonar is a native 1-Bit LLM family developed independently in Germany. It spans four training scales: Nano, Mini, Medium, and High. Each tier shares the same technical direction while scaling model capacity, active parameters, context length, and dense-to-sparse architecture choices.

Nano / Mini / Medium / High

Nano

350M

A 350M dense tier for fast iteration and end-to-end validation. It uses 18 Mamba-3 blocks plus 6 MLA blocks, all parameters active, and no MoE routing.

Total Parameters: 350M

Active Parameters: 350M

Base Context: 4k

Long Context: 8k

Mini

1.5B

A compact 1.5B dense model and the first full production scale. It keeps all parameters active, combines Mamba-3 with MLA, and avoids expert routing.

Total Parameters: 1.5B

Active Parameters: 1.5B

Base Context: 8k

Long Context: 64k

Medium

8.0B / 3.0B active

An 8.0B total parameter tier with 3.0B active parameters. Sparse capacity is limited to feedforward blocks with Top-2 routing; attention and sequence mixers stay shared.

Total Parameters: 8.0B

Active Parameters: 3.0B

Base Context: 16k

Long Context: 128k

High

32.1B / 6.3B active

A 32.1B total parameter flagship tier with 6.3B active parameters. It extends the FFN-only sparse MoE pattern with Top-2 routing and the longest context target.

Total Parameters: 32.1B

Active Parameters: 6.3B

Base Context: 32k

Long Context: 256k
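The parameter figures for the four tiers can be put side by side to see how much of each model is active per token. The snippet below is only an illustrative summary of the numbers listed above; the `Tier` class and `TIERS` list are hypothetical names, not part of any Ivonar codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    total_b: float   # total parameters, in billions
    active_b: float  # parameters active per token, in billions

    @property
    def active_fraction(self) -> float:
        # Share of the model that participates in each forward pass.
        return self.active_b / self.total_b


# Values copied from the tier cards above.
TIERS = [
    Tier("Nano", 0.35, 0.35),
    Tier("Mini", 1.5, 1.5),
    Tier("Medium", 8.0, 3.0),
    Tier("High", 32.1, 6.3),
]

for tier in TIERS:
    print(f"{tier.name}: {tier.active_fraction:.1%} of parameters active")
```

The dense tiers come out at 100%, Medium at 37.5%, and High at roughly 19.6%.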

Team / About Us

Focused independent AI engineering from Germany.

Ivonar was founded and is led by Luis Oezdem. We operate as an independent research and engineering initiative focused on efficient 1-Bit LLM development. Architecture, training infrastructure, data pipeline design, and deployment direction are kept tightly aligned under one technical vision.

The operating model is deliberately execution-oriented: validate the system through Ivonar Nano, then scale carefully into Mini, Medium, and High with clear quality, efficiency, and deployment goals.

Research / Engineering / Deployment

Architecture & Core Features

The 1-Bit stack is built around real W1.58A8 BitLinear training in PyTorch: ternary-weight STE, INT8-activation STE, Mamba-3 state-space mixers, MLA latent attention, and FFN-only sparse routing where the larger tiers need it. Packed or custom kernels are future inference optimizations, not the current training GEMM.

W1.58A8 BitLinear Training

Ivonar trains real BitLinear models with ternary-weight STE and INT8-activation STE in PyTorch. Training does not yet use a custom bit-packed CUDA GEMM; packed or custom kernels are future inference work.
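To make the W1.58A8 idea concrete, here is a minimal plain-Python sketch of one BitLinear-style dot product. It assumes the absmean weight quantizer and per-vector absmax activation quantizer commonly described for BitNet b1.58; the page does not state which quantizers Ivonar actually uses, and all function names are hypothetical. Real training would run on PyTorch tensors with autograd rather than Python lists.

```python
def ternary_quantize(weights, eps=1e-5):
    """Absmean quantization (assumed): returns codes in {-1, 0, 1} and a scale."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def int8_quantize(activations, eps=1e-5):
    """Absmax quantization (assumed): returns codes in [-128, 127] and a scale."""
    scale = max(abs(a) for a in activations) / 127.0 + eps
    codes = [max(-128, min(127, round(a / scale))) for a in activations]
    return codes, scale


def bitlinear_forward(weights, activations):
    """Dot product on quantized values, rescaled back to real units."""
    w_codes, w_scale = ternary_quantize(weights)
    a_codes, a_scale = int8_quantize(activations)
    acc = sum(w * a for w, a in zip(w_codes, a_codes))  # integer accumulate
    return acc * w_scale * a_scale


def ste_update(weights, activations, target, lr=0.1):
    """One SGD step on the latent full-precision weights.

    Straight-through estimator: the forward pass uses quantized values,
    while the backward pass treats rounding as the identity function.
    """
    y = bitlinear_forward(weights, activations)
    grad_y = 2.0 * (y - target)  # derivative of (y - target) ** 2
    # With rounding treated as identity, dy/dw_i is just activation i.
    return [w - lr * grad_y * a for w, a in zip(weights, activations)]


latent = [0.9, -0.05, -1.2, 0.4]
x = [1.0, -2.0, 0.5, 0.0]
print(ternary_quantize(latent)[0])
print(ste_update(latent, x, target=1.0))
```

The point of the STE step is that the optimizer keeps updating full-precision latent weights even though every forward pass only ever sees ternary codes.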

Mamba-3 and MLA Hybrid

Mamba-3 state-space mixers handle efficient long-range sequence processing. MLA latent attention provides retrieval-focused attention, with Nano explicitly using 18 Mamba-3 blocks and 6 MLA blocks.
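As a rough illustration of the Nano block budget, the sketch below builds a 24-block layer list with 18 Mamba-3 and 6 MLA blocks. The even interleaving (one MLA block after every three Mamba-3 blocks) is purely an assumption for the example; the page gives the counts but not the ordering, and `hybrid_layout` is a hypothetical helper.

```python
def hybrid_layout(n_mamba=18, n_mla=6):
    """Return a list of block types with MLA blocks spaced evenly (assumed)."""
    total = n_mamba + n_mla
    if total % n_mla != 0:
        raise ValueError("this sketch only handles evenly divisible layouts")
    period = total // n_mla  # one MLA block closes each group of `period` blocks
    return ["mla" if (i + 1) % period == 0 else "mamba3" for i in range(total)]


layout = hybrid_layout()
print(layout[:8])
print(layout.count("mamba3"), layout.count("mla"))
```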

Dense and Sparse Tiers

Nano and Mini are dense models with no MoE. Medium and High use FFN-only sparse MoE with Top-2 routing, keeping sparse capacity out of attention and sequence-mixing blocks.
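Top-2 routing over feedforward experts can be sketched in a few lines. This is a generic illustration, not Ivonar's router: the softmax gating, the renormalization of the two selected gates, the number of experts, and the names `top2_route` and `moe_ffn` are all assumptions made for the example.

```python
import math


def top2_route(router_logits):
    """Return [(expert_index, gate_weight), ...] for the two best experts."""
    peak = max(router_logits)
    exps = [math.exp(z - peak) for z in router_logits]  # stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = probs[top2[0]] + probs[top2[1]]
    # Renormalize so the two selected gates sum to 1 (an assumed convention).
    return [(i, probs[i] / norm) for i in top2]


def moe_ffn(x, router_logits, experts):
    """Gate-weighted sum of the two selected experts; the rest are never run."""
    out = [0.0] * len(x)
    for idx, gate in top2_route(router_logits):
        y = experts[idx](x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out


# Four toy "experts" that just scale their input by 0, 1, 2, or 3.
experts = [lambda x, k=k: [k * v for v in x] for k in range(4)]
print(top2_route([0.1, 2.0, -1.0, 1.0]))
print(moe_ffn([1.0, 2.0], [0.1, 2.0, -1.0, 1.0], experts))
```

Because only two experts run per token, active parameters stay well below total parameters, which is the gap the Medium and High tier figures describe.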

Long Context with LongRoPE2

Context targets scale from 4k base and 8k long in Nano up to 32k base and 256k long in High, using separate base and long-context continuation phases.
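The size of each long-context continuation step follows directly from the stated targets. The snippet below only does that arithmetic; it does not model LongRoPE2 itself, and the dictionary and function names are hypothetical.

```python
# (base context, long context) in thousands of tokens, from the tier cards.
CONTEXT_TARGETS = {
    "Nano": (4, 8),
    "Mini": (8, 64),
    "Medium": (16, 128),
    "High": (32, 256),
}


def extension_factor(base_k, long_k):
    """How many times longer the long-context target is than the base."""
    return long_k / base_k


for name, (base_k, long_k) in CONTEXT_TARGETS.items():
    print(f"{name}: {base_k}k -> {long_k}k ({extension_factor(base_k, long_k):g}x)")
```

Nano stretches its base context by 2x, while Mini, Medium, and High each target an 8x extension.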

Efficiency / Target Audience

A powerful European 1-Bit model built for efficient use.

Ivonar aims to become the leading European 1-Bit LLM: real W1.58A8 BitLinear training in PyTorch today, honest dense and sparse tiering, and practical deployment work that can later benefit from packed or custom inference kernels.

Quality

Strong results with compact models

Ternary BitNet layers are intended to deliver competitive language, code, and math capability without relying on dense-parameter scaling. The goal is practical quality across daily work, prototypes, and research tasks.

Individuals, builders, research workflows

Efficiency

Minimal memory footprint

Ternary weights, Mamba-3 mixers, and dense-to-sparse scaling keep active compute controlled. Nano and Mini are dense; Medium and High use FFN-only sparse MoE with Top-2 routing.

Practical inference and controlled scaling
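A back-of-the-envelope estimate shows why ternary weights matter for memory once packed storage exists. The sketch assumes every parameter is a ternary weight and ignores embeddings, scale factors, activations, and any attention or state caches, so it is an optimistic bound rather than a measured figure. Packed storage is described above as future inference work, so this does not reflect current training memory; `weight_memory_mb` is a hypothetical helper.

```python
import math


def weight_memory_mb(params, bits_per_weight):
    """Storage for `params` weights at the given bit width, in megabytes."""
    return params * bits_per_weight / 8 / 1e6


NANO_PARAMS = 350e6

print(weight_memory_mb(NANO_PARAMS, 16))            # 16-bit baseline
print(weight_memory_mb(NANO_PARAMS, 2))             # simple 2-bit packing
print(weight_memory_mb(NANO_PARAMS, math.log2(3)))  # information-theoretic floor
```

For the 350M Nano tier this gives 700 MB at 16 bits versus 87.5 MB with straightforward 2-bit packing, with about 69 MB as the theoretical floor for ternary values.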

Business

Built for everyone.

Ivonar serves both B2C and B2B markets. Our monetization strategy relies on commercial API access for developers, premium subscriptions for end-users, and flexible enterprise licensing for businesses that require strict data privacy and local deployment.

Flexible access for users, builders, and teams

Roadmap

Architecture defined. Nano training is active. Mini, Medium, and High scale from there.

Done

Architecture

Completed

The hybrid 1-Bit architecture is defined: PyTorch W1.58A8 BitLinear training, Mamba-3 mixers, MLA attention, dense Nano and Mini tiers, and FFN-only sparse MoE for Medium and High.

Active

Nano Training

In Training

Training the 350M Nano tier with 4k base and 8k long context for fast iteration, pipeline validation, and end-to-end BitLinear quality checks.

Mini Training

Upcoming

Next dense tier: 1.5B total and active parameters, 8k base context, 64k long context, Mamba-3 + MLA, and no MoE routing.

Planned

Sparse Scale-Up

Upcoming

Medium and High add FFN-only sparse MoE with Top-2 routing: 8.0B total / 3.0B active for Medium, then 32.1B total / 6.3B active for High.