Ivonar is a native 1-Bit LLM family training real W1.58A8 BitLinear models in PyTorch with ternary-weight STE, INT8-activation STE, Mamba-3 mixers, MLA attention, and dense-to-sparse scaling across four tiers.
Ivonar is a native 1-Bit LLM family developed independently in Germany. It spans four training scales: Nano, Mini, Medium, and High. Each tier shares the same technical direction while scaling model capacity, active parameters, context length, and dense-to-sparse architecture choices.
Nano / Mini / Medium / High
Nano: A 350M dense tier for fast iteration and end-to-end validation. It uses 18 Mamba-3 blocks plus 6 MLA blocks, all parameters active, and no MoE routing.
Mini: A compact 1.5B dense model for the first full production scale. It keeps all parameters active, combines Mamba-3 with MLA, and avoids expert routing.
Medium: An 8.0B total parameter tier with 3.0B active parameters. Sparse capacity is limited to feedforward blocks with Top-2 routing; attention and sequence mixers stay shared.
High: A 32.1B total parameter flagship tier with 6.3B active parameters. It extends the FFN-only sparse MoE pattern with Top-2 routing and the longest context target.
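To make the dense-to-sparse scaling concrete, the four tiers can be written down as a small table and the share of parameters active per token computed from it. This is only an illustrative sketch: `TierSpec` and `TIERS` are hypothetical names, not part of any Ivonar codebase, and the numbers are the ones quoted in the tier descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierSpec:
    """Parameter budget for one tier (hypothetical container, not Ivonar code)."""
    name: str
    total_params_b: float   # total parameters, in billions
    active_params_b: float  # parameters used per token, in billions
    uses_moe: bool          # True when FFN blocks use Top-2 sparse routing

    @property
    def active_fraction(self) -> float:
        # Share of the model that participates in each forward pass.
        return self.active_params_b / self.total_params_b


# Figures taken from the tier descriptions; dense tiers have total == active.
TIERS = [
    TierSpec("Nano", 0.35, 0.35, uses_moe=False),
    TierSpec("Mini", 1.5, 1.5, uses_moe=False),
    TierSpec("Medium", 8.0, 3.0, uses_moe=True),
    TierSpec("High", 32.1, 6.3, uses_moe=True),
]

for tier in TIERS:
    print(f"{tier.name:<6} {tier.total_params_b:>5.2f}B total, "
          f"{tier.active_fraction:.1%} active per token")
```

With these figures, Medium activates 37.5% of its parameters per token and High roughly 19.6%, while Nano and Mini stay at 100%.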
Team / About Us
Ivonar was founded and is led by Luis Oezdem. We operate as an independent research and engineering initiative focused on efficient 1-Bit LLM development. Architecture, training infrastructure, data pipeline design, and deployment direction are kept tightly aligned under one technical vision.
The operating model is deliberately execution-oriented: validate the system through Ivonar Nano, then scale carefully into Mini, Medium, and High with clear quality, efficiency, and deployment goals.
The 1-Bit stack is built around real W1.58A8 BitLinear training in PyTorch: ternary-weight STE, INT8-activation STE, Mamba-3 state-space mixers, MLA latent attention, and FFN-only sparse routing where the larger tiers need it. Packed or custom kernels are future inference optimizations, not the current training GEMM.
Ivonar trains real BitLinear models with ternary-weight STE and INT8-activation STE in PyTorch. Training does not yet use a custom bit-packed CUDA GEMM; packed or custom kernels are future inference work.
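As a rough illustration of what W1.58A8 means numerically, the sketch below quantizes weights to the ternary set {-1, 0, +1} and activations to INT8, then rescales the integer result. It rests on assumptions not stated on this page: per-tensor absmean scaling for weights and absmax scaling for activations, as described in the public BitNet b1.58 work. The function names are hypothetical, and plain Python lists stand in for tensors so the example has no dependencies. In training, the straight-through estimator treats the rounding step as identity in the backward pass; in PyTorch this is commonly written as `x + (quantize(x) - x).detach()`.

```python
def quantize_weights_ternary(weights, eps=1e-8):
    """Absmean ternary quantization (assumed scheme): returns (values, scale)."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clip into {-1, 0, +1}.
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def quantize_activations_int8(activations, eps=1e-8):
    """Absmax INT8 quantization (assumed scheme): returns (values, scale)."""
    scale = max(abs(a) for a in activations) / 127.0 + eps
    # Round to the nearest integer, then clip into the INT8 range.
    quantized = [max(-128, min(127, round(a / scale))) for a in activations]
    return quantized, scale


def bitlinear_dot(weights, activations):
    """One output element: integer accumulate, then rescale to real units."""
    w_q, w_scale = quantize_weights_ternary(weights)
    a_q, a_scale = quantize_activations_int8(activations)
    acc = sum(w * a for w, a in zip(w_q, a_q))  # additions/subtractions only
    return acc * w_scale * a_scale


if __name__ == "__main__":
    w = [0.9, -0.05, -1.2, 0.4]
    x = [1.0, 0.5, -1.0, 2.0]
    print("ternary weights:", quantize_weights_ternary(w)[0])
    print("quantized dot:  ", bitlinear_dot(w, x))
    print("full-precision: ", sum(wi * xi for wi, xi in zip(w, x)))
```

Because every quantized weight is -1, 0, or +1, the inner accumulation needs no multiplications in principle, which is what packed inference kernels would later exploit. The current training path described above still runs as ordinary PyTorch matrix multiplies.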
Mamba-3 state-space mixers handle efficient long-range sequence processing. MLA latent attention provides retrieval-focused attention, with Nano explicitly using 18 Mamba-3 blocks and 6 MLA blocks.
Nano and Mini are dense models with no MoE. Medium and High use FFN-only sparse MoE with Top-2 routing, keeping sparse capacity out of attention and sequence-mixing blocks.
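The FFN-only Top-2 pattern can be sketched for a single token as follows. Only "Top-2 routing, applied to feedforward blocks" comes from this page; the softmax gate, the renormalization over the two selected experts, the expert count, and the names `top2_route` and `moe_ffn` are assumptions made for illustration.

```python
import math


def top2_route(router_logits):
    """Pick the two highest-scoring experts and renormalize their gate weights."""
    if len(router_logits) < 2:
        raise ValueError("Top-2 routing needs at least two experts")
    # Numerically stable softmax over all experts.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the two largest probabilities, best first.
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = sum(probs[i] for i in top2)
    return [(i, probs[i] / norm) for i in top2]


def moe_ffn(token, experts, router_logits):
    """Weighted sum of the two selected expert outputs for one token.

    Only two expert FFNs run per token; attention and sequence mixers are
    outside this function and stay shared across all tokens.
    """
    return sum(gate * experts[i](token) for i, gate in top2_route(router_logits))


if __name__ == "__main__":
    # Four toy "experts" that simply scale their input (hypothetical stand-ins).
    experts = [lambda x, k=k: k * x for k in range(4)]
    logits = [0.1, 2.0, -1.0, 1.0]
    print(top2_route(logits))
    print(moe_ffn(1.0, experts, logits))
```

Under this scheme the total parameter count grows with the number of experts, but per-token compute only grows with the two experts that are actually evaluated, which is how Medium and High keep active parameters well below their totals.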
Context targets scale from 4k base and 8k long in Nano up to 32k base and 256k long in High, using separate base and long-context continuation phases.
Efficiency / Target Audience
Ivonar aims to become the leading European 1-Bit LLM: real W1.58A8 BitLinear training in PyTorch today, honest dense and sparse tiering, and practical deployment work that can later benefit from packed or custom inference kernels.
Ternary BitNet layers are intended to reach competitive language, code, and math capability without relying on dense-parameter scaling. The goal is practical quality across daily work, prototypes, and research tasks.
Individuals, builders, research workflows
Ternary weights, Mamba-3 mixers, and dense-to-sparse scaling keep active compute controlled. Nano and Mini are dense; Medium and High use FFN-only sparse MoE with Top-2 routing.
Practical inference and controlled scaling
Ivonar serves both B2C and B2B markets. Our monetization strategy relies on commercial API access for developers, premium subscriptions for end-users, and flexible enterprise licensing for businesses that require strict data privacy and local deployment.
Flexible access for users, builders, and teams
Architecture defined. Nano training is active. Mini, Medium, and High scale from there.
Completed
Hybrid 1-Bit architecture with PyTorch W1.58A8 BitLinear training, Mamba-3 mixers, MLA attention, dense Nano/Mini tiers, and FFN-only sparse MoE for Medium/High is defined.
In Training
Training the 350M Nano tier with 4k base and 8k long context for fast iteration, pipeline validation, and end-to-end BitLinear quality checks.
Upcoming
Mini is the next dense tier: 1.5B total and active parameters, 8k base context, 64k long context, Mamba-3 + MLA, and no MoE routing.
Upcoming
Medium and High add FFN-only sparse MoE with Top-2 routing: 8.0B total / 3.0B active for Medium, then 32.1B total / 6.3B active for High.