MagicQuant GGUF Hybrids - Granite 4.0 H 350M Unsloth

MagicQuant is an automated quantization, benchmarking, and evolutionary hybrid-GGUF search system for LLMs.

Each release includes models optimized to outperform standard baseline quants (Q8, Q6, Q5, Q4). If a baseline GGUF exists in this repo, the evolutionary engine couldn’t beat it. If a baseline is missing, it’s because a hybrid configuration outperformed it so completely that including the baseline would've been pointless.

These hybrid GGUFs are built to be as small, fast, and low-drift as possible while preserving model capability.

To dive deeper into how MagicQuant works, see the main repo: MagicQuant on GitHub (by MagicCodingMan)
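
The evolutionary search itself isn't documented on this card, but for intuition, a minimal mutate-benchmark-select loop over per-tensor-group quant types might look like the sketch below. Everything in it (the tensor groups, the quant menu, the scoring weights, and the build_and_benchmark stub) is an illustrative assumption, not MagicQuant's actual implementation:

```python
import random

# Illustrative search space: one quant type per tensor group. The groups,
# the quant menu, and the scoring are assumptions for this sketch only.
QUANT_TYPES = ["Q8_0", "Q6_K", "Q5_K", "Q4_K_M"]
GROUPS = ["embeddings", "attention", "ffn", "output"]

def build_and_benchmark(config):
    """Stub. A real system would write a hybrid GGUF with this per-group mix,
    then measure (file_size_gb, tokens_per_sec, avg_precision_loss_pct)."""
    bits = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K": 5.5, "Q4_K_M": 4.8}
    size_gb = sum(bits[q] for q in config.values()) / 48.0  # fake proxy
    return size_gb, 1500.0 + random.random() * 200, random.random()

def mutate(config):
    """Re-roll the quant type of one randomly chosen tensor group."""
    child = dict(config)
    group = random.choice(GROUPS)
    child[group] = random.choice([q for q in QUANT_TYPES if q != child[group]])
    return child

def score(config):
    """Higher is better: reward TPS, penalize size and precision loss."""
    size_gb, tps, loss_pct = build_and_benchmark(config)
    return tps / 1000.0 - size_gb - 10.0 * loss_pct

def evolve(generations=200):
    """Greedy (1+1)-style evolutionary search: keep a child only if it wins."""
    best = {g: "Q8_0" for g in GROUPS}  # start from a safe all-Q8_0 baseline
    best_score = score(best)
    for _ in range(generations):
        child = mutate(best)
        child_score = score(child)
        if child_score > best_score:
            best, best_score = child, child_score
    return best

print(evolve())
```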

Notes:

  • The Hugging Face hardware-compatibility estimate (the widget that shows the bit width) is usually wrong for these files. It doesn't understand hybrid mixes, so don't trust it.
  • Naming scheme can be found on the MagicQuant Wiki.
  • (Tips) Less precision loss means less brain damage. More TPS means faster! Smaller is always better, right?

Precision Loss Guide

  • 0–0.1% → God-tier, scientifically exact
  • 0.1–1% → True near-lossless, agent-ready
  • 1–3% → Minimal loss, great for personal use
  • 3–5% → Borderline, but still functional
  • 5%+ → Toys, not tools, outside MagicQuant’s scope

Learn more about precision loss here.
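
For reference, every loss figure on this card can be reproduced from the PPL tables as relative perplexity drift against BF16, averaged over the three domains. This formula is inferred from the published numbers (it reproduces every row), not quoted from MagicQuant's docs:

$$
\text{loss}_d = \frac{\left|\,\text{PPL}_{q,d} - \text{PPL}_{\text{BF16},d}\,\right|}{\text{PPL}_{\text{BF16},d}} \times 100\%,
\qquad
\text{avg\_prec\_loss} = \tfrac{1}{3}\left(\text{loss}_{\text{gen}} + \text{loss}_{\text{code}} + \text{loss}_{\text{math}}\right)
$$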

IMPORTANT NOTE: Because this model is so small, the test was significantly stricter about how much precision loss was allowed.

Table - File Size + TPS + Avg Precision Loss

| model_name | file_size_gb | bench_tps | avg_prec_loss |
|---|---|---|---|
| mxfp4_moe-EKUD-B16-O-Q6K-Q-Q8_0 | 0.54 | 1705.35 | 0.0816% |
| mxfp4_moe-O-Q6K-EQKUD-Q8_0 | 0.34 | 1605.97 | 0.2555% |

Table - PPL Columns

| model_name | gen | gen_er | code | code_er | math | math_er |
|---|---|---|---|---|---|---|
| mxfp4_moe-EKUD-B16-O-Q6K-Q-Q8_0 | 18.1560 | 0.4667 | 1.9548 | 0.0175 | 10.2986 | 0.2319 |
| mxfp4_moe-O-Q6K-EQKUD-Q8_0 | 18.2304 | 0.4691 | 1.9555 | 0.0175 | 10.3074 | 0.2320 |
  • gen = ppl_general, code = ppl_code, math = ppl_math; the *_er columns are the ± error estimates reported with each PPL.

Table - Precision Loss Columns

| model_name | loss_general | loss_code | loss_math |
|---|---|---|---|
| mxfp4_moe-EKUD-B16-O-Q6K-Q-Q8_0 | 0.1368 | 0.0051 | 0.1030 |
| mxfp4_moe-O-Q6K-EQKUD-Q8_0 | 0.5471 | 0.0307 | 0.1886 |
  • loss_* values are absolute precision-loss % vs BF16 per domain.
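
As a quick sanity check, this snippet recomputes the first hybrid row from the PPL values above (all numbers are copied from this card; the formula is the one given in the Precision Loss Guide):

```python
# BF16 baseline PPLs and one hybrid's PPLs, copied from the tables on this card.
BF16 = {"gen": 18.1312, "code": 1.9549, "math": 10.2880}
HYBRID = {"gen": 18.1560, "code": 1.9548, "math": 10.2986}  # mxfp4_moe-EKUD-B16-O-Q6K-Q-Q8_0

# Per-domain absolute precision loss (%) vs BF16, then the three-domain average.
loss = {d: abs(HYBRID[d] - BF16[d]) / BF16[d] * 100 for d in BF16}
avg = sum(loss.values()) / len(loss)

print({d: round(v, 4) for d, v in loss.items()})  # {'gen': 0.1368, 'code': 0.0051, 'math': 0.103}
print(round(avg, 4))                              # 0.0816 -> matches avg_prec_loss above
```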

Baseline Models (Reference)

Table - File Size + TPS + Avg Precision Loss

| model_name | file_size_gb | bench_tps | avg_prec_loss |
|---|---|---|---|
| BF16 | 0.64 | 1718.28 | 0.0000% |
| Q8_0 | 0.34 | 1598.28 | 0.3116% |
| Q6_K | 0.26 | 1513.71 | 0.5598% |
| Q5_K | 0.24 | 1305.37 | 2.8875% |
| Q4_K_M | 0.21 | 1401.44 | 12.2733% |
| IQ4_NL | 0.20 | 1679.00 | 14.2608% |
| MXFP4_MOE | 0.17 | 1713.00 | 8222.4218% |

Table - PPL Columns

| model_name | gen | gen_er | code | code_er | math | math_er |
|---|---|---|---|---|---|---|
| BF16 | 18.1312 | 0.4655 | 1.9549 | 0.0175 | 10.2880 | 0.2315 |
| Q8_0 | 18.2363 | 0.4693 | 1.9558 | 0.0175 | 10.3198 | 0.2325 |
| Q6_K | 18.3753 | 0.4719 | 1.9612 | 0.0175 | 10.2869 | 0.2294 |
| Q5_K | 18.9974 | 0.4899 | 1.9842 | 0.0180 | 10.5335 | 0.2365 |
| Q4_K_M | 21.5138 | 0.5690 | 2.0633 | 0.0194 | 11.5862 | 0.2686 |
| IQ4_NL | 22.4687 | 0.6035 | 2.0709 | 0.0194 | 11.6178 | 0.2686 |
| MXFP4_MOE | 1172.2706 | 45.9470 | 303.0942 | 7.7666 | 308.3771 | 10.9069 |
  • gen = ppl_general, code = ppl_code, math = ppl_math; the *_er columns are the ± error estimates reported with each PPL.

Table - Precision Loss Columns

| model_name | loss_general | loss_code | loss_math |
|---|---|---|---|
| BF16 | 0.0000 | 0.0000 | 0.0000 |
| Q8_0 | 0.5797 | 0.0460 | 0.3091 |
| Q6_K | 1.3463 | 0.3223 | 0.0107 |
| Q5_K | 4.7774 | 1.4988 | 2.3863 |
| Q4_K_M | 18.6562 | 5.5450 | 12.6186 |
| IQ4_NL | 23.9229 | 5.9338 | 12.9257 |
| MXFP4_MOE | 6365.4882 | 15404.3327 | 2897.4446 |
  • loss_* values are absolute precision-loss % vs BF16 per domain.
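
To actually run one of these files, a minimal llama-cpp-python load looks like this (the filename below is illustrative; substitute the exact GGUF name you download from this repo):

```python
from llama_cpp import Llama

# Filename is an assumption for the example; use the real hybrid GGUF you downloaded.
llm = Llama(
    model_path="granite-4.0-h-350m-mxfp4_moe-EKUD-B16-O-Q6K-Q-Q8_0.gguf",
    n_ctx=4096,
)

out = llm("Q: What is a hybrid GGUF quant?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```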

Support

I’m a solo developer working full time for myself to achieve my dream, pouring nights and weekends into open protocols and tools that I hope make the world a little better. If you chip in, you're helping me keep the lights on while I keep shipping.

Click here to see ways to support: BTC, PayPal, GitHub Sponsors.

Or, just drop a like on the repo :)
