unslothai (Unsloth Backup Account)

posted an update about 13 hours ago

Post

155

Mistral's new SOTA coding models Devstral 2 can now be Run locally! (25GB RAM) 🐱
We fixed the chat template, so performance should be much better now!
24B: unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF
123B: unsloth/Devstral-2-123B-Instruct-2512-GGUF

🧡Step-by-step Guide: https://docs.unsloth.ai/models/devstral-2

danielhanchen

posted an update 8 days ago

Post

3372

Mistral's new Ministral 3 models can now be Run & Fine-tuned locally! (16GB RAM)
Ministral 3 have vision support and the best-in-class performance for their sizes.
14B Instruct GGUF: unsloth/Ministral-3-14B-Instruct-2512-GGUF
14B Reasoning GGUF: unsloth/Ministral-3-14B-Reasoning-2512-GGUF

🐱 Step-by-step Guide: https://docs.unsloth.ai/new/ministral-3
All GGUFs, BnB, FP8 etc. variants uploads: https://huggingface.co/collections/unsloth/ministral-3

3 replies

·

danielhanchen

posted an update 13 days ago

Post

8294

Qwen3-Next can now be Run locally! (30GB RAM)
Instruct GGUF: unsloth/Qwen3-Next-80B-A3B-Instruct-GGUF

The models come in Thinking and Instruct versions and utilize a new architecture, allowing it to have ~10x faster inference than Qwen32B.
💜 Step-by-step Guide: https://docs.unsloth.ai/models/qwen3-next

Thinking GGUF: unsloth/Qwen3-Next-80B-A3B-Thinking-GGUF

danielhanchen

posted an update about 1 month ago

Post

4236

You can now run Kimi K2 Thinking locally with our Dynamic 1-bit GGUFs: unsloth/Kimi-K2-Thinking-GGUF

We shrank the 1T model to 245GB (-62%) & retained ~85% of accuracy on Aider Polyglot. Run on >247GB RAM for fast inference.

We also collaborated with the Moonshot AI Kimi team on a system prompt fix! 🥰

Guide + fix details: https://docs.unsloth.ai/models/kimi-k2-thinking-how-to-run-locally

danielhanchen

posted an update 4 months ago

Post

6491

Run DeepSeek-V3.1 locally on 170GB RAM with Dynamic 1-bit GGUFs!🐋
GGUFs: unsloth/DeepSeek-V3.1-GGUF

The 715GB model gets reduced to 170GB (-80% size) by smartly quantizing layers.

The 1-bit GGUF passes all our code tests & we fixed the chat template for llama.cpp supported backends.

Guide: https://docs.unsloth.ai/basics/deepseek-v3.1

danielhanchen

posted an update 4 months ago

Post

5555

Run OpenAI's new gpt-oss models locally with Unsloth GGUFs! 🔥🦥
20b GGUF: unsloth/gpt-oss-20b-GGUF
120b GGUF: unsloth/gpt-oss-120b-GGUF

Model will run on 14GB RAM for 20b and 66GB for 120b.

2 replies

·

danielhanchen

posted an update 5 months ago

Post

3679

It's Qwen3 week! 💜 We uploaded Dynamic 2-bit GGUFs for:

Qwen3-Coder: unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF
Qwen3-2507: unsloth/Qwen3-235B-A22B-Instruct-2507-GGUF

So you can run them both locally!
Guides are in model cards.

1 reply

·

danielhanchen

posted an update 5 months ago

Post

3902

Made some 245GB (80% size reduction) 1.8bit quants for Kimi K2!

unsloth/Kimi-K2-Instruct-GGUF

danielhanchen

posted an update 5 months ago

Post

3984

We fixed more issues! Use --jinja for all!
* Fixed Nanonets OCR-s unsloth/Nanonets-OCR-s-GGUF
* Fixed THUDM GLM-4 unsloth/GLM-4-32B-0414-GGUF
* DeepSeek Chimera v2 is uploading! unsloth/DeepSeek-TNG-R1T2-Chimera-GGUF

danielhanchen

posted an update 5 months ago

Post

3190

Gemma 3n finetuning is now 1.5x faster and uses 50% less VRAM in Unsloth!

Click "Use this model" and click "Google Colab"!

unsloth/gemma-3n-E4B-it

unsloth/gemma-3n-E2B-it

https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_(4B)-Conversational.ipynb

2 replies

·

danielhanchen

posted an update 6 months ago

Post

1325

We updated lots of our GGUFs and uploaded many new ones!
* unsloth/dots.llm1.inst-GGUF
* unsloth/Jan-nano-GGUF
* unsloth/Nanonets-OCR-s-GGUF
* Updated and fixed Q8_0 upload for unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF
* Added Q2_K_XL for unsloth/DeepSeek-R1-0528-GGUF
* Updated and fixed Vision support for unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF

danielhanchen

posted an update 6 months ago

Post

2531

Mistral releases Magistral, their new reasoning models! 🔥
GGUFs to run: unsloth/Magistral-Small-2506-GGUF

Magistral-Small-2506 excels at mathematics and coding.

You can run the 24B model locally with just 32GB RAM by using our Dynamic GGUFs.

danielhanchen

posted an update 6 months ago

Post

3860

New DeepSeek-R1-0528 1.65-bit Dynamic GGUF!

Run the model locally even easier! Will fit on a 192GB Macbook and run at 7 tokens/s.

DeepSeek-R1-0528 GGUFs: unsloth/DeepSeek-R1-0528-GGUF
Qwen3-8B DeepSeek-R1-0528 GGUFs: unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF

And read our Guide: https://docs.unsloth.ai/basics/deepseek-r1-0528

danielhanchen

posted an update 8 months ago

Post

2333

💜 Qwen3 128K Context Length: We've released Dynamic 2.0 GGUFs + 4-bit safetensors!
Fixed: Now works on any inference engine and fixed issues with the chat template.
Qwen3 GGUFs:
30B-A3B: unsloth/Qwen3-30B-A3B-GGUF
235-A22B: unsloth/Qwen3-235B-A22B-GGUF
32B: unsloth/Qwen3-32B-GGUF

Read our guide on running Qwen3 here: https://docs.unsloth.ai/basics/qwen3-how-to-run-and-finetune

128K Context Length:
30B-A3B: unsloth/Qwen3-30B-A3B-128K-GGUF
235-A22B: unsloth/Qwen3-235B-A22B-128K-GGUF
32B: unsloth/Qwen3-32B-128K-GGUF

All Qwen3 uploads: unsloth/qwen3-680edabfb790c8c34a242f95

danielhanchen

posted an update 8 months ago

Post

6260

🦥 Introducing Unsloth Dynamic v2.0 GGUFs!
Our v2.0 quants set new benchmarks on 5-shot MMLU and KL Divergence, meaning you can now run & fine-tune quantized LLMs while preserving as much accuracy as possible.

Llama 4: unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF
DeepSeek-R1: unsloth/DeepSeek-R1-GGUF-UD
Gemma 3: unsloth/gemma-3-27b-it-GGUF

We made selective layer quantization much smarter. Instead of modifying only a subset of layers, we now dynamically quantize all layers so every layer has a different bit. Now, our dynamic method can be applied to all LLM architectures, not just MoE's.

Blog with Details: https://docs.unsloth.ai/basics/dynamic-v2.0

All our future GGUF uploads will leverage Dynamic 2.0 and our hand curated 300K–1.5M token calibration dataset to improve conversational chat performance.

For accurate benchmarking, we built an evaluation framework to match the reported 5-shot MMLU scores of Llama 4 and Gemma 3. This allowed apples-to-apples comparisons between full-precision vs. Dynamic v2.0, QAT and standard iMatrix quants.

Dynamic v2.0 aims to minimize the performance gap between full-precision models and their quantized counterparts.

danielhanchen

posted an update 8 months ago

Post

5235

You can now run Llama 4 on your own local device! 🦙
Run our Dynamic 1.78-bit and 2.71-bit Llama 4 GGUFs:
unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF

You can run them on llama.cpp and other inference engines. See our guide here: https://docs.unsloth.ai/basics/tutorial-how-to-run-and-fine-tune-llama-4

1 reply

·

danielhanchen

posted an update 9 months ago

Post

3676

You can now run DeepSeek-V3-0324 on your own local device!
Run our Dynamic 2.42 and 2.71-bit DeepSeek GGUFs: unsloth/DeepSeek-V3-0324-GGUF

You can run them on llama.cpp and other inference engines. See our guide here: https://docs.unsloth.ai/basics/tutorial-how-to-run-deepseek-v3-0324-locally

danielhanchen

updated a model 10 months ago

unslothai/Phi-4-mini-instruct

4B • Updated Feb 28 • 6

danielhanchen

published a model 10 months ago

unslothai/Phi-4-mini-instruct

4B • Updated Feb 28 • 6

danielhanchen

updated a model 10 months ago

unslothai/Llama-3.2-1B-Instruct

1B • Updated Feb 25 • 5 • 1

AI & ML interests

Team members 2

unslothai's activity