Pretty bad results compared to Devstral 1 (?)
From my non-representative early testing with Q6_K_XL, it's much worse than Devstral Small "1" with the same quant.
I've been getting lots of hallucinations using RooCode with llama.cpp (Vulkan): it creates useless files and useless tasks, doesn't fully follow instructions, etc.
I'm having much better results with the original Devstral Small (UD_Q6_K_XL).
Not sure if this is due to the Unsloth quantization, llama.cpp, the chat template, or whether it's also a problem with the original model?
I've been running it with the recommended params, and ctx/cache adapted to my hardware (same as Devstral 1):
--flash-attn 1
--jinja
--temp 0.15
--min-p 0.01
--ctx-size 100000
--cache-type-k q4_0 --cache-type-v q4_0
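For reference, the full command is roughly the following (the model filename and port are placeholders for my setup; RooCode then points at the OpenAI-compatible endpoint llama-server exposes):
llama-server \
  --model Devstral-Small-Q6_K_XL.gguf \
  --flash-attn 1 \
  --jinja \
  --temp 0.15 \
  --min-p 0.01 \
  --ctx-size 100000 \
  --cache-type-k q4_0 --cache-type-v q4_0 \
  --host 127.0.0.1 --port 8080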
I'm curious to hear other Devstral Small 1 users' feedback, so please don't hesitate to share.
Can you also remove the cache type quantization?
Support for the model in llama.cpp is still experimental and being worked on, and the chat template still needs work.
Unfortunately, right now I'm testing it on real tasks that need at least 60k tokens.
So it won't fit in my 32GB of RAM with an F16 KV cache.
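Rough math, assuming the config I recall for Devstral Small (40 layers, 8 KV heads, head dim 128; worth double-checking against the GGUF metadata):
# F16 KV cache per token = 2 (K and V) * layers * kv_heads * head_dim * 2 bytes
echo $((2 * 40 * 8 * 128 * 2))           # 163840 bytes per token
echo $((2 * 40 * 8 * 128 * 2 * 60000))   # ~9.8 GB at 60k tokens, on top of roughly 19 GB of Q6_K_XL weights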
But I'd be curious if someone else wants to give it a try!
I hope there isn't some regression; with the previous Devstral there was little to no impact from using a Q4 KV cache in my testing.
EDIT: I just gave 36K of non-quantized context a shot (the maximum I can fit), but as I feared it's not enough for my current task (and I don't have any "small testing task" for a coding agent).
I see there's been a recent commit for converting to GGUF though; I don't know if Unsloth uses it? https://github.com/ggml-org/llama.cpp/pull/17889/files
So FYI, I've been able to test with a Q8_0 cache, and it seems as bad as Q4_0.
I won't be able to go to a native (F16) cache with my current tests, though.
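For reference, that test was just a matter of swapping the cache flags to:
--cache-type-k q8_0 --cache-type-v q8_0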
Tested with the IQ4_XS quant from Unsloth and Bartowski. I don't think it's an issue with the quants; I think it's the model. It's just bad. I couldn't make it past my first lightly modified Snake game benchmark.
Maybe watch out for the top-p and top-k parameters? They're not specified in the Unsloth guide, but they default to 0.9 and 40 respectively in llama.cpp.
Devstral version 1 suggested top-p 0.95 and top-k 64.
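In llama.cpp you'd set those explicitly with something like:
--top-p 0.95 --top-k 64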
I confirm it seems better with the updated versions!
As far as I can see from my quick testing, Q4_K_XL still isn't great as a coding agent, but Q6_K_XL seems pretty good! (as is usually the case for most models of this size)
There might still be some weird behavior that I didn't have with Devstral 1, but yeah, maybe using the top-p and top-k values recommended for Devstral 1 would help, and maybe even tweaking dry-multiplier / dry-penalty-last-n.
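If I end up tweaking the DRY sampler, it would be along these lines (values are just illustrative starting points, not tested recommendations):
--dry-multiplier 0.8 --dry-penalty-last-n 512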
Anyway I'll need to do more testing, but that's quite promising!
Amazing to hear, thanks for testing!
I am also getting poor performance with the Q6 version.
So it seems that even if it's better than before, I still often get errors with tool calling (e.g. broken diffs quite often).
At some point it even responded with broken formatting (e.g. messages containing </user_message> </read_file> <notice>).
It's still not as good as Devstral 1 (same quant / same agent framework), which probably means there are still some issues.
@danielhanchen
According to my non-representative testing, yeah.
Maybe it really is a problem with the quantization and/or cache quantization, though.
Also, there still seem to be issues related to tool use with llama.cpp: https://github.com/ggml-org/llama.cpp/issues/17928