Qwen3.5 Fine-Tuning Guide – Unsloth Documentation

unsloth.ai

216 points by bilsbie 10 hours ago


krasikra - 5 hours ago

Fine-tuned Qwen models run surprisingly well on NVIDIA Jetson hardware. We've deployed several 7B variants for edge AI tasks where latency matters more than raw accuracy – think industrial inspection or retail analytics, where you can't rely on cloud connectivity. The key is that LoRA fine-tuning keeps the model small enough to fit in unified memory while still hitting production-grade inference speeds. The biggest surprise was power efficiency: a Jetson Orin can run continuous inference at under 15W, while a cloud round-trip burns way more energy at scale.
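(A rough back-of-the-envelope sketch of why LoRA stays small enough for unified memory: for a frozen weight matrix of shape d×k, LoRA trains two low-rank factors of total size r×(d+k) instead of the full d×k matrix. The layer sizes below are illustrative, not taken from any specific Qwen config.)

```python
# Hypothetical illustration: LoRA trainable-parameter count vs. full fine-tuning.
# For a frozen (d, k) weight matrix, LoRA trains A (r x k) and B (d x r),
# so the trainable count drops from d*k to r*(d + k).

def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameters LoRA adds for one (d, k) weight matrix at rank r."""
    return r * (d + k)

def full_params(d: int, k: int) -> int:
    """Parameters updated by full fine-tuning of the same matrix."""
    return d * k

# A single 4096x4096 projection at rank 16 (illustrative sizes):
d = k = 4096
r = 16
full = full_params(d, k)
lora = lora_trainable_params(d, k, r)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
# -> the LoRA adapter is well under 1% of the full matrix
```

At these (made-up) sizes the adapter is roughly 0.8% of the frozen weights, which is why the adapter plus the base model's 4-bit weights can still fit in a Jetson's unified memory.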

clueless - 8 hours ago

What are some sample real world cases folks are using to fine tune their own small/medium models?

syntaxing - 7 hours ago

Awesome guide. Shame that a couple of the Qwen leads got kicked out and replaced with more “business”-minded leadership. Hopefully this doesn’t mean the end of the open source era for Qwen.

bugglebeetle - 20 minutes ago

Unfortunately, this looks to cover only the larger MoE models. I imagine the smaller models are what most people would target. 9B just dropped two days ago, so I'm not surprised it isn't explicitly documented, but it does use a hybrid Mamba architecture that I expect needs some special consideration.

aliljet - 5 hours ago

Does fine tuning really improve anything over pure RAG approaches for use cases that involve tons of direct document context?

antirez - 7 hours ago

Fine tuning is a story that is nice to tell, but with modern LLMs it makes less and less sense. Modern LLMs are so powerful that they can few-shot learn complicated things, so a strong prompt plus augmenting the generation (given the massive context window of Qwen3.5, too) is usually the best option available. There are models for which fine tuning is great, like image models: there, with LoRA, you can get good results in many ways. And it made sense for certain use cases with LLMs of the past. But now, why? LLMs are already released after seeing (on top of pre-training) massive amounts of data for SFT and then RL. Removing the censorship is done much more efficiently with other techniques. So I have a strong feeling that fine tuning becomes less relevant by the day, and already is quite irrelevant. This, again, in the specific case of LLMs. For other foundational models fine tuning still makes sense and is useful (images, text to speech, ...).
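(For what it's worth, the alternative the comment describes – few-shot examples plus retrieved context assembled into one big prompt – is just string construction. A minimal sketch; the task, examples, and retrieved snippet below are made up for illustration.)

```python
# Hypothetical sketch: few-shot + retrieval-augmented prompting instead of
# fine-tuning. Examples and context are invented for illustration only.

FEW_SHOT = [
    ("Ticket: App crashes on login", "Category: bug"),
    ("Ticket: Please add dark mode", "Category: feature-request"),
]

def build_prompt(context_docs: list[str], query: str) -> str:
    """Assemble a few-shot, retrieval-augmented prompt string."""
    parts = ["Classify the support ticket using the examples and context below."]
    parts += [f"{q}\n{a}" for q, a in FEW_SHOT]           # few-shot demonstrations
    parts += [f"Context: {doc}" for doc in context_docs]  # retrieved (RAG) snippets
    parts.append(f"Ticket: {query}\nCategory:")           # the actual query
    return "\n\n".join(parts)

prompt = build_prompt(["Known issue: login fails on v2.3"], "Login broken")
print(prompt)
```

With a long-context model the `context_docs` list can carry a large amount of retrieved material, which is the "augmenting the generation" part of the argument: the base model adapts per-request via the prompt rather than per-deployment via weight updates.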