DeepSeek-V4-Flash means LLM steering is interesting again

seangoedecke.com

148 points by Brajeshwar 5 hours ago


NitpickLawyer - 4 hours ago

I'm surprised the article doesn't mention the biggest use of steering vectors, which is the potential to remove refusals from models (a.k.a. abliteration or uncensoring).

There was an earlier paper that found that "most refusals are mediated by a single direction": you can identify and "nerf" that vector so the model will skip refusals and answer "any" request normally. This was very doable for earlier models trained with SFT for refusals; it seems to be a bit more complicated for newer models, but still doable to some extent.

There are already libraries that automate this process and reduce refusals, but they usually focus on identifying the direction, modifying the weights, and releasing the result as an uncensored model. Steering instead lets you toggle that vector change dynamically at inference time, so you don't need to swap models if the abliteration process somehow hurts accuracy on unrelated tasks.
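As an illustration of the "single vector" idea, here is a minimal numpy sketch (not from any particular library; the data is random stand-in activations, and `refusal_direction`/`ablate` are hypothetical names): the direction is estimated as a difference of mean activations between refused and answered prompts, and ablation removes the activation's component along it.

```python
import numpy as np

def refusal_direction(refused_acts, answered_acts):
    """Estimate the 'refusal direction' as the normalized difference of
    mean activations over refused vs. answered prompts (difference of means)."""
    d = refused_acts.mean(axis=0) - answered_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(y, direction):
    """Remove the component of activation y along the refusal direction."""
    return y - np.dot(direction, y) * direction

# Toy data: random vectors standing in for real hidden states, with the
# 'refused' set offset along dimension 0 to simulate a refusal feature.
rng = np.random.default_rng(0)
refused = rng.normal(size=(32, 8)) + np.array([3.0] + [0.0] * 7)
answered = rng.normal(size=(32, 8))

d = refusal_direction(refused, answered)
y = rng.normal(size=8)
y_ablated = ablate(y, d)
# After ablation, y has (numerically) zero component along the direction.
print(abs(np.dot(d, y_ablated)))
```

Real abliteration collects activations from the actual model and picks the best layer, but the linear algebra is this simple.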

antirez - 3 hours ago

Thank you for posting this! Just a clarification: with DwarfStar's steering features I was able to completely remove refusals from DS4. It is only the example dataset (the prompt pairs I provide) that is a toy, not the capability itself. My thinking was that whoever can come up with the right dataset and understands how to use the well-documented steering feature already has access to steering. People who have no idea and would just cut & paste, I'm not sure: maybe it is a good idea if they also have access to a model without refusals? In doubt, I didn't release the steering file publicly, but I'm still quite perplexed about it.

Btw, support was recently extended, so the steering vector can now be applied to the activations at different times: always, only after thinking, only outside of tool calling, ...

Something important that not many folks realize: steering along a direction inside the inference engine itself is far superior to shipping GGUFs modified in the same way. The more you steer, the more you damage the model's capabilities, so by applying it at runtime you apply only the minimum needed for what you want to accomplish. You can also apply it only during selected moments. It would even be possible (I haven't implemented it yet, but I like the idea) to apply the steering only when the energy along the refusal direction is over a given threshold. There are many things you can play with.
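The thresholded variant described above (which, per the comment, is not implemented in DwarfStar) could be sketched like this, assuming a normalized direction per layer and a hypothetical `steer` hook called on each activation:

```python
import numpy as np

def steer(y, direction, scale=1.0, threshold=0.0):
    """Runtime steering: subtract the component of activation y along
    `direction`, but only when its magnitude ('energy' along the refusal
    direction) exceeds `threshold`. Below the threshold the activation
    passes through untouched, so the model is only perturbed when needed."""
    proj = np.dot(direction, y)
    if abs(proj) <= threshold:
        return y  # leave the activation unmodified
    return y - scale * proj * direction

rng = np.random.default_rng(1)
d = rng.normal(size=16)
d /= np.linalg.norm(d)  # unit refusal direction

weak = rng.normal(size=16) * 0.01      # tiny refusal energy: passes through
strong = d * 5.0 + rng.normal(size=16) # large refusal energy: gets ablated

steered = steer(strong, d, threshold=1.0)
# With scale=1.0 the surviving component along d is (numerically) zero.
```

Because the edit is conditional and applied per token at runtime, most activations are never touched, which is the capability-preservation argument above.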

kamranjon - 3 hours ago

The really interesting thing going on in the DS4 repo, I think, is that it explores all the knobs that frontier labs have hidden from users, and then asks how they can fit into real dev/interaction workflows. It's really cool to see different interaction modalities being explored, and to think about, for example, how steering could be worked into a user interface in a helpful way. Once the cat is out of the bag, as they say, and users understand the level of control and utility they can get from models that are turned inside out like this, it will become an integral part of their tool belt, and this level of control will simply be expected from models and model providers.

wolttam - 4 hours ago

> inspired to write this post by antirez’s recent project DwarfStar 4, which is a version of llama.cpp that’s been stripped down to run only DeepSeek-V4-Flash

This is not true, it is its own project.

Indebted to llama.cpp, sure, but not a stripped-down version.

bel8 - 3 hours ago

Great article but I'm confused on one thing.

The article claims steering only works in local models, but GitHub Copilot has a "steer with message" feature where I can course correct mid execution. I use it often.

I think these are different kinds of steering, right? Agent steering probably just inserts another user message into the harness's own ping-pong with the LLM.

- https://docs.github.com/en/copilot/how-tos/copilot-cli/use-c...

- https://docs.github.com/en/copilot/how-tos/copilot-sdk/use-c...
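The distinction can be made concrete: harness-level "steering" is plain prompt engineering, no activation editing involved. A toy sketch (the loop, the fake `llm`, and the queue are all hypothetical, not Copilot's actual implementation):

```python
def agent_loop(llm, messages, steer_queue, max_turns=5):
    """Toy agent loop: after each model turn, splice any pending user
    'steer' message into the transcript before the next call. This is
    message-level steering, not activation-level steering."""
    for _ in range(max_turns):
        reply = llm(messages)
        messages.append({"role": "assistant", "content": reply})
        if steer_queue:
            # Mid-execution course correction, inserted as a normal user turn.
            messages.append({"role": "user", "content": steer_queue.pop(0)})
    return messages

fake_llm = lambda msgs: f"turn {len(msgs)}"  # stand-in for a real model call
transcript = agent_loop(
    fake_llm,
    [{"role": "user", "content": "start"}],
    ["actually, use Python instead"],
)
# The steer message lands between assistant turns as an ordinary user message.
```

Activation steering, by contrast, edits hidden states inside the forward pass, which is why it needs access to the weights and only works with local models.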

micahwhite - 2 hours ago

I used steering to make an AI more radical:

Write up: https://www.outcryai.com/research/shift-a-models-political-i...

App: https://apps.apple.com/us/app/outcry-activist-ai/id676208676...

This technique has a lot of potential.

potatoman22 - 2 hours ago

This reminds me of control vectors, especially this line in the linked DwarfStar repo:

> y = y - scale * direction[layer] * dot(direction[layer], y)

From https://vgel.me/posts/representation-engineering/

> A control vector is a vector (technically a list of vectors, one per layer) that you can apply to model activations during inference to control the model's behavior without additional prompting
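Note the two quotes describe slightly different operations: the DwarfStar line *projects out* a direction, while a control vector is *added* to the activations. A minimal sketch of the additive version, with a random per-layer vector standing in for a learned one and `apply_control` as a hypothetical hook name:

```python
import numpy as np

n_layers, d_model = 4, 8
rng = np.random.default_rng(2)

# Stand-in control vector: one direction per layer, as in the
# representation-engineering write-up (a real one is learned from
# contrastive prompt pairs, e.g. happy vs. sad completions).
control = {layer: rng.normal(size=d_model) for layer in range(n_layers)}

def apply_control(hidden, layer, strength=1.0):
    """Add the per-layer control vector to a hidden state during inference.
    Positive strength pushes toward the behavior, negative pushes away."""
    return hidden + strength * control[layer]

h = rng.normal(size=d_model)
h_pos = apply_control(h, layer=0, strength=2.0)   # toward the behavior
h_neg = apply_control(h, layer=0, strength=-2.0)  # away from it
```

Projection (the DwarfStar formula) removes a feature entirely; addition lets you dial a behavior up or down with a signed strength. Both are one vector operation per layer per token.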

amelius - 3 hours ago

Sounds more like something for DL research than something you might want to use in practice.

aswegs8 - 3 hours ago

How does the model qualify as local? Needing ~192 GB of RAM sounds a bit much for local.

anonym29 - 3 hours ago

I know it's only tangentially relevant, but I've been baffled by the interest in DeepSeek V4 Flash. It's larger, less efficient, and in many cases performs worse on both objective benchmarks and the real-world sniff test (admittedly, n=1) than Minimax M2.7. DS4F hallucinates at extraordinary rates while M2.7 does not. The 196k context length that M2.7 was natively trained up to represents neither a hard technical ceiling (it's metadata that can easily be adjusted) nor a meaningful degradation threshold: I've personally run it past 330k-token context windows where it maintained full coherence and still completed my one-shot agentic task to my satisfaction.

dominotw - 3 hours ago

> you can already exercise extremely fine-grained control by tweaking the language of your prompt.

Maybe I suck at prompting, but I find it impossible to overcome the biases from its training data, post-training, etc.

You can only pattern-mine from the training data using prompts; you don't really have any sort of fine-grained control.