Phi 4 available on Ollama

ollama.com

271 points by eadz 4 days ago


sgk284 - 19 hours ago

Over the holidays, we published a post[1] on using high-precision few-shot examples to get `gpt-4o-mini` to perform similarly to `gpt-4o`. I just re-ran that same experiment, but swapped out `gpt-4o-mini` with `phi-4`.

`phi-4` really blew me away in terms of learning from few-shots. It measured as being 97% consistent with `gpt-4o` when using high-precision few-shots! Without the few-shots, it was only 37%. That's a huge improvement!

With few-shots, it performs on par with `gpt-4o-mini` (though `gpt-4o-mini`'s baseline without few-shots was 59%, quite a bit higher than `phi-4`'s 37%).

[1] https://bits.logic.inc/p/getting-gpt-4o-mini-to-perform-like
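
For anyone who wants to poke at this locally, here's a minimal sketch of what the few-shot setup looks like through Ollama's Python client. The ticket-classification task, labels, and examples below are illustrative placeholders, not the ones from our post:

```python
# Minimal sketch: few-shot prompting phi-4 through Ollama's Python client.
# The classification task and examples here are illustrative placeholders.
import ollama

# Each user/assistant pair demonstrates the exact input/output format we expect.
FEW_SHOTS = [
    {"role": "user", "content": "Ticket: App crashes when I upload a photo"},
    {"role": "assistant", "content": "category: bug"},
    {"role": "user", "content": "Ticket: Please add dark mode"},
    {"role": "assistant", "content": "category: feature-request"},
]

def classify(ticket: str) -> str:
    messages = (
        [{"role": "system", "content": "Classify support tickets. Reply only as 'category: <label>'."}]
        + FEW_SHOTS
        + [{"role": "user", "content": f"Ticket: {ticket}"}]
    )
    resp = ollama.chat(model="phi4", messages=messages)  # "phi4" is the Ollama library tag
    return resp["message"]["content"]

print(classify("I can't reset my password"))
```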

t0lo - 13 hours ago

Is anyone else blown away by how fast we got to running something this powerful locally? I know it's easy to get burnt out on LLMs, but this is pretty incredible.

I genuinely think we're only two years away from fully custom, local voice-to-voice LLM assistants that grow with you, like Joi in BR2049, and it's going to change how we think about being human, being social, and growing up.

crorella - 20 hours ago

It's odd that MS is releasing models that compete with OpenAI's. This reinforces the idea that there is no real strategic advantage in owning a model. I think the strategy now is to offer cheap, performant infra to run the models.

raybb - 16 hours ago

I was going to ask if this or other Ollama models support structured output (like JSON).

Then a quick search revealed that you can, as of a few weeks ago:

https://ollama.com/blog/structured-outputs
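
The approach in that post boils down to passing a JSON schema via the `format` parameter of the chat call. Roughly like this (the schema fields are just an example, and I'm assuming the `phi4` tag behaves like the models in the blog post):

```python
# Sketch of Ollama structured outputs: constrain generation with a JSON schema.
from pydantic import BaseModel
import ollama

class Country(BaseModel):
    name: str
    capital: str
    languages: list[str]

resp = ollama.chat(
    model="phi4",
    messages=[{"role": "user", "content": "Tell me about Canada."}],
    format=Country.model_json_schema(),  # output must conform to this schema
)
country = Country.model_validate_json(resp["message"]["content"])
print(country)
```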

mythz - 20 hours ago

Was disappointed in all the Phi models before this one, whose benchmark scores were much better than their real-world performance, but I've been really impressed with how good Phi-4 is at just 14B. We've run it against the top 1000 most popular StackOverflow questions and it came in 3rd, beating out GPT-4 and Sonnet 3.5 in our benchmarks, behind only DeepSeek v3 and WizardLM 8x22B [1]. We're using Mixtral 8x7B to grade the quality of the answers, which could explain how WizardLM (based on Mixtral 8x22B) took 2nd place.

Unfortunately I'm only getting 6 tok/s on an NVIDIA A4000, so it's still not great for real-time queries, but luckily, now that it's MIT licensed, it's available on OpenRouter [2] at a great price of $0.07/M input and $0.14/M output tokens, at a fast 78 tok/s.

Because it yields better results and we're able to self-host Phi-4 for free, we've replaced Mistral NeMo with it in our default models for answering new questions [3].

[1] https://pvq.app/leaderboard

[2] https://openrouter.ai/microsoft/phi-4

[3] https://pvq.app/questions/ask
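
For the curious, the grading loop is roughly LLM-as-judge: one model answers the question, a second model scores the answer. A heavily simplified sketch against OpenRouter's OpenAI-compatible API (model IDs per the links above; prompts abbreviated, not our actual pipeline):

```python
# Simplified sketch of the answer/grade loop (not our actual pipeline):
# one model answers the question, a second model scores the answer 1-10.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter is OpenAI-compatible
    api_key="sk-or-...",                      # your OpenRouter key
)

def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="microsoft/phi-4",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

def grade(question: str, candidate: str) -> str:
    prompt = (
        f"Question:\n{question}\n\nAnswer:\n{candidate}\n\n"
        "Rate this answer's quality from 1-10 and briefly justify the score."
    )
    resp = client.chat.completions.create(
        model="mistralai/mixtral-8x7b-instruct",  # grader model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

q = "How do I reverse a linked list in place?"
print(grade(q, answer(q)))
```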

hbcondo714 - 20 hours ago

FWIW, Phi-4 was converted to Ollama by the community last month:

https://ollama.com/vanilj/Phi-4

mettamage - 3 hours ago

How come models can be so small now? I don't know a lot about AI, but is there an ELI5 for a software engineer who knows a bit about AI?

For context: I've made some simple neural nets with backprop. I read [1].

[1] http://neuralnetworksanddeeplearning.com/

andhuman - 4 days ago

I've seen on the LocalLLaMA subreddit that some GGUFs have bugs in them. The recommended one was from Unsloth. However, I don't know how the Ollama GGUF holds up.

gnabgib - 4 days ago

Related: Phi-4: Microsoft's Newest Small Language Model Specializing in Complex Reasoning (439 points, 24 days ago, 144 comments) https://news.ycombinator.com/item?id=42405323

Also on hugging face https://huggingface.co/microsoft/phi-4

summarity - 11 hours ago

Does it include the Unsloth fixes yet?

kuatroka - 5 hours ago

I've pulled and run it. It launches fine, but when I actually ask it anything, I constantly get just a blank line. Does anyone else experience this?

k__ - 10 hours ago

"built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets"

Does this mean the model was trained without copyright infringements?

ionwake - 7 hours ago

Can this run on a MacBook M1? What is the performance like? Or would I need an M3? Thanks

dartos - 9 hours ago

Does this include some of the config fixes that the Unsloth folks pointed out?

buyucu - 11 hours ago

I have unfortunately been disappointed with the llama.cpp/ollama ecosystem of late, and I'm thinking about moving my things to vllm instead.

llama.cpp basically dropped support for multimodal vision models. Ollama still supports them, but only a handful. Also, Ollama still does not support Vulkan, even though llama.cpp has had Vulkan support for a long time now.

This has been very sad to watch. I'm more and more convinced that vllm is the way to go, not ollama.
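
For anyone weighing the same move, vLLM's offline Python API is about this simple (a minimal sketch; `microsoft/phi-4` is just the Hugging Face model from this thread):

```python
# Minimal vLLM sketch: load HF weights and generate offline, no server needed.
from vllm import LLM, SamplingParams

llm = LLM(model="microsoft/phi-4")  # downloads weights from Hugging Face
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Why can a 14B model run on consumer hardware?"], params)
print(outputs[0].outputs[0].text)
```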

v3ss0n - 16 hours ago

Translation: Phi-4 available on llama.cpp