Gemma 3 270M re-implemented in pure PyTorch for local tinkering

github.com

207 points by ModelForge 5 hours ago


canyon289 - 4 hours ago

Hey all, I created this model with a top-notch team. I answered many questions last week when this hit the front page, and I'm happy to answer more here as well.

https://news.ycombinator.com/item?id=44902148

Personally, I'm excited that you all have access to this model now, and I hope you all get value out of using it.

kace91 - an hour ago

This might be a very basic question, but as a dev whose only interaction with models is using the main commercial ones (Sonnet, ChatGPT and the like), what are some use cases for these smaller local models?

What usages can be reasonable to expect from them? Are there uses out of the box or does one have to go through some custom post-training to get useful behavior?

I feel like there is a huge gap between understanding models as a user of commercial tools and the kind of discussions happening in these threads, but I’m not sure what the in-between steps are.

shekhar101 - 2 hours ago

Can someone (or OP) point me to a recipe to fine-tune a model like this for natural language tasks like complicated NER or similar workflows? I tried fine-tuning Gemma 3 270M when it came out last week without any success. A lot of tutorials are geared towards chat applications and role playing, but I feel this model could be great for use cases like mine, where I am trying to clean up and extract data from PDFs with entity identification and such.

keeeba - 2 hours ago

What use-cases do you see for the 270M’s embeddings, and should we be sticking to token embeddings or can we meaningfully pool for sentence/document embeddings?

Do we need to fine-tune for the embeddings to be meaningful at the sentence/document level?
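
For anyone unfamiliar with the pooling being asked about: masked mean pooling is one common way to collapse per-token embeddings into a single sentence vector. Below is a minimal sketch in plain Python, not taken from the linked repo. The `mean_pool` helper and the toy 2-dimensional vectors are hypothetical, and it says nothing about whether the 270M's pooled embeddings are useful without fine-tuning.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the embeddings of non-padding tokens into a single vector."""
    dim = len(token_embeddings[0])
    pooled = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding positions (mask value 0)
            for i, x in enumerate(vec):
                pooled[i] += x
            count += 1
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [x / count for x in pooled]


# Toy example: three real tokens plus one padding token, 2-dim embeddings.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
```

With real model outputs the same idea would be applied to the final hidden states, using the tokenizer's attention mask so that padding does not skew the average.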

lsb - 3 hours ago

That’s wild that with a KV cache and compilation on the Mac CPU you are faster than on an A100 GPU.

eachro - 2 hours ago

If you wanted to train it from scratch, how long would it take on a reasonable GPU setup?

n0vella - 4 hours ago

Do you think these very small models have some utility in the real world? Apart from learning and academic purposes of course.

_giorgio_ - an hour ago

what a legend
