Getting a Gemini API key is an exercise in frustration

ankursethi.com

845 points by speckx 4 months ago


Ozzie_osman - 4 months ago

I was recently (vibe)-coding some games with my kid, and we wanted some basic text-to-speech functionality. We tested Google's Gemini models in-browser, and they worked great, so we figured we'd add them to the app. Some fun learnings:

1. You can access those models via three APIs: the Gemini API (which it turns out is only for prototyping and returned errors 30% of the time), the Vertex API (much more stable but lacking in some functionality), and the TTS API (which performed very poorly despite offering the same models). They also have separate keys (at least, Gemini vs Vertex).

2. Each of those APIs supports different parameters (things like language, whether you can pass a style prompt separate from the words you want spoken, etc). None of them offered the full combination we wanted.

3. To learn this, you have to spend a couple of hours reading API docs or, alternatively, have Claude Code read the docs, then try all the different combinations and figure out what works and what doesn't (with the added risk that it might hallucinate something).
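
For anyone who would rather script that trial-and-error than hand it to an agent, here is a minimal sketch of the idea. Everything in it is an assumption for illustration: `probe`, `fake_api_a`, `fake_api_b`, and the parameter names `language` and `style_prompt` are hypothetical stand-ins, not the real Gemini, Vertex, or TTS client calls. In practice each stand-in would wrap an actual request and raise when the API rejects the parameters.

```python
from itertools import product


def probe(backends, options):
    """Try every backend with every combination of option values.

    backends: dict mapping a label to a callable that takes keyword
              arguments and raises on failure (hypothetical stand-ins).
    options:  dict mapping a parameter name to the list of values to try.
    Returns a list of (label, params, error_or_None) tuples.
    """
    names = list(options)
    results = []
    for label, call in backends.items():
        for values in product(*(options[n] for n in names)):
            params = dict(zip(names, values))
            try:
                call(**params)
                results.append((label, params, None))
            except Exception as exc:  # record the failure and keep probing
                results.append((label, params, repr(exc)))
    return results


# Hypothetical stand-ins; real ones would wrap actual API requests.
def fake_api_a(language, style_prompt):
    if style_prompt is not None:
        raise ValueError("style_prompt not supported")


def fake_api_b(language, style_prompt):
    if language != "en-US":
        raise ValueError("unsupported language")


if __name__ == "__main__":
    report = probe(
        {"api_a": fake_api_a, "api_b": fake_api_b},
        {"language": ["en-US", "fr-FR"], "style_prompt": [None, "cheerful"]},
    )
    for label, params, error in report:
        print(label, params, "ok" if error is None else error)
```

The resulting table makes gaps obvious: with these made-up stand-ins, no backend accepts a non-English language together with a style prompt, which is the same kind of "none of them offered the full combination" result described above.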

dannyobrien - 4 months ago

The odd thing about all of this (well, I guess it's not odd, just ironic) is that when Google AdWords started, one of the notable things about it was that anyone could start serving or buying ads. You just needed a credit card. I think that bought Google a lot of credibility (along with the ads being text-only) as they entered an already disreputable space: ordinary users and small businesses felt they were getting the same treatment as more faceless, distant big businesses.

I have a friend who says Google's decline came when they bought DoubleClick in 2008 and suffered a reverse-takeover: their customers shifted from being Internet users and became other, matchingly-sized corporations.