A simple search engine from scratch

bernsteinbear.com

289 points by bertman 4 days ago


franczesko - 4 days ago

On the topic of search engines, I really liked the classes by David Evans. The task there was also building a simple search engine from scratch. It's really for beginners, as the emphasis is on coding in general, but I've found it to be very approachable.

https://www.cs.virginia.edu/~evans/courses/

ktallett - 4 days ago

I always wonder if the days of search engines for specific topics could return. With LLMs providing less-than-accurate results in some areas, and Google, Bing, etc. being taken over by adverts or well-organised SEO, there feels like a place for accurate, specialised search.

snowstormsun - 4 days ago

Nice idea, but this approach does not handle out-of-vocabulary words well, which is one major motivation for using a vector-based search. It might not perform significantly better than lexical matching like tf-idf or BM25, while being slower because of its linear complexity. But cool regardless.
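
For comparison, here is a minimal sketch of the kind of lexical scoring mentioned above, using the common BM25 formula. The function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the smoothed IDF variant are assumptions chosen for illustration; none of this comes from the blog post, and a real engine would also tokenize and normalize text more carefully.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document.

    query: list of query terms.
    docs:  non-empty list of documents, each a list of tokens.
    k1, b: conventional BM25 tuning constants (assumed defaults).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # a term absent from the document contributes nothing
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, already tokenized.
docs = [
    ["compiling", "a", "lisp"],
    ["a", "simple", "search", "engine"],
]
scores = bm25_scores(["lisp"], docs)
```

Because the match is on the literal token, any word that appears in a post can be found, even if no pretrained embedding exists for it. The trade-off is that synonyms and related words do not match at all.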

leumassuehtam - 4 days ago

The author has a nice series on compiling a Lisp [0], but unfortunately his search engine fails to find it when queried with "lisp" or "Lisp".

[0] https://bernsteinbear.com/blog/compiling-a-lisp-0/

sp0rk - 4 days ago

The SVG equation is very difficult to read if you're using a dark OS theme because the blog uses the OS preference for dark/light theme (and doesn't seem to give an option to change it manually, either.)

kaycebasques - 4 days ago

> The idea behind the search engine is to embed each of my posts into this domain by adding up the embeddings for the words in the post.

Ah, OK! I never really grokked how to use word-level embeddings. Makes more sense now.
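
A minimal sketch of that idea, assuming a tiny made-up vocabulary: the 3-dimensional vectors in `TOY_VECTORS`, the post names, and the helper names `embed`, `cosine`, and `search` are all hypothetical, and real word2vec vectors would have hundreds of dimensions. Ranking by cosine similarity is also an assumption here, though it is a common choice for comparing embeddings.

```python
import math

# Toy word vectors, invented purely for illustration.
TOY_VECTORS = {
    "lisp": [1.0, 0.0, 0.0],
    "compiler": [0.8, 0.2, 0.0],
    "search": [0.0, 1.0, 0.0],
    "engine": [0.0, 0.9, 0.1],
}


def embed(words, vectors=TOY_VECTORS, dim=3):
    """Embed a list of words by summing the vectors of the known ones."""
    total = [0.0] * dim
    for word in words:
        vec = vectors.get(word)
        if vec is None:
            continue  # out-of-vocabulary words are silently skipped
        total = [t + v for t, v in zip(total, vec)]
    return total


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical posts, already split into words.
POSTS = {
    "compiling-a-lisp": ["lisp", "compiler"],
    "search-from-scratch": ["search", "engine"],
}


def search(query, posts=POSTS):
    """Return post names ordered from most to least similar to the query."""
    q = embed(query.split())
    return sorted(posts, key=lambda name: cosine(q, embed(posts[name])), reverse=True)
```

Note that every query is compared against every post, and that a query made only of unknown words embeds to the zero vector, so it matches nothing in particular.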

cosmicgadget - 4 days ago

This was a really nice read. Now I have no excuse not to upgrade my blog search. I do feel that I'll have a ton of long tail words like 'prank'.

vojtechrichter - 4 days ago

I really like people playing around with technology that many take for granted without understanding its core, underlying principles.

swyx - 4 days ago

this embeds words with word2vec, which is like 10 years old. at least use BERT or sentence-transformers :)
