Show HN: Han – A Korean programming language written in Rust
github.com33 points by xodn348 an hour ago
33 points by xodn348 an hour ago
A few weeks ago I saw a post about someone converting an entire C++ codebase to Rust using AI in under two weeks.
That inspired me — if AI can rewrite a whole language stack that fast, I wanted to try building a programming language from scratch with AI assistance.
I've also been noticing growing global interest in Korean language and culture, and I wondered: what would a programming language look like if every keyword was in Hangul (the Korean writing system)?
Han is the result. It's a statically-typed language written in Rust with a full compiler pipeline (lexer → parser → AST → interpreter + LLVM IR codegen).
It supports arrays, structs with impl blocks, closures, pattern matching, try/catch, file I/O, module imports, a REPL, and a basic LSP server.
This is a side project, not a "you should use this instead of Python" pitch. Feedback on language design, compiler architecture, or the Korean keyword choices is very welcome.
https://github.com/xodn348/han
I don't know Korean at all, but this looks cool and a fun project. I'm curious if this reduces typing or has any benefits being in Hangul vs Latin? Thanks! One thing that motivated me was curiosity about prompt efficiency in the AI era. Hangul is beautifully dense — a single syllable block packs initial consonant + vowel + final consonant into one character. I wondered if Korean-keyword code might produce shorter prompts for LLMs. I actually tested this with GPT-4o's tokenizer, and the result was the opposite — Korean keywords average 2-3 tokens vs 1 for English. A fibonacci program in Han takes 88 tokens vs 54 in Python. The reason comes down to how LLM tokenizers work. They use BPE (Byte Pair Encoding), which starts with raw bytes and repeatedly merges the most frequent pairs into single tokens. Since training data is predominantly English, words like `function` and `return` appear billions of times and get merged into single tokens. Korean text appears far less frequently, so the tokenizer doesn't learn to merge Hangul syllables — it falls back to splitting each character into 2-3 byte-level tokens instead. It's a tokenizer training bias, not a property of Hangul itself. If a tokenizer were trained on a Korean-heavy corpus, `함수` could absolutely become a single token too. So no efficiency benefit today. But it was a fun exploration, and Korean speakers can read the code like natural language. It could also be a fun way for people learning Korean to practice reading Hangul in a different context — every keyword is a real Korean word with meaning.
raaspazasu - 13 minutes ago
xodn348 - 5 minutes ago