RFC 454545 – Human Em Dash Standard
gist.github.com
119 points by jdauriemma 19 hours ago
> Historically, the em dash (—) has served as a flexible punctuation mark used by human authors to indicate interruption, emphasis, or sudden changes in thought.
I learned about the em dash in high school and adopted it into my writing style very quickly for analysis and opinion documents. It felt natural given the number of tangents I tend to go off on, particularly when including analogies for the reader’s understanding.
I was surprised to find out in my career that it was rarely used by others. Subconsciously I pulled back on how often I used it — especially after it was once suggested that frequent use could imply neurodivergence. Important and lengthy documents which I’d written and published (internally) at work still display them. On occasion, because of their presence, people have commented asking whether I’d somehow had access to early AI models to help write these works. I think I averaged two em dashes per letter page.
I find myself on the fence with proposals like these. They have good intentions but they do not solve an issue at its core. An LLM is going to reflect one of many writing styles. If today it’s frequent em dash usage, tomorrow it could be frequent parentheses. Swapping Unicode characters becomes a cat-and-mouse game with the cat always two steps behind. The real issue is that the social contract is broken because people attempt to pass off LLM output as human work. Review and revise that social contract instead, so that it accounts for the existence of these new tools.
> I learned about the em dash in high school and adopted it into my writing style very quickly for analysis and opinion documents. It felt natural given the number of tangents I tend to go off on, particularly when including analogies for the reader’s understanding.
Isn't this what parentheses are meant for? Together with footnotes, I've always used them like that, but I guess it could also just be a cultural difference. My teachers in Swedish school always told me to put thoughts like that into parentheses, but I also only just (barely) finished high school, so that could be related too.
> I find myself on the fence with proposals like these. They have good intentions but they do not solve an issue at its core.
I don't understand what the issue even is here, and the RFC also doesn't clearly outline it. Is "created ambiguity for human writers who have historically relied upon the em dash as a stylistic device" the problem here?
Trying to solve it by adding yet another character and slapping the label "Human Attestation Mark (HAM)" on it will just make LLMs eventually use that instead... Not sure what the point is, to be honest.
Punctuation in written English can be used in many ways. It's a very flexible language.
It is perfectly OK (it really is) to use parentheses -- and em dashes alike -- where they're useful; other punctuation, like the semicolon, the comma, and even the Oxford comma, is also OK.
There's not much that is disallowed in English. Most people have no reason to adhere to any particularly rote style guide.
> Isn't this what parentheses are meant for?
Parentheses add emphasis to a sentence or statement. Normally, their use allows the sentence to be complete with or without the parenthetical.
Em dashes may also add or increase emphasis but are normally treated as an aside. Think of it as the author injecting a comment of their own, sometimes in a form that is not a complete sentence.
For example: When you read this sentence (in your mind) it should feel complete and correct. Perhaps you read in your own voice — something I don’t normally do — or without one at all.
> I don't understand what the issue even is here, and the RFC also doesn't clearly outline it.
The issue is written there but may not make sense unless you know someone who stylistically writes with higher-than-average em dash usage. I, for example, get inquiries and comments at work from employees who ask which LLM I used for “generating these reports” because of the presence of em dashes. They do not believe me when I say not a single word was written by an LLM because “there’s an em dash. Only LLMs use em dashes!” This is categorically untrue, and the correlation erodes the perceived authenticity of people’s work.
Their aim is to introduce a new Unicode character which programs like text editors could insert when a person types an em dash. It attests that a human being is behind the document, typing characters out individually. Actions like bulk copy-pasting wouldn’t replace em dashes, since the editor can’t attribute that text to a human typing it out.
> Em dashes may also add or increase emphasis but are normally treated as an aside. Think of it as the author injecting a comment of their own, sometimes in a form that is not a complete sentence.
A semicolon is better for this purpose. Good writing doesn't have mad tangents anyway; there should be a flow and natural transition.
> Good writing doesn't have mad tangents anyway; there should be a flow and natural transition.
In general, yes. Technical documents, research reports, news articles, and other formal publications should follow this.
Anything else which allows a bit more freedom in expression? I’d say it’s a matter of taste.
I had freewritten, free-expression-type documents in mind when I wrote my statement, e.g. blog articles or opinion pieces. The problem is that 'a matter of taste' can be used to excuse or justify anything.
That's more of a feature than it is a problem.
Agree to disagree. I would argue it gets used to defend badly written work more often than it applies to more legitimate cases.
Outside of settings requiring formalized style, people are free to write and to speak however they wish.
Others are free to dislike this.