Python's splitlines does more than just newlines

yossarian.net

113 points by Bogdanp 4 days ago


dleeftink - 4 days ago

For more controlled splitting and matching tasks, I really like Unicode's named character classes[0].

[0]: https://en.wikipedia.org/wiki/Unicode_character_property#Gen...
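
A minimal sketch of what category-based splitting could look like with only the standard library. The helper name `split_on_categories` and the choice of `Zl`/`Zp` as the default break categories are assumptions made for illustration. The stdlib `re` module has no `\p{...}` classes, so this sketch goes through `unicodedata.category` instead.

```python
import unicodedata

# Categories treated as breaks in this sketch (an assumption, not a standard):
# Zl = line separator (U+2028), Zp = paragraph separator (U+2029).
BREAK_CATEGORIES = {"Zl", "Zp"}

def split_on_categories(text, categories=BREAK_CATEGORIES):
    """Split text at every character whose general category is in categories."""
    parts, current = [], []
    for ch in text:
        if unicodedata.category(ch) in categories:
            # Close the current piece; the separator itself is dropped.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts

sample = "one\u2028two\u2029three\nfour"
by_separators = split_on_categories(sample)        # "\n" is category Cc, so it stays
by_controls = split_on_categories(sample, {"Cc"})  # splits on "\n" only
print(by_separators)
print(by_controls)
```

Passing a different set of categories changes what counts as a break without touching the loop.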

mixmastamyk - 4 days ago

Splitlines is generally not needed. for line in file: is more idiomatic.
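
One caveat is that the two approaches do not agree on what a line is. A small sketch, assuming a throwaway file (the name `sample.txt` is hypothetical) opened in text mode with the default `newline=None`:

```python
import tempfile
from pathlib import Path

raw = b"a\rb\r\nc\x0bd\x1de\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"  # hypothetical file name
    path.write_bytes(raw)
    # Universal newlines: only \n, \r and \r\n end a line when iterating.
    with open(path, encoding="utf-8") as f:
        file_lines = [line.rstrip("\n") for line in f]

# str.splitlines() also breaks on \x0b (vertical tab) and \x1d (group separator).
split_lines = raw.decode("utf-8").splitlines()

print(file_lines)   # ['a', 'b', 'c\x0bd\x1de']
print(split_lines)  # ['a', 'b', 'c', 'd', 'e']
```

So iterating over the file is the narrower of the two: it leaves `\v`, `\x1d` and friends inside the line.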

cuckoos-jicamas - 4 days ago

str.split() with no arguments gives the same result for this string (though it also splits on spaces and tabs):

>>> s = "line1\nline2\rline3\r\nline4\vline5\x1dhello"
>>> s.split()
['line1', 'line2', 'line3', 'line4', 'line5', 'hello']
>>> s.splitlines()
['line1', 'line2', 'line3', 'line4', 'line5', 'hello']

But split() takes a sep argument that sets the delimiter to split on, in which case it does what you expected:

>>> s.split('\n')
['line1', 'line2\rline3\r', 'line4\x0bline5\x1dhello']

In general you want this:

>>> import re
>>> linesep_splitter = re.compile(r'\n|\r\n?')
>>> linesep_splitter.split(s)
['line1', 'line2', 'line3', 'line4\x0bline5\x1dhello']
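
One difference worth knowing: a regex split leaves an empty string after a trailing line ending, while splitlines() does not. A rough sketch of a wrapper that evens this out (the name `split_lines` is made up for this example):

```python
import re

linesep_splitter = re.compile(r'\n|\r\n?')

def split_lines(text):
    """Split on \\n, \\r and \\r\\n only, without a trailing empty entry."""
    parts = linesep_splitter.split(text)
    # Drop the single empty string produced by a trailing line ending
    # (or by an empty input), to mirror what str.splitlines() returns.
    if parts and parts[-1] == "":
        parts.pop()
    return parts

raw_parts = linesep_splitter.split("a\r\nb\n")
print(raw_parts)                        # ['a', 'b', '']
print(split_lines("a\r\nb\n"))          # ['a', 'b']
print(split_lines("a\x0bb\x1dc\n"))     # ['a\x0bb\x1dc']
```

Blank lines in the middle of the text are kept; only the final empty entry is removed.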

RainyDayTmrw - 2 days ago

Is there a parser ambiguity/confusion vector here?

meken - 4 days ago

TIL: Python has a splitlines function

zb3 - 3 days ago

Useful to know for security purposes: surprises like that might cause vulnerabilities.
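
A contrived sketch of how that could go wrong: one piece of code checks input line by line with split('\n'), another consumes it with splitlines(), and the two disagree about where lines start. Both functions and the key=value format are hypothetical, made up for this example.

```python
def validate(payload):
    """Hypothetical check: accept only if no line starts with 'role='."""
    return not any(line.startswith("role=") for line in payload.split("\n"))

def parse(payload):
    """Hypothetical consumer: reads key=value pairs, one per line."""
    pairs = (line.split("=", 1) for line in payload.splitlines() if "=" in line)
    return dict(pairs)

# \x1d (group separator) is not a newline for split('\n'),
# but it is a line boundary for splitlines().
payload = "name=bob\x1drole=admin"

accepted = validate(payload)
parsed = parse(payload)
print(accepted)  # True
print(parsed)    # {'name': 'bob', 'role': 'admin'}
```

The validator sees one line starting with "name=", while the parser sees two lines and picks up the smuggled key.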

wvbdmp - 4 days ago

What, no <br\s*\/?>?

zzzeek - 4 days ago

In the same vein, NTLAIL: strip(), rstrip(), and lstrip() can strip other kinds of characters besides whitespace.
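
A few quick examples covering both readings of that, as a sketch: the no-argument form strips anything str.isspace() accepts (which includes some control characters), and the chars argument is treated as a set of characters, not as a prefix or suffix.

```python
# No argument: \x1c-\x1f count as whitespace for str.strip().
no_arg = "\x1fhello\x1c".strip()

# chars is a set of characters to remove from both ends.
char_set = "www.example.com".strip("cmowz.")

# Gotcha: lstrip("test_") removes any leading t, e, s or _,
# so it eats into the rest of the name as well.
greedy = "test_tests.py".lstrip("test_")

# removeprefix() (Python 3.9+) removes the exact prefix once.
exact = "test_tests.py".removeprefix("test_")

print(repr(no_arg))  # 'hello'
print(char_set)      # example
print(greedy)        # .py
print(exact)         # tests.py
```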

7bit - 4 days ago

This article provides no additional value to the splitlines() docs.