IPv6 zones in URLs are a mistake
xeiaso.net | 141 points by xena a day ago
It gets worse than that.
The Python `ipaddress` library has an `ip_address` function that returns either an IPv4Address or an IPv6Address if the passed string is a valid IPv4 or IPv6 address, and raises a ValueError if the address is invalid.
I've seen code that uses that function to determine if a user-supplied string is a valid IP before passing it to a command line. At first glance, that seems fine, but some shell metacharacters are valid in the IPv6 zone ID.
`fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned` is a valid IPv6 IP, and if you did `ping fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned`, you'd have the output of `whoami` written to /tmp/pwned.
Obviously, people shouldn't be writing code that puts user input into a shell call without the proper method of execution (i.e., shell=False when using subprocess.Popen), but people often think "I validated it, it's fine" and then get popped because their validation wasn't as good as they thought it was.
EDIT: In case it isn't clear, `${PATH:0:1}` is necessary in the attack payload because a `/` is invalid in a zone ID. `${PATH:0:1}` is a tricky way to get a `/` character by just grabbing the first character of your PATH environment variable.
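To make the failure mode concrete, here is a minimal sketch of the naive check next to a stricter one. The helper names `naive_is_ip` and `is_ip_without_zone` are made up for illustration, and the behaviour shown assumes Python 3.9 or later, where `ipaddress` accepts zone IDs.

```python
import ipaddress

PAYLOAD = "fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned"


def naive_is_ip(text: str) -> bool:
    """The check described above: accept anything ip_address() parses."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


def is_ip_without_zone(text: str) -> bool:
    """Hypothetical stricter check: also reject addresses carrying a zone ID."""
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        return False
    # IPv4Address has no scope_id attribute, hence getattr with a default.
    return getattr(addr, "scope_id", None) is None


if __name__ == "__main__":
    print(naive_is_ip(PAYLOAD))           # the payload passes the naive check
    print(is_ip_without_zone(PAYLOAD))    # the stricter check refuses it
    print(is_ip_without_zone("fe80::1"))  # plain addresses still pass
```

Rejecting zone IDs only narrows the input; it is not a substitute for keeping user input away from a shell in the first place.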
> `fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned` is a valid IPv6 IP, and if you did `ping fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned`, you'd have the output of `whoami` written to /tmp/pwned.
Is this really a Python problem? `subprocess.run`, for example, defaults to `shell=False`, so you would have to set `shell=True` and, on top of that, be building up the command line yourself?
The "default" API for `subprocess.run` has you doing `subprocess.run(["ping", ip])` which... I think just entirely avoids this problem?
There's def a general sort of "oh people will just copy/paste stuff into a shell" or the whole shell script arg escaping mess. Just feels like Python is not really doing anything bad here.
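A small sketch of why the list form sidesteps the issue: each list element reaches the child process as one argv entry, and no shell ever sees it. To keep the example runnable anywhere, a tiny Python child that echoes its first argument stands in for `ping`; the name `run_with_argv` is an assumption for illustration.

```python
import subprocess
import sys

PAYLOAD = "fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned"

# Stand-in for `ping`: a child process that prints its first argument verbatim.
ECHO_ARG = [sys.executable, "-c", "import sys; print(sys.argv[1])"]


def run_with_argv(arg: str) -> str:
    """Pass arg as a single argv element; run() defaults to shell=False."""
    result = subprocess.run(
        ECHO_ARG + [arg], capture_output=True, text=True, check=True
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    # The metacharacters arrive untouched: no `whoami`, no redirection.
    print(run_with_argv(PAYLOAD) == PAYLOAD)
```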
Do note that even if you don't do shell expansion, you're still subject to "smart" programs interpreting a single argv element that starts with a dash as an option and its argument. I'm sure there's going to be a CVE about this at some point, if there hasn't been already.
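The dash problem can be sketched with Python's `getopt` module, which follows the same conventions as the C `getopt()` that many command line tools use. The `-c` option here is a made-up example, not a claim about any particular program; the point is that a conventional `--` marker makes everything after it an operand.

```python
import getopt


def parse_like_getopt(args):
    """Parse args the way a getopt-based tool with a `-c VALUE` option would."""
    opts, operands = getopt.getopt(args, "c:")
    return dict(opts), operands


if __name__ == "__main__":
    # Untrusted input that was meant to be a host name:
    untrusted = "-c9999"

    # Passed bare, it is read as the -c option with argument 9999.
    print(parse_like_getopt([untrusted]))

    # After "--", the same string is treated as an ordinary operand.
    print(parse_like_getopt(["--", untrusted]))
```

Whether a given tool honours `--` depends on how it parses its arguments, so this is worth checking per program rather than assuming.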
I would argue that command line is for human input, so the failure already happened when they composed a `ping` shell command programmatically.
Granted, a lot of software works like that, but the command line was invented as a human interface, we just bungee-strapped a computer instead.
On the other hand, separating concerns by process boundaries leads to more secure, composable and stable code. By not reinventing the wheel, you avoid a whole class of problems. Of course a stable API or library might be better, but convenience always wins out.
Maybe the crazy part is also what counts as a valid IPv6 string. And for safety, mostly never pass anything to the shell.
IPv6 addresses are annoyingly complex. That isn't the cause of this problem, because the shell-passing thing is a bad idea anyway, but it illustrates the complexity.
And it gets even more fun when browsers such as Firefox implemented this, then decided "no, we won't do it" and removed the feature. Now there's no way to access your router's web interface over a link-local address...
(rationale being that whatwg said no: https://github.com/whatwg/url/issues/392 ; firefox bug https://bugzilla.mozilla.org/show_bug.cgi?id=700999 )
The "solution" is to use a proxy such as https://github.com/twisteroidambassador/prettysocks/tree/ipv... which incidentally encodes the % as an `s` and handles special URLs like http://fe80--1ff-fe23-4567-890as3.ipv6-literal.net for you through the SOCKS DNS resolution feature... I haven't found anything else that works recently -_-
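For reference, the `ipv6-literal.net` spelling seen in that URL can be produced with a couple of string replacements: `:` becomes `-` and the `%` before the zone becomes `s`. This is only a sketch of the naming convention, not of how prettysocks implements it, and the function name is an assumption.

```python
import ipaddress


def to_ipv6_literal_name(address: str) -> str:
    """Rewrite an IPv6 literal (optionally with a zone) as an ipv6-literal.net name."""
    # Raises ValueError for anything that is not an IPv6 address.
    ipaddress.IPv6Address(address)
    return address.replace(":", "-").replace("%", "s") + ".ipv6-literal.net"


if __name__ == "__main__":
    print(to_ipv6_literal_name("fe80::1ff:fe23:4567:890a%3"))
```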
I very much didn't test it, but this patch might do the job on Firefox (provided there's no code in the UI doing extra validation on top):
--- a/netwerk/base/nsURLHelper.cpp
+++ b/netwerk/base/nsURLHelper.cpp
@@ -928,3 +928,3 @@ bool net_IsValidIPv4Addr(const nsACString& aAddr) {
bool net_IsValidIPv6Addr(const nsACString& aAddr) {
- return mozilla::net::rust_net_is_valid_ipv6_addr(&aAddr);
+ return true;
}
Or if you actually wanted to do some validation, pass the address to getaddrinfo():
bool net_IsValidIPv6Addr(const nsACString& aAddr) {
struct addrinfo *res, hints = {.ai_flags = AI_NUMERICHOST};
int err = getaddrinfo(aAddr.get(), nullptr, &hints, &res);
if (err) return false;
bool isValid = res[0].ai_family != AF_INET;
freeaddrinfo(res);
return isValid;
}
This way it's valid if your OS considers it a valid address.
Could you do the same trick just putting a temporary entry in the hosts file?
You complain about URL encoding? Enter UNC encoding...
https://devblogs.microsoft.com/oldnewthing/20100915-00/?p=12...
> \\fe80--1ff-fe23-4567-890as3.ipv6-literal.net\share
The most amazing part about this is that Microsoft used a public domain for it and then lost the domain registration.
And they don't even care to make a serious effort to get it back. I suspect if they tried using the UDRP with a claim like "we lost it by accident, cybersecurity risk, current owner is just squatting on it without actively using it", they'd have quite decent odds of success, given the attitudes of the average UDRP arbitrator. The current holder would of course argue "you lost it more than a decade ago, you should be estopped by the passage of time", but again, the average UDRP arbitrator would likely weigh the "cybersecurity risk" argument higher.
"IPv6 is weird. One of the more strange parts of the standard is that every interface's link local addresses are in `fe80::whatever`."
How is IPv6 weird here, it's the exact same thing in IPv4, no? If you have two different network interfaces, you have to identify which is which somehow, either by assigning a specific IP range to it or by adding some kind of identifier.
Making zones part of addresses in the first place was probably a mistake, I agree, but the problem of address conflicts when users can choose arbitrary addresses certainly isn't a design flaw of IPv6.
It's not the same as IPv4. IPv4 doesn't solve this problem. If eth0 and eth1 are both 169.254.0.20 on two different networks, you can't specify that you want to ping 169.254.0.1 on a specific interface. There's no way to disambiguate both destinations.
https://linux.die.net/man/8/ping
ping takes a -I argument that lets you specify which interface to use.
except in ipv4 getting a link-local address means "I fucked up DHCP" and isn't really meant to be a feature. It didn't really work in ipv4 land and, as per the OP, doesn't work in ipv6 land either. Just give everything a proper address and leave link-local to mdns or whatever it was meant to support
> except in ipv4 getting a link-local address means "I fucked up DHCP" […]
No, it means "there is no infrastructure on this link segment". No router (to send out IPv6 RAs), and (as you say) no (working) DHCP server.
Being able to have network connectivity automatically in this scenario can still be handy. If mDNS is running on things, then the user doesn't even have to bother manually setting an address: the link light comes on and you have connectivity to the local segment.
A link-local address necessarily needs a way to specify a link, and the link is local to the sending host and not something the receiving host knows. I suppose they could have used the upper address bits, but the sending host would need to know to set them to 0 when sending the packet out on the wire, and to fill them in with the interface ID when receiving.
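In the sockets API the link is carried next to the 128 address bits rather than inside them: an IPv6 socket address has a separate scope ID field. A rough sketch of turning `address%zone` text into the 4-tuple Python uses for IPv6 sockets follows; `scoped_sockaddr` and its `name_to_index` parameter are hypothetical names, with the lookup injectable so the example does not depend on which interfaces exist.

```python
import ipaddress
import socket


def scoped_sockaddr(text, port, name_to_index=socket.if_nametoindex):
    """Build an IPv6 sockaddr tuple (host, port, flowinfo, scope_id)."""
    addr = ipaddress.IPv6Address(text)
    # Re-create the address from its integer value to drop the zone suffix.
    bare = str(ipaddress.IPv6Address(int(addr)))
    zone = addr.scope_id
    if zone is None:
        scope = 0                    # no zone given
    elif zone.isdigit():
        scope = int(zone)            # numeric zone: already an interface index
    else:
        scope = name_to_index(zone)  # named zone: ask the OS for its index
    return (bare, port, 0, scope)


if __name__ == "__main__":
    # A fake lookup stands in for the OS so this runs on any machine.
    print(scoped_sockaddr("fe80::1%eth0", 80, name_to_index=lambda name: 7))
```

The zone never goes on the wire; it only tells the sending host which interface to use.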
There aren't address conflicts. And users aren't choosing this, it's part of the IPv6 spec. Each interface has a unique address, but you can't tell from looking at an address which network it lives on.
Not really.
Nothing prevents a host from configuring a static link-local address, like fe80::1234. Not only that, some networks choose to have some standard link-local address as a default gateway. For example, a router or an L3 switch can have fe80::1 on its downstream interfaces. This way, all hosts on all networks have fe80::1 as the default gateway and the router will have the fe80::1 address on multiple interfaces.
Furthermore, you can (and some say, should) use link local addresses on transit links between your network devices, eg, between layers of switches in a hyperscale-sized data center network. Typically, the addresses will be deterministically configured, for example, consider
-(e1.0)[switch1](e1.1)---(e2.48)[switch2](e2.25)-(eth0)[server1]
We have server1 connected to top-of-rack switch2 connected to aggregator switch1. Link between switch1 and switch2 is point-to-point transit. You can use exclusively link local addresses there. There are a few approaches:
- e2.48 gets fe80::2, e1.1 gets fe80::1 - all upstream ports are always fe80::2 in all networks, all downstream ports are always fe80::1. A good thing is that link configuration is the same on all switches regardless of the Clos layer.
- switch1 serial number is 1001, switch2 serial number is 2002. Then, e2.48 gets fe80::2002, e1.1 gets fe80::1001. This way, all interfaces on a switch N have address fe80::N
You can then set up a BGP session between the link-local addresses, and it will always be either fe80::1 <-> fe80::2 or fe80::N <-> fe80::M. Switches also have a loopback address for ping and other ICMP traffic. Either approach has advantages and disadvantages.
This is discussed in more detail in RFC 6164, and a more high-level overview is provided in RFC 7404.
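The two numbering schemes above can be sketched in a few lines. The function names are made up, and the serial-based variant assumes the serial's decimal digits are written directly into the last hextet (as in the fe80::1001 example), which only works for serials of up to four digits.

```python
import ipaddress


def role_address(is_upstream_port: bool) -> ipaddress.IPv6Address:
    """Scheme 1: every upstream port is fe80::2, every downstream port fe80::1."""
    return ipaddress.IPv6Address("fe80::2" if is_upstream_port else "fe80::1")


def serial_address(serial: int) -> ipaddress.IPv6Address:
    """Scheme 2: every interface on switch N gets fe80::N."""
    digits = str(serial)
    if serial <= 0 or len(digits) > 4:
        raise ValueError("this sketch only handles serials 1..9999")
    # The decimal digits are reused verbatim as hex digits of the last hextet.
    return ipaddress.IPv6Address(f"fe80::{digits}")


if __name__ == "__main__":
    # e2.48 on switch2 faces upstream, e1.1 on switch1 faces downstream.
    print(role_address(True), role_address(False))
    print(serial_address(2002), serial_address(1001))
```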
I think the weirdness comes from the use of multiple addresses at once, specifically fe80::whatever addresses always being present and getting used even on normal setups when everything's working fine and a global address is configured, as opposed to 169.254.whatever addresses, which most networks never intend to use and so usually only show up when something is wrong.
Isn't 127/8 always present in IPv4, without ill consequences?
I meant it's one address per interface, and loopback has always been its own interface.
One address per host is more common in serious networks that don't have endless IP addresses (10/8 block) allocated to them.
There is no problem with allocating an address from 127.0.0.0/8 to every interface on your host, because 127.0.0.0/8 is only ever accessible to the host itself. So even if you have multi-homed a single routable IPv4 address to 2 NICs on your server (for redundancy), you can still assign 127.0.0.1 to the first and 127.0.0.2 to the second, which you can then use to bind a port to a specific interface in the pair. (I don’t know if anyone actually does this.)
How would the receiving host know which 127 address you imagined belongs to it?
What do you mean “receiving host?” 127/8 is reserved for loopback. If you bind a socket to an interface with an address in that range, you can only use it to communicate with yourself. The sending and receiving hosts are the same.
I mean the host that receives the packet. Weren't you suggesting to use 127/8 as an alternative to link-local addresses?