IPv6 zones in URLs are a mistake

141 points by xena a day ago

It gets worse than that.

The Python `ipaddress` library has an `ip_address` function that returns either an IPv4Address or IPv6Address if the passed string is a valid IPv4 or IPv6 address, or raises a ValueError if the address is invalid.

I've seen code that uses that function to determine if a user-supplied string is a valid IP before passing it to a command line. At first glance, that seems fine, but some shell metacharacters are valid in the IPv6 zone ID.

`fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned` is a valid IPv6 IP, and if you did `ping fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned`, you'd have the output of `whoami` written to /tmp/pwned.

Obviously, people shouldn't be writing code that puts user input into a shell call without the proper method of execution (i.e., shell=False when using subprocess.Popen), but people often think "I validated it, it's fine" and then get popped because their validation wasn't as good as they thought it was.

EDIT: In case it isn't clear, `${PATH:0:1}` is necessary in the attack payload because a `/` is invalid in a zone ID. `${PATH:0:1}` is a tricky way to get a `/` character by just grabbing the first character of your PATH environment variable.
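
For what it's worth, here is a minimal sketch of a stricter check, assuming the caller only wants plain addresses or zones that look like ordinary interface names. `parse_ip_strict` and the zone allowlist are illustrative names and choices of mine, not anything `ipaddress` provides:

```python
import ipaddress
import re

# Assumption: zone IDs are limited to characters that commonly appear in
# interface names. This allowlist is illustrative, not taken from any standard.
_SAFE_ZONE = re.compile(r"[A-Za-z0-9_.-]{1,32}")


def parse_ip_strict(text):
    """Return a normalized address string, or raise ValueError.

    Hypothetical helper: the zone ID is split off and checked separately,
    so shell metacharacters in it are rejected.
    """
    addr_part, sep, zone = text.partition("%")
    # Validate only the part before '%'; ip_address raises ValueError on junk.
    addr = ipaddress.ip_address(addr_part)
    if sep:
        # Zones only make sense on IPv6, and must match the allowlist.
        if addr.version != 6 or not _SAFE_ZONE.fullmatch(zone):
            raise ValueError(f"unsafe or invalid zone ID in {text!r}")
        return f"{addr}%{zone}"
    return str(addr)
```

This only narrows what gets through; it is not a substitute for keeping the value away from a shell in the first place.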

rtpg - a day ago

> `fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned` is a valid IPv6 IP, and if you did `ping fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned`, you'd have the output of `whoami` written to /tmp/pwned.
Is this really a Python problem? `subprocess.run` for example defaults to `shell=False` so you have to set `shell=True`, and on top of that be building up argv?
The "default" API for `subprocess.run` has you doing `subprocess.run(["ping", ip])` which... I think just entirely avoids this problem?
There's def a general sort of "oh people will just copy/paste stuff into a shell" or the whole shell script arg escaping mess. Just feels like Python is not really doing anything bad here.
- btown - a day ago
  
  Never underestimate the power of an LLM that's spent its entire context passing its own self-generated strings to `bash`, to think "maybe the quickest way to get this done is to pass a self-generated string to `bash`."
- AshamedCaptain - 16 hours ago
  
  do note that even if you don't do shell expansion you're still subject to "smart" programs interpreting a single argv that starts with a dash as a parameter and its argument. I'm sure there's going to be a CVE about this at some point if there hasn't already.
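
  A rough sketch of guarding against that, assuming the local `ping` honours the conventional `--` end-of-options marker and a `-c COUNT` flag (both are assumptions about the installed tool, and `build_ping_argv` is a made-up helper name):

```python
import subprocess


def build_ping_argv(target, count=1):
    """Build an argv list for ping that never goes through a shell.

    Hypothetical helper. Assumes ping accepts `--` and `-c COUNT`.
    """
    # Refuse anything that could be read as an option even without a shell.
    if target.startswith("-"):
        raise ValueError("target must not start with '-'")
    # `--` tells getopt-style parsers that everything after it is an operand.
    return ["ping", "-c", str(count), "--", target]


def ping(target):
    # A list argv with the default shell=False means no shell parsing at all.
    return subprocess.run(build_ping_argv(target), capture_output=True, text=True)
```

  The leading-dash check and the `--` marker are redundant on purpose, since not every program treats `--` the same way.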
deepsun - 11 hours ago

I would argue that command line is for human input, so the failure already happened when they composed a `ping` shell command programmatically.
Granted, a lot of software works like that, but the command line was invented as a human interface, we just bungee-strapped a computer instead.
- manjalyc - 7 hours ago
  
  On the other hand, separating concerns by process boundaries leads to more secure, composable and stable code. By not reinventing the wheel, you avoid a whole class of problems. Of course a stable API or library might be better, but convenience always wins out.
edoceo - a day ago

Maybe the crazy part is also what counts as a valid IPv6 string. And for safety, mostly never pass anything to the shell.
- frollogaston - 2 hours ago
  
  IPv6 addresses are annoyingly complex. This isn't the reason why, because the shell-passing thing is a bad idea anyway, but it illustrates it.

evgpbfhnr - a day ago

And it gets even more fun when browsers such as Firefox implemented this, then decided "no, we won't do it" and removed the feature -- now there's no way to access your router web interface over a link-local address...

(rationale being that whatwg said no: https://github.com/whatwg/url/issues/392 ; firefox bug https://bugzilla.mozilla.org/show_bug.cgi?id=700999 )

The "solution" is to use a proxy such as https://github.com/twisteroidambassador/prettysocks/tree/ipv... which incidentally encodes the % as an `s` and handles special URLs like http://fe80--1ff-fe23-4567-890as3.ipv6-literal.net for you through the SOCKS DNS resolution feature... I've never found anything else that works recently -_-

Dagger2 - 17 hours ago

I very much didn't test it, but this patch might do the job on Firefox (provided there's no code in the UI doing extra validation on top):

  --- a/netwerk/base/nsURLHelper.cpp
  +++ b/netwerk/base/nsURLHelper.cpp
  @@ -928,3 +928,3 @@ bool net_IsValidIPv4Addr(const nsACString& aAddr) {
   bool net_IsValidIPv6Addr(const nsACString& aAddr) {
  -  return mozilla::net::rust_net_is_valid_ipv6_addr(&aAddr);
  +  return true;
   }

Or if you actually wanted to do some validation, pass the address to getaddrinfo():

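  bool net_IsValidIPv6Addr(const nsACString& aAddr) {
    // AI_NUMERICHOST: only parse numeric literals, never do a DNS lookup.
    struct addrinfo *res, hints = {.ai_flags = AI_NUMERICHOST};
    int err = getaddrinfo(aAddr.get(), nullptr, &hints, &res);
    if (err) return false;

    // A plain IPv4 literal also parses successfully, so rule that out.
    bool isValid = res[0].ai_family != AF_INET;

    freeaddrinfo(res);
    return isValid;
  }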

This way it's valid if your OS considers it a valid address.

zamadatix - 14 hours ago

Could you do the same trick just putting a temporary entry in the hosts file?

AshamedCaptain - a day ago

You complain about URL encoding? Enter UNC encoding...

https://devblogs.microsoft.com/oldnewthing/20100915-00/?p=12...

> \\fe80--1ff-fe23-4567-890as3.ipv6-literal.net\share
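
As far as I can tell from that example, the transformation is just "colons become dashes, `%` becomes `s`, append the suffix". A small sketch under that assumption (`to_ipv6_literal_host` is a name I made up):

```python
import ipaddress


def to_ipv6_literal_host(address):
    """Encode an IPv6 literal as an ipv6-literal.net style host name.

    Hypothetical helper, based only on the example above: ':' becomes '-',
    '%' becomes 's', and '.ipv6-literal.net' is appended.
    """
    addr_part, sep, zone = address.partition("%")
    # Raises ValueError if the part before '%' is not a valid IPv6 address.
    ipaddress.IPv6Address(addr_part)
    host = addr_part.replace(":", "-")
    if sep:
        host += "s" + zone
    return host + ".ipv6-literal.net"


print(to_ipv6_literal_host("fe80::1ff:fe23:4567:890a%3"))
# prints "fe80--1ff-fe23-4567-890as3.ipv6-literal.net"
```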

Dagger2 - a day ago

The most amazing part about this is that Microsoft used a public domain for it and then lost the domain registration.
- skissane - a day ago
  
  And don't even care to make a serious effort to get it back. I suspect if they tried using the UDRP with a claim "we lost it by accident, cybersecurity risk, current owner is just squatting on it without actively using it" – they'd have quite decent odds of success, given the attitudes of the average UDRP arbitrator. The current holder would of course argue "you lost it more than a decade ago, you should be estopped by the passage of time" – but again, the average UDRP arbitrator would likely weigh the "cybersecurity risk" argument higher.
jujube3 - a day ago

Only a chopped unc would use that notation.

Tharre - a day ago

"IPv6 is weird. One of the more strange parts of the standard is that every interface's link local addresses are in `fe80::whatever`."

How is IPv6 weird here, it's the exact same thing in IPv4, no? If you have two different network interfaces, you have to identify which is which somehow, either by assigning a specific IP range to it or by adding some kind of identifier.

Making zones part of addresses in the first place was probably a mistake, I agree, but the problem of address conflicts when users can choose arbitrary addresses certainly isn't a design flaw of IPv6.

WhyNotHugo - a day ago

It's not the same as IPv4. IPv4 doesn't solve this problem. If eth0 and eth1 are both 169.254.0.20 on two different networks, you can't specify that you want to ping 169.254.0.1 on a specific interface. There's no way to disambiguate both destinations.
- fragmede - 12 hours ago
  
  https://linux.die.net/man/8/ping
  ping takes a -I argument that lets you specify which interface to use.
- sznio - 21 hours ago
  
  except in ipv4 getting a link-local address means "I fucked up DHCP" and isn't really meant to be a feature. It didn't really work in ipv4 land and, as per the OP, doesn't work in ipv6 land either. Just give everything a proper address and leave link-local to mdns or whatever it was meant to support.
  - throw0101a - 16 hours ago
    
    > except in ipv4 getting a link-local address means "I fucked up DHCP" […]
    No, it means "there is no infrastructure on this link segment". No router (to send out IPv6 RAs), and (as you say) no (working) DHCP server.
    Still being able to have network connectivity automatically in this scenario can still be handy. If mDNS is running on things, then the user doesn't even have to bother manually setting an address: the link light comes on and you have connectivity to the local segment.
trumpdong - a day ago

A link-local address necessarily needs a way to specify a link, and the link is local to the sending host and not something the receiving host knows. I suppose they could have used the upper address bits, but the sending host would need to know to convert them to 0 when sending the packet out on the wire, and to replace them with the interface ID when receiving.
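
To make the "local to the sending host" part concrete, a small sketch (`split_zone` is a made-up helper, and the interface names are just examples):

```python
import ipaddress


def split_zone(text):
    """Split 'addr%zone' into (IPv6Address, zone or None). Hypothetical helper."""
    addr_part, sep, zone = text.partition("%")
    return ipaddress.IPv6Address(addr_part), (zone if sep else None)


addr_a, zone_a = split_zone("fe80::1%eth0")
addr_b, zone_b = split_zone("fe80::1%eth1")

# The 128 bits that would go on the wire are identical for both...
same_wire_address = addr_a == addr_b
# ...and only the sender-side zone distinguishes the two destinations.
different_links = zone_a != zone_b
```

The zone never leaves the host; it only picks which interface the packet goes out on.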
masfuerte - a day ago

There aren't address conflicts. And users aren't choosing this, it's part of the IPv6 spec. Each interface has a unique address, but you can't tell from looking at an address which network it lives on.
- ivlad - a day ago
  
  Not really.
  Nothing prevents a host from configuring a static link-local address, like fe80::1234. Not only that, some networks choose to have some standard link-local address as a default gateway. For example, a router or an L3 switch can have fe80::1 on its downstream interfaces. This way, all hosts on all networks have fe80::1 as the default gateway and the router will have the fe80::1 address on multiple interfaces.
  Furthermore, you can (and some say, should) use link local addresses on transit links between your network devices, eg, between layers of switches in a hyperscale-sized data center network. Typically, the addresses will be deterministically configured, for example, consider
  -(e1.0)[switch1](e1.1)—--(e2.48)[switch2](e2.25)-(eth0)[server1]
  We have server1 connected to top-of-rack switch2 connected to aggregator switch1. Link between switch1 and switch2 is point-to-point transit. You can use exclusively link local addresses there. There are a few approaches:
  - e2.48 gets fe80::2, e1.1 gets fe80::1 - all upstream ports are always fe80::2 in all networks, all downstream ports are always fe80::1. A good thing is that link configuration is the same on all switches regardless of the Clos layer.
  - switch1 serial number is 1001, switch2 serial number is 2002. Then, e2.48 gets fe80::2002, e1.1 gets fe80::1001. This way, all interfaces on a switch N have address fe80::N
  You can then set up a BGP session between the link-local addresses, and it will always be either fe80::1 <-> fe80::2 or fe80::N <-> fe80::M. Switches also have a loopback address for ping and other ICMP traffic. Either approach has advantages and disadvantages.
  This is discussed in more detail in RFC 6164, and a higher-level overview is provided in RFC 7404.
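
  The serial-number scheme could be sketched like this, assuming the serial's decimal digits are reused verbatim as the last hex group (so 1001 becomes fe80::1001). `link_local_from_serial` is a hypothetical helper, not something from a real provisioning tool:

```python
import ipaddress


def link_local_from_serial(serial):
    """Map a switch serial like 1001 to fe80::1001 (hypothetical helper).

    Assumes the serial's decimal digits are reused as-is in the last
    16-bit group, so it must be 1 to 4 digits long.
    """
    digits = str(serial)
    if not digits.isdigit() or len(digits) > 4:
        raise ValueError("serial must be 1 to 4 decimal digits")
    return ipaddress.IPv6Address(f"fe80::{digits}")


print(link_local_from_serial(1001))  # prints "fe80::1001"
print(link_local_from_serial(2002))  # prints "fe80::2002"
```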
josephcsible - a day ago

I think the weirdness comes from the use of multiple addresses at once, specifically fe80::whatever addresses always being present and getting used even on normal setups when everything's working fine and a global address is configured, as opposed to 169.254.whatever addresses, which most networks never intend to use and so usually only show up when something is wrong.
- nine_k - a day ago
  
  Isn't 127/8 always present in IPv4, without ill consequences?
  - josephcsible - a day ago
    
    I meant it's one address per interface, and loopback has always been its own interface.
    
    trumpdong - a day ago
    
    One address per host is more common in serious networks that don't have endless IP addresses (10/8 block) allocated to them.
    
    dcrazy - a day ago
    
    There is no problem with allocating one 127.0.0.0/8 address to every interface on your host, because 127.0.0.0/8 is only ever accessible to the host itself. So even if you have multi-homed a single routable IPv4 address to 2 NICs on your server (for redundancy), you can still assign 127.0.0.1 to the first and 127.0.0.2 to the second, which you can then use to bind a port to a specific interface in the pair. (I don’t know if anyone actually does this.)
    
    trumpdong - 19 hours ago
    
    How would the receiving host know which 127 address you imagined belongs to it?
    
    dcrazy - 15 hours ago
    
    What do you mean “receiving host?” 127/8 is reserved for loopback. If you bind a socket to an interface with an address in that range, you can only use it to communicate with yourself. The sending and receiving hosts are the same.
    
    trumpdong - 11 hours ago
    
    I mean the host that receives the packet. Weren't you suggesting to use 127/8 as an alternative to link-local addresses?