The curious case of shell commands, or how "this bug is required by POSIX" (2021)

notes.volution.ro

162 points by wonger_ 8 days ago

Yeah, system() should definitely be deprecated and you should never use it if you write any new program. At least there is exec*() and posix_spawn() under POSIX. Under Windows there is no such thing and every program might parse the command line string differently. You can't naively write a generic posix_spawn() like interface for Windows, see this related Rust CVE: https://blog.rust-lang.org/2024/04/09/cve-2024-24576/ Why is it a CVE in Rust, but not in any other programming language? Did other language handle it better? Dunno, I just know that Rust has a big fat warning about this in their documentation (https://doc.rust-lang.org/std/process/struct.Command.html#me...), but e.g. Java doesn't (https://docs.oracle.com/javase/8/docs/api/java/lang/ProcessB...).

steamrolled - 7 days ago

The main reason system() exists is that people want to execute shell commands; some confused novice developers might mix it up with execl(), but this is not a major source of vulnerabilities. The major source of vulnerabilities is "oh yeah, I actually meant to execute shell".
So if you just take away the libcall, people will make their own version by just doing execl() of /bin/sh. If you want this to change, I think you have to ask why do people want to do this in the first place.
And the answer here is basically that because of the unix design philosophy, the shell is immensely useful. There are all these cool, small utilities and tricks you can use in lieu of writing a lot of extra code. On Windows, command-line conventions, filesystem quirks, and escaping gotchas are actually more numerous. It's just that there's almost nothing to call, so you get fewer bugs.
The most practical way to make this class of bugs go away is to make the unix shell less useful.
- rcxdude - 6 days ago
  
  most calls to system() that I've seen could be replaced with exec without much difficulty. There's relatively few that actually need the shell functionality.
  - oguz-ismail - 6 days ago
    
    system() involves fork()ing, setting up signal handlers, exec()ing and wait()ing. You won't be replacing it with exec, most of the time you'll be reimplementing it for absolutely no reason.
    
    kragen - 6 days ago
    
    Python has os.spawnl, os.spawnv, etc., which fork()s, wait4()s, etc., without involving a shell. This is much better; this is the library function you should be using instead of system() most of the time. Unfortunately I don't think glibc has an equivalent!
    strace -o tmp.spawnlp -ff python3 -c 'import os; os.spawnlp(os.P_WAIT, "true", "true")'
    In parent:
    clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fdc03233310) = 225954 wait4(225954, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 225954 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=225954, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
    In child:
    set_robust_list(0x7fdc03233320, 24) = 0 gettid() = 225954 clock_gettime(CLOCK_MONOTONIC, {tv_sec=2458614, tv_nsec=322829153}) = 0 clock_gettime(CLOCK_MONOTONIC, {tv_sec=2458614, tv_nsec=323030718}) = 0 execve("/usr/local/bin/true", ["true"], 0x7ffdc5008458 /* 44 vars */) = -1 ENOENT (No such file or directory) execve("/usr/bin/true", ["true"], 0x7ffdc5008458 /* 44 vars */) = 0
    Here, I think strace shows clone() rather than fork() because glibc's fork() is a library function that invokes clone(), rather than a real system call.
    
    oguz-ismail - 6 days ago
    
    > Python has os.spawnl, os.spawnv, etc., which fork()s, wait4()s, etc., without involving a shell.
    Good. How do you pipeline commands with these?
    
    kragen - 6 days ago
    
    These functions can't do it. In Python you have to use the subprocess module if you want to pipeline commands without the bugs introduced by the shell. From https://docs.python.org/3.7/library/subprocess.html#replacin...:
    p1 = Popen(["dmesg"], stdout=PIPE) p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE) p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits. output = p2.communicate()[0]
    Of course, now, nobody has an hda, and dmesg is root-only. A more modern example is in http://canonical.org/~kragen/sw/dev3/whereroot.py:
    p1 = subprocess.Popen(["df"], stdout=subprocess.PIPE) p2 = subprocess.Popen(["grep", "/$"], stdin=p1.stdout, stdout=subprocess.PIPE) p1.stdout.close() return p2.communicate()[0]
    Note that the result here is a byte string, so if you want to print it out safely without the shell-like bugginess induced by Python's default character handling (what happens if the device name isn't valid UTF-8?), you have to do backflips with sys.stdout.buffer or UTF-8B.
    Python got a lot of things wrong, and it gets worse all the time, but for now spawning subprocesses is one of the things it got right. Although, unlike IIRC Tcl, it doesn't raise an exception by default if one of the commands fails.
    Apart from the semantics of the operations, you could of course desire a better notation for them. In Python you could maybe achieve something like
    (cmd(["df"]) | ["grep", "/$"]).output()
    but that is secondary to being able to safely handle arguments containing spaces and pipes and whatnot.
    
    oguz-ismail - 5 days ago
    
    Dunno, so much work to achieve so little. I'm even more inclined to stick with shell scripts now
    
    kragen - 5 days ago
    
    The Bourne shell is definitely less work unless you want your code to correctly or reliably handle user input. Then it's more work.