shaftway 7 hours ago

Neat writeup.

I did a very similar thing in Rust, but to solve a different problem. I wanted to be able to easily make nice rich command line arguments for bash scripts without cluttering up the script too much. My minimal demo is:

    $ cat demo.sh
    eval "$(argparse-sh --string text -- "$@")";
    echo "$TEXT"
    
    $ ./demo.sh "Hello world"
    Hello world

I always think it's valuable to build these things for yourself; maybe not always valuable to ship them. It de-magic-ifies the libraries you use and can only help you grow as a programmer.

I'm still learning as a Rust developer, and I'm sure there are terrible things in my code. The hard part for me is finding ways to make things more idiomatic in isolation. I don't have a code review feedback loop I can use to speed up improvement.

https://github.com/Hounshell/argparse-sh

epage 7 hours ago

Context: maintainer of clap, a Cargo team member

Regarding the CLI parser

> This will take in our arguments (from something like std::env::args) and return our matches, or an error.

`std::env::args` will panic on non-UTF8 content, like a file path. You could instead error on non-UTF8 content. Until recently, you had to pull in a dependency or reinvent some non-trivial stuff to properly deal with `OsStr`s. There are now `unsafe` functions for dealing with them. I'd like to extend things further to have a proper "pattern" API for `OsStr` which would allow almost everything a CLI parser needs to deal with `OsStr` without a dependency and without `unsafe`.
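For anyone who wants the "error instead of panic" behaviour today, here is a minimal sketch using only the standard library. The helper name `utf8_args` is made up for illustration; `std::env::args_os` and `OsString::into_string` are the std APIs it relies on.

```rust
use std::ffi::OsString;

/// Convert raw OS arguments to `String`s, returning an error (instead of
/// panicking) when an argument is not valid Unicode.
/// `utf8_args` is a hypothetical helper, not part of std or clap.
fn utf8_args<I>(raw: I) -> Result<Vec<String>, String>
where
    I: IntoIterator<Item = OsString>,
{
    raw.into_iter()
        .map(|arg| {
            // `into_string` hands back the original `OsString` on failure,
            // so it can be shown (lossily escaped) in the error message.
            arg.into_string()
                .map_err(|bad| format!("argument is not valid Unicode: {:?}", bad))
        })
        .collect()
}
```

In a real program this would be called as `utf8_args(std::env::args_os().skip(1))`, with the `Err` case reported to the user rather than unwrapped.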

---

Regarding the discussion on dependencies, I think there are reasonable and valid situations to be careful of adding dependencies (see https://tweedegolf.nl/en/blog/119/sudo-rs-depencencies-when-... and the follow up https://www.reddit.com/r/rust/comments/1b92j0k/sudors_depend...) but the reasoning here focuses on the wrong things imo.

> That would add 23 dependencies to my little project, if you count transitive dependencies. This can go up higher if you turn on a few features: derive, env, unicode, and wrap_help bring you up to 38 dependencies!

People overly focus on dependency counts. Yes, they mention dependency counts aren't a meaningful metric later but the lack of nuance here suggests they've not internalized that, including talking about the impact of optional dependencies when they advocate for optional dependencies later.

Clap can be trimmed down to just 4 dependencies. One of those exists for build performance. One might be removable but is very lightweight. The last is functionality that would exist either way, whether in its own crate or another.

> More concretely, by having no external dependencies you reduce your bug surface area. Sure, you own all the bugs now—but you won't get leftpad-ed, and you won't get dependabot alerts for third-removed transitive dependencies that now you've gotta patch.

crates.io is leftpad safe in all but the most extreme cases (law enforcement forces the deletion of a crate).

As I point out at the beginning, you already have a bug in this trivial code, one that is often hit when people think a CLI parser is trivial and they don't need dependencies.

> On the other hand, you miss out on nice things.

I think this is an understatement. imo one of the reasons we are seeing a lot of high quality CLIs out there is because it's so easy to build on the work of others.

You also get very inconsistent results, which makes the user experience much worse. Take the CLI parser shown here: it doesn't handle many conventions people expect, like multiple short flags (`-zxvf`). Having to deal with each CLI parser's quirks or only living with a subset of them all is not great.

> I think more things should be built from scratch and, ideally, without dependencies. You get to know the problem space better, and most things don't need the big sophisticated solution—but you pay for the whole dependency you pull in.

In creating a "product", the problem space of CLI parsing is not core. Same with a lot of what other dependencies provide. Instead of reinventing the wheel, you can better focus on the core of what you are trying to provide.

As for big sophisticated solutions, let's take the CLI space. There are many CLI parsers that you can pick from to adapt to the needs of your specific problem (https://github.com/rosetta-rs/argparse-rosetta-rs) but do you want to go into discovery mode for every dependency for every project, pivot between them as requirements change, or deal with bouncing between APIs for non-core parts of your projects? I don't.

  • axegon_ 6 hours ago

    Side note: thanks to everyone involved in the development of Clap, working with it is truly a pleasure.

  • oguz-ismail 7 hours ago

    >`std::env::args` will panic on non-UTF8 content, like a file path.

    Tell me this is a joke.

    • Hemospectrum 6 hours ago

      The documentation is quite clear on this point:

      > The returned iterator will panic during iteration if any argument to the process is not valid Unicode. If this is not desired, use the args_os function instead.

      std::env::args_os encodes paths as an OsString, which is allowed to contain invalid Unicode. You can then perform your own Unicode validation as needed, instead of the "ASAP" behavior of std::env::args.

      https://doc.rust-lang.org/std/env/fn.args.html

    • namibj 7 hours ago

      What part do you hope/expect to be a joke there?

    • Analemma_ 6 hours ago

      I think this is fine? 99.99% of the time in application-land I want to be working with valid UTF-8 only, and an equal percentage of the time, filenames and CLI args cooperate. And as the sibling comments say, this is all well-documented.

      Frankly I think the onus should be on operating systems to get with the program and be UTF-8 everywhere (I think UCS-2 on Windows and "bag of bytes" filenames on Linux are braindead), but until that happens we at least have std::env::args_os as an escape hatch.

jvanderbot 2 hours ago

I dislike CLAP. It is functional but huge. ARGP is a much better option that mirrors the unix header well enough. It's also very light.

Arch-TK 7 hours ago

I dislike clap too, it requires too much work to configure it in a sensible way (sensible being defined as working like most unix utilities have worked for the entire time I've used them), it comes with very "modern" defaults which while I appreciate are aiming to improve the situation, when I'm writing a utility in Rust, it's usually a port of something I wrote in another language and I don't want to deviate unnecessarily. There's also the aspect of just how many dependencies it pulls in.

But, there are a few issues with this argument parser.

First and foremost, while there's no problem with forcing your option names to be str/String, you should still process OsStr/OsString unless none of your arguments are ever planning to be OS paths. The reason for this is that making your programs accept all the valid unix path names (which might not be valid UTF-8) is just the right thing to do, the alternative is an arbitrary restriction on your end users. It's about as annoying to run into these kinds of issues as it is to run into applications which don't handle spaces in filenames.
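A minimal sketch of what that looks like in practice: compare option names as UTF-8, but never force the path value through `String`. The function name `parse_input_path` and the `--input` flag are hypothetical, chosen only for illustration.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical parser for `--input <path>`: the option name is matched as
/// UTF-8, while the path value stays an `OsString` and becomes a `PathBuf`.
fn parse_input_path<I>(args: I) -> Option<PathBuf>
where
    I: IntoIterator<Item = OsString>,
{
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        // Option names are plain ASCII, so a failed `to_str` just means
        // "this argument is not our flag".
        if arg.to_str() == Some("--input") {
            // The value is taken as-is, valid Unicode or not.
            return args.next().map(PathBuf::from);
        }
    }
    None
}
```

Called as `parse_input_path(std::env::args_os().skip(1))`, this accepts any path the OS itself would accept.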

Next, there's the inability to handle multiple short options combined.
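As a rough sketch, expanding a cluster like `-zxvf` can be a small pre-processing step. The helper `expand_short_flags` is hypothetical, and it assumes none of the flags in the cluster takes a value (a real parser has to know which ones do).

```rust
/// Split a cluster such as `-zxvf` into `-z`, `-x`, `-v`, `-f`.
/// Assumption: none of the short flags in the cluster takes a value.
fn expand_short_flags(arg: &str) -> Vec<String> {
    match arg.strip_prefix('-') {
        // A lone `-` and long options (`--foo`) are passed through untouched.
        Some(rest) if !rest.is_empty() && !rest.starts_with('-') => {
            rest.chars().map(|c| format!("-{}", c)).collect()
        }
        // Anything that is not a short-flag cluster is returned unchanged.
        _ => vec![arg.to_string()],
    }
}
```

It works on `&str` for brevity; a version that respects the `OsStr` point above would need to split on the encoded bytes instead.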

Also there's the lack of proper handling for options which require arguments vs options with optional arguments (-ovalue, -o value, --opt=value and --opt value should all work for the former case, but for the latter case it only makes sense to accept -ovalue and --opt=value due to the implications in the alternative case). Although this isn't that important and generally confuses people anyway so maybe it should be avoided.
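To make the required-argument case concrete, here is a sketch that accepts all four spellings. `take_required_value` is a made-up helper, it only handles one option at the front of the slice, and it uses `&str` for brevity rather than `OsStr`.

```rust
/// Parse the value of an option that *requires* an argument, accepting
/// `-ovalue`, `-o value`, `--opt=value` and `--opt value`.
/// Returns the value and how many arguments were consumed, or `None` if the
/// first argument is not this option or its value is missing.
fn take_required_value(args: &[&str], short: char, long: &str) -> Option<(String, usize)> {
    let first = *args.first()?;
    // The separate-argument forms take the following argument as the value.
    let next = || args.get(1).map(|v| (v.to_string(), 2));

    if let Some(rest) = first.strip_prefix("--") {
        if rest == long {
            return next(); // --opt value
        }
        // --opt=value (anything else, e.g. `--option=x`, is rejected)
        let value = rest.strip_prefix(long)?.strip_prefix('=')?;
        return Some((value.to_string(), 1));
    }

    let rest = first.strip_prefix('-')?.strip_prefix(short)?;
    if rest.is_empty() {
        next() // -o value
    } else {
        Some((rest.to_string(), 1)) // -ovalue
    }
}
```

For an option with an *optional* argument, only the `-ovalue` and `--opt=value` branches would be kept, since the separate-argument forms become ambiguous.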

Last (in this list, but no guarantee it's exhaustive), there's no handling for `--` to end option parsing. This can have security implications.
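A sketch of the `--` convention, with a hypothetical helper `split_at_terminator`: everything after the first `--` is positional, so a file that happens to be named `-rf` is never mistaken for an option.

```rust
/// Split arguments at the first `--`. Everything before it may contain
/// options; everything after it is positional, even if it starts with `-`.
fn split_at_terminator(args: &[String]) -> (&[String], &[String]) {
    // Index of the terminator, or the end of the list if there is none.
    let end = args.iter().position(|a| a == "--").unwrap_or(args.len());
    let before = &args[..end];
    // Skip the `--` itself; with no terminator there is nothing after it.
    let after = args.get(end + 1..).unwrap_or(&[]);
    (before, after)
}
```

Only the first slice would then be fed to the option parser; the second is handed to the program verbatim.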

It's a bit of a shame there isn't a zero dependency direct clone of python's argparse. Or something like that even in the standard library. argparse is relatively easy to use, not necessarily designed to be low overhead or fast (god help you if you're in a situation where option parsing is your bottleneck, but I can also appreciate the desire for not wasting cycles where there's no reason to waste them).

I think it's a good idea that people are writing their own low-dependency programs. But it's important that you understand the subject matter in detail if you plan on doing something like this for anything you're hoping to be used by anyone other than yourself.

While clap deviates a lot from the expectations of an option parser (I think part of the deviation is that the people behind clap want to do things "better" than they've been done in the past, but the problem with this motivation is that at some point better isn't important if it is at odds with interface design which has been around for a long time), it does for the most part handle most of these things in the expected way.

For me personally, I would reach for getargs (specifically, my own fork of getargs, which handles ArgsOs in a way I find to be optimal), which can handle all of the things outlined above correctly. There's also lexopt, which looked promising when I last looked at it.

  • epage 7 hours ago

    > I dislike clap too, it requires too much work to configure it in a sensible way (sensible being defined as working like most unix utilities have worked for the entire time I've used them), it comes with very "modern" defaults which while I appreciate are aiming to improve the situation, when I'm writing a utility in Rust, it's usually a port of something I wrote in another language and I don't want to deviate unnecessarily. There's also the aspect of just how many dependencies it pulls in.

    Which deviations are you concerned about?

    > It's a bit of a shame there isn't a zero dependency direct clone of python's argparse. Or something like that even in the standard library. argparse is relatively easy to use, not necessarily designed to be low overhead or fast (god help you if you're in a situation where option parsing is your bottleneck, but I can also appreciate the desire for not wasting cycles where there's no reason to waste them).

    As a maintainer of a CLI parser, I think there is too much policy to put in the standard library. If you go for something much simpler, like lexopt, I think it's more doable, but then again, I'm finding I'm writing my own lexopt-like library because it has too much policy in it.

tantalor 7 hours ago

> benefit of keeping my project's dependencies much lighter

This seems like a pointless exercise.

  • Hemospectrum 7 hours ago

    Compile times are Rust's single biggest weakness. A lot of work is going into speeding up the compiler, but right now, the biggest wins in compile time reduction come from reorganizing your own code and eliminating overly generic dependencies, particularly those that introduce loads of transitive dependencies for building procedural macros. Clap and Serde are major offenders here. For many projects, eliminating such dependencies can speed up build times by a factor of 10 (and similarly reduce disk usage). Depending on your circumstances, it can be worth the effort.

    • saghm an hour ago

      It's worth noting that compiling the same code split up into separate crates compared to in a single crate will usually be faster due to the fact that cargo parallelizes per crate. That doesn't mean that having fewer dependencies won't still potentially help due to reducing the total amount of code that needs to be compiled, or that you couldn't split up the code you're writing yourself into separate crates, but if you're looking to bring in a dependency, the number of transitive dependencies you'd be bringing in as a result isn't necessarily going to be a good heuristic for the amount of extra compile time you'd be adding. In particular, a lot of packages that provide a large amount of functionality are already split up by those developers and then "combined" into a single package that brings them in as dependencies for this exact reason. If they're doing this right, they'll often have these exposed via cargo features that you can disable if they're not needed, and I'd argue that before looking to entirely swap out a direct dependency you're currently using, it's worth checking if the "bloat" you're unhappy with is actually something you can just disable via the feature set in cargo.

      Notably, a lot of the "extra" stuff that people have mentioned elsewhere in this thread is stuff that you can potentially turn off when using clap for arg parsing; looking at the documentation[1] to refresh myself, you can opt out of any combination of output color, automatic help text generation, automatic usage documentation, additional context being added to errors, and automatic generation of suggestions when the command is invoked incorrectly, and some of the stuff like being able to derive implementations by default and specific support for Unicode is already off by default.

      [1]: https://docs.rs/clap/latest/clap/_features/index.html

    • epage 3 hours ago

      Except for the most trivial programs, clap should not make a factor of 10 difference, especially for incremental builds.

      For serde, I can see it if you use it a lot. There is an experiment to try to cache the macro expansion results to not need to run macro expansion during incremental builds. For full rebuilds, there is an in-work PR to decouple serde's traits from the derive so deserializers can be built in parallel to the rest of your serde code. There is also an RFC for declarative derives. Likely, serde won't be able to use it immediately but we'd like to get it to the point that serde can, getting rid of some big dependencies.

    • wging 4 hours ago

      I don't think speed of a full compile (including your dependencies) is the right thing to optimize for -- and incremental compiles don't rebuild dependencies. In a project that depends on serde, for example, most compiles in an ordinary dev cycle are probably not recompiling serde. And sccache reduces the incidence of unnecessary rebuilds far further. Even clean builds tend to get through my dependencies quite quickly.

    • tantalor 6 hours ago

      TIL, this is missing context, thanks for explaining.