miningape 11 hours ago

About 2 months ago I would have said the same as the author, but I kept running up against the hardest edge of Rust: the borrow checker. I realised that while I really liked using algebraic data types (e.g. enums) and pattern matching, the borrow checker and the low-level memory concerns meant I spent a lot of time fighting the borrow checker instead of the PL issues at the heart of my project. So while tokenising/parsing was nice, interpreting and typechecking became the bane of my existence.

With that realisation I started looking for another, more suitable language - I knew the FP aspects of Rust were what I was looking for, so at first I considered something like F# but I didn't like that it's tied to microsoft/.NET. Looking a bit further, I could have gone with something like Zig/C, but then I'd lose the FP niceness I'm looking for. I also spent a fair amount of time looking at Go, but eventually decided against it because 1. I wanted a fair amount of syntax sugar, and 2. golang is a server-side language: a lot of its features and standard library are geared towards that use case.

Finally I found OCaml. What really convinced me was seeing that the syntax is like a friendly version of Haskell, or like Rust without lifetimes. In fact the first Rust compiler was written in OCaml, and OCaml is well known in the programming language space. I'm still learning OCaml so I'm not sure I can give a fair review yet, but so far it's exactly what I was looking for.

  • krick 10 hours ago

    Bringing up golang always annoys me for some reason. Like, it's really practical: it is fast (though not actually low-level), it compiles fast, and most importantly it is very popular and has all the libraries. It seems like I should use it. But I just almost irrationally hate the language itself. Everything about it is just so ugly. It's a language invented in 2009 by some C-people who are apparently oblivious to everything that excited PL design folks for the last 20 years (as of 2009). PHP in 2009 was already a more modern and better designed language than goddamn golang. And golang hasn't really improved since. I just cannot let it go somehow.

    • guappa 7 hours ago

      I've had to use go occasionally and it feels like the language is designed to stop me from achieving my goals.

      The standard library is unimpressive (to be generous), it has plenty of footguns like C but none of its flexibility.

      Also for some reason braces AND \n are both required. So you get the worst of C and Python there.

      • danudey 3 hours ago

        > The standard library is unimpressive (to be generous)

        Coming from Python, this is one of the major things that I just can't get past with golang (despite having to use it for work). The standard library has a lot of really interesting/impressive/useful things to cover niche cases, but is missing a lot of what I would consider basic functionality, which I keep running into and which requires me to go get an external module to solve the problem.

        Then, on top of that, the documentation for external modules is extremely terrible. In many cases the best you can get is API documentation in the form of "these are the functions, this is what they take and return", with no explanation of what those values need to be, what the function does with them, and so on; a simple list of functions. In others, there is that plus example code which doesn't work because it hasn't been updated since the last time backwards-incompatible changes were made, so you end up down a rabbit hole trying to debug someone else's wrong code.

        The only thing letting me write effective golang at this point is that VSCode can autocomplete a lot of method calls, API calls, and so on, and then tell me what parameters they need, but even then I'm just guessing about what function might exist and what it might be called.

        The language itself is okay and the more I use it the more I understand why they implemented all the stuff I hate (like a lack of proper error handling leading to half of my lines of code being boilerplate `if err != nil` blocks), but if the tooling around it wasn't so good no one would take it remotely seriously.

      • aftbit 4 hours ago

        You're intended to run gofmt on every save. golang is designed to be a sort of straitjacket that forces everyone to write code in the same way (style etc.) so that the junior devs can understand it clearly.

        • grey-area 4 hours ago

          And so that people (of any level) don't bikeshed over silly things like tabs and spaces or { and newlines.

          I really like this about Go - that it formats code for you - and I miss it in other languages where we have linters but not formatters, which is a terrible arrangement IMO.

          • umanwizard an hour ago

            What mainstream languages don’t have formatters nowadays? Rust has rustfmt, C and related languages have clang-format, Python has Black…

          • mistrial9 19 minutes ago

            that's your preference but not universal

    • pimeys 9 hours ago

      I know, I'm in the same boat. What I realized is I just need to avoid the companies using Go and I don't really need to be vocal about my dislike. It's not my loss if others find the language useful, but for me it either solves problems I'm not interested in solving, or the language and tooling just don't do it for me.

      But, I can always just write Rust and be happy where I am. Or, to be honest, I would not be very unhappy with F#, Haskell or OCaml either.

      • mistrial9 17 minutes ago

        > I just need to avoid the companies using Go

        and they also will avoid you! A monthly go-lang meetup in San Francisco impressed me as the only meetup I have ever been to where no one (in a crowded venue) seemed to want to talk to anyone outside their clique

      • meowface 8 hours ago

        >What I realized is I just need to avoid the companies using Go

        What do you mean, exactly?

        • leftyspook 7 hours ago

          I'd imagine not seeking employment at Google is a big part.

          • tempest_ 7 hours ago

            Go is definitely used a lot more outside of google as of late.

            Anecdotally I would say where a lot of companies would have used Java in the past they are now turning to go for their server-side/backend service implementations.

            • danudey 3 hours ago

              Also it feels like pretty much everything in the k8s/container/etc. space is go-related, which kind of makes sense.

    • pjmlp 8 hours ago

      It is worse than that: Go initially lacked generics (introduced by CLU and ML in 1976), and it still doesn't do even basic Pascal enumerations (1970), offering the iota/const dance instead, let alone the programming language design surface of the 1990s.

      I only advocate for it in scenarios where a garbage-collected C is more than enough, regardless of the anti-GC naysayers, e.g. see the TamaGo unikernel.

      • randomdata 4 hours ago

        > still doesn't do even basic Pascal enumerations

        The term you are looking for is sum types (albeit in a gimped form in the case of Pascal). Enumerations refer to the values applied to the type, quite literally, and are identical in Pascal as in every other language with enumerations, including Go. There is only so much you can do with what is little more than a counter.

        • samatman 2 hours ago

          I'm fairly sure he's referring to enumerations actually.

          Pascal doesn't require case matching of enumerations to be exhaustive, but this can be turned on as a compiler warning in modern Pascal environments, FreePascal / Lazarus and such.

          Go only has enums by convention, hence the "iota dance" referred to. I've argued before that this does qualify as "having enums" but just barely.
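
          For reference, the "iota dance" looks something like this (the names here are illustrative):

              type Color int

              const (
                  Red Color = iota // 0; iota increments on each line
                  Green            // 1
                  Blue             // 2
              )

          Nothing stops a caller from writing Color(42), which is part of why it only "just barely" qualifies.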

          It wouldn't have been difficult to do a much better job of it, is the thing.

          • randomdata 2 hours ago

            > Pascal doesn't require case matching of enumerations to be exhaustive

            Normally in Pascal you would not match on the enumeration at all, but rather on the sum types.

                type
                   Foo = (Bar, Baz);

                case foo of
                   Bar: ...;  // Matches type Bar
                   Baz: ...;  // Matches type Baz
                end;
            
            The only reason for enumerations in Pascal (and other languages with similar constructs) is because under the hood the computer needs a binary representation to identify the type, and an incrementing number (an enum) is a convenient source for an identifier. In a theoretical world where the machine is magic you could have the sum types without enums, but in this reality...

            Thus, yes, in practice it is possible to go around the type system and get the enumerated value out with Ord(foo), but at that point it's just an integer and your chance at exhaustive matching is out the window. It is the type system that allows more flexibility in what the compiler can tell you, not the values generated by the enumeration.

            > Go only has enums by convention

            "Enums by convention" would be manually typing 1, 2, 3, 4, etc. into the code. Indeed, that too is an enumeration, but not as provided by the language. Go actually has enums as a first-class feature of the language[1]. You even say so yourself later on, so this statement is rather curious. I expect you are confusing enums with sum types again.

            [1] Arguably Pascal doesn't even have that, only using enums as an implementation detail to support its sum types. Granted, the difference is inconsequential in practice.

      • Thaxll 7 hours ago

        So every language has to implement every feature released in the last 50 years?

        • pjmlp 6 hours ago

          Not necessarily, but designing for stone-age computers isn't ideal either; even C, Fortran and COBOL have progressed during those 50 years.

        • treyd 7 hours ago

          I expect it to implement features that have become ubiquitous in every other mainstream language from the last 30 years, yes.

          • randomdata 25 minutes ago

            But then what purpose would it serve? The last 30 years have brought no lack of new languages, not to mention the evolution of older languages. Just use one of them.

            The purpose of creating yet another language with Go was to break from what everyone else was doing, to see if a "simple" language would stop developers from playing with fun language toys all day to instead focus on actual engineering.

            Arguably it was successful in that.

          • Thaxll 6 hours ago

            That's how you end up with C++ and soon C#.

            • biorach 5 hours ago

              No, the person you were replying to was advocating for the intersection of ubiquitous features. C++ seems to be aiming for the union.

            • umanwizard an hour ago

              It’s really not. The strawman that if we add a few features that have stood the test of time in every other language we’ll end up with C++ is just not true. Nobody is proposing adding SFINAE-based conditional compilation, rvalue references, multiple inheritance, or any of the million other Byzantine features that make C++ virtually impossible to use correctly. Adding sum types and a match statement does not necessarily start you down that path.

            • pjmlp 6 hours ago

              We also end up with C23, Fortran 2023, COBOL 2023, Scheme R7RS,... even those oldies embrace modern ideas.

            • miohtama 5 hours ago

              Then we can call it Go++

    • mattgreenrocks 7 hours ago

      I’m convinced there’s a contingent of devs who don’t like/grok abstraction. And it overlaps partially with stated goals of an easy language to onboard inexperienced devs with.

      Nothing wrong with that, but it will probably never work for me. Newer versions of Java are much more enjoyable to work with versus Go.

      • keldaris 5 hours ago

        > I’m convinced there’s a contingent of devs who don’t like/grok abstraction.

        I am one of those. I grok abstractions just fine (have commercially written idiomatically obtuse Scala and C#, some Haskell for fun, etc.), but I don't enjoy them.

        I use them, of course (writing everything in raw asm is unproductive for most tasks), but rather than getting that warm fuzzy feeling most programmers seem to get when they finish writing a fancy clever abstraction and it works on the first try, I get it when I look at a piece of code I've written and realize there is nothing extraneous to take away, that it is efficient and readable in the sense of being explicit and clear, rather than hiding all the complexity away in order to look pretty or maximize more abstract concerns (reusability, DRY, etc.).

        This mindset is a very good fit for writing compute-heavy numerical code, GPU stuff and lots of systems-level code, not so much for being a cog in a large team on enterprise web backends, so I mostly write numerical code for physics simulations. You can write many other things this way and get very fast and bloat-free websites or anything else, but it doesn't work well in large teams or with people using "industry best practices". It also makes me prefer C to Rust.

      • grey-area 4 hours ago

        Strange that you bracket don't like/don't understand together like that.

        The vast majority understand abstractions just fine, though each takes time to understand. However most people like their own abstractions best, and those of other people less. To me hell is living in a world of bad abstractions created by someone else.

        Every abstraction created adds to the cognitive load when reading the code and to the maintenance burden of that code. So you have an abstraction budget, which is usually overspent IME and needs to be carefully controlled. Most of the most horrible codebases are horrible because they have too many of the wrong sort of abstraction.

        • mattgreenrocks 3 hours ago

          Everyone lands at a different spot.

          Personally, I don't want to write any new code in something that doesn't have ADTs, or the moral equivalent (Java's sealed classes). I've already written a lifetime of code without them, so I suppose part of that is not wanting to write another 20 years of the same code. :)

      • ralegh 6 hours ago

        I guess I’m in that camp. I can come up with a good abstraction after working on a problem for a while and refactor it into my code. Or I can come up with a really simple abstraction (eg a Go interface with 2-3 methods), and that usually works well. But I try to avoid starting a project by defining a bunch of abstractions, since I just end up writing loads of boiler plate. Yes, I’m probably doing some things wrong.

        • mattgreenrocks 3 hours ago

          Sounds about right. Proper abstractions are difficult to get right up front, might as well pull them out only when they're obvious and profitable.

    • seanw444 5 hours ago

      Man, Go gets a lot of hate on here. It's certainly not the most flexible language. If I want flexibility + speed, I tend to choose Nim for my projects. But for practical projects that I want other people to be able to pick up quickly, I usually opt for Go. I'm building a whole product manufacturing rendering system for my company, and the first-class parallelism and concurrency made it super pleasant.

      I will say that the error propagation is a pain a lot of the time, but I can appreciate being forced to handle errors everywhere they pop up, explicitly.

      • f154hfds 4 hours ago

        So much of language opinion is based on people's progression of languages. My progression (of serious professional usage) looked like this:

        Java -> Python -> C++ -> Rust -> Go

        I have to say, given this progression going to Rust from C++ was wonderful, and going to Go from Rust was disappointing. I run into serious language issues almost daily. The one I ran into yesterday was that defer's function arguments are evaluated immediately (even if the underlying type is a reference!).

        https://go.dev/play/p/zEQ77TIP8Iy
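
        A minimal sketch of the gotcha (not the linked playground code, but the same idea):

            package main

            import "fmt"

            func main() {
                s := "before"
                // The deferred call's argument is evaluated here, at defer time:
                defer fmt.Println("deferred:", s) // prints "deferred: before"
                s = "after"
                fmt.Println("current:", s) // prints "current: after"
            }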

        Perhaps with a progression Java -> Go -> Rust moving to rust could feel slow and painful.

        • 62951413 3 hours ago

          I'm curious how one ends up with such an ahistorical sequence. I'd expect it to be more aligned with the actual PL history. Mainstream PLs have had a fairly logical progression, with each generation solving well-understood problems from the previous one, and building on top of the previous generation's abstractions.

          Turbo Pascal for education, C as the professional lingua franca in the mid-90s (manual memory management). C++ was all the rage in the late 90s (OOP, STL). Java got hot around 2003 (GC, canonical concurrency library and memory model). Scala grew in popularity around 2010-2012 (FP for the masses, much less verbosity, mainstream ADTs and pattern matching). Kotlin was later cobbled together to offer the Scala syntactic sugar without the Haskell-on-the-JVM complexity.

          And then they came up with golang which completely broke with any intellectual tradition and went back to before the Java heyday.

          Rust feels like a Scala with pointers so the "C++ => Rust" transition looks analogous to the "Java => Scala" one.

          • aaronblohowiak 2 hours ago

            >I'm curious how one ends up with such ahistorical sequence.

            They are all actively in use... if GP is earlier in their career, it could all be within the last 10 years.

      • materielle 3 hours ago

        Go is definitely of the “worse is better” philosophy. You can basically predict what someone will think of Go if you know how they feel about that design philosophy.

        I remember that famous rant about how Go’s stdlib file api assumes Unix, and doesn’t handle Windows very well.

        If you are against “worse is better” like the author, that’s a show stopping design flaw.

        If you are for it, you would slap in a Windows if-statement and add a unit test when your product crosses that bridge.

      • speed_spread 3 hours ago

        The problem is that most of the time, errors are not to be handled but only bubbled up. I've also seen it in Java with checked exceptions: the more explicit error handling is, the more developers feel they should somehow try to do _something_ with the error when the correct thing to do would actually be to fail in the most straightforward manner. The resulting code is often much heavier than necessary because of this and the stacktraces also get polluted by overwrapping.

    • dinosaurdynasty 4 hours ago

      I use golang for work and have done a fair amount of Rust programming. Rust feels like the higher level language. This really shouldn't be the case.

    • tuveson 4 hours ago

      Fast to compile, fast to run, simple cross-compilation, a big standard library, good tooling…

      As ugly and ad-hoc as the language feels, it’s hard to deny that what a lot of people want is just good built-in tooling.

      I was going to say that maybe the initial lack of generics helped keep compile times low for Go, but OCaml manages to have both good compile times and generics, so maybe it depends on the implementation of generics (would love to hear from someone with a better understanding of this).

      • klodolph 3 hours ago

        There are a million little decisions that affect compile time. A big factor here is inlining. When you inline functions, you may improve the generated code or you may make it worse. It’s hard to predict the result because the improvements may come about because of various other code transformation passes which you perform after inlining. After inlining, the compiler detects that certain code paths are impossible, certain calls can be devirtualized, etc., and this can enable more inlining.

        Rust is designed with the philosophy of zero-cost abstractions. (I don’t like the name, because the cost is never zero, but it is what it is.) The abstractions usually involve a lot of function calls and you need a compiler with aggressive inlining in order to get reasonable performance out of Rust. Usage of generics still results in the same non-virtual calls which can be inlined. But the compiler then has to do a lot of work to evaluate inlining for every instantiation of every generic.

        Go is designed with the philosophy of simple abstractions, which may come with a cost. Generics are implemented in a way that means you are still doing a lot of dynamic dispatch. If you need speed in Go, you should be writing the monomorphic code yourself. Generics don’t get instantiated for every single type you use them with. They only get instantiated for every “shape” of type.

        • sophacles 3 hours ago

          > Rust is designed with the philosophy of zero-cost abstractions. (I don’t like the name, because the cost is never zero, but it is what it is.)

          So when the generated asm is the same between the abstraction and the non-abstraction version, where's the cost?

          • steveklabnik 2 hours ago

            The point is that different people have different understandings of "cost." You're correct that that kind of cost is what "zero cost abstractions" means, but there are other costs, like "how long did it take to write the code," that people think about.

          • samatman 2 hours ago

            Cognitive cost is the most important cost to minimize.

            A Rust project's cognitive cost budget comes out of what's left over after the language is done spending. This is true of any language, but many language designers do not discount cognitive costs to zero, which, with the "zero cost abstraction" slogan, Rust explicitly does.

    • madeofpalk 7 hours ago

      Go’s error handling patterns, lacking every established feature that would make them ergonomic, are baffling.

      Embarrassing that developers are still forgetting nil pointer checks in 2024.

      • ziml77 6 hours ago

        The error handling is one of the worst parts of Go for me. Every call that can fail ends up being followed by 3 lines of error handling, even if it's just propagating the error up. The actual logic gets drowned out.
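
        A hypothetical sketch of the pattern (Config, parse and loadConfig are made-up names):

            package config

            import "os"

            type Config struct{ raw []byte } // stand-in for a real type

            func parse(raw []byte) (*Config, error) { return &Config{raw}, nil }

            // Every call that can fail picks up three lines of ceremony,
            // even when the error is only being passed upward.
            func loadConfig(path string) (*Config, error) {
                raw, err := os.ReadFile(path)
                if err != nil {
                    return nil, err
                }
                cfg, err := parse(raw)
                if err != nil {
                    return nil, err
                }
                return cfg, nil
            }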

        • danudey 3 hours ago

          I would kill for some kind of `err_yield(err)` construct that handles propagating the error if it's the caller's problem to deal with.

          That said, I discovered that Go has the ability to basically encapsulate one error inside of another with a message; for example, if you get an err because your HTTP call returned a 404, you can pass that up and say "Unable to contact login server: <404 error here>". But then the caller takes that error and says "Could not authenticate user: <login error here>", and _their_ caller returns "Could not complete upload: <authentication error here>" and you end up with a four-line string of breadcrumbs that is ostensibly useful but not very readable.
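
          That encapsulation is presumably fmt.Errorf's %w verb; a small sketch reusing the example messages from above:

              package main

              import (
                  "errors"
                  "fmt"
              )

              func main() {
                  httpErr := errors.New("GET /login: 404 Not Found")
                  loginErr := fmt.Errorf("unable to contact login server: %w", httpErr)
                  authErr := fmt.Errorf("could not authenticate user: %w", loginErr)
                  uploadErr := fmt.Errorf("could not complete upload: %w", authErr)

                  // One long breadcrumb string, as described:
                  fmt.Println(uploadErr)

                  // errors.Is can still see through the whole chain:
                  fmt.Println(errors.Is(uploadErr, httpErr)) // true
              }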

          Python's `raise from` produces a much more readable output since it amounts to much more than just a bunch of strings that force you to follow the stack yourself to figure out where the error was.

    • LinXitoW 6 hours ago

      Thank god I'm not the only one. I can still remember when the Go zealots were everywhere (it's cooled down now). Every feature Go didn't have was "too complicated and useless", while the few features it did have were "essential and perfect".

      I've really tried giving Go a go, but it's truly the only language I find physically revolting. For a language that's supposed to be easy to learn, it made sooooo many weird decisions, seemingly just to be quirky. Every single other C-ish language declares types either as "String thing" or "thing: String". For no sane reason at all, Go went with "thing String". etc. etc.

      I GENUINELY believe that 80% of Go's success has nothing to do with the language itself, and everything to do with it being bankrolled and used by a huge company like Google.

    • randomdata 4 hours ago

      > But I just almost irrationally hate the language itself.

      That's the point. It's a rejection of the keyboard jockeys who become more concerned with the code itself than the problem being solved.

      • wyager 4 hours ago

        Golang was created specifically so that Google could mitigate the downsides of lowering their hiring standards. It doesn't have any higher design aspirations.

        "The key point here is our programmers are Googlers, they’re not researchers. They’re typically, fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt." - Rob Pike

        I suppose in a sense this is rejecting the "keyboard jockeys", but probably not in the way you mean.

        You cannot separate the tool used to solve a problem from the problem itself. The choice of tool is a critical and central consideration.

        • grey-area 4 hours ago

          I think you're giving far too much weight to that off the cuff quote from one of the creators of Go.

          Really I think it's more useful to view it as a better C in the less is more tradition, compared to say C++ and Java, which at the time were pretty horrible. That's my understanding of its origin. It makes sense in that context; it doesn't have pretensions to be a super advanced state of the art language, but more of a workhorse, which as Pike pointed out here could be useful for onboarding junior programmers.

          Certain things about it I think have proven really quite useful and I wish other languages would adopt them:

          * It's easy to read precisely because the base language is so boring

          * Programs almost never break on upgrade - this is wonderful

          * Fewer dependencies, not more

          * Formatters for code

          Lots of little things (struct tags for example) I'm not so keen on but I think it's pretty successful in meeting its rather modest goals.

          • wyager 2 hours ago

            > Really I think it's more useful to view it as a better C

            But Go is nothing at all like C, and it's completely unsuitable for most of the situations where C is used. I'm having trouble even imagining what you're getting at with this comparison. The largest areas of overlap I can think of are "vaguely similar syntax style" and "equally bad and outdated type system". Pretty much everything else of substance is different. Go is GC'd, Go has a runtime, etc.

        • randomdata 4 hours ago

          That's saying the same thing. If you give someone the ability to understand a brilliant language, they will turn their attention to the language and away from the problem. That's just human nature. Shiny attracts us. Let's be honest, we all have way more fun diving deep into crafting the perfect type in Haskell. But Pike indicates that if you constrain developers to bounds that they already know, where they can't turn their attention to learning about what is brilliant, then they won't be compelled away from what actually matters – the engineering.

    • marcosdumay 4 hours ago

      Whatever reasons there are for using the language or not, tokenising and parsing are absolutely not problems you want to solve with it.

    • devmor 5 hours ago

      golang feels like someone wanted to write a "web focused" version of C, but decided to ignore every issue and complaint about C raised in the past 25 years

      It's a very simple and straightforward language, which I think is why people like it, but it's just a pain to use. It feels like it fights any attempt at using it to do things optimally or quickly.

    • dlisboa 7 hours ago

      > It's a language invented in 2009 by some C-people who are apparently oblivious to everything that excited PL design folks for the last 20 years (as of 2009).

      Is there a term equivalent to "armchair quarterback" in programming? Most programmers are already in armchairs.

      It's the equivalent of yelling at the TV that the ultra-successful mega-athlete sucks. I can't imagine the thought process that goes into thinking Ken Thompson, Rob Pike and Robert Griesemer are complete idiots who had no clue what they were doing.

      • biorach 5 hours ago

        No one said they are complete idiots.

        They made a deliberate decision to design a language that did not take many developments in PL design since the 70's into account.

        They had their reasons, which make sense in the context of their employer and their backgrounds.

        Many people, myself included, prefer to program in languages that do not focus so much on simplicity.

        • billmcneale 3 hours ago

          It was not deliberate, it was ignorance. Time and again, the Go team made comments in various forums for years showing they really knew nothing about programming language development past 2000.

          All they knew was C and that they wanted to create a language that compiles faster than C++. That's all.

          • biorach an hour ago

            You're talking about Ken Thompson, Rob Pike and Robert Griesemer, among others.

            You're not doing yourself any favors.

  • rtpg 10 hours ago

    I think one core thing that you have to do with ASTs in Rust is to absolutely not store strings or the like in your AST. Instead you want to use something like a string interning library (so you get cheap clones and interning), and for things like positions in text you want to use just indices. Absolutely avoid storing references if you can avoid it!

    The more your stuff is held in things that are cheap/free to clone, the less you have to fight the borrow checker… since you can get away with clones!
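
    A minimal sketch of the shape this takes (ExprId and Ast here are illustrative, not from any particular crate):

        // IDs are plain Copy values; the AST holds no references or Rc.
        #[derive(Clone, Copy)]
        struct ExprId(u32);

        enum Expr {
            Num(f64),
            Add(ExprId, ExprId), // children by index, not by reference
        }

        struct Ast {
            exprs: Vec<Expr>,
        }

        impl Ast {
            fn push(&mut self, e: Expr) -> ExprId {
                self.exprs.push(e);
                ExprId((self.exprs.len() - 1) as u32)
            }
            fn get(&self, id: ExprId) -> &Expr {
                &self.exprs[id.0 as usize]
            }
        }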

    And for actual interpretation there are libraries that can help a lot with memory management, with arenas and the like. It’s super specialized stuff but helps to give you perf + usability. Projects like Ruffle use this heavily and it’s nice when you figure out the patterns.

    Having said that, OCaml and Haskell are both languages that will do all of this “for free” with their built-in garbage collection… I just like the idea of going very fast in Rust.

  • radicalbyte 10 hours ago

    I've been writing a lot of Golang in the last year and I wouldn't use it for writing a parser. It's just a modernised C; the model it provides is very simple (coming from C#, the simplicity actually made it harder to learn!) and is very well suited to small, focused applications where a low conceptual load is beneficial and the trade-off of verbosity is acceptable.

    F# or even the latest version of C# is what I would recommend. Yes, Microsoft is involved, but if you're going to live in a world where you won't touch anything created by evil corporations then you're going to have a hard time: Java, Golang, Python, TypeScript/JavaScript and Swift all suffer from this. That leaves you with very little choice.

    I'd be interested in hearing your thoughts on OCaml after a year or so of using it. The Haskell-likes are very interesting, but Haskell itself has a poor learning-curve/benefit ratio for me (Rust is similar there actually; I mastered C# and made heavy use of the type system, but that involved going very deep into some holes, and I don't have the time to do that with Rust).

    • galangalalgol 4 hours ago

      F# and OCaml are still functionally identical to the point that many programs would compile in either, right? F#, OCaml and Rust seem a lot more similar to me than any of them are to Haskell, or Go for that matter. I like Haskell, but my brain hasn't made thinking that way native yet.

    • MrMcCall 4 hours ago

      Python suffers by having been created by an evil corporation?

      Have I missed something GvR or his team did?

    • guappa 6 hours ago

      I wouldn't use it to write a hello world :D

  • balencpp 41 minutes ago

    Did you discover Scala 3 and give it a thought? I think of it as Rust with an _overall_ stronger type-system, but where you don't have to worry about memory management. It has an amazing standard library, particularly around collections. You get access to the amazing JVM ecosystem. And more. Martin Odersky in fact sees Scala's future lying in being a simpler Rust.

    Also, regarding F#. It runs on .NET, and indeed, since the ecosystem and community are very small, you need to rely on .NET (basically C#) libraries. But it's really not "tied" to Microsoft and is open source.

  • ralegh 11 hours ago

    I wouldn't call Go a 'server side' language. The Go compiler is written in Go, for example! Cross compilation and (relatively) small binaries make it super easy for distribution. Syntax sugar is a fair point though, it doesn't lend itself to functional-y pattern matching.

    • eminent101 10 hours ago

      > The Go compiler is written in Go, for example!

      Do you know how they avoid the GC in the Go implementation of the Go compiler? If I understand correctly they need to implement the Go garbage collector in their Go implementation of the Go compiler. But Go already has a garbage collector. So how do they avoid invoking Go's garbage collector so that they can implement the garbage collector of the Go language they are implementing?

      Not sure if I'm making sense but I'd like to know more about this from those who understand this more than I do.

      • enugu 9 hours ago

        We can think of the compiler as a function from a string to a string: high-level code (HLC) to low-level code (LLC). LLC can include the garbage-collection code (if it is run as a standalone executable, instead of garbage collection being done by a separate runtime).

        The compiler executable itself runs in a compilation process P, which uses memory and has its own garbage collection. (The compiler executable was itself generated by a compilation, using a compiler written in Go itself (self-hosting) or, initially, in another language.)

        But the compilation process P is unrelated to the process Q in which the generated code, LLC, will run when first executed. The OS which runs LLC doesn't even know about the compiler - LLC is just another binary file. The garbage collection in P doesn't affect garbage collection in Q.

        Indeed, it should be easy for the compiler to generate an assembly program which constantly keeps allocating more memory until the system runs out - say, when compiling a loop which allocates a struct a billion times - unless, of course, you explicitly also generate a garbage collector as part of the low-level code.

        Your question does become very interesting in the realm of security. There is a famous paper called "Reflections on Trusting Trust" where a compiled compiler can still have backdoors even if its source code is trustworthy, because the compiler binary which compiled it had backdoors.

      • miningape 10 hours ago

        Remember that a compiler generates an executable file (it can almost be thought of as an ASM transpiler), and this file must contain everything the language needs to operate (an oversimplification), so that includes the runtime as well as the compiled instructions from the user's code. Compare this to an interpreter, which doesn't require you to pack all the implementation details into a binary; instead you can use the host language's runtime.

        All this to say: the output of a compiler is by necessity not tied to the language the compiler is written in, instead it is tied to the machine the executable should run on. A compiler "merely" translates instructions from a high level language to a machine executable one. So stuff like a GC must be coded, compiled and then "injected" into the binary so the user's code can interact with it. In an interpreted language this isn't necessary, since the host language is already running and contains these tools which would otherwise have to be injected into the binary.

      • James_K 8 hours ago

        They just use the implementation from the last version of the compiler, which you can follow back in a long chain to the first implementation. As for the implementation of the garbage collector, it probably just doesn't allocate anything. The basics of a garbage collector are a function "alloc" and another one "collect". The function to allocate memory usually looks something like this:

          char heap[100000000];
          int heap_end;
          void *alloc(int n_bytes) {
            void *out = &heap[heap_end];
            heap_end += n_bytes;
            return out;
          }
        
        As you can see, it doesn't need to allocate any memory to do this.

        • umanwizard 5 hours ago

          > They just use the implementation from the last version of the compiler

          The garbage collector isn’t part of the compiler, it’s part of the runtime. It’s worth being clear about this distinction because I think it’s the root of the OP’s confusion.

      • umanwizard 6 hours ago

        How does clang, a C++ compiler that is itself written in C++, use <feature from C++> that it is itself implementing?

        Why wouldn’t it be able to?

        I don’t understand how your question specifically relates to garbage collection, or why the compiler would need to avoid it. The Go compiler is a normal Go program and garbage collection works in it the same way it does in any other Go program.

      • P-Nuts 6 hours ago

        I’ve never used Go myself, but according to this https://go.dev/doc/install/source you need a Go compiler to compile Go. However, for the early versions, you needed a C compiler to compile Go.

        So at some point, someone wrote enough of a Go GC in C to support enough of Go to compile itself.

      • dboreham 8 hours ago

        Definitely not making sense. Other answers appear to assume you don't know what a compiler is, but I'm not so sure. Re-state the question perhaps?

      • afandian 10 hours ago

        I don't understand the question as it's written.

        But the shape of the question feels like you're asking about whether an interpreter (which the compiler is not) uses the GC of the host language?

        • ben0x539 9 hours ago

          I think they're asking how the code in the Go runtime (not the compiler - an interesting but maybe non-obvious distinction!) that implements the garbage collector, a core feature of the language, avoids needing the garbage collector to already exist in order to run, given that it's written in the language it's a core feature of. I suspect the answer is just something like "by very carefully not using language features that might tempt the compiler to emit something that requires an allocation". I think it's a fair question, as it's not really obvious that that's possible - do you just avoid calling make() and new() and forming pointers to local variables that might escape? Do you need to run on a magical goroutine that won't try to grow its stack with GC-allocated segments? Can you still use slices (probably yes, just not append() or the literal syntax), closures (probably only trivial ones without local captures?), maps (probably not)...?

          I think the relevant code is https://github.com/golang/go/blob/master/src/runtime/mgc.go and adjacent files. I see some annotations like //go:systemstack, //go:nosplit, //go:nowritebarrier that are probably relevant but I wouldn't know if there's any other specific requirements for that code.

          • dboreham 8 hours ago

            Why would the runtime not be allowed to use GC? I can understand the question being: how do you implement the GC code without using GC?

            • ben0x539 8 hours ago

              Yeah that's the code I mean

        • the_gipsy 7 hours ago

          On a high level the question is "how do you bootstrap X, if you need X to bootstrap X?".

          • pmontra 7 hours ago

            This is correct but it's never as hard as it seems.

            First, that is a problem only for the very first version of X. Then you use X for version X+1.

            Second, building from source usually doesn't mean having to build every single dependency; some .so or .dll files are already on the system. Only when one has to build everything from scratch would the first step have to solve the original X-from-X problem, and I think that even a Gentoo full system build doesn't start with a user keying bytes into RAM with front-panel switches and setting the CPU's program counter and registers to kick off the bootstrap process.

    • miningape 11 hours ago

      Well now I've got to go check out the Go compiler! That sounds really interesting. I was mainly referring to Go having a lot more developed concurrency features, which, while great, I didn't really want to use for my toy language; it seemed like I was throwing away a lot of what makes golang great just because of the nature of my project.

      The rest of the golang ecosystem I found really nice actually, and imo it had a really great set of tools for reading/writing files - and also I like that everything is a part of the go binary; it certainly is easier than juggling opam and dune (used for OCaml, for example).

      • ralegh 10 hours ago

        That's fair, the concurrency features are very handy though optional of course.

        The ecosystem and tooling are great, probably the best I've worked with. But the main reason I reach for Go is that it's got tiny mental overhead. There's a handful of language features so it becomes obvious what to use, so you can focus on the actual goal of the project.

        There are some warts of course. Heavy IO code can be riddled with err checks (which is actually why I find it a bit awkward for servers). Similarly the stdlib is quite verbose when doing file system manipulation; I may try https://github.com/chigopher/pathlib because Python's pathlib is by far my favourite interface.

  • ThePhysicist 10 hours ago

    Lots of folks use Golang on the client side, even on mobile (for which Go has really great support with go-mobile). Of course it adds around 10-20 MB to your binary and memory footprint, but in today's world that's almost nothing. I think Tailscale e.g. uses Golang as a cross-platform WireGuard layer in their mobile and desktop apps; it seems to work really well. You wouldn't build a native UI with Golang of course, but for low-level stuff it's fantastic. TinyGo even allows you to write Golang for microcontrollers or the web via WebAssembly; lots of things aren't supported there, but a large part of the standard library is.

    • ducktapeofficer 9 hours ago

      Saying the language adds 10-20 MB and then going on to say it's almost nothing is avoiding the issue raised. The footprint always matters, and we should use the right tool for the right job.

      • packetlost 5 hours ago

        It's not ignoring it; it's saying that 20 MB of data isn't really a lot these days, which is objectively true for most contexts.

  • materielle 3 hours ago

    I love Go for writing servers. And in fact, I do it professionally. But I totally agree that for parsers, it’s not the right tool for the job.

    First off, the only way to express union types is with runtime reflection. You might as well be coding in Python (but without the convenient syntax sugar).
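
    For illustration, the usual workaround is an interface plus a type switch - runtime type dispatch with no exhaustiveness checking (Expr, Num and Add are made-up names):

        type Expr interface{ isExpr() }

        type Num struct{ Value float64 }
        type Add struct{ Left, Right Expr }

        func (Num) isExpr() {}
        func (Add) isExpr() {}

        func eval(e Expr) float64 {
            switch e := e.(type) {
            case Num:
                return e.Value
            case Add:
                return eval(e.Left) + eval(e.Right)
            default:
                // A forgotten case is a runtime panic, not a compile error.
                panic("unhandled Expr")
            }
        }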

    Second off, “if err != nil” is really terrible in parsers. I’m actually somewhat of a defender of Go’s error handling approach in servers. Sure, it could have used a more convenient syntax. But in servers, I almost never return an error without handling it or adding additional context. The same isn’t true in parsers though. Almost half of my parser code was error checks that simply wouldn’t exist in other languages.

    For Rust, I think the value proposition is if you are also writing a virtual machine or an interpreter, your compiler front end can be written in the same language as your backend. Your other alternatives are C and C++, but then you don’t have sum types. You could write the front end in Ocaml, but then you would have to write the backend and runtime in some other language anyways.

  • packetlost 5 hours ago

    Go is not a great language to write parsers in IMO; it just doesn't have anything that makes such a task nice. That being said, people seem to really dislike Go here, which is fine, but somewhat surprising. Go is extremely simple. If you take a look at its creators' pedigree, that should make a ton of sense: they want you to make small, focused utilities and systems that use message passing (typically via a network) as the primary means of scaling complexity. What I find odd about this is that it was originally intended as a C++ replacement.

    • dinosaurdynasty 4 hours ago

      Go is simple at the cost of increasing the complexity of stuff written in it.

    • marcosdumay 4 hours ago

      Message passing is a horrible means of scaling complexity, unless you are so big and have so many developers working with you that you can't use anything else.

  • MrMcCall 4 hours ago

    (OCaml Question Ahead)

    I agree on F#. It changed my C && OO perspective in fantastic ways, but I too can't support anything Microsoft anymore.

    But, seeing as OCaml was the basis for F#, I have a question, though:

    Does OCaml allow the use of specifically sized integer types?

    I seem to remember in my various explorations that OCaml just has a kind of "number" type. If I want a floating point variable, I want a specific 32- or 64- or 128-bit version; same with my ints. I did very much like F# having the ability to specify the size and signedness of my int vars.

    Thanks in advance, OCaml folks.

    • neonsunset 3 hours ago

      F# is a far better option from a practical standpoint when compared to the alternatives, by simple virtue of using .NET and having access to a very wide selection of libraries, which makes library availability a non-issue when solving a particular business case. It also has an alternate compiler, Fable, which can target JS, allowing the use of F# in the front-end.

      Other options have worse support and weaker tooling, and often not even more open development process (e.g. you can see and contribute to ongoing F# work on Github).

      This tired opinion of ".NET bad because Microsoft bad" has zero practical relevance to actually using C# itself, and even more so F#, and it honestly needs to die out because it borders on mental illness. You can hate Microsoft products - I do too - and still judge a particular piece of technology and the people that work on it on their merits.

  • pimeys 10 hours ago

    OCaml has been a pretty common tool for writing parsers for many years. Not a bad choice.

    I've written parsers professionally in Rust for two companies now. I have to say the issues you had with the borrow checker mostly come up in the beginning. After working with Rust a bit you realize it works miracles for parsers. Especially if you need to do runtime parsing in a network service serving large traffic. There are some good strategies we've found to keep the borrow checker happy while at the same time writing the fastest possible code to serve our customers.

    I highly recommend taking a look at how flat vectors for the AST and typed vector indices work. E.g. you have a vector of types as `Vec<Type>` and of fields in types as `Vec<(TypeId, Field)>`. Keep these sorted, so you can implement lookups with a binary search, which works well with CPU caches and is definitely faster than a hashmap lookup.
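
    A small sketch of that lookup, assuming the vector is kept sorted by TypeId (fields_of is a made-up helper; partition_point is std):

        #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
        struct TypeId(u32);

        struct Field { name: String }

        // All fields of one type form a contiguous run in the sorted vector,
        // so two binary searches delimit the slice without any hashing.
        fn fields_of(fields: &[(TypeId, Field)], ty: TypeId) -> &[(TypeId, Field)] {
            let start = fields.partition_point(|(id, _)| *id < ty);
            let end = fields.partition_point(|(id, _)| *id <= ty);
            &fields[start..end]
        }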

    The other cool thing with writing parsers with Rust is how there are great high level libraries for things like lexing:

    https://crates.io/crates/logos

    The cool thing with Logos is that it keeps the source data as a string under the surface and just refers to specific locations in it. Now use these tokens as the basis for your AST, which is all flat data structures and IDs. Simplify the usage with a type:

        #[derive(Clone, Copy)]
        struct Walker<'a, Id> {
            pub id: Id,
            pub ast: &'a Ast,
        }
    
        impl<'a, Id> Walker<'a, Id> {
            pub fn walk<T>(self, other_id: T) -> Walker<'a, T> {
                Walker { id: other_id, ast: self.ast }
            }
        }
    
    Now you can specialize these with type aliases:

        type TypeWalker<'a> = Walker<'a, TypeId>;
    
    And implement methods:

        impl<'a> TypeWalker<'a> {
            fn as_ref(&self) -> &'a Type {
                &self.ast[self.id]
            }
            
            fn name(&self) -> &'a str {
                &self.as_ref().name
            }
        }
    
    From here you can introduce string interning if needed; it's easy to extend. What I like about this design is that all the IDs and Walkers are Copy, so you can pass them around as you like. There's also no reference counting needed anywhere, so you don't need to play the dance with Arc/Weak.

    I understand Rust feels hard, especially in the beginning. You need to program more like you'd write C++, but Rust forces you to play it safe. I would say an amazing strategy is to first write a prototype in OCaml; it's really good for that. Then, if you need to be faster, do a rewrite in Rust.

    • miningape 10 hours ago

      Thanks for your comment, you've given me a lot to chew on and I think I'll need to bookmark this page.

      > I've written parsers professionally with Rust for two companies now

      If you don't mind me asking, which companies? Or how do you get into this industry within an industry? I'd really love to work on some programming language implementations professionally (although maybe that's just because I've built them non-professionally until now),

      > Especially if you need to do runtime parsing in a network service serving large traffic.

      I almost expected something like this; it just makes sense with how the language is positioned. I'm not sure if you've been following Cloudflare's Pingora blogs, but I've found them very interesting because of how they're able to really optimise parts of their networking without it looking like fast-inverse-sqrt.

      > There's also no reference counting needed anywhere, so you don't need to play the dance with Arc/Weak.

      I really like the sound of this, it wasn't necessarily confusing to work with Rc and Weak but more I had to put in a lot of extra thought up front (which is also valuable don't get me wrong).

      > I would say an amazing strategy is to first write a prototype with Ocaml, it's really good for that.

      Thanks! Maybe then the Rust code I have so far won't be thrown in the bin just yet.

      • pimeys 10 hours ago

        > If you don't mind me asking, which companies? Or how do you get into this industry within an industry? I'd really love to work on some programming language implementations professionally (although maybe that's just because I've built them non-professionally until now),

        You do not need to write programming languages to need parsers and lexers. My last company was Prisma (https://prisma.io), where we had our own schema definition language, which needed a parser. The first implementation was nested structures and reference counting, which was very buggy and hard to fix. We rewrote it with the index/walker strategy described in my previous comment and got a significant speed boost, and the whole codebase became much more stable.

        The company I'm working for now is called Grafbase (https://grafbase.com). We aim to be the fastest GraphQL federation platform, which we are in many cases already due to the same design principles. We need to be able to parse GraphQL schemas, and one of our devs wrote a pretty fast library for that (also uses Logos):

        https://crates.io/crates/cynic-parser

        And we also need to parse and plan the operation for every request. Here, again, the ID-based model works miracles. It's fast and easy to work with.

        > I really like the sound of this, it wasn't necessarily confusing to work with Rc and Weak but more I had to put in a lot of extra thought up front (which is also valuable don't get me wrong).

        These are suddenly _very annoying_ to work with. If you come from the `Weak` side of a model, you need to upgrade it first (and unwrap), which makes passing references either hard or impossible depending on what you want to do. It's also not great for CPU caches if your data is too nested. Keep everything flat and sorted. In the beginning it's a bit more work and thinking, but it scales much better as your project grows.

        > Thanks! Maybe then the Rust code I have so far won't be thrown in the bin just yet.

        You're already on the right path if you're interested in Ocaml. Keep going.

        • miningape 10 hours ago

          I should've expected Prisma! It's actually my main "orm" for my TS web projects, so thanks for that! Also Grafbase seems interesting; I've had my fair share of issues with federated Apollo servers, so it'd be interesting to check out.

          > If you come from the `Weak` side to a model, you need to upgrade it first (and unwrap), which makes passing references either hard or impossible depending on what you want to do.

          You're literally describing my variable environment, eventually I just said fuggit and added a bunch of unsafe code to the core of it just to move past these issues.

    • packetlost 5 hours ago

      There's also the phenomenal pest library. It probably wouldn't be as fast, but I've found that parsing usually isn't the performance-critical part of a system. If it is, manually writing the parser is definitely the way to go.

    • marcosdumay 4 hours ago

      > Especially if you need to do runtime parsing in a network service serving large traffic

      Yeah, that's the focus of it, and the thing you can use Rust well.

      All the popular Rust parsing libraries aren't even focused on the use that most people mean by "parser". They can't support language parsing at all, but you only discover that after you've spent weeks fighting with the type system to get to the real PL problems.

      Rust itself is parsed by a set of specialized libraries that won't generalize to other languages. Everything else is aimed at parsing data structures.

      • pimeys 4 hours ago

        There is also rust-analyzer, which is a separate binary. It should compile with a stable Rust compiler. I remember reading its source together with the Zig compiler's. Both are quite impressive codebases.

  • PittleyDunkin 4 hours ago

    The borrow checker is definitely a pain, but it stops being such a pain once you design your types around ownership and pass around non-owned pointers, references, or indexes.

    • lsllc 3 hours ago

      This. I've found the same: being effective in Rust really requires that you change your way of thinking about your data structures (and code layout). Once I realized that, I was no longer fighting the borrow checker and I've been able to build complex code that more or less worked immediately. As I look back on it, I think what a pain it would have been to write and debug in C, although doing it in C would appear to be "easier".

  • kemaru 5 hours ago

    You wouldn't be losing FP niceness with Zig, and the pattern matching and enum situation is also similar to Rust. Even better, in a few areas, for example arbitrary-width integers and enum tagging in unions/structs. Writing parsers and low level device drivers is actually quite comfortable in Zig.

  • omginternets 5 hours ago

    I had a similar journey of enlightenment that likewise led me to OCaml. Unless you're doing low-level systems programming, OCaml will give you the "if it compiles, it's probably right" vibe with much less awkward stuff to type.

  • almostdeadguy 7 hours ago

    With some patience and practice, I think reasoning about borrows becomes second nature. And what it buys you with lexing/parsing is the ability to do zero-copy parsing.

  • FrustratedMonky 8 hours ago

    F# is OCaml to a great extent. So if you don't need to stay away from MS/.NET, it is more 'open source' than the rest of MS's products; MS did release F# with an open-source license.

    But, it does still run on .NET.

    At this point, isn't every major language controlled by one main corporate entity?

    Except Python? But Python doesn't have algebraic types, or very complete pattern matching.

    • adsharma 4 hours ago

      I still believe that a variant of python that has algebraic types and pattern matching beats Rust for writing parsers quickly.

      My effort has been in adding these features to a front end language that transpiles to an underlying FP language, including but not limited to Rust.

  • smolder 10 hours ago

    I think you missed something if you felt the borrow checker made things too hard. You can just copy and move on. Most languages do less efficient things anyway.
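
    Something like this (hypothetical names): the clone costs an allocation, but there's no borrow left to fight over.

        #[derive(Clone, Debug)]
        struct Token {
            text: String,
        }

        fn main() {
            let tokens = vec![Token { text: "select".into() }];
            // Rather than keeping a borrow alive past the drop below,
            // clone the element and move on.
            let first = tokens[0].clone();
            drop(tokens); // the clone is independent of the original
            println!("{first:?}");
        }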

    • miningape 10 hours ago

      Oh no, you're right - especially looking at my last few commits this is very much what some parts of the project became. And when I was looking at it I felt like I was throwing away so much of the goodness Rust provides and it really irritated me.

      Looking at pimeys' comment, he actually gave some really interesting suggestions on how to manage this without needing Rc / Weak pointers or copying loads of dynamic memory all over the place. Instead you have a flat structure of copy-able elements, giving you better cache locality and a really easy way to work with them.
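
      Roughly, that flat layout looks like this (an illustrative sketch with invented names, not code from the thread):

          #[derive(Clone, Copy)]
          struct NodeId(u32);

          #[derive(Clone, Copy)]
          enum Node {
              Number(f64),
              Add(NodeId, NodeId), // children are indexes, not pointers
              Mul(NodeId, NodeId),
          }

          struct Ast {
              nodes: Vec<Node>, // one contiguous allocation: cache-friendly
          }

          impl Ast {
              fn push(&mut self, node: Node) -> NodeId {
                  self.nodes.push(node);
                  NodeId(self.nodes.len() as u32 - 1)
              }

              fn get(&self, id: NodeId) -> Node {
                  self.nodes[id.0 as usize] // Node is Copy, so no borrow to fight
              }
          }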

  • wyager 4 hours ago

    Having written probably several hundred kloc of both Haskell and OCaml, I strongly prefer Haskell. A very simple core concept wrapped in an extremely powerful shell. Haskell is a lot better for parsing tasks because (among other considerations) its more powerful type system can better express constraints on grammars.

  • neonsunset 8 hours ago

    > I considered something like F# but I didn't like that it's tied to microsoft/.NET.

    Could you explain your thought process when deciding to not use F# because it runs on top of .NET? (both of which are open-source, and .NET is what makes F# fast and usable in almost every domain)

    • balencpp an hour ago

      I am genuinely curious too. .NET is a very mature, very performant runtime, and I think of F#, a beautiful, productive language, running on it as a big pro. Perhaps things used to be different regarding Microsoft?

      • neonsunset 12 minutes ago

        Yeah. I'm having so much fun with F# that I absolutely did not anticipate. Sure, it's something everyone using .NET knows about but I genuinely underestimated it and wish more people gave it a try. Such a good language.

        As for the hate - my pet theory is that developers need something like a sacrificial lamb to blame their misfortunes on, and a banner to rally under, which often happens to be "against that other group" or "against that competing language". Because .NET is a platform made by Microsoft that hosts two very powerful multi-paradigm languages, it becomes a point of contention for many. From what I've seen, other languages do not receive so much undeserved hate, and here on HN some, like Go, Ruby or the BEAM family, receive copious amounts of praise not rooted in technical merits.

noelwelsh 14 hours ago

This is, to me, an odd way to approach parsing. I get the impression the author is relatively inexperienced with Rust and the PL ideas it builds on.

A few notes:

* The AST would, I believe, be much simpler defined as an algebraic data type (a sketch follows these notes). It's not like the sqlite grammar is going to randomly grow new nodes that require the extensibility their convoluted encoding provides. The encoding they use looks like what someone familiar with OO, but not algebraic data types, would come up with.

* "Macros work different in most languages. However they are used for mostly the same reasons: code deduplication and less repetition." That could be said for any abstraction mechanism. E.g. functions. The defining features of macros is they run at compile-time.

* The work on parser combinators would be a good place to start to see how to structure parsing in a clean way.
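
For instance, a tiny slice of a SQL expression AST as a plain algebraic data type might look like this (illustrative only, nowhere near the full sqlite grammar):

    enum Expr {
        Literal(Value),
        Column(String),
        BinaryOp {
            left: Box<Expr>,
            op: BinOp,
            right: Box<Expr>,
        },
    }

    enum Value {
        Integer(i64),
        Text(String),
        Null,
    }

    enum BinOp {
        Eq,
        And,
        Or,
    }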

  • huijzer 12 hours ago

    > I get the impression the author is relatively inexperienced

    The author never claimed to be an experienced programmer. The title of the blog is "Why I love ...". Your notes look fair to me, but calling out inexperience is unnecessary IMO. I love it if someone loves programming. I think that's great. Experience will come.

    • guappa 11 hours ago

      If someone didn't study the state of the art of tokenising and parsing and still wants to write about it, it's absolutely ok to call it out as being written by someone who has only a vague idea of what they're talking about.

    • benji-york 7 hours ago

      >> I get the impression the author is relatively inexperienced

      > calling out inexperience is unnecessary IMO. I love it if someone loves programming. I think that's great.

      I'll observe that the commenter did not make the value judgement about inexperience that you appear to think they did.

    • palata 10 hours ago

      > calling out inexperience is unnecessary IMO

      I don't know the author, so it's useful for me to see in the comments that some people think they are not so experienced.

      That doesn't mean I don't respect the author; it's great that they write about what they do!

    • robertlagrant 9 hours ago

      "calling out" is too fuzzy a term to be useful. It covers "mentioning" and "accusing". I wouldn't use it, for that reason.

      "unnecessary" is the same. Who defines what's necessary? Is Hacker News necessary?

      • kreetx 5 hours ago

        It's definitely necessary: it provides an answer for those who do know about parsing, read this, and wonder why the author didn't use one of the other, often-used practices instead.

    • kiayokomo 2 hours ago

      Strongly disagree. There should be a higher standard for articles. This amateur "look what I can do" stuff is just noise. Here's an idea: don't tell the world about what you've done unless it is something new. We don't care; it wastes our time and fills the internet with shit. Not everyone deserves a medal for pooping.

    • hiddencost 9 hours ago

      Posts on here sometimes come from the world expert, and sometimes from enthusiastic amateurs.

      I wrote a compiler in school many years ago, but beyond thinking "this project is one only a world-class expert or an enthusiastic amateur would attempt", I wasn't immediately sure which I was dealing with.

  • soegaard 4 hours ago

    > * "Macros work different in most languages. However they are used for mostly the same reasons: code deduplication and less repetition." That could be said for any abstraction mechanism. E.g. functions. The defining features of macros is they run at compile-time.

    In the context of the blog post, he wants to generate structure definitions. This is not possible with functions.
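
    Something along these lines; an illustrative declarative macro, not necessarily the post's exact node! macro:

        macro_rules! node {
            ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
                #[derive(Debug, Clone)]
                pub struct $name {
                    $(pub $field: $ty,)*
                }
            };
        }

        // Each invocation stamps out a full struct definition at compile time,
        // which no ordinary function could do.
        node!(Limit { expr: String, offset: Option<String> });
        node!(Select { columns: Vec<String>, limit: Option<Limit> });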

ryandv 15 hours ago

I don't know. Having written a small parser [0] for Forsyth-Edwards chess notation [1], I'd say Haskell takes the cake here in terms of simplicity and legibility: it reads almost as clearly as BNF, and there is very little technical ceremony involved, letting you focus on the actual grammar of whatever it is you are trying to parse.

[0] https://github.com/ryandv/chesskell/blob/master/src/Chess/Fa...

[1] https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notati...

  • PittleyDunkin 15 hours ago

    Haskell definitely takes the cake in terms of leveraging parser combinators, but you’re still stuck with Haskell to deal with the result.

    • wesselbindt 12 hours ago

      That's what they call a "win-win".

    • pjmlp 13 hours ago

      For some of us, "being stuck with Haskell" isn't a problem.

      • anilakar 12 hours ago

        For the rest, being stuck with real-world problems instead of self-inflicted ones is preferable :-)

      • PittleyDunkin 4 hours ago

        I like haskell a lot, but it's not like there's any shortage of reasons why people don't use it. Replicating parser-combinators in other languages is a huge win.

    • zxexz 11 hours ago

      As someone who really enjoys Haskell, I used to think like that. But I realized for problems like parsing, it really is just excellent.

    • instig007 13 hours ago

      Don't make it sound as if it's bad; it's actually superb on all these levels: the type level, the SMP runtime, and throughput.

    • fuzztester 12 hours ago

      $ echo "Haskell" | sed 's/ke/-ki'

      Has-kill

      $

      • fuzztester 12 hours ago

        | sed '/k/sk'

        Has-skill

        $

        • orf 12 hours ago

          Write the full transform in Haskell?

          • itishappy 5 hours ago

                {-# LANGUAGE OverloadedStrings #-}
            
                import Prelude hiding (putStrLn)
                import Data.Text (Text, replace)
                import Data.Text.IO (putStrLn)
            
                transform :: Text -> Text
                transform = replace "k" "sk" . replace "ke" "-ki" 
            
                main :: IO ()
                main = putStrLn $ transform "Haskell"
        • mkl 10 hours ago

            sed: -e expression #1, char 5: unterminated `s' command
          
          It's like this:

            sed 's/find/replace/'
  • nine_k 15 hours ago

    But this is not unaided Haskell, it's a parser combinator library, isn't it?

    Do you see an obvious reason why a similar approach won't work in Rust? E.g. winnow [1] seems to offer declarative enough style, and there are several more parser combinator libraries in Rust.

    [1]: https://docs.rs/winnow/latest/winnow/

    • codebje 14 hours ago

          data Color = Color
              { r :: Word8
              , g :: Word8
              , b :: Word8
              } deriving Show
      
          hex_primary :: Parser Word8
          hex_primary = toWord8 <$> sat isHexDigit <*> sat isHexDigit
              where toWord8 a b = read ['0', 'x', a, b]
      
          hex_color :: Parser Color
          hex_color = do
              _ <- char '#'
              Color <$> hex_primary <*> hex_primary <*> hex_primary
      
      Sure, it works in Rust, but it's a pretty far cry from being as simple or legible - there's a lot of extra boilerplate in the Rust.
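
      For comparison, roughly the same parser with a Rust combinator library like nom (adapted from nom's own documentation example, so treat it as a sketch):

          use nom::{
              bytes::complete::{tag, take_while_m_n},
              combinator::map_res,
              sequence::tuple,
              IResult,
          };

          #[derive(Debug)]
          pub struct Color {
              red: u8,
              green: u8,
              blue: u8,
          }

          // Parse two hex digits into a byte.
          fn from_hex(input: &str) -> Result<u8, std::num::ParseIntError> {
              u8::from_str_radix(input, 16)
          }

          fn hex_primary(input: &str) -> IResult<&str, u8> {
              map_res(take_while_m_n(2, 2, |c: char| c.is_ascii_hexdigit()), from_hex)(input)
          }

          fn hex_color(input: &str) -> IResult<&str, Color> {
              let (input, _) = tag("#")(input)?;
              let (input, (red, green, blue)) = tuple((hex_primary, hex_primary, hex_primary))(input)?;
              Ok((input, Color { red, green, blue }))
          }

      The shape is recognizably the same; the extra weight is in the imports and type signatures.
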
      • jeroenhd 12 hours ago

        I think it's a stretch to call parser combinator code in Haskell simple or legible. Most Haskell code is simple and legible if you know enough Haskell to read it, but Haskell isn't exactly a simple or legible language.

        Haskell demonstrates the use of parser combinators very well, but I'd still use parser combinators in another language. Parser combinators are implemented in plenty of languages, including Rust, and actually doing anything with the parsed output becomes a lot easier once you leave the Haskell domain.

        • kreetx 5 hours ago

          I'd say Haskell is even simpler than Rust: the syntactic sugar of monads/do-notation makes writing parsers easy. The same sugar transfers to most other problem domains.

    • mrkeen 13 hours ago

      But it doesn't take much to go from 0 to a parser combinator library. I roll my own each year for advent of code. It starts at like 100 lines of code (which practically writes itself - very hard to stray outside of what the types enforce) and I grow it a bit over the month when I find missing niceties.
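
      The core really is tiny. A minimal sketch in Rust for concreteness (the same idea is even terser in Haskell; names invented):

          // A parser is any function from input to an optional (rest, value) pair.
          type Parser<'a, T> = Box<dyn Fn(&'a str) -> Option<(&'a str, T)> + 'a>;

          fn char_p<'a>(expected: char) -> Parser<'a, char> {
              Box::new(move |input| {
                  let mut chars = input.chars();
                  match chars.next() {
                      Some(c) if c == expected => Some((chars.as_str(), c)),
                      _ => None,
                  }
              })
          }

          // Apply p zero or more times, collecting the results.
          fn many<'a, T: 'a>(p: Parser<'a, T>) -> Parser<'a, Vec<T>> {
              Box::new(move |mut input| {
                  let mut out = Vec::new();
                  while let Some((rest, v)) = p(input) {
                      input = rest;
                      out.push(v);
                  }
                  Some((input, out))
              })
          }

          // Transform the parsed value; every other nicety is built the same way.
          fn map<'a, A: 'a, B: 'a>(p: Parser<'a, A>, f: impl Fn(A) -> B + 'a) -> Parser<'a, B> {
              Box::new(move |input| p(input).map(|(rest, a)| (rest, f(a))))
          }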

  • lynx23 13 hours ago

    I wouldn't consider FEN a great parsing example, simply because it can be implemented in a simple function with a single loop.

    Just a few days ago, I wrote a FEN "parser" for an experimental quad-bitboard implementation. It almost wrote itself.

    P.S.: I am the author of chessIO on Hackage

gritzko 13 hours ago

I have experience writing parsers (lexers) in Ragel, using Go, Java, C++, and C. I must say, once you have some boilerplate generator in place, raw C is as good as the Rust code the author describes. Maybe even better, because of its simplicity. For example, this is most of the code necessary for a JSON parser: https://github.com/gritzko/librdx/blob/master/JSON.lex

In fact, that eBNF only produces the lexer. The parser part is not that impressive either, 120 LoC and quite repetitive: https://github.com/gritzko/librdx/blob/master/JSON.c

So, I believe, a parser infrastructure evolves till it only needs eBNF to make a parser. That is the saturation point.

  • dvdkon 12 hours ago

    That repetitiveness can be seen as a downside, not a virtue. And I feel that Rust's ADTs make working with the resulting syntax tree much easier.

    Though I agree that a little code generation and/or macro magic can make C significantly more workable.

  • djoldman 8 hours ago

    I love love love ragel.

    Won't the code here:

    https://github.com/gritzko/librdx/blob/master/JSON.lex

    accept "[" as valid json?

       delimiter = OpenObject | CloseObject | OpenArray | CloseArray | Comma | Colon;
       primitive = Number | String | Literal;
       JSON = ws* ( primitive? ( ws* delimiter ws* primitive? )* ) ws*;
       Root = JSON;
    
    (pick zero of everything in JSON except one delimiter...)

    I usually begin with the RFCs:

    https://datatracker.ietf.org/doc/html/rfc4627#autoid-3

    I'm not sure one can implement JSON with ragel... I believe ragel can only handle regular languages and JSON is context free.

    • gritzko 3 hours ago

      That is a lexer, so yes, it accepts almost any sequence of valid tokens. Pure Ragel only parses regular languages, but there are ways.

tptacek 17 hours ago

So, just to kick this off: I wrote an eBPF disassembler and (half-hearted) emulator in Rust and I also found it a pleasant language to do parsing-type stuff in. But: I think the author cuts against their argument when they manage to necessitate a macro less than 1/6th of the way into their case study. A macro isn't quite code-gen, but it also doesn't quite feel like working idiomatically within the language, either.

Again: not throwing shade. I think this is a place where Rust is genuinely quite strong.

  • thesz 15 hours ago

    How can one define an infinite grammar in Rust?

      E.g., a context-free rule S ::= abc|aabbcc|aaabbbccc|... can effectively parse a^Nb^Nc^N, which is the classic example of a context-sensitive language.

    This is a simple example, but something like that can be seen in practice. One example is when language allows definition of operators.

    So, how does Rust handle that?

    • ryandv 15 hours ago

      In Haskell I think it's something like:

          {-# LANGUAGE OverloadedStrings #-}
          import Data.Attoparsec.Text
          import qualified Data.Text as T
      
          type ParseError = String
      
          csgParse :: T.Text -> Either ParseError Int
          csgParse = eitherResult . parse parser where
            parser = do
              as <- many' $ char 'a'
              let n = length as
              count n $ char 'b'
              count n $ char 'c'
              char '\n'
              return n
      
          ghci> csgParse "aaabbbccc\n"
          Right 3
      • thesz 6 hours ago

        The question comes from Haskell, yes: https://byorgey.wordpress.com/2012/01/05/parsing-context-sen...

        You used a monadic parser; monadic parsers are known to be able to parse context-sensitive grammars. But they hide the fact that they are combinators, implemented with closures beneath. For example, that "count n $ char 'b'" can be as complex as parsing a set of statements containing expressions with an operator specified (symbol, fixity, precedence) earlier in the code.

        In Haskell, it is easy - parameterize your expression grammar with operators, apply them, parse text. This will work even with Applicative parsers, even unextended.

        But in Rust? I haven't seen how it can be done.

    • jeroenhd 12 hours ago

      Using parser combinator library "nom", this should probably do what you'd want:

          use nom::{IResult, character::complete::char, multi::many_m_n, sequence::tuple};

          fn parse_abc(input: &str, n: usize) -> IResult<&str, (Vec<char>, Vec<char>, Vec<char>)> {
            let (input, result) = tuple(( many_m_n(n, n, char('a')),
                                        many_m_n(n, n, char('b')),
                                        many_m_n(n, n, char('c'))
                                  ))(input)?;
            Ok((input, result)) 
          }
          
      
      It parses (the beginning of) the input, ensuring `n` repetitions of 'a', 'b', and 'c'. Parse errors are reported through the return type, and the remaining characters are returned for the application to deal with as it sees fit.

      https://play.rust-lang.org/?version=stable&mode=debug&editio...

      • oguz-ismail 10 hours ago

        > this should probably do what you'd want

        If you have to specify N, no, it doesn't
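
        Deriving n from the input instead is doable, though, since a nom parser is just a function and the "monadic" step is ordinary Rust control flow. A rough sketch with the same library:

            use nom::{IResult, character::complete::char, multi::{count, many0}};

            // Count the 'a's first, then demand exactly that many 'b's and 'c's.
            fn abc(input: &str) -> IResult<&str, usize> {
                let (input, a_s) = many0(char('a'))(input)?;
                let n = a_s.len();
                let (input, _) = count(char('b'), n)(input)?;
                let (input, _) = count(char('c'), n)(input)?;
                Ok((input, n))
            }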

  • jamra 17 hours ago

    Link us your eBPF disassembler if you can. Sounds cool.

    • tptacek 16 hours ago

      It's not. If you wrote one, it'd be more interesting than mine.

sshine 6 hours ago

One mind-blowing experience for me:

I can take my parser combinator library that I use for high-level compiler parsers, and use that same library in a no-std setting and compile it to a micro-controller, and deploy that as a high-performance protocol parser in an embedded environment. Exact same library! Just with fewer String and more &'static str.
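
The kind of signature that ports unchanged looks roughly like this (an illustrative sketch, not my actual library):

    #![no_std]

    // A borrowed-input parser: no String, no allocator, so the exact
    // same code runs in a compiler frontend and on a micro-controller.
    pub fn parse_u8(input: &str) -> Option<(u8, &str)> {
        let len = input.bytes().take_while(|b| b.is_ascii_digit()).count();
        let (digits, rest) = input.split_at(len);
        digits.parse().ok().map(|n| (n, rest))
    }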

So toying around with compilers translates my skill-set rather well into doing embedded protocol parsers.

hu3 17 hours ago

Related: I love Rob Pike's talk "Lexical Scanning in Go" (2011).

Educational and elegant approach.

https://www.youtube.com/watch?v=HxaD_trXwRE

  • emmanueloga_ 15 hours ago

    That talk is great, but I remember some discussion later about Go actually NOT using this technique because of goroutine scheduling overhead and/or inefficient memory allocation patterns? The best discussion I could find is [1].

    Another great talk about making efficient lexers and parsers is Andrew Kelley's "Practical Data Oriented Design" [2]. Summary: "it explains various strategies one can use to reduce memory footprint of programs while also making the program cache friendly which increase throughput".

    --

    1: https://news.ycombinator.com/item?id=31649617

    2: https://www.youtube.com/watch?v=IroPQ150F6c

    • chubot 15 hours ago

      Yeah I actually remember that too, this article mentions it:

      Coroutines for Go - https://research.swtch.com/coro

      > The parallelism provided by the goroutines caused races and eventually led to abandoning the design in favor of the lexer storing state in an object, which was a more faithful simulation of a coroutine. Proper coroutines would have avoided the races and been more efficient than goroutines.

  • tptacek 15 hours ago

    I feel like that talk has more to do with expressing concurrency, in problems where concurrency is a natural thing to think about, than it does with lexing.

brundolf 14 hours ago

Something that was hard when I wrote a full AST parser in Rust was representing a hierarchy of concrete AST types, with upcasting and downcasting. I was able to figure out a way, but it required some really weird type shenanigans (eg PhantomData) and some macros. Looks like they had to do crazy macros here too

Curious what the rest of the prior art looks like

  • elcritch 14 hours ago

    Hmmm, yeah, Rust's ADTs and matching syntax would be great, until you get to the up/down casting. I'm too inexperienced in Rust to know if there are good ways to handle it. Dynamic traits, maybe?

  • ainiriand 13 hours ago

    Sorry to bother you, but would that be open-source by any chance? Is there any public repo available? Thank you.

    • brundolf 35 minutes ago

      Yup! You can find it here: https://github.com/brundonsmith/bagel-rs/blob/master/src/mod...

      [trying to remind myself how this works because it's been a while]

      So it's got macros for defining "union types", which combine a bunch of individual structs into an enum with same-name variants, and implement From and TryFrom to box/unbox the structs in their group's enum

      ASTInner is a struct that holds the Any (all possible AST nodes) enum in its `details` field, alongside some other info we want all AST nodes to have

      And then AST<TKind> is a struct that holds (1) an Rc<ASTInner>, and (2) a PhantomData<TKind>, where TKind is the (hierarchical) type of AST struct that it's known to contain

      AST<TKind> can then be:

      1. Downcast to a TKind (basically just unboxing it)

      2. Upcast to an AST<Any>

      3. Recast to a different AST<TKind> (changing the box's PhantomData type but not actually transforming the value). This uses trait implementations (generated by the macros) to automatically know which parent types it can be infallibly upcast to, and which more-specific types it can try to be cast to

      The above three methods also have try_ versions

      What this means then is you can write functions against, eg, AST<Expression>. You will have to pass an AST<Expression>, but eg. an AST<BooleanLiteral> can be infallibly recast to an AST<Expression>, but an AST<Any> can only try_recast to AST<Expression> (returning an Option<AST<Expression>>)

      Another cool property of this is that there are no dynamic traits, and the only heap pointers are the Rc's between AST nodes (and at the root node). Everything else is enums and concrete structs; the re-casting happens solely with that PhantomData, at the type level, without actually changing any data or even cloning the Rc unless you unbox the details (in downcast())
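
      Condensed, the shape is something like this (an illustrative sketch, not the actual bagel-rs code):

          use std::marker::PhantomData;
          use std::rc::Rc;

          // The closed set of node payloads (two variants for illustration).
          enum Any {
              BooleanLiteral(bool),
              StringLiteral(String),
          }

          // Data common to every node, plus its payload.
          struct ASTInner {
              details: Any,
          }

          // A typed handle: TKind exists only at the type level.
          struct AST<TKind> {
              inner: Rc<ASTInner>,
              _kind: PhantomData<TKind>,
          }

          struct BooleanLiteral; // marker type for one node kind

          impl AST<BooleanLiteral> {
              // Unbox the payload; the kind is guaranteed by construction.
              fn downcast(&self) -> bool {
                  match &self.inner.details {
                      Any::BooleanLiteral(b) => *b,
                      _ => unreachable!(),
                  }
              }
          }

          impl<TKind> AST<TKind> {
              // Upcasting is infallible: just forget the specific kind.
              fn upcast(self) -> AST<Any> {
                  AST { inner: self.inner, _kind: PhantomData }
              }
          }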

      I worked in this codebase for a while and the dev experience was actually quite nice once I got all this set up. But figuring it out in the first place was a nightmare

      I'm wondering now if it would be possible/worthwhile to extract it into a crate

samsartor 3 hours ago

Imperative Rust is really good for parsing, but you can also get a long way with regexes, especially if you are just prototyping or doing Advent of Code.
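
For example, capture groups get you surprisingly far on Advent-of-Code-style input (field names invented):

    use regex::Regex;

    fn main() {
        // e.g. the "move 3 from 1 to 7" crate-stacking lines from AoC 2022
        let re = Regex::new(r"^move (\d+) from (\d+) to (\d+)$").unwrap();
        let caps = re.captures("move 3 from 1 to 7").unwrap();
        let (count, from, to): (u32, u32, u32) = (
            caps[1].parse().unwrap(),
            caps[2].parse().unwrap(),
            caps[3].parse().unwrap(),
        );
        println!("{count} {from} {to}");
    }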

I do still like declarative parsing over imperative, so I wrote https://docs.rs/inpt on top of the regex crate. But Andrew Gallant gets all the credit, the regex crate is overpowered.

djfobbz 5 hours ago

Sorry, my OCD is kicking in, but "Asterisk" is misspelled as "Asteriks" throughout your sample code.

nhatcher 11 hours ago

Well, good luck parsing sqlite syntax! I had to write a parser for a (fairly small) subset of sqlite for work a couple of years ago. I really like sqlite; it's always a source of inspiration.

The railroad diagrams are tremendously useful:

https://www.sqlite.org/syntaxdiagrams.html

I don't think the lemon parser generator gets enough credit:

https://sqlite.org/src/doc/trunk/doc/lemon.html

With respect to the choice of language: any language with algebraic data types would work great. Even TypeScript would be great for this.

FWIW I wrote a small introduction to writing parsers by hand in Rust a while ago:

https://www.nhatcher.com/post/a-rustic-invitation-to-parsing...

ketzo 15 hours ago

So how do you debug code written with macros like this, or come into it as a new user of the codebase?

I’m imagining seeing the node! macro used, and seeing the macro definition, but still having a tough time knowing exactly what code is produced.

Do I just use the Example and see what type hints I get from it? Can I hover over it in my IDE and see an expanded version? Do I need to reference the compiled code to be sure?

(I do all my work in JS/TS so I don’t touch any macros; just curious about the workflow here!)

  • schneems 15 hours ago

    Run:

        $ cargo expand
    
    And you’ll see the resulting code.

    Rust is really several languages: "vanilla" Rust, declarative macros, and proc macros. Each has a slightly different capability set and a different dialect. You get used to working with each in turn over time.

    Also, unit tests are generally a good playground for understanding the impact of modifying a macro.

  • guitarbill 15 hours ago

    rust-analyzer, the Rust LSP used in e.g. VSCode, can expand declarative and proc macros recursively.

    It isn't too bad, although the fewer proc macros in a code base, the better. Declarative macros are slightly easier to grok, and much easier to maintain and test. (I feel the same way about opaque codegen in other languages.)

WiSaGaN 16 hours ago

I think that, except for macros, most of these features are ML-family language features as well. Rust stands out because it can implement them in an efficient, zero-overhead-abstraction way.

kldx 15 hours ago

I find Megaparsec in Haskell quite expressive, compared to my limited experience using nom in Rust.

James_K 8 hours ago

Mentioning macros as a reason to love Rust goes against my experience with them.

  • tonyhart7 6 hours ago

    I love using macros; writing them, however...

nicoco 5 hours ago

Every rust article: "Look how great this rust feature is and how clean and concise the resulting code is!"

Me: "How can a programming language be so damn complex? Am I just dumb?"

  • biorach 5 hours ago

    There's plenty of complex programming languages out there. Some are worth putting the time into. If you can program well in some other language you can get your head around Rust - give it some time - it's worth it.

jamra 16 hours ago

Does anyone have a good EBNF notation for Sqlite? I tried to make a tree-sitter grammar, which produces C code and great Rust bindings. But sqlite uses the Lemon parser generator, and I'm not sure how to read the grammar from that.

ForHackernews 11 hours ago

I'll throw in a plug for https://pest.rs/ a PEG-based parser-generator library in Rust. Delightful to work with and removes so much of the boilerplate involved in a parser.
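
Typical usage looks something like this (a minimal sketch using an inline grammar):

    use pest::Parser;
    use pest_derive::Parser;

    // The whole grammar lives inline: one rule matching an ASCII identifier.
    #[derive(Parser)]
    #[grammar_inline = "ident = { ASCII_ALPHA+ }"]
    struct IdentParser;

    fn main() {
        let pairs = IdentParser::parse(Rule::ident, "hello").expect("parse failed");
        for pair in pairs {
            println!("{:?}: {:?}", pair.as_rule(), pair.as_str());
        }
    }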

  • tda 10 hours ago

    I have been using this tool. The best feature imho is that you can quickly iterate on the grammar in the browser, using the online editor on the homepage.

    I was struggling, though, with the lack of strong typing in the returned parse tree, although I think some improvements have been made there which I haven't had a chance to look into yet.

jurschreuder 14 hours ago

I could not agree less. C++ is the best and always will be. You youngsters made up this new dialect that can also compile with the C++ compiler. This is like people putting VS Code in dark mode and thinking they're now also working in the terminal like the Gods of Binary.

  • arlort 13 hours ago

    Rust being a dialect of c++ is certainly a novel take

    • tialaramex 5 hours ago

      I expect they are thinking of the "Safe C++" proposal P3390. This proposes to provide the syntax and other features needed to grant (a subset of the future) C++ the same safety properties as safe Rust via an equivalent mechanism (a borrow checker for C++ and the lifetime annotations to drive it, the destructive move, the nominative typing and so on).

      Much as you might anticipate (although perhaps its designer Sean Baxter did not), this was not kindly looked upon by many C++ programmers and members of WG21 (the C++ committee).

      The larger thing that "Safe C++" and the reaction to it misses is that Rust's boon is its Culture. The "Safe C++" proposal gives C++ a potential safety technology but does not and cannot gift it the accompanying Safety Culture. Government programmes to demand safety will be most effective - just as with other types of safety - if they deliver an improved culture not just technological change.

      • whytevuhuni 3 hours ago

        That sounds significantly more like C++ trying to be a dialect of Rust, rather than the other way around. I don't think that was the GGP's main gripe.

        But more importantly, Safe C++ is just not a thing yet. People seem to discount the herculean effort that was required to properly implement the borrow checker, the thousands of little problems that needed to be solved for it to be sound, not to mention a few really, really hard problems, like variance, lifetimes in higher-kinded trait bounds, generic associated types, and how lifetimes interact with a Hindley-Milner type system in general.

        Not trying to discount Safe C++'s efforts of course. I really hope they, too, succeed. I also hope they manage to find a syntax that's less... what it is now.

        • tialaramex an hour ago

          I don't think Safe C++ has a Hindley-Milner type system? I think it's just the "machine integers wearing funny hats"† types from C, which were passed on to C++.

          In K&R C this very spartan type system makes some sense, there's no resources, you're on a tiny Unix machine, you'd otherwise be grateful for an assembler. In C++ it does look kinda silly, like an SUV with a lawnmower engine. Or one of those very complicated looking board games which turns out to just be Snakes and Ladders with more steps.

          But I don't think Safe C++ fixes that anyhow.

          † Technically maybe the C pointer types are not just the integers wearing a funny hat. That's one of many unresolved soundness bugs in the language, hence ISO/IEC DTS 6010 (which will some day become a TR)

          • whytevuhuni 23 minutes ago

            No, Safe C++ does not have that type system. I was just trying to emphasize the amount of, let's be honest, downright genius that had to go into that lifetime specification and borrow checker implementation.

            For C++, it'll be about cramming lifetimes into diamond-inheritance OOP, which... feels even harder.

            Safe C sounds like a much, much more believable project, if such a proposal were to exist.