Hardening mode for the compiler

discourse.llvm.org

146 points by vitaut 16 hours ago

wyldfire 12 hours ago

A really good accompaniment to this is Carruth's "C++, bounds checking, performance, and compilers" [1]:

> ... strong belief that bounds checks couldn’t realistically be made cheap enough to enable by default. However, so far they are looking very affordable. From the above post, 0.3% for bounds checks in all the standard library types!

There's more to the hardening story than just bounds checks. But it's a big part IMO.

[1] https://chandlerc.blog/posts/2024/11/story-time-bounds-check...
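
To make the cost argument concrete, here is a minimal sketch of the check a hardened `operator[]` effectively adds: one unsigned compare and a branch that is almost never taken. That is why the overhead can be as small as the figure quoted above. The helpers `index_in_bounds` and `checked_get` are hypothetical names for illustration, not part of any standard library. In libc++, the real switch is the `_LIBCPP_HARDENING_MODE` macro (for example `-D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: the predicate a bounds-checked operator[] evaluates.
// With an unsigned index this is a single compare; no separate i >= 0 test
// is needed.
inline bool index_in_bounds(std::size_t i, std::size_t size) noexcept {
    return i < size;
}

// Hypothetical checked accessor. Like the hardened standard libraries, it
// terminates the program on a bad index instead of throwing, so the failure
// cannot be caught and silently ignored.
template <typename T>
const T& checked_get(const std::vector<T>& v, std::size_t i) {
    if (!index_in_bounds(i, v.size())) {
        std::abort();  // a real hardened build would typically trap here
    }
    return v[i];
}
```

On the in-bounds path, `checked_get(v, 1)` behaves exactly like `v[1]`. The only extra work is the compare, which the branch predictor handles almost for free.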

tempodox 12 hours ago

Even if bounds checks were only active in debug builds, that would already be of high value.
- delta_p_delta_x 8 hours ago
  > Even if bounds checks were only active in debug builds
  In MSVC or Clang, when compiled against the Microsoft C++ STL, they already are. So,
    auto x = std::vector{1, 2, 3, 4, 5};
    std::println("{}", x[5]);
  triggers a very specific debug assertion ("vector subscript out of range") at runtime in debug builds.
  In fact on Windows, even the C runtime has debug checks. That's why there are four options to choose from when linking against the modern UCRT:
  /MT (static linking, no debug)
  /MTd (static linking, with debug)
  /MD (dynamic linking, no debug)
  /MDd (dynamic linking, with debug)
  For what 'debug in the C runtime' entails, see this comment I made a while ago[1]. As I mentioned, Unix-likes have no equivalent; you get one libc, and if you want to statically link against it, you have to release your own source code because it's GPL. Not sure why people put up with it.
  [1]: https://news.ycombinator.com/item?id=40361096
  - uecker 7 hours ago
    
    Not sure what you mean by "Not sure why people put up with it". Glibc is licensed under the LGPL, so you can distribute it with proprietary software, even with static linking, under some conditions. And there are also other alternatives.
  - fweimer 5 hours ago
    
    The GNU equivalent is -D_GLIBCXX_ASSERTIONS: https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_macros...
    It mostly impacts templated code, so it's a compiler flag, not a linker flag. Many distributions have been using this flag to build C++ code for quite some time.
    (And this concerns GNU libstdc++, not glibc, so different licensing rules apply.)
    
    delta_p_delta_x 4 hours ago
    
    A considerable number of C++ libraries are header-only, so having this macro defined is the onus of the libraries' consumers, rather than just the package managers'.
    Also, I mentioned no libc equivalent, and that remains true. Regardless of the libc distribution (glibc, musl, BSD, macOS libSystem), none of them have a debug mode in the vein that Windows UCRT does.
- pjmlp 11 hours ago
  
  That at least has been covered almost since C++ has existed.
  First in compiler vendors' frameworks, pre-C++98, and afterwards with build settings.
  It is quite telling from existing community culture, that some folks only read their compiler manuals when government knocks on the door.
  - tester756 9 hours ago
    
    >It is quite telling from existing community culture, that some folks only read their compiler manuals when government knocks on the door.
    What are you trying to say?
    Is this bad? I think this is desired. Only in c or c++ world people act like understanding how compiler internals work (often poorly) is desired
    
    pjmlp 9 hours ago
    
    Since when does reading a compiler manual mean understanding compiler internals?!
    One does not need to understand compiler internals to be aware of which build flags turn on bounds checking in the standard library.
    
    RossBencina 7 hours ago
    
    > Only in c or c++ world people act like understanding how compiler internals work (often poorly) is desired
    I think this says more about other parts of the developer ecosystem than about C and C++. Understanding how the compilers work (and how CPUs work) is fundamental to software development.
    
    throw-qqqqq 7 hours ago
    
    I have never known a situation where LESS knowledge about the compiler (flags, options, hell, even internal workings) has been better; quite the contrary.
- rwmj 7 hours ago
  
  This is why you should have an option (enabled routinely in your CI) to run your tests under valgrind.
  - throw-qqqqq 6 hours ago
    
    I do this too, but valgrind is slow! In my experience, the runtime increases by a factor of 10x, 20x, even 30x.
    
    rwmj 6 hours ago
    
    Yes, it's definitely slow. That's why we have an option to run the same tests normally or with valgrind, and mainly use valgrind runs in our CI system after changes have been committed.
    
    thefaux an hour ago
    
    I can't help but notice the subtle ways in which git has corrupted our understanding of the word commit.
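
Since header-only libraries leave it to their consumers to define the checking macro, one way to make sure a build really has library checks enabled is a compile-time probe in a file of your own. The sketch below only tests documented macros: `_GLIBCXX_ASSERTIONS` for libstdc++, `_LIBCPP_HARDENING_MODE` for libc++, and `_ITERATOR_DEBUG_LEVEL` for the MSVC STL. The function name and the choice to report rather than fail are assumptions. A stricter project could replace the fallback branch with `#error`.

```cpp
#include <cassert>
#include <string>  // any standard header pulls in the library's config macros

// Report which standard-library checking mode this translation unit was
// built with. Only documented macros are tested.
inline std::string stdlib_check_mode() {
#if defined(_GLIBCXX_ASSERTIONS)
    // libstdc++: set by -D_GLIBCXX_ASSERTIONS (GCC 14's -fhardened sets it too)
    return "libstdc++ assertions";
#elif defined(_LIBCPP_HARDENING_MODE) && defined(_LIBCPP_HARDENING_MODE_NONE) && \
    (_LIBCPP_HARDENING_MODE != _LIBCPP_HARDENING_MODE_NONE)
    // libc++: any hardening mode other than "none"
    return "libc++ hardening";
#elif defined(_ITERATOR_DEBUG_LEVEL) && (_ITERATOR_DEBUG_LEVEL > 0)
    // MSVC STL: nonzero by default in /MTd and /MDd builds
    return "MSVC STL iterator debugging";
#else
    // A stricter project could turn this branch into #error.
    return "none";
#endif
}
```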

another_twist 10 hours ago

Maybe an easier way out is to add safe access instructions to LLVM itself. It's an IR, after all, so it should be possible to do a three-phase update: add the instructions to the IR, update the front ends that generate LLVM IR, then update the target back ends.
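
For reference, LLVM IR can already express such an access without a new instruction. It takes a compare, a conditional branch, and a call to the `llvm.trap` intrinsic, which is what Clang emits for `__builtin_trap()`. A dedicated instruction would mainly package that sequence so optimizers and back ends could treat it as one unit. The source-level sketch below uses a hypothetical `checked_load` to show the semantics such an instruction would need to carry. It is GCC/Clang-only because of the builtin.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical: the semantics of a "safe load" written out by hand.
// Clang typically lowers this to icmp + br + call @llvm.trap, followed by
// an ordinary load on the in-bounds path.
template <typename T>
T checked_load(const T* base, std::size_t len, std::size_t i) {
    if (i >= len) {
        __builtin_trap();  // GCC/Clang builtin; does not return
    }
    return base[i];
}
```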

rurban 8 hours ago

They should also turn off the C11 Unicode identifier bugs with -fhardened, since they enable homoglyph attacks. There is no plan for C26 to fix this. No Unicode identifiers without proper security measures.

rwmj 7 hours ago

What is the threat profile here? I don't understand how this would be exploited in the real world. Once you're linking to a library, there are so many ways for the library to exploit your main program (e.g. by running arbitrary code in constructors).
- rurban 6 hours ago
  
  https://github.com/rurban/libu8ident
  Search for homoglyph attacks and the Unicode security guidelines for identifiers.
  - rwmj 6 hours ago
    
    OK that is pretty interesting. For the TL;DR crowd, the exploit was:
      if(environmentǃ=ENV_PROD){
          // bypass authZ checks in DEV
          return true;
      }
    where the 'ǃ' is a Unicode homoglyph (U+1C3 "LATIN LETTER ALVEOLAR CLICK") which obviously completely changes the nature of the code.
    I'll note that GCC gives a clear warning here ("suggest parentheses around assignment used as truth value"), so as always, turn on -Werror and take warnings seriously!
    
    quuxplusone an hour ago
    
    The shown code is JavaScript; it wouldn't compile as C, because "environment[alveolar-click]" was never declared, and C requires declare-before-use. Does the advice to use GCC -Werror still apply to JavaScript? (I'd guess no, but I don't know for sure if I'm missing something.)
    
    rwmj an hour ago
    
    It compiles fine as C (using gcc-15.1.1-2.fc43.x86_64). Here's the complete program that I tested before posting the comment above:
    int environmentǃ;
    int main()
    {
      if(environmentǃ=0){
        // bypass authZ checks in DEV
        return 0;
      }
      return 1;
    }
    The output of GCC is:
    $ gcc -Wall test.c
    test.c: In function ‘main’:
    test.c:4:6: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
        4 |   if(environmentǃ=0){
          |      ^~~~~~~~~~~~
    In a real exploit you'd have to be smarter about hiding the variable declaration (maybe in a library or something).
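
A crude but effective defence against this class of trick is to reject non-ASCII bytes in code that is not supposed to contain them. The sketch below is a hypothetical lint helper (C++17). It is not the libu8ident API, and it is not a full implementation of the Unicode identifier security guidelines (Unicode Technical Standard #39). It just reports the byte offset where each non-ASCII UTF-8 character starts, so a CI step could reject `environmentǃ` while leaving `!=` alone.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical lint helper: return the byte offset of every UTF-8 lead byte
// (0xC0..0xFF) in a source buffer. Plain ASCII, including '!', is never
// reported, while a homoglyph such as U+01C3 (encoded as 0xC7 0x83) is.
// Continuation bytes (0x80..0xBF) are skipped so each character counts once.
// Assumes the input is valid UTF-8.
std::vector<std::size_t> non_ascii_offsets(std::string_view src) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const auto byte = static_cast<unsigned char>(src[i]);
        if (byte >= 0xC0) {
            offsets.push_back(i);
        }
    }
    return offsets;
}
```

For the exploit above, `non_ascii_offsets("if(environment\xC7\x83=0){")` returns a single offset, 14, which points exactly at the fake exclamation mark.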

ajb 8 hours ago

In the long term, it might be best to remove the ability to switch off checks using command-line flags (which usually affect the whole executable) and only allow it for individual functions. The current mechanism for switching them off per function isn't idiot-proof either (you need to remember the "#pragma diagnostic pop" afterwards); we really need to be able to do it with a function attribute.
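
For reference, this is the push/pop pattern in question, scoped around a single function. It uses the real `#pragma GCC diagnostic` syntax, which Clang also accepts. The function itself is a made-up example of an intentional assignment-as-condition. If the final `pop` is forgotten, the suppression silently extends to the rest of the translation unit, which is the fragility described above.

```cpp
#include <cassert>

// GCC and Clang both define __GNUC__ and understand "#pragma GCC diagnostic".
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wparentheses"
#endif

// Hypothetical example: the assignment inside the condition is intentional.
inline bool assign_and_test(int& x, int v) {
    if (x = v)
        return true;
    return false;
}

// Forgetting this pop would keep -Wparentheses off for the rest of the file.
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```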

dilawar 14 hours ago

> So this mode needs to set user expectations appropriately: your code breaking between compiler releases is a feature, not a bug.

Good luck. I feel that the C++ community values backward compatibility way too much for this to succeed. Most package maintainers are not going to like it one bit.

pjmlp 13 hours ago

There has been plenty of breakage throughout ISO revisions.
The biggest problem is ABI. In theory it isn't something the standard cares about, but in practice all compiler vendors do, so proposals that break ABI for existing binary libraries tend to be an issue.
Another issue is that WG21 is nowadays full of people without compiler experience who are willing to push through their proposals, even without implementations, which compiler vendors are then supposed to suck up and implement somehow.
Around the C++14 era, it became cool to join WG21, and now the process is completely broken; there are more than 200 members.
There is no guidance from an overall vision per se; everyone gets to submit their pet proposal and then needs to champion it.
Most of these folks aren't that keen on security, hence the baby steps that have been happening.
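
Because ABI breakage is mostly silent, one cheap guard that library authors can add themselves is a set of compile-time assertions. These pin the size, alignment, and field offsets of every type that crosses the library boundary. The struct and the numbers below are hypothetical and assume a typical 64-bit target. The point is that a layout change then fails the build instead of failing at run time, whether it comes from an edit, from a compiler flag such as `-fpack-struct`, or from a different compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type exposed in a library's public header.
struct wire_header {
    std::uint32_t version;
    std::uint32_t flags;
    std::uint64_t payload_size;
};

// The layout the published ABI promises on a typical 64-bit target.
// Reordering fields, or building with -fpack-struct (which drops the
// alignment to 1), breaks the build here instead of corrupting data later.
static_assert(sizeof(wire_header) == 16, "wire_header size changed");
static_assert(alignof(wire_header) == 8, "wire_header alignment changed");
static_assert(offsetof(wire_header, flags) == 4, "flags moved");
static_assert(offsetof(wire_header, payload_size) == 8, "payload_size moved");
```
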
- dzaima 13 hours ago
  
  Compilers at least allow specifying the standard to target, which solves the ISO revision issue. But breakage within the same -std=... setting is quite a bit more annoying, forcing either indefinite patching of otherwise complete, functional codebases or keeping potentially every compiler version on your system, both of which are pretty terrible options.
  - pjmlp 11 hours ago
    
    Breakage within the same std is impossible to prevent in compiled languages that allow enough freedom in the build.
    As for the C ABI that many talk about, most of them have no idea what they are actually talking about.
    First of all, it is the OS ABI, in operating systems that happened to be written in C.
    Secondly, even C binary libraries have plenty of breakage opportunities within the same std, and compiler.
    ABI stability, even in languages that sort of promise it, is in reality a half promise.
    The bytecode, or some part of the language, is guaranteed to be stable but tied to a specific version; not all build flags fall under the promise, and not much is promised about the standard library.
    Even good examples that go to great lengths, like Java, .NET, or Swift, aren't fully ABI-safe.
    
    yjftsjthsd-h 10 hours ago
    
    > First of all, it is the OS ABI, in operating systems that happened to be written in C.
    It may be per-OS (I wouldn't try linking Linux and NT object files even if they were both compiled from C by GCC with matching versions and everything), but enough details come from C that I think it's fair to call it a C ABI. Like, I can write Unix software in Pascal, but in order to write to stdout, that code is gonna have to convert Pascal strings into C strings. OTOH, Pascal binaries using Pascal libraries can use Pascal semantics even on an OS that uses C ABIs.
    
    pjmlp 10 hours ago
    
    Strings is the easy part.
    Try to link two binary libraries on Linux, both compiled with GCC, without using exactly the same compiler flags or the same data padding for things like structures.
    Since committee people can explain it even better,
    "To Save C, We Must Save ABI"
    https://thephd.dev/to-save-c-we-must-save-abi-fixing-c-funct...
    
    uecker 8 hours ago
    
    Sorry, this is nonsense. Binaries link just fine on Linux. If you use a compiler flag that changes the ABI, then you are on your own, of course, but the GCC documentation makes it very clear which specific flags those are. There are some corner cases where you get problems if you use different compilers, e.g. atomic alignment (by adopting the broken C++ design into C), and some other corner cases where compilers did things slightly differently.
    
    RossBencina 7 hours ago
    
    > e.g. atomic alignment (by adopting the broken C++ design into C)
    I would like to learn more about that. Do you mean this:
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65146
    
    uecker 6 hours ago
    
    Things like this, resulting in different alignment between Clang and GCC on x86_64 for _Atomic struct { char a[3]; }. See: https://godbolt.org/z/v5hsjhzj9
    The problem is that in C++ these atomics are library types, but in C they are built-in types, which should have a clearly specified ABI. But the goal was to make them compatible with C++ library types, which is a rather stupid idea that pulls in even more problems.
    
    pjmlp 6 hours ago
    
    Assuming you have control over all binaries.
    The very fact that one gets issues with multiple compilers is yet more proof that there is no such thing as an official C ABI.
    
    uecker an hour ago
    
    There are official platform ABIs. They are not called C ABIs, though.
    
    dzaima 10 hours ago
    
    It's certainly not impossible to write code that breaks, or modify a library in an ABI-incompatible way, but ABI stability, at least on Linux, does largely Just Work™. A missing older shared library can be quite problematic, but that's largely it.
    And while, yes, there are times when ABIs are broken, compiler versions affecting things would add a whole other uncontrollable axis on top of that. I would quite like to avoid a world of "this library can only be used by code compiled with clang-25" as much as possible.
    
    pjmlp 10 hours ago
    
    "Works most of the time, probably" isn't really what stable means.
    
    dzaima 10 hours ago
    
    Can't solve the issue of "you just don't have the library (or a specific version thereof) installed".
    But you can make it worse by changing "You must have version X of library Y installed" to "You must have version X of library Y compiled by compiler Z installed".
    As-is, one can reasonably achieve ABI stability for their C library if they want to; really all it takes is "don't modify exposed types or signatures of exposed functions" and "don't use intmax_t", and currently you can actually break the latter.
    
    pjmlp 9 hours ago
    
    You forgot about binaries compiled with incompatible build or linker flags.
    There is a reason why commercial software has several combinations on their SDKs, for their libraries.
    Release, debug, multi-threaded, with math emulation, with fast math, specific CPU ISA with and without SIMD, and these are only the most common ones.
    
    dzaima 9 hours ago
    
    Release vs debug shouldn't affect ABI (or, at least, the library author can decide whether it does; all it takes is.. not having `#if DEBUG` in your exposed header files changing types or function signatures).
    Multi-threading doesn't affect ABI in any way at all.
    fast-math doesn't affect ABI (maybe you mean the setting of FTZ/DAZ? but modern clang & gcc don't do that either, and anyway that breaks everything float in general, ABI itself is one of the few float things that don't immediately break, really).
    Presence or absence of SIMD extensions, hard-float, or indeed any other form of extension, also doesn't modify the ABI by itself.
    There's a separate -mabi=... that controls hard-float & co, but generally people don't touch that, and those that do, well, have a clear indication of "abi" in "-mabi" that tells them that they're touching something about ABI. (SIMD does have some asterisks on passing around SIMD types, but gcc does give a -Wpsabi warning when using a natively-unsupported SIMD type in a function signature; and this affects only very specialized low-level libraries, and said functions should be marked via an attribute to assume the presence of the necessary intended extension anyway, and probably are header-only and thus unaffected in the first place)
    That said, it would probably make sense to have a way to configure -mabi at the function level (if this doesn't already exist).
    General CPU ISA is one thing that does inescapably affect the ABI of compiled programs, but you can have a stable ABI within one ISA. So yes, there's the wider requirement of "You must have version X of library Y for ISA W installed", but "You must have version X of library Y for ISA W compiled by compiler Z installed" is still worse.
    
    pjmlp 5 hours ago
    
    Forgetting that C now has threading capabilities, and some of it can get exposed via ABI?
    C89 was a long time ago.
    We are not talking about what gcc, clang do in their specific implementations, we are talking about C.
    All those examples with compiler flags are exactly workarounds around the one true ABI that C doesn't actually have.
    
    dzaima 5 hours ago
    
    I'm of course not saying that C has a single universal ABI. But for any single platform (OS+ISA) where it is possible and meaningful to have shared libraries in the first place, there's a pretty clear single ABI to follow, so the distinction is, for practical purposes, completely meaningless. (ok windows does have some mess, but that's windows for you, it'll achieve a mess in anything)
    Still have no clue what you mean by threading; sure, threads exist, even officially so in C11, but still just in completely no way whatsoever affect ABI any more than any other part of the standard library, i.e. "as stable as the stdlib impl is".
    
    imtringued 7 hours ago
    
    This is honestly what pisses me off about the whole ABI thing. The ABI is defined by the OS, not the compiler. The compiler just implements the ABI, but somehow everyone lets their OS be defined by what a particular C/C++ implementation does. This then leads to FFI realistically only being possible by using a C/C++ compiler for interfacing, which defeats the point of an OS wide ABI.
    
    1718627440 5 hours ago
    
    I would consider the compiler to be part of the OS. You can't use an OS, if there is no way to create programs for it.
  - dvtkrlbs 10 hours ago
    
    I wish the additional proposal that would have added Rust-like editions alongside C++ modules had been accepted. So sad it didn't pass.
  - charcircuit 12 hours ago
    
    Assuming the code is position independent why can't the linker translate the ABI?
    
    dzaima 12 hours ago
    
    Maybe some things could be translated by a linker, but a linker can't change the size/layout of an in-memory data structure, and there's no info on what to translate from, even if info was added on what to translate to, anyway.
    
    tempodox 12 hours ago
    
    Data sizes, alignment, the way stuff is loaded into registers, all that can change.
    
    tialaramex 2 hours ago
    
    My favourite weird ABI choice is: Who does what for a barrier? The barrier needs one party to do work for a full fence, but which party that should be doesn't matter... There can be an x86 FENCE over on the store side, or we can put the x86 FENCE on the load. We could do both but that's pointless, so don't do that. However if we do neither we haven't built a barrier and now there's probably a horrible bug.
    
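
The stable-library recipe above (don't modify exposed types or the signatures of exposed functions) is often taken one step further by not exposing layouts at all. Below is a minimal sketch of that opaque-handle pattern, with hypothetical names, shown in one file for brevity. Callers only ever hold a pointer to an incomplete type, so fields can be added to the implementation later without changing anything client code was compiled against.

```cpp
#include <cassert>
#include <new>
#include <string>

// --- public header: all that clients compile against -------------------
extern "C" {
struct counter;  // incomplete type: the layout stays private
counter* counter_create(void);
void counter_add(counter* c, long delta);
long counter_value(const counter* c);
void counter_destroy(counter* c);
}

// --- implementation: free to change between releases -------------------
struct counter {
    long value = 0;
    std::string label;  // added in a later release; clients are unaffected
};

// nothrow new keeps exceptions from crossing the C boundary; null on failure.
extern "C" counter* counter_create(void) { return new (std::nothrow) counter{}; }
extern "C" void counter_add(counter* c, long delta) { c->value += delta; }
extern "C" long counter_value(const counter* c) { return c->value; }
extern "C" void counter_destroy(counter* c) { delete c; }
```

The trade-off is an extra allocation and indirection per object, which is usually acceptable for library handles but not for small hot-path value types.
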
thefaux an hour ago

For many, backwards compatibility == long term employment.
porridgeraisin 10 hours ago

I don't like that statement (or that whole paragraph) one bit either. My packages breaking between compiler releases is most definitely a big fat bug.
If bounds checks are going to be added, cool, -fstl-bounds-check. Or -fhardened like GCC. But not by default.
Working existing code is working existing code, I don't care if it looks "suspicious" to some random guy's random compiler feature.
- convolvatron an hour ago
  
  I'm kind of with you on the coding-style warning flags. It does really bother me that some opinionated person has decided that the way I use parentheses needs to be punished.
  But I totally disagree with your second point. Running code often has real problems with race conditions, error handling, unwanted memory reuse, out-of-bounds pointers, etc. If a new version of the compiler can prove these things for me, that's invaluable.
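
For code that must be checked whether or not a `-fhardened`-style flag is in effect, the standard library already provides an opt-in at the call site. `std::vector::at` always bounds-checks and throws `std::out_of_range` on a bad index. The wrapper below is a hypothetical example (C++17, for `std::optional`) that turns that into an explicit "not found" result, so the check becomes part of the code's behaviour rather than of the build configuration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical helper: a bounds-checked lookup that does not depend on any
// build flag, because at() performs the check unconditionally.
std::optional<int> lookup(const std::vector<int>& v, std::size_t i) {
    try {
        return v.at(i);
    } catch (const std::out_of_range&) {
        return std::nullopt;
    }
}
```

An explicit `if (i < v.size())` would avoid the exception entirely. `at()` is used here only because it is the standard's built-in checked accessor.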
