Hyperlight: Virtual machine-based security for functions at scale

opensource.microsoft.com

171 points by yoshuaw 8 months ago

yoshuaw 8 months ago

The Azure Upstream team has been working on a really fast hypervisor library written in Rust for the past three years. It does less than you'd conventionally do with hypervisors, but in turn it can start VMs around 2 orders of magnitude faster (around 1-2ms/VM).

I think this is really cool, and the library was just released on GitHub for anyone to try. I’m happy I got to help them write their announcement post — and I figured this might be interesting for folks here!

dangoodmanUT 8 months ago

Do you think requiring their packages is too limiting for widespread usage? It seems like you're forced to use Rust or C at the moment.
This seems like a limitation that sits in a somewhat unusable place: for something simple and platform-specific (e.g. an HTTP transform) we can just use JS, where the boot-time perf makes up for the execution perf, and for something more serious like a full-fledged API, 120ms should be more than enough time (and we can preemptively scale as long as we're not already at 0).
- yoshuaw 8 months ago
  
  The way to think about Hyperlight is as a security substrate intended to host application runtimes. You’re right that the Hyperlight API only supports C and Rust today, but you can use that to, for example, load Python or JS runtimes, which can then execute those languages natively.
  But we might be able to do even better than that by leveraging Wasm Components [1] and WASI 0.2 [2]. Using a VM guest based on Wasmtime, suddenly it becomes possible to run functions written in any language that can compile to Wasm Components — all using standard tooling and interfaces.
  I believe the team has a prototype VM guest based on Wasmtime working, but they still need a little more time before it’s ready to be published. Stay tuned for future announcements?
  [1]: https://component-model.bytecodealliance.org/introduction.ht...
  [2]: https://wasi.dev
  - 0x457 8 months ago
    
    We went full circle - I remember using a Python wrapper to run Rust on AWS Lambda.

fwsgonzo 8 months ago

Looks like my TinyKVM project, except it runs specialized programs instead of regular ELFs? TinyKVM also runs functions, with a fast execution timeout. I proved that without I/O you can essentially run KVM programs with native performance, and sometimes more due to automatic hugepages. I measured LLMs running at 99.7% of native speed using e.g. Mistral 7B. As another example, the STREAM memory benchmark doesn't use hugepages by default, so the terminal version runs slower than the TinyKVM version due to hugepages, but of course it runs at the same speed once you modify the benchmark to use the same advantage. However, that does require modifying the program.

See: https://ieeexplore.ieee.org/document/10475832

I also implemented VM resets using page-table rewrites and CoW memory sharing, so that no memory is shared across different requests. This can be implemented as tail-latency in a cache.
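
The reset idea can be illustrated with a toy model. This is only a sketch under stated assumptions, not TinyKVM's implementation: the "page table" is a plain `Vec` of reference-counted pages in user space, and `GuestMemory`, `dirty_pages`, and `reset` are hypothetical names invented for the example. A write copies the page only if it is still shared with the pristine snapshot, and a reset just repoints the dirtied entries back at the snapshot.

```rust
use std::rc::Rc;

const PAGE_SIZE: usize = 4096;
type Page = [u8; PAGE_SIZE];

/// Toy guest memory (hypothetical, user-space only): a table of
/// reference-counted pages standing in for real page tables.
struct GuestMemory {
    pristine: Vec<Rc<Page>>, // snapshot that every request starts from
    table: Vec<Rc<Page>>,    // current mapping seen by the running request
}

impl GuestMemory {
    fn new(num_pages: usize) -> Self {
        let pristine: Vec<Rc<Page>> =
            (0..num_pages).map(|_| Rc::new([0u8; PAGE_SIZE])).collect();
        // Cloning the Vec clones the Rc pointers only, not the page contents.
        let table = pristine.clone();
        GuestMemory { pristine, table }
    }

    fn read(&self, addr: usize) -> u8 {
        self.table[addr / PAGE_SIZE][addr % PAGE_SIZE]
    }

    fn write(&mut self, addr: usize, value: u8) {
        // Rc::make_mut copies the page first if it is still shared with
        // the snapshot (copy-on-write); otherwise it writes in place.
        let page = Rc::make_mut(&mut self.table[addr / PAGE_SIZE]);
        page[addr % PAGE_SIZE] = value;
    }

    /// Number of table entries that no longer point at the snapshot.
    fn dirty_pages(&self) -> usize {
        (0..self.table.len())
            .filter(|&i| !Rc::ptr_eq(&self.table[i], &self.pristine[i]))
            .count()
    }

    /// "Rewrite the page table": repoint dirtied entries at the snapshot.
    /// The private copies are dropped, so nothing leaks to the next request.
    fn reset(&mut self) {
        for i in 0..self.table.len() {
            if !Rc::ptr_eq(&self.table[i], &self.pristine[i]) {
                self.table[i] = Rc::clone(&self.pristine[i]);
            }
        }
    }
}
```

The cost of a reset in this model scales with the number of table entries rather than with the amount of memory, which is the property that makes per-request resets cheap.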

I ended up adding support for most languages. All the systems languages, Go, V8, LuaJIT etc. Go was by far the most annoying to support as it uses signals.

generalizations 8 months ago

I don't have access to that paper - and when I looked for TinyKVM all I found was the rpi-based project that uses the other definition of KVM. Is your project online somewhere? Or is it proprietary?
- fwsgonzo 8 months ago
  
  I can't publish/open-source it, sadly. But the paper I can share: https://www.dropbox.com/scl/fi/38e0la5m6zkc04tlm03w8/Introdu...
  - bjconlan 8 months ago
    
    Also appreciate the reference. I just realized you're the libriscv author (and, as pointed out, an IncludeOS contributor). Love all your work!
  - generalizations 8 months ago
    
    That's cool. Thanks dude.
zekrioca 8 months ago

Wouldn’t containers provide the same effect as TinyKVM?
- fwsgonzo 8 months ago
  
  Yes, if you don't need sandboxing then containers are just way easier to deal with. Although I wish they didn't use so much space.
  - zekrioca 8 months ago
    
    Why couldn’t one mathematically recreate the limitations of a VM through a namespace by means of SELinux?
    
    kevincox 8 months ago
    
    Because the Linux kernel is incredibly complicated and shouldn't be trusted as a strong security boundary. A simple hypervisor like this has far fewer vulnerabilities, so it is an easier boundary to trust. They are in very different security tiers.
    I would summarize it as: containers are good for mostly trusted isolation (teams within a company, purchased software), VMs are good for general untrusted software (different tenants in a cloud provider), and separate physical hardware is for scenarios where attacks are likely (military, known malicious code). Of course use cases are very fuzzy, but I wouldn't run fully untrusted code in the same kernel as anything of value.
intelVISA 8 months ago

Nice project, yeah this looks like a hobbled (in true MS fashion) version of TinyKVM!
Were you inspired by IncludeOS, Mirage, or similar?
- fwsgonzo 8 months ago
  
  I wrote the IncludeOS kernel bits

generalizations 8 months ago

> These micro VMs operate without a kernel or operating system, keeping overhead low. Instead, guests are built specifically for Hyperlight using the Hyperlight Guest library, which provides a controlled set of APIs that facilitate interaction between host and guest

Sounds like this is closer to a chroot/unikernel than a "micro VM" - a slightly more firewalled chroot without most of the os libs, or a unikernel without the kernel. Pretty sure it's not a "virtual machine" though.

Only pointing this out because these sorts of containers/unikernels/vms exist on a spectrum, and each type carries its own strengths and limitations; calling this by the wrong name associates it with the wrong set of tradeoffs.

wmf 8 months ago

I guess if it uses CR3 it's a "process" and if it uses VMLAUNCH it's a "VM".
- generalizations 8 months ago
  
  Heh. Going by that delineation we end up with very VM-ish containers and (now) very container-ish VMs. Though this seems like it's even more stripped down than a unikernel - which would also be a "VM" here.
0cf8612b2e1e 8 months ago

I thought a chroot was not considered a real security boundary?
- ronsor 8 months ago
  
  Chroot is a real security boundary as long as you use it properly. That said, namespaces on Linux are much superior at this point, so I can only recommend using `chroot` for POSIX compliance.
  - derefr 8 months ago
    
    chroot is great for all sorts of things, but they're not security-related.
    A lot of tools expect to do things to "your system" at absolute paths. chroot lets those tools operate against an explicitly wired-up, semi-virtualized simulacrum of your system, designed to pass through just the parts of those operations you want to your real host, while routing the rest of the effects into a "rootfs in a can" that you're either building up or will immediately throw away.
    Think: debootstrap; or pivot_root; or mounting your rootfs to fix your GRUB config and re-run update-grub from your initramfs rescue shell.
- kevincox 8 months ago
  
  Yes. Anything that shares a kernel is a very weak security boundary as the kernel is complex and vulnerabilities are regularly discovered.
0x457 8 months ago

Okay, so every name in this post makes sense; it's just that some of these words have come to mean very different things over time.
This is not a hypervisor or a VM manager. It's a library that lets you run a C or Rust function[1] inside a VM managed by the platform hypervisor[2].
[1]: As in function (computer programming), not an AWS Lambda.
[2]: KVM or mshv on Linux, and Windows Hypervisor Platform on Windows.

oneplane 8 months ago

So in essence, this is somewhere between a unikernel+firecracker combo and a WASM module, but using VT.

apitman 8 months ago

Don't see any mention of firecracker, which is the first thing I think of in this space. Anyone have a TL;DR comparison?

eyberg 8 months ago

Firecracker can run ordinary Linux/GPOS VMs and unikernels.
Unikernels can run inside of firecracker.
Unikernels are focused on single applications whereas general purpose operating systems are focused on multiple applications.
This is focused on running functions embedded inside a host program. So it is fairly different than other things out there and in a class of its own.
- ATechGuy 8 months ago
  
  > each function request to have its own hypervisor for protection.
  They are talking about isolating serverless functions, not host program functions. In that sense, it is exactly what Firecracker does for Lambda functions.
  - eyberg 8 months ago
    
    Firecracker boots up a runtime that has a full-blown operating system in it - Lambda just happens to call a known program with a known function. In that sense, sure, it provides similar functionality, but it's really quite different. That's not what Fly uses Firecracker for, for instance.
    QEMU/Firecracker are in the same space - this is different.
    These are most definitely in a different boat as you embed the guest functions inside the host program and then you register those functions. Taken from the readme:
    > The host can call functions implemented and exposed by the guest (known as guest functions).
    > Once running, the guest can call functions implemented and exposed by the host (known as host functions).
    This is more in the 'safe plugin' type of space. As with most things in this space - the best way to learn about them is to simply try it out.
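
The two-way call model quoted from the readme can be sketched with a toy registry. This is not the Hyperlight API: `ToySandbox`, `HostApi`, and the `register_*` / `call_*` methods are hypothetical names, the `i64 -> i64` signatures are an assumption made to keep the example small, and everything runs in one process with no isolation at all. It only shows the shape of "host calls guest function, guest calls back into host function".

```rust
use std::collections::HashMap;

/// Functions the host exposes to the guest (hypothetical signature).
type HostFn = Box<dyn Fn(i64) -> i64>;
/// Functions the guest exposes to the host; each one receives a handle
/// through which it can call back into registered host functions.
type GuestFn = Box<dyn Fn(&HostApi, i64) -> Result<i64, String>>;

/// The controlled set of host functions visible to guest code.
#[derive(Default)]
struct HostApi {
    host_fns: HashMap<String, HostFn>,
}

impl HostApi {
    /// Guest -> host call: only registered names are reachable.
    fn call(&self, name: &str, arg: i64) -> Result<i64, String> {
        let f = self
            .host_fns
            .get(name)
            .ok_or_else(|| format!("unknown host function: {}", name))?;
        Ok(f(arg))
    }
}

/// Toy stand-in for a sandbox; in the real thing the guest functions
/// would run inside a micro VM rather than in the host process.
#[derive(Default)]
struct ToySandbox {
    api: HostApi,
    guest_fns: HashMap<String, GuestFn>,
}

impl ToySandbox {
    fn register_host_fn(&mut self, name: &str, f: impl Fn(i64) -> i64 + 'static) {
        self.api.host_fns.insert(name.to_string(), Box::new(f));
    }

    fn register_guest_fn(
        &mut self,
        name: &str,
        f: impl Fn(&HostApi, i64) -> Result<i64, String> + 'static,
    ) {
        self.guest_fns.insert(name.to_string(), Box::new(f));
    }

    /// Host -> guest call.
    fn call_guest(&self, name: &str, arg: i64) -> Result<i64, String> {
        let f = self
            .guest_fns
            .get(name)
            .ok_or_else(|| format!("unknown guest function: {}", name))?;
        f(&self.api, arg)
    }
}
```

In this model a host would register something like `double`, the guest would expose a function that calls `api.call("double", x + 1)`, and the host would invoke it through `call_guest`. The point of the design is that the guest can only reach what the host explicitly registered.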
    
    rwmj 8 months ago
    
    libkrun (on Linux) is probably a closer comparison (though still not quite the same). https://github.com/containers/libkrun
    
    stogot 8 months ago
    
    > The host can call functions implemented and exposed by the guest (known as guest functions).
    Can you explain this a bit more? Why/when would a developer want to do this? What’s the advantage over Firecracker?
    
    dboreham 8 months ago
    
    It's faster (shorter start time).

spai2 8 months ago

How does the micro VM's guest API talk to the host process? Does the communication between the two have to go through the hypervisor?

spankalee 8 months ago

They mention that most guests are expected to run code in a VM/interpreter... I wonder if they have a build of V8 or JSC for their environment?

yoshuaw 8 months ago

I believe the team has a working build of JerryScript [1] to test out the C bindings, but I’m not sure that will be released.
My understanding is that work on the Wasmtime VM guest is ongoing, which will enable Hyperlight to run the StarlingMonkey engine [2]. This is a WebAssembly build of Firefox’s SpiderMonkey engine which was donated by Fastly to the Bytecode Alliance.
That said though, I agree it would be great to see runtimes like V8 and JSC run directly on Hyperlight. There are good reasons why people might prefer those over StarlingMonkey (compat comes to mind), and it would be neat to see how much faster they could start compared to conventional VM deployments.
[1]: https://jerryscript.net/
[2]: https://github.com/bytecodealliance/StarlingMonkey

u8080 8 months ago

So in general this is a kludge to implement app isolation via a "VM", because existing CPU architectures suck at isolating code?

sim7c00 8 months ago

I wondered how it worked in Rust, but the guest entrypoint > init > main path is wrapped in an unsafe block, as are a lot of the other low-level operations it does. Interesting stuff.

broknbottle 8 months ago

Cool to see them using just

7e 8 months ago

Use CHERI for this?
