ATechGuy 19 hours ago

> At the moment, it seems Basic mode is so basic that it allows everything to pass as human. That’ll likely change as they gather more telemetry to better identify what a bot signal looks like.

So they are basically collecting telemetry in the name of a "free basic anti-bot" solution.

  • cchance 18 hours ago

    free basic anti-bot solution that literally NEVER BLOCKS A BOT, like what the actual fuck

codedokode a day ago

Note that the bot detection script uses WebGL to obtain the GPU name. I assume this (fingerprinting) is the most popular use of WebGL. It's sad that independent browsers like Firefox don't supply fake values.
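
For reference, a minimal sketch of how such a script typically reads it (the WEBGL_debug_renderer_info extension and its constants are real; the surrounding TypeScript is illustrative):

    // Read the unmasked GPU vendor/renderer strings that fingerprinting
    // scripts commonly collect.
    const canvas = document.createElement("canvas");
    const gl = canvas.getContext("webgl");
    const ext = gl?.getExtension("WEBGL_debug_renderer_info");
    if (gl && ext) {
      const vendor = gl.getParameter(ext.UNMASKED_VENDOR_WEBGL);
      const renderer = gl.getParameter(ext.UNMASKED_RENDERER_WEBGL);
      console.log(vendor, renderer); // e.g. "Google Inc. (NVIDIA)", "ANGLE (NVIDIA, ...)"
    }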

  • nullpt_rs 21 hours ago

    Sadly, spoofing the GPU vendor & renderer can be an even bigger flag, since they can hash the rendered canvas image and compare it against a database of collected fingerprints[0].

    [0]: https://research.google/pubs/picasso-lightweight-device-clas...
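
    Roughly, a hypothetical sketch of that technique (the drawing calls are arbitrary; the canvas and WebCrypto APIs are real):

        // Draw a fixed scene, then hash the pixel output. Anti-aliased
        // text and curves are where GPU/driver/font stacks diverge.
        async function canvasHash(): Promise<string> {
          const canvas = document.createElement("canvas");
          canvas.width = 200;
          canvas.height = 50;
          const ctx = canvas.getContext("2d")!;
          ctx.font = "16px Arial";
          ctx.fillText("fingerprint \u{1F600}", 4, 20);
          ctx.strokeStyle = "rgba(120, 40, 200, 0.6)";
          ctx.arc(100, 25, 20, 0, Math.PI * 1.7);
          ctx.stroke();
          const bytes = new TextEncoder().encode(canvas.toDataURL());
          const digest = await crypto.subtle.digest("SHA-256", bytes);
          return [...new Uint8Array(digest)]
            .map((b) => b.toString(16).padStart(2, "0"))
            .join("");
        }

    The digest is stable on a given machine but varies across rendering stacks, which is what makes it usable as a fingerprint even when the reported vendor/renderer strings are spoofed.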

    • reaperducer 21 hours ago

      Until a major player gets on board. Then it works.

      Apple does this by sending an imposter user agent from Safari on iPads.

      If only that were expanded to iPhones, too. And then sent rotating or randomized user agents.

      • nerdsniper 21 hours ago

        Apple does it because they don’t have a vested financial interest in internet-wide tracking.

        Google does.

        And while Mozilla does too (the vast majority of their funding comes from Google), the more pertinent issue is that they don't have the market share to pull this off. Firefox would just stop working on major websites if they did this.

      • ZebulonP 14 hours ago

        Doesn't that just move the goalposts, though? Instead of using your GPU vendor for the fingerprint, they can just hash the output canvas after they make a bunch of odd rendering calls, getting a hash from the quirks of your graphics driver and GPU hardware.

    • andrewmcwatters 20 hours ago

      It’s funny that trying to click on the Google Scholar link there falsely identifies me as a bot.

  • grishka 15 hours ago

    IMO the use of <canvas> needs to be behind a permission prompt, the same as e.g. geolocation or WebRTC. Few websites actually need canvas/WebGL for legitimate purposes.

    • chocolatkey 12 hours ago

      This would break way too many websites to be feasible. And if it were implemented, it would be requested on so many sites that users would learn to automatically say yes, which would weaken the power of permission prompts in general.

      For example, almost every major Japanese book/comic site uses canvas in its e-reader.

      • codedokode 8 hours ago

        The best solution would be if canvas only allowed displaying pixels on the page but not drawing them (meaning you'd bring your own drawing library), so that it would be unusable for fingerprinting.
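
        A minimal sketch of that model with today's APIs (the pixel math is arbitrary): all drawing happens in plain JS, so the output is identical on every machine and there is nothing device-specific to hash.

            // Software-render into a pixel buffer, then blit it. No call
            // here touches the GPU, driver, or system font rendering.
            const canvas = document.createElement("canvas");
            canvas.width = 100;
            canvas.height = 100;
            const ctx = canvas.getContext("2d")!;
            const img = ctx.createImageData(100, 100);
            for (let y = 0; y < 100; y++) {
              for (let x = 0; x < 100; x++) {
                const i = (y * 100 + x) * 4;
                img.data[i] = x * 2;     // R, computed in JS
                img.data[i + 1] = y * 2; // G
                img.data[i + 2] = 128;   // B
                img.data[i + 3] = 255;   // A, opaque
              }
            }
            ctx.putImageData(img, 0, 0);
            document.body.appendChild(canvas);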

b0a04gl 19 hours ago

why is bot detection even happening at render time instead of request time? why can't they tell you're a bot from your headers, UA, IP, or TLS fingerprint? imo that turns it into surveillance: 'you're a bot? ok, but instead of just turning you away, let's fingerprint your GPU and assign you a behavioral risk score anyway'

  • n2d4 19 hours ago

    It's really hard to detect it at request time. It's practically trivial for an attacker to fake headers to resemble a real browser.

    • baby_souffle 17 hours ago

      You absolutely have options at request time. Arguably, some of the things you can only do at request time are part of a full and complete mitigation strategy.

      You can fingerprint the originating TCP stack with some degree of confidence. If the request looks like it came from a Linux server but the user agent says Windows, that's a signal.

      Likewise, the IP address making the request has geographic information associated with it. If my IP address says I'm in Romania but my browser is asking for the English language version of the page... That's a signal.

      Similar to basic IP/geo, you can do DNS- and STUN-based profiling, too. This helps you catch people who are behind proxies or VPNs.

      To blur the line, you can use JavaScript to measure request timing. Proxies that are going to tamper with the request to hide its origins or change its fingerprint will add a measurable latency.
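
      As a toy sketch, here's the geo-vs-language check as one input to a combined score (the country table and the 0.2 weight are made up; geoCountry would come from a GeoIP lookup):

          // A mismatch is a weak signal, not a verdict; combine it with
          // TCP/TLS-fingerprint and timing signals before deciding.
          function languageGeoMismatch(
            geoCountry: string,
            acceptLanguage: string,
          ): number {
            const expected: Record<string, string[]> = {
              RO: ["ro"], DE: ["de"], FR: ["fr"], // illustrative only
            };
            const langs = acceptLanguage
              .split(",")
              .map((p) => p.split(";")[0].trim().slice(0, 2).toLowerCase());
            const wanted = expected[geoCountry];
            if (!wanted) return 0; // no opinion for unmapped countries
            return langs.some((l) => wanted.includes(l)) ? 0 : 0.2;
          }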

      • n2d4 16 hours ago

        None of these are conclusive by any means. The IP address check you mentioned would mark anyone using a VPN, or English speakers living abroad. Modern bot detection combines lots of heuristics like these together, and being able to run JavaScript in the browser (at render-time) adds a lot more data that can be used to make a better prediction.

      • cAtte_ 15 hours ago

        > If my IP address says I'm in Romania but my browser is asking for the English language version of the page... That's a signal.

        jesus christ don't give them ideas. it's annoying enough to have my country's language forced on me (i prefer english) when there's a perfectly good http header for that. now blocking me based on this?!

    • indrora 17 hours ago

      Anubis does it pretty decently.

      • iovoid 14 hours ago

        Anubis is not meant to fully stop bots, only to slow them down so they don't take down your service. This kind of bot detection is meant to prevent automation entirely.
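
        Roughly, the idea behind Anubis is a client-side proof-of-work: the browser must find a nonce whose hash meets a difficulty target before it gets a pass. A sketch of the concept (not Anubis's actual code), cheap for one human pageview but costly at scraper scale:

            // Brute-force a nonce until the SHA-256 of challenge+nonce
            // starts with `difficulty` zero hex digits.
            async function solve(
              challenge: string,
              difficulty: number,
            ): Promise<number> {
              const enc = new TextEncoder();
              for (let nonce = 0; ; nonce++) {
                const digest = await crypto.subtle.digest(
                  "SHA-256",
                  enc.encode(challenge + nonce),
                );
                const hex = [...new Uint8Array(digest)]
                  .map((b) => b.toString(16).padStart(2, "0"))
                  .join("");
                if (hex.startsWith("0".repeat(difficulty))) return nonce;
              }
            }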