Show HN: Firecrawl-Simple – Stable fork of Firecrawl optimized for self-hosting

github.com

34 points by skeptrune 3 days ago

Firecrawl Simple is a stripped-down, stable version of Firecrawl optimized for self-hosting and ease of contribution.

The upstream Firecrawl repo contains the following blurb:

>This repository is in development, and we're still integrating custom modules into the mono repo. It's not fully ready for self-hosted deployment yet, but you can run it locally.

Firecrawl's API surface and general functionality were ideal for our Trieve sitesearch product, but we needed a version that was ready for self-hosting, easy to contribute to, and easy to scale on Kubernetes. Therefore, we decided to fork and begin maintaining a stripped-down, stable version.

The biggest deal breaker, and the main reason we maintain this fork, is that Fire-engine, Firecrawl's solution for anti-bot pages, is closed source. Our use case also doesn't require the SaaS and AI dependencies, which puts it far enough from Firecrawl's current mission that merging into upstream doesn't seem viable at this time.

ReD_CoDE 12 hours ago

As far as I can see, you use Puppeteer, not Playwright.

Also, both Firecrawl and Firecrawl Simple are really simple and, most importantly, don't have a proxy service, which is the heart of any crawler and scraper.

ramones13 3 days ago

Cool project. A mild pet peeve with this type of thing: I have to read 75% of the README before I find out what it even does. The first bits make a huge assumption about what the reader knows.

  • DriverDaily 3 days ago

    It's probably safe to assume that if you're looking for a fork of Firecrawl, you already know what Firecrawl does.

    • woleium 3 days ago

      Until the fork becomes more popular, or at least similarly popular.

  • skeptrune 3 days ago

    Good point, agreed. I assumed that most people looking at the repo would already be familiar with Firecrawl, but there should be at least a sentence or two explaining what it does regardless.