Show HN: Open-source Kibana alternative for logs and traces in ClickHouse

github.com

119 points by mikeshi42 a day ago

Hi HN, Mike and Warren here! We're excited to share some (early) work towards our next major version of HyperDX. HyperDX makes it easy to visualize/search logs & traces on top of Clickhouse (so incident & bug investigations hopefully go by a little easier). For example, if a team is thinking of migrating to Clickhouse for their observability data warehouse [1][2][3] usually due to cost or data privacy reasons, they can easily throw HyperDX on top to do the UI layer for analysis and dashboarding in a dev-friendly way (aka not needing to type paragraphs of SQL to find some logs)

Over the past year we've seen a ton of excitement in companies adopting Clickhouse-based observability stacks - but one of the biggest challenges we've seen is that the UI layer on top of Clickhouse is either clunky to use for observability use cases (ex. BI tools), or too tied to a specific ingestion architecture to scale to every use case (we used to be in this category!). For companies that need more flexibility in how their data is ingested and stored (usually due to running at a large scale), there are really no good options for a developer experience (DX) focused observability layer on top of Clickhouse (Shopify spent 3 years building it in-house!).

Our current release works completely in the browser - and it does this by building on top of Clickhouse's HTTP interface, which our React app can directly talk to. This means you can actually try HyperDX in your browser on your own Clickhouse with no installation! This was fortunately easy for us to accomplish due to being full stack Typescript, making it incredibly easy to shift between server and client code. On top of this we've been spending time baking in performance optimizations to ensure that HyperDX can continue to leverage Clickhouse efficiently at larger data volumes. We do a few tricks like only fetching columns that are needed for the current search, and re-querying to expand the entire row if needed to fully leverage Clickhouse's columnar nature (40% faster, ymmv!) - or rewriting queries to use materialized columns to speed up Map column access when available (10x faster!).
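To make the "fetch only the needed columns, then re-query to expand the row" trick concrete, here is a rough sketch. It is not HyperDX's actual code: the endpoint, the `otel_logs` table, and the column names are assumptions for illustration. The only ClickHouse-specific pieces are the HTTP interface's `query` and `default_format` URL parameters.

```python
from urllib.parse import urlencode

# Assumed endpoint; 8123 is ClickHouse's default HTTP port.
CLICKHOUSE_URL = "http://localhost:8123/"


def search_query(table, columns, where, limit=100):
    """Phase 1: select only the columns the results list displays."""
    cols = ", ".join(columns)
    return (
        f"SELECT {cols} FROM {table} WHERE {where} "
        f"ORDER BY Timestamp DESC LIMIT {limit}"
    )


def expand_row_query(table, row_filter):
    """Phase 2: read every column, but only for the single row that was opened."""
    return f"SELECT * FROM {table} WHERE {row_filter} LIMIT 1"


def http_url(sql):
    """Build a GET URL for the ClickHouse HTTP interface."""
    params = {"query": sql, "default_format": "JSONEachRow"}
    return CLICKHOUSE_URL + "?" + urlencode(params)


# Hypothetical table and column names. Plain string interpolation is used
# only to keep the sketch short; real code should parameterize user input.
narrow = search_query(
    "otel_logs", ["Timestamp", "SeverityText", "Body"], "SeverityText = 'error'"
)
wide = expand_row_query("otel_logs", "TraceId = 'abc123'")
print(http_url(narrow))
```

The narrow query lets a columnar store skip every column it was not asked for; the `SELECT *` cost is paid once, for one row, only when someone clicks in.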

On the DX side: we support querying using both Lucene (ex. `fullText property:value`) and SQL syntax. We've found the former to be our favorite for how concise it is. Similarly for charts, our chart builder has been upgraded to accept SQL expressions as well, so you can leverage the full power of SQL, while avoiding typing paragraphs of boilerplate SQL for time series data. We also make it easy to switch between UTC/local timestamps! Lastly, we've added high cardinality outlier analysis by charting the delta between outlier and inlier events (a la bubble up) - which we've found really helpful in narrowing down causes of regressions/anomalies in our traces.
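As a rough illustration of why the Lucene form is so much shorter, here is a toy translator for the `fullText property:value` style into a SQL `WHERE` clause. This is a sketch under assumptions, not HyperDX's parser: it handles only space-separated terms ANDed together, the `Body` column name is hypothetical, and property names are passed through unvalidated. `hasToken` is a ClickHouse string function that matches a whole token.

```python
def escape(value):
    """Escape backslashes and single quotes for a SQL string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def lucene_to_where(query, body_column="Body"):
    """Translate a tiny Lucene subset into a SQL WHERE clause.

    word            -> token match on the (assumed) log body column
    property:value  -> equality on the column named `property`
    """
    clauses = []
    for term in query.split():
        if ":" in term:
            prop, value = term.split(":", 1)
            clauses.append(f"{prop} = '{escape(value)}'")
        else:
            clauses.append(f"hasToken({body_column}, '{escape(term)}')")
    # An empty search matches everything.
    return " AND ".join(clauses) if clauses else "1"


where = lucene_to_where("timeout level:error")
print(where)
```

A real implementation would also need quoting, phrases, negation, ranges, and OR groups, which is where a proper parser comes in.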

We have a lot more planned for the full release - but wanted to get this out early to hear your feedback and opinions!

In Browser Live Demo: https://play.hyperdx.io/search

Github Repo: https://github.com/hyperdxio/hyperdx/tree/v2

Landing Page: https://hyperdx.io/v2

[1]: https://www.uber.com/blog/logging/

[2]: https://blog.cloudflare.com/log-analytics-using-clickhouse/

[3]: https://www.youtube.com/watch?v=LDj3_jMsCXg&list=PLvQF73bM4-...

lpammant 3 hours ago

Neat! I was looking to replace DataDog with an open source alternative. I'm collecting the logs and batch sending them to DataDog using their batch http-intake API. I'm looking for the quickest way to switch over - is there anything similar on HyperDX?

Also, I'd like to improve my observability using OTel in Cloudflare Workers, but it looks like the example is out of date: it uses a deprecated library, which points to a new one to use instead. Might be worth updating the docs on that when you get a chance.

Deprecated: https://github.com/RichiCoder1/opentelemetry-sdk-workers

New: https://github.com/evanderkoogh/otel-cf-workers

nh2 a day ago

Can you clarify: Does the full-text search for logs linearly search all logs like Loki does, or can it speed it up with an index?

The docs at https://www.hyperdx.io/docs/search don't seem to talk about this key design decision.

I have a couple 100 GB to few TB logs (all from `journald` or JSON lines), just want to store them forever, and find results fast when searching for arbitrary substrings.

Loki does not use an index, so it's pretty slow at finding results in TB-sized logs (does not return results within a few seconds, so it's not interactive).

https://quickwit.io is one thing I'm looking at integrating, that can solve much of the index-based log search.

(Note I'm not super familiar with the capabilities of ClickHouse itself regarding indexed full-text search.)

  • mikeshi42 a day ago

    You'd generally add an index to your logs in Clickhouse to do searching (via ngram or token bloom filters typically: https://clickhouse.com/docs/en/optimize/skipping-indexes#blo...). There's other ways of indexing as well but that's generally the best for full text search. We use token bloom filter indexes today and find them quite effective (it can skip whole chunks of logs due to the bloom filter being able to say that a word did not appear in the chunk of logs).
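To illustrate the chunk-skipping idea in the comment above, here is a toy, in-memory version of a token bloom filter index. It is only a sketch of the technique, not how ClickHouse implements its skipping indexes; the filter size, hash scheme, and tokenizer are all assumptions chosen for brevity.

```python
import hashlib
import re


class TokenBloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, token):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, token):
        for pos in self._positions(token):
            self.bits |= 1 << pos

    def might_contain(self, token):
        return all((self.bits >> pos) & 1 for pos in self._positions(token))


def tokenize(line):
    return re.findall(r"[a-z0-9]+", line.lower())


def build_index(chunks):
    """One filter per chunk of log lines, holding every token in that chunk."""
    index = []
    for chunk in chunks:
        bloom = TokenBloomFilter()
        for line in chunk:
            for token in tokenize(line):
                bloom.add(token)
        index.append(bloom)
    return index


def search(chunks, index, token):
    """Return matching lines and the number of chunks actually scanned."""
    token = token.lower()
    hits, scanned = [], 0
    for chunk, bloom in zip(chunks, index):
        if not bloom.might_contain(token):
            continue  # the token is definitely absent, so skip the chunk
        scanned += 1
        hits.extend(line for line in chunk if token in tokenize(line))
    return hits, scanned


chunks = [
    ["GET /health 200", "GET /users 200"],
    ["POST /login 500 timeout"],
    ["GET /health 200"],
]
index = build_index(chunks)
hits, scanned = search(chunks, index, "timeout")
print(hits, scanned)
```

The filter can only say "definitely not here" or "maybe here", so a false positive costs an unnecessary scan but never a missed result.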

    Indeed Loki is incredibly slow - Clickhouse is deployed for logging at scale (ex. trip.com is running a 50pb logging solution that allowed them to 4x their old ES cluster volume while also running queries 4-30x faster)

    • nh2 a day ago

      Thanks! When using full open-source HyperDX (beyond the Kibana part), inclusive of your choices of ingestion and controlling Clickhouse, does it set up the recommended indexes automatically?

      That is, is it a full drop-in for a typical Grafana + Loki deployment?

      For context, I'm currently following the approach described in https://xeiaso.net/blog/prometheus-grafana-loki-nixos-2020-1... where with ~40 lines of NixOS config it pushes my entire server cluster's systemd journald logs into Grafana.

      Roughly how much effort would one have to put in to achieve the same with HyperDX? If it's not too hard, I might get around to packaging it as a NixOS service.

      • mikeshi42 a day ago

        yes! the full stack includes our recommended schema which has the indexes set up - it's a drop in replacement for anything that would ingest Otel-based telemetry! If you already have Promtail set up - you might want to set up a collector or tweak the existing collector to take in Promtail via the Otel Loki Receiver: https://github.com/open-telemetry/opentelemetry-collector-co...

        Overall it doesn't sound very hard to me!

lunarcave 21 hours ago

A happy HyperDX customer here. Can't recommend it enough.

We wanted something good for tracing and logs, without the price tag we were used to from datadog. We've been pleasantly surprised by how easy it was to set up and start pumping telemetry.

The UI is super intuitive and the OOTB dashboards are great as well.

  • mikeshi42 19 hours ago

    Thank you, really appreciate feedback like that! :D

est a day ago

Hmm, this is not the "Kibana" alternative I imagined.

Kibana was supposed to be an easy UI. You go to Discover, and the data automatically shows in chronological order, I can explore it with different options.

Kibana is very suitable for non-tech or less-tech people. I hope your product finds a clear target audience. With too much ES query JSON or SQL it would scare people off.

  • mikeshi42 a day ago

    Hrm while we aren't a 1:1 Kibana replacement today (we're not apples to apples since Kibana is locked into Elastic, whereas we're on Clickhouse) - I don't think we're too far off with our UI-based filters, Lucene filter language, and timestamp filtering/sorting/live tail.

    There is a setup modal (which is intended to be set up once by the data owner, similar to maybe how you'd set up some indexes in Elastic) - and afterwards the experience is similar IMO. If you're open to sharing more I'd love to learn more mike@hyperdx.io or if you want to open an issue/join our discord.

zX41ZdbW a day ago

It is actually really great! Works out of the box, does it with a single-page UI, and it is not slow. It's very close to a log viewer I always dreamed of. The UI is much better than Grafana.

I connected it to the system.text_log table, and it took zero time with no problems.

  • mikeshi42 a day ago

    thank you! means a lot coming from you ;)

    Speaking of the system tables - it's awesome how much telemetry is saved in there that helps us build a really powerful preset clickhouse monitoring dashboard (heavily inspired by the built-in clickhouse one of course). We figured that alone is quite useful for teams that run any Clickhouse instance and want better insights into what's going on.

maxthegeek1 a day ago

We use HyperDX for our observability! We had been using google's observability suite before, because we're using GKE anyways, but HyperDX's search over traces is just waaaay better and I can't go back.

  • mikeshi42 a day ago

    Thank you Max! It's awesome to hear that :)

DAlperin a day ago

Super neat! Does the v2 branding mean that the more "fully featured" observability product is going away? Or is it all going to be rebuilt on top of clickhouse?

  • mikeshi42 a day ago

    Our v1 is completely built on Clickhouse! So v2 is making it more widely compatible with Clickhouse installations that aren't tied to our ingestion pipeline and schema. So if you're already on Clickhouse for observability today, or have a preferred way of shipping data in, you can use us on top of Clickhouse now without throwing away your existing work.

    We're essentially making our existing product a lot easier to deploy into more complex observability setups based on Clickhouse - while also shipping a few new capabilities like SQL and event deltas while we're at it!

    • ayewo 17 hours ago

      Timeplus Proton [1] is an OSS fork of Clickhouse that adds support for streaming queries. Timeplus Proton is wire-compatible with Clickhouse and its streaming support makes the log tailing use case you mentioned above easy to set up:

      > - You can live tail events, I don't think Grafana for Clickhouse has that (I'm a big fan of tail -f)

      So it sounds like your v2 will work with any DB that is wire-compatible with Clickhouse, correct?

      [1] https://github.com/timeplus-io/proton

      • mikeshi42 8 hours ago

        Yup it'd work if the Clickhouse HTTP interface is preserved, along with the CH-specific functions/settings we leverage. (I'm not sure how deviated your fork is from CH and which version it's based on)

        Proton looks like a neat queue/db combo - I'm going to have to dive in deeper some time

mathfailure a day ago

I think your project needs a 'Comparison to Kibana' section. Sell your project to me: I am currently using Kibana, why should I switch?

  • mikeshi42 a day ago

    Totally fair! Here's a few off the top of my head, they're mostly about Elastic really but of course Kibana is only really useful on ES:

    1. At my last job we were running some of the largest elastic clusters of our hyperscaler's cloud - elastic is slow, expensive and finicky to operate at decent scale. We've found the exact opposite with Clickhouse, it's fast, easy to operate, and supports things like S3-backed storage directly in their open source product. As an example, Uber switched from Elastic -> Clickhouse and halved their infra footprint while increasing volume.

    2. Elastic is tricky to manage, field type conflicts come up super common at scale and are annoying to deconflict. Clickhouse is a lot more flexible in its schema to avoid those problems (and give you knobs to fine tune performance at a more granular level with their indexes/schemas)

    3. We allow for both SQL and Lucene, both are relatively "standard" languages that engineers are likely already familiar with in one way or another. Compare that to Elastic moving to ES|QL, another vendor-specific language that will be difficult to onboard to. The last thing you want during an incident is trying to recall vendor-specific languages for querying that critical data!

    tl;dr - we try to make it easy to "do observability" on what we think is the best DB for observability today (Clickhouse), analogous to what Kibana did for ES.

  • dengolius 12 hours ago

    This world needs a new Kibana that is lightweight and not written in java/typescript.

jillesvangurp 18 hours ago

If you want an opensource / non AGPL licensed alternative for Kibana, Opensearch also includes a fork of Kibana in the form of Opensearch Dashboards.

Clickhouse not being Elastic/Opensearch based means they would need to reinvent that wheel in any case because Kibana cannot use Clickhouse for storage. So this isn't so much an alternative as an essential component to make Clickhouse useful, since you can't use Kibana for that. From various accounts here, they seem to have done a decent job.

Of course the key strength of Kibana is that it builds on features that Elasticsearch has; like aggregations that are probably more limited in Clickhouse. Same with Opensearch Dashboards. It depends on your use case whether you actually need that of course.

One point of concern with Clickhouse is that, like Elastic, they require contributors to sign contributor agreements. This basically allows them to re-license the code base if they want to at some point. Which is of course what Elastic did several times now (they changed it back to AGPL a few weeks back). Like Elastic, they are well funded by VC money but still pre-IPO. Just saying that if you moved to Clickhouse because of the Elastic licensing debacle, you might just have moved that problem instead of solving it.

akdor1154 a day ago

Very interested - I'm currently toying with Grafana set up in the same way, I wonder how this compares?

  • mikeshi42 a day ago

    If you're using the Grafana Clickhouse plugin - a number of things we do differently to them today:

    - We support Lucene-based search, which means it's a lot easier to find the events you're looking for without needing to break into verbose SQL search. (Column/property/full text search are all super easy).

    - We're optimized exclusively for Clickhouse, which means we do a number of things to optimize the queries we run and that returns you a nice performance boost (we see a 2x perf boost, but this will vary a ton depending on data and queries). For example we allow you to do a search on a subset of columns (so the search is performant), and then click in and expand an entire row of interest on-demand (so we only do a SELECT * for a single row). This is also a much nicer DX than needing to specify every column you might need. We have a few other optimizations as well.

    - We have (what we think is) a nice chart builder - so you don't need to mess with template variables and macros to build a chart, but still lets you escape hatch out into SQL for the important bits.

    - We think our event deltas feature for analyzing traces is pretty neat - afaik this isn't something you get in Grafana.

    - You can live tail events, I don't think Grafana for Clickhouse has that (I'm a big fan of tail -f)
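For readers wondering what the event deltas idea mentioned in the list above boils down to, here is a minimal sketch. It is an assumption about the general technique (comparing how often each property value shows up among outlier events versus inlier events), not HyperDX's implementation, and the sample events are made up.

```python
from collections import Counter


def event_deltas(outliers, inliers):
    """Rank (property, value) pairs by outlier share minus inlier share."""

    def shares(events):
        counts = Counter((key, value) for e in events for key, value in e.items())
        total = len(events) or 1  # avoid dividing by zero on an empty set
        return {pair: n / total for pair, n in counts.items()}

    out_share, in_share = shares(outliers), shares(inliers)
    pairs = set(out_share) | set(in_share)
    deltas = {p: out_share.get(p, 0.0) - in_share.get(p, 0.0) for p in pairs}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Made-up spans: the slow ones (outliers) all run version 2.1.
slow_spans = [
    {"region": "eu", "version": "2.1"},
    {"region": "us", "version": "2.1"},
]
normal_spans = [
    {"region": "eu", "version": "2.0"},
    {"region": "us", "version": "2.0"},
    {"region": "us", "version": "2.0"},
    {"region": "eu", "version": "2.0"},
]
ranked = event_deltas(slow_spans, normal_spans)
print(ranked[0])
```

Properties that are spread evenly across both groups (here, `region`) land near zero, while the value that is over-represented among the outliers floats to the top.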

    Overall we focus on trying to bring an easy-to-use high-cardinality observability experience to Clickhouse, whereas Grafana seems to focus more on a highly SQL-dependent dashboard building experience (which has its own advantages of course).

    edit: fixed line breaks!

    • ChocolateGod 12 hours ago

      Do you support using S3 object storage as a backing store for Clickhouse?

      One of Grafana's advantages is its very low cost of running because you can send everything to object storage with very little configuration.

miah_ a day ago

What's hilarious is that Kibana started out as open source.

Hard to trust anything released as OSS these days that hits this site and is run by a for-profit company. It's all destined to have a rug pull after some VC funding. Considering HyperDX is a for-profit company, I'm sure we won't have to wait long!

  • hinkley 3 hours ago

    Seems particularly true for tools that have operational implications. It's very easy to justify why something should be for-pay when it's indispensable day in and day out.

  • mathfailure a day ago

    What do we say in such cases? It was good while it lasted!

    Once that happens - eventually some new kids would appear on the block.

    Such is the life.

  • mikeshi42 a day ago

    Ahh yeah fwiw it wasn't _intended_ to be a dig at the open source status of Kibana - but rather we're open source and building on top of Clickhouse.

    On the commercial OSS side of things - I suspect the trend there is more nuanced than all OSS companies being subject to the same problem, but rather companies that generally solve a "behind the API" problem are more susceptible to problems of cloud vendors taking their code and competing with them commercially (ex. if you're a DB like Redis, Mongo, Elastic - or a CLI like Terraform). We're building an end-user experience (more like Gitlab) - where experience differentiation matters a lot more than simple infrastructure hosting, something AWS is not particularly well suited to competing on!

    It's been 3 years of Gitlab post-IPO and they're still MIT, and that's the boat we're on as well :)

jakozaur a day ago

You can also use Quesma and real Kibana with ClickHouse too.

Disclaimer: Co-founder of Quesma.

  • mdaniel a day ago

    I didn't downvote you, but people who shill their wares without providing a link are just wasting everyone's time