zzyzxd an hour ago

Selfhosting is my hobby but I am also an SRE. I am hesitant to do this because the instruction is "too easy" -- "Simply open your firewall, download and run this installer.sh with sudo on your server and that's it!"[1].

How do I secure the webserver and the data? Where is the data on my disk? How do I back up and restore? What about high availability?

There might be detailed documentation somewhere, or I can even read the code. But these are the important things an open source software should tell its users right off the bat.

1: https://github.com/bluesky-social/pds/blob/main/README.md

sureglymop 2 hours ago

It's great that you wrote this up!

One thing I have found with many open source/selfhostable projects is just how much running them yourself can vary. It can go from a simple compose file with everything included to having to dig for obscure services and piece together how they all form the whole.

For example, I recently looked into self hosting Zotero. It is so under documented and complex that there is almost no way one could self host that (even for just one user) without that being one's job. So one needs to make a distinction between something being open source and being feasible to use/maintain.

In the end I gave up on Zotero, even though it could have replaced Obsidian Notes, Calibre and Syncthing all at once for me.

  • diggan 2 hours ago

    > For example, I recently looked into self hosting Zotero. It is so under documented and complex that there is almost no way one could self host that

    I've come across this a lot too. But what I've found is that it mostly applies to open source projects that offer a hosted paid version, so it kind of makes sense they'll make the experience slightly worse than it could be (consciously or subconsciously), as it pushes people to their hosted solution. I don't particularly like it though.

    Doesn't seem to be the case for Zotero specifically, but your comment reminded me that I've noticed this more often lately.

    • sbarre an hour ago

      Yeah, I tend to use ease of install for community editions of hosted paid open source projects as the leading indicator of how seriously they invest in (and support) their free/community version.

  • __justplaying 2 hours ago

    Self-hosting/mirroring all these Bluesky components is currently a mixed bag as well, though honestly the only outlier is the Relay, which is a beast. i currently have my copy of the PLC, a Jetstream with 2 days of data, and a clone of the app on my laptop that i play with sometimes and/or change things in for an elaborate shitpost of Bluesky Nitro https://bsky.app/profile/alice.mosphere.at/post/3l7bpmmtiop2...

    I don't self-host my PDS yet because there is no migration path back yet (but there will be). Though maybe I'll just yolo one day and do it anyways.

98codes 2 hours ago

This is all academic for me until Bluesky gets the functionality to get an account back onto their main network, for DR if not peace of mind that an "undo" is possible.

  • diggan 2 hours ago

    Totally understandable. Personally I don't use Bluesky for anything vital, it's just data that the world wouldn't be better/worse without anyways, so I'm gonna go and give it a try even if there is no undo.

    I love that people even have the choice, so much better than not even being able to.

jazzyjackson an hour ago

Is it feasible to run a Bluesky instance "on prem" and "offline", for example as an airgapped corporate intranet?

__justplaying 6 hours ago

author here, should you have questions!

  • moreati 2 hours ago

    What's in that 4.5 TB? e.g. message metadata? Message text? Media?

    What time window does it cover? A rolling N day window? Everything since year dot?

    Can it be pruned? e.g. only data of accounts followed or messages interacted with

  • theschmed 4 hours ago

    Thanks for making yourself available to answer questions! Hopefully this is not a dumb question.

    Is plc.directory a single point of failure for Bluesky users who want to take advantage of the benefits of a did:plc? And if so, is that a permanent thing, or will there be multiple interoperating did:plc directories down the road?

    • __justplaying 4 hours ago

      yes it's a SPOF. not sure about the second question, but i do know there are plans to transfer its ownership to an independent foundation

      • pfraze 2 hours ago

        Transferring to an independent org is what we're talking about now, yes.

        The backstory to PLC is that we picked up the DID standard and looked for an existing registry-method that would satisfy requirements¹. None of them really did. We then surveyed mechanisms for decentralized operation: DHTs, open blockchains, permissioned blockchains, and federated databases. Of them, the two blockchain variants seemed perhaps promising, but still premature since (as of 2022) there's cost variability due to load and in some cases bad transaction latency (eg 10 minutes).

        We decided the best option was to create PLC, which matches all of the requirements except for longterm meta governance. The way we designed it was to make the registry mechanics transferable to a different protocol in the future, so that if for instance we decided (say) a DHT was suitable (it's not) we'd be able to use the same identifiers but change resolution and mutations to a new process. Then we started talking to other SMEs to get their take.

        Ultimately the solution that's gotten the most favorable response has been setting up an ICANN-style independent organization to operate it. This can be joined with a couple of interesting systems, such as mirrors which tail a certificate-transparency-style audit log, and which could even serve as transaction witnesses to indicate when the core registry might be rejecting updates ("write censorship").

        What can I say, some things take time and stakeholder-building. Look up the history of DNS and Network Solutions Inc for a bit of a wild ride that people have forgotten about. One other thing I should point out is that the DID spec enables multiple registry methods. Atproto currently supports did:web, and if other methods show up which satisfy the requirements then we are interested.

        ¹ Secure against manipulation by the registry operators, longterm meta governance, highly available, reasonable transaction latency, reliably low cost that's not dogged by token speculation, low ecological impact.

        • jazzyjackson an hour ago

          Hey pfraze, forgive my ignorance but what role does DID serve that DNS doesn't? My favorite part about bsky is using a TXT record to prove that I control my domain for username purposes. What's the downside to just generating a keypair and using the fingerprint of the public key as my identity? (Maybe with some affordance for key rotation vis a vis KERI*.) Not doubting y'all weighed every possibility, just wondering what I'm missing.

          *Key Event Receipt Infrastructure
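A minimal sketch of the TXT-record check mentioned above, assuming the atproto convention of a TXT record at `_atproto.<handle>` whose value has the form `did=<DID>`. The function names and the sample DID are hypothetical, and no DNS lookup is performed here: the TXT values are passed in as plain strings.

```python
from typing import Iterable, Optional


def txt_record_name(handle: str) -> str:
    """Return the DNS name where the handle's TXT record would be looked up."""
    return "_atproto." + handle.lower().rstrip(".")


def did_from_txt(values: Iterable[str]) -> Optional[str]:
    """Return the DID claimed by the TXT values.

    Returns None when no value has the form did=<DID>, or when the
    values claim more than one distinct DID (an ambiguous result).
    """
    dids = set()
    for value in values:
        value = value.strip().strip('"')
        if value.startswith("did=did:"):
            dids.add(value[len("did="):])
    return dids.pop() if len(dids) == 1 else None


# Hypothetical TXT values for a domain; unrelated records are ignored.
records = ['"did=did:plc:abc123"', "v=spf1 -all"]
print(txt_record_name("Example.com"), did_from_txt(records))
```

The domain only points at the DID. The DID itself is what the rest of the network keys on, which is the distinction the replies below discuss.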

          • steveklabnik an hour ago

            Not Paul, but DID is a stable ID over time, whereas dns is not. This lets you change your handle without the network losing track of who you are. I was @steveklabnik.bsky.social before I was @steveklabnik.com, and when I made the switch, all of my previous stuff was still there.

            This is a fun party trick in some sense, but also a real meaningful feature in another. If I ever decide to move from steveklabnik.com to steve.klabnik.com, a thing I have been considering for a few years, my stuff on @proto/Bluesky will be one of the only services that doesn't have the issue that's kept me from pulling the trigger: updating the entire world that that's where I am now.
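A toy model of that indirection, as a sketch only: content is keyed by a stable DID, and the handle is a mutable pointer to it. The `ToyRegistry` class and the `did:example:123` identifier are hypothetical and are not how the PLC directory or a PDS is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyRegistry:
    handles: Dict[str, str] = field(default_factory=dict)      # DID -> current handle
    posts: Dict[str, List[str]] = field(default_factory=dict)  # DID -> post texts

    def register(self, did: str, handle: str) -> None:
        self.handles[did] = handle
        self.posts.setdefault(did, [])

    def change_handle(self, did: str, new_handle: str) -> None:
        # Only the pointer changes; nothing keyed by the DID is touched.
        self.handles[did] = new_handle

    def post(self, did: str, text: str) -> None:
        self.posts[did].append(text)

    def posts_for_handle(self, handle: str) -> List[str]:
        # Resolve handle -> DID first, then look up content by DID.
        for did, current in self.handles.items():
            if current == handle:
                return list(self.posts[did])
        return []


registry = ToyRegistry()
registry.register("did:example:123", "steveklabnik.bsky.social")
registry.post("did:example:123", "hello")
registry.change_handle("did:example:123", "steveklabnik.com")
print(registry.posts_for_handle("steveklabnik.com"))
```

After the handle change, the old post is still found under the new handle because it was never stored under the handle in the first place.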

            • kiitos 13 minutes ago

              DIDs are stable only in the context of a specific 'verifiable data registry' as the spec puts it.

              https://www.w3.org/TR/did-core/#dfn-verifiable-data-registry

              DIDs delegate trust and authority to a data registry, in exactly the same way that DNS delegates trust and authority to ~ICANN.

              The system model is exactly the same. The difference is only in the properties of the authoritative entity.

              • steveklabnik 7 minutes ago

                That's a good point: I was speaking in a more social manner. Because domains are human-readable, they tend to be used for humans. Bluesky could have chosen to just use domains, but I personally prefer that we have the additional layer of indirection. Plus like, you have the ability (at the low level, not really exposed in the UI in any meaningful way) to be multiple people: I can associate multiple domains with my DID.

                That said, you're not wrong that a registry is a registry.

            • pfraze 22 minutes ago

              Yes! And if this were not the case then account portability between PDS hosts would be really challenging. Same logic as keeping your phone number when you switch cell carriers.

  • mintplant an hour ago

    What's the difference between social-app and the AppView?

    • pfraze 21 minutes ago

      social-app is the client side, AppView is the backend api surface

  • jervant 2 hours ago

    How are Direct Messages implemented in Bluesky if anyone can access a firehose of all network activity?

    • __justplaying 2 hours ago

      DMs are currently 1:1 only and closed source. They are working on/planning to build proper E2EE DMs that support group chats.

__justplaying 3 hours ago

How do I ask the mods to swap out the link to the actual post instead of my blog's front page?

(...also, the title, as the original has the caveat)

  • Jtsummers 3 hours ago

    It's likely the correct page was submitted. The correct page includes a canonical link in the HTML:

      <link rel="canonical" href="https://alice.bsky.sh"/>
    
    HN will replace submission links with the canonical link if it's found.
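A minimal sketch of how a site could find that canonical link in a page, using only Python's standard-library `html.parser`. This is an illustration, not HN's actual code; `CanonicalFinder` and `canonical_url` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels and attributes.get("href"):
            self.canonical = attributes["href"]


def canonical_url(html: str) -> Optional[str]:
    """Return the page's canonical URL, or None if it declares none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = '<head><link rel="canonical" href="https://alice.bsky.sh"/></head>'
print(canonical_url(page))
```

If every post on a blog declares the front page as its canonical URL, a lookup like this returns the front page no matter which post was submitted.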

    • __justplaying 2 hours ago

      oh. time to look at the code of my blog...

  • paulgb 3 hours ago

    @dang a better URL would be https://alice.bsky.sh/post/3laega7icmi2q

    (I can't tell if Dan has an alert set up on his handle or whether he just sees everything, but hopefully that works :))

    • yorwba 3 hours ago

      dang doesn't have an alert and he doesn't see everything. https://news.ycombinator.com/item?id=41317232 The official way to contact the mods is in the footer, i.e. email hn@ycombinator.com

      • paulgb 2 hours ago

        Ah thanks, good to know. I guess I've just been lucky with it and developed a superstition that it works.

        • timerol 2 hours ago

          He is also extremely active here, so there's a good chance he reads and responds to a random comment without an email. But email is the approved (and fastest) way to go about it.

  • dang 2 hours ago

    Fixed now!

jonstaab 3 hours ago

[flagged]

  • timerol 2 hours ago

    I'm sure there are HNers who built desktops with 8TB or 16TB hard drives, and have not (yet) needed the space for as many games and media as expected.

  • numpad0 2 hours ago

    8TB WD CMR is like $99, 2x48GB of DDR5 is ~$250. Memory and storage are currently way cheaper than many think they are.

  • __justplaying 2 hours ago

    didn't say it was cheap!

    • nightpool 2 hours ago

      But why is it required? Do you really need a copy of everyone's data locally? If the only way to self-host bluesky is to have an entire copy of the entire database, that seems like it's really bad from a scaling perspective.

      • jazzyjackson an hour ago

        "Self host an entire copy of all user data" is a pretty cool capability to have, kind of proof that the infrastructure is really open and forkable. You seem to have misunderstood OP's goals. Serving your own data from a personal data server is a much less arduous affair.

      • half-kh-hacker 2 hours ago

        What else would "self-hosting all of Bluesky" mean other than a copy of the entire site? If you just want to participate in the network, host a PDS, which only stores your own posts.

        • nightpool an hour ago

          Surely there's some middle ground between only hosting your own data (and being reliant on another site to keep track of your following / followers) and hosting a duplicate copy of the entire network?

          • steveklabnik an hour ago

            For sure. If you just want to host your own data, you can do that. A PDS for you and maybe some friends is very small and cheap to host.

            • nightpool an hour ago

              My understanding though is that having a PDS on its own is useless without an AppView to collect the data from the relay? Or am I misunderstanding the architecture here? https://docs.bsky.app/docs/advanced-guides/federation-archit...

              • steveklabnik 33 minutes ago

                I'm talking about the case where you wanted to run your own PDS and use all of the other infrastructure being run by Bluesky.

                If you fully want your own copy of everything, then you'd want to run a copy of everything. But you don't have to. It really depends on what your goals are. That's why the post is about the maximal scenario. "Just your own PDS" is the minimalist scenario. But I think it's the one that makes sense for 95% of users who want to self-host.

      • galactus 2 hours ago

        Uh, it is not required. You can run only a PDS if you want to self host your data and everything will work.

        But it is indeed very cool that you can actually host a relay if you want (for fun, learning, or whatever reason)

    • bombcar 2 hours ago

      Ten terabytes of spinning rust is only $100-$300 or so, that's not bad at all.

      • jonstaab 2 hours ago

        My point is not the current size, it's the eventual size if bluesky succeeds. Facebook ingests 100TB/day. Self-hosting a bluesky relay isn't (won't be) a thing.

        • galactus 2 hours ago

          It could be a thing. Not for individual tinkerers but for companies. The fact that today, with already 14 million users, it is still possible for an individual to host it is amazing.