Read_869 - Mainline DHT — Censorship-Resistance Explained

February 14, 2025 00:52:44
Bitcoin Audible

Hosted By

Guy Swann

Show Notes

I've wanted for a long time to understand how a DHT works, and how a P2P network is able to establish connections in a resilient, scalable, and decentralized way. So diving into it on the show, and having to explain it, seemed the easy way to force myself to figure it out. With the help of a fantastic article from the Pubky team, let's make sense of the censorship resistance of P2P networks.

Check out the original article Mainline DHT — Censorship-Resistance Explained by Severin Alexander Bühler. (Link: https://tinyurl.com/bdht6zs9)

Episode Transcript

[00:00:00] Speaker A: The cost of launching an Eclipse attack skyrocketed. Gaining control over a specific bucket now requires access to a vast pool of unique IP addresses, an expensive and logistically challenging proposition. This breakthrough effectively eliminated the majority of Eclipse and Sybil attacks, transforming the Mainline DHT into a far more resilient system.

The best in Bitcoin, made audible. I am Guy Swann, and this is Bitcoin Audible.

[00:00:36] Speaker B: What is up, guys? Welcome back to Bitcoin Audible. I am Guy Swann, the guy who has read more about Bitcoin than anybody else you know. This episode is brought to you by the Jade Plus hardware wallet. You can get 10% off with the code "guy" if you want to actually hold your own keys and you want a sleek, good-looking, and easy-to-use package to do it. The Jade Plus has been a fantastic addition to my little collection of hardware wallets. And then of course, if you're looking for an awesome mobile wallet with on-chain and Lightning, self-custodial, check out BitKit. That is the BitKit wallet; link and details are in the show notes as well. And there may be a little Easter egg about that: you should look out for a QR code in the videos. Check out YouTube.

[00:01:44] Speaker B: I don't know what's up there, but maybe you should see. All right, we are diving into a piece today. I wanted to hit a few things with Pubky again, and one of the reasons in particular I wanted to hit this piece is that one of the things I do not have a good grasp of myself, or have only had a very, very vague idea of how it worked (if you forced me to explain it, I would have to tell you, yeah, I really don't know much about it), is the DHT, the distributed hash table.
[00:02:18] Speaker B: This is essentially the network structure for peer-to-peer networks, for the BitTorrent network specifically. And there's a number of really, really interesting things about this, but I've done a lot of digging, trying to find good pieces to read. And the Pubky guys... I don't think it's actually John Carvalho. Did he publish this? Oh no, that's right, it's Severin. Severin from Pubky published this piece on their Medium page, and he actually has a really great little primer about the DHT, the Mainline DHT specifically. That's the one that BitTorrent uses. It gives an overall idea of how it works, why it's censorship resistant, and what the economics of an attack against the network actually are, detailing out the cost and how it has built up a defense against the type of attack that is most likely to occur when trying to censor the network.

And because we're in this space where we're talking about Nostr and Pubky and Keet and Holepunch and the Pear stack and all of that stuff, I feel like this is really relevant. It's also important in relation to Nostr specifically, because relays are very much just the old traditional web interface or format. The idea is that things can be mirrored and moved very, very quickly, so it's kind of a whack-a-mole game, but a relay is still something that can easily be taken down with DNS, and it comes with all the general problems of traditional Internet censorship, just not platform censorship. And there is no specific, single relay that has to be used.
[00:04:02] Speaker B: You can connect to many different URLs and domains and that sort of thing. So I think this will be a really fun and useful piece, and if you don't follow any of it, or you're having a hard time with some of the audio, I will get into re-explaining it in the Guy's Take. So with that, let's get into today's article. It's titled "Mainline DHT Censorship-Resistance Explained," by Severin Alexander Bühler.

The Mainline distributed hash table, or DHT, is a crucial component of decentralized networks, serving as a distributed database used to store and retrieve data across a peer-to-peer network. It is commonly used to provide torrents and now supports Pubky as well. However, like all DHTs, it faces potential vulnerabilities such as Eclipse attacks, which attempt to censor specific pieces of data. At Synonym, we created Pkarr (Public Key Addressable Resource Records) and PKDNS (Public Key Domain Name System) as mechanisms that build upon the Mainline DHT to improve security, resilience, and censorship resistance. Pkarr resolves and/or stores public key domains, or PKDs, etc., providing a decentralized way to map keys to data, while PKDNS offers an alternative to traditional DNS that is resistant to censorship and central control. In this post we provide an overview of how the default Mainline DHT handles Eclipse attacks and the challenges involved in executing such an attack.

A DHT Primer

At its core, the Mainline distributed hash table, or DHT, functions as a decentralized database powered by millions of nodes. It organizes these nodes and the data they store into structures called buckets, which are pivotal to its resilience and performance. Each bucket contains up to 20 nodes and is responsible for a specific range of data within the network.
[00:06:34] Speaker A: When data is published, it's replicated across 20 nodes, ensuring redundancy. When data is queried, it's retrieved from those same 20 nodes, creating a balance between availability and efficiency. Node placement within the DHT is determined by their IDs, very big numbers up to 2^160 in size. Nodes with similar IDs are grouped into the same bucket. Likewise, data placement is tied to its public key, another large number, which determines the closest bucket where it will be stored. This design ensures that nodes and data are consistently aligned. Importantly, the DHT is dynamic. Buckets adjust as nodes join or leave the network, maintaining balance and ensuring no single bucket becomes overloaded. This dynamic nature is part of what makes DHTs so resilient and scalable. Understanding this foundational architecture is essential to grasp the challenges and defenses against censorship in the Mainline DHT, which we will explore in detail next.

The Eclipse Attacks

One of the most significant vulnerabilities in distributed hash tables is their susceptibility to Eclipse attacks, a tactic designed to disrupt the integrity of the network. In an Eclipse attack, a malicious actor overwhelms a specific bucket by flooding it with dishonest nodes. By pushing out legitimate nodes, the attacker gains control over the data stored in that bucket, making censorship of specific public keys possible. This weakness stems from the open nature of DHTs: anyone can join the network and claim an ID. By strategically choosing IDs that align with a targeted bucket, an attacker can dominate that part of the network.

[00:08:41] Speaker A: Before 2014, this attack vector posed a serious threat, as generating large numbers of fake nodes was relatively straightforward. The implications of such attacks were clear. Without safeguards, any determined adversary with enough resources could censor data or disrupt operations within the network.
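To make the primer's "data lives on the 20 nodes closest to its key" idea concrete, here is a minimal Python sketch. It assumes the Kademlia-style XOR distance that the host describes later in the episode; the helper `sha1_id`, the node labels, and the 1,000-node toy network are made up for illustration and are not how Mainline actually assigns IDs.

```python
import hashlib

K = 20  # replication factor mentioned in the article


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: bitwise XOR of two IDs."""
    return a ^ b


def sha1_id(label: str) -> int:
    """Hypothetical helper: turn a label into a 160-bit number for the demo."""
    return int.from_bytes(hashlib.sha1(label.encode()).digest(), "big")


def closest_nodes(key: int, node_ids: list, k: int = K) -> list:
    """Return the k node IDs with the smallest XOR distance to the key."""
    return sorted(node_ids, key=lambda node: xor_distance(node, key))[:k]


# Toy network of 1,000 nodes and one data key (all labels are illustrative).
nodes = [sha1_id(f"node-{i}") for i in range(1000)]
key = sha1_id("some-public-key")
replicas = closest_nodes(key, nodes)
print(len(replicas), "nodes would store this key")
```

An Eclipse attack, in these terms, means arranging for all of `replicas` to be nodes the attacker controls.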
Addressing this problem became a priority, leading to critical advancements that have since reshaped DHT security.

The 2014 Breakthrough: Security with Deterministic IDs

In 2014, a major milestone in DHT security was achieved with the introduction of the Mainline DHT security extension, or BEP 42. This enhancement directly addressed the vulnerabilities exploited by Eclipse attacks by fundamentally changing how node IDs are assigned. Under BEP 42, a node's ID is no longer arbitrary. Instead, it is deterministically generated based on the node's public IP address using a cryptographic hash. This approach ensures that a node's ID is both unique and verifiable, making it impossible for an attacker to freely select IDs to target specific buckets. Moreover, because the ID is tied to the IP address, a node is forced to consistently use the same ID, significantly reducing the attack surface. The result? The cost of launching an Eclipse attack skyrocketed. Gaining control over a specific bucket now requires access to a vast pool of unique IP addresses, an expensive and logistically challenging proposition. This breakthrough effectively eliminated the majority of Eclipse and Sybil attacks, transforming the Mainline DHT into a far more resilient system.

Closing the Gaps: The Limits of BEP 42

While the 2014 BEP 42 extension significantly improved the Mainline DHT's defenses, it didn't eliminate all vulnerabilities. Organizations with access to large IP address pools, such as major corporations, governments, or botnets, still pose a potential threat.

[00:11:13] Speaker B: Here's why. The Mainline DHT currently consists of approximately 10 million nodes organized into buckets containing 20 nodes each. This means the network has around 500,000 buckets, and the probability of a single IP address falling into a specific bucket is 1 in 500,000.
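The deterministic ID assignment described above can be sketched roughly as follows. This is based on my reading of BEP 42 for IPv4, which, as far as I can tell, uses CRC32C rather than a cryptographic hash in the strict sense and fixes only the first 21 bits of the ID; the mask value, bit layout, and function names here are assumptions to check against the spec, not a reference implementation.

```python
import ipaddress
import os


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


IPV4_MASK = 0x030F3FFF  # assumption: IPv4 mask as I recall it from BEP 42


def bep42_node_id(ip: str, r: int, tail: bytes = None) -> bytes:
    """Sketch of a BEP 42-style node ID: prefix derived from IP, rest random."""
    r &= 0xFF
    masked = (int(ipaddress.IPv4Address(ip)) & IPV4_MASK) | ((r & 0x7) << 29)
    crc = crc32c(masked.to_bytes(4, "big"))
    if tail is None:
        tail = os.urandom(17)  # random filler for the unconstrained bits
    node_id = bytearray(20)
    node_id[0] = (crc >> 24) & 0xFF
    node_id[1] = (crc >> 16) & 0xFF
    node_id[2] = ((crc >> 8) & 0xF8) | (tail[0] & 0x07)  # 21st bit ends here
    node_id[3:19] = tail[1:17]
    node_id[19] = r  # last byte carries r so peers can re-check the prefix
    return bytes(node_id)


def prefix21(node_id: bytes) -> int:
    """First 21 bits of the ID, the part tied to the IP address."""
    return int.from_bytes(node_id[:3], "big") >> 3


example = bep42_node_id("124.31.75.21", 1)
print(example.hex())
```

Because only three bits of `r` feed the checksum, one IP address can land on at most eight different 21-bit prefixes, which is what stops an attacker from freely choosing a bucket.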
To influence one bucket, an attacker would need to control at least 20 dishonest nodes, requiring a pool of 10 million unique IP addresses. However, achieving full control over a bucket is even harder in practice. Mainline clients don't rely solely on static buckets. Queries involve random walks through the network before reaching the desired bucket, increasing the likelihood of encountering honest nodes. Additionally, nodes prefer longer-lived, established peers over newly joined ones, further delaying an attacker's ability to dominate. In ideal conditions for the attacker, around 60 dishonest nodes are typically required to fully censor data in a single bucket, but even then, the system's design allows occasional honest responses, ensuring no attack is entirely foolproof. While BEP 42 has made Eclipse attacks prohibitively expensive and complex, the reality is that they remain theoretically possible for the most well-resourced adversaries. Let's quantify the cost of such an attack to understand its practical feasibility.

The True Cost of IP-Based Attacks

Controlling millions of unique IP addresses isn't just a technical challenge, it's a financial one. The scarcity of and demand for IPv4 addresses make assembling the required resources for an Eclipse attack prohibitively expensive for most adversaries. Here's the breakdown. Acquiring a large IP address pool involves either purchasing or leasing subnets. A single /16 subnet, which provides 65,536 IP addresses, costs approximately $2.3 million to purchase outright. For the much larger /8 subnet, offering around 16 million addresses, the cost skyrockets to $600 million. Leasing these resources is equally daunting, with a /16 costing $29,000 per month and a /8 reaching $7.4 million per month. Supply constraints further complicate the situation. At the time of writing, only two /16 subnets were publicly available for sale.
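The arithmetic behind these figures can be checked in a few lines. The inputs are the article's own numbers; the derived totals (how many /16 blocks cover 10 million addresses, and what they would cost) are my back-of-the-envelope extrapolation, assuming IDs spread uniformly across buckets and /16 prices scale linearly.

```python
import math

NODES = 10_000_000        # approximate network size from the article
BUCKET_SIZE = 20          # nodes per bucket
SLASH_16 = 2 ** 16        # 65,536 addresses in a /16
PRICE_PER_16 = 2_300_000  # USD to buy one /16 (article's figure)
LEASE_PER_16 = 29_000     # USD per month to lease one /16 (article's figure)

buckets = NODES // BUCKET_SIZE
# Each IP lands in a given bucket with probability 1/buckets, so expecting
# 20 hits in one target bucket takes about 20 * buckets addresses.
ips_needed = BUCKET_SIZE * buckets
blocks = math.ceil(ips_needed / SLASH_16)

print(f"buckets:         {buckets:,}")
print(f"IPs needed:      {ips_needed:,}")
print(f"/16 blocks:      {blocks}")
print(f"purchase cost:   ${blocks * PRICE_PER_16:,}")
print(f"lease per month: ${blocks * LEASE_PER_16:,}")
```

On these assumptions the pool works out to roughly 153 /16 blocks, far more than the two the article found for sale.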
This highlights how illiquid the market is: assembling the required number of IPs from public marketplaces is not just expensive, it may be outright impossible. These astronomical costs effectively exclude all but the most resource-rich adversaries, such as nation states or major corporations, from attempting such an attack. Even for these actors, the financial burden and logistical complexity act as significant deterrents. In short, the Mainline DHT's reliance on IP-based node identification makes Eclipse attacks infeasible for the vast majority of potential adversaries, cementing its resilience against censorship.

At Scale: Examining Large IP Address Owners

While the cost and scarcity of IP addresses deter most adversaries, some entities already possess vast pools of IP addresses, making them potential candidates for mounting large-scale attacks on the Mainline DHT. Understanding who controls these resources is critical to evaluating the remaining risk. Here is a list of organizations with substantial IPv4 holdings:

The US Department of Defense (DoD): 175 million IPv4 addresses
Amazon: 109 million, with 47 million of those spare
Google: 60 million
Microsoft: 47 million
The 911 S5 botnet, which was disabled in 2022: 19 million over its 10-year lifetime
Comcast: 17 million
Apple: 17 million
IBM: 16 million

If interested, you can follow the link down in the show notes to see the rest of the list. From this list, entities like the U.S. Department of Defense and Amazon stand out. Their massive holdings and potential flexibility in allocating spare IPs mean they could, in theory, assemble the resources necessary for an Eclipse attack. Similarly, large botnets in the past have managed to temporarily amass millions of IPs, demonstrating the feasibility of such mobilization. However, practical constraints limit most of these actors.
For example, companies like Google and Microsoft rely on their IP reserves to power core services, leaving little room to spare for disruptive activities. Even if they could free up IPs, the logistics of mobilizing them for a targeted attack are nontrivial.

Visibility as a Deterrent

One critical factor working against these organizations is transparency. IP subnet ownership is publicly recorded, making any large-scale attack traceable to its source. This alone can serve as a significant deterrent, as no legitimate organization wants to risk the reputational or legal fallout of being identified as the perpetrator of such an attack.

Beyond IPs: The Operational Costs of an Eclipse Attack

Even with a pool of 10 million IP addresses at their disposal, an attacker faces additional challenges when attempting to execute an Eclipse attack on the Mainline DHT. These costs, while significant, do not render such attacks impossible, particularly for resource-rich adversaries. To target a specific public key or bucket, an attacker can precompute which IPs correspond to the desired node IDs and deploy only the required subset, approximately 60 IPs for a single bucket. This reduces the resource overhead compared to maintaining all nodes simultaneously. However, operational costs remain a factor. Here's a breakdown of the associated costs.

Server infrastructure: a single mid-tier server capable of hosting the malicious nodes costs around $10 a day.

Denial-of-service costs: to disrupt the honest nodes in the targeted bucket, a sustained bandwidth attack is necessary. Assuming 1 gigabit per second per node, the DoS component of the attack would cost approximately $500 per day.

This brings the daily cost of targeting a single bucket to approximately $510.

[00:18:59] Speaker B: Conclusion

Attacking the Mainline DHT requires extraordinary resources, particularly access to millions of IP addresses.
This level of access is beyond the reach of most adversaries, making such attacks impractical for all but the most well-funded actors. For organizations with already existing large IP pools, the daily cost of attacking a single key or bucket, at $510 a day, is relatively modest. This means that while the average user is unlikely to be targeted, high-profile individuals or entities could still face censorship risks from determined attackers, such as nation states engaging in covert activities or well-organized criminal groups. In summary, while the Mainline DHT offers significant resistance to censorship, it is not impervious. Its security depends on the financial and logistical barriers faced by potential attackers, which are high but not insurmountable for certain very powerful adversaries. This underscores the importance of ongoing innovation, such as Pkarr and PKDNS, to strengthen decentralized systems against evolving threats. Stay tuned for the next post on how public key domains improve upon the default Mainline design.

[00:20:30] Speaker B: Alright, so I think that was a really good primer, and I want to re-explain some of the elements of this and explain the picture in my head of how a DHT works, how it's structured, why it's censorship resistant, and what this whole idea of deterministic keys based on IP addresses does, how exactly that changes the game. And I want to try to put in simpler terms and concepts a lot of the things that this article is trying to explain, as well as a handful of things that they did not get into that I think are worth digging into. So the first thing to understand, and I think the important framing to get this, is: how do you store a database?
How do you have a database of entries (that could be anything, by the way; it's really kind of arbitrary what the piece of data is in this database) where anyone can enter or leave the network, and where the database itself can scale past what the typical node can actually keep track of? And also, how do you keep that database in such a way that you can verify where you are in the database, and the information, the tree, that is elsewhere in the database? How can you verify that those entries, or that data, is what's being requested or delivered? And that's where quote-unquote distributed hash tables come in, or DHTs.

[00:22:17] Speaker B: Now, a hash table is literally just a tree of hashes. And for anybody who listens to the basics series or knows the fundamental things, think of a hash like a little function. It's a math problem that is done on a piece of data: a word, a string, any sort of little block of information, or a big block of information. You run this function and you get a fixed-size output, which is 32 bytes, 32 characters; that's just SHA-256. And the output is completely random, so nobody has any idea what it is going into it. You have to compute the function in order to get the answer, and it's the only thing you can do. This is why it works as a proof of work: you can't look at a hash, at the result, and then guess what the data was, and you can't look at the input and predict the result. Like, say my name is Guy: I have no idea, and nothing about "Guy" indicates, what the hash is going to look like, what the fingerprint of this data is. So I literally just have to hash it. But here's the kicker: if
you hash "Guy" too on your computer, and somebody else does it on their computer, and somebody does it on their Linux machine, and somebody does it on a Mac, and somebody does it on Windows, and somebody does it in China, the hash of the word "Guy" is exactly the same. That means it's deterministic: it doesn't matter where it's coming from or how many times you hash it, the hash is the same. That's an important piece of the puzzle that we'll come back to in just a second.

A hash table is just... imagine there's, you know, a thousand different things. You could do transactions, or just a database of a thousand different things, and you want to be able to keep a portion of it and know how it relates to data at a different place in this huge database. Well, take the things that are right next to each other: my database entry is sitting right next to some other database entry, and we're going to hash those two together, and we will save that output in the table as referencing the piece of data that I have plus the very next one in that spreadsheet. But we are also going to take the result of those two and combine it with the result of the two right next to ours. So we're going to combine four: first each two into their individual hashes, so we have two, and then we take those two hashes that are left over and combine them into one. So we've got this little tree: we have one hash at the top, then it splits into two, and then each of those splits into two, and now we have our four data points. And specifically, that hash corresponds to the data that you are requesting. So you can verify that the data
is exactly what you expected it to be by confirming it against the hash. And importantly, just by keeping the quote-unquote root, the one hash that is the result of all four, anybody can send you any of the four other ones that are down the branches of this tree, and you can know that it is in fact inside of the tree that you have saved. And all you have to keep is that one hash.

Now, this database in particular, when we're talking about peer-to-peer networks, isn't about immutability or proving data per se, because all of the information and all of the entries are constantly changing every time people join and leave the network. This is actually using hash tables for a different reason, which is just finding out where someone is in the database. Because the hash is really just a big number, right? It's 32 bytes. And it's also random; it's a random number. So you can't predict where in the database you're going to be placed. But the rules of the database are just that you put them in order. So let's just say that there's a hash that's just nine digits, right? So it goes up to 999,999,999: it starts at zero and goes to one less than a billion. Now, if two people join this network and they randomly generate a number for themselves, and one of them generates the number 7 and the next one generates the number 900,000,001, well, then those numbers are right next to each other, because they are the only two on the network. Now, if 20 other people join the network, and one of them generates 115, another one generates 573,900, etc.
Well, now they get placed in the database based on pinging their peers and just figuring out which ones they fit between. This is done with something called the XOR metric. It runs the XOR function, or process, whatever. Essentially, XOR is just the computer's version of "what's the difference between them," except that it's not a subtraction. Like, what's the difference between 15 and 10? Well, it's 5. If you do 15 minus 10, it's 5; if you do 10 minus 15, it's negative 5. Well, XOR is the difference regardless of whichever one you put first. So 10 XOR 15 is 5, and 15 XOR 10 is also 5. It's just the difference between the two numbers, or more specifically the distance between the two numbers in the counting space of the computer.

And this is really where the magic of the distributed nature of the DHT comes in: all of the nodes on the network only keep a certain amount of relevant information and values in the database, based on where they are in the database. It's just the stuff that is quote-unquote close to them. So back to our example of a billion different IDs or hashes in this table, it's literally just how close you are to whatever. So say I come up and my name is Guy, and I do a hash of that, and it happens to hash out at the number 1256. And these are kept in what are referred to as buckets, which is how much you keep of a certain block of information that is around you. So my hash is 1256, and I'm only going to be worried about the things that are immediately next to me. And let's say our bucket or our block.
Just to do easy math with twos, let's say the size is 512. So I'm going to be keeping from the number 1000 to the number 1512, because 1256 is right in the middle of that. I'll keep all of that data, all the values and all of the entries that show up with a key in that range attached. If somebody shows up and says, "Here's my key, it's 1200," well, I'm going to keep that little value and that part of the database.

Now, in the big tree (remember, this whole thing just makes a tree of hashes), if you start hashing together pairs in that 512, you get nine branches up to the root of that tree. And remember, that's just the root of a piece of a much larger tree. Well, those nine branches are my important part of this giant tree. We're all out on limbs. We're leaves all over this tree, and there are two giant branches, four medium-sized branches, eight small branches, and sixteen tiny branches. It's just this constant splintering of branches. Well, if I'm a leaf on one of those end pieces, I'm only going to keep the tiny branches and the small branches that are right near me. In the context of the one to a billion, that's what I'm doing by keeping the 256 below my number and the 256 above my number.

However, I also want to know how to get to other things in the tree. So I'm going to keep half that amount of information at the next tier up. So when I'm going from the small branches to the medium branches, and I have this other medium branch, this whole thing that's double all of
this information I have over here on my small right branch, well, for that small left branch I'm only going to keep information about half of the nodes: how to contact them and what their key values are. And then when we go down further toward the trunk of the tree and we have our big branches, well, I'm only going to keep half of that number's worth of information about nodes and key values in the other branch. And this is all based on my perspective. I don't care where somebody else is in the tree. I am only worried about the branches that are near me, all the numbers that are near me. And the further away the numbers and the values are in this spreadsheet, the less of the data I'm keeping.

So again, going back to the example of a billion, I'm keeping everything from the number, or the hash, 1,000 to 1,512. However, from the number 0 to 5,000, I'm only keeping 256 of those. And then when it comes to everything from the number 5,000 up to 50,000, I'm only going to keep 128 of them. Then from 50,000 to 500,000, I'm only going to keep 64. So the further away the number is from my number, the fewer pieces and entries of this giant database I'm going to keep. Now, I'm just making up numbers that I hope are easier to picture. But all you have to understand is that the blocks of numbers, the groups, get bigger and bigger, and I keep less and less information about each one, entirely dependent on how they are related to the hash, the key, the identifier of me and my value in this database. Which importantly means that someone who has generated the hash 500,000 is keeping all of the entries from 499,744 to 500,256.
That's the same 512: 256 more and 256 less, just like mine around 1256. And all the nodes do this. Everybody has their own unique perspective. Nobody keeps the exact same amount of data. However, in my example, notice that in our buckets of 512 there would be exactly 512 other nodes that are keeping the exact same values that I have, every single one of the values that I am keeping. And in Kademlia, in the BitTorrent DHT, I think they said specifically in this piece, if I'm not mistaken, that the buckets are 20. So every group has 20 nodes' worth of discovery, of redundancy, in that area of the database.

And it's important to remember that with the DHT specifically, even though you can obviously join and leave the network, hosting the DHT is not the same as simply joining the network. It's kind of like running a Bitcoin full node: you have to actually host the DHT. It's kind of akin to being a relay in Nostr, except for the fact that anyone can connect to anyone. It is literally a distributed network of connections, whereas relays don't connect to each other. Anyone can simply run a relay, and you can connect to that relay and other people can connect to it, but there's no intercommunication. The entire point of the DHT is to have robust intercommunication, with everybody hosting or running parts of the exact same database, which is trying to globally be aware of all of the other nodes on the network. Well, not aware of all the nodes in the network specifically, but aware of enough nodes that they can contact other nodes to find other nodes on
[00:35:13] Speaker B: The network that they are not aware. [00:35:15] Speaker A: Of specifically based on the distance from them. [00:35:18] Speaker B: So going back to the example of my zero, or one, to a billion possible hashes or IDs: I'm only keeping, I think it was like 64 between 5,000 and 50,000. But that means that anybody I'm looking. [00:35:35] Speaker A: For, if I'm looking for the ID. [00:35:37] Speaker B: And the value in this database that has the ID 39,970, and of the 64 I have saved I have 42,000, I just have whoever has the ID 42,000. [00:35:53] Speaker A: Well, I'm going to contact them and. [00:35:55] Speaker B: Ask them about number 39,000, whatever I just said, because they are the one who is closest in the network, and they may very well have that single data point, that one entry point, because it's in the set where they saved half of them and it's in that half. If it's not, well then they're going to contact the next person closest. And with two hops or three hops. [00:36:19] Speaker A: I'm going to get to my piece of information. [00:36:22] Speaker B: And if specifically number 42,000 isn't online, I'm going to contact any one of the other 64 in that block. [00:36:29] Speaker A: If for some reason none of them. [00:36:31] Speaker B: Are responding, well, then I'll take it one step back and I'll go to the 128 between the numbers, you know, 2,000 to 5,000 or whatever it was. I don't know, I just made stuff up. But the set of numbers, the set of keys that are closer to me, well, I am contacting them if I can't reach anyone further out in the network. Then they tell me what they have near them and bounce out as far as they need to in order to finally get me to node 39,000. And the values of this database, by the way, are IP addresses and port numbers and just how to contact other people in this network. [00:37:10] Speaker A: That's literally it.
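For readers following along in text, the lookup described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the BitTorrent implementation: it uses a 30-bit ID space (roughly the "billion" from the example), measures closeness with XOR as Kademlia does rather than with plain subtraction, and lets every node learn about every other node up to the bucket limit of 20. The names `Node`, `build_network`, and `lookup`, and the `10.0.x.y:6881` addresses, are invented for illustration.

```python
import random

ID_BITS = 30   # toy ID space of about a billion IDs (assumption for illustration)
K = 20         # bucket size mentioned in the episode


def distance(a: int, b: int) -> int:
    """Kademlia-style XOR distance between two IDs."""
    return a ^ b


def bucket_index(my_id: int, other_id: int) -> int:
    """Low indexes cover the few IDs nearest to me, high indexes cover huge far-away ranges."""
    return distance(my_id, other_id).bit_length() - 1


class Node:
    def __init__(self, node_id: int, address: str):
        self.node_id = node_id
        self.address = address
        self.buckets = {}  # bucket index -> list of (node_id, address)

    def learn(self, other_id: int, address: str) -> None:
        """Remember a contact, but never more than K per bucket."""
        if other_id == self.node_id:
            return
        bucket = self.buckets.setdefault(bucket_index(self.node_id, other_id), [])
        if len(bucket) < K and (other_id, address) not in bucket:
            bucket.append((other_id, address))

    def closest_known(self, target: int, count: int = K) -> list:
        """Contacts from my own table, sorted by closeness to the target ID."""
        known = [contact for bucket in self.buckets.values() for contact in bucket]
        return sorted(known, key=lambda contact: distance(contact[0], target))[:count]


def build_network(size: int, seed: int = 0) -> dict:
    """Hypothetical helper: random IDs, made-up addresses, everyone hears about everyone."""
    rng = random.Random(seed)
    ids = rng.sample(range(2 ** ID_BITS), size)
    network = {i: Node(i, f"10.0.{n // 256}.{n % 256}:6881") for n, i in enumerate(ids)}
    for node in network.values():
        for other in network.values():
            node.learn(other.node_id, other.address)
    return network


def lookup(network: dict, start_id: int, target_id: int):
    """Hop to whichever known contact is closest to the target until we reach it."""
    current = network[start_id]
    hops = 0
    while current.node_id != target_id:
        candidates = current.closest_known(target_id, 1)
        if not candidates:
            return None, hops
        next_id, _ = candidates[0]
        if distance(next_id, target_id) >= distance(current.node_id, target_id):
            return None, hops  # nobody closer is known, give up
        current = network[next_id]
        hops += 1
    return current.address, hops
```

Calling `lookup(network, a, b)` on a network from `build_network(300)` returns the target's address and the number of hops taken. Each node stores at most 20 contacts per bucket, yet every hop lands strictly closer to the target, which is why a handful of hops is enough.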
[00:37:11] Speaker B: So the entire table is just a. [00:37:13] Speaker A: Series of contact information for computers to connect to other computers. [00:37:17] Speaker B: And by doing this, there is no. [00:37:20] Speaker A: Central authority in the network and computers can just find each other in the system. [00:37:27] Speaker B: So the network can grow massively. But any one individual node only has to keep essentially small pieces of the network, which is enough information to reach out into the network and find any other data point. This makes it resilient, scalable and censorship resistant. However, there's a problem, or there was. [00:37:52] Speaker A: A problem: the hashes, like. [00:37:55] Speaker B: I said, they're deterministic. So if I have the name Guy and I hash that, no matter where you are or who you are, what computer you're on. [00:38:02] Speaker A: It's going to hash to the exact. [00:38:03] Speaker B: Same number, the exact same hash. But anybody can just join the network and identify however they want and just make a hash. Which means if I found out that you had the hash, again, it's actually a 32 byte number. So this is a huge number. But let's just go back to the example of the set where there's a billion. You have the hash 500,000 and I just randomly generate one and it's just the number 1,000. Well, I could just kind of arbitrarily change whatever I was hashing, whatever my key was, and just generate a new key and do that, I don't know, a hundred times, a thousand times, until I'm right in the, you know, 499,900 range, I'm near your network or. [00:38:52] Speaker A: Near you in the tables. [00:38:54] Speaker B: And then I could just keep generating hashes until I have all of the nodes and values between 499,000 and 501,000.
Which means that for all of the information that you are going to ask about on the network, you are specifically just going to be asking me. [00:39:14] Speaker A: Suddenly you are going to be Sybil. [00:39:16] Speaker B: Attacked, and you're suddenly on this network and the only person that you are talking to is me, because I have just surrounded you with all of the relevant key values and hashes in the database that you are going to be concerned with, because you're going to ask for, you know, 499,768 and up to 500,256. Well, I just spun up thousands of fake nodes and I am literally all of them. So every single one that you request will just be asking me for what the network looks like, what the entire table looks like, and how to contact people. Which means that I can just leave out whoever I find inconvenient, or when everyone else contacts me to find you. [00:40:00] Speaker A: I can leave you out of my table. [00:40:03] Speaker B: I can essentially black hole you in the network. [00:40:06] Speaker A: And so the resistance method, what they. [00:40:09] Speaker B: Came up with to solve this vulnerability and to make it more expensive to do this, is making this ID a hash of the IP address. So I could trivially spin up thousands and thousands of nodes and generate thousands and thousands of hashes, but I cannot trivially own thousands of IP addresses. But even more importantly, I can't just generate random hashes. My IP address being near yours is irrelevant. That has nothing to do with anything. It's whatever the hash of the IP address is that determines where I am in the network, and I have no idea what that's going to be beforehand. Which means I just need to be grabbing IP addresses, guessing, hashing them, and seeing where it shows up in the. [00:41:02] Speaker A: Network or in the table.
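A rough sketch of why tying the ID to the IP address raises the cost, using the toy billion-ID space from the example. The assumptions are loud ones: `node_id_from_ip` simply takes a SHA-1 of the address and reduces it to the toy space, which is a stand-in for the real Mainline rule (that rule differs in its details), and `expected_ips_needed` is a hypothetical helper doing back-of-the-envelope arithmetic that treats hash outputs as uniformly spread.

```python
import hashlib
import ipaddress

ID_SPACE = 1_000_000_000  # the toy "billion IDs" space from the episode's example


def node_id_from_ip(ip: str) -> int:
    """Toy rule (assumption): the node ID is a hash of the IP address, reduced to the toy space."""
    digest = hashlib.sha1(ipaddress.ip_address(ip).packed).digest()
    return int.from_bytes(digest, "big") % ID_SPACE


def expected_ips_needed(window: int, wanted: int) -> float:
    """Average number of addresses to try before `wanted` of them hash into a window of this width."""
    return wanted * ID_SPACE / window


# Neighbouring IP addresses end up scattered all over the ID space,
# so owning a block of adjacent addresses does not put you next to a victim.
neighbours = [node_id_from_ip(f"203.0.113.{host}") for host in range(256)]

# Surrounding a victim near ID 500,000 with 100 nodes inside the 2,000-wide
# window 499,000..501,000 takes about fifty million candidate addresses on average.
needed = expected_ips_needed(window=2_000, wanted=100)
```

Under these assumptions `needed` works out to 50,000,000 addresses, which matches the "millions of IP addresses just to find the 100 or 1000" intuition: the attacker can no longer pick an ID, only try addresses and see where each one lands.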
[00:41:04] Speaker B: So I need to own millions of IP addresses just to find the 100 or 1,000 IP addresses that produce the hashes that are relevant to trapping you in a hole, and then hosting and maintaining them costs, each of them, $5 to $10 a day. Now that's a kind of trivial cost on top of the real fundamental issue, which is that your IP address is how they connect to you, which means that's the only way that they're going. [00:41:36] Speaker A: To request information from you. [00:41:38] Speaker B: And the ID is the hash of that IP address. There's essentially no way to fake it because it's literally the information that you. [00:41:49] Speaker A: Use to connect to them. [00:41:51] Speaker B: So if someone tries to defraud that. [00:41:53] Speaker A: Information and put in something that's fake. [00:41:55] Speaker B: In order to try to get a different spot on the network, well, now they can't connect to you because you lied about where you were on the network and what your IP address is. So nobody can connect to you. So it's like, okay, well, whatever. Nobody's going to get your information, nobody's. [00:42:08] Speaker A: Going to ask you for any information. [00:42:10] Speaker B: Because they don't know how to contact you. And so this makes any attack or censorship on the network extremely difficult and expensive. And that, I think, is really important. This is also something that relates really, really well to Bitcoin: you can censor the DHT, it just costs. [00:42:31] Speaker A: An enormous amount of resources and it's a super pain in the butt. So it's a whole lot easier for. [00:42:37] Speaker B: Them to go just find the server or the peer and try to shut them down or legally attack them or whatever. Essentially, there's a ton of other mechanisms. [00:42:49] Speaker A: By which they can attack that are cheaper.
[00:42:52] Speaker B: And because of this, the censorship resistance of the network stays in place. Because the network is so expensive, it's so uneconomical to attack that people just don't do it. This is exactly how Bitcoin works. [00:43:06] Speaker A: You can actually reverse the history of the Bitcoin chain. [00:43:11] Speaker B: You can actually censor transactions on the network. [00:43:15] Speaker A: It's just insanely expensive to do. [00:43:19] Speaker B: And because it's so difficult and uneconomical to try to attack the network itself, the network stays robust, the network stays. [00:43:28] Speaker A: Censorship resistant, and governments or attackers or thieves or whatever go after the individuals. [00:43:34] Speaker B: Or the websites or the, the people, whatever it is. [00:43:39] Speaker A: Essentially the other routes become less expensive across the board than trying to attack the entire network. The system stays safe and the rules stay in place. [00:43:52] Speaker B: And in doing so, you can have this giant distributed network of people who randomly join and leave the network. [00:44:01] Speaker A: And everybody has all the information necessary. [00:44:04] Speaker B: Has saved all of the values and pieces of this sections of this database in a hugely redundant fashion. [00:44:13] Speaker A: So that even as all of these. [00:44:15] Speaker B: Unreliable devices enter and leave and are having ephemeral connections and they're bouncing, they're hopping across this database, trying to find peers in order to, you know, find the three people who are going to route. Like, this is basically the routing table that the Internet does as well. [00:44:33] Speaker A: Except this is a distributed version of. [00:44:36] Speaker B: That, rather than just pinging the physical, the things that are physically tied to your wires. Like, I'm keeping a routing table of my little LAN network, right, just on the router in the, you know, closet of my house. 
Then there's routing tables of the neighborhood. [00:44:51] Speaker A: Nodes and then the city, and then. [00:44:53] Speaker B: The trunk line going out to, you know, a feeding station and a satellite. [00:44:58] Speaker A: Feed and all this stuff. [00:44:59] Speaker B: Everybody's. It's just connecting. And everybody has pictures of their local. [00:45:04] Speaker A: Or the things that are close to them. [00:45:06] Speaker B: And everything else hops, everything else, sends. [00:45:08] Speaker A: A request that says, find this. And then they move to the next. [00:45:11] Speaker B: Thing, find this, and then they get a ping back. And this takes milliseconds to actually clear the communication between all of these devices. So that even though it takes me. [00:45:21] Speaker A: 13 hops, which the last time I did this, just for Fun. [00:45:24] Speaker B: It took me 13 hops to connect to Google's DNS server or just a server at Google. [00:45:30] Speaker A: I just use their DNS because it's easy to remember. [00:45:33] Speaker B: But if I want to skip across the country on the physical Internet network I'm doing, I'm going through 13 other computers and devices in order to get there. Well, if I wanted to connect through the peer to peer distributed network, not only am I doing all this routing on the Internet, but I'm also just routing who has this information about these peers. And I'm doing that same sort of process through the table, the hash table itself, of all the peers that have joined that network. And I'm just looking, I'm bouncing from. [00:46:07] Speaker A: All of the branches that are close. [00:46:09] Speaker B: To me in order to get branches. [00:46:11] Speaker A: That are further away and find the. [00:46:13] Speaker B: Peers that are hanging out over there on the network. And so because of this, no matter where you are in the world, it. [00:46:18] Speaker A: Might just take three, four, five hops. 
[00:46:21] Speaker B: And the whole thing is, quote, unquote, six degrees of separation. It's actually true in the DHT network as well, that essentially if you're the furthest away that you can be in the network, I'm still only going to have to go through a handful of nodes who are getting increasingly closer to where you are in that giant table in order for us to have direct communications, which again might only take milliseconds. And then you and I can connect directly to each other and share a file or have a conversation or anything that we want. [00:46:55] Speaker A: IP addresses are centrally issued. Domains like. [00:46:59] Speaker B: google.com, these things are centrally owned and delivered. That's why when a government takes down a website, they can just shut it. [00:47:07] Speaker A: Down, they just turn it off. [00:47:08] Speaker B: They simply go to the domain authorities and say, this one, you can't talk to this one anymore. Courts do it, force shutdown of it. Those layers in the network are very easy to control. This is why Pubky is such a fascinating new addition to how to better distribute the ability to contact and. [00:47:31] Speaker A: Find each other's content on the Internet. [00:47:33] Speaker B: Because there are some arbitrary values that. [00:47:36] Speaker A: Can be added into this giant table. It's not just IP addresses, it's. [00:47:40] Speaker B: IP addresses for contact and port numbers, and then also sections for just random blobs of data. So this can actually be used, and what they built is a way to use your key as a domain. So if you're using an app that has PKDNS in it, or you install PKDNS on your computer, well, then you can go to your browser and let's say, whatever my hash is or whatever, I give it. [00:48:09] Speaker A: To you, I give you my key.
[00:48:12] Speaker B: Well, you can go to, and then punch in my key and the PKDNS system will find me on the BitTorrent network and then just you can view whatever information or content is connected to that key. In other words, instead of using the BitTorrent network specifically for a file, or, you know, file sharing specifically is, well, it can actually be used for a quote unquote to store a domain that just lets you arbitrarily find any content hosted by anyone. And then that content can move around or it can be replicated dozens of times, or it can be shifted to a different IP address and that key itself will still determine where and how to connect to it. So it can be mirrored by a bunch of people who are supporters of yours, where you can seed it with a bunch of other people who you pay kind of like a distributed hosting service. And as long as they have that key, it will still take them there no matter how many times that information moves to a different place or to a different computer or to a different country. That domain, my key, HTTP, whatever you punched in, will allow you to identify and connect to whatever that content is. [00:49:30] Speaker A: It is a domain name system, a. [00:49:32] Speaker B: Way to contact and reach people on the web, on the normal web stored in the BitTorrent DHT network. That is a pretty powerful idea. So again, this is something I'm going to be playing with and exploring. I hope this kind of makes the. [00:49:51] Speaker A: Picture a little bit easier to understand. [00:49:53] Speaker B: It might have gotten really complicated with. [00:49:55] Speaker A: All the numbers and the branches and everything. [00:49:56] Speaker B: And I'm sorry, audio is not the perfect medium for doing this if you thought the explanation was useful. Actually, it's a, it's great thing to have feedback. Let me know if it worked or if you want me to do something with a visual. 
Like if I get, you know, 50 people telling me I should do a video explaining this, I am happy to do that, but otherwise I have a lot of other videos to do and I probably won't. And I'll let this episode suffice. And also they have a couple of other links in the actual article to learn more about dhts. They've got a guy who did a video that I watched that one too as a part of a bunch of the things that I was digging into. And he has a pretty good explanation actually, so it's probably not even necessary, even though I like to explain things my way. But check those out. Also check out PubKey and the app again. Betas are really really soon. Or actually when this comes out it. [00:50:48] Speaker A: May already be out. [00:50:49] Speaker B: I'm not. [00:50:50] Speaker A: I'm not sure that's. [00:50:51] Speaker B: I know it's soon. Like today is actually the 10th and I believe it was around the middle of February. They said that it was. I don't have any word on it yet, but it should be. [00:51:00] Speaker A: It should be pretty soon any day now. [00:51:01] Speaker B: So anyway, don't forget to check out those. Don't forget to follow them on their. [00:51:05] Speaker A: Blog on their Medium page. [00:51:06] Speaker B: Links and details in the show notes. Drop them some claps on the article. This is a project to keep an eye on and also just listen to the show because you'll hear about it because I'm following it. Don't forget to check out the BitKit wallet if you were looking for an on chain and Lightning wallet that is easy to use, has a very slick UX and is entirely self custodial. And of course if you want true self custody and cold storage, you can get yourself 10% off the new Jade plus hardware wallet. I've always been a big fan of the Jade and I am super stoked about the Jade which I will have a second video on really really soon. 
You should check out my sexy unboxing video because it's sexy and keep an eye out for the next one because maybe there'll be some bitcoin hidden in the video. I don't know. You should check it out though. Subscribe on YouTube on Rumble and share this out with everyone that you know who wants to know about bitcoin wants to learn about bitcoin and I will catch you on the next episode of Bitcoin. [00:52:07] Speaker A: Audible. [00:52:08] Speaker B: I am Guy Swan and until then. [00:52:11] Speaker A: Everybody take it easy guys. An open Internet is an open platform. [00:52:32] Speaker B: For debating opposing views. [00:52:35] Speaker A: It allows unpopular voices to be heard. Newton Lee.
