Towards a common crypto transport layer

There are many ways the crypto community can work together to bring common standards to the ecosystem and improve adoption of the technology. One of those ways is coming up with a common payment protocol (see Towards a singular payment protocol), although that proposal only covers a small part of how different cryptocurrencies and even fiat currencies can work together. Even for payments, the simple system described in that article only works for currencies that have addresses where the payment can be sent. There are cryptocurrencies, such as those based on mimblewimble, that have no addresses that can be handed out. Currencies such as those require a two-way communications channel to be established in order to complete a transaction. As mentioned in the same article, there are also compelling reasons to have two-way communications even when you have an address to send a payment to, as you can do things like confirm transaction fees, decide who will pay those fees, send receipts, etc.

Developing a way to communicate between two ends of a transaction requires tradeoffs between security, privacy and convenience. Users of such a system could decide which of these matters most to them and use it accordingly.

For example, the simplest and fastest way for two people (or machines) to communicate is directly. This is more or less the idea behind BIP-70, which allowed direct communication over HTTPS using X.509 certificates for authentication. BIP-70 used this two-way channel to provide proof of payment to the buyer and a refund address to the seller, among other useful features. However, critics opposed the protocol because it reduced the buyer’s privacy. The seller’s server, for instance, always learned the IP address of the buyer’s wallet, since the wallet connected to it directly over HTTPS.

On the other hand, if you trust your recipient, a direct connection to them prevents eavesdropping by the hops in between (except by your network provider, of course), even if the only data those hops could glean is metadata (IP addresses, message size, destination, etc.).

Using a pure distributed hash table (DHT) approach would put your message on many machines that don’t need to see it. Optimizing the shortest route might reduce that risk, but that assumes each hop along the way is trusted. A bad actor could install many nodes to ‘help’ maintain such a network, and track everything going through its nodes.

Let’s say you visit a merchant, and want to buy something from them. This could be a physical store or someplace online. If you’re not shipping something physical to yourself, there’s no need for the merchant to know who you are, where you live, or the IP address of the device your wallet is running on. You can always choose to give the merchant identifying information, but that should really be up to you. You don’t get carded buying a coffee at a corner bodega, and you shouldn’t get virtually carded when you want to donate money to a charity or buy an ebook to read on your computer or iPad.

Other applications beyond payment messages

Of course, once you have a communications network in place, many other applications can be built on top of it. In the cryptocurrency space, one could imagine decentralized trading of currencies. You broadcast a trade request: selling 1 BTC for 50 ETH, perhaps with a tolerance of ±0.005% to allow for fluctuation in the market price. The first node that the message reaches with a matching open order could automatically send a response to start an atomic swap. There is a lot to work out in that mechanism. What if I have an open order to buy half a Bitcoin at the same rate? Would my order be filled, and the 1 BTC for 50 ETH order then be converted to 0.5 BTC for 25 ETH? How long would these orders exist on the network? Presumably market orders would exist only briefly, while limit orders would exist longer. Just as a message sent through a network such as Bitmessage may remain on the network for up to four weeks (but usually only a few days), other kinds of messages, such as atomic swap requests, would need their own restrictions on how long they exist on the network.
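To make the matching questions concrete, here is a minimal sketch of one possible policy. Everything in it is hypothetical: the `Order` fields, reading “.005%” as a fractional price tolerance of 0.00005, and the choice to partially fill the incoming order and leave the remainder on the network are one answer to the questions above, not a settled design.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Order:
    side: str          # "buy" or "sell", referring to the base currency
    base: str          # e.g. "BTC"
    quote: str         # e.g. "ETH"
    amount: float      # quantity of the base currency
    price: float       # quote units per base unit, e.g. 50 ETH per BTC
    tolerance: float = 0.00005  # hypothetical: +/- 0.005% of price, as a fraction


def try_match(incoming: Order, resting: Order) -> Optional[Tuple[float, Optional[Order]]]:
    """Match a broadcast order against an open order held by a node.

    Returns (base amount filled, what is left of the incoming order or None),
    or None if the two orders are incompatible.
    """
    if incoming.side == resting.side:
        return None
    if (incoming.base, incoming.quote) != (resting.base, resting.quote):
        return None
    if abs(resting.price - incoming.price) > incoming.price * incoming.tolerance:
        return None
    filled = min(incoming.amount, resting.amount)
    left = incoming.amount - filled
    remainder = replace(incoming, amount=left) if left > 0 else None
    return filled, remainder


# Selling 1 BTC for 50 ETH meets an open order to buy 0.5 BTC at the same rate.
offer = Order("sell", "BTC", "ETH", 1.0, 50.0)
bid = Order("buy", "BTC", "ETH", 0.5, 50.0)
filled, rest = try_match(offer, bid)
print(filled, rest.amount, rest.amount * rest.price)  # 0.5 0.5 25.0
```

Under this policy the leftover 0.5 BTC for 25 ETH would continue across the network as a new order; how long it may live there is the separate expiry question.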

A payment example

You walk into a coffee shop and want to buy a cappuccino. You scan a QR code with a payment request. That payment request includes the amount to pay, the address to pay to, a description of the purchase, and in this case a separate messaging address and a public key from the seller that matches that messaging address. It could also include a fee amount that the seller has agreed to pay for the transaction. See Towards a singular payment protocol for more possibilities on what could be included in such a payment request. Normally you would just send a payment to the address given, but with a messaging address provided, you send a message first instead. The message could be sent to the same address that is used on the blockchain for the payment, but there’s little reason to do so (other than saving a few bytes in the payment request), and doing so could reduce the privacy of the transaction.
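A payment request like this might be modelled as a small record serialized into the QR code. The field names, the JSON encoding and the placeholder values below are assumptions for illustration only; Towards a singular payment protocol may define a different set of fields.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class PaymentRequest:
    amount: str            # decimal string, to avoid float rounding
    currency: str          # e.g. "BTC"
    pay_to: str            # on-chain address for the payment itself
    description: str       # e.g. "1 cappuccino"
    message_address: str   # separate messaging address, distinct from pay_to
    message_pubkey: str    # seller's public key matching message_address (hex)
    seller_fee: Optional[str] = None  # fee the seller has agreed to cover, if any

    def to_qr_payload(self) -> str:
        # Compact, deterministic JSON keeps the QR code small and stable.
        return json.dumps(asdict(self), separators=(",", ":"), sort_keys=True)

    @classmethod
    def from_qr_payload(cls, payload: str) -> "PaymentRequest":
        return cls(**json.loads(payload))


# Placeholder values; none of these are real addresses or keys.
req = PaymentRequest(
    amount="0.00012",
    currency="BTC",
    pay_to="example-onchain-address",
    description="1 cappuccino",
    message_address="example-messaging-address",
    message_pubkey="00" * 33,
    seller_fee="0.000002",
)
```

Keeping `message_address` separate from `pay_to` reflects the privacy point above: nothing in the request ties the messaging channel to the on-chain payment.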

You send a message confirming the amount you will pay (for example, the price of the item minus the fee amount) and providing a response address and public key the seller can use to send your receipt. You could also include a new crypto address to allow a refund if necessary. Your client encrypts the message using the public key provided in the payment request. The destination address is the only part of the message that is not encrypted. Since the address is a hash, there is no way to use it to derive the public key.
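The message layout can be sketched as a model of who can read which field. This is not working cryptography: `Sealed` is a hypothetical stand-in for public-key encryption that merely refuses to open for the wrong key pair, the key derivation is a placeholder, and the truncated SHA-256 address format is assumed. The sketch shows that only the destination address travels in the clear, while the amount, the reply address and key, and the refund address all sit inside the sealed body.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class KeyPair:
    private: bytes

    @property
    def public(self) -> bytes:
        # Placeholder derivation, NOT real cryptography (a real client might use X25519).
        return hashlib.sha256(b"pub" + self.private).digest()


def address_of(pub: bytes) -> str:
    # Hypothetical address format: truncated SHA-256 of the public key.
    # Publishing the address reveals nothing about the key itself.
    return hashlib.sha256(pub).hexdigest()[:40]


@dataclass(frozen=True)
class Sealed:
    # STAND-IN for public-key encryption, NOT real crypto: it hides nothing and
    # only refuses to open for the wrong key pair, to model who can read what.
    to_pub: bytes
    payload: object

    def open(self, keys: KeyPair) -> object:
        if keys.public != self.to_pub:
            raise PermissionError("not addressed to this key")
        return self.payload


@dataclass(frozen=True)
class Envelope:
    to: str        # destination address: the only cleartext field
    body: Sealed   # amount, reply address and key, refund address


def buyer_message(seller_pub: bytes, amount: str,
                  refund_address: str) -> Tuple[Envelope, KeyPair]:
    reply = KeyPair(os.urandom(32))  # fresh key pair used only for the receipt
    body = {
        "amount": amount,
        "reply_address": address_of(reply.public),
        "reply_pub": reply.public,
        "refund_address": refund_address,
    }
    return Envelope(address_of(seller_pub), Sealed(seller_pub, body)), reply


seller = KeyPair(os.urandom(32))
msg, reply_keys = buyer_message(seller.public, "0.00012", "example-refund-address")
```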

The seller verifies the payment amount and sends a response to the response address you provided. In doing so, they provide a new address for your reply. Every message carries a new response address and corresponding public key. Since this information is encrypted, there is no way for someone to know that the next message, sent to the new address, is connected in any way to the current one.
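Each side of the exchange therefore needs to mint a fresh address/key pair per message and remember which ones it has handed out. The sketch below is a hypothetical helper: `ReplyIdentities` and its methods are illustrative names, the key derivation is a placeholder rather than real cryptography, and retiring an address after one use is an assumption that goes slightly beyond the text.

```python
import hashlib
import os
from typing import Dict, Tuple


class ReplyIdentities:
    """Hands out a fresh, single-use address/key pair for every outgoing message.

    Because each reply goes to an address that has never been used before,
    an observer cannot link consecutive messages in the exchange by address.
    """

    def __init__(self) -> None:
        self._private: Dict[str, bytes] = {}  # address -> private key material

    def new(self) -> Tuple[str, bytes]:
        private = os.urandom(32)
        # Placeholder derivation, NOT real cryptography.
        public = hashlib.sha256(b"pub" + private).digest()
        address = hashlib.sha256(public).hexdigest()[:40]
        self._private[address] = private
        return address, public

    def consume(self, address: str) -> bytes:
        """Look up the key for an incoming reply and retire the address."""
        return self._private.pop(address)


buyer = ReplyIdentities()
addresses = [buyer.new()[0] for _ in range(3)]  # one per outgoing message
```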

Once, twice, thrice…

One way to help obfuscate the delivery of messages is to send them through one or more intermediary destinations. For example, Bob wants to send a message to Alice. First Bob encrypts the message using Alice’s public key. Then he packages Alice’s address and the encrypted message into a new message, which he encrypts using Carol’s public key. That new message is sent to Carol, who receives it, sees that it contains a message for someone else (she doesn’t know who, only the address) and sends that message on to that address. To obfuscate the message further, Bob can send it through two intermediary nodes instead of just one.

Bob wants to send a message to Alice, but sends a message to Carol first, who then sends the real message to Alice

In the above diagram you can see an example DHT network, where Bob knows the addresses and public keys of Carol and Alice. If he used Dave as the intermediary instead of, or in addition to, Carol, the message would ironically pass through Alice as a node before being returned to her by Dave (since Dave’s only connection to Bob is through Alice).

In this example Bob wants to send a message to Alice, so he encrypts it with Alice’s public key and combines it with Alice’s address (together, Message A). He then encrypts Message A using Carol’s public key and sends it to Carol’s address as Message B. When Carol gets Message B, she decrypts it using her private key, recovering Message A, which she then sends on to Alice’s address.
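The layering can be sketched for any number of intermediaries. As before, this models only who can open what: `Sealed` is a hypothetical stand-in for public-key encryption, the key derivation and address format are placeholders, and `wrap`/`relay` are illustrative names. Bob wraps from the last relay outward, so the first relay’s layer is outermost; each relay peels one layer and learns only the next address.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class KeyPair:
    private: bytes

    @property
    def public(self) -> bytes:
        # Placeholder derivation, NOT real cryptography.
        return hashlib.sha256(b"pub" + self.private).digest()


def address_of(pub: bytes) -> str:
    return hashlib.sha256(pub).hexdigest()[:40]


@dataclass(frozen=True)
class Sealed:
    # STAND-IN for public-key encryption: opens only for the matching key pair.
    to_pub: bytes
    payload: object

    def open(self, keys: KeyPair) -> object:
        if keys.public != self.to_pub:
            raise PermissionError("not addressed to this key")
        return self.payload


def wrap(text: str, recipient_pub: bytes, relay_pubs: List[bytes]) -> Tuple[str, Sealed]:
    """Return (first hop address, outermost layer) for a message to the
    recipient, routed through the relays in the given order."""
    # Message A: the recipient's address plus the text sealed to her key.
    next_addr, layer = address_of(recipient_pub), Sealed(recipient_pub, text)
    # Wrap from the last relay outward, so the first relay's layer is outermost.
    for pub in reversed(relay_pubs):
        layer = Sealed(pub, (next_addr, layer))
        next_addr = address_of(pub)
    return next_addr, layer


def relay(keys: KeyPair, layer: Sealed) -> Tuple[str, Sealed]:
    """An intermediary peels one layer and learns only the next address."""
    next_addr, inner = layer.open(keys)
    return next_addr, inner


alice, carol = KeyPair(os.urandom(32)), KeyPair(os.urandom(32))
first_hop, message_b = wrap("hi Alice", alice.public, [carol.public])
to, message_a = relay(carol, message_b)  # Carol forwards message_a to `to`
```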

If someone were watching the network, they might infer from the length of Message A that it is reaching its final destination (that is, that no further messages are encrypted inside it waiting to be relayed). Certainly, if the message is too short to contain another address, you would know that it has reached its final destination. To prevent this, you can pad the message before encrypting it to ensure it is bigger than a minimum size. You wouldn’t want to always pad to one specific size, though, since that would also be metadata that could be used to make educated guesses about the destination. Your client could insert random text, or grab a few random lines of Shakespeare or any other large body of text, to pad the message.
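A minimal padding sketch, under a few assumptions not in the text: the filler is random bytes rather than text (it gets encrypted anyway), a 4-byte length prefix marks where the real payload ends, and the `minimum` and `max_extra` defaults are arbitrary illustrative values. The randomized extra length keeps the padded size from settling on a single telltale number.

```python
import os
import secrets
import struct


def pad(payload: bytes, minimum: int = 512, max_extra: int = 256) -> bytes:
    """Pad a payload before it is encrypted.

    The result is at least `minimum` bytes plus a random 0..max_extra bytes,
    so its size neither exposes a short final-hop message nor always lands
    on the same length. A 4-byte big-endian length prefix lets the
    recipient strip the filler again.
    """
    body = struct.pack(">I", len(payload)) + payload
    target = max(len(body), minimum) + secrets.randbelow(max_extra + 1)
    return body + os.urandom(target - len(body))


def unpad(padded: bytes) -> bytes:
    (length,) = struct.unpack(">I", padded[:4])
    return padded[4:4 + length]
```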

Messages within messages

Another way to pad a message is to send it within another message. We’re already sending messages within messages, but so far the choice of relay point has been largely arbitrary. What if the relay point had a purpose?

Bob wants to send Alice a message. Instead of picking a random relay point, he uses a message he has received over the network that is on its way to Carol. He encrypts that message to Carol together with his own message to Alice, and sends the whole thing to Carol. Carol receives and decrypts it, finds the message that was intended for her, and also sees the message to Alice, which she sends on.

This requires a mechanism for Bob to get Carol’s public key without knowing who Carol is. However, this need not be the public key paired with the address Bob already knows. Instead, Bob sends a public key request to the address on the message he is forwarding (which is Carol’s, although he doesn’t know that). Carol sends back a new address and public key, encrypted using the public key Bob included in his request. Bob then encrypts his message to Alice together with the message he is forwarding to Carol, and sends the bundle to Carol’s new address.
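The exchange can be sketched as follows. `Sealed`, the key derivation and the address format are hypothetical stand-ins rather than real cryptography, and `answer_key_request` and `bundle` are illustrative names; the key request itself (Bob sending his public key to Carol’s old address) is not modelled. Note that Bob never opens the message he is relaying: he only learns Carol’s new address and key, and Carol’s bundle holds both the message meant for her and the layer she must pass on to Alice.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPair:
    private: bytes

    @property
    def public(self) -> bytes:
        # Placeholder derivation, NOT real cryptography.
        return hashlib.sha256(b"pub" + self.private).digest()


def address_of(pub: bytes) -> str:
    return hashlib.sha256(pub).hexdigest()[:40]


@dataclass(frozen=True)
class Sealed:
    # STAND-IN for public-key encryption: opens only for the matching key pair.
    to_pub: bytes
    payload: object

    def open(self, keys: KeyPair) -> object:
        if keys.public != self.to_pub:
            raise PermissionError("not addressed to this key")
        return self.payload


def answer_key_request(fresh: KeyPair, requester_pub: bytes) -> Sealed:
    """Carol's side: reply with a new address and key, sealed to the requester."""
    return Sealed(requester_pub, (address_of(fresh.public), fresh.public))


def bundle(in_transit: Sealed, key_reply: Sealed, bob: KeyPair,
           alice_pub: bytes, text: str):
    """Bob's side: seal the relayed message and his own message for Alice
    to Carol's new key, addressed to her new address."""
    carol_addr, carol_pub = key_reply.open(bob)
    for_alice = (address_of(alice_pub), Sealed(alice_pub, text))
    return carol_addr, Sealed(carol_pub, {"for_you": in_transit, "relay": for_alice})


bob, alice, carol_old, carol_new = (KeyPair(os.urandom(32)) for _ in range(4))
in_transit = Sealed(carol_old.public, "message for Carol")  # Bob can't open this
to, packet = bundle(in_transit, answer_key_request(carol_new, bob.public),
                    bob, alice.public, "hello Alice")
```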

If Eve is able to watch the network she might know that Bob received a message intended for someone (but then again so did everyone else). Eve might also know that Bob sent a public key request to that same address, but it wouldn’t matter because there is no way to know what new address was sent to Bob. The response to Bob could even be sent through an intermediary, so even if Eve was watching Carol, she wouldn’t know which message leaving Carol was the one with the new public key and address intended for Bob. Eve would need to follow every message from Carol, and then follow every message sent from every node those messages passed through. Even with massive surveillance capabilities this would be very difficult. This is particularly true because even if all of the messages sent through all of those nodes eventually hit Bob’s node, there would be no way of knowing which ones contained the response from Carol.

When the message is actually sent bundled with Bob’s message to Alice, it’s to the new address that Carol sent him, so there is no connection to the message that Bob received. As Bob’s node would be sending and receiving many messages, there’s no way to know which message is the new one that contains the original message received by Bob.

Bob and Weave, Duck and Cover

If you’re on the network and receive a message, you have no way of knowing whether it was sent by the peer you retrieved it from or has been traveling for multiple hops. You also have no idea whether the address on the message is the final destination or just another relay point. If you’re the final recipient, no one else can tell, because the message continues to be passed on to other peers. The only one who knows that you decrypted it is you.
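One relay policy that gives this property is for a full node to forward every new message unchanged, whether or not it could decrypt it. The sketch below assumes flooding with duplicate suppression, which the text does not specify; `handle_incoming` is a hypothetical name, and `try_open` stands in for “attempt decryption with my keys”.

```python
import hashlib
from typing import Callable, List, Optional, Set


def handle_incoming(message: bytes,
                    try_open: Callable[[bytes], Optional[bytes]],
                    peers: List[Callable[[bytes], None]],
                    inbox: List[bytes],
                    seen: Set[bytes]) -> None:
    """Hypothetical relay policy for a full node.

    Every new message is forwarded unchanged to all peers, whether or not
    this node could decrypt it, so a peer cannot tell from the node's
    behaviour that it was the recipient.
    """
    digest = hashlib.sha256(message).digest()
    if digest in seen:            # already handled: don't flood it again
        return
    seen.add(digest)
    plaintext = try_open(message)  # returns None if the message isn't ours
    if plaintext is not None:
        inbox.append(plaintext)    # only this node learns it was the recipient
    for send in peers:
        send(message)              # forward either way
```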

Someone watching you may know that you originated a message, but they don’t know who the final recipient is and have no way of determining it. Even if they somehow monitored every node in the network, they wouldn’t know which client on the network owns that address. Nor would they know whether the message was encrypted to multiple intermediaries, so they cannot track it effectively.

If you wanted to go further, you could do more to frustrate traffic analysis. For example, you could send public key requests to random peers. This wastes resources, in particular other people’s resources, but it would make analyzing possible routes to a real person that much more difficult. An observer would have to ask: did you receive a message to be relayed? If so, which of the peers you contacted is the one you are actually sending a message to?

Address generators

There could be a class of addresses which never receive regular messages, but only serve to generate new address and public key pairs. This would be useful for an address published where many people can see it, for example in a QR code on a poster advertising an event, or on a business card. Any message sent to this address, no matter what it contains, will only receive a response with a new address and public key.

If you know that the address is such an address you might simply send a request for an address and public key, but you could just as easily send a normal message with an encrypted payload, which would just be ignored when received.

If the sender didn’t know it was that kind of address, the node could include some information in its response explaining that the original payload was ignored. This could be a simple hash of the original message, so the client knows which message was ignored and needs to be resent to the new address. All of this can happen behind the scenes. There is no need for the user to know that these exchanges are going on, unless they want to know why a response is taking longer than usual to arrive.
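An address generator’s handler might look roughly like this. `AddressGenerator`, its reply fields and the placeholder key derivation are hypothetical; in practice the reply would also be encrypted to the public key the requester supplied, which is omitted here to keep the sketch short. Whatever arrives, the response is a fresh address and key, plus a hash of the original message when a payload was ignored.

```python
import hashlib
import os
from typing import Dict, Optional


class AddressGenerator:
    """Hypothetical handler for a published 'generator' address."""

    def __init__(self) -> None:
        self.issued: Dict[str, bytes] = {}  # new address -> private key material

    def handle(self, raw_message: bytes, has_payload: bool) -> Dict[str, Optional[str]]:
        private = os.urandom(32)
        # Placeholder derivation, NOT real cryptography.
        public = hashlib.sha256(b"pub" + private).digest()
        address = hashlib.sha256(public).hexdigest()[:40]
        self.issued[address] = private
        return {
            "address": address,
            "pubkey": public.hex(),
            # Lets the sender's client match up and resend an ignored payload.
            "ignored": hashlib.sha256(raw_message).hexdigest() if has_payload else None,
        }
```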

Another type of address generator is one like that described in A cryptocurrency address name service. Instead of receiving a regular address, you get a normal web domain, such as dave.cryptoname.com. Your client can, depending on your security needs, send an API call to the site over https, or send a message over the network using that as the address. In either case, when the request is received by dave.cryptoname.com, a response is sent with a new address and public key, encrypted using the public key you supplied in your request. It would probably be best to restrict such a name to only generate new address/key pairs. In other words, once a new address/key pair is generated, you continue exchanging address/key pairs for each step in the communications, so the messages are no longer connected to the original name.

Addresses when there are no addresses

For cryptocurrencies and other applications that have no addresses, this system can act as the means of communication to initiate and complete transactions. A mimblewimble transaction could start with a throwaway address from the receiver, either a hash or a name. From that, an address/key pair is generated and sent to the sender, who then initiates the transaction by sending the first part to the receiver and getting a response over the network.

This is not entirely different from Beam’s Secure BBS implementation, although it is less specialized and more decentralized.

It’s important to recognize that even though the initial address might be known, it has no connection to the transaction that is completed.

Different levels of security

If you are running a full node on the network, you receive all messages and are protected from anyone learning that you are the recipient of a specific message. Sometimes, however, running a full node wouldn’t be necessary or even desirable. One example is connecting to the network from a phone on mobile data. You probably don’t want to be sending and receiving every message on the network in that situation. You might, if you wanted to ensure privacy. On the other hand, if you are less concerned that someone might find out you paid for a coffee, you might forgo that privacy and trust another node on the network to provide you with the messages you need.

If you’re initiating a payment, you wouldn’t even need to know anything about the state of the network. You scan a QR code to buy your coffee, connect to a full node and send a message to the address provided. At this point you have at least three options for receiving a response.

First, you could temporarily act like a full node, except that you only receive messages timestamped after the message you sent. You receive all of these and wait for the one addressed to the response address you included in your message. In theory the node or nodes you’re receiving messages from wouldn’t know which message was intended for you, but the fact that you stopped requesting messages at a certain point might help them make an educated guess. If you’re receiving from many nodes, none of them would know whether any of the messages it sent you was the one you wanted, so in that sense having many peers on the network would help your privacy.
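The first option amounts to pulling everything newer than your own message and filtering locally. In this sketch the `Message` fields and `wait_for_reply` are hypothetical, and each peer is modelled simply as an iterable of messages.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Message:
    timestamp: float   # hypothetical: seconds since epoch, set by the sender
    to: str            # destination address (cleartext)
    body: bytes        # encrypted payload


def wait_for_reply(peer_streams: Iterable[Iterable[Message]], sent_at: float,
                   reply_addresses: Set[str]) -> Optional[Message]:
    """Pull every message newer than our own from several peers and pick out
    our reply locally, so no peer is told which message we were after."""
    for stream in peer_streams:
        for msg in stream:
            if msg.timestamp > sent_at and msg.to in reply_addresses:
                return msg
    return None
```

Returning as soon as the reply turns up is exactly the stopping signal described above; a more careful client might keep pulling for a while afterwards.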

Second, you could request only to receive the message headers for messages with a newer timestamp than the message you sent. This is similar to the above option, except at some point you would need to request the full message from a node, which kind of gives up the game. It would of course use less bandwidth. You could request multiple messages to obfuscate which one you are actually looking to read, but this isn’t a great way to ensure your privacy.

Third, if you trust the node you are using to relay messages, you could simply ask them to watch out for messages sent to a specific address. When that node receives a message sent to that address, it will be pushed to your client. If you really trust the node, you could simply forward it all the addresses you generate and let it keep watch on all of them. You could control this other node if you want as well. For example, you could keep a node on your computer, and have that be the node that your phone connects to to get push notifications of messages.
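The third option reduces to a watch list on the trusted node. `WatchingRelay` and its methods are hypothetical names, and `push` stands in for whatever notification channel the phone uses. The privacy cost is visible in the code: the node holds a map of exactly which addresses belong to the client.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class WatchingRelay:
    """A trusted node that watches addresses on a client's behalf and
    pushes matching messages to it."""

    def __init__(self) -> None:
        self.watchers: DefaultDict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def watch(self, address: str, push: Callable[[bytes], None]) -> None:
        self.watchers[address].append(push)

    def on_message(self, to: str, body: bytes) -> None:
        # .get() avoids creating empty entries for addresses nobody watches.
        for push in self.watchers.get(to, []):
            push(body)
```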

Securing the channels

The above ideas assume someone is potentially watching, and try to minimize the amount of metadata that is leaked. To further that goal, connections between nodes should be encrypted. Tor could also be used to carry messages, but the above ideas already assume that some nodes themselves could be compromised, so even without Tor or other ways of concealing the transmissions between nodes, the leakage should be minimal.

Some final thoughts

There are other messaging systems designed to overlay specific blockchains, such as the previously mentioned Beam Secure BBS, or the proposed Bitcoin Cash Overlay Network. There are also other application-specific P2P networks, like Bisq for decentralized exchange of cryptocurrencies. It’s probably also not a coincidence that the blockchain company Tron bought P2P pioneer BitTorrent, which had built a P2P messaging system called Bleep (or if it is a coincidence, then it’s a shame they killed off Bleep).

On the one hand, the systems mentioned above are tailored to the blockchains they support. They may, as in the case of Beam, have the full support of the governing bodies behind their blockchain. In some ways that’s good, because they know the system will support their needs. On the other hand, it limits the breadth of the network, lowers the number of people potentially working on it, and restricts innovation in the applications that could run over the network. The goal here is to create a network that doesn’t just overlay a blockchain, but overlays the entire Internet to allow secure and private communications, with support for application-specific formats.

This network could allow secure communications for simple messaging, news alerts, group messaging, file sharing (perhaps with integration with something like IPFS), social media, payment systems, currency exchange, and many more applications.