Putting data beyond ownership

Some blockchain implementations – public, non-permissioned ones in particular – have a really interesting property:

Nobody owns them.

As a sentence, it’s a simple statement. There is no owner.

As I’ve considered it, however, this is far more subtle and powerful than merely having a NULL value in the field where we usually write down the owner’s name. It is not merely that these systems do not have an owner right now, nor that there -is- an owner who has given broad permissions around the use and re-sharing of the data.

Rather, these systems are by design non-ownable. It might be accurate to describe them as “self sovereign,” though in a limited sort of way.

The foundational example of a blockchain system, BitCoin, was designed to solve an accounting problem in a low trust environment: The designers wanted to keep a ledger of transactions among a group of parties who did not trust each other to keep the books. They also didn’t trust any particular third party, whether it was a bank or a nation state.

The solution to that problem was to create an escrow service. In effect, the BitCoin network created the third party that they needed. That third party was sufficiently disinterested and transparent that the participants were comfortable using it to keep the books.

People are making increasingly clever use of this property of BitCoin and other public chains. It serves as a highly trustworthy and disinterested repository of data. It is useful precisely because there is no chance of a conflict of interest or an external force changing the system. While we do worry about it failing and becoming unviable, we do not worry about the owner being bought by a competitor or compelled by a nation state.

There is no owner to compel. I keep coming back to how -odd- that is.

Note that this is distinct from the question of whether the data itself is public, private, or bound by one license another. Uploading copyrighted material to a blockchain service does not invalidate the copyright. The copyright is still valid. Instead, it means that the party that received the data isn’t a person or a company.

This new model opens up possibilities for system designers and data strategists. Many of the challenges in our health care system center on the obligations, risks, and potential rewards of being the organization who owns the systems that hold the data.

Under current law, the data in a person’s medical records belongs to that person. However, the electronic medical record systems that hospitals use to record the data belong to the hospital. Hospitals exist in a web of frequently conflicting pressures, risks, and incentives. Anybody who has navigated the American health care system knows the awful challenges involved in accessing and sharing data that we supposedly “own.”

To be clear, using the BitCoin network to directly store electronic medical records is a terrible, half-baked, hopelessly naive idea. BitCoin is built on the wrong set of incentives for health care. It operates at the wrong tempo. It also has structural and social baggage that make it a bad choice for this particular application.

More fundamentally, data that should remain private over the long term should not go on public chains, even encrypted. Encryption has a shelf life. The computers of the future will be powerful enough to break the encryptions of today.

With that said, I do think that there is a great opportunity to change the rules of the game by considering which slices of information hospitals and insurance companies and research organizations hold closely because they can’t trust each other. Then we should ask: “What if we let ‘nobody’ hold some of that information, instead of any of us?”

Identity

Another day, another data breach.

The Swedish government has apparently exposed personal identifying data on nearly all of their citizens. The dataset came from the ministry of transportation. It included names, photographs, home addresses, birthdates, and other details about citizens – as well as maintenance data on both roads and military and government vehicles. Perhaps most squirm-inducing, the dataset included active duty members of the special forces, fighter pilots, and people living under aliases as part of a witness protection program.

The data has been exposed since at least 2015. We’re just finding out about it now.

I have written in the past about the perils of compiling this sort of dataset. This particular ministry has a good excuse: They print identification cards. The fact that they emailed the information around in clear-text and handed management and storage off to third party processors with little or no diligence? That’s another story.

It provides a decent opportunity to talk about identity and zero knowledge proofs.

Identity is one of those concepts that appears simple from a distance, but that aways seems to wriggle out of any rigorous definition.

For today, let’s say that identity is a set of properties associated with a person. We use these properties (or knowledge of them) to verify that someone is who they say they are. We can deal with group identities and pseudonyms in another post. Let’s also agree to defer metaphysics and philosophy around any deeper meaning of the word “identity,” at least for the moment.

My name, birthdate, address, social security number, fingerprints, bank account numbers, current and past addresses, first pet, high school, mother’s maiden name, and so on are all properties attached to and supporting “my” identity. This list includes examples commonly used by banks and websites. When someone calls my bank on the phone and claims to be me, the bank might ask for any or all of the above. As the answers provided by the caller match the ones in the bank’s database, the bank gains confidence that the caller is actually me.

Once a birthday, address, or other similar fact is widely known, it becomes substantially less useful in demonstrating identity. It also becomes substantially easier for people to fake an identity.

This data breach brings a particular problem into stark relief: Our identity cards have all sorts of identifying information printed on them, and that information is available to anybody holding the card (or the database from which it came).

The bartender doesn’t need to know my birthday – they need to know that I am of legal age to buy alcohol. They certainly don’t need to know my address or organ donor status.

This is where zero knowledge proofs come in. A zero knowledge proof is an answer to a question (“is this person of legal drinking age?”) that does not expose any unnecessary information (like date of birth or address) beyond that answer.

In order to implement zero knowledge proofs we usually need a trusted third party who holds the private data and provides the answers. Instead of printing dates of birth on ID cards, we might print a simple barcode. The bartender would scan the barcode with a phone or other mobile app, and receive a “yes” or a “no” answer immediately from the appropriate agency. In some cases, the third party might send me a message letting me know that somebody scanned my ID card. In some cases (like financial transactions), they might even wait for me to validate the request before sending the approval.

If the third party is trustworthy, having them in the loop can radically increase our information security – both by reducing information leakage and by providing a trail of requests for information. Imagine a drivers license that did not contain your private information, and could be invalidated as soon as you reported it lost.

Blockchain technologies seem likely to provide a robust solution to the question of a trusted third party in a trust-free environment. More on that in a later post.

The oldest part of Blockchain

Public key encryption, or PKE, is one of the oldest techniques in the blockchain toolbox. PKE dates from the 1970s and has a lineage of being “discovered” by both military and civilian researchers. It’s powerful stuff: One of the early implementations of a PKE system, called “RSA,” was famously classified as a munition and subject to export control by the United States government.

While PKE (also called “asymmetric key”) is a critical technology in Blockchain systems, I care about it mostly because I get a lot of email. With PKE it is conceptually straightforward to encrypt and “sign” a message in such a way that the identity of the sender is publicly verifiable and that the intended receiver is the only one who can open it. I’ll explain why that matters for my INBOX further on in this post.

Most of the algorithms that underpin PKE make use of pairs of numbers – called “keys” – that are related in a particular way. These “key pairs” are used as input to algorithms to encrypt and decrypt messages. A message that has been encrypted with one of the keys in a pair can only be decrypted using the matching key. As with crytographic hashes, these systems rely on the fact that while it is straightforward to create a pair of keys, it is computationally impractical to guess the second key in a pair given only the first.

This is conceptually distinct from “symmetric” key algorithms, which use the same key for both encryption and decryption.

In one common use of PKE, one half of a key pair is designated as “public,” while the other is “private.” We share the public key widely, posting it on websites and key registries. The private key is closely held. If someone wants to send me a message, they encrypt it using my public key. Since I’m the only one with the private partner to that public key, I’m the only one who can decrypt the message.

Similarly, if the sender wants to “sign” their message, they can encrypt a message using their private key. In this case, only people with access to the public key will be able to decrypt it. This is, of course, not very limiting. Anybody in the world has access to the public key. However, it is still useful, because we know that this particular message was encrypted using the private partner to that public key.

What is particularly cool is that we can “stack” these operations, building them one on top of the other. A very common approach is to encrypt a message twice, first using the sender’s private key to provide verification of their identity, and then a second time using the recipient’s public key, to ensure that only the recipient can open the message.

Many Blockchain systems use this system to verify that the person (or people, or computer program) authorizing a transaction is in fact allowed to do so. In fact, because key pairs are cheap and plentiful, every single Bitcoin transaction has used a unique pair of keys, created just for that one event.

Back to my surplus of email: None of my banks or healthcare providers have deployed this nearly 40 year old capability for communicating with me. Instead, a growing fraction of my inbound email consists of notifications that I have a message waiting on some “secure message center.” I am exhorted to click a link and sometimes required to enter my password in order to see the message.

This practice is actively harmful. Fraudulent links in emails are among the primary vectors by which computers are infected with malware. When we teach the absolute basics of information security, “don’t click the link,” comes right after “don’t share your password,” but before “we will never ask for your password.”

Email systems that use PKE have been around since I’ve been using technology, and somehow my bank and my hospital haven’t caught on. The HIPAA requirement to use “secure messaging,” has driven them backwards, not forwards.

Perhaps if we call it “Blockchain messaging,” it’ll finally catch on.

The blockchain part of Blockchain

The blockchain data structure (which is a part of, but distinct from the larger Blockchain ecosystem) consists, perhaps unsurprisingly, of an ordered series of “blocks.”

In addition to a payload of data and a few other housekeeping values, each block (except the first one, the “genesis” or “origination” block) contains the hash of the previous block. As described in a previous post, hash values are easy to verify and challenging to fake. A block is valid if it contains the hash of its predecessor. A valid blockchain contains only valid blocks.

A valid blockchain demonstrates an order of events. One cannot create a block without referring to the prior one. If the hashes are correct, we know that the blocks were created in sequential order, and therefore that the data stored on the chain was also written in that order. We know the relative order in which the data was written (we can’t generate a subsequent block without all the prior ones). We also know that the data has not changed since being written (changes to a block will change the hash, and require changes to all future ones).

Notably, we get no promises at all concerning validity or security. Merely storing information in a blockchain data structure does not make it correct, complete, or private. In fact, since most Blockchain systems are distributed ledgers (the topic of a future post), information on the chain is somewhat radically public. Every node in most Blockchain networks eventually see every piece of data on the chain.

Bitcoin and some (but not all) Blockchain systems up the ante on what constitutes a valid block by adding a nonce. The nonce is a value that, added to a block, yields a hash with specific and rare properties. This imposes a cost, called “proof-of-work,” on creating blocks. When creating a new block, authors must try (on average) a large number of nonces until they find one that yields a valid hash. The point of this is to make it computationally challenging merely to create a single new valid block at the end of the chain, and prohibitive to go back and corrupt earlier blocks.

The computational work of “mining” in the Bitcoin system is actually just searching for valid nonces. This is sufficiently different from conventional mining that it bears saying: In the usual use of the word “mining,” we are seeking out and refining a valuable resource. In Blockchain systems that use proof of work, the rare and precious resource at hand is the trustworthiness of the system itself. Value is not removed by the mining operation – it is actually being created.