Identity

Another day, another data breach.

The Swedish government has apparently exposed personally identifying data on nearly all of its citizens. The dataset came from the ministry of transportation. It included names, photographs, home addresses, birthdates, and other details about citizens – as well as maintenance data on roads and on military and government vehicles. Perhaps most squirm-inducing, the dataset included active duty members of the special forces, fighter pilots, and people living under aliases as part of a witness protection program.

The data has been exposed since at least 2015. We’re just finding out about it now.

I have written in the past about the perils of compiling this sort of dataset. This particular ministry has a good excuse: They print identification cards. The fact that they emailed the information around in cleartext and handed management and storage off to third-party processors with little or no diligence? That’s another story.

It provides a decent opportunity to talk about identity and zero knowledge proofs.

Identity is one of those concepts that appears simple from a distance, but that always seems to wriggle out of any rigorous definition.

For today, let’s say that identity is a set of properties associated with a person. We use these properties (or knowledge of them) to verify that someone is who they say they are. We can deal with group identities and pseudonyms in another post. Let’s also agree to defer metaphysics and philosophy around any deeper meaning of the word “identity,” at least for the moment.

My name, birthdate, address, social security number, fingerprints, bank account numbers, current and past addresses, first pet, high school, mother’s maiden name, and so on are all properties attached to and supporting “my” identity. This list includes examples commonly used by banks and websites. When someone calls my bank on the phone and claims to be me, the bank might ask for any or all of the above. As the answers provided by the caller match the ones in the bank’s database, the bank gains confidence that the caller is actually me.

Once a birthday, address, or other similar fact is widely known, it becomes substantially less useful in demonstrating identity. It also becomes substantially easier for people to fake an identity.

This data breach brings a particular problem into stark relief: Our identity cards have all sorts of identifying information printed on them, and that information is available to anybody holding the card (or the database from which it came).

The bartender doesn’t need to know my birthday – they need to know that I am of legal age to buy alcohol. They certainly don’t need to know my address or organ donor status.

This is where zero knowledge proofs come in. A zero knowledge proof lets me convince someone that a statement is true (“this person is of legal drinking age”) without exposing any information (like date of birth or address) beyond the truth of that statement.

In practice, the simplest way to get this behavior is usually a trusted third party who holds the private data and provides the answers. Instead of printing dates of birth on ID cards, we might print a simple barcode. The bartender would scan the barcode with a phone app or other mobile device, and receive a “yes” or a “no” answer immediately from the appropriate agency. In some cases, the third party might send me a message letting me know that somebody scanned my ID card. In some cases (like financial transactions), they might even wait for me to validate the request before sending the approval.
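Here is a minimal sketch of that barcode flow in Python. Everything in it is hypothetical and invented for illustration: the `IdentityAgency` class, the card token, and the method names are assumptions, not any real agency's API. The point is only that the bartender gets back a yes or a no, never the birthdate, and that every question leaves a trail.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    birthdate: date
    revoked: bool = False


class IdentityAgency:
    """Hypothetical trusted third party that holds the private data."""

    def __init__(self):
        self._records = {}    # card token -> Record (never leaves the agency)
        self.audit_log = []   # trail of (token, requester, question)

    def register(self, token, birthdate):
        self._records[token] = Record(birthdate)

    def revoke(self, token):
        # For example, after the card is reported lost.
        old = self._records[token]
        self._records[token] = Record(old.birthdate, revoked=True)

    def is_of_age(self, token, requester, min_age, today):
        """Answer only yes or no; log who asked what."""
        self.audit_log.append((token, requester, f"age>={min_age}"))
        record = self._records.get(token)
        if record is None or record.revoked:
            return False
        born = record.birthdate
        # Subtract one if this year's birthday has not happened yet.
        had_birthday = (today.month, today.day) >= (born.month, born.day)
        age = today.year - born.year - (0 if had_birthday else 1)
        return age >= min_age


agency = IdentityAgency()
agency.register("card-123", date(2000, 6, 15))
print(agency.is_of_age("card-123", "bartender", 21, date(2021, 6, 15)))
```

A real deployment would also need to authenticate the requester and protect the channel, neither of which this sketch attempts.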

If the third party is trustworthy, having them in the loop can radically increase our information security – both by reducing information leakage and by providing a trail of requests for information. Imagine a driver’s license that did not contain your private information, and could be invalidated as soon as you reported it lost.

Blockchain technologies seem likely to provide a robust solution to the question of a trusted third party in a trust-free environment. More on that in a later post.

The oldest part of Blockchain

Public key encryption, or PKE, is one of the oldest techniques in the blockchain toolbox. PKE dates from the 1970s and has a lineage of being “discovered” by both military and civilian researchers. It’s powerful stuff: One of the early implementations of a PKE system, called “RSA,” was famously classified as a munition and subject to export control by the United States government.

While PKE (also called “asymmetric key”) is a critical technology in Blockchain systems, I care about it mostly because I get a lot of email. With PKE it is conceptually straightforward to encrypt and “sign” a message in such a way that the identity of the sender is publicly verifiable and that the intended receiver is the only one who can open it. I’ll explain why that matters for my INBOX further on in this post.

Most of the algorithms that underpin PKE make use of pairs of numbers – called “keys” – that are related in a particular way. These “key pairs” are used as input to algorithms to encrypt and decrypt messages. A message that has been encrypted with one of the keys in a pair can only be decrypted using the matching key. As with cryptographic hashes, these systems rely on the fact that while it is straightforward to create a pair of keys, it is computationally impractical to derive the second key in a pair given only the first.

This is conceptually distinct from “symmetric” key algorithms, which use the same key for both encryption and decryption.

In one common use of PKE, one half of a key pair is designated as “public,” while the other is “private.” We share the public key widely, posting it on websites and key registries. The private key is closely held. If someone wants to send me a message, they encrypt it using my public key. Since I’m the only one with the private partner to that public key, I’m the only one who can decrypt the message.
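To make the public and private halves concrete, here is a toy sketch using textbook RSA with tiny primes. The helper names `make_toy_keypair` and `apply_key` are mine, invented for illustration. The numbers are far too small to be secure and there is no padding, so treat this as a picture of the idea, not something to deploy.

```python
def make_toy_keypair(p=61, q=53, e=17):
    """Build a textbook RSA key pair from two small primes (insecure toy)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)        # modular inverse; needs Python 3.8+
    return (e, n), (d, n)      # (public key, private key)


def apply_key(key, numbers):
    """Raise each number to the key's exponent, modulo n."""
    exponent, n = key
    return [pow(x, exponent, n) for x in numbers]


public_key, private_key = make_toy_keypair()

# Anyone can encrypt with the public key, one byte at a time...
message = list(b"hello")
ciphertext = apply_key(public_key, message)

# ...but only the holder of the private key can reverse it.
recovered = bytes(apply_key(private_key, ciphertext))
print(recovered)
```

The same `apply_key` function does both jobs. Which key you hand it decides whether you are locking or unlocking.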

Similarly, if the sender wants to “sign” their message, they can encrypt a message using their private key. In this case, only people with access to the public key will be able to decrypt it. This is, of course, not very limiting. Anybody in the world has access to the public key. However, it is still useful, because we know that this particular message was encrypted using the private partner to that public key.
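A small sketch of signing, again with toy textbook RSA numbers that I have picked for illustration (the key values and the `sign` and `verify` helpers are assumptions, not a real library's API). Real systems typically sign a padded digest of the message, not the message itself, so this sketch signs a SHA-256 digest, one byte at a time, to keep the arithmetic inside the tiny modulus.

```python
import hashlib

# Toy key pair built from the primes 61 and 53 (insecure, illustration only).
PUBLIC_KEY = (17, 3233)
PRIVATE_KEY = (2753, 3233)


def sign(private_key, message):
    """'Encrypt' the message digest with the private key."""
    d, n = private_key
    digest = hashlib.sha256(message).digest()
    return [pow(byte, d, n) for byte in digest]


def verify(public_key, message, signature):
    """Undo the signature with the public key and compare digests."""
    e, n = public_key
    recovered = [pow(s, e, n) for s in signature]
    return recovered == list(hashlib.sha256(message).digest())


signature = sign(PRIVATE_KEY, b"pay Bob 5")
print(verify(PUBLIC_KEY, b"pay Bob 5", signature))    # genuine message
print(verify(PUBLIC_KEY, b"pay Bob 500", signature))  # altered message
```

Anyone holding the public key can run `verify`, but only the holder of the private key could have produced a signature that passes.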

What is particularly cool is that we can “stack” these operations, building them one on top of the other. A very common approach is to encrypt a message twice, first using the sender’s private key to provide verification of their identity, and then a second time using the recipient’s public key, to ensure that only the recipient can open the message.
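The stacking can be sketched with two toy textbook RSA key pairs. The primes, the helper names, and the `seal` and `unseal` functions are all my own assumptions for illustration. One quirk of this toy: the sender's modulus has to be smaller than the recipient's so that the inner result still fits inside the outer operation. Production systems sidestep this with hashing, padding, and hybrid encryption.

```python
def make_toy_keypair(p, q, e=17):
    """Textbook RSA key pair from two small primes (insecure toy)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # needs Python 3.8+
    return (e, n), (d, n)               # (public key, private key)


def apply_key(key, numbers):
    exponent, n = key
    return [pow(x, exponent, n) for x in numbers]


sender_public, sender_private = make_toy_keypair(61, 53)        # n = 3233
recipient_public, recipient_private = make_toy_keypair(89, 97)  # n = 8633


def seal(message):
    """Inner layer proves who sent it; outer layer limits who can open it."""
    signed = apply_key(sender_private, list(message))
    return apply_key(recipient_public, signed)


def unseal(sealed):
    """Peel the layers in reverse: recipient's private, then sender's public."""
    signed = apply_key(recipient_private, sealed)
    return bytes(apply_key(sender_public, signed))


print(unseal(seal(b"meet at noon")))
```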

Many Blockchain systems use this system to verify that the person (or people, or computer program) authorizing a transaction is in fact allowed to do so. In fact, because key pairs are cheap and plentiful, Bitcoin wallets are encouraged to generate a fresh pair of keys for each transaction rather than reusing old ones.

Back to my surplus of email: None of my banks or healthcare providers have deployed this nearly 40-year-old capability for communicating with me. Instead, a growing fraction of my inbound email consists of notifications that I have a message waiting on some “secure message center.” I am exhorted to click a link and sometimes required to enter my password in order to see the message.

This practice is actively harmful. Fraudulent links in emails are among the primary vectors by which computers are infected with malware. When we teach the absolute basics of information security, “don’t click the link” comes right after “don’t share your password,” but before “we will never ask for your password.”

Email systems that use PKE have been around since I’ve been using technology, and somehow my bank and my hospital haven’t caught on. The HIPAA requirement to use “secure messaging” has driven them backwards, not forwards.

Perhaps if we call it “Blockchain messaging,” it’ll finally catch on.