Putting data beyond ownership

Some blockchain implementations – public, non-permissioned ones in particular – have a really interesting property:

Nobody owns them.

As a sentence, it’s a simple statement. There is no owner.

As I’ve considered it, however, this is far more subtle and powerful than merely having a NULL value in the field where we usually write down the owner’s name. It is not merely that these systems do not have an owner right now, nor that there -is- an owner who has given broad permissions around the use and re-sharing of the data.

Rather, these systems are by design non-ownable. It might be accurate to describe them as “self-sovereign,” though in a limited sort of way.

The foundational example of a blockchain system, Bitcoin, was designed to solve an accounting problem in a low-trust environment: The designers wanted to keep a ledger of transactions among a group of parties who did not trust each other to keep the books. They also didn’t trust any particular third party, whether it was a bank or a nation state.

The solution to that problem was to create an escrow service. In effect, the Bitcoin network created the third party that they needed. That third party was sufficiently disinterested and transparent that the participants were comfortable using it to keep the books.
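
To make the idea of transparent books a little more concrete, here is a toy sketch in Python of a hash-chained ledger. It is an illustration only, not how the network is actually implemented: it leaves out mining, consensus, signatures, and networking, and the names `append_entry` and `verify` are hypothetical. The one thing it shows is that anybody holding a copy can recheck the whole history, so a quietly edited entry is detectable without having to trust a bookkeeper.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """Hash an entry's payload together with the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(ledger, payload):
    """Append a payload, linking it to the hash of the entry before it."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "prev": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    })


def verify(ledger):
    """Recompute every link; return False if any entry was altered."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append_entry(ledger, {"from": "alice", "to": "bob", "amount": 5})
append_entry(ledger, {"from": "bob", "to": "carol", "amount": 2})
print(verify(ledger))  # True: the untouched history checks out

ledger[0]["payload"]["amount"] = 500  # someone quietly edits the books
print(verify(ledger))  # False: the stored hash no longer matches
```

In this sketch the check needs nothing but the data itself, which is the sense in which the record keeper can be “nobody.”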

People are making increasingly clever use of this property of Bitcoin and other public chains. It serves as a highly trustworthy and disinterested repository of data. It is useful precisely because there is no chance of a conflict of interest or an external force changing the system. While we do worry about it failing and becoming unviable, we do not worry about the owner being bought by a competitor or compelled by a nation state.

There is no owner to compel. I keep coming back to how -odd- that is.

Note that this is distinct from the question of whether the data itself is public, private, or bound by one license or another. Uploading copyrighted material to a blockchain service does not invalidate the copyright. Instead, it means that the party that received the data isn’t a person or a company.

This new model opens up possibilities for system designers and data strategists. Many of the challenges in our health care system center on the obligations, risks, and potential rewards of being the organization that owns the systems that hold the data.

Under current law, the data in a person’s medical records belongs to that person. However, the electronic medical record systems that hospitals use to record the data belong to the hospital. Hospitals exist in a web of frequently conflicting pressures, risks, and incentives. Anybody who has navigated the American health care system knows the awful challenges involved in accessing and sharing data that we supposedly “own.”

To be clear, using the Bitcoin network to directly store electronic medical records is a terrible, half-baked, hopelessly naive idea. Bitcoin is built on the wrong set of incentives for health care. It operates at the wrong tempo. It also has structural and social baggage that makes it a bad choice for this particular application.

More fundamentally, data that should remain private over the long term should not go on public chains, even encrypted. Encryption has a shelf life. The computers of the future will be powerful enough to break the encryption of today.

With that said, I do think that there is a great opportunity to change the rules of the game by considering which slices of information hospitals and insurance companies and research organizations hold closely because they can’t trust each other. Then we should ask: “What if we let ‘nobody’ hold some of that information, instead of any of us?”

Nobody cares about the cloud

Nobody cares about the cloud.

It’s a statement that requires a bit of explanation – since “the cloud” is so ubiquitous. I see it advertised on highway billboards, in airport concourses, and everywhere in between. I believe that my current resume and LinkedIn profile may even mention “cloud transformation.”

All of us connected to information technology have been immersed in a decade-long frenzy of cloud. Personally, I’ve built production services using at least five different public clouds. I’ve built private clouds using both OpenStack and VMware. I’ve smeared a single service across multiple public and private clouds (hybrid cloud), used a public cloud to provide additional capacity beyond my on-premise infrastructure (cloud bursting), and hosted one service on one public cloud and a different service on a different cloud – all for the same enterprise (multi cloud).

I’ve made fun of more than one vendor who tried to convince me that their same old capital-intensive, on-premise offering was competitive and modern because they had switched to monthly billing with bundled support (enterprise cloud).

I’ve glared balefully at people who tittered about calling a legacy on-premise infrastructure a “fog.”

It’s a cloud that’s close to the ground. Get it?

And yet I claim that nobody cares about the cloud. I suppose that, to be completely honest, I actually mean that I don’t care about the cloud.

I mean this in the same way that I might say that I don’t care about any particular Linux distribution, any particular CPU or GPU, or the very latest iPhone.

I’m a technologist. I use technology – but not for its own sake.

I care about what these technologies can let us do.

I care very much about accelerating the tempo of biomedical research and drug discovery. I have been in this industry for nearly two decades now, and it is going too damn slowly.

I care about democratizing the benefits of genomic medicine. Access to these cures is still far too dependent on being rich and having the right connections. Technology can help to reduce that unfairness.

I care about helping my clients to be more competitive as they search for leads, compounds, patents, products, and cures.

I care about ensuring that caregivers, from family through to physicians, have accurate and timely information – and that they can share that information with each other to give the very best care.

I care about appropriate usage of information. I care about preserving privacy while still enabling data-driven therapies and intelligent interventions.

I care about empowering people, with both technology and education, to make informed choices.

I care about, in the words of Paul Farmer, “preventing stupid deaths.”

The cloud is a means to an end. It’s a fuzzy and overhyped term for a remarkable and transformative set of technologies. It’s right at the very core of my industry, and I really truly do not “care” about it at all.

I care about the mission.

I don’t care about the cloud.

Bioinformatics vs Computational Biology

In the past, I used the terms “bioinformatics” and “computational biology” somewhat interchangeably. I don’t do that anymore.

  • Bioinformatics is about reusable tools and information resources for biology.
  • Computational Biology is about biological insight.

It’s not necessarily a distinction between two different types of person. I know plenty of people who can do either type of work, building tools and then using them in their science. That said, people do seem to sort themselves to focus either on tool building or else on hypothesis testing.

The distinction is important because the metrics of project success are different, because we should manage projects and teams differently based on those metrics, and because career advancement and mentoring diverge rapidly between the two.

As always, I’m interested in your feedback, particularly if you disagree with me!