The game of kings

A very smart and well informed colleague recently shared a thought that disturbed me. I’m writing it here mostly to get it out of my head, and also in the hopes that the eminently quotable Admiral Rickover will once again be proved right: “Weaknesses overlooked in oral discussion become painfully obvious on the written page.”

Here’s the observation: Machine learning and Artificial Intelligence have become a game of kings. The field is now the competitive arena for the likes of Microsoft, Google, Amazon, Facebook, and IBM. When companies of this scale compete, they do so with teams of thousands of people and spend (in aggregate) billions of dollars. The people on these teams are not a uniform sampling of their industry; they are the elite – high-level professionals with the freedom to be choosy about their jobs.

The claim is that this presents an insurmountable barrier to entry for anyone who is not on one of those teams. Prosaically, when the King’s Hunt is afield, those of us without the resources of a king are well advised to stay out of the way.

In his words: “If you want to have an impact in AI or ML, the only real choice is which of the billionaires you want to work for.” Further, if you want to use these technologies, the only real choice is which billionaire to buy from.

I find this to be depressing, but not necessarily flawed. It would be easy (and potentially even more accurate) to make the same argument about computational infrastructure in the age of public exascale clouds.

There’s also an insulting subtext to the argument: If you are working with or on ML and AI and are not working for or with a billionaire, your work is de facto pointless. Further, all the most talented people are flocking to join the King’s teams – maybe it’s just that you didn’t make the cut?

Did I mention that this particular colleague works part-time for Google? It reminds me of the joke about CrossFit: “How can you tell that somebody does CrossFit? Oh, don’t worry, they’ll tell you.”

With all that said, I don’t buy it. I fall back on Margaret Mead’s famous quote: “Never doubt that a small group of thoughtful, committed citizens can change the world; indeed, it’s the only thing that ever has.”

I harbor a deep-seated optimism about people. Everywhere I go, individuals and small teams absolutely sparkle with creativity and intelligence. These people are not the ‘B’ players, sad that they couldn’t make the cut to join the King’s hunting team. For my entire career, brilliant, hardworking innovators and entrepreneurs have been disrupting established power structures and upending entire markets. They don’t do this by fielding a second tier team in the old game – instead they invent a new game and change the world.

So while the point may be valid for established commodities, it is a bridge too far (and quite the leap of ego) to write off the combined innovative energy of the whole rest of the world.

I would welcome conversation on this. It feels important.

The blockchain part of Blockchain

The blockchain data structure (which is a part of, but distinct from, the larger Blockchain ecosystem) consists, perhaps unsurprisingly, of an ordered series of “blocks.”

In addition to a payload of data and a few other housekeeping values, each block (except the first one, the “genesis” or “origination” block) contains the hash of the previous block. As described in a previous post, hash values are easy to verify and challenging to fake. A block is valid if it contains the hash of its predecessor. A valid blockchain contains only valid blocks.

A valid blockchain demonstrates an order of events. One cannot create a block without referring to the prior one. If the hashes are correct, we know that the blocks were created in sequential order, and therefore that the data stored on the chain was also written in that order. We know the relative order in which the data was written (we can’t generate a subsequent block without all the prior ones). We also know that the data has not changed since being written (changes to a block will change the hash, and require changes to all future ones).
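For the concretely minded, here is a minimal sketch of that structure in Python. The names (make_block, chain_is_valid) and the choice of SHA-256 are my own illustrations, not taken from any particular implementation.

```python
import hashlib
import json


def block_hash(block):
    """Hash a block's entire contents with SHA-256."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def make_block(payload, previous_block=None):
    """Create a block that records the hash of its predecessor.

    The genesis block has no predecessor, so its previous_hash is empty.
    """
    previous_hash = block_hash(previous_block) if previous_block else ""
    return {"payload": payload, "previous_hash": previous_hash}


def chain_is_valid(chain):
    """A chain is valid if every block holds the hash of the block before it."""
    return all(
        current["previous_hash"] == block_hash(prev)
        for prev, current in zip(chain, chain[1:])
    )


genesis = make_block("first entry")
chain = [genesis, make_block("second entry", genesis)]
chain.append(make_block("third entry", chain[-1]))
print(chain_is_valid(chain))  # True

# Tampering with an early block invalidates every hash reference after it.
chain[0]["payload"] = "rewritten history"
print(chain_is_valid(chain))  # False
```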

Notably, we get no promises at all concerning validity or security. Merely storing information in a blockchain data structure does not make it correct, complete, or private. In fact, since most Blockchain systems are distributed ledgers (the topic of a future post), information on the chain is somewhat radically public. Every node in most Blockchain networks eventually sees every piece of data on the chain.

Bitcoin and some (but not all) Blockchain systems up the ante on what constitutes a valid block by adding a nonce. The nonce is a value that, when added to a block, yields a hash with specific and rare properties. This imposes a cost, called “proof of work,” on creating blocks. When creating a new block, authors must try (on average) a large number of nonces until they find one that yields a valid hash. The point of this is to make it computationally challenging merely to create a single new valid block at the end of the chain, and prohibitive to go back and corrupt earlier blocks.
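A toy version of that rule, continuing the sketch above: here a hash is “valid” only if it starts with a fixed number of zero hex digits. The digit count is an illustrative stand-in for the numeric target that real systems like Bitcoin use.

```python
import hashlib
import itertools

DIFFICULTY = 4  # required number of leading zero hex digits (illustrative)


def mine(block_bytes):
    """Try nonces until the combined hash meets the difficulty rule."""
    for nonce in itertools.count():
        digest = hashlib.sha256(block_bytes + str(nonce).encode()).hexdigest()
        if digest.startswith("0" * DIFFICULTY):
            return nonce, digest


def verify(block_bytes, nonce):
    """Verification is a single hash, no matter how long the search took."""
    digest = hashlib.sha256(block_bytes + str(nonce).encode()).hexdigest()
    return digest.startswith("0" * DIFFICULTY)


nonce, digest = mine(b"payload|previous_hash")
print(nonce, digest)                            # tens of thousands of tries, on average
print(verify(b"payload|previous_hash", nonce))  # True, after a single hash
```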

The computational work of “mining” in the Bitcoin system is actually just searching for valid nonces. This is sufficiently different from conventional mining that it bears saying: In the usual use of the word “mining,” we are seeking out and refining a valuable resource. In Blockchain systems that use proof of work, the rare and precious resource at hand is the trustworthiness of the system itself. Value is not removed by the mining operation – it is actually being created.

Proof of work and the nonce

The blockchain technology ecosystem brings together a diverse set of codes and algorithms developed over the past 50-ish years. It includes decades-old cryptographic techniques like hashing and symmetric/asymmetric key encryption, as well as relatively recent innovations related to distributed consensus.

The Blockchain ecosystem reminds me of the classic radio tag-line: It’s the best of the 80’s, 90’s, and today.

Proof of work is one component of that ecosystem. It is used to prevent denial of service attacks, in which large numbers of messages swamp and degrade a system. The system works by imposing a computational cost on the creation of valid messages. Receivers check whether messages are valid before they pay any attention to the contents.

The proof of work described in the original Bitcoin paper is based on a system called Hashcash, which was developed in 1998 to combat spam email. The sender is required to find a value called a nonce that is specific to a particular message, and that demonstrates that they put effort into creating the message. A valid nonce is rare to find by chance, but easy to verify once found.

This property – numbers and relationships that are challenging to find, but trivial to verify – is the basis of most of modern cryptography. Hash functions are one example. A hash function takes arbitrary input and returns a value within a fixed range. In a good cryptographic hash, the result (sometimes simply called the “hash” of the input) is randomly distributed across that range. It is difficult to author an input to get any particular hash value.
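A quick illustration using SHA-256 from Python’s standard library: any input, short or long, maps to a 256-bit result, and a one-character change to the input yields an unrelated result.

```python
import hashlib

# Inputs of any length map to a fixed-size, 256-bit (64 hex digit) value.
print(hashlib.sha256(b"a message of arbitrary length...").hexdigest())

# Changing a single character produces a completely different hash,
# which is why it is so hard to author an input for a chosen hash value.
print(hashlib.sha256(b"hello").hexdigest())
print(hashlib.sha256(b"hello!").hexdigest())
```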

The hashcash algorithm is simple: The nonce is combined with the message to be sent, and the combination is hashed. The hash result must be small relative to all possible hash results. Exactly how small is a parameter that can be used to tune the algorithm.

For example, if the hash function returns a 256-bit value, there are 2^256 possible results. If we insist the nonce be a value that makes the first 16 of those bits ‘0’, we are insisting that senders find one of the 2^240 qualifying values from among 2^256 possible hash results. The probability of this happening by random chance is one in 2^16, or something like 1 in 65,000.

On average (assuming that we have picked a good hash function) senders will have to try 2^16 nonces before finding a valid one. If we assume that each hash takes 1 second to calculate on a single CPU, the sender would invest (on average) slightly under a CPU day per message.
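Here is a sketch of that hashcash-style stamp with the 16-bit requirement from the example above. The message text and the use of SHA-256 are my own illustrative choices; on modern hardware the search below finishes in a fraction of a second rather than the hypothetical CPU day.

```python
import hashlib
import itertools

ZERO_BITS = 16  # required leading zero bits, the tunable difficulty


def leading_zero_bits(digest):
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits


def stamp(message):
    """Search for a nonce whose combined hash has enough leading zero bits."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{nonce}:{message}".encode()).digest()
        if leading_zero_bits(digest) >= ZERO_BITS:
            return nonce


print(stamp("hello, recipient@example.org"))  # ~2**16 (about 65,000) tries on average
```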

In the email system proposed in 1998 (I would love to use something like this, by the way), senders invest some amount of computation in creating a nonce for each message. Receivers sort or apply thresholds based on the value of the hash. Low-numbered hashes represent an investment in the message. Human beings who type or dictate messages to small numbers of recipients won’t even notice the additional compute effort. Mass marketing campaigns will be expensive.

This exact computation is the work of “mining” in the Bitcoin network. The language of “mining” or “finding” bitcoins obscures the fact that we’re actually searching for nonces.

Of course, compute power keeps getting cheaper, so we need a flexible system. Fortunately, the tunable difficulty parameter makes this simple. If compute performance on hash functions were governed by Moore’s law (it’s actually a bit more complex), then we would need to increase the strictness of our nonce requirement by one bit every two years.
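A back-of-the-envelope check of that claim, with made-up starting numbers: if the hash rate doubles every two years and we add one required zero bit per doubling, the expected time to find a nonce stays constant.

```python
start_rate = 2 ** 16  # hashes per second (made-up illustrative figure)
start_bits = 16       # required leading zero bits today

for years in range(0, 11, 2):
    doublings = years // 2
    rate = start_rate * 2 ** doublings   # Moore's-law growth in hash throughput
    bits = start_bits + doublings        # one extra required zero bit per doubling
    expected_seconds = 2 ** bits / rate  # expected tries divided by tries per second
    print(years, bits, expected_seconds)  # the expected time never changes
```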

The Bitcoin network has been tuning its proof of work to produce valid blocks at a remarkably consistent rate of about one every ten minutes since 2010.

P.S.: Thanks to Eleanor of Diamond Age Data Science for this post explaining the difference between probabilities and likelihoods. An earlier version of this post used the words incorrectly.

The unicorn rant

In biotech these days, I hear a lot of talk about “unicorns.” Sometimes they are rare fancy unicorns … purple, or glittery. At Bio IT World, I found myself moderating a conversation that involved herds and farms of these imaginary animals.

Of course, we were talking about finding and retaining top talent. In the staffing world, “unicorn” is the codeword for an impossibly ideal candidate with a rare mix of skills and experiences. My friends in the recruiting and staffing industries spend their days chasing unicorns. It seems really stressful for them.

Here’s the thing: Unicorns don’t exist.

I’m an engineer by training. I spend a lot of time designing and debugging complex systems. As a rule of thumb, if the plan relies on a continuous supply of something that is either vanishingly rare or (worse) nonexistent – it is a bad plan. When brainstorming, we might joke about knowing a reliable supplier of unobtanium. Sometimes we trot out the old cartoon with the guy saying “and then a miracle occurs.” Eventually, however, engineers sigh and set to work on a better plan.

Not so with many hiring managers, senior leaders, board members, and venture firms in biotech. From what I hear, the plan is to fight harder for the unobtanium, to hope for the miracle.

We need a better plan.

Before going further, I want to first reaffirm my commitment to finding and retaining the best people. Of course people make the difference. Of course we should be highly selective. And yes, of course there are massive, critical differences between candidates. It is a false comparison and a strawman argument to suggest that “making do with a third rate workforce, indiscriminately chosen,” is the only alternative to the unicorn quest.

There are three major pieces to building an organization that does not rely on unicorns:

  • Managers must assume the full time job of supporting and developing their teams.
  • Project plans, workflows, and team behaviors must err on the side of granular, achievable work – with mechanisms to self-correct when the plan is wrong.
  • Recruiting must focus on attitude and enthusiasm, not on finding the next hero.

The non-unicorn plan is straightforward to say, but requires diligent effort and consistency: Divide work into achievable pieces (planning, architecture, and project management are real jobs), hire enthusiastic and intelligent people (give recruiting and HR a fighting chance), and give those people the resources they need (management is a real job).

There’s plenty of literature on this, but you won’t find it in the sci-fi fantasy or the young adult section of the bookstore. Instead, do a quick google on “Hero culture.” You may find yourself reading about burnout, mythical man-months, success catastrophes, and flash-in-the-pan companies.

A more subtle pathology of the unicorn fetish is that it encourages the worst sort of bias and monoculture. When the written criteria are unachievable (unicorn!), then the hiring decision is actually subjective. Rejecting candidate after candidate based on “fit” or poor interview performance is almost always a warning sign that we’re in bias and blind-spot territory.

As an aside, please recall that interviews are among the worst predictors of job performance.

From the candidate perspective, unicorn recruiting is simple: The best opportunities are only available to the people who have already had the best opportunities (the paper qualifications), and who give favorable first impressions to the hiring manager (bias and cronyism). From what I can see of the startup culture in both Boston and San Francisco, this is in fact the situation. In both cities, we have large populations of motivated people actively seeking work while recruiters work themselves to death. Meanwhile hiring managers make sci-fi/fantasy metaphors to support staffing plans that are based on miracles.

We can do better.

Finally, if none of that convinces you, then perhaps consider the traditional mythology about who, exactly, should be sent to capture a unicorn.

Either way, we’re doing it wrong.

The second decade of the cloud

In my talk at Bio-IT World this year, I made some comments about “cloud” technologies that I think bear repeating.

2017 is somewhere in the middle of the second decade of the cloud.

Of course, when I say “cloud,” I mean much more than mere virtualization. You don’t get the 2017-benefits of “going to the cloud” by just hosting your legacy architecture in Amazon’s us-east-1 region. Nor do you get them by putting your one-server-per-service enterprise on a fancy VMware / ESX system, no matter how “hyperconverged” it may be. That’s the kind of misuse of the technology that has kept the “on-prem vs cloud” boondoggle alive so far past its expiration date.

Virtualization, of course, is a very good idea. Depending on how you define it, we’re in at least the third decade of OS level virtualization. That’s even more of a solved problem than the cloud.

The benefits of the cloud in 2017 accrue when you adopt cloud-native architectures. This entails substantially more work than porting a system to a hosted platform. It is also absolutely worth it for all but the longest of the long tail of legacy systems.

A bit of history: Amazon Web Services launched as a platform in 2002, and re-launched with EC2 and S3 in 2006. At the time, I worked for BioTeam. Less than a year later, in 2007, we noticed that every single member of the technical team had independently chosen at least one AWS based solution for a customer need. There was no corporate mandate – it was the right way to do the engineering.

At the time, I was responsible for many aspects of Bioteam’s “Inquiry” software product. By early 2008, we had ported our software to AWS and were offering it under license terms that still read pretty well, 9 years on.

While the FAQ above has aged well, that 2008 port of Inquiry looks pretty dusty in the bright lights of 2017. We took a legacy HPC / batch computing architecture and we virtualized it to run on AWS. There is certainly some forward-looking stuff in there, hosts that spin up and down in response to backlogs of work, and also some cleverness around staging data to and from S3. However, it bears little resemblance to the approach that one might take today.

Chris Dagdigian put it well at Bio-IT World: Many cloud-native system architectures do not have a direct “on-prem” analogue. In particular, Lambda and serverless architectures are challenging to explain in terms of the systems that we built in 2006.

As just one small example: On the Inquiry port, we spent a lot of time convincing our old faithful HPC job scheduler, Sun Grid Engine, to be okay with hosts appearing and disappearing all the time. In our hosted, legacy architecture, SGE interpreted many aspects of the cloud as repeated failure. Compare that with even the most basic autoscaling architectures – to say nothing of the wizardry behind tools like Amazon’s Athena. Athena is frankly a bit mind-bending for somebody who, less than 10 years ago, made a good living building less-usable systems to do less-robust data analysis.

I find it clarifying to think about “cloud” from the perspective of a non-technologist. When the CFO, COO, or CSO thinks of “cloud,” or articulates a “cloud first” strategy, they almost certainly have business or scientific metrics in mind, rather than technical niceties like where the metal happens to live. When executives ask for “cloud,” in my experience they are asking for things like:

  • Remove an entire category of off-mission task from the in-house team.
  • Make technology updates totally seamless and automatic.
  • Vastly simplify licensing and budgeting – budget in terms of headcount, not opaque version numbers and product families.
  • Scale without limit, even in the event of a “success catastrophe.”

Note that merely virtualizing a legacy architecture onto Amazon, Google, or Microsoft (or yes, even one of the at-least-six-way-tie-for-fourth-place other public cloud providers) provides zero of the benefits for which we are sent off to “cloud.”

The good news: These benefits are, in fact, possible for scientific and high performance computing. It will not be as easy as it was with human resources or office productivity tools, but we will do it. And it will not be as simple as moving everything to AWS us-east-1.

Blockchain

This summer, I have the opportunity to think deeply about the ecosystem of technologies that go by the name “Blockchain.” I’m focusing particularly on how these might apply in a couple of different scientific and healthcare contexts. I plan to post snippets here from time to time, as much to force me to clarify my thinking as anything else. As Hyman Rickover said, "Nothing so sharpens the thought process as writing down one's arguments. Weaknesses overlooked in oral discussion become painfully obvious on the written page."

One challenge when trying to talk about blockchain is that it is massively hyped – sitting right at the peak of Gartner’s hype cycle. Most of the meetings I go to these days include at least one person who asks, regardless of the topic at hand, “what about blockchain?”

Another challenge is that Blockchain is strongly associated with Bitcoin and other cryptocurrencies. The hype brings a certain breathiness to the conversation, while the finance connection brings associations with fraud and nefarious dealings. Neither of these is entirely merited – but I’m finding it important to keep in mind as I explore.

For all that, the foundational documents are remarkably crisp, lucid, and readable. The original bitcoin paper, written under the pseudonym “Satoshi Nakamoto,” is only eight pages long – plus a half page of references.

It’s clear to me that there’s important work to be done in this space – and I’m thrilled to have the time to take part.

Bio-IT World 2017

Last week, one of my favorite conferences celebrated its 15th year: Bio-IT World. This is one of only a few forums where my particular professional community comes together to share our experiences, re-connect, and to try to catch a glimpse of what the future might hold. I’ve been showing up since at least 2004, and it’s always great to see old friends and to meet new ones.

I’m on the committee that awards “Best Practices,” and also “Best In Show.” This year I didn’t actually get to walk around and see the teams present their work. I missed that – it’s a private tour of the most interesting products and exhibitors. We were amazed at the strength of the entries this year. We saw projects from clinical diagnostics, agricultural genomics, semantic reasoning, and so very much more. The choices are never straightforward, but this year turned some sort of a corner for me in terms of direct impact and technical maturity.

The most fun two hours of the show was moderating the re-imagined “Trends from the Trenches.” Chris Dagdigian has been giving that talk since 2008, and he finally won his multi-year battle to expand it beyond just him. We re-framed the session as a panel / mini-symposium including five Bioteam people. It was a thrill to try to keep up with that crew, live and in real time. We tried a couple of experiments in audience interaction while we were at it: Slido to submit and vote on topics, and a pair of Catchbox throwable microphones. There was immediate synergy between the two systems: The very first Slido question was “can you reach the back of the auditorium with the Catchbox?” I couldn’t, but got an audience assist and it worked out okay.

I also got to give a solo talk in the Data track, focused on a couple of practical points about cloud technologies and data management. I’ve put those slides online at Slideshare.

Thanks to everybody at CHI who worked so hard to put together a great show. See you next year!