The Electronic Medical Mess

I posted a quick tweet this morning about the state of data in health care.

Over the years, I’ve worked with at least half a dozen projects where earnest, intelligent, diligent folks have tried to unlock the potential stored in mid to large scale batches of electronic medical records. In every case, without exception, we have wound up tearing our hair and rending our garments over the abysmal state of the data and the challenges in getting access to it at all. It is discordant, incomplete, and frequently just plain-old incorrect.

I claim that this is the result of structural incentives in the business of medicine.

What is a Medical Record?

Years ago the medical record was how physicians communicated amongst themselves. The “clinical narrative” was a series of notes written by a primary care physician, punctuated by requests for information and answers from specialists. Physicians operated with an assumption of privacy in these notes, since patients didn’t generally ask to see them. Of course they were still careful with what they wrote. If things went sideways, those notes might wind up being read aloud in front of a judge and jury.

In the 80’s, electronic medical records (EMRs) added a new dimension to this conversation. EMRs were built, in large part, to support accurate and timely information exchange between health care organizations and “payers” including both corporate and government insurance. EMRs digitized the old clinical narrative basically unchanged. They sometimes allowed in-house lab values to be transferred as data rather than text, though in many cases that sort of feature came much later. Most of the engineering effort went into building a framework for billing and payment.

The savvy reader will note that neither of these is a particularly good way to build a system for the collection of patient data.  Instead, we’re dealing with risk avoidance.

A Question of Risk and Cost

Being the Chief Information Officer (CIO) of a health care system or a hospital is a hard, stressful, and frequently thankless job. Information Technology (IT) is usually seen as a cost center and an expense rather than as a driver of revenue. A savvy CIO is always looking for ways to reduce costs and allow their institution to put more dollars directly into the health care mission. Successful hospital CIOs spend a lot of time thinking about risk. There are operational risks from attacks like ransomware, compliance risks, risks that the hospital will expose patient data inappropriately, financial risks from lost revenue, legal risks from failing to meet standards of care, and many more.

These pressures lead to a very sensible and consistent mindset among hospital CIOs: They have a healthy skepticism of the “new shiny,” an aversion to change, and a visceral awareness of their responsibility to consistent and compliant operations.

So physicians are incentivized to avoid litigation, hospital information systems are incentivized to reduce exposure, and the core software we use for the whole mess is written primarily to support financial transactions.

Every single person I’ve ever met in the business and practice of health care, without exception, wants to improve patient lives. This is not a case where we need to find the bad, the malevolent, or the incompetent people and replace them. Instead, it’s one of those situations where good, smart hardworking people are stuck with a system that we all know needs a solid shake-up.

That means that when someone (like me) shows up and proposes that we change a bunch of hospital practices (including modifying that damn EMR software) so that we can gather better data, it falls a bit flat. If I reveal my grand plan to take the data and use it for some off-label purpose like improving the standard of care globally, I am usually politely but firmly shown the door.

But it gets worse.

Homemade Is Best

Back in the bad old days, it was possible to convince ourselves that observations made by physicians were the best and only data that should be used in the diagnosis of disease. That’s demonstrably untrue in the age of internet connected scales and wearable pulse and sleep monitors. I’ve written before about the reaction I receive when I show up to my doctor as a Patient With A Printout (PWP). Even here in 2019, there are not many primary care physicians who are willing to look at data from a direct to consumer genetics or wellness company.

The above isn’t strictly true. I know lots of physicians who have a very modern approach to data when we talk over coffee or dinner. However, at work, they have to do a job. The way they are allowed to do that job is defined by CIOs and hospital Risk Officers who grow nervous when we try to introduce outside data sources in the clinical context. What assertions do we have that these wearable devices meet any standards of clinical care? Who, they might ask, will be legally responsible if a diagnosis is missed or an incorrect treatment applied?

So we’re left with a population health mindset that says “never order a test unless you know what you’re going to do with the result,” except that in this case it’s “don’t look at a test that was already done, you might wind up with an inconvenient incidental finding, and then we’ll have to talk to legal.”

Health systems incentivize risk avoidance above more accurate or timely data. They do this because they are smart, and because they want to stay in business.

So we collect information with a system tuned for billing, run by people whose focus is on risk avoidance. Is it any wonder that when we extract that data, what we find is a conflicting and occasionally self-contradictory mess?

There’s no incentive to have it any other way.

A Better Way

Here in 2019, most people who pay attention to such things believe that data driven health insights will lead to better clinical outcomes, better quality of life, lower overall costs for health care, and many other benefits.


One ray of hope comes from the online communities that spring up to connect people with rare and terrible diseases. These folks share information amongst friends, family, researchers, and physicians as they search desperately for any hope of a cure. Along the way, they create and curate incredibly valuable data resources. The difference between these patient centric repositories and any extraction that we might get from an EMR is simply night and day.

A former colleague was fond of saying, “a diagnosis of cancer really clarifies your thinking about the relative importance of data privacy.”

Put another way: If we put the patient at the center of the data equation, rather than payment, we’re really not that very far from a much better world – and all those wonderful technologies I mentioned will suddenly be quite useful.

Unfortunately, that’s a political question these days.

So where do we go from here? I’m not sure.

I do know for certain that -merely- flinging the messy pile of junk against the latest Machine Learning / Artificial Intelligence / Natural Language Processing software, without addressing the underlying data quality, is unlikely to yield durable and useful results.

Garbage in, garbage out – as the saying goes.

I would love to hear your thoughts.

Letting the genome out of the bottle

About eleven years ago, in January of 2008, the New England Journal of Medicine published a perspective piece on direct to consumer genetic tests, “Letting the Genome out of the Bottle, Will We Get Our Wish.” The article begins by describing an “overweight” patient who “does not exercise.” This man’s children have given him the gift of a direct to consumer genetics service at the bargain price of $1,000.

The obese person who (did we mention) can’t be troubled to go to the gym is interested in medical advice based on the fact that they have SNPs associated with both diabetes and cardiovascular disease. The message is implied in the first paragraph, and explicitly stated in the last:  “Until the genome can be put to useful work, the children of the man described above would have been better off spending their money on a gym membership or a personal trainer so that their father could follow a diet and exercise regimen that we know will decrease his risk of heart disease and diabetes.”

Get it?  Don’t bother us with data.  We knew the answer as soon as your heavy footfalls sounded in the hallway.  Hit the gym.

The authors give specific advice to their colleagues “for the patient who appears with a genome map and printouts of risk estimates in hand.”  They suggest dismissing them:  “A general statement about the poor sensitivity and positive predictive value of such results is appropriate … For the patient asking whether these services provide information that is useful for disease avoidance, the prudent answer is ‘Not now — ask again in a few years.'”

Nowhere do the authors mention any potential benefit to taking a glance at the sheaf of papers this man is clutching in his hands.

Just 10 years ago, a respected and influential medical journal told primary care physicians to discourage patients from seeking out information about their genetic predisposition to disease.  Should someone have the nerve to bring a “printout,” they advise their peers to employ fear, uncertainty, and doubt. They suggest using some low-level statistical jargon to baffle and deflect, before giving answers based on a population-normal assumption.

The reason I’m writing this post is because I went to the doctor last week and got that exact answer, almost verbatim.  I already went off about this on twitter.  I’m writing this because I think that it may benefit from a more nuanced take.

More on that at the end of the post.

Eight bits of history

For all its flaws, the article does serve as a fun and accessible reminder of how far we have come in a decade.

I did 23andme when it first came out. I’ve downloaded my data from them a bunch of times.  Here are the files that I’ve downloaded over the years, along with the number of lines in each file:

cdwan$ wc -l genome_Christopher_Dwan_*
576119 genome_Christopher_Dwan_20080407151835.txt
596546 genome_Christopher_Dwan_20090120071842.txt
596550 genome_Christopher_Dwan_20090420074536.txt
1003788 genome_Christopher_Dwan_Full_20110316184634.txt
1001272 genome_Christopher_Dwan_Full_20120305201917.txt

The 2008 file contains about 576,000 data points.  That nearly doubled, to a bit over a million, when they updated their SNP chip technology in 2011.

The authors were concerned that “even very small error rates per SNP, magnified across the genome, can result in hundreds of misclassified variants for any individual patient.”  When I noticed that my results from the 2009 download were different from those in 2008, I wrote a horrible PERL script to figure out the extent of the changes. I still had it sitting around on my laptop, so I ran it again today. I was somewhat shocked that it worked on the first try, a decade and at least two laptops later!

My 23andme results were pretty consistent. Of the SNPs that were reported in both v1 and v2, my measurements differ at a total of 54 loci. That’s an error rate of about one hundredth of one percent. Not bad at all, though certainly not zero.
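For anyone who wants to run the same kind of comparison, here is a minimal Python sketch. It is not the original script. It assumes the tab-separated raw data layout (rsid, chromosome, position, genotype, with "#" comment lines), the function names are made up for illustration, and skipping no-calls ("--") is a judgment call that the original may have handled differently.

```python
import io

def load_genotypes(handle):
    """Map rsid -> genotype from a tab-separated raw data file."""
    calls = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # ignore malformed rows
        calls[fields[0]] = fields[3]
    return calls

def compare_calls(old, new, no_call="--"):
    """Return (shared, differing) for SNPs that were called in both files."""
    shared = 0
    differing = []
    for rsid, old_gt in old.items():
        new_gt = new.get(rsid)
        if new_gt is None or no_call in (old_gt, new_gt):
            continue  # not present in both files, or not called in one of them
        shared += 1
        # Treat "AG" and "GA" as the same call.
        if sorted(old_gt) != sorted(new_gt):
            differing.append(rsid)
    return shared, differing

# Tiny made-up example standing in for two downloads.
v1_text = "\n".join([
    "# rsid\tchromosome\tposition\tgenotype",
    "rs1\t1\t100\tAG",
    "rs2\t1\t200\tCC",
    "rs3\t2\t300\t--",
    "rs4\t2\t400\tTT",
])
v2_text = "\n".join([
    "# rsid\tchromosome\tposition\tgenotype",
    "rs1\t1\t100\tGA",
    "rs2\t1\t200\tCT",
    "rs3\t2\t300\tAA",
    "rs5\t3\t500\tGG",
])
old_calls = load_genotypes(io.StringIO(v1_text))
new_calls = load_genotypes(io.StringIO(v2_text))
shared, differing = compare_calls(old_calls, new_calls)
print(shared, differing)
```

On real files, `len(differing) / shared` is the per-SNP discordance. If nearly all of the roughly 576,000 SNPs from 2008 also appear in the later file, 54 differences works out to a bit under 0.01%.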

For comparison, consider the height and weight that usually get taken when you visit a doctor’s office. In my case, they do these measurements with shoes and clothing on – meaning that I’m an inch taller (winter boots) and about 8 pounds heavier (sweater and coat) if I see my doctor in the winter. Those are variations of between 1% and 5%.

Fortunately, nobody ever looks at adult height or weight as measured at the doctor’s office. They put us on the scale so that the practice can charge our insurance providers for a physical exam, and then the doctor eyeballs us for weight and any concealed printouts.

A data deluge

Back to genomics: $1,000 will buy a truly remarkable amount of data in late 2018.  I just ordered a service from Dante Labs that offers 30x “read depth” on my entire genome.  They commit to measure each of my 3 billion letters of DNA at least 30 times.  Taken together, that’s 90 billion data points, or 180,000 times more measurements than that SNP chip from a decade ago.  Of course, there’s a strong case to be made that those 30 reads of the same location are experimental replicates, so it’s really only 3 billion data points or 6,000 times more data. Depending on how you choose to count, that’s either 12 or 17 doublings over a ten year span.   

Either way, we’re in a world where data production doubles faster than once per year.
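The doubling arithmetic is easy to check. This sketch rounds the 2008 chip to half a million data points, which is what the 6,000x and 180,000x figures above imply; the variable names are just for illustration.

```python
import math

snp_chip_points = 500_000        # 2008 SNP chip, rounded (assumption, see above)
genome_letters = 3_000_000_000   # approximate genome length used in the text
read_depth = 30                  # each position read at least 30 times
years = 10

raw_points = genome_letters * read_depth  # counting every read separately

results = {}
for label, points in (
    ("replicates collapsed", genome_letters),
    ("every read counted", raw_points),
):
    ratio = points / snp_chip_points
    doublings = math.log2(ratio)          # how many times the data doubled
    years_per_doubling = years / doublings
    results[label] = (ratio, doublings, years_per_doubling)
    print(f"{label}: {ratio:,.0f}x, {doublings:.1f} doublings, "
          f"one every {years_per_doubling:.2f} years")
```

Both ways of counting come out under one year per doubling. Using the exact 2008 line count instead of the rounded half million shrinks the ratios a little but still gives 12 and 17 whole doublings.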

This is a rough and ready illustration of the source of the fuss about genomic data.  Computing technology, both CPU and storage, seems to double in capacity per dollar every 18 months. Any industry that exceeds that tempo for a decade or so is going to experience growing pains.

To make the math simple, I omitted the fact that this year’s offering -also- gives me an additional 100x of read depth within the protein coding “exome” regions, as well as some even deeper reading of my mitochondrial DNA.

One real world impact of this is that I’m not going to carry around those raw reads on my laptop anymore. The raw files will take up a little more than 100 gigabytes, which would be 20% of my laptop hard disk (or around 150 CD ROMs).

I plan to use the cloud, and perhaps something more elegant than a 10 year old single threaded PERL script, to chew on my new data.

The more things change

Back to the point:  I’m writing this post because, here in late 2018, I got the -exact- treatment that the 2008 article recommends. It’s worse than that, because I didn’t even bring in anything as fuzzy as genotypes or risk variants.  Instead, I brought lab results, ordered through Arivale, and generated by a Labcorp facility to HIPAA standards.

I’ve written about Arivale before.  They do a lab workup every six months. That, coupled with data from my wearable and other connected devices provides the basis for ongoing coaching and advice.

My first blood draw from Arivale showed high levels of mercury. I adjusted my diet to eat a bit lower on the food chain. When we measured again six months later, my mercury levels had dropped by 50%. However, other measurements related to inflammation had doubled over the same time period.  Everything was still in the “normal” range – but a fluctuation of a factor of two struck me as worth investigating.

I use one of those fancy medical services where, for an -additional- out-of-pocket annual cost, I can use a web or mobile app to schedule appointments, renew prescriptions, and even exchange secure messages with my care team. Therefore, I didn’t have to do anything as undignified as bringing a sheaf of printouts to my physician’s upscale office on a high floor of a downtown Boston office building.  Instead, I downloaded the PDFs from Arivale and sent them as a message with my appointment request.

When we met, my physician had printed out the PDFs.  Perhaps this is part of that “digital transformation” I’ve heard so much about. The 2008 article is studiously silent on the topic of doctors bearing printouts. I’m guessing it’s okay if they do it.

Anyway, I had the same question as the obese, exercise-averse patient who drew such scorn in the 2008 article:  Is there any medical direction to be had from this data?

My physician’s answer was to tell me that these direct to consumer services are “really dangerous.”  He gave me the standard line about how all medical procedures, even minimally invasive ones, have associated risks. We should always justify gathering data in terms of those risks, at a population level. He cautioned me that going down the road of even looking at elevated inflammation markers can lead to uncomfortable, unnecessary, and ultimately dangerous procedures.

Thankfully, he didn’t call me fat or tell me to go get a gym membership.

This, in a nutshell, is our reactive system of imprecision medicine.

This is also an example of our incredibly risk averse business of medicine, where sensible companies will segment and even destroy data to avoid the danger of accidentally discovering facts that they might be obligated to report or to act on.

This, besides the obvious profit motive, is why consumer electronics and retail outfits like Apple and Amazon are “muscling into healthcare.”

The void does desperately need to be filled, but I think it’s pretty terrible that the companies best poised to exploit the gap are the ones most ruthlessly focused on the bottom line, most extractive in their runaway capitalism, and who have histories of terrible practices around both labor and privacy.

A happy ending, perhaps

I really do believe that there is an opportunity here: A chance to radically reshape the practice of medicine. I’m a genomics fanboy and a true believer in the power of data.

To be clear, the cure is not any magical app. The transformation will not be driven simply by encoding our data as XML, JSON, or some other format entirely. No specific variant of machine learning or artificial intelligence is going to un-stick this situation.

It’s not even blockchain.

The answer lies in a balanced approach, with physicians being willing to use data driven technologies to amplify their own senses, to focus their attention, to rapidly update their recommendations and practices, and to measure and adjust follow ups and follow throughs.

To bring it back to our obese patient above, consider the recent work on polygenic risk scores, particularly as they relate to cardiovascular health. A savvy and up-to-date physician would be well advised to look at the genetics of their patients – particularly those of us who don’t present as a perfect caricature of traditional risk-factors for heart disease.

I’ve written in the past about another physician who sized me up by eyeball and tried to reject my request for colorectal cancer screening, despite a family history, genetic predisposition, and other indications.  “You look good,” he said, “are you a runner?”

There is a saying that I keep hearing:  “Artificial Intelligence will not replace physicians.  However, physicians who use Artificial Intelligence will replace the ones who do not.”

The same is true for using all the data available. In my opinion, it is well past time to make that change.

I would love to hear what you folks think.

Thank you!

About 20 months ago, I left a fantastic job at the Broad Institute to strike out on my own as an independent consultant. At the time, I was nervous. I was pretty sure that I could manage the nuts and bolts of running a small business. I’ve got experience using spreadsheets to track potential customers and to remind me to follow up on invoices. I’ve managed projects, reviewed contracts, and picked up enough negotiation and other critical soft skills to get by.

The big question in my mind was this: Would people still take me seriously when I wrote from a shared home office or a co-working space in Somerville rather than from a private office on the 11th floor of one of the biggest names in Kendall Square? That was a leap of faith for me. I honestly didn’t know.

Nearly two years in, I’m thrilled to report that it’s working out great.

All of this is because of the amazing professional community of friends, colleagues, vendors, customers, and collaborators that I’ve met and worked with over the years. You folks reading this post made this possible.

You, specifically. Thank you. I’m not going to list all your names, but I recently had a chance to make a picture out of some of your logos:

As Eric Lander frequently says when he speaks in public: “Wow!”

In case you’re wondering, I will probably have a “real job” (paycheck, office, boss) again sometime in the future. Here’s why:

I miss sharing in the mission. One of the hallmarks of a good consultant is that we leave once the need for specialized and time critical services has passed. That leaving is bittersweet. If I do my job right, I get to see client after client outgrow their need for me.

I also miss mentoring, building teams, and working on not just technical efficiency but also on culture, inclusion, fairness, access, and the quality of life. I can give little nudges to these things from the outside, but really making a difference requires time and focus that a consulting engagement usually doesn’t afford.

For all that, I’ve got no plans to rejoin the 9-5 crowd any time soon.

When I left the Broad, I made a deliberate decision to move away from my comfort zone. I didn’t just quit a job, I also moved away from what I already knew and towards what I know to be important in the future. That meant that I set aside perfectly good opportunities to tune up high performance computing systems, and instead spent a summer researching and writing a white paper about Blockchain. I demurred on cloud migrations and dug in to enhance my admittedly basic knowledge of effective, practical information security. I got facile with the language of governance and compliance, and started in on covered entities, HIPAA, and all that jazz.

My goal in all of this was to swim rapidly out of the research shallows, all the way out to the gnarly rapids where data, computing and information intersect with clinical care.

Forget 20 months, I’ve spent nearly 20 years working with genomic data. I want to see what’s holding us back from the long promised genomic medicine revolution, I want to find the very toughest problems, and I want to help solve them.

And really, the core of my gratitude is that I feel like I’m getting a chance to do that.

Thanks to all of you.


This is a personal story about workplace bias.

In my first management gig, between 2004 and 2013, I built an all male team.

“Keep the company mostly male” was never a goal. In fact, if anyone had said that kind of crap out loud – the whole team would have reacted with disgust. I’m pretty sure that we didn’t break any laws or even “best-practices.” We had the required nondiscrimination policies and we took our annual sexual harassment training seriously.

For all that, the numbers are unambiguous: The people we hired and retained for the technical team were almost all men. Since I had a major role in our recruiting, hiring, and workplace practices, I’ve got to own that.

Bias is harder to isolate and pin down than crimes like assault and harassment. Bias feels vague, which gives those of us who hold the power wiggle room to make excuses rather than change.

I’m writing this now because I wish that someone had pointed it out to me at the time.

The dangers of monoculture

I’m proud of my nine years at that company. By every metric that we used, I think that we did a really good job. We bootstrapped from four people to fourteen without taking any external investment. We made payroll every single month, launched three products, and did all sorts of cool stuff.

Still, I know that we could have done better – and I’m not just talking about the moral perspective here.

When recruiting, we tended to reach only within our existing network. We hired people that we already knew or had heard of rather than casting a wider net. That’s part of why we recapitulated the biases of our industry.

This also created an intellectual monoculture in which we were all pretty sure of our own superiority. For all that the team was broad-minded and incredibly creative – we were also stuck with a tiny slice of the intellectual landscape.

Year over year this siloing held us back. It made it all too easy to believe that we were the very best. From where I sit now, that comes across as immature. It’s the arrogance of a regional sports champion that has never gone to a national competition.

From the outside, I can see how provincial we were.

That attitude (plus always showing up with an all male team) certainly cost us customers over the years. The NY Times has a decent article on how all male sales teams are less competitive.

On the performance side, you don’t have to take my word for it. Read Why Diverse Teams Are Smarter from the Harvard Business Review. Look at the McKinsey report on how more diverse leadership makes companies more profitable.

My very favorite study in this space says it right in the abstract:

Groups with out-group newcomers (i.e., diverse groups) reported less confidence in their performance and perceived their interactions as less effective, yet they performed better than groups with in-group newcomers (i.e., homogeneous groups). Moreover, performance gains were not due to newcomers bringing new ideas to the group discussion. Instead, the results demonstrate that the mere presence of socially distinct newcomers and the social concerns their presence stimulates among oldtimers motivates behavior that can convert affective pains into cognitive gains.

I know a man who is an influential leader at a well known organization. He scans the author lists of scientific papers before reading the abstracts. If he doesn’t recognize the names, he doesn’t bother to go further. He once told me why: “It saves time. If they were any good, I would already have heard of them.”


Nature vs Nurture

The few women who did choose to join the company tended to leave after a much shorter tenure than the men. That difference speaks to workplace culture. I have to own that too, since I was responsible for many of the team’s day to day practices.

With what I know now, I can see that I built a place where it was pretty easy for people like me to succeed. My guess is that the more different a person was from me in their work and life patterns, the harder they would find it to succeed in my organization.

That inattention sabotaged the few people who made it past the filters described above.

As above, there’s no malice required. I just wasn’t paying attention.

Stated baldly, it’s a pretty weak and inexperienced manager who can only manage people just like himself.

It matters

This isn’t all in the past.

Just this year, in 2018, I took flak from several long-term friends because my “political correctness” made it harder for us to organize a joint marketing opportunity.

It’s all too easy to make excuses. The reality is that the state of inclusion in our industry is an embarrassing and broken thing.

It is our job to fix it.

Stat News has published several articles lately that shine a spotlight on some of the most egregious behavior. One stark example is their coverage of the Party At Bio Not At Bio. In case you haven’t heard, sponsors paid to have their logos painted on nearly nude women who danced for the crowd’s amusement.

That’s not a “Mad Men” episode, it’s the state of biotech in 2018.

The sponsors, organizers, and attendees aren’t bad people – but they weren’t paying attention. Making sure that the environment was safe and inclusive didn’t make the list of priorities.

It takes work to overcome these systemic biases. Fortunately, there are resources available. This twitter list of 300 women in tech you should follow is up to 522 members. It makes it laughable to use the excuse that there are no qualified women available for speaking engagements.

Broccoli in your teeth

It’s still hard to talk about this stuff. I hesitated a long time before writing this post, and still longer before hitting “publish.”

That hesitation is because it is awkward, uncomfortable and weird every single time that I take a customer or a business contact aside and privately point out that they’ve got an all-male team.

People get defensive, evasive, and occasionally even insulting and sanctimonious. They come back with “what about,” and even bring up examples from my own past where I didn’t live up to the ideals that I’m now pushing on them.

It’s important to power through that discomfort: A former colleague ran her group on the “must inform” principle. Whether it was broccoli in the teeth, a wardrobe malfunction, or something more significant – it was the team’s obligation to help each other.

The benefit was clear: Her team never showed up with junk in their teeth, or with easily correctible biases showing.

I wrote this because I wish that my mentors back in the day had said something to me.

Manufacturing improvements apply to HPC

The Strategy Board

My former colleagues at the Broad Institute recently published a marvelous case study. They describe, in a delightfully brisk and jargon-free way, some of the process improvements they used to radically increase the productivity of the genome sequencing pipeline.

This post is about bringing the benefits of their thinking to our high performance computing (HPC) systems.

The fundamental change was to modify the pipeline of work so that instead of each stage “pushing” to the next, stations would “pull” work when they were ready to receive it. This should be familiar to folks who have experience with Kanban. It also overlaps with both Lean and Agile management techniques. My favorite part of the paper is that they applied similar techniques to knowledge work – with similar gains.

The spare text of the manuscript really doesn’t do justice to what we called the “strategy board meeting.” By the time I started attending in 2014 it was a massive thing, with fifty to a hundred people gathering every Wednesday morning. It was standing room only in front of a huge floor-to-ceiling whiteboard covered with brightly colored tape, dry erase writing, and post-it notes. Many of the post-it notes had smaller stickers stuck on them!

Somehow, in an hour or less every week, we would manage to touch on every part of the operation – from blockers in the production pipeline through to experimental R&D.

My favorite part was that it was a living experiment. Some weeks we would arrive to find that the leadership team had completely re-jiggered some part of the board – or the entire thing. They would explain what they were trying to do and how they hoped we would use it, and then we would all give it a try together.

I really can’t explain better than the paper itself. It’s 100% worth the read.

The computational analysis pipeline

When I started attending those strategy board meetings in 2014, I was responsible for research computing. This included, among other things, the HPC systems that we used to turn the raw “reads” of DNA sequence into finished data products. This was prior to Broad’s shift to Google’s Cloud Platform, so all of this happened on a large but finite number of computers at a data center in downtown Boston.

At that time, “pull” had not really made its way into the computational side of the house. Once the sequencers finished writing their output files to disk, a series of automatic processes would submit jobs onto the compute cluster. It was a classic “push,” with the potential for a nearly infinite queue of Work In Progress. Classical thinking is that a healthy queue is a good thing in HPC. It gives the scheduler lots of jobs to choose from, which means that you can keep utilization high.

Unfortunately, it can backfire.

One of the little approximations that we make with HPC schedulers is to give extra priority to jobs that have been waiting a long time to run. On this system, we gave one point of priority (a totally arbitrary number) for every hour that a job had been waiting. On lightly loaded systems, this smooths out small weirdnesses and prevents jobs from “starving.”

In this case, it blew up pretty badly.

At the time, there were three major steps in the genome analysis pipeline: Base calling, alignment, and variant calling.

In the summer of 2015, we accumulated enough jobs in the middle stage of the pipeline (alignment) that some jobs were waiting a really long time to run. This meant that they amassed massive amounts of extra priority in the scheduler. This extra priority was enough to put them in front of all of the jobs from the final stage of the pipeline.

We had enough work in the middle of the pipeline that the final stage ran only occasionally, if at all.
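To make the failure mode concrete, here is a toy model in Python. It is a sketch, not our actual scheduler configuration: the base priorities, slot counts, arrival rates, and backlog size are all numbers I made up for illustration. The only thing taken from the story above is the rule of one point of priority for every hour a job has been waiting.

```python
from dataclasses import dataclass

# Hypothetical base priorities: the final stage is nominally favored.
BASE_PRIORITY = {"alignment": 0, "variant_calling": 10}
POINTS_PER_HOUR_WAITING = 1  # the one-point-per-hour aging rule


@dataclass
class Job:
    stage: str
    submitted_hour: int


def priority(job: Job, now: int) -> int:
    """Base priority plus one point for every hour spent waiting."""
    waited = now - job.submitted_hour
    return BASE_PRIORITY[job.stage] + POINTS_PER_HOUR_WAITING * waited


def simulate(backlog: int, backlog_age: int, hours: int = 48,
             slots_per_hour: int = 4, arrivals_per_hour: int = 2) -> int:
    """Return how many variant-calling jobs finish within `hours`."""
    # Start with `backlog` alignment jobs that have already waited
    # `backlog_age` hours.
    queue = [Job("alignment", -backlog_age) for _ in range(backlog)]
    finished_variant_calls = 0
    for now in range(hours):
        # New alignment work is pushed in every hour, no matter what.
        queue.extend(Job("alignment", now) for _ in range(arrivals_per_hour))
        # Run the highest-priority jobs; each job takes one hour.
        queue.sort(key=lambda job: priority(job, now), reverse=True)
        running, queue = queue[:slots_per_hour], queue[slots_per_hour:]
        for job in running:
            if job.stage == "alignment":
                # A finished alignment feeds the final stage next hour.
                queue.append(Job("variant_calling", now + 1))
            else:
                finished_variant_calls += 1
    return finished_variant_calls


print("no backlog:  ", simulate(backlog=0, backlog_age=0))
print("with backlog:", simulate(backlog=500, backlog_age=72))
```

In this toy model the run without a backlog finishes 94 variant-calling jobs in 48 hours, and the run with 500 stale alignment jobs finishes none. The cluster is fully busy in both cases, which is exactly why utilization alone did not reveal the problem.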

Unfortunately, it didn’t all tip over and catch fire at once. The pipeline was in a condition from which it was not going to recover without significant intervention, but it would still emit a sample from time to time.

As the paper describes, we were able to expedite particular critical samples – but that only made things worse. Not only did it increase the wait for the long-suffering jobs in the middle of the pipeline, but it also distracted the team with urgent but ultimately disruptive and non-strategic work.


One critical realization was that in order for things to work, the HPC team needed to understand the genomic production pipeline. From a system administration perspective, we had high utilization on the system, jobs were finishing, and so on. It was all too easy to push off complaints about slow turnaround time on samples as just more unreasonable demands from an insatiable community of power-users.

Once we all got in front of the same board and saw ourselves as part of the same large production pipeline, things started to click.

A bitter pill

Once we knew what was going on, it was clear that we had to drain that backlog before things were going to get better. It was a hard decision because it meant that we had to make an explicit choice to deliberately slow input from the sequencers. We also had to choose to rate-limit output from variant calling.

Once we adopted a practice of titrating work into the system only at sustainable levels, we were able to begin to call our shots. We measured performance, made predictions, hit those predictions, fixed problems that had been previously invisible, and added compute and storage resources as appropriate. It took months to finish digging out of that backlog, and I think that we all learned a lot along the way.
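For the HPC folks, here is a minimal sketch of what “titrating” can look like in code. It is not what we actually ran: the function names and the numbers in the usage lines are hypothetical. The first function releases work only while in-flight work is under a limit, and the second uses Little's Law (average time in system equals work in progress divided by throughput) to turn a backlog into a prediction.

```python
from collections import deque


def release_work(pending: deque, in_flight: int, wip_limit: int) -> list:
    """Pull jobs from `pending` only while in-flight work is under the limit."""
    released = []
    while pending and in_flight + len(released) < wip_limit:
        released.append(pending.popleft())
    return released


def expected_turnaround_hours(work_in_progress: int,
                              throughput_per_hour: float) -> float:
    """Little's Law: average time in system = WIP / throughput."""
    return work_in_progress / throughput_per_hour


# Hypothetical numbers: ten samples waiting, three running, limit of five.
waiting = deque(f"sample-{n}" for n in range(10))
print(release_work(waiting, in_flight=3, wip_limit=5))
print(expected_turnaround_hours(work_in_progress=500, throughput_per_hour=4))
```

The point of the limit is that everything upstream of it waits in a visible place, where people can see it and make decisions about it, instead of aging invisibly inside the scheduler.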

All of this also gave real energy to Broad’s push to use Google’s cloud for compute and data storage. That has been transformational on a number of fronts, since it turns a hardware constraint into a money constraint. Once we were cloud-based we could choose to buy our way out of a backlog, which is vastly more palatable than telling world-leading scientists to wait months for their data.

Seriously, if your organization is made out of human beings, read their paper. It’s worth your time, even if you’re in HPC.

Surfing the hype curve

I’ve spent most of my career on the uncomfortable edge of technology. This meant that I was often the one who got to deal with gear that was being pushed into production just a little bit too early, just a little bit too fast, and just a little bit too aggressively for everything to go smoothly.

This has left me more than a little bit jaded on marketing hype.

Not too long ago I posted a snarky rejoinder on a LinkedIn thread. I said that I had a startup using something called “on-chain AI,” and that we were going to “disrupt the nutraceutical industry.”

I got direct messages from serious sounding people asking if there was still time to get in early on funding me.

Not long after that, a local tech group put out a call for lightning talk abstracts. I went out on a limb and submitted this:

Quantum AI Machine Learning Blockchains on the IoT Cloud Data Ocean: Turning Hype Into Reality

It's easy to get distracted and confused by the hype that surrounds new computing and data storage technologies. This talk will offer working definitions and brutally practical assessments of the maturity of all of the buzzwords in the title.

Somewhat to my horror, they accepted it.

Here are the slides. I would love to hear your thoughts.

Bio-IT World

We’re back around to one of my favorite events of the Boston biotech year, The Bio-IT World Expo.

This conference has been host to a bunch of critical conversations for me. My favorite example happened in 2004. That was the year that the founders of BioTeam and I stepped away from the sessions and the exhibit floor, sat on benches in the stairwells of the Hynes Convention Center, and worked out the details of how I would become their very first employee.

I don’t think that any of us could have predicted that, 14 years later, we would be hosting a two-hour panel in the main auditorium to wrap up the conference. Last year’s version was tremendous fun. I’m super excited to get to moderate it again.

We’ve made a few adjustments this year to make the session even more interactive and fast moving. At the same time, we’re keeping the throwable microphone, dynamic and emerging topic list, and the online question submission / topic tracking system.

The panelists brainstormed up an incredible list of topics:

  • Team culture, keeping it healthy
  • Engineering for nation-state scale projects
  • Identity management in federated environments
  • The changing face of the end-user scientific computing environment, specifically notebook style analysis environments
  • Rapid prototyping of data-driven products
  • Diversity – how, specifically, do we intend to empower and amplify emerging voices in our community?
  • What does it take to “validate” a process that was born on a research HPC environment?
  • The maturation of cloud-native, serverless architectures and its uncomfortable collision with current information security and governance processes
  • Data lakes, warehouses, marts, ecosystems, commons, biospheres, bases, gravity, movement, and so on and on
  • Notes from the field as machine learning and AI settle in as mature and productive tools in the kit
  • Emerging technologies like blockchain and how to separate the hype from the reality
  • … and many more …

If you have questions, topics, opinions, or suggestions – please write me or any of the panelists a note.

I’m looking forward to seeing many of you there.


Over the past year, I’ve had the privilege to serve as the chair of a working group (computing) for the GP-Write project. I’m spending today at GP-Write’s annual meeting in Boston.

GP-Write is a highly international, rapidly evolving collaboration with a goal of rapidly advancing the technology and ethical / legal framework necessary for forward engineering of complex genomes. I’m particularly proud of the fact that the very first working group to present is, “Ethical, Legal, Social Implications.” It’s nice, for once, to see the question of what we should do discussed prior to all the excitement about what we can do.

My (brief) slides are below.

Data driven health decisions

I just had a personal experience with how timely, personal measurements can drive better health and lifestyle decisions.

Unfortunately, it wasn’t related to any of the times that I’ve been genotyped, nor was it in the context of care by any physician. In fact, I had to cross state lines in order to get it done.

More on that later.

The punch line, for the curious, is that I have elevated levels of mercury in my system, and I should probably eat a bit lower on the food chain when I order sushi.

Genomics fanboy

I’ve been a genomics fanboy for years. I enrolled in 23andMe when it first came out. I did the exome add-on that they offered briefly in 2012. I signed up with the Personal Genome Project around that time, and one of the exomes you can download from their site is mine. I drove an hour and spat in a tube for the Coriell Personalized Medicine Collaborative.

Coriell has been the most satisfying for me personally, since they occasionally email a PDF of a manuscript that is based on analysis using data derived from my (and many others’) saliva. For me, at least, getting to read science papers and think, “I helped!” is much more motivating than cryptocurrency-based micropayments.

While it’s all been fun and interesting, I haven’t learned very much that was terribly actionable. Without putting too fine a point on it, I have basically re-verified that I don’t have any of the major genomic disorders that would have already shown up by middle age. My standard line describing what I learned is I’m most likely male, almost certainly of northern European descent, likely brown hair, likely brown eyes, etc.

A question of focus

One way this shows up for me is that I don’t really know where to focus my health and lifestyle efforts. Sight unseen, one might tell a person like me that I should work out a little more, mix it up with cardio and weight-bearing exercise, eat a mostly vegetarian diet, not smoke, drink in moderation if at all, maintain a regular sleep schedule, use sunblock, floss, not sit too long at work, meditate, never read re-tweets, practice test-driven development, etc, etc, etc.

None of it appeals, more or less because I know that all of this advice is generic, i.e., it doesn’t really apply to me. I’m pretty healthy, so who cares, right?

On the opposite side, I’ve written before about my frustrations in convincing my physicians to screen me for colorectal cancer. I have a family history on both sides, genetic markers, and a medical history that all point in the same direction: Elevated risk. The current state of clinical practice is that men don’t need screening before age 50. I’ve been getting screened since my late 20’s, and I persist in thinking that it’s a really good idea. This is one of those cancers that is easily treatable with early detection and lethal without it.

So there we have it: Advice is either so generic that I ignore it, or else when I do have actionable information it’s a challenge to convince my physician to act on it.

Personalized bloodwork

Enter Arivale. They are a relatively recent addition in the direct to consumer health and lifestyle offerings that are cropping up this year. I heard about them through professional connections (thanks Dave!), and I’ve been excitedly waiting for them to offer services in Massachusetts.

The Arivale process involves a battery of bloodwork, genetic testing, and a gut microbiome analysis (which is a novel experience if you haven’t provided a laboratory with a stool sample before). They combine this with coaching from people trained in nutrition, genetic counseling, and behavioral modification.

Because of the niceties of paying for lab work, I had to leave Massachusetts in order to reach a lab that could actually accept my money to draw the blood. Bright and early on a morning in the middle of February, I made a pre-breakfast, pre-coffee commute into New Hampshire to be stuck with needles.

Let me pause and say that again: Our health care system is so screwed up that I had to cross state lines to get bog-standard bloodwork done, entirely because I was paying out of pocket for it.

I also filled out a battery of health and family history questionnaires, as well as some about personality and lifestyle.

Show me the data

A couple of weeks later, I got an email and logged into a slick web dashboard. I went ahead and did the integration with my Fitbit account. I disabled GPS location sharing but enabled the rest. Let’s hear it for granular access control. Because Fitbit connects to my wireless scale, my Arivale coach was suddenly able to access five years of weight data on top of the four years of info on my pulse, sleep, and walking habits that my Fitbit devices have accumulated.

Let me pause and say that again: I logged into a slick web dashboard and integrated years’ worth of data about myself in the context of a new battery of lab tests. At no point did I have to write down my previous physician’s FAX number on a piece of paper.

It felt normal and ordinary, because I’m used to these integrations everywhere except health care. I do this sort of thing with my bank, my utilities, my news feed, and all sorts of other places.

That is a different rant, but come on!



Anyway, I logged in and saw (among other things), this:

It honestly gave me pause. I’m pretty robustly healthy. I don’t expect to see any of my biological metrics “in the red,” but there it was.

So I did a quick Google search, top hit, I feel lucky:

A bit of refinement:

Which led me to look at my last few Grubhub orders.

Yeah, every time I order, I bolt on that mackerel. That’s for me. That’s my treat. It’s worth noting that February 15 was the night before I made that hungry, grouchy drive. I know that mercury accumulates in tissue and lingers there over time, and your mileage may vary, but it’s a pretty clear signal in my book.

And it showed up in my lab work.


So there you have it. All of a sudden, I’ve picked something actionable to do for my health – out of the incredible variety of good advice at my fingertips. Because, well:

Converged IT and the Cloud

I promised that I would post a summary from our closing panel at the Converged IT and the Cloud thread at the Molecular Medicine Tri-Conference.

Unfortunately, I was having so much fun in the session itself that I didn’t take any notes at all. Please forgive errors and omissions. My slides are here, but they’re the very least part of the conversation.

I opened up the session with the question that my friend Carolyn posted in the comments of the last post: “What are the biggest barriers to immunotherapy becoming translational (FDA, funding limits, enrollees in clinical trials)? How can patients best support future immunotherapy developments?”

It sobered the audience considerably, especially when I pointed out that her interest is as a current patient of the system that we all acknowledge has tons of room for improvement.

My point in starting with that question was to move the conversation up a level from IT people talking about IT stuff – and to provide both motivation and urgency. It is very unlikely that a session on “converged IT and the cloud,” would be able to answer Carolyn’s question. That said, we would be remiss to sit around talking about network speeds and feeds, regulatory frameworks, costs per gigabyte, and other technical details without ever engaging with the high level “why” that drives our industry.

Each of the four panelists prepared a brief summary on a specific topic:

Jonathan Sheffi (@sheffi) is the Product Manager for Genomics and Life Sciences within Google Cloud. He spoke about the convergence that he sees in data structures and standards as customers bring different data types like health information, outcomes data, and so on to the “same” cloud. This was pretty exciting to me – since it is the infrastructure groundwork that will support some of the things we’ve been saying about collaboration and integration in the cloud.

Aaron Gardner is with BioTeam, and shared an absolutely whirlwind review of machine learning and AI for our field. The coolest part, to me, was the idea of AI/ML as a de-noising tool. The hope is that this will allow us to take unwieldy volumes of data and reduce them to contain only the level of complexity necessary for a certain task. It took me back to a dimly remembered time when I would talk about “Shannon Information Content” and similar concepts.

I first heard Saira Kazmi speak at the 2017 Bio-IT World, when she was still with the Jackson Laboratory. She had earned a reputation as Jax’s “queen of metadata.” She combined a handful of deceptively simple techniques with an impressively diplomatic tenacity to create a sort of ad-hoc data lake – without ever pausing to go through the most painful parts of the data lake process. Instead they chose to archive first, scrape file headers into a JSON format and stuff it into a NoSQL database, and (my favorite) stored checksums of large primary data files in a database to identify duplicates and support provenance tracking.
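The checksum trick is simple enough to sketch in a few lines of Python. To be clear, this is my own illustration and not Jax's implementation: the table layout, the function names, and the choices of SHA-256 and SQLite are all assumptions on my part. It covers only the checksum part, not the header scraping.

```python
import hashlib
import sqlite3
from pathlib import Path


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_files(db: sqlite3.Connection, root: Path) -> None:
    """Record path, checksum, and size for every file under `root`."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, sha256 TEXT, size_bytes INTEGER)"
    )
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        db.execute(
            "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
            (str(path), file_checksum(path), path.stat().st_size),
        )
    db.commit()


def find_duplicates(db: sqlite3.Connection) -> dict:
    """Map each checksum that occurs more than once to its file paths."""
    rows = db.execute(
        "SELECT sha256, path FROM files WHERE sha256 IN "
        "(SELECT sha256 FROM files GROUP BY sha256 HAVING COUNT(*) > 1) "
        "ORDER BY sha256, path"
    ).fetchall()
    groups = {}
    for checksum, path in rows:
        groups.setdefault(checksum, []).append(path)
    return groups
```

Pointing `index_files` at an archive directory and then calling `find_duplicates` gives a list of byte-identical files, regardless of what they happen to be named. The same table doubles as a provenance record, since a checksum recorded at archive time can be compared against the file later.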

Finally, we had Annerose Berndt (@AnneroseBerndt) – who has just finished standing up a genome sequencing center to serve the UPMC hospitals. I asked her to hold forth a bit on security, compliance, quality systems, and other absolutely necessary bits of process discipline.

We shared a wide-ranging and illuminating conversation building on these topics. It was a blast.

As I said from the stage: I really cannot believe that it’s somehow part of my job to have conversations like this, with people of this caliber. How cool!