Author: cdwan

A biased sampling

On a whim the other day, I scraped the “portfolio” page on the websites of three of the large venture capital firms in Kendall Square (Atlas Venture, Flagship Pioneering, and Third Rock Ventures) to generate a list of 162 biotech companies.

The first thing I noticed was that the company names seem, disproportionately, to start with the letter ‘A.’ If company names were distributed like English words, the most common starting letter would be ‘T’. Instead, for whatever reason, we see a skew towards A, C, and S.
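The tally itself is trivial to reproduce. Here's a minimal sketch – the company names below are a placeholder sample, not the actual list of 162:

```python
from collections import Counter

# Hypothetical stand-ins for the scraped portfolio-company names.
companies = ["Acceleron", "Agios", "Annovation", "Cerulean",
             "Synlogic", "Tango", "Afferent", "Codiak"]

# Tally the first letter of each name, most common first.
first_letters = Counter(name[0].upper() for name in companies)

for letter, count in first_letters.most_common():
    print(letter, count)
```

Run against the real list, the same few lines show the skew toward A, C, and S.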

If we wanted to, we could come up with some sort of an explanation for this phenomenon. My personal bet is that there is a benefit to being at the top of the stack in competitive evaluations for funding. A famous study of Israeli parole boards tracked judges becoming steadily less merciful as they got hungry and tired. I’m willing to bet that venture capitalists exhibit similar behavior. If that’s true, and if people (like me) tend to sort things alphabetically, then we would expect the list to be enriched for the likes of Acceleron, Afferent, Agios, and Annovation.

Conversations about bias in the workplace can be stressful. Even bringing up the subject can feel like an accusation and incite a rush to judgement. We get so caught up in what it means and what we should do that we lose sight of what is actually there. The rest of this post explores gender ratios in biotech leadership. I encourage you to relax away from judgement for a moment and merely consider the numbers as a sort of intellectual curiosity – as if we were talking about lexical anomalies.

Founder Effects

I went through the list in alphabetical order, looking up the people who got credit for “founding” each company. It turns out that “founder” is not terribly well defined. I included all of:

  • Whoever Crunchbase said was the founder – these mostly turn out to be people from the venture firms
  • Everybody mentioned as a founder on the company website
  • Anybody who got a mention as a “Founding Whatever” in launch announcements in industry news sources like Fierce Biotech or Xconomy.

Then I added a column and tagged each person according to a guess at their gender. I took gendered names and pronouns (he / she) at face value. In cases where I had any doubt, I dug around on the web until I found a bio that used a gendered pronoun. I didn’t find even one person with nonbinary pronouns or presentation in their professional persona.

With a couple of days’ work, I got through the first 67 companies (all the way to “Fulcrum”). In those companies, I found 190 people listed as founders: 177 men and 13 women (7%).

Remember, no judgement, no explanations. We’re just counting. In this case, we’re double counting. We have some incredibly prolific founders, like Noubar Afeyan of Flagship Pioneering, who shows up in 9 of these 67 companies.

The plot below spreads out those gender numbers by year. It suffers a bit from the small sample size, but – if I squint my eyes – I can begin to imagine a couple of trends.

Leadership

Out of my list of 67 companies, 47 of them are still doing business under their original name (including 24 IPOs). I went to their websites and scraped the names and titles out of the pages dedicated to the leadership teams, the boards of directors, and the scientific advisors.

I took some liberties with the titles. Even though Nick Leschly of Bluebird Bio is listed as “Chief Bluebird,” I still lumped him in with the CEOs. For people with multiple titles, I selected the highest ranking one. I also exercised a bit of judgement and combined different versions of what seem objectively to be the same role (Biology vs. Biological Sciences, for example). I counted 16 distinct ranks (chief through assistant) and 78 distinct areas (science, law, HR, and so on).
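The “highest ranking title” rule is easy to sketch in code. Everything here is illustrative – the marker list and sample titles are mine, not the actual dataset of 16 ranks and 78 areas:

```python
# Rank markers in descending order of seniority. Longer phrases come
# first so "vice president" doesn't accidentally match "president".
RANK_ORDER = [
    "chief",
    "executive vice president",
    "senior vice president",
    "vice president",
    "president",
    "senior director",
    "director",
    "associate",
    "assistant",
]

def highest_rank(titles):
    """Pick the highest-ranking title a person holds."""
    def rank(title):
        t = title.lower()
        for i, marker in enumerate(RANK_ORDER):
            if marker in t:
                return i
        return len(RANK_ORDER)  # unrecognized titles sort last
    return min(titles, key=rank)

print(highest_rank(["Vice President, Biology", "Chief Scientific Officer"]))
# → Chief Scientific Officer
```

A similar lookup table handles merging area variants (Biology vs. Biological Sciences, and so on).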

Out of 347 people listed on the “leadership” pages, 244 are men and 103 are women (30%).

Gender distribution varies widely with area of responsibility. I didn’t find any women in leadership roles for data or “tech,” nor did I find very many men leading human resources or project / portfolio teams.

The Board

Boards of Directors vary widely from company to company. Most boards are made up of representatives of the firms who have made significant investments, a senior executive or two from within the company, and a selection of industry veterans.

Out of the 286 board members in my little study, 237 of them were men and 47 were women (17%) – which is very similar to the gender representation I found among CEOs. Out of 35 “chair” positions, I found 33 men and 2 women (6%).

The numbers were similar for the 153 people listed on the scientific and clinical advisory boards. Out of that total, 139 were men and 14 were women (9%).

So What?

As I said at the beginning, these numbers are, by themselves, neither good nor bad. They are simply a snapshot of a tiny slice of an incredibly dynamic industry. For all that, gender representations like 7%, 9%, 17%, and even 30% do seem to raise the question: what’s going on here? This is pretty far from an unbiased selection out of the human population. Scientists are constantly generating hypotheses and ideas for how to test them – I feel that some of that intellectual rigor might be valuable here.

My plan is to continue to parse, sort, sift, and learn. I’ve got 100 more company websites to scrape, and a few more low-hanging analyses to run. I expect to generate at least a couple more blog posts from this data, which will hopefully spark some downstream conversations.

At the very least, I hope to answer the question of whether or not the women leaders are mostly hanging out in the companies towards the latter half of the alphabet.

As always, I’m deeply interested in your thoughts.

Correcting for Bias

This is the second in a series of three or four posts.

The first one, “Diverse teams perform better,” explored some of the research on the measurable performance advantages that diverse teams have over monocultures. This and future posts will share real world examples about measuring and correcting for the bias that leads to a lack of diversity. I hope to build a toolkit of useful techniques that can reduce bias in recruiting, hiring, promotion, and retention.

I believe that my professional communities – high performance computing, biotech, data science, genomics, and the adjacent specialties – have a bias problem. Eventually, I hope to make that argument in some detail. Posts like this are the necessary groundwork. I hope you will bear with me.

Hang In There

Unfortunately, a lot of readers are about to check out on me.

Even though this post is absolutely not about subjective personal labels like “racist” or “sexist,” conversations about the specifics of measuring bias seem to land for some folks as accusations of bigotry. I’ve had more than a few friends and colleagues turn belligerent and hostile at this point in the conversation. I’ve been told that even wanting to talk about diversity beyond a high and fluffy level makes me sound like the “diversity police.”

Apparently nobody likes the diversity police.

Bluntly, if a person can’t bear to even talk about how one might measure and compensate for bias, it’s a pretty safe bet that their organization is rife with it.

Still, I’m not making personal accusations or calling anybody nasty names. Name calling is ineffective and misguided. It’s a waste of time.

Anne-Marie Slaughter said, “systemic bias does not require a conspiracy of men.” The Harvard Business Review built on this in their 2016 piece “Designing a Bias-Free Organization”: “…Rather than run more workshops or try to eradicate the biases that cause discrimination, companies need to redesign their processes to prevent biased choices in the first place.” Effective managers and leaders should focus on compensating for bias, rather than “naming and shaming.”

Without the power to cut funding, terminate employment, or otherwise impose consequences – naming and shaming just enrages powerful, biased people. That’s never a good scene.

Still, I acknowledge that we’re on uncomfortable ground. In a good-faith effort to keep my HPC, biotech, and genomics friends engaged, I will start off with an absolutely non-tech-centric example in which I cannot possibly be talking about them.

Let’s go to the symphony.

Curtain Call

It sounds strange to my modern ears, but symphony orchestras used to be predominantly male. In the 1970s, women accounted for only 6% of the players in top-tier ensembles. By the 1990s, that number had risen to 21%. One major factor in that increase was the practice of “blind” auditions, where the player is hidden from the view of the judges behind a curtain or a screen.

According to the 2000 study Orchestrating Impartiality: The Impact of “Blind” Auditions on Female Musicians: “Using a screen to conceal candidates from the jury during preliminary auditions increased the likelihood that a female musician would advance to the next round by 11 percentage points. During the final round, ‘blind’ auditions increased the likelihood of female musicians being selected by 30%.”

No quotas, no diversity police, no name calling. Just a curtain.

Let’s look at how that worked in practice:

In 1980, Abbie Conant auditioned for principal trombone of the Munich Philharmonic. The orchestra did not usually do blind auditions, but in this case one of the applicants was a relative of the decision maker, so they decided to guard against any appearance of favoritism.

Oops.

By all accounts, Conant blew the doors off that audition, performing well enough that one judge leapt to his feet, saying “there’s our trombone.” On finding out who “their trombone” was, the director was nonplussed. He demoted her to second chair, paid her less than her peers, and spent years trying to get her fired. Conant spent more than a decade waging a rather epic lawsuit against the orchestra while at the same time building an international reputation as a soloist and teacher.

The Munich Philharmonic didn’t do another blind audition for nearly 20 years.

Mere Numbers

I love the simplicity of the study above. They literally just counted the number of men vs. women who got hired before and after a process change. As the HBR article says, “Marketers have been running A/B tests for a long time, measuring what works and what doesn’t. HR departments should be doing the same.”

Of course, when we start trying to count, we find out that some job applicants are constructing their own blind-audition screens. People replace their names with initials and otherwise remove the signals that biased organizations use as unconscious cues.

What can I say? People are smart.

I assume that everybody has heard about Jo Handelsman’s straightforward and awesome study “Science faculty’s subtle gender biases favor male students.” The researchers asked faculty members to evaluate resumes for a notional lab manager that they might hire. The resumes were identical except for the gender of the applicant’s name.

“Faculty participants rated the male applicant as significantly more competent and hireable than the (identical) female applicant. These participants also selected a higher starting salary and offered more career mentoring to the male applicant.”

One really important result from this study is that the gender of the faculty member didn’t affect the result. “Female and male faculty were equally likely to exhibit bias against the female student.”

So spare me the line about how some particular decision can’t be biased because a woman or a person of color participated in it. That’s not what the data says.

Excuses

Let’s get real for a minute: Very, very few people in any of the industries where I make a living have adopted anything even as basic as the blind audition.

We’re not even looking. We’re not counting. We’re not doing the basic work, and yet we make excuse after excuse.

We’re damn sure not doing the rigorous statistics that would be required to back up arguments about pipelines, cost-benefits of delay vs. performance, and so on. Future posts will get specific about how we might do that.

For the moment, just rest with this question:

If your company had a “principal” position – trombone or otherwise – and the child of a powerful board member was applying for the job, what would you do to guard against the appearance of bias?

Now do it anyway.

Just don’t be surprised when the person who blows the doors off your interview process isn’t exactly who you were expecting.

Diverse teams perform better

In my next three or four posts, I’m hoping to lay out a story that goes something like this:

  • Diverse teams outperform monocultures.
  • Biases in hiring and retention mean missing out on that performance boost.
  • Biased systems self perpetuate. It takes action to break the cycle.
  • Here are the most effective actions to take.

My motivation in writing this is that I’m speaking at the November meeting of the Rosalind Franklin Society. My working title is “Advocacy in the enterprise: What works, what doesn’t.” In that talk, I plan to share stories about some times that I’ve attempted to cause organizations to be more diverse and inclusive.

My hope is that writing these posts will force me to check my references and get my thoughts in order. If it turns out that they also make a decent intro for other people – that’s great – but this is mostly a thought exercise for me.

Onward to the first point: Do diverse teams really outperform monocultures?

Direct Experimentation

Many researchers have measured the impact of diversity on team performance. I’ve written before about my very favorite experiment along these lines. Researchers at Northwestern University and Stanford took members of various fraternities and sororities and had them work in small groups to solve a murder mystery. The control groups were homogenous – all pulled from the same Greek house. The test groups were spiked with an “out-group” member, matched for gender but from a competing fraternity or sorority.

As an aside, this is just about the most trivial difference that can possibly be measured – a Planck constant for diversity, if you will.

The result: “Groups with out-group newcomers (i.e., diverse groups) reported less confidence in their performance and perceived their interactions as less effective, yet they performed better than groups with in-group newcomers.”

This work appears as a 2003 conference presentation titled “The Pain is Worth the Gain,” and then as a 2008 publication in the Personality and Social Psychology Bulletin titled “Is the Pain Worth the Gain? The Advantages and Liabilities of Agreeing With Socially Distinct Newcomers.” The Kellogg School of Management at Northwestern published a nice (and open access) summary in 2010 titled “Better Decisions through Diversity.”

The abstract in the journal observes: “Performance gains were not due to newcomers bringing new ideas to the group discussion. Instead, the results demonstrate that the mere presence of socially distinct newcomers and the social concerns their presence stimulates among oldtimers motivates behavior that can convert affective pains into cognitive gains.”

Another team of researchers had people invest money in simulated stock markets in Southeast Asia and again in North America. The paper “Ethnic diversity deflates price bubbles” appeared in the Proceedings of the National Academy of Sciences in 2014. The authors found that “market prices fit true values 58% better in diverse markets. In homogenous markets, overpricing is higher and traders’ errors are more correlated than in diverse markets … our findings suggest that diversity facilitates friction that enhances deliberation and upends conformity.”

The results point to a common mechanism: Diverse teams, and also individuals in diverse environments, work harder to avoid groupthink, which makes them more adept at solving problems.

So it works in the lab – but what about the real world?

Real World Results

There is also a raft of literature analyzing real world results and reading tea leaves to find the secret of success.

If you’re an academic who wants your work to be read and cited, you might be interested in the 2018 piece in Nature Communications, “The preeminence of ethnic diversity in scientific collaboration.” The authors analyzed millions of papers to look at the effect of ethnicity, discipline, gender, affiliation, and academic age on academic impact. The remarkable finding: “While these findings do not imply causation, it is still suggestive that one can largely predict scientific impact based solely on average ethnic diversity.”

The story is the same on the corporate side of the world. The 2012 paper “Gender diversity within R&D teams: Its impact on radicalness of innovation” in the journal Innovation, Organization, and Management studied more than 4,000 Spanish companies. “The results indicate that gender diversity is positively related to radical innovation.” The 2013 paper “Cultural Diversity, Innovation, and Entrepreneurship: Firm-level Evidence from London” in Economic Geography found similar results across thousands of British companies.

Okay, okay, but has anybody done a really wonky deep dive on how this works?

A Theoretical Approach

There’s a very old joke about a mathematician who asks “sure, your experiment worked, but what does the theory say?” All the experimental data seems to point in the same direction, and fortunately the theory agrees: In 2004, Lu Hong and Scott E. Page published a paper in the Proceedings of the National Academy of Sciences titled “Groups of diverse problem solvers can outperform groups of high-ability problem solvers.”

The title pretty much gives away the punch line.

Let’s ask some consultants what they think.

Arguments from Authority

In 2015, McKinsey published one of their trend-setting reports, which says: “More diverse companies, we believe, are better able to win top talent and improve their customer orientation, employee satisfaction, and decision making, and all that leads to a virtuous cycle of increasing returns. This in turn suggests that other kinds of diversity—for example, in age, sexual orientation, and experience (such as a global mind-set and cultural fluency)—are also likely to bring some level of competitive advantage for companies that can attract and retain such diverse talent.”

If you’re not into consulting firms, perhaps you’ll listen to a four star general and commander of the US special forces instead. Allow me to commend General Stanley McChrystal’s book “Team of Teams.” While not explicitly about diversity, one of the book’s major themes is that breaking down group identities and avoiding groupthink is critical to innovation and high performance.

Put simply, the argument that building a diverse team is not worth the effort flies in the face of decades of research and experience, from multiple fields, all over the world. Failing to do the work (and it is work, more on that below) makes your company less innovative, your work less impactful, and your investors less profitable.

No Free Lunch

Let’s be blunt: Nobody is saying that this is easy.

When I let Google autocomplete from a prompt of “diverse teams,” “feel less comfortable” is higher in the list than “make better decisions.” That was a lesson learned from many of the studies above. The benefits of diversity come at a cost. It takes effort to recruit and retain a diverse team. The benefit is that such teams play at a higher level.

We will also encounter active resistance along the way. Let’s talk about that for a moment.

The Resistance

There is a particular sort of nay-sayer who protests that any effort to increase diversity is all about lowering standards and letting unqualified people slide. They’re the ones who insist on talking about diversity efforts as if quotas based on race or gender are the only tool in the toolbox. It’s well known that raw quotas are a pretty terrible practice, as spelled out in the Harvard Business Review piece from 2016, “Why Diversity Programs Fail.”

An earlier HBR piece, from 2013, “The Trouble With Gender Targets” describes the phenomenon of resistance well: “Targets are like a red flag to a bull for these men and women. They experience it as an affront to their deeply held meritocratic principles.”

Did you catch the “men and women” in that quote? That’s going to be a big theme in the next post.

This behavior isn’t rational, but once people are “enraged” and “affronted,” they don’t tend to spend much time with the literature. Even when research, experience, and professional consultant advice all point in the same direction, resistance and confusion persist.

So we have work to do.

Fixing Problems at the Root

I’m a consultant. I solve problems for hire.

As part of my practice, I try to go beyond the superficial technical problems, and also address the underlying social pathologies. Done well, this can radically empower and accelerate teams. It’s also much more challenging and fun than merely speeding up yet another computer program.

For example: I often get brought in to help with some supposedly thorny and opaque technical problem, only to find that the team already knows the solution. In this case, finding and bridging the gap that prevented leadership from hearing, understanding, and trusting their team’s own expertise is the subtle work that leads to lasting change and improved performance. There are plenty of other “people” challenges that can hold teams back, and bias is a really common one.

As organizational pathologies go, bias is pretty easy to diagnose. If you’re paying attention, you can literally see it when you walk into the room.

That will be the meat of the next post: How to detect and measure bias.

For now, I will close with an invitation: I am -very- interested in your thoughts on this topic. If you have thoughts, please comment below, shoot me an email, message me on Twitter, or whatever. I’m easy to find.

Should I take aspirin?

Earlier this year, I purchased Dante Labs “whole genome Z” service, which includes 30x sequencing of every base pair of my DNA, plus an additional 100x on the protein coding regions.

I mostly did this for the raw data. I work in this space and I like to tinker. Using my very own data shields me from any concerns about privacy, consent, and appropriate usage. It’s also super useful professionally: I’m an advisor to folks who are responsible for health and genetic data from hundreds of thousands of patients and research participants. I find that handling my very own information has a way of clarifying my thinking around privacy, consent, and other topics related to good data stewardship.

My experience thus far with personalized genomics is that there’s not a huge amount of diagnostic or clinical value there unless you’re dealing with cancer, the risk of inherited conditions, or a challenging undiagnosed disease. I’m in my 40’s now. I would already be aware if I carried any of the most readily diagnosed genetic disorders.

The joke is that 23andme told me that I’m probably male, most likely of northern European ancestry, sorta average height, probably brown eyes, likely brown hair … you know … all things you could tell at a glance by looking at me.

Aspirin

My expectations were low when Dante Labs sent a note inviting me to check out their new “wellness and lifestyle” report. I was 100% surprised to see that the first item on the list was a “high risk” for “Aspirin.” That’s new for me, and I was sort of hoping that the new data had unearthed some heretofore unobserved risk factor.

Spoiler alert: It had not.

I clicked in to see details, and got this rather opaque wall of generated text, which had obviously never been edited by a human. Maybe that’s what they meant when they claimed to be revolutionary in their use of artificial intelligence in these reports.

I didn’t know the word “urticaria,” so I googled it. It’s hives: red, raised, bumpy, itchy skin. Millions of people have it, it’s irritating, completely self-diagnosable, and eminently self-treatable.

I got curious. I take daily low dose aspirin because I’ve read about a constellation of positive effects. The question is, should I stop?

The really simple clarifying question would have been “do you break out in hives when you take aspirin?” The answer to that is “no,” but bear with me, I’m telling a story here.

Nerding out on genetics

The first question with any kind of genetic diagnosis is whether the data is correct. Fortunately, I’ve been a genomics fanboy for a while, and I was able to crack open my raw data from 23andme. Yes indeed, at position 179,220,638 on chromosome 5 I am heterozygous – with an “A” on one of the copies and a “C” on the other.

> grep rs730012 genome_Christopher_Dwan_v1_v2_v3_Full_20170926071925.txt
rs730012    5    179220638    AC
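For poking at more than one variant, a few lines of Python do the same job as grep. This is a sketch that assumes the standard 23andMe raw-data layout (tab-separated columns: rsid, chromosome, position, genotype); the sample lines below stand in for the real file:

```python
def lookup_rsid(lines, rsid):
    """Scan 23andMe-style raw data (tab-separated columns:
    rsid, chromosome, position, genotype) for one variant."""
    for line in lines:
        if line.startswith("#"):  # comment / header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 4 and fields[0] == rsid:
            return {"rsid": fields[0], "chrom": fields[1],
                    "pos": int(fields[2]), "genotype": fields[3]}
    return None

# Two stand-in lines; against the real file this would be
# lookup_rsid(open("genome_Christopher_Dwan_v1_v2_v3_Full_20170926071925.txt"), "rs730012")
sample = ["# rsid\tchromosome\tposition\tgenotype",
          "rs730012\t5\t179220638\tAC"]
print(lookup_rsid(sample, "rs730012"))
# → {'rsid': 'rs730012', 'chrom': '5', 'pos': 179220638, 'genotype': 'AC'}
```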

After verifying data quality, the next question is “how sure are we about this?” There is a lot of truly tenuous associative research out there, and a naive approach is almost certain to lead you astray.

I took a look at ClinVar, a remarkably powerful and well curated database of the clinically actionable variants. It said that yes indeed, there is an association between this variant and an allergic reaction to aspirin.

I skimmed the abstracts of the three publications, and while it’s a clear association, it’s not the strongest of signals. The three studies were pretty small, with case and control groups of around 100 people each. Importantly, all three studies asked the question “is this genetic variant more common in people who break out when they take aspirin,” rather than asking the deeper and much more challenging question of -why- such people might have such a reaction.

Short version: It turns out that the reaction to aspirin is more common among people with a “C” at that position on either or both copies of chromosome 5. In industry parlance, I’ve got one copy of the “risk variant.”

One really important question when looking at this sort of thing is to determine how rare this genetic variant is. My friends at SNPEdia have done a great job of parsing a bunch of different resources to show the answer. In this case, the answer is that among caucasians, my genotype is actually the most common type. It’s pretty rare in other populations, but for white folks like me – most of us have either one or two copies of the risk variant.

So what you have here is a super common genotype that’s associated with a minor, self-diagnosable and self-treatable condition.

So should I stop with my daily aspirin? The answer is probably not.

Other genes, other diseases

SNPedia is my go-to for quick reads on genes and variants. I did a little poking around on aspirin and found a ton of interesting stuff. As just a single example, we’ve got rs6983267 over on chromosome 8.

Don’t look at me like that. All interesting people have at least one odd hobby where the nerd-o-meter reads “extreme.” This is one of mine.

There’s a study of more than 3,000 caucasians with the exact kind of cancer that killed my grandfather, and I have the risk variant (‘GT’) here too. The SNPedia entry for the ‘GT’ genotype says “aspirin reduces the risk of colorectal cancer.”

Sadly, this one didn’t make the cut for Dante Labs. According to them, I’m 100% free of colon cancer markers.

So what’s the point?

The point is this: This stuff is complicated and it is important. I’ve written before about how the risk averse culture in American medicine holds us back. This is a counterexample. A naive person might have looked at that report and said “oh hey, I’ll stop taking aspirin, I’ve got a risk factor.” The simple fact is that the risk is for a minor, eminently detectable condition, and there’s good data to suggest that taking aspirin (specifically, for me) reduces my risk of dying painfully of a kind of cancer that runs in my family.

I don’t want the FDA to shut Dante Labs down, but I do want Dante to get their act together and stop just yammering about “AI.”

A side note

In the course of writing and editing this, I have noticed a confounding factor.

Over the past couple of years, I -have- in fact noticed a couple of reddish patches on my torso. I’ve treated them with antifungals, but that didn’t have any effect. They don’t itch and they aren’t terribly visible, so I don’t worry about them.

So, just now, here at the end, I’m thinking that I might cut out the aspirin for a month and see if those patches fade. In that case, I will have learned something. After that, I will 100% resume the aspirin, because duh.

Time To Have An Idea

What are the most important pieces of professional advice you’ve ever received?

I remember one of mine clearly: It was in late 2004, and my colleague Bill told me that it was “time to have an idea.”

I had hired in as the first employee at a small consulting company in early summer. The founders had been handing me pre-specified projects for a few months. These early projects appeared on my desk ready-made, with the Statement Of Work (SOW) already written, the scope negotiated, and the customer interested mostly in when the resource (me) could be scheduled.

Now it was fall, and it was time for me to step up my game and spec my own work. I realize now that they were tired of carrying me.

In the spirit of “learn by doing,” they dumped me on the phone with a prospective customer, the IT department for Stanford.

That, in itself, was an incredible opportunity.

Rookies look down on “sales.” I know now about the grinding work that leads to calls like that. The series of interactions with gatekeepers whose only options are to say “no” or else to continue the conversation. The people on the other end of this call could say “yes.”

Also, their “no” would end the conversation entirely.

At the time, I wasn’t even savvy enough to be nervous.

I know now that we practiced a variant of “SPIN” selling, which focuses on understanding the customer’s pain points as the first part of the conversation. It’s not “our floor cleaning machine is great,” but rather “do you have any irritation connected with dirty floors?” Our model was characterized by a triangle of needs, features, and benefits. If your offer (the features) addresses the customer’s needs, and if the benefits to them (the perceived value) are greater than the cost, the deal pretty much closes itself.

I was prepped with the need: Stanford had recently done an audit and determined that they employed more people in computer support roles outside of IT than within it. Further, they had found at least 20 on-campus closets with a ton or two of recently added cooling to support feral compute environments.

IT needed to justify their continued investment in scientific computing. The user community was routing around them.

The conversation went back and forth for about 20 minutes, introducing ourselves, re-hashing the situation, doing the human part of the meeting. Somewhere around that 20 minute mark, Bill – my colleague, boss, and co-owner of the company – popped into the group chat:

Time to have an idea, Dwan.

I was stumped. What did he mean?

Conversation continued, my teammates carrying me. Bill pinged again.

Dwan, write yourself a job.

So I went for it. Broke into the conversation and suggested that maybe it would help to have me … um … fly to California to spend a week with them? Yes. Having me onsite was totally part of it.

They were curious but unconvinced. What did I have in mind?

Maybe the need was that folks on campus were unaware of the resources available within central IT. So I would come out and give a series of talks on batch computing and how scientists might use the central IT compute cluster (the feature!). That would draw prospective users to the resources of central IT (the benefit!).

They dug it. There was a brief digression to fill in the details.

Bill texted again:

Keep going. There’s more. Go for it. You got this.

So I kept going. I suggested that I would also talk to the various user / stakeholders and ask them what they needed. With prompting from Bill, this turned into an offer to author a report describing the “capability gaps” between central IT’s offerings and the needs of the community. We would use my talks as bait to draw an audience with legitimate value, and leverage those connections to help central IT better align its services against its stakeholder needs.

Sorry for the consultant-speak. It’s what I do for a living.

On that call, it was enough. We got the work. I still sort of marvel that my words on that phone call created a trip to California.

As a mentor and friend would say about a different project, a decade later: “You spoke it into being.”

Knowing what I know now, I should have gone further. I could have helped more. My proposal was tactical rather than strategic. I should have offered to help with the root cause rather than just going after the symptoms. There should have been check-in and follow-up to make sure that I didn’t just drop a consultant report and leave, but instead fixed the problem for good.

How, exactly? Well that depends on a lot of other questions.

Did you have a “have an idea” moment?

If you’re further along in the career journey, can you give such a moment to a person on your team?

N of one

We are living through an uncomfortable period in the practice of medicine.

The dialogue between patient and physician is critically underserved, both in terms of tools for patients and physicians, and also in terms of the data context where that conversation takes place. This is unfortunate, because those are the moments of human to human care. Whether it’s a clinic visit, a lab test, a counseling or physical therapy session, the patient / provider meeting is when the full breadth of the caregiver’s experience and training can be brought to bear. At these moments, the subtle observations and pattern recognition that constitute diagnostic expertise come into play. These are also the times when the nuance and detail of the patient’s lived experience can be shared to influence the course of diagnosis and treatment.

Population health turns into personal medicine at the bedside.

That conversation between patient and physician ought to be a first class citizen in terms of tool development, but it is not. It is within our reach to build a clinical care environment that retains high standards of data integrity and privacy while also focusing on empowering the human beings in the room rather than the interests outside the door.

Due to the misaligned incentives that I’ve written about previously, the development of tools to support a data-rich conversation between patient and physician has generally taken a back-seat to software for billing, regulatory compliance, and mitigating risks to the care system. Recently, we have begun to instrument the clinic to support data gathering for research purposes. While this is a great idea on the face of it, it can have the unintended effect of leaving still less time for that critical conversation. Unless we can close the loop and bring the benefit of that instrumentation back to either physician or patient, it will be felt as friction, yet another loss.

I believe that we can have our data and do research on it too – and also that the clinical interaction is vastly more important than research use of the data we might gather along the way.

Research at no benefit to the participant

On the topic of research.

I’ve participated in a number of clinical research projects, mostly around genetics and genomics. The usual routine is to sit in a plastic chair and fill out a piece of paper using a pen tied to a clipboard. Some projects let me do the (still manual) data entry using a tablet. I used to gripe to the staff that this is a terrible, terrible way to gather data, but these days I just let them do their job and then blog or tweet about it later. The moment of truth comes with a needle stick, a swab, or a collection cup. Sometimes there are juice and cookies. Usually not.

Later, some anonymous lab will re-measure values that I’ve likely already got on my laptop. The math is churned for a few months, and perhaps somebody publishes. I usually won’t find out. I’ve stopped asking about that, because I’m bored with people who use HIPAA as an incantation to ward off further questions.

There are notable exceptions to this pattern of research’s stony indifference to the well being of the participants. The Coriell Personalized Medicine Collaborative stays in touch, nearly a decade after I spat in a tube for them. I get regular emails sharing the research results derived from my data. They also provide a crufty-but-effective web interface that allows me to see curated and IRB approved subsets of my results along with risk scores and background reading. For all the well-deserved flak we give (and should continue to give) 23andMe for selling our data to the highest pharmaceutical bidder without asking first – they too give me useful and regular value.

All of Us is saying the right words about citizen researchers and “partners rather than subjects,” but the proof will be in the pudding. Their involvement with the likes of Google leaves me a bit cold.

In nearly two decades of energetically engaged participation, I have yet to encounter even one research project that offered to close the loop on the data they collected by making it available to my physician in the context of my clinical care. Nearly two decades after we completed the Human Genome Project, this basic courtesy to research participants is still not on the menu.

We are left to fend for ourselves, to separate the useful offerings from the snake oil in the direct to consumer marketplace.

Personal Data

I’ve written, more than once, about my ongoing attempts to get out in front of the curve of personalized / precision medicine. I can see where we’re going, and I want to live there as soon as possible. Early 21st century medicine is, by and large, reactive. Nobody wants to hear, “I wish we had caught this earlier,” but that’s what you get when the protocol is to wait for visible symptoms before testing for disease. Risk officers exacerbate this by steering physicians away from data, citing the risk of incidental findings and HIPAA violations.

I’m still irked about the physician who tried to refuse to screen me for the colorectal cancer that killed my grandfather, despite genetic and symptomatic evidence that indicated that it might be worth an extra look.

In the future, patients will have conversations about their care in the context of a well structured repository of personal data. That data will come from multiple sources, most of them nonclinical. Our data will be available, with appropriate localization for education and language, directly to the patient. We will be able to share it with our in-home caregivers and with a care team that includes both physicians and other health and wellness professionals.

In the future, nobody will ask for my previous doctor’s FAX number.

Put another way, our physicians should have the same data-driven advantages that we already see in retail sales, in entertainment, and in finance. Our doctors should have the kind of integrated data that data monopolies like Google, Amazon, and Apple already use to influence everything from our buying to our voting.

Of course, that will require changes to – without exaggeration – nearly every aspect of the clinical data environment. We should start now if we want to see it in our lifetimes.

Mercury Retrograde

A company named Arivale has been a partner in my personal data journey for the last year. Through them, I could get clinical-grade laboratory bloodwork every six months. The Arivale dashboard showed me my data in context, along with information from my self-monitoring devices (pulse, weight, sleep, and steps per day), as well as notes from online self-evaluations and conversations with a “wellness coach.”

We were a year in, and it was just getting good when they shut down. They cited operational costs, implying that this sort of service is too expensive to provide – at just about any price. I wish I could see the math on that.

I have written before about my elevated mercury levels and how I was able to do a personal experiment to see whether changing my diet to omit fish rich in heavy metals would reduce them. Here’s a full year plot of the data. It worked.

Of course, over the same year, my cholesterol shot up. Here’s a graph of my LDL levels and particle count over the same period:

My first reaction to these plots was to ask “what changed?” One obvious thing that changed was my diet. I had mostly stopped eating mammals and birds around the year 2001. When I cut out mercury rich fish, I re-introduced a bit of red meat. On reflection, I was probably looking to replace the celebration meal-centerpieces that had formerly involved high-on-the-food-chain fish. Also, a slow-cooker roast on a Sunday is pretty wonderful.

The experiment over the next six months will be to dial back down on the red meat and see what happens to the cholesterol. My other grandfather died of heart disease. It’s something I keep an eye on.

Presentation

When I showed these plots to an experienced computational biologist whose PhD includes the word “statistics,” she had a strong reaction. To paraphrase: “What are they thinking, drawing straight lines between those points? That’s incredibly misleading. You got tested three times in a year. Three. This plot gives no insight into the underlying biological variability or the accuracy of the test! This is a gross oversimplification!”

I tried to make a case that the simple picture was accessible enough to spark curiosity and bring a novice like me into a data driven conversation. I told a story about different visualizations that would be suitable for everybody, including patients, data scientists, and also clinicians, all rendered from the same underlying data. She was unimpressed: “It doesn’t matter which of those categories of person we’re talking about, this plot would be misleading to all of them.”

I trust my statistician friend, and I can see the importance of making sure that the data presentation is as accurate as possible. I’m bummed out that I didn’t get to write the feature-request note to Arivale.

The clinic of the future

I will end on a hopeful note: I recently had the opportunity to visit a clinic from the future.

When you walk into Lab100 at the Mt Sinai School of Medicine, it feels more like an Apple store than a medical establishment. Everything is smooth curves, laminate, and frosted glass. Even though the data that they gather is more accurate, better calibrated, and more natively digital (no manual data entry here), the experience is also more personal and human than any I’ve previously had in a clinical context.

You know how the restaurants and vendors at Disney resorts already know your preferences before you speak up? Imagine that but at the doctor’s office or in the hospital.

A visit to Lab100 begins by sitting down with your caregiver, side by side on a couch. You and the clinician talk while looking at the same pane of glass, a large flat-panel display that shows your medical history and current complaints. Instead of separating people with technology – the usual flat panel monitor between patient and doctor – here technology brings people together to facilitate that all-important doctor / patient conversation.

The beginning of the visit is a review of your chart to make sure that it’s accurate, complete, and relevant. You move through stations to measure blood chemistry, balance, cognitive function, grip strength, and more. At each station there are video presentations explaining what is being done and why. Your results show up on the screen immediately, including a longitudinal view of how you tested before.

At the end, there is another sofa and an even larger screen where you see yourself in context. Your data is shown along with a cohort of other real people, matched to you by gender and age. Then you and the provider talk and make a plan together.

It’s compelling. I hope that the idea takes off.

It felt like rich people medicine, but the founders of the lab assured me that it is built out of commodity components and designed to be replicated without undue expense. In 2019, the Apple aesthetic is certainly high-end, but for all that, there is an Apple store in every major city in the country. It is apparently possible to have that rich-people feeling while still keeping the costs to shopping mall levels – provided you’re selling consumer electronics and not health care.

Lab100 and whoever follows in Arivale’s footsteps are not the whole picture. There is a lot of work still to be done, and many entrenched interests to be appeased. We’ve spent decades building an empire tuned for billing, risk mitigation, compliance, and a weird and stilted flavor of data privacy. It’s going to take years to dig out of this hole.

For all that, the path is clear: Radically empower patients with access to and control over their data, and make the physician / patient conversation a first class citizen in terms of tool development.

Let’s get on with it.

That consulting thing

People regularly ask, “how’s that consulting thing going?” It’s a fair question, and I don’t mind answering. The short answer is that it’s going better than I ever expected.

Conditions were basically perfect when I created my LLC in 2013: I had been employed by BioTeam for nine years. Since 2011, I had been dedicated nearly full time to a single customer, the NY Genome Center. The work with the genome center was all-consuming, so BioTeam had transitioned my day to day management responsibilities to other members of the team. That made it minimally disruptive to ease myself out and “go direct” with the Genome Center.

About a year later, NYGC was to the point where it didn’t make sense for them to rely on consultants anymore. I have great respect and love for the team and the mission, but I didn’t want to live in Manhattan. I came back to Boston and hired on as the leader of research computing at the Broad Institute.

During that first round of independence, I didn’t give much thought at all to business development or process. I had NYGC to rely on, and a few other small gigs sort of landed in my lap along the way.

Fast forward to March of 2017. I decided to depart the Broad and give the “independent” thing another go. It was a very different situation. Without that single large “anchor” customer in hand, business development was essential. I started blogging (yes, this blog is a business development activity), meeting friends and colleagues, tweeting more actively, and generally hustling to raise my profile and build a client base.

It worked.

Two years in, I’ve closed deals with twenty different companies: Seven biotechs, four technology vendors, three other consulting groups (mostly subcontracting for specialized skills and expertise), two universities, a pharmaceutical company, a regional hospital system, a government agency, and an independent research institute. Two of my clients are coming up on their two year anniversary of working with me. Eight others were brief “one and done” engagements.

It’s going well enough that I’ve had to deal with some of the challenges of success.

There’s a fair amount of road time. I’m platinum status with Marriott, “select executive” on Amtrak, and Mosaic with Jetblue. It’s frankly disheartening that, in terms of lifetime totals, I’ve spent nearly two full years worth of nights sleeping in hotels. On the other hand, I benefit from the ongoing biotech miracle that is Kendall Square: Nine of my clients are within an easy bicycle ride from my house.

Managing travel time is among the most important things that I do for my health, happiness, and profitability. It turns out to be straightforward for me to book myself into travel hell, which certainly -feels- like being productive. However, for me at least, that productivity is an illusion. Looking at the numbers, the months when I was running myself ragged going back and forth across the country were actually among my -least- profitable, especially factoring in the downtime that I need to recover from even a few weeks of being flat-out on the road.

The basics of communication and scheduling also take discipline. Slack is ubiquitous among my clients, which means that I check something like six different workspaces on a daily basis. My life would be utter, insecure chaos without a password manager to manage logins and secrets. I practice vigorous defensive calendaring to ensure that my days don’t wind up chopped into useless shards of time and to make space for life maintenance activities. Along the way, I’ve disabled all but the most essential alerts on my desktop and mobile devices. I’ve replaced an interrupt-driven way of life (which actually just doesn’t work at scale) with norms and boundaries that allow people to get my attention without my having to be online and interruptible all the time.

Independence was scary at first, both from a financial and from a lifestyle perspective. It certainly doesn’t work for everybody, and I’m cognizant of the luck and privilege that make it possible for me to live this way. I still have regular bouts of imposter syndrome where I realize that I cannot possibly be getting away with this.

As always, huge thanks to the community of colleagues, friends, and customers who make it all possible. And now, back to work!

25 Rules For Sons

A few days ago, one of my professional contacts shared a list titled “rules for sons” on LinkedIn. It was filled with advice like, “the man at a BBQ grill is the closest thing to a king,” and “carry two handkerchiefs. The one in your back pocket is for you. The one in your breast pocket is for her.”

Lists like this are always making the rounds. This one may have started with a 2015 book titled “Rules for my Unborn Son.” There are other versions online, but they’re all the same story. Manhood is about wearing sport coats, working the grill, asking the pretty girl out, marrying the woman, playing team sports, and maybe serving in the military.

I scrolled past, but found that it was still bugging me after a couple of minutes, so I went back and left a two word comment: “Misogynist claptrap.”

He (you knew it was a guy who shared the post, right?) responded almost immediately that I had clearly not read rule 23: “After writing an angry email, read it carefully. Then delete it.”

LOL, right?

I severed the LinkedIn connection. No harm, no foul – but I don’t go to LinkedIn looking for irritation, and arguing in the comments section has never, even once, changed anybody’s mind.

I shared the story with my spouse, and she said, “You should tell him, and you should tell his employer too. Those people scare me. They can’t hurt people like you, but they can and do hurt people like me.”

So in the spirit of “hey bro, not cool,” here’s the deal:

Truth in Advertising

My contact is the regional sales lead for a new company. His job is to open doors, get meetings, develop relationships, and eventually to make sales.

For a person in that position, LinkedIn is a marketing tool. This guy is an experienced professional. He knows what he’s doing here, at least at an unconscious level. His list – like this blog post – is signaling to his community about what kind of a person he is and what he expects of the rest of us.

No matter the title, this is not about any notional “sons.” Instead, this is how he expects the men in his professional network to act.

The message is that my contact is a certain kind of businessman. He has a firm handshake, looks you in the eye, and is an experienced negotiator. You know he’ll close the deal and then you can both go home to your wives and kids.

Under the hood, though, the inverse message is also clear: We’re supposed to think less of men who don’t make strong eye contact, who wear nontraditional clothing, who (for whatever reason) don’t marry the girl or work the grill. Those people aren’t up to this guy’s standards.

He also keeps a clean hankie on hand in case one of the ladies is overcome with emotion. Good dude, right?

So What’s The Problem?

Lists like this rise from a nostalgia for a time when gender and relationship roles were supposedly simpler. Men were men, women were women, and there was a well defined and correct way to fill either role.

Of course, those roles were radically asymmetric when it came to the workplace. Women were (and are) paid less, under-promoted, subject to outright abuse and subtle neglect, and generally treated like second class human beings. We’re going to be grappling with the fallout of those antique and chauvinist ideas for the rest of our lives.

Even worse: The idea that there is a single correct way to experience gender is incredibly toxic. Our society is slowly and haltingly coming to grips with the diversity of human experience – and lists like this, while superficially innocuous, are a step backwards.

Things weren’t actually simpler back then. Rather, people who didn’t fit into the dominant patterns either adopted an ill-fitting persona at great emotional and mental cost, or else they were excluded, ostracized, and subject to violence and even questionable medical procedures aimed at correcting them because they were somehow wrong at being themselves.

The problem with pushing this as some kind of misty eyed ideal in a professional / business context like LinkedIn should be apparent on the face of it.

The Inappropriate Thing

The thoughtful reader might go back, look at the list, and say that this blog post is a bit of an uptight overreaction. There is no particular word or phrase that stands out as inappropriately crossing some clear line. That’s how this sort of signaling works. The inappropriate stuff emerges gradually as we establish some spaces (the grill, the locker room, perhaps the board room or the industry event) as masculine and therefore subject to different rules.

This is the gateway to some really nasty stuff. Once we start down this road, we’re just a fraternity induction and an MBA away from the @GSElevator twitter feed.

More on that at the end of the post, but first allow me to share another example:

The All Male Conference

A few years back, I was invited to speak to a meeting of the US sales and engineering teams of an early stage technology company. I was already a customer, and my team was in the middle of a proof of concept evaluation of their new product.

When I arrived, I was struck by the massive gender imbalance. It was an all male event with at least 50 men in attendance. The two women at the conference center were the receptionist who gave me my badge, and the person who served the coffee.

The thing had a weird and macho vibe: When the national sales lead finished his presentation, his last slide was a picture of some bird, perhaps a duck, that he had run over on the way to the meeting. The room laughed, some uncomfortably. He left the grisly picture lingering on the projector while he took questions.

After my talk, I had an opportunity to meet with the executive team. I asked about the total lack of women at the event, and they laughed and said that they had just been talking about that.

LOL, weird, right?

I pushed, and they told me that they were working on it, but had to take it slow. Dead-duck guy? He brought in amazing sales numbers. He apparently saw any effort at diversity as diluting his talented team with charity cases and low performers. They didn’t want to alienate him, so they had to tread carefully.

I cancelled the proof of concept and insisted that they go only through me for future communications. I don’t know what other tricks dead-duck-guy had on offer, but I knew I didn’t want him talking to my team.

This particular story has a happy ending. The company did some soul searching and then hired a global head of diversity, who was the most forceful and intersectional person you’ll ever meet. They made a sustained effort to fix their biased and unbalanced team. Dead-duck guy may still be there, but I certainly never saw him around again.

Along the way they discovered something really important: Their product had a much larger potential audience than they had realized.

The company had been blind to that larger market because so many potential customers had been unwilling to take an initial meeting with dead-duck-guy and his team. They never showed up as qualified prospects.

Let me say that again: The macho, hyper-masculine approach of their best sales guy was alienating half of their target audience. The people who didn’t want to deal with him didn’t call back and explain themselves. They just moved on.

Maybe they took Rule 23 to heart.

Conference Season

I mentioned that the inappropriate thing usually shows up later. Let’s talk about that:

Conference season is starting. That means lots of mix-and-mingle events. The goal is relationship building. There will be coffees, breakfasts, lunch-and-learns, bar nights, and boozy steak dinners. There will be private presentations back at the Airbnb, invitations to travel and speak at the national convention, and so on.

As these invitations ramp up, my experience is that they move more and more into masculine spaces that exclude women. Once there, we always tend to see a bit more of the old “locker room” banter. It’s a ratchet that goes in only one direction.

This happens gradually to avoid anybody getting all weird and uptight when the enticements on offer depart from what we talk about in mixed company.

I mean seriously, you know why they put all the big industry events in Las Vegas, right? It’s not for the child care facilities, I can tell you that much.

Why Speak Up?

Real talk: I’m pretty nervous about posting this article.

I know that my contact will see it – I plan to send him a link (seems only fair). I know at least a few other people who will think I’m talking about them. I feel social pressure against rocking the boat and upsetting anybody.

I felt exactly the same way before I spoke up at that all-male team meeting. It’s super stressful to go to somebody’s party and tell them that they are doing it wrong. I was the invited speaker. I checked all the boxes of gender, race, and personal presentation to be welcome, and I still very nearly censored myself.

The thing that pushed me over the edge, then and now, is that this is the same pressure that keeps women silent in the face of uncounted insults and indignities. It gave me just the briefest glimpse of what it’s like to be on the unpleasant side of social pressure to conform, stay quiet, and obey. That brief glimpse was enough to motivate me to speak up then, and it continues to do so today.

As the saying goes: If you see something, say something.

I’m saying something.

Not cool, bro.

The network is slow: Part 1

Let me start off by agreeing that yes, the network is slow.

I’ve moved a fair amount of data over the years. Even when it’s only a terabyte or two, the network always seems uncomfortably slow. We never seem to get the performance we sketched out on the whiteboard, so the data transfer takes way longer than expected. The conversation quickly turns to the question of blame, and the blame falls on the network.

No disagreement there. Allow me to repeat: Yes, the network is slow.

This post is the first in a series where I will share a few simple tools and techniques to unpack and quantify that slowness and get things moving. Sometimes, hear me out, it’s not the entire network that’s slow – it’s that damn USB disk you have plugged into your laptop, connected over the guest wi-fi at Panera, and sprayed across half a continent by a helpful corporate VPN.

True story.

My point here is not to show you one crazy old trick that will let you move petabytes at line-rate. Rather, I’m hoping to inspire curiosity. Slow networks are made out of fast and slow pieces. If you can identify and remove the slowest link, that slow connection might spring to life.

This post is about old-school, low-level Unix/Linux admin stuff. There is nothing new or novel here. In fact, I’m sure that it’s been written a bunch of times before. I have tried to strike a balance to make an accessible read for the average command line user, while acknowledging a few of the more subtle complexities for the pros in the audience.

Spoiler alert: I’m not even going to get to the network in this post. This first one is entirely consumed with slinging data around inside my laptop.

Endless zeroes

When you get deep enough into the guts of Linux, everything winds up looking like a file. Wander into directories like /dev or /proc, and you will find files that have some truly weird and wonderful properties. The two special files I’m interested in today both live in the directory /dev. They are named “null” and “zero”.

/dev/null is the garbage disposal of Linux. It silently absorbs whatever is written to it, and never gives anything back. Try to read from it and you get an immediate end-of-file – nothing at all!

energon:~ cdwan$ echo "hello world" > /dev/null 
energon:~ cdwan$ more /dev/null
/dev/null is not a regular file (use -f to see it)

/dev/zero is the opposite. It emits an endless stream of binary zeroes. It screams endlessly, but only when you are listening.
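If you want to see those zeroes with your own eyes, you can pull a handful of bytes out of /dev/zero and dump them in hex. This is just a quick sketch using standard tools; od’s formatting flags vary a little between systems:

```shell
# Read the first 16 bytes from /dev/zero and display them in hex.
# Every byte is 0x00, exactly as advertised.
head -c 16 /dev/zero | od -An -tx1
```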

If you want your computer to spin its wheels for a bit, you can connect the two files together like this:

energon:~ cdwan$ cat /dev/zero > /dev/null

This does a whole lot of nothing, creating and throwing away zeroes just as fast as one of the processors on my laptop can do it. Below, you can see that my “cat” process is taking up 99.7% of a CPU – which makes it the busiest thing on my system this morning.
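If you’d like to reproduce that observation at the command line, ps can report CPU usage directly. A rough sketch – the column layout and the exact percentage will differ from system to system:

```shell
# List the top five CPU consumers, sorted by the %CPU column.
# With `cat /dev/zero > /dev/null` running in another window,
# the cat process should show up at or near the top of this list.
ps aux | sort -rnk3 | head -5
```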


Which, for me, raises the question: How fast am I throwing away data?

Writing nothing to nowhere

If my laptop, or any other Linux machine, is going to be involved in a data transfer, then the maximum rate at which I can pass data across the CPU matters a lot. My ‘cat’ process above looks pretty efficient from the outside, with that 99.7% CPU utilization, but I find myself curious to know exactly how fast that useless, repetitive data is flowing down the drain.

For this we need to introduce a very old tool indeed: ‘dd’.

When I was an undergraduate, I worked with a team in university IT responsible for data backups. We used dd, along with a few other low level tools, to write byte-level images of disks to tape. dd is a simple tool – it takes data from an input (specified with “if=”) and sends it to an output (specified with “of=”).

The command below reads data from /dev/zero and sends it to /dev/null, just like my “cat” example above. I’ve set it up to write a little over a million 1kb blocks, which works out to exactly a gigabyte of zeroes. On my laptop, that takes about 2 seconds, for a throughput of something like half a GB/sec.

energon:~ cdwan$ dd if=/dev/zero  of=/dev/null bs=1024 count=1048576
1073741824 bytes transferred in 2.135181 secs (502880950 bytes/sec)

The same command, run on the cloud server hosting this website, finishes in a little under one second.

[ec2-user@ip-172-30-1-114 ~]$ dd if=/dev/zero  of=/dev/null bs=1024 count=1048576
1073741824 bytes (1.1 GB) copied, 0.979381 s, 1.1 GB/s

Some of this difference can be attributed to CPU clock speed. My laptop runs at 1.8GHz, while the cloud server runs at 2.4GHz. There are also differences in the speed of the system memory. There may be interference from other tasks taking up time on each machine. Finally, the system architecture has layers of cache and acceleration tuned for various purposes.

My point here is not to optimize the velocity of wasted CPU cycles, but to inspire a bit of curiosity. While premature optimization is always a risk – I will happily take a couple of factors of two in performance by thinking through the problem ahead of time.
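One easy experiment in that spirit is to re-run the same nothing-to-nowhere transfer at a few different block sizes and watch the throughput change. This is a sketch rather than a proper benchmark – the exact rates depend on your CPU and operating system:

```shell
# Move the same 1 GiB through dd using different block sizes.
# Smaller blocks mean many more read()/write() system calls,
# so throughput generally climbs as the block size grows.
for bs in 512 4096 65536 1048576; do
  printf 'bs=%-8s ' "$bs"
  dd if=/dev/zero of=/dev/null bs="$bs" count=$(( 1073741824 / bs )) 2>&1 | tail -1
done
```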

As an aside, you can find out tons of useful stuff about your Linux machine by poking around in the /proc directory. Look, but don’t touch.

[ec2-user@ip-172-30-1-114 ~]$ more /proc/cpuinfo | grep GHz
model name : Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
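A few more files worth a look, assuming a Linux machine (these don't exist on macOS, and the exact fields vary a bit by kernel version):

```shell
# Read-only tour of /proc on a Linux box. Each of these is a plain text
# file generated on demand by the kernel; skip gracefully elsewhere.
if [ -d /proc ]; then
    grep -m 1 "model name" /proc/cpuinfo   # CPU model
    grep MemTotal /proc/meminfo            # total RAM
    cat /proc/loadavg                      # 1/5/15-minute load averages
fi
```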

Reading and writing files

So now we’ve got a way to measure the highest speed at which a single process on a single CPU might be able to fling data. The next step is to ask questions about actual files. Instead of throwing away all those zeroes, let’s catch them in a file instead:

energon:~ cdwan$ dd if=/dev/zero  of=one_gig  bs=1024 count=1048576
1073741824 bytes transferred in 7.431081 secs (144493358 bytes/sec)

energon:~ cdwan$ ls -lh one_gig
-rw-r--r--  1 cdwan  staff   1.0G Mar  5 08:57 one_gig

Notice that it took about three and a half times as long to write those zeroes to an actual file instead of hurling them into /dev/null.

The performance when reading the file back lands between the two earlier measurements:

energon:~ cdwan$ dd if=one_gig of=/dev/null bs=1024 count=1048576
1073741824 bytes transferred in 4.222885 secs (254267367 bytes/sec)

At a gut level, this makes sense. It kinda-sorta ought to take longer to write something down than to read it back. The caches involved in both reading and writing mean we may see different results if we re-run these commands over and over. Personally, I love interrogating the behavior of a system to see if I can predict and understand the way that performance changes based on my understanding of the architecture.
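To see the cache at work, try reading the same file twice in a row. One caveat: right after the write, much of the file may already sit in the OS page cache, so both reads can come back fast; on a freshly booted machine the difference between the first and second read is much starker. A quick sketch (using a smaller file than the examples above, purely to keep the experiment cheap):

```shell
# Write a test file, then read it twice and compare the timings.
# The second read usually comes from the page cache (RAM) rather than
# the disk, and is noticeably faster.
dd if=/dev/zero of=cache_test bs=1024 count=262144   # 256 MB file
dd if=cache_test of=/dev/null bs=1024                # first read
dd if=cache_test of=/dev/null bs=1024                # second read, likely cached
rm cache_test
```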

I know, you were hoping to just move data around at speed over this terribly slow network. Here I am prattling on about caches and CPUs and RAM and so on.

As I said above, my point here is not to provide answers but to provoke questions. Agreed that the network is slow – but perhaps there is some part of the network that is most to blame.

I keep talking about that USB disk. There's a reason: those things are incredibly slow. Here are the numbers for reading that same 1GB file from a thumb drive:

energon:STORE N GO cdwan$ dd if=one_gig_on_usb of=/dev/null bs=1024 count=1048576
1073741824 bytes transferred in 75.596891 secs (14203518 bytes/sec)
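To compare devices side by side, a tiny wrapper around dd is handy. The `read_speed` function and the paths below are my own invention for illustration; point it at a file on whatever disk, thumb drive, or network mount you want to test.

```shell
# Hypothetical helper: read an existing file through dd and print the
# summary line (bytes, seconds, bytes/sec).
read_speed() {
    dd if="$1" of=/dev/null bs=1048576 2>&1 | tail -n 1
}

# Demo on a scratch file; substitute a path on a USB stick or network
# mount to measure that device instead.
dd if=/dev/zero of=/tmp/speed_test bs=1024 count=10240 2>/dev/null
read_speed /tmp/speed_test
rm /tmp/speed_test
```

Remember the caching caveat above: re-reading a file you just touched mostly measures your RAM, not the device.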

That’s enough for one post. In the next installment, I will show a few examples of one of my all time favorite tools: iperf.

Biology is weird

Biology is weird. The data are weird, not least because models evolve rapidly. Today’s textbook headline is tomorrow’s “in some cases,” and next year’s “we used to think.”

It can be hard for non-biologists, particularly tech/math/algorithm/data science/machine learning/AI folks, to really internalize the level of weirdness and uncertainty encoded in biological data.

It is not, contrary to what you have read, anything like the software you’ve worked with in the past.  More on that later.

This post is a call for humility among my fellow math / computer science / programmer type people.  Relax, roll with it, listen first, come up to speed. Have a coffee with a biologist before yammering about how you’re the first smart person to arrive in their field. You’ll learn something. You’ll also save everybody a bit of time cleaning up your mess.

References

Don’t be the person who walks into a research group meeting carrying a half read copy of “Genome” by Matt Ridley, spouting off about how all you need is to get TensorFlow running on some cloud instances under Lambda and you’re gonna cure cancer.

This is not to speak ill of “Genome.” It’s a great book, and I’m super glad that lots of people have read it. But it no more qualifies you to do the heavy lifting of genomic biology than Lisa Randall’s popular press books prepare you for the mathematical work of quantum physics.

You’ll get more cred with a humble attitude and a well-thumbed copy of “Life Ascending” by Nick Lane. For full points, keep Horace Judson’s “The Eighth Day of Creation” on the shelf. Mine rests between Brooks’ “The Mythical Man-Month” and “Personality” by Daniel Nettle.

The More Things Change

Back in 2001, the human genome project was wrapping up. One of the big questions of the day was how many genes we would find in the completed genome. First, set aside the important but fundamentally unanswerable question of what, exactly, constitutes a gene. Taking a simplistic and uncontroversial definition, I recall a plurality of well-informed people who put the expected total between 100,000 and 200,000.

The answer?  Maybe a third to a sixth of that.  The private sector effort, published in Science, reported an optimistically specific 26,588 genes.  The public effort, published in Nature, reported a satisfyingly broad 30,000 to 40,000. 

There was a collective “huh,” followed by the sound of hundreds of computational biologists making strong coffee. 

This happens all the time in biology. We finally get enough data to know that we’ve been holding the old data upside down and backwards.

The central dogma of information flow from DNA to RNA to protein seems brittle and stodgy when confronted with retroviruses, and honestly a bit quaint in the days of CRISPR. I’ve lost count of the number of lower-case modifiers we have to put on the supposedly inert “messenger molecule” RNA to indicate its various regulatory or even directly bioactive roles in the cell.

Biologists with a few years under their belt are used to taking every observation and dataset with a grain of salt, to constantly going back to basics, and to sighing and making still more coffee when some respected colleague points out that that thing … well … it’s different than we expected.

So no, you’re not going to “cure cancer” by being the first smart person to try applying math to biology. But you -do- have an opportunity to join a very long line of well-meaning smart people who wasted a bunch of time finding subtle patterns in our misunderstandings rather than doing the work of biology, which is to interrogate the underlying systems themselves.

Models

To this day, whenever I look at gene expression pathways I think: “If I saw this crap in a code review, I would send the whole team home for fear of losing my temper.”

My first exposure to bioinformatics was via a seminar series at the University of Michigan in the late 1990s. Up to that point, I had studied mostly computer science and artificial intelligence. I was used to working with human-designed systems. While these systems sometimes exhibited unexpected and downright odd behaviors, it was safe to assume that a plan had, at some point, existed. Some human or group of humans had put the pieces of the system together in a way that made sense to them.

To my eye, gene expression pathways look contrived. It’s all a bit Rube Goldberg down there, with complex and interlocking networks of promotion and inhibition between things with simple names derived from the names of famous professors (and their pets). 

My design sensibilities keep wanting to point out that there is no way that this mess is how we work, that this thing needs a solid refactor, and that … dammit … where’s the coffee?

It gets worse when you move from example to example and keep finding that these systems overlap and repeat in the most maddening way. It’s like the very worst sort of spaghetti code, where some crazy global variable serves as the index for a whole bunch of loops in semi-independent pieces of the system, all running in parallel, with an imperfect copy-paste as the fundamental unit of editing.

This is what happens when we apply engineering principles to understanding a system that was never engineered in the first place.

Those of us who trained up on human designed systems apply those same subconscious biases that show us a face in the shadows of the moon. We’re frustrated when the underlying model is not based on noses and eyes but rather craters and ridges. We go deep on the latest algorithm or compute system – thinking that surely there’s reason and order and logic if only we dig deep enough.

Biologists roll with it. 

They also laugh, stay humble, and drink lots of coffee.