Genetics, Genomics, and the GI Doctor

Genome Fanboy

I’m a personalized medicine and genomics nerd and fanboy. Have been for years.

Back in 2008, I did 23andme. I immediately downloaded my raw data and began to tinker (as one does).

I poked and prodded at the text files they provided – and downloaded updated versions of that “same” data from time to time. My reported genotypes changed as their processes matured. Being a nerd, I wrote a horrible little Perl script to compare the versions.

I pulled down my raw data this morning and used that same Perl script (unchanged since 2009) to compare it with the original file I downloaded in 2008. Much to my surprise, it worked:


religion:genome cdwan$ ./compare.pl \
genome_Christopher_Dwan_20080407151835.txt \
genome_Christopher_Dwan_v1_v2_v3_Full_20170926071925.txt

Set 1: 576,105 calls
Set 2: 1,001,428 calls

Set 1 but not set 2: 608. Set 1 in both: 575,497 total: 576,105
Set 2 but not set 1: 425,931. Set 2 in both: 575,497 total: 1,001,428

No call on both: 290
Same call on both: 556,181
Double letter to same single letter: 16,028
no call on set 1, but call on set 2: 2,848
no call on set 2, but call on set 1: 10
Different calls: 140

What does that mean? Today’s version of their assay reports genotypes at about a million locations on the genome, roughly 1.7 times the 2008 figure. The vast majority of those reported genotypes (“calls”) have held steady over time: 572,209 of the 576,105 calls in the 2008 file (99.3%) are the same today, although 16,028 of them are now formatted differently (a double letter reported as the same single letter). Only 140 (0.02%) have actually changed.
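The original compare.pl isn’t reproduced here. As a rough illustration of the bookkeeping behind the categories above, here’s a minimal Python sketch, not the actual script. It assumes 23andme-style raw data: tab-separated rsid, chromosome, position, and genotype columns, lines starting with “#” as comments, and “--” for a no-call. The function names (load_calls, classify, compare, share_unchanged) are hypothetical, and older exports may not match this layout exactly.

```python
# Hypothetical re-creation of the genotype comparison. This is NOT the original compare.pl.
# Assumed format: tab-separated rsid, chromosome, position, genotype;
# lines starting with '#' are comments; '--' marks a no-call.
from collections import Counter

NO_CALL = "--"


def load_calls(lines):
    """Map rsid -> genotype, skipping blank lines and '#' comments."""
    calls = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumes exactly four tab-separated columns; older exports may differ.
        rsid, _chromosome, _position, genotype = line.split("\t")
        calls[rsid] = genotype
    return calls


def classify(old, new):
    """Put one locus reported in both files into a single bucket."""
    if old == NO_CALL and new == NO_CALL:
        return "no call on both"
    if old == new:
        return "same call on both"
    if old == NO_CALL:
        return "no call on set 1, but call on set 2"
    if new == NO_CALL:
        return "no call on set 2, but call on set 1"
    # e.g. "TT" in the old file reported as "T" in the new one
    if len(old) == 2 and old[0] == old[1] and new == old[0]:
        return "double letter to same single letter"
    return "different calls"


def compare(set1, set2):
    """Count loci unique to each file plus the buckets for loci in both."""
    shared = set1.keys() & set2.keys()
    summary = Counter(classify(set1[rsid], set2[rsid]) for rsid in shared)
    summary["set 1 only"] = len(set1.keys() - set2.keys())
    summary["set 2 only"] = len(set2.keys() - set1.keys())
    return summary


def share_unchanged(summary, set1_total):
    """Fraction of the old file's calls that survive, counting format-only changes."""
    kept = summary["same call on both"] + summary["double letter to same single letter"]
    return kept / set1_total
```

In practice you would pass each file’s lines to load_calls and hand the two resulting dictionaries to compare. As a sanity check on the real output above: the six buckets for loci in both files add up to 575,497, the “in both” figure, and (556,181 + 16,028) / 576,105 rounds to 99.3%.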

What does it mean?

Mike Cariaso and I were working together when 23andme came out. Driven by the same curiosity, he started a nights-and-weekends project that he called SNPedia. Mike used MediaWiki, the software developed by the Wikimedia Foundation, combined with some Perl scripts of his own and a lot of old-fashioned “stay up late and find a way,” to scrape and cross-reference resources like the publication abstracts available in PubMed, the genotype frequency tables from HapMap, and others.

He made a wiki page for every genotype mentioned anywhere, and started to accumulate information on pages where various sources seemed to be talking about the same thing. SNPedia is a glorious example of the hell of data skew that is the lot in life of the practicing bioinformatician. It takes vast amounts of work just to answer the question, “are these two observations about the same location on the genome, or not?” Heaven help you if you want to consider complex questions of inheritance.

Along the way, we had many conversations about what constituted an “interesting” genotype. Was it rare? Was it mentioned in lots of articles? Was it mentioned in an article that was cited by lots of articles? Was it implicated in some horrible disease or in some desirable phenotype? Did people keep changing their minds about it? Eventually Mike codified his (and other people’s) thinking on these matters into a tool that he called Promethease. Over the years, it’s grown into a really impressive analysis tool that takes genotype data as input and produces an aggregate report that brings the “interesting bits” to the top – for various definitions of interesting.
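Promethease’s actual scoring isn’t described here, and the sketch below does not reproduce it. As a toy illustration of why “interesting” ends up being a tunable definition, here’s a hypothetical Python ranking that turns three of the questions above (Is it rare? Is it mentioned in lots of articles? Is it implicated in some horrible disease?) into weighted signals. The Finding fields, the log scaling, and the weights are all assumptions invented for this example.

```python
# Toy "interestingness" ranking. Hypothetical; NOT Promethease's algorithm.
import math
from dataclasses import dataclass


@dataclass
class Finding:
    """One genotype observation; every field is a made-up stand-in."""
    label: str
    population_frequency: float  # fraction of people sharing this genotype; must be > 0
    citing_articles: int         # how many papers mention it
    disease_linked: bool         # implicated in a serious condition?


def interest(f, weights):
    """Weighted sum of three toy signals; the weights define 'interesting'."""
    rarity = -math.log10(f.population_frequency)   # rarer genotypes score higher
    attention = math.log10(1 + f.citing_articles)  # diminishing returns on citations
    severity = 1.0 if f.disease_linked else 0.0
    return (weights["rarity"] * rarity
            + weights["attention"] * attention
            + weights["severity"] * severity)


def rank(findings, weights):
    """Return findings with the most 'interesting' first."""
    return sorted(findings, key=lambda f: interest(f, weights), reverse=True)
```

Changing the weights reorders the same findings: weighting only rarity puts a rare, rarely cited genotype at the top, while weighting only citations puts a common, heavily studied one there instead. That is the “for various definitions of interesting” problem in miniature.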

I re-ran Promethease on my updated 23andme data this morning, and it’s come a really long way since those early days. If you’re into this sort of thing, it’s worth checking out.

For what it’s worth, I seem to be relatively typical. Of course, by this point in my life I would probably have figured out by other, less nerdy means whether I carried one of the better-known mutations.

I also got into the whole participatory genomics thing. Some other morning in 2008, Mike and I trooped over to an office in Maryland to enroll in the Coriell Personalized Medicine Collaborative. We signed all sorts of consents, drooled into tubes, and went home. Since then, Coriell has continued to win a place in my heart by emailing me from time to time to let me know that my data was used in some study or other. On occasion, they email to tell me that my genotype is -actionable-.

It turns out that I’m much more motivated by the idea of active participation than I am by micro-payments or even promises of strong privacy protections.

Because of that, I enrolled in the Personal Genome Project when they opened it up back in 2010. When 23andme briefly offered an exome service in 2012, I bought it and uploaded the data to the PGP. As with Coriell, I get the occasional email showing me where my data was put to use.

I will probably wind up signing up for Arivale, out of the same motivation to -know- and -use- this technology.

Why does this come up today?

It comes up because I saw a GI doctor today, and he was working with almost zero data about me. He didn’t even have access to the family history that my primary care physician had taken. His baseline for deciding how to proceed with me was my age (“a bit young to be screened for colon cancer, according to the new national standards”) and my general physical appearance (“are you a runner? You look fit!”).

When I mentioned that my mother’s father had died of metastatic colon cancer, that various GI disorders run in my father’s family, and that I have some pretty good data indicating that, no matter how fit I look, I’m not exactly at the statistical mean for this disease, it all came as a complete surprise to him.

He suggested that we run the same labs my primary care doctor had run six weeks earlier, the ones that had led to this appointment. He had literally no idea that those labs had already been run.

Then again, how would he know? My primary care doctor is in a different practice, probably using a different electronic medical record system. There is no financial incentive anywhere in either one of their practices, nor in the system of insurance and payment, that would encourage them to share that data electronically.

I’m generally very, very fortunate in matters of health. I’m an engineer who has been working in or around genomics for almost 20 years. I understand heritability, genetic penetrance, and the interplay of genomic and lifestyle risk factors. I eat mostly vegetarian. I watch my weight. I get cardiovascular exercise on the regular. I’ve also had the resources and the curiosity to pay out of pocket for all kinds of critical data about my body.

I’ve got the resources and the energy to insist, and I’ve got communication skills honed over years and years of working with teams to design complicated systems. I was able to convince him that the fact that I’m a runner is probably less relevant than my family history. I was able to share fresh laboratory data and prevent yet another round of office visits to review redundant tests. We didn’t even get into my genotypes at rs2273535, rs6983267, and rs7903146, though I think those are also germane.

My doctor was working at a handicap in treating me. I was able to overcome that handicap because I carry a device in my pocket with access to PDFs and summaries of my medical records, because I take an active hand in my health, and because I have the time and the energy to nerd out about both health and statistics.

Most people are not so lucky. They would have gone home and told their family that, whatever their primary care provider had thought, the specialist doc said they looked healthy. Their screening criteria would have been based on population aggregates – and uninformed ones at that.

I’ve been working in this field for nearly two decades, preaching the gospel of data-driven medicine, and I say that this is pathetic. We must give physicians the data-driven tools that they need to practice effective medicine.

We must do better, and we must do it starting today.



2 thoughts on “Genetics, Genomics, and the GI Doctor”

  • Well put! At a recent cancer program meeting at the Broad, Atul Gawande said the Broad is developing “Star Wars universe” treatments that people then have to use in a Flintstones medical world.

  • Y’know, I’ve toyed repeatedly with the idea of doing 23andme as a similar side project (“Learn variant calling with *your* genetic information!”), but I’m also a bit leery of letting them have that as well.

    On the one hand, you’ve got things like Equifax, and the worry that someone’s going to decide that patching something is just Not A Priority.

    On the other hand, you’ve got Verizon and Comcast, and the ability to package up and sell off your network traffic usage to whoever wants to pony up a bit of cash (https://nypost.com/2017/03/28/internet-providers-closer-to-selling-customers-private-info/; while I’m aware that they all rushed to claim that they do not and have no plans to do such a thing, now we’re dependent on their good graces and the value of their word).

    I’d really love to do this, but the cynical technologist in me looks at those two things and so far isn’t willing to bite.
