23andMe Data

Just to get this out of the way: 23andme. There. Google has been fed. Welcome, news-hungry bloggers with constantly running searches.

I finally got my data yesterday, a full 11 weeks after my tube of spit was received. My whining was noticed, and eventually there was some explanation on the company blog. That drew sympathetic commentary as well as some outright snarkiness. Sheesh, Sherpa. We all know you’re smart. Relax.

In any event, I say “whatever.” Business is hard. I’ve finally got my data. Let’s play.

I deliberately held off on judging the web interface until now. Their example genotypes are loaded with somewhat interesting stuff; how would it look on data from an average-height, average-build, no-major-complaints white guy? My first thought was that 23 and me has done a remarkable amount of work to make the basic biology comprehensible to the interested layperson. I have no idea whether Decodeme and Navigenics have done similar Flash and Java whizziness. Perhaps unfortunately for me, I’ve had a bunch of biology classes and have been doing bioinformatics for a living for more than a couple of years. Their tutorials are old hat, and while the miraculous wonder of DNA replication is still pretty newsworthy, that’s not why I bought a ticket for this particular ride.

If you don’t already know how DNA, SNPs, and microarrays work, it’s a good intro.

My first impression of the Gene Journal was “this is it?” I knew up front that we know next to nothing about how these data points relate to physical predispositions, aptitudes, and diseases. Still, I was bummed when I saw only 58 entries in the Gene Journal, the detailed, annotated pages that 23 and me provides about specific physical conditions. I did get a snicker out of the fact that one of the subsections of the journal is named “Nether Regions.”

Since I was already giggling, I dug into the Prostate Cancer page, and saw sort of what I was expecting to see. A nice little write up on what prostate cancer is and what it does, and then a couple of charts comparing my relevant SNP with the population baselines. They even do a slick little thing with the set of five markers that are currently implicated in this condition.

As I clicked through, the data was absolutely inconclusive … as expected. A percent higher here, a percent lower there. The very best findings they cite are backed by at least two studies, each with at least 1,000 participants. That seems large until you start dealing in fractional percentages, at which point you’ve got barely enough data to say anything at all.
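To put rough numbers on that, here’s a back-of-the-envelope sketch. It uses the textbook normal approximation for a single proportion and for the difference between two proportions. The 30% and 31% allele frequencies are made-up illustrative values, not figures from any study 23 and me cites, and real association studies use more careful methods (and usually split their participants into cases and controls, which shrinks each group further).

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, built from math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ci_halfwidth(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% confidence interval
    for a proportion p estimated from n people."""
    return z * math.sqrt(p * (1.0 - p) / n)


def power_two_proportions(p1: float, p2: float, n_per_group: int,
                          z_alpha: float = 1.96) -> float:
    """Approximate power of a two-sided z-test at alpha = 0.05 to tell p1
    from p2 with n_per_group people in each group. The far tail is ignored."""
    se = math.sqrt(p1 * (1.0 - p1) / n_per_group + p2 * (1.0 - p2) / n_per_group)
    z_effect = abs(p1 - p2) / se
    return normal_cdf(z_effect - z_alpha)


if __name__ == "__main__":
    # Hypothetical: a variant carried by 30% of one group and 31% of the other.
    n = 1000
    print(f"95% CI half-width at n={n}: +/-{ci_halfwidth(0.30, n):.3f}")
    print(f"Power for a 1-point difference: {power_two_proportions(0.30, 0.31, n):.2f}")
    print(f"Power for a 10-point difference: {power_two_proportions(0.30, 0.40, n):.2f}")
```

With 1,000 people per group, the confidence interval on a 30% frequency is roughly plus or minus 2.8 percentage points, and the chance of detecting a one-point difference comes out well under 10%. A ten-point difference, by contrast, would be hard to miss.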

Next, I noticed what was missing. The stone-cold indicators currently used in genetic testing for Alzheimer’s, Huntington’s, and so on are simply not present in the Gene Journal. The breast cancer indicators are in there, but there are only two of them. Contrast that with the SNPedia entry, which has an ungodly list.

SNPedia is run by my friend and colleague cariaso. The vast majority of annotations in there are the product of his evenings and weekends. Let me say that again: one or two people, in the evenings, have put together a far more comprehensive resource than a team of venture-capital-funded biologists. Now, the SNPedia AJAX and Java whizziness is not nearly on par with 23 and me, but that’s not what I’m here for.

To continue, cariaso has added links from SNPedia into 23 and me’s pages. I can very easily take some genes that are implicated according to SNPedia, go into my raw data from 23 and me, and bounce back to see what that genotype might mean. In fact, I can do the math that 23 and me might have done and calculate a more comprehensive set of odds than they did. Turns out I’m at a slightly elevated risk for breast cancer. I’m also a guy. Take that, statistics!
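The simplest version of that math treats each SNP as independently multiplying your baseline odds. Here’s a minimal sketch under that assumption. The SNP ids (rsA, rsB), the odds ratios, and the 12% baseline are hypothetical placeholders, and this is not a claim about how 23 and me actually computes its numbers.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical odds ratios keyed by (SNP id, genotype). Ids and values are
# placeholders for illustration, not published figures.
OddsTable = Dict[Tuple[str, str], float]


def normalize(genotype: str) -> str:
    """Sort allele letters so that 'GA' and 'AG' compare equal."""
    return "".join(sorted(genotype))


def risk_to_odds(p: float) -> float:
    return p / (1.0 - p)


def odds_to_risk(odds: float) -> float:
    return odds / (1.0 + odds)


def odds_ratios_for(genotypes: Dict[str, str], table: OddsTable) -> List[float]:
    """One odds ratio per SNP in the table that I was genotyped for.
    Genotypes the table does not list are treated as neutral (1.0)."""
    lookup = {(snp, normalize(gt)): ratio for (snp, gt), ratio in table.items()}
    snps = sorted({snp for snp, _ in table})
    return [
        lookup.get((snp, normalize(genotypes[snp])), 1.0)
        for snp in snps
        if snp in genotypes
    ]


def combined_risk(baseline_risk: float, odds_ratios: Iterable[float]) -> float:
    """Multiply the baseline odds by each odds ratio, assuming the SNPs act
    independently and multiplicatively, then convert back to a probability."""
    odds = risk_to_odds(baseline_risk)
    for ratio in odds_ratios:
        odds *= ratio
    return odds_to_risk(odds)


if __name__ == "__main__":
    table: OddsTable = {("rsA", "AG"): 1.2, ("rsA", "GG"): 1.4, ("rsB", "CC"): 0.9}
    mine = {"rsA": "GA", "rsB": "CT"}
    ratios = odds_ratios_for(mine, table)
    print(ratios, f"{combined_risk(0.12, ratios):.3f}")
```

With that made-up table, one genotype carrying an odds ratio of 1.2 moves a 12% baseline to about 14%. The independence assumption is the weak point: SNPs that sit near each other are often correlated (linkage disequilibrium), so multiplying their odds ratios can count the same signal twice.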

Back to Alzheimer’s. That’s also a simple one. I’ve got the “less likely to get Alzheimer’s” version of APOE. However, it took a good 10 minutes of clicking back and forth between the two sites to figure that out. The 23 and me page about APOE tells me less than nothing, and there isn’t even a Gene Journal entry for this largely genetically determined and very well researched disease.
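The APOE lookup, at least, is mechanical enough to script instead of clicking. The ε2, ε3 and ε4 alleles (written e2, e3, e4 below) are defined by two SNPs, rs429358 and rs7412. This sketch assumes both SNPs appear in the raw data and are reported on the plus strand. It ignores the rare ε1 allele, so a CT/CT call is reported as the far more common ε2/ε4 even though ε1/ε3 produces the same unphased genotypes.

```python
from typing import Dict, Optional, Tuple

# Common APOE diplotypes keyed by the (rs429358, rs7412) genotypes, with each
# genotype's letters sorted alphabetically. Assumes plus-strand calls.
# e2 = T/T at the two SNPs, e3 = T/C, e4 = C/C; rare e1 (C/T) is ignored.
APOE_DIPLOTYPES: Dict[Tuple[str, str], str] = {
    ("TT", "TT"): "e2/e2",
    ("TT", "CT"): "e2/e3",
    ("TT", "CC"): "e3/e3",
    ("CT", "CT"): "e2/e4",  # also consistent with the rare e1/e3
    ("CT", "CC"): "e3/e4",
    ("CC", "CC"): "e4/e4",
}


def apoe_diplotype(genotypes: Dict[str, str]) -> Optional[str]:
    """Return e.g. 'e3/e3', or None if either SNP is missing, a no-call
    ('--'), or an uncommon combination."""
    calls = []
    for rsid in ("rs429358", "rs7412"):
        call = genotypes.get(rsid)
        if call is None:
            return None
        calls.append("".join(sorted(call)))
    return APOE_DIPLOTYPES.get((calls[0], calls[1]))


if __name__ == "__main__":
    print(apoe_diplotype({"rs429358": "TT", "rs7412": "CC"}))
```

A TT call at rs429358 plus CC at rs7412 maps to e3/e3, the most common combination; anything the table doesn’t cover comes back as None rather than a guess.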

I can see a reason for this: if I were in the business of selling genotyping, I would have nightmares about the first time a customer killed himself after learning about some propensity or condition. The same sort of standards that apply to revealing HIV test results might well apply here. As a society and as individuals, we will need to spend some hard time thinking about how much we want to know and when we want to know it.

So they’ve deliberately (it seems) scrubbed the Gene Journal of easy, clear evidence for simply diagnosable risk factors. Thanks, guys. I can download the raw data and do the work myself … but a summary page eludes me, and that’s the useful stuff I was looking for. I know that I have a “58% chance” of having brown eyes, and that my earwax is wet. I already knew that my father’s paternal line came from Ireland, and that I’m Northern European on both sides.

Me? I want to know it all, and I want to know it now. So I downloaded my raw data, a 4.5MB text file. First look:


Total Lines: 576119
Comments: 14 lines
SNP data points: 576105
Uncalled (--): 3189 (0.55%)

So that seems fair to me. Half a percent “no call” rate is actually pretty darn good. Note that this says nothing about error rate, nor about quality of the called SNPs. It does say a bit about where they set their quality thresholds. I’ll dig into that on a second look.
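Those counts take only a few lines to reproduce. This sketch assumes the layout of my file: comment lines start with “#”, and every other line is tab-separated rsid, chromosome, position and genotype, with “--” marking a no-call. Other versions of the export may differ.

```python
import sys
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RawDataSummary:
    total_lines: int = 0
    comments: int = 0
    snps: int = 0
    no_calls: int = 0

    @property
    def no_call_rate(self) -> float:
        return self.no_calls / self.snps if self.snps else 0.0


def summarize(lines: Iterable[str]) -> RawDataSummary:
    """Count comment lines, SNP rows and '--' no-calls in a raw genotype file.
    Assumes '#' comments and tab-separated rsid/chromosome/position/genotype rows."""
    summary = RawDataSummary()
    for line in lines:
        summary.total_lines += 1
        if line.startswith("#"):
            summary.comments += 1
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue  # blank or malformed row: counted in total_lines only
        summary.snps += 1
        if fields[3] == "--":
            summary.no_calls += 1
    return summary


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as handle:
        s = summarize(handle)
    print(f"Total Lines: {s.total_lines}")
    print(f"Comments: {s.comments} lines")
    print(f"SNP data points: {s.snps}")
    print(f"Uncalled (--): {s.no_calls} ({s.no_call_rate:.2%})")
```

Pass the downloaded file’s path as the only argument; the output mirrors the four lines above.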

Then I used my colleague cariaso’s Promethease to churn through an analysis. This is a program that does exactly what I was doing by hand: for each data point, dig into SNPedia, figure out whether it’s associated with any diseases, and check whether I have the rare version. Then it produces a report listing my information both by disease and by rarity. That’s what I was looking for. He and I had a nice chat about usage, about being trapped on Windows, and about data privacy.
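The core loop of that kind of tool is simple to sketch. What follows is not Promethease’s code, and the Annotation record is a hypothetical stand-in for whatever SNPedia actually stores: match each annotated genotype against my calls, then group the hits by condition and sort them so the rarest genotypes come first. The rsA/rsB ids, conditions and frequencies in the example are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record: a genotype at a SNP, the condition it
    has been linked to, and roughly how common that genotype is (0 to 1)."""
    rsid: str
    genotype: str
    condition: str
    frequency: float


def normalize(genotype: str) -> str:
    """Sort allele letters so that 'GA' and 'AG' compare equal."""
    return "".join(sorted(genotype))


def matching_annotations(genotypes: Dict[str, str],
                         annotations: List[Annotation]) -> List[Annotation]:
    """Keep annotations whose genotype matches my call at that SNP, skipping no-calls."""
    hits = []
    for ann in annotations:
        call = genotypes.get(ann.rsid)
        if call is None or call == "--":
            continue
        if normalize(call) == normalize(ann.genotype):
            hits.append(ann)
    return hits


def by_condition(hits: List[Annotation]) -> Dict[str, List[Annotation]]:
    """Group matching annotations under the condition they relate to."""
    grouped: Dict[str, List[Annotation]] = defaultdict(list)
    for ann in hits:
        grouped[ann.condition].append(ann)
    return dict(grouped)


def by_rarity(hits: List[Annotation]) -> List[Annotation]:
    """Rarest genotypes first."""
    return sorted(hits, key=lambda ann: ann.frequency)


if __name__ == "__main__":
    annotations = [
        Annotation("rsA", "AG", "Condition X", 0.05),
        Annotation("rsA", "GG", "Condition X", 0.01),
        Annotation("rsB", "CC", "Condition Y", 0.40),
    ]
    mine = {"rsA": "GA", "rsB": "CC", "rsC": "TT"}
    hits = matching_annotations(mine, annotations)
    for condition, anns in by_condition(hits).items():
        print(condition, [f"{a.rsid}({a.genotype})" for a in anns])
    for ann in by_rarity(hits):
        print(f"{ann.frequency:.0%}", ann.rsid, ann.genotype, ann.condition)
```

The hard part of the real thing is obviously the annotation database, not this loop, which is exactly why cariaso’s evenings and weekends matter.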

Even with my shiny Promethease report, there’s still not much information in there. The state of human genetics is “just getting started.” However, I plan to run Promethease once a month or so as a way to keep up on the advances in science relevant to me. Hopefully in a few years there will be more to read about.


