{"id":369,"date":"2017-12-08T07:25:47","date_gmt":"2017-12-08T12:25:47","guid":{"rendered":"https:\/\/dwan.org\/?p=369"},"modified":"2019-10-25T15:05:51","modified_gmt":"2019-10-25T19:05:51","slug":"deepvariant","status":"publish","type":"post","link":"https:\/\/dwan.org\/index.php\/2017\/12\/08\/deepvariant\/","title":{"rendered":"DeepVariant"},"content":{"rendered":"<p>Earlier this week, Google published <a href=\"https:\/\/research.googleblog.com\/2017\/12\/deepvariant-highly-accurate-genomes.html\">DeepVariant<\/a>, a machine learning (ML) based tool for genomics.  The software is now available on the <a href=\"https:\/\/blog.dnanexus.com\/2017-12-05-evaluating-deepvariant-googles-machine-learning-variant-caller\/\">DNANexus platform<\/a>.<\/p>\n<p>This is kind of a big deal, and also kind of not a big deal.<\/p>\n<h3>Does it matter?<\/h3>\n<p>It\u2019s a big deal in the same way that ML systems exceeding the performance of radiologists on diagnostic classification of images is a big deal. Sure, it\u2019s a little creepy and intimidating when a computer program exceeds a respected and trained professional at one of their tasks. On the other hand, it would take a spectacularly naive and arrogant person to claim that a radiologist\u2019s only job is image classification.<\/p>\n<p>It\u2019s not a big deal because there is still so much domain expertise required to derive scientifically meaningful results from genomic data, and because these methods are still changing all the time.<\/p>\n<p>The DeepVariant team took one of the points in the genomic analysis workflow where scientists have historically used eyeballs and intuition to identify subtle patterns in the data. Prior variant callers were built atop that intuition, coding it into complex algorithms. 
That\u2019s why there was a well characterized image format (Pileup) already available as a starting point for the project \u2013 scientists still want to <em>look<\/em> at the results of their callers to see if the results align with intuition.<\/p>\n<p>That\u2019s why there was a contest for the team to win. Because we\u2019re still figuring this stuff out.<\/p>\n<p>It was a good place to start, and the system performed much as we might expect.<\/p>\n<h3>Much to Learn<\/h3>\n<p>I saw a preview of this technology at the <a href=\"http:\/\/broadinstitute.org\">Broad Institute<\/a>, sometime in mid to late 2016. We were all really impressed. I remember that someone asked exactly the right question: \u201cCan it -discover- a new sort of biological artifact or feature?  One that we haven\u2019t seen before?\u201d<\/p>\n<p>The team was unambiguous: Of course it can\u2019t. Until the patterns are present in the training data, there\u2019s nothing there to learn. Further, this particular approach will never suggest that, maybe, we\u2019re looking at the problem sideways.<\/p>\n<p>Put another way: There is a <em>lot<\/em> of genomic biology still to be learned.<\/p>\n<p>Every year that I\u2019ve been in and around this field, there has been at least one discovery that has up-ended major pieces of institutional knowledge and dogma. Formerly safe assumptions get melted down, combined with new insights, and formed into superior alloys <em>all the time<\/em>.<\/p>\n<h3>The more subtle challenge<\/h3>\n<p>There is a more subtle challenge in this particular case: We\u2019re dealing with <em>measurements<\/em> rather than <em>facts<\/em> here. The process of DNA sequencing is complex and subtle, with biases and omissions and room for improvement throughout. 
The way that this particular test was framed assumes that there is one unambiguous correct answer to the question of variation, and that we already know that answer.<\/p>\n<p>A genomic biologist \u2013 or scientist of any stripe \u2013 has to hold two truths in their head at the same time: They must gather data to answer questions, and they must also accept that the data may suggest refinements to the question itself. Those refinements to the question, the ones that call existing knowledge into question \u2013 that\u2019s where the real innovation happens.<\/p>\n<p>Given enough data, machine learning now excels at answering well-formed questions. The task of questioning our assumptions and changing the question itself remains much more subtle.<\/p>\n<h3>The take home<\/h3>\n<p>The short version is that computers are here, right now, to take away any part of any job that involves memorizing a large corpus of data and then identifying new examples of old categories based on patterns in that data. This is just as true for eyeballing pileup images as it is for reading ZIP codes or license plates.<\/p>\n<p>Machine learning is also here for any part of your job in which you merely turn the crank on a bunch of rules and formulas. This has already impacted a bunch of different jobs: Tax preparation, law, real estate, and travel planning have all undergone radical changes in the last decade.<\/p>\n<p>One final thought: This is also a big deal because while it takes massive computation to <em>create<\/em> a recognizer like DeepVariant, it is trivial to <em>use<\/em> that recognizer on any particular input. Variant calling in the old model takes up a <em>lot<\/em> of CPU power \u2013 which can now be turned (hopefully) to more subtle questions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Earlier this week, Google published DeepVariant, a machine learning (ML) based tool for genomics. The software is now available on the DNANexus platform. 
This is kind of a big deal, and also kind of not a big deal. Does it matter? It\u2019s a big deal&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37],"tags":[],"class_list":["post-369","post","type-post","status-publish","format-standard","hentry","category-genomics"],"_links":{"self":[{"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/posts\/369","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/comments?post=369"}],"version-history":[{"count":5,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/posts\/369\/revisions"}],"predecessor-version":[{"id":1145,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/posts\/369\/revisions\/1145"}],"wp:attachment":[{"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/media?parent=369"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/categories?post=369"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/tags?post=369"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}