{"id":507,"date":"2019-02-20T04:53:25","date_gmt":"2019-02-20T09:53:25","guid":{"rendered":"https:\/\/dwan.org\/?p=507"},"modified":"2019-10-25T11:30:55","modified_gmt":"2019-10-25T15:30:55","slug":"biology-is-weird","status":"publish","type":"post","link":"https:\/\/dwan.org\/index.php\/2019\/02\/20\/biology-is-weird\/","title":{"rendered":"Biology is weird"},"content":{"rendered":"<p>Biology is weird. The data are weird, not least because models evolve rapidly. Today&#8217;s textbook headline is tomorrow&#8217;s &#8220;in some cases,&#8221; and next year&#8217;s &#8220;we used to think.&#8221;<\/p>\n<p>It can be hard for non-biologists, particularly tech\/math\/algorithm\/data science\/machine learning\/AI folks, to really internalize the level of weirdness and uncertainty encoded in biological data.<\/p>\n<p>It is not, contrary to what you have read, anything like the software you&#8217;ve worked with in the past.&nbsp; More on that later.<\/p>\n<p>This post is a call for humility among my fellow math \/ computer science \/ programmer type people.&nbsp; Relax, roll with it, listen first, come up to speed. Have a coffee with a biologist before yammering about how you&#8217;re the first smart person to arrive in their field. You&#8217;ll learn something. 
You&#8217;ll also save everybody a bit of time cleaning up your mess.<\/p>\n<h1>References<\/h1>\n<p>Don&#8217;t be the person who walks into a research group meeting carrying a half-read copy of &#8220;Genome&#8221; by Matt Ridley, spouting off about how all you need is to get TensorFlow running on some cloud instances under Lambda and you&#8217;re gonna cure cancer.<\/p>\n<p>This is not to speak ill of &#8220;Genome.&#8221; It&#8217;s a great book, and I&#8217;m super glad that lots of people have read it &#8211; but it no more qualifies you to do the heavy lifting of genomic biology than Lisa Randall&#8217;s popular-press books prepare you for the mathematical work of quantum physics.<\/p>\n<p>You&#8217;ll get more cred with a humble attitude and a well-thumbed copy of &#8220;Life Ascending&#8221; by Nick Lane. For full points, keep Horace Judson&#8217;s &#8220;The Eighth Day of Creation&#8221; on the shelf.&nbsp; Mine rests between Brooks&#8217; &#8220;The Mythical Man-Month&#8221; and &#8220;Personality&#8221; by Daniel Nettle.<\/p>\n<h1>The More Things Change<\/h1>\n<p>Back in 2001, the Human Genome Project was wrapping up.&nbsp; One of the big questions of the day was how many genes we would find in the completed genome.&nbsp; First, set aside the important but fundamentally unanswerable question of what, exactly, constitutes a gene.&nbsp; Taking a simplistic and uncontroversial definition, I recall a plurality of well-informed people who put the expected total between 100,000 and 200,000.<\/p>\n<p>The answer?&nbsp; Maybe a third to a sixth of that.&nbsp; The private-sector effort, <a href=\"http:\/\/science.sciencemag.org\/content\/291\/5507\/1304\">published in Science<\/a>, reported an optimistically specific 26,588 genes.&nbsp; The public effort, <a href=\"https:\/\/www.nature.com\/articles\/35057062\">published in Nature<\/a>, reported a satisfyingly broad 30,000 to 40,000.&nbsp;<\/p>\n<p>There was a collective &#8220;huh,&#8221; followed by the sound 
of hundreds of computational biologists making strong coffee.&nbsp;<\/p>\n<p>This happens all the time in biology. We finally get enough data to know that we&#8217;ve been holding the old data upside down and backwards.<\/p>\n<p>The central dogma of information flow from DNA to RNA to protein seems brittle and stodgy when confronted with retroviruses, and honestly a bit quaint in the days of CRISPR.&nbsp; I&#8217;ve lost count of the number of lowercase modifiers we have to put on the supposedly inert &#8220;messenger molecule&#8221; RNA to indicate its various regulatory or even directly bioactive roles in the cell.<\/p>\n<p>Biologists with a few years under their belt are used to taking every observation and dataset with a grain of salt, to constantly going back to basics, and to sighing and making still more coffee when some respected colleague points out that that thing &#8230; well &#8230; it&#8217;s different than we expected.<\/p>\n<p>So no, you&#8217;re not going to &#8220;cure cancer&#8221; by being the first smart person to try applying math to biology.&nbsp; But you <em>do<\/em> have an opportunity to join a very long line of well-meaning smart people who wasted a bunch of time finding subtle patterns in our misunderstandings rather than doing the work of biology, which is to interrogate the underlying systems themselves.<\/p>\n<h1>Models<\/h1>\n<p>To this day, whenever I look at gene expression pathways I think: &#8220;If I saw this crap in a code review, I would send the whole team home for fear of losing my temper.&#8221;<\/p>\n<p>My first exposure to bioinformatics was via a seminar series at the University of Michigan in the late 90s. Up to that point, I had studied mostly computer science and artificial intelligence. I was used to working with human-designed systems. While these systems sometimes exhibited unexpected and downright odd behaviors, it was safe to assume that a plan had, at some point, existed. 
Some human or group of humans had put the pieces of the system together in a way that made sense to them.<\/p>\n<p>To my eye, gene expression pathways look contrived. It&#8217;s all a bit Rube Goldberg down there, with complex and interlocking networks of promotion and inhibition between things with simple names derived from the names of famous professors (and their pets).&nbsp;<\/p>\n<p>My design sensibilities keep wanting to point out that there is <em>no way<\/em> that this mess is how we work, that this thing needs a solid refactor, and that &#8230; dammit &#8230; where&#8217;s the coffee?<\/p>\n<p>It gets worse when you move from example to example and keep finding that these systems overlap and repeat in the most maddening way. It&#8217;s like the very worst sort of spaghetti code, where some crazy global variable serves as the index for a whole bunch of loops in semi-independent pieces of the system, all running in parallel, with an imperfect copy-paste as the fundamental unit of editing.<\/p>\n<p>This is what happens when we apply engineering principles to understanding a system that was never engineered in the first place.<\/p>\n<p>Those of us who trained up on human-designed systems apply the same subconscious biases that show us a face in the shadows of the moon. We&#8217;re frustrated when the underlying model is not based on noses and eyes but rather on craters and ridges. We go deep on the latest algorithm or compute system &#8211; thinking that surely there&#8217;s reason and order and logic if only we dig deep enough.<\/p>\n<p>Biologists roll with it.&nbsp;<\/p>\n<p>They also laugh, stay humble, and drink lots of coffee.<\/p>\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Biology is weird. The data are weird, not least because models evolve rapidly. 
Today&#8217;s textbook headline is tomorrow&#8217;s &#8220;in some cases,&#8221; and next year&#8217;s &#8220;we used to think.&#8221; It can be hard for non-biologists, particularly tech\/math\/algorithm\/data science\/machine learning\/AI folks, to really internalize the level of&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37],"tags":[],"class_list":["post-507","post","type-post","status-publish","format-standard","hentry","category-genomics"],"_links":{"self":[{"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/posts\/507","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/comments?post=507"}],"version-history":[{"count":12,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/posts\/507\/revisions"}],"predecessor-version":[{"id":1131,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/posts\/507\/revisions\/1131"}],"wp:attachment":[{"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/media?parent=507"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/categories?post=507"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/tags?post=507"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}