{"id":507,"date":"2019-02-20T04:53:25","date_gmt":"2019-02-20T09:53:25","guid":{"rendered":"https:\/\/dwan.org\/?p=507"},"modified":"2019-10-25T11:30:55","modified_gmt":"2019-10-25T15:30:55","slug":"biology-is-weird","status":"publish","type":"post","link":"https:\/\/dwan.org\/index.php\/2019\/02\/20\/biology-is-weird\/","title":{"rendered":"Biology is weird"},"content":{"rendered":"<p>Biology is weird. The data are weird, not least because models evolve rapidly. Today\u2019s textbook headline is tomorrow\u2019s \u201cin some cases,\u201d and next year\u2019s \u201cwe used to think.\u201d<\/p>\n<p>It can be hard for non-biologists, particularly tech\/math\/algorithm\/data science\/machine learning\/AI folks, to really internalize the level of weirdness and uncertainty encoded in biological data.<\/p>\n<p>It is not, contrary to what you have read, anything like the software you\u2019ve worked with in the past.\u00a0 More on that later.<\/p>\n<p>This post is a call for humility among my fellow math \/ computer science \/ programmer type people.\u00a0 Relax, roll with it, listen first, come up to speed. Have a coffee with a biologist before yammering about how you\u2019re the first smart person to arrive in their field. You\u2019ll learn something. 
You\u2019ll also save everybody a bit of time cleaning up your mess.<\/p>\n<h1>References<\/h1>\n<p>Don\u2019t be the person who walks into a research group meeting carrying a half-read copy of \u201cGenome\u201d by Matt Ridley, spouting off about how all you need is to get TensorFlow running on some cloud instances under Lambda and you\u2019re gonna cure cancer.<\/p>\n<p>This is not to speak ill of \u201cGenome.\u201d It\u2019s a great book, and I\u2019m super glad that lots of people have read it \u2013 but it no more qualifies you to do the heavy lifting of genomic biology than Lisa Randall\u2019s popular press books prepare you for the mathematical work of quantum physics.<\/p>\n<p>You\u2019ll get more cred with a humble attitude and a well-thumbed copy of \u201cLife Ascending\u201d by Nick Lane. For full points, keep Horace Judson\u2019s \u201cThe Eighth Day of Creation\u201d on the shelf.\u00a0 Mine rests between Brooks\u2019 \u201cThe Mythical Man-Month\u201d and \u201cPersonality\u201d by Daniel Nettle.<\/p>\n<h1>The More Things Change<\/h1>\n<p>Back in 2001, the Human Genome Project was wrapping up.\u00a0 One of the big questions of the day was how many genes we would find in the completed genome.\u00a0 First, set aside the important but fundamentally unanswerable question of what, exactly, constitutes a gene.\u00a0 Taking a simplistic and uncontroversial definition, I recall a plurality of well-informed people who put the expected total between 100,000 and 200,000.<\/p>\n<p>The answer?\u00a0 Maybe a third to a sixth of that.\u00a0 The private-sector effort, <a href=\"http:\/\/science.sciencemag.org\/content\/291\/5507\/1304\">published in Science<\/a>, reported an optimistically specific 26,588 genes.\u00a0 The public effort, <a href=\"https:\/\/www.nature.com\/articles\/35057062\">published in Nature<\/a>, reported a satisfyingly broad 30,000 to 40,000.\u00a0<\/p>\n<p>There was a collective \u201chuh,\u201d followed by the sound of hundreds of 
computational biologists making strong coffee.\u00a0<\/p>\n<p>This happens all the time in biology. We finally get enough data to know that we\u2019ve been holding the old data upside down and backwards.<\/p>\n<p>The central dogma of information flow from DNA to RNA to protein seems brittle and stodgy when confronted with retroviruses, and honestly a bit quaint in the days of CRISPR.\u00a0 I\u2019ve lost count of the number of lower-case modifiers we have to put on the supposedly inert \u201cmessenger molecule\u201d RNA to indicate its various regulatory or even directly bio-active roles in the cell.<\/p>\n<p>Biologists with a few years under their belt are used to taking every observation and dataset with a grain of salt, to constantly going back to basics, and to sighing and making still more coffee when some respected colleague points out that that thing \u2026 well \u2026 it\u2019s different than we expected.<\/p>\n<p>So no, you\u2019re not going to \u201ccure cancer\u201d by being the first smart person to try applying math to biology.\u00a0 But you <em>do<\/em> have an opportunity to join a very long line of well-meaning smart people who wasted a bunch of time finding subtle patterns in our misunderstandings rather than doing the work of biology, which is to interrogate the underlying systems themselves.<\/p>\n<h1>Models<\/h1>\n<p>To this day, whenever I look at gene expression pathways I think: \u201cIf I saw this crap in a code review, I would send the whole team home for fear of losing my temper.\u201d<\/p>\n<p>My first exposure to bioinformatics was via a seminar series at the University of Michigan in the late 90s. Up to that point, I had studied mostly computer science and artificial intelligence. I was used to working with human-designed systems. While these systems sometimes exhibited unexpected and downright odd behaviors, it was safe to assume that a plan had, at some point, existed. 
Some human or group of humans had put the pieces of the system together in a way that made sense to them.<\/p>\n<p>To my eye, gene expression pathways look contrived. It\u2019s all a bit Rube Goldberg down there, with complex and interlocking networks of promotion and inhibition between things with simple names derived from the names of famous professors (and their pets).\u00a0<\/p>\n<p>My design sensibilities keep wanting to point out that there is <em>no way<\/em> that this mess is how we work, that this thing needs a solid refactor, and that \u2026 dammit \u2026 where\u2019s the coffee?<\/p>\n<p>It gets worse when you move from example to example and keep finding that these systems overlap and repeat in the most maddening way. It\u2019s like the very worst sort of spaghetti code, where some crazy global variable serves as the index for a whole bunch of loops in semi-independent pieces of the system, all running in parallel, with an imperfect copy-paste as the fundamental unit of editing.<\/p>\n<p>This is what happens when we apply engineering principles to understanding a system that was never engineered in the first place.<\/p>\n<p>Those of us who trained up on human-designed systems apply those same subconscious biases that show us a face in the shadows of the moon. We\u2019re frustrated when the underlying model is not based on noses and eyes but rather craters and ridges. We go deep on the latest algorithm or compute system \u2013 thinking that surely there\u2019s reason and order and logic if only we dig deep enough.<\/p>\n<p>Biologists roll with it.\u00a0<\/p>\n<p>They also laugh, stay humble, and drink lots of coffee.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Biology is weird. The data are weird, not least because models evolve rapidly. 
Today\u2019s textbook headline is tomorrow\u2019s \u201cin some cases,\u201d and next year\u2019s \u201cwe used to think.\u201d It can be hard for non-biologists, particularly tech\/math\/algorithm\/data science\/machine learning\/AI folks, to really internalize the level of&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37],"tags":[],"class_list":["post-507","post","type-post","status-publish","format-standard","hentry","category-genomics"],"_links":{"self":[{"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/posts\/507","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/comments?post=507"}],"version-history":[{"count":12,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/posts\/507\/revisions"}],"predecessor-version":[{"id":1131,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/posts\/507\/revisions\/1131"}],"wp:attachment":[{"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/media?parent=507"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/categories?post=507"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dwan.org\/index.php\/wp-json\/wp\/v2\/tags?post=507"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}