Genetics, Genomics, and the GI Doctor

Genome Fanboy

I’m a personalized medicine and genomics nerd and fanboy. Have been for years.

Back in 2008, I did 23andme. I immediately downloaded my raw data and began to tinker (as one does).

I poked and prodded at the text files they provided – and downloaded updated versions of that “same” data from time to time. My reported genotypes changed as their processes matured. Being a nerd, I wrote a horrible little Perl script to compare the versions.

I pulled down my raw data this morning and used that same Perl script (unchanged since 2009) to compare it with the original file I downloaded in 2008. Much to my surprise – it worked:



religion:genome cdwan$ ./compare.pl \
genome_Christopher_Dwan_20080407151835.txt \
genome_Christopher_Dwan_v1_v2_v3_Full_20170926071925.txt
Set 1:  576,105 calls
Set 2:  1,001,428 calls
Set 1 but not set 2:  608.  Set 1 in both:  575,497 total: 576,105
Set 2 but not set 1:  425,931.  Set 2 in both:  575,497 total:  1,001,428
No call on both: 290
Same call on both: 556,181
Double letter to same single letter: 16,028
no call on set 1, but call on set 2: 2,848
no call on set 2, but call on set 1: 10
Different calls: 140

What does that mean? Today’s version of their assay gives information on about a million locations on the genome – a little bit less than double the 2008 figure. The vast majority of those reported genotypes (“calls”) have remained constant over time. Of the 576,105 calls in the 2008 file, 99.3% are the same today: 556,181 are identical, and another 16,028 differ only in formatting (a doubled letter now reported as a single letter). Only 140 (0.02%) have changed.
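For the curious, here is a minimal Python sketch of that kind of comparison. It is not my original script. It assumes the raw data files are tab-separated lines of marker id, chromosome, position, and genotype, that comment lines start with "#", and that a no-call is written as "--". The function names are made up for illustration, and genotypes are compared as exact strings, so "AG" and "GA" would count as different.

```python
from collections import Counter

NO_CALL = "--"  # assumed no-call marker in the raw data files


def parse_calls(lines):
    """Map marker id -> genotype from tab-separated raw-data lines."""
    calls = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip anything that is not a full record
        rsid, _chrom, _pos, genotype = fields[:4]
        calls[rsid] = genotype
    return calls


def classify(old, new):
    """Put one shared marker into one of the report's buckets."""
    if old == NO_CALL and new == NO_CALL:
        return "no call on both"
    if old == NO_CALL:
        return "no call on set 1, but call on set 2"
    if new == NO_CALL:
        return "no call on set 2, but call on set 1"
    if old == new:
        return "same call on both"
    if len(old) == 2 and old[0] == old[1] and new == old[0]:
        return "double letter to same single letter"
    return "different calls"


def compare(set1, set2):
    """Count markers unique to each set and classify the shared ones."""
    shared = set1.keys() & set2.keys()
    summary = Counter(classify(set1[r], set2[r]) for r in shared)
    summary["set 1 but not set 2"] = len(set1.keys() - set2.keys())
    summary["set 2 but not set 1"] = len(set2.keys() - set1.keys())
    return summary
```

To use it on real files, open each one and hand the file object to parse_calls, then pass the two resulting dictionaries to compare.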

What does it mean?

Mike Cariaso and I were working together at the time that 23andme came out. Driven by the same curiosity, he started a nights-and-weekends project that he called SNPedia. Mike used software developed by the Wikimedia Foundation, combined with some Perl scripts of his own, plus a lot of old-fashioned “stay up late and find a way,” to scrape through and cross-reference resources like the publication abstracts available in PubMed, the genotype frequency tables of HapMap, and others.

He made a wiki page for every genotype mentioned anywhere, and started to accumulate information on pages where various sources seemed to be talking about the same thing. SNPedia is a glorious example of the hell of data skew that is the lot in life of the practicing bioinformatician. It takes vast amounts of work just to answer the question, “are these two observations about the same location on the genome, or not?” Heaven help you if you want to consider complex questions of inheritance.

Along the way, we had many conversations about what constituted an “interesting” genotype. Was it rare? Was it mentioned in lots of articles? Was it mentioned in an article that was cited by lots of articles? Was it implicated in some horrible disease or in some desirable phenotype? Did people keep changing their minds about it? Eventually Mike codified his (and other people’s) thinking on these matters into a tool that he called Promethease. Over the years, it’s grown into a really impressive analysis tool that takes genotype data as input and produces an aggregate report that brings the “interesting bits” to the top – for various definitions of interesting.

I re-ran Promethease on my updated 23andme data this morning, and it’s come a really long way since those early days. If you’re into this sort of thing, it’s worth checking out.

For what it’s worth, I seem to be relatively typical. Of course, by this point in my life I would probably have figured out by other less nerdy means if I carried one of the better known mutations.

I also got into the whole participatory genomics thing. Some other morning in 2008, Mike and I trooped over to an office in Maryland to enroll in the Coriell Personalized Medicine Collaborative. We signed all sorts of consents and drooled into tubes and then went home. Since then, Coriell has continued to win a place in my heart by, from time to time, emailing me to let me know that my data was used in some study or other. On occasion, they email me that my genotype is -actionable-.

It turns out that I’m much more motivated by the idea of active participation than I am by micro-payments or even promises of strong privacy protections.

Because of that, I enrolled in the Personal Genome Project when they opened it up back in 2010. When 23andme briefly offered an Exome service in 2012, I bought it, and uploaded it to the PGP. As with Coriell, I get the occasional email showing me where my data was put to use.

I will probably wind up signing up for Arivale, out of the same motivation to -know- and -use- this technology.

Why does this come up today?

It comes up because I saw a GI doctor today, and he was working with almost zero data about me. He didn’t even have access to the family history that my primary care physician had taken. His baseline in deciding how to proceed with me was based on my age (“a bit young to be screened for colon cancer, according to the new national standards”), and my general physical appearance (“are you a runner? You look fit!”).

When I mentioned that my mother’s father had died of metastatic colon cancer, that various GI disorders run in my father’s family, and that I have some pretty good data indicating that, no matter how fit I look, I’m not exactly at the statistical mean for this disease – it was a complete surprise to him.

He suggested that we run the same labs that my primary care doctor had run six weeks prior that had led to this appointment. He had literally no idea that those labs had already been run.

Then again, how would he know? My primary care doctor is in a different practice, probably using a different electronic medical record system. There is no financial incentive anywhere in either one of their practices, nor in the system of insurance and payment, that would encourage them to share that data electronically.

I’m generally very, very fortunate in matters of health. I’m an engineer who has been working in or around genomics for almost 20 years. I understand heritability, genetic penetrance, and the interplay of genomic and lifestyle risk factors. I eat mostly vegetarian. I watch my weight. I get cardiovascular exercise on the regular. I’ve also had the resources and the curiosity to pay out of pocket for all kinds of critical data about my body.

I’ve got the resources and the energy to insist, and I’ve got communication skills honed by working with teams to design complicated systems over years and years. I was able to convince him that the fact that I’m a runner is probably less relevant than my family history. I was able to share fresh laboratory data and prevent yet another iteration of office visits to review redundant tests. We didn’t even get into my genotype on RS2273535, RS6983267, and RS7903146 – though I think that those are also germane.

My doctor was working at a handicap in terms of treating me. I was able to overcome that handicap because I carry a device with access to PDFs and summaries of my medical records in my pocket, because I take an active hand in my health, and because I have the time and the energy to nerd out about both health and statistics.

Most people are not so lucky. They would have gone home and told their family that, whatever their primary care provider had thought, the specialist doc said they looked healthy. Their screening criteria would have been based on population aggregates – and uninformed ones at that.

I’ve been working in this field for nearly two decades, preaching the gospel of data driven medicine, and I say that this is pathetic. We must give physicians the data driven tools that they need to practice effective medicine.

We must do better, and we must do it starting today.

Genomics and Medicine

September 20, 2017

First steps in a data strategy for science

I had the opportunity to write a guest blog post for Elastifile. It’s about the fact that a lot of the data in the life sciences is housed on big NFS fileservers. It has been challenging to shift our workflows, which rely both on the data and on the access patterns of NFS, to public clouds without massive disruption.

At least that has been my experience. As always, I’m interested in your thoughts.

Conferences and Talks, Storage

September 13, 2017

Identity, Equifax, and Google

I’ve been reading Who Owns the Future by Jaron Lanier. It’s a good book, and you should probably read it. It’s particularly important if you’re a person who participates in the economy – which is most of us.

Among the good points he makes is the importance of our online identity and how it must persist – stable and reliable – for many, many years. This should be on your mind because of the Equifax data breach. Critical identifying data on nearly 200 million people was apparently stolen, including social security numbers, birthdates, addresses, and so on. Basically, all the stuff that I talked about in my post about zero knowledge proofs is now known to be out in the wild.

I reacted by putting a “credit lock” on my information with all four major credit reporting agencies. You should probably do that too. It took about an hour, all online (I made zero phone calls), and cost less than $50 in total. Frankly, I’m horrified and disappointed. These companies make a living accumulating data about me, and I have to pay what amounts to protection money to get them to even make a pause in selling it.

I have the option of paying protection money on my credit rating because the major credit ratings agencies are federally regulated. There is no plausible way to opt out of their databases, but at least I can insist on a bit of a firewall.

Meditation: The data that Equifax lost is exactly and completely the data that those same credit agencies (along with every one of my credit card companies and banks) use to “verify” my identity in the event that I want to make changes – including unlocking that very same credit report. They did offer a personal identification number (PIN) with each lock. For accounts where the lock pre-dates the breach, my bet is that the PIN went into the wild along with the other information.

I was, at least, able to register a mobile number and email address with one of the services – TransUnion – so that I’ll get word when changes happen. If I were a bad actor, that’s the first thing that I would disable. Hopefully I will also get notification from my bank if someone calls up and asks to transfer my retirement accounts to some other institution. My experience with a recent rollover transaction suggests that I can do the whole thing with one phone call, with no second factor required.

Conveniently, the data that a bank might check on a driver’s license is also among the data that was leaked. Fortunately, they don’t need a picture of me for the fake ID – they can use their own picture for that part.

We need something better.

Unfortunately, one major alternative on offer is going to turn out very, very badly.

That alternative, of course, is to let Google or Facebook handle identity for us. It’s already an option on many websites. The link to “sign up with a different email” is getting smaller and smaller on the signup pages. Google and Facebook provide, effectively, a Single Sign On service at no direct cost to the user.

One problem with this idea is that Google and Facebook will not remain in their current form for long enough to serve as a stable source of identity. At some point, they will change, be purchased, split up, merge, or something. Along the way, they will modify their business plans. At that point, any online services that rely on Google and Facebook for identity services stand to be disrupted.

If enough of our digital life relies on corporate credentials, we will wind up regulating them. That’s how the government got into the business of roads and electrical power. Even if it doesn’t rise to that level, we all stand to lose access to a lot of our online identity and social history when social media sites undergo change and growth.

If that’s uncomfortably complicated – just consider what will happen when Google exits the business of providing free email accounts. How will you recover a lost password on the various sites where you’re using that gmail.com address?

We are already in the bad place, and the SSO thing makes it easier and far worse.

The other problem, of course, is that Google and Facebook, just like the credit reporting agencies, are not in the business of serving us as their customers. That’s why these identity services are provided at no direct cost to the user. Their primary product is information about us. They are, without putting too fine a point on it, gigantic, barely regulated, commercial spy operations. As we move from email to SSO, we move from less to more tracking – which amounts to still more data about me, all in one place, which will eventually be compromised.

The incentives and trends do not point in the right directions.

A better solution, in my opinion, would be a very lightweight bit of regulation coupled with identity solutions whose incentives are aligned with human interests rather than corporate ones. Technologies like blockchain will almost certainly play a part in this, though the simplistic solutions being floated now are premature. This should be a long, thoughtful social conversation about identity and privacy in the digital age. Anyone who tells you that they’ve already got all the answers is (a) wrong and (b) trying to make a quick buck.

Speaking of making a quick buck, we should revoke Equifax’s corporate charter and hold their business and technology leadership personally accountable for this mess. There is redundancy in the credit ratings system – I’m paying protection money to three -other- firms to moderate the amount of my data that they sell. Equifax did lasting damage to nearly 200 million of us, and they need to be made to close up shop.

Equity and inclusion, Infosec

September 7, 2017

Developing business

People seemed to appreciate my first post about the mechanics of starting a one-person consulting shop. This post builds on it, talking about the pattern I use as I progress interesting conversations through to closed business.

This is offered in the same spirit as that first post. Your mileage will certainly vary; this is just my experience, based on what has worked well for me. I’m super curious to hear any feedback you’re willing to offer.

Conversations from possibility

After a couple of decades in the same field, I have a really solid professional network. I took my first months of independence as an opportunity to re-connect with as much of that network as I could. I filled my schedule with phone calls, Skype dates, breakfasts, coffees, lunches, and after work beers. Even though I’m an inveterate introvert, I found it really pleasant (though tiring) to re-connect with so many people.

In all honesty, it eased my transition from a highly engaged role at a full time job to a much more solitary day to day routine.

It’s important to note that I didn’t approach these conversations with any kind of a sales pitch. I was really, seriously just re-connecting with friends and colleagues, letting them know that I had changed jobs. Most of my early business came from referrals out of those conversations rather than from friends hiring me directly.

As my schedule has filled up with paid work, I’ve deliberately left gaps so that when someone reaches out asking if I have time for a coffee or a beer, I say “yes, how about Friday.” Those meetings are just as much a part of the job as showing up to a client site. I’m working to develop next quarter’s income, not deliver on this quarter’s.

Scoping conversations and the Nondisclosure

From time to time, one of these casual conversations will unearth the fact that there’s a match between somebody’s needs and my abilities. At this point, I usually say “this sounds really interesting. Can we put a nondisclosure agreement (NDA) in place so we can get into the details?” NDAs are closely related to CDAs, or Confidential Disclosure Agreements. The differences aren’t really relevant at this point; either suits the purpose of these conversations.

An NDA should be a very straightforward document, not more than a page or two, in which both parties agree to keep each other’s information private. The NDA should make no commitments about any particular work, nor should it get unduly detailed. The NDA should also be 100% reciprocal. I won’t blab about you, and you won’t blab about me. The point of this document is to enable further conversation. This is not the time to get fancy with the lawyering or try to exact some competitive edge.

I have an NDA template, but I usually wind up using whatever my prospective client’s attorneys suggest – provided that it meets the criteria above. Most companies are bigger and more formal than my one-person shop, so it’s easier for me to adjust than for them.

From my perspective, the primary value of the NDA is as a lightweight validation that we’re all professionals and taking the conversation seriously. It lends gravity to the conversation, and serves as a gentle on-ramp to slightly more formal meetings.

Building a Statement of Work

As a social side-note, this is the point at which the meetings usually move from coffee shops and restaurants to an actual office. There’s a clear link between where we choose to have a conversation and how serious we are about it.

The point of the post-NDA conversation is to understand enough about each other’s situations to allow a good statement of work to take shape. This is the part where, as the consultant looking for work, it’s my job to have an idea and to make a proposal.

Full disclosure: I used to experience this as the very most stressful part of the process. I assumed that proposing a statement of work was some sort of commercial imposition on what had been, to this point, a delightfully high minded conversation.

When I started out with BioTeam, the founders were constantly behind the scenes, prompting and supporting me to (as they put it), “have an idea! go get some work!” After I got good at it, a major part of my job became to provide that same sort of coaching to help new members of the team overcome the same inertia and nervousness that I had felt.

Over the years, I’ve realized that creating a statement of work is the very best part of consulting. I literally get to write my own job descriptions and (as one mentor put it) “speak solutions into being.”

I talk to interesting people, try to understand their situation, and then try to imagine something that we could do together that would help them. It takes practice to get up the nerve to say “it sounds like there’s potential here! How about this.” These days, I see it as a creative challenge rather than an imposition on the other person.

My statements of work almost always start with a “situation” section that briefly outlines the current state of affairs and scopes the reason for considering a project. I find it incredibly useful, mired in the depths of a negotiation – or even when delivering on a project – to go back and read the “situation” section that we wrote together. All too frequently, I realize that I have wandered away from the original problem that we set out to solve.

Having written the situation, I write a description of what I will do to impact or change that situation. This should be specific enough to see how it would matter, but still leave lots of space for adjustment in both scope and tasking.

The Proposal

At some point, the statement of work starts looking pretty solid. We’ve come to a shared understanding of the situation, and we’ve described things that I could reasonably do that would improve that situation.

This is where it starts to get a bit commercial. I’ve got a proposal template (it’s in Word), with sections for both “situation” and “statement of work.” It also includes a bunch of stuff about how we’ll deal with travel (whole days only, pre-approved only, customer pays for single occupancy rooms, coach class airfare, and a car if necessary), time tracking (I track it and provide detailed invoices), and payment (I invoice monthly, they pay within 30 days of receiving the invoice). There is also a section titled “about the consultant,” which reads like a mini-resume. This bit is important, because the proposal is the document that the person I’ve been talking to will shop around to their organization.

There’s also a section titled “investment.” This is the “how much will it cost,” part.

There has been a lot of ink spilled about negotiation, and still more about how to set prices. I have strong opinions on both topics. Without going too far into those opinions, I will simply say: Ask for a rate at which you will be happy to be doing this work, and be fair and open in the conversation. The real trick to negotiation is to keep the conversation going. Have a list of things you’re willing to give, and have a list of things you’re willing to accept. If the conversation stalls, offer to accept something that you know is easy for the other person to give.

Recall – you wrote the statement of work. You can edit it as you see fit. As the saying goes, negotiate scope rather than rate.

Also, do keep in mind that you’ll be setting aside half of your gross receipts for taxes. That’s a lot.

Closing the deal

Closing the deal is wildly variable. In my experience, smaller and younger organizations tend to send an email that says “yes! Proposal accepted! When can you start?” Larger shops will have a more formal process. I will leave the nuance of navigating a big commercial purchasing system for a later post. Suffice it to say, this is the point at which the documents get complex and it might be good to have a lawyer looking over your shoulder.

As I said up top – I am very interested in feedback and opinions on this stuff. This is what I know, and it works for me. What works for you?

Consulting

September 4, 2017

The Mechanics of Consulting

Since I went independent in February, a few people have asked me about starting an independent consulting practice. This post shares some of my experiences. In order to keep it to a manageable length, I have omitted stuff like developing and maintaining relationships with clients, writing statements of work, running the actual projects, as well as the banal necessities of invoicing and collection.

If you happen to be thinking of doing something like this, please keep in mind that your mileage will certainly vary. A cursory internet search about starting a company turns up lots of strong opinions – many of them written by people who seem to be selling something. I’m not an attorney, I’m not a CPA, and I do not specialize in setting up small businesses. This post describes my experiences and should be taken with a grain of salt.

With that, here are some of the mechanics:

Incorporation

I registered an LLC with the secretary of state of Massachusetts. This involves filling out a straightforward web form and costs $500. There is a $500 annual fee to keep the company active, and a $500 fee if you want to make changes to the filing (like updating an address or a name) in the middle of the year. I paid an attorney to fill out the form for me, but in hindsight it’s simple enough to do for yourself.

My understanding is that I didn’t have to register an LLC. I could also have simply started “doing business as,” myself. I registered both because I thought it was cool to own a “real” company, and also to provide a legal framework to keep my personal and my business finances separate. Despite the words “limited liability,” an LLC does not actually provide much in the way of legal protection for my assets. That sort of protection comes from consistent, audit-ready financial practices, and insurance.

The form requires a brick and mortar street address as the official location of the business. Everything else can run through a post office box, but the state wants to know where the business actually operates. Since the filing is a public document, some people might be leery of using their home address. When I started my business, I rented a mail slot from WorkBar, my local co-working facility. That let me use their address rather than my own.

You aren’t allowed to have the same name as another registered company. Attorneys can do the search for you, but Google and the secretary of state’s website provide a solid first pass. Because my brand is just “me,” I used my name in the filing. There are many reasons that a person might name their company based on what they do, rather than on who they are. Based on my experience and that of several friends, either path can work.

Legal contracts always use the formal name as registered with the state. In conversation, I may refer to my business as “Dwan Consulting,” or similar variants – but on the contract it’s always “Dwan, LLC.”

Tax ID

Once the corporate registration has been accepted, you can apply for a corporate tax ID from the US government. This is another self-explanatory web form.

You will receive a PDF with your company’s tax ID, which you can use to fill out the W-9 form that all your customers will want.

Bank Accounts

With the tax ID in hand, I set up a pair of business bank accounts. I use the checking account for the vast majority of my transactions, and the other one to hold money for taxes. A good rule of thumb on taxes is to set aside half of your gross income. This is almost certainly overkill. Over time you will accumulate data that allows a more accurate number.

Credit Card

In addition to the bank accounts, I use an American Express card for my business expenses. The one I use has an annual fee and comes with benefits that in my opinion make it worth the money. The benefits that I like the most are passes for in-flight internet (I used one in writing this post!), access to airport priority lounges (snacks, comfy chairs, good wifi, and ample power outlets), upgraded memberships in car rental and airline upgrade programs, and so on. There are tons of non-billable expenses associated with business travel, so this works for me. As mentioned above, your mileage will vary.

Since I set up the tax ID number with the IRS, my company can have its very own credit history. This will come in handy if I ever decide that I need a business loan.

Insurance

Some of your customers will insist that you carry insurance. Whether they do or not, it’s a really good idea to sign up for a policy. I was surprised at how little it cost for me to have a remarkable amount of coverage.

There are two primary kinds of policy that are of interest to the independent consultant. I have both, and it costs me about $100/month:

General Liability: This is insurance in the event that I directly cause damage or losses to my customers. For example, if I spill coffee into their precious server and destroy their data, that’s general liability.
Errors and Omissions or Professional Liability: This is insurance in the event that I do not directly cause the damage, but my advice and service are bad enough that when the accident happens – it’s obviously my fault. If I wrote the design that specified that the office coffee pot should be located directly above the precious server, that would fall under “errors and omissions.”

Boilerplate Legal Agreements

I had the good fortune to work with a lot of different contracts and agreements during my time at BioTeam. Because of that, I was confident enough to distill several example documents into templates for Nondisclosure, Proposal, and Consulting agreements. If you don’t have that background, this is a place where I would suggest hiring an attorney to be sure that you understand the terms and conditions under which you intend to do business. This is particularly true if you intend to write software or generate intellectual property.

Accounting

I use QuickBooks Online. It’s about $50/month, which comes with bank account integration and time tracking. I like the time tracking feature a lot, since I can create invoices that show, day to day, when I was on the clock. I make a practice of recording my time in the system every day. Time tracking software like Harvest is also quite good in this regard. There are other features of QuickBooks that I haven’t needed yet, like payroll and direct deposit.
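If you’d rather see the shape of the thing than take my word for it, here is a rough Python sketch of what “day to day” invoicing from time entries amounts to. It has nothing to do with how any accounting package actually works inside. The function name, the hourly rate, and the entries are all hypothetical, and the 30-day term is counted from the invoice date for simplicity.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_invoice(entries, hourly_rate, invoice_date, net_days=30):
    """Summarize (day, hours) time entries into a simple invoice dict."""
    hours_by_day = defaultdict(float)
    for day, hours in entries:
        hours_by_day[day] += hours  # several entries on one day are merged
    total_hours = sum(hours_by_day.values())
    return {
        "lines": sorted(hours_by_day.items()),  # one line per day worked
        "total_hours": total_hours,
        "amount_due": round(total_hours * hourly_rate, 2),
        "due_date": invoice_date + timedelta(days=net_days),
    }
```

The per-day lines are the part that matters to me: a client can see exactly when I was on the clock, which is much easier to defend than a single lump of hours.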

Taxes

Taxes are a big deal when you’re out on your own. You are responsible for both the employer and the employee portions of Social Security and Medicare tax. You are also responsible for filing quarterly estimated tax payments. Even though it’s not technically required in your first year of operation, it’s a good idea.

As mentioned above, a good rule of thumb when starting out is that you should set aside 50% of all the money that you get paid. That will feel really uncomfortably high, but it’s way better to over than to under estimate. Over time, you will get a feel for your actual effective tax rate.
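The arithmetic is simple enough to sketch in a few lines of Python. This is only an illustration of the rule of thumb, not tax advice: the function names are made up, and any figures you feed it are your own. The idea is to start at 50% and replace that default with your observed rate once you have a year of real numbers.

```python
def set_aside(payment, rate=0.50):
    """Amount to move to the tax account for one payment received."""
    return round(payment * rate, 2)


def effective_rate(total_tax_paid, gross_receipts):
    """Observed tax as a fraction of gross receipts for a past year."""
    if gross_receipts <= 0:
        raise ValueError("gross_receipts must be positive")
    return total_tax_paid / gross_receipts
```

For example, set_aside(8000.0) holds back 4,000.00 from an 8,000.00 payment. If last year’s taxes turned out to be 31,000 on 100,000 of receipts (hypothetical numbers), passing effective_rate(31000, 100000) as the rate would hold back 2,480.00 instead.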

I pay an accountant to do my taxes at the end of the year. We usually sit down and go through QuickBooks together so that he’s confident that he understands my system, and then he does the rest.

It’s worth noting that a single-member LLC like mine is something that the IRS refers to as a “disregarded entity.” The government doesn’t care if I move my money around between my own pockets. This means that even though it’s a big deal to me when I pay myself, it doesn’t matter at tax time whether I moved money from my business checking to my personal accounts. It’s all the same filing.

Expenses and Deductions

There are lots of good references out there about what you can and cannot deduct as business expenses. I tend to err on the low side, only using the business accounts where it’s really unambiguous that the only reason I’m spending the money is to support my business. I know people who are more aggressive on that front. It seems to work out okay for them.

And that’s it! Even though there are plenty of moving parts, it’s really not all that complex to set up and run a one-person consulting shop.

I’m interested in whether this post was interesting or useful to you. Please leave comments or shoot me an email.