The Dark Secret of Bioinformatics

Last week, I gave a presentation on my current research. My brief introduction ended with comments that are a mantra to many grad students: “…which will hopefully be my dissertation project.” The talk went well enough—the audience was just other grad students—and I got a lot of good feedback. At the start of the question-and-answer period, one of the girls raised her hand and asked, “Do you have a hypothesis?”


Bioinformatics Research

I have a lot of anxiety over what I’m going to do after I finish grad school. Even though I’m only starting my second year, I’ve already spent some time browsing job listings online at sites like Nature Jobs. These listings are not necessarily representative because, in my field, I think a lot of recruiting is done at conferences and by word of mouth. In fact, I think the browsing has only fueled my anxiety, because most of the bioinformatics jobs listed are for coders and technicians (people with bachelor’s and master’s degrees), which is not what I want to do.

Strangely, it took me a while to figure out that rather than being paranoid about limited opportunities available to me (as listed online), I need to position myself as a bioinformatics researcher as opposed to a mere technician. This is similar to the difference in the software industry between a coder and an engineer. What I need to be doing, I realized, is not just taking one bioinformatics course and farting around on my computer. I need to be developing my toolbox so that I have a variety of ways of solving new problems rather than just getting headaches while trying to reinvent the computer science wheel. (Much of the foundation of computer science was set out years ago by people who are much smarter than me, so I couldn’t possibly recapitulate it all.)

The need for this kind of curriculum was driven home recently by an editorial in the journal Bioinformatics by Pavel Pevzner, a name I recognized from browsing bioinformatics books on Amazon.com. Pevzner’s argument is that undergraduates in molecular biology need to take an introductory algorithms class, but his broader point is that biologists need to understand the logic behind bioinformatics and not just “cookbook” their way through computational biology.

The bioinformatics class that I’m currently taking does a good job of explaining the algorithms that are used. The students in it certainly understand sequence alignments better than most practicing biologists. The professor is not just giving us a recipe; he takes us through the algorithms. The problem for my educational goals is that, by necessity and by design, he only covers a few of the algorithms currently in use. He can’t build the toolkit that I want to have. He does a good job of explaining the current state of the science, but he’s not preparing us to advance that state. It’s just beyond the scope of his course.

The class is a good starting point, a solid foundation that I have already applied to my research. From here, I definitely want to take at least a basic algorithms class, and I’d like to take something more advanced as well. I also want to take some high-level statistics and math courses, because a lot of bioinformatics is statistical modeling, so I need that background too. The only problem is that I don’t want to subject myself to a slew of undergraduate classes if I can avoid it. So maybe I’ll just audit all of the classes; we’ll see. Certainly having this plan has alleviated much of my anxiety.

Second Rotation Research

I’m through the first version of my bioinformatics program for my first rotation. The results are disappointing, but you can take a look at some of them here. (You’ll need the Chime browser plug-in.) One problem lies in trying to calculate distances in three-dimensional space between a nucleotide and an amino acid that is hydrogen-bonded to it. Another problem is trying to figure out possible orientations for the amino acid that would still satisfy the distance and H-bond requirements. I did some thinking on these topics yesterday, made some sketches, and then tried to do some research on 3D math on the internet. It turns out a lot of this stuff has already been worked out, much of it by people who do computer animation and graphics, so I don’t have to reinvent the wheel. The bad part is that it’s all horribly, horribly complex. And involves vector math. Which I don’t remember. So I think I’m going to do a little more fine-tuning on this algorithm, which should make it more accurate and hopefully yield some interesting data, before moving on to other approaches that, again, will hopefully give me some really interesting results.
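For anyone else who has forgotten their vector math, the two calculations mentioned above (a distance between two atoms and an angle to test hydrogen-bond geometry) can be sketched in a few lines. This is a minimal illustration in Python, not the program described in this post: the function names, the coordinate tuples, and the cutoffs of 3.5 angstroms and 120 degrees are all assumptions chosen for the example.

```python
import math

def subtract(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of a 3D vector."""
    return math.sqrt(dot(a, a))

def distance(a, b):
    """Euclidean distance between two points (e.g. two atoms)."""
    return norm(subtract(a, b))

def angle_degrees(a, vertex, c):
    """Angle a-vertex-c in degrees, measured at the vertex point."""
    v1 = subtract(a, vertex)
    v2 = subtract(c, vertex)
    cos_theta = dot(v1, v2) / (norm(v1) * norm(v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

def looks_like_h_bond(donor, hydrogen, acceptor,
                      max_dist=3.5, min_angle=120.0):
    """Rough geometric test; both cutoffs are illustrative assumptions."""
    close_enough = distance(donor, acceptor) <= max_dist
    straight_enough = angle_degrees(donor, hydrogen, acceptor) >= min_angle
    return close_enough and straight_enough
```

Coordinates here are plain (x, y, z) tuples in angstroms. A real structure file would need parsing first, and the appropriate cutoffs depend on the atoms involved.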

Second Rotation

I’ve started work on my second rotation in Andy Ellington’s lab. It’s a bioinformatics problem, so I had a crash course in Perl before I started writing any programs. So far it’s progressing very nicely, and Andy is excited about it. I hope that before too long, I’ll be able to put the keyboard down and start doing some bench work to verify the computational work.

Previous people in the lab — mainly an undergrad who is now at Cambridge on a Marshall Scholarship — developed a database of amino acid-nucleotide interactions called “AANT”. I’ll spare you the details on how this was done, but there is a Nucleic Acids Research paper on it. Now I’m going back through the data looking for nucleotide-amino acid interactions that are at approximately the same distance and orientation as other interactions. I think ultimately Andy would like to be able to intelligently design aptamers (RNA molecules that bind other molecules) to bind proteins rather than having to screen for them, which can take months.
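To make "approximately the same distance and orientation" a bit more concrete, here is one way such a comparison could be sketched in Python. It is only an illustration under stated assumptions: the Interaction record, its fields, and both tolerances are hypothetical and do not reflect the actual AANT schema or the method used in the lab. It also assumes every direction vector has already been expressed in a common nucleotide-centered frame.

```python
import math
from itertools import combinations
from typing import NamedTuple, Tuple

class Interaction(NamedTuple):
    """Hypothetical record for one nucleotide-amino acid contact."""
    label: str
    distance: float                        # angstroms
    direction: Tuple[float, float, float]  # nucleotide atom -> amino acid atom

def angle_between(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_theta = max(-1.0, min(1.0, dot / norms))
    return math.degrees(math.acos(cos_theta))

def similar(a, b, dist_tol=0.5, angle_tol=20.0):
    """True if two interactions agree within the (assumed) tolerances."""
    same_distance = abs(a.distance - b.distance) <= dist_tol
    same_orientation = angle_between(a.direction, b.direction) <= angle_tol
    return same_distance and same_orientation

def similar_pairs(interactions, dist_tol=0.5, angle_tol=20.0):
    """Labels of every pair of interactions that look alike."""
    return [(a.label, b.label)
            for a, b in combinations(interactions, 2)
            if similar(a, b, dist_tol, angle_tol)]

# Made-up example data, not taken from any real structure.
examples = [
    Interaction("A", 2.9, (1.0, 0.0, 0.0)),
    Interaction("B", 3.0, (0.98, 0.1, 0.0)),
    Interaction("C", 2.9, (0.0, 1.0, 0.0)),
]
pairs = similar_pairs(examples)
```

Comparing every pair is fine for a small set, but it scales quadratically, so a full database would probably call for binning or a spatial index instead.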