A guide to the basics of computational biology

A very kind Reddit user, writing under the nom de plume PalePinkPith, has decided that the current situation we find ourselves in is the perfect time for wet lab biologists to turn to the dark side and hone their computational skills. In an anonymous act of benevolence, the PhD student has compiled a list of the resources they personally used to transition into computational biology.

All the resources are free and span data analysis and visualisation:

“Learning the basics of R: I personally believe R is the best language for wet lab biologists who want to get into data analysis. The numerous libraries available and the accessible UI console (RStudio) make it much more approachable than Python. I also use Python and can add some info if anyone specifically wants to learn it, but for the beginner biologist who is language agnostic, R is a great place to start.

R tutorials for biologists:

  1. DataCamp has some very basic and advanced tutorials that will walk you through installing R, setting up your environment, managing libraries, etc.
  2. Swirl is an R library that provides tutorials on basic R syntax and statistical testing directly within the R environment. This is how I first learned the basics. Start here!
  3. Datamentor: These written tutorials give a more in-depth description of the data structures and syntax of R. They are great for people who have some limited programming experience and as a companion to other tutorials.
  4. MarinStatsLectures is a YouTube channel with hours of videos providing tutorials on everything from study design to plotting figures.
  5. Bioconductor offers a huge list of resources (videos, GitHub repos, slides, and books) that focus on using R for real biological data. This is a great resource for learning to use R for your specific niche topic.
  6. R Markdown notebooks: lab notebooks are also important in computational biology. R Markdown notebooks are an easy way to log your code, plot figures, and export as a PDF. This is a good tutorial to get you started with notebooks.

Example biological datasets to help you begin exploring:

Of course, learning on your own data is a productive option, but sometimes cleaning and loading data is a major hurdle. Luckily, R has a bunch of example datasets built in. Many of these are biological, including ELISA data for DNase, biochemical oxygen demand, and growth patterns of orange trees.

In addition, the R bioinformatics suite Bioconductor has many more realistic and domain-specific datasets available from its website, e.g. NGS data, drug screens, and microarrays.

Learning the basics of command line:

Not everything requires programming. Much of bioinformatics involves using software packages that are executed on the command line. Running this software requires a little knowledge of the command line, from the basics (changing directories, seeing files) to more advanced shell scripts that can help automate your workflow and improve compute efficiency.
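
As a rough illustration of those basics (a sketch of ours, not part of the original post), the Bash snippet below changes directory, lists files, and then uses a small loop to automate a repetitive step. The `raw_data` folder, the file names, and their contents are all made up for the example, and it runs in a temporary directory so nothing real is touched.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory (created just for this example).
workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical sample files standing in for real sequence data.
mkdir -p raw_data
printf '>seq1\nACGT\n>seq2\nGGCC\n' > raw_data/sample_A.fasta
printf '>seq1\nTTAA\n' > raw_data/sample_B.fasta

pwd              # print the directory you are currently in
ls -l raw_data   # list the files inside a directory

# Automate a repetitive step: count the records (header lines
# starting with ">") in every FASTA file and save a summary.
for f in raw_data/*.fasta; do
    n="$(grep -c '^>' "$f")"
    echo "$(basename "$f"): $n sequences"
done > counts.txt

cat counts.txt   # show the summary
```

The same loop pattern works for any per-file task: swap the `grep` line for whichever command-line tool you would otherwise run by hand on each file.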

Command line / shell tutorials for biologists:

  1. The 8 most useful shell commands for data science
  2. Beginners guide to the bash terminal is a video where someone walks you through navigating the command line.
  3. Bioinformatics 101 by Hadrien Gourle is a great place to learn about the command line and about various file formats and programs used in NGS analysis.
  4. Exercises for NGS data processing by Umer Zeeshan Ijaz is also NGS focused, but provides some tutorials that will be helpful in any domain.

Data visualization and making figures:

I imagine many people’s interest in computer stuff ends at making beautiful figures. There are many ways to do this in most languages. I do most of my figure generation within the RStudio IDE.

  1. Fundamentals of Data Visualization by Claus O. Wilke is a fantastic resource for properly visualizing quantitative information. In addition to the book, he published a GitHub repo of all figures written in R.
  2. Columbia’s intro to Data Visualization is the course page of a class taught by Agnes Chang. All slides and readings are freely available. Some advanced visualizations are programmed in D3.js.
  3. Tutorial of plotting with ggplot2 in R. I could have listed this in the R section as it provides some basic R tutorials. However, this provides all you need to start using ggplot2 to make beautiful figures, without the burden of details in the R tutorials listed above. ggplot is my favorite way of making quick, beautiful graphs.”

And, as it’s Reddit, the user comments are full of further suggestions; depending on how long the current situation persists, you may have time to get through a brave few.

Here’s the link. The subreddit is r/bioinformatics.

PalePinkPith is sure to gain a lot of karma from this one.
