How do I use the Handbook?

First of all, note the search box at the top left. Once you are more familiar with the site, search becomes the simplest way to find what you are looking for.

What is currently covered in the book?

The Handbook is divided into the following sections. We cover both the foundations and their applications to realistic data analysis scenarios.

1. Bioinformatics foundations

  • Data formats and repositories.
  • Sequence alignments.
  • Data visualization.
  • Unix command line usage.

2. Bioinformatics data analysis protocols

  • Genome variation and SNP calling.
  • RNA-seq and gene expression analysis.
  • Genome assembly (coming in 2017).
  • Metagenomics (coming in 2017).
  • ChIP-Seq analysis (coming in 2017).

3. Software tool usage

  • Using short read aligners.
  • Using quality control tools.
  • Manipulating sequence data.

The table of contents on the left allows you to jump to the corresponding sections.

Should all life scientists understand how bioinformatics operates?


The results of bioinformatic analyses are relevant to most areas of study within the life sciences. Even scientists who do not perform the analyses themselves need to be familiar with how bioinformatics operates so that they can accurately interpret the findings of bioinformaticians and incorporate them into their work. All scientists who inform their research with bioinformatic insights should understand how it works by studying its principles, methods, and limitations, most of which are covered in this Handbook.

We believe that this book is of great utility even for those who do not plan to run the analyses themselves.

Was the book designed to be read from top to bottom?

This book follows a curriculum that teaches practical data analysis for life scientists. We introduce concepts gradually, and chapters tend to build on information covered earlier. For newcomers, reading from top to bottom might be the best approach. Yet not all chapters need to be followed in order; readers may jump ahead to any topic of interest.

Is there a theme to the book?

The book explains most concepts through the task of analyzing the genomic data obtained from the 2014 Ebola virus outbreak in Africa. The data, representing 99 sequenced Ebola virus genomes published in the scientific article "Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak", are used to demonstrate the data processing and data analysis tasks that a scientist might need to undertake.

What type of computer is required?

All tools and methods presented in this book have been tested and will run on all three major operating systems: macOS, Linux, and Windows 10. See the Computer Setup page.

For best results, Windows 10 users will need to join the Windows Insider program (a free service offered by Microsoft), which will allow them to install the newest release of "Bash Unix for Windows."

Is there data distributed with the book?

Yes, we have a separate data site. Various chapters refer to content distributed from this site.

Who is the Handbook for?

The Biostar Handbook provides training and practical instructions for students and scientists interested in the data analysis methodologies of genome-related studies. Our goal is to enable readers to perform analyses on data obtained from high-throughput DNA sequencing instruments.

All of the Handbook's content is designed to be simple, brief, and geared towards practical application.

Is bioinformatics challenging to learn?

Bioinformatics engages the distinct fields of biology, computer science, and statistical data analysis. Practitioners must navigate the various philosophies, terminologies, and research priorities of these three domains of science while keeping up with the ongoing advances of each.

Its position at the intersection of these fields might make bioinformatics more challenging than other scientific subdisciplines, but it also means that you're exploring the frontiers of scientific knowledge, and few things are more rewarding than that!

Can I learn bioinformatics from this book?


The questions and answers in the Handbook have been carefully selected to provide you with steady, progressive, accumulating levels of knowledge. Think of each question/answer pair as a small, well-defined unit of instruction that builds on the previous ones.

  • Reading this book will teach you what bioinformatics is all about.
  • Running the code will teach you the skills you need to perform the analyses.

How long will it take me to learn bioinformatics from this book?

About 100 hours.

Of course, a more accurate answer depends on your background preparation, and each person's is different. Prior training in at least one of the three fields that bioinformatics builds upon (biology, computer science, and data analysis) is recommended. The time required to master all skills also depends on how you plan to use them. Solving larger and more complex data problems will require greater skills, which need more time to develop fully.

That being said, based on several years of evaluating trainees in the field, we have come to believe that an active student would be able to perform publication-quality analyses after dedicating about 100 hours of study. That is what this book is really about: helping you put those 100 hours to good use.

