This lesson is in the early stages of development (Alpha version)

Using packages from Bioconductor

Overview

Teaching: 10 min
Exercises: 3 min
Questions
  • How do I use packages from the Bioconductor repository?

Objectives
  • Describe what the Bioconductor repository is and what it is used for

  • Describe how Bioconductor differs from CRAN

  • Search Bioconductor for relevent packages

  • Install a package from Bioconductor

Installing packages from somewhere else besides CRAN?

In some cases, you may want to use a specialized package that is not hosted on CRAN (the Comprehensive R Archive Network). This may be because the package is so new that it hasn’t yet been submitted to CRAN, or it could be that it is on a focal topic that has an alternative repository. One major example of an alternative repository source is Bioconductor, which has a mission of “promot[ing] the statistical analysis and comprehension of current and emerging high-throughput biological assays.” This means that many if not all of the packages available on Bioconductor are focused on the analysis of biological data, and that it can be a great place to look for tools to help you analyze your -omics datasets!

So how do I use it?

Since access to the Bioconductor repository is not built in to base R ‘out of the box’, there are a couple steps needed to install packages from this alternative source. We will work through the steps (only 2!) to install a package to help with the VCF analysis we are working on, but you can use the same approach to install any of the many thousands of available packages.

screenshot of bioconductor homepage

First, install the BiocManager package

The first step is to install a package that is on CRAN, BiocManager. This package will allow us to use it to install packages from Bioconductor. You can think of Bioconductor kind of like an alternative app store for your phone, except instead of apps you are installing packages, and instead of your phone it’s your local R package library.

# install the BiocManager from CRAN using the base R install.packages() function
install.packages("BiocManager")

To check if this worked (and also so you can make a note of the version for reproducibility purposes), you can run BiocManager::version() and it should give you the version number.

# to make sure it worked, check the version
BiocManager::version()

Second, install the vcfR package from Bioconductor using BiocManager

# install the vcfR package from bioconductor using BiocManager::install()
BiocManager::install("vcfR")

You may need to also allow it to install some dependencies or update installed packages in order to successfully complete the process.

Note: Installing packages from Bioconductor vs from CRAN

Some packages begin by being available only on Bioconductor, and then later move to CRAN. vcfR is one such package, which originally was only available from Bioconductor, but is currently available from CRAN. The other thing to know is that BiocManager::install() will also install packages from CRAN (it is a wrapper around install.packages() that adds some extra features). There are other benefits to using BiocManager::install() for Bioconductor packages, many of which are outlined here. In short, Bioconductor packages have a release cycle that is different from CRAN and the install() function is aware of that difference, so it helps to keep package versions in line with one another in a way that doesn’t generally happen with the base R install.packages().

Search for Bioconductor packages based on your analysis needs

While we are only focusing in this workshop on VCF analyses, there are hundreds or thousands of different types of data and analyses that bioinformaticians may want to work with. Sometimes you may get a new dataset and not know exactly where to start with analyzing or visualizing it. The Bioconductor package search view can be a great way to browse through the packages that are available.

screenshot of bioconductor search

Tip: Searching for packages on the Bioconductor website

There are several thousand packages available through the Bioconductor website. It can be a bit of a challenge to find what you want, but one helpful resource is the package search page.

In bioinformatics, there are often many different tools that can be used in a particular instance. The authors of vcfR have compiled some of them. One of those packages that is available from Bioconductor is called VariantAnnotation and may also be of interest to those working with vcf files in R.

Challenge

Add code chunks to

  • Install the BiocManager package
  • Use that package’s install() function to install vcfR
  • Browse the Bioconductor website to find a second package, and install it

Resources

Key Points

  • Bioconductor is an alternative package repository for bioinformatics packages.

  • Installing packages from Bioconductor requires a new method, since it is not compatible with the install.packages() function used for CRAN.

  • Check Bioconductor to see if there is a package relevent to your analysis before writing code yourself.