Sequence Ranges

Introduction to Bioconductor in R

Paula Andrea Martinez, PhD.

Data scientist

IRanges with numeric arguments

# Loading IRanges
library(IRanges)

A range is defined by start and end

myIRanges <- IRanges(start = 20, end = 30)
myIRanges
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]        20        30        11
(myIRanges_width <- IRanges(start = c(1, 20), width = c(30, 11)))
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1        30        30
  [2]        20        30        11
(myIRanges_end <- IRanges(start = c(1, 20), end = 30))
IRanges object with 2 ranges and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1        30        30
  [2]        20        30        11

Equation: width = end - start + 1
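The equation implies that any two of start, end, and width determine the third. A minimal sketch, assuming the IRanges package is installed; the object names are made up for illustration:

```r
library(IRanges)

# Any two of start, end, and width determine the third.
from_start_end   <- IRanges(start = 20, end = 30)
from_start_width <- IRanges(start = 20, width = 11)
from_end_width   <- IRanges(end = 30, width = 11)

# The accessors start(), end(), and width() return integer vectors.
start(from_end_width)    # 20, because start = end - width + 1
end(from_start_width)    # 30, because end = start + width - 1
width(from_start_end)    # 11, because width = end - start + 1

# Check the equation directly on one of the objects.
width(from_start_end) == end(from_start_end) - start(from_start_end) + 1
```

All three constructor calls should describe the same range, 20 to 30.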


Rle - run length encoding

  • Rle stands for run-length encoding
  • It computes and stores the lengths and values of the runs in a vector or factor
  • Rle is a general S4 container used to store long repetitive vectors efficiently
(some_numbers <- c(3, 2, 2, 2, 3, 3, 4, 2))
[1] 3 2 2 2 3 3 4 2
(Rle(some_numbers))
numeric-Rle of length 8 with 5 runs
Lengths: 1 3 2 1 1
Values : 3 2 3 4 2
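The runs can also be inspected piece by piece. The sketch below assumes that loading IRanges attaches S4Vectors, which provides Rle together with the accessors runLength(), runValue(), and nrun(); the names numbers_rle and long_vector are made up for illustration.

```r
library(IRanges)  # attaches S4Vectors, which defines Rle

some_numbers <- c(3, 2, 2, 2, 3, 3, 4, 2)
numbers_rle <- Rle(some_numbers)

runLength(numbers_rle)  # 1 3 2 1 1
runValue(numbers_rle)   # 3 2 3 4 2
nrun(numbers_rle)       # 5

# Expanding the runs recovers the original vector.
all(as.vector(numbers_rle) == some_numbers)  # TRUE

# For a long repetitive vector, the encoded form should be much smaller.
long_vector <- rep(c(1, 2), times = c(100000, 50000))
object.size(Rle(long_vector)) < object.size(long_vector)
```

The saving depends on how repetitive the data is: a vector with no repeated neighbours gains nothing from run-length encoding.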

IRanges with logical vector

IRanges(start = c(FALSE, FALSE, TRUE, TRUE))
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         3         4         2

IRanges with logical Rle

gi <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
(myRle <- Rle(gi))
logical-Rle of length 7 with 3 runs
Lengths:     2     2     3
Values :  TRUE FALSE  TRUE
IRanges(start = myRle)
IRanges object with 2 ranges and 0 metadata columns:
         start       end     width
      <integer> <integer> <integer>
[1]         1         2         2
[2]         5         7         3
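To see why a logical vector yields these ranges, the conversion can be reproduced by hand: each run of TRUE values becomes one range. This is only an illustrative sketch using base R's rle(), not necessarily how IRanges does it internally; the names runs, run_starts, run_ends, and manual are made up.

```r
library(IRanges)

gi <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)

# Base R run-length encoding of the logical vector.
runs <- rle(gi)
run_ends   <- cumsum(runs$lengths)          # 2 4 7
run_starts <- run_ends - runs$lengths + 1   # 1 3 5

# Keep only the runs whose value is TRUE.
manual <- IRanges(start = run_starts[runs$values],
                  end   = run_ends[runs$values])
manual

# Compare with the ranges built directly from the logical vector.
from_logical <- IRanges(start = gi)
all(start(manual) == start(from_logical))
all(end(manual) == end(from_logical))
```

The FALSE run at positions 3 to 4 is dropped, which leaves the two ranges 1 to 2 and 5 to 7.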

In summary

IRanges are hierarchical data structures that can contain metadata.

To construct IRanges objects:

  • Pass start, end, or width as numeric vectors (or NULL).
  • Or pass the start argument as a logical vector or logical Rle object.
    • Rle stands for run-length encoding and is storage efficient.
  • IRanges arguments get recycled (fill in the blanks).
  • Equation for a sequence range: width = end - start + 1.
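Recycling and metadata can be tried out in a few lines. This is a minimal sketch, assuming IRanges is installed; the object name recycled and the score column are hypothetical and exist only for illustration.

```r
library(IRanges)

# A single width is recycled across the three starts.
recycled <- IRanges(start = c(1, 10, 20), width = 5)
end(recycled)  # 5 14 24

# Attach a metadata column; "score" is a made-up example column.
mcols(recycled) <- DataFrame(score = c(0.5, 0.9, 0.1))
recycled
mcols(recycled)$score
```

Printing the object after this should report one metadata column instead of zero.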

Let's practice using sequence ranges!
