Why are we interested in patterns?

Introduction to Bioconductor in R

James Chapman

Curriculum Manager, DataCamp

sequence patterns

What can we find with patterns?

  • Gene start
  • Protein end
  • Regions that enhance or silence gene expression
  • Conserved regions between organisms
  • Genetic variation

Pattern matching

  • Biostrings provides functions for pattern matching

  • matchPattern(pattern, subject)

    • 1 string to 1 string
  • vmatchPattern(pattern, subject)

    • 1 set of strings to 1 string
    • 1 string to a set of strings
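
A minimal sketch of both calls, assuming the Biostrings package is installed. The subject is the 30-nucleotide example sequence used later in this deck; the names `short` and `none` and their sequences are made up for illustration.

```r
library(Biostrings)

# 1 string to 1 string: find every ATG in a single sequence
subject <- DNAString("ACATGGGCCTACCATGGGAGCTACGAAGCC")
hits <- matchPattern("ATG", subject)
start(hits)  # start position of each match

# 1 string to a set of strings: search several sequences at once
subjects <- DNAStringSet(c(
  fwd   = "ACATGGGCCTACCATGGGAGCTACGAAGCC",
  short = "CCATGG",   # illustrative sequence
  none  = "CCCCCC"    # illustrative sequence with no ATG
))
vhits <- vmatchPattern("ATG", subjects)
elementNROWS(vhits)  # number of matches per sequence
```

matchPattern() returns the matches as views on the subject, while vmatchPattern() returns one set of match ranges per sequence in the set.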

Palindromes

never odd or even

findPalindromes() # find palindromic regions in a single sequence
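
A short sketch of a call, assuming a recent Biostrings release. For DNA, a palindrome there means a region that equals its own reverse complement (for example CCATGG), not a region that reads the same backwards letter by letter. The argument values below are choices made for this example, not defaults.

```r
library(Biostrings)

dna <- DNAString("ACATGGGCCTACCATGGGAGCTACGAAGCC")

# min.armlength: each arm of the palindrome is at least 3 nucleotides long
# max.looplength = 0: no unpaired nucleotides allowed between the two arms
pals <- findPalindromes(dna, min.armlength = 3, max.looplength = 0)
pals
```

Lowering min.armlength or raising max.looplength makes the search more permissive and usually returns more regions.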

Not new biology

  • The genetic code was first described by Nirenberg in 1963 ("On the coding of genetic information", Nirenberg, Marshall et al., Cold Spring Harb Symp Quant Biol 1963, 28)

  • How translation might differ according to the reading frame was first described by Streisinger in 1966 ("Frameshift Mutations and the Genetic Code", Streisinger, George et al., Cold Spring Harb Symp Quant Biol 1966, 31: 77-84)

# Original DNA sequence
[1]    30 ACATGGGCCTACCATGGGAGCTACGAAGCC
# 6 possible reading frames, DNAStringSet
[1]    30 ACATGGGCCTACCATGGGAGCTACGAAGCC             + 1     
[2]    30 GGCTTCGTAGCTCCCATGGTAGGCCCATGT             - 1     
[3]    29  CATGGGCCTACCATGGGAGCTACGAAGCC             + 2     
[4]    29  GCTTCGTAGCTCCCATGGTAGGCCCATGT             - 2     
[5]    28   ATGGGCCTACCATGGGAGCTACGAAGCC             + 3     
[6]    28   CTTCGTAGCTCCCATGGTAGGCCCATGT             - 3
# 6 possible translations, AAStringSet
[1]    10 TWAYHGSYEA                                 + 1     
[2]    10 GFVAPMVGPC                                 - 1     
[3]     9 HGPTMGATK                                  + 2
[4]     9 AS*LPW*AH                                  - 2
[5]     9 MGLPWELRS                                  + 3
[6]     9 LRSSHGRPM                                  - 3
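
One way the six frames and their translations above could be reproduced, as a sketch assuming the Biostrings package is installed. The variable names and the frame labels are chosen for this example and are not taken from the course code.

```r
library(Biostrings)

dna <- DNAString("ACATGGGCCTACCATGGGAGCTACGAAGCC")

# Forward strand and its reverse complement
strands <- DNAStringSet(c(
  "+" = as.character(dna),
  "-" = as.character(reverseComplement(dna))
))

# Shift the start by 0, 1 and 2 nucleotides on each strand
frames <- c(
  subseq(strands, start = 1),
  subseq(strands, start = 2),
  subseq(strands, start = 3)
)
names(frames) <- paste(rep(c("+", "-"), times = 3), rep(1:3, each = 2))
frames

# translate() warns when trailing bases do not fill a codon;
# the incomplete codon is ignored, so the warning is silenced here
aa <- suppressWarnings(translate(frames))
aa
```

Stop codons appear as * in the translated sequences, as in the - 2 frame.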

Conserved regions in the Zika virus

[Figure: conserved blocks in the Zika virus genome]
Adapted figure: From Mosquitos to Humans: Genetic Evolution of Zika Virus. Wang, Lulan et al. Cell Host & Microbe 2016, Vol 19 5: 561-565

Facts

  • The Zika Virus has a positive strand genome
  • It lives in humans, monkeys, and mosquitoes
  • It belongs to the Flaviviruses family, whose members share 11 conserved proteins

Let's practice finding patterns!
