Unicode and pattern matching

String Manipulation with stringr in R

Charlotte Wickham

Assistant Professor at Oregon State University

Unicode

  • Associates each character with a code point
Character   Code point
a           61
μ           3BC
😀          1F600
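
The mapping in the table can be checked from base R. This is a minimal sketch: the helper name `code_point_hex` is hypothetical and introduced only for illustration, and it assumes a single character is passed in.

```r
# Hypothetical helper: hexadecimal code point of a single character
code_point_hex <- function(char) {
  # utf8ToInt() returns the code point as an integer;
  # as.hexmode() and format() turn it into hex digits
  toupper(format(as.hexmode(utf8ToInt(char))))
}

code_point_hex("a")           # "61"
code_point_hex("\u03BC")      # "3BC"
code_point_hex("\U0001F600")  # "1F600"

# The reverse direction: from a code point back to the character
intToUtf8(0x3BC)              # "μ"
```

The characters are written as escapes so the sketch does not depend on the encoding of the script file.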

Unicode in R

"\u03BC"

μ

"\U03BC"

μ

writeLines("\U0001F44F")

👏
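
A short sketch of how the two escape forms above relate, using only base R. The variable names are illustrative.

```r
# \u takes up to 4 hex digits and \U up to 8,
# so both forms can spell the same character
mu_short <- "\u03BC"
mu_long  <- "\U000003BC"
identical(mu_short, mu_long)   # TRUE

# Code points above FFFF need the \U form
clap <- "\U0001F44F"
nchar(clap)                    # 1: a single character

# print() shows the quoted string; writeLines() shows the content itself
print(mu_short)
writeLines(clap)
```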

Matching Unicode

Matching Unicode groups

  • Regular expression
    • Use \p followed by {name}, where name is the name of a Unicode group
  • rebus
    • str_view_all(x, greek_and_coptic())

  • ?Unicode
  • ?unicode_property
  • ?unicode_general_category
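
The regular expression route can be sketched with stringr alone. This assumes stringr is installed, and it uses the script name `Greek` as the `{name}`; that name is an assumption for illustration and is not necessarily the exact pattern that `greek_and_coptic()` generates.

```r
library(stringr)

# Example strings (illustrative), written with escapes for portability
x <- c("\u03BC is the mean", "sd is \u03C3", "plain ASCII")

# \p{Greek} as a regular expression; the backslash is doubled in an R string
greek <- "\\p{Greek}"

str_detect(x, greek)    # TRUE TRUE FALSE
str_extract(x, greek)   # "μ" "σ" NA
```

Writing the pattern by hand keeps the dependency list short; the rebus helpers trade that for more readable pattern names.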

Let's practice!
