Regular expression basics

Introduction to Natural Language Processing in R

Kasey Jones

Research Data Scientist

What is natural language processing?

NLP:

Topics:

A regular expression is a sequence of characters used to search text.
Examples:
- searching for files in a directory from the command line
- finding articles that contain a certain pattern
- replacing specific text
- ...

words <- c("DW-40", "Mike's Oil", "5w30", "Joe's Gas", "Unleaded", "Plus-89")

# Finding Digits
grep("\\d", words)  # without value = TRUE, grep() returns the indices of the matches

[1] 1 3 6

# Finding Apostrophes
grep("\\'", words, value = TRUE)

[1] "Mike's Oil" "Joe's Gas"

Pattern	Matches	R example	Example text
\w	Any alphanumeric character	gregexpr(pattern = '\\w', text)	a
\d	Any digit	gregexpr(pattern = '\\d', text)	1
\w+	A run of alphanumeric characters of any length	gregexpr(pattern = '\\w+', text)	word
\d+	A run of digits of any length	gregexpr(pattern = '\\d+', text)	1234
\s	A whitespace character	gregexpr(pattern = '\\s', text)	' '
\S	Any non-whitespace character	gregexpr(pattern = '\\S', text)	word
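
As a minimal sketch of how the patterns in the table behave, the following applies a few of them to a sample string with `gregexpr()` and pulls out the matched text with base R's `regmatches()`. The sample `text` and the helper `extract_all()` are made up for illustration and are not part of the course material. Note that each backslash is doubled inside an R string literal.

```r
# Hypothetical sample text, not taken from the slides
text <- "Order 66 shipped 1234 units"

# Hypothetical helper: return every match of `pattern` in a single string
extract_all <- function(pattern, x) {
  regmatches(x, gregexpr(pattern, x))[[1]]
}

digits     <- extract_all("\\d", text)   # one match per digit
digit_runs <- extract_all("\\d+", text)  # one match per run of digits
word_runs  <- extract_all("\\w+", text)  # one match per run of alphanumerics
spaces     <- extract_all("\\s", text)   # one match per whitespace character

print(digit_runs)
print(word_runs)
```

Comparing `digits` with `digit_runs` shows the effect of the `+` quantifier: the first splits every number into single characters, the second keeps each number whole.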

Function	Purpose	Syntax
grep	Finds matches of the pattern in a vector	grep(pattern = '\\w', x = <vector>, value = FALSE)
gsub	Replaces all occurrences in a string/vector	gsub(pattern = '\\d+', replacement = "", x = <vector>)
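
The two functions in the table can be tried on the `words` vector from the earlier example. This is only a small sketch; the result variable names are chosen here for illustration.

```r
# Vector from the earlier slide
words <- c("DW-40", "Mike's Oil", "5w30", "Joe's Gas", "Unleaded", "Plus-89")

# By default grep() returns the indices of the matching elements ...
digit_idx <- grep("\\d", words)
# ... and with value = TRUE it returns the matching elements themselves
digit_words <- grep("\\d", words, value = TRUE)

# gsub() replaces every run of digits with an empty string
no_digits <- gsub("\\d+", "", words)

print(digit_idx)
print(digit_words)
print(no_digits)
```

Elements without digits, such as "Unleaded", pass through `gsub()` unchanged, so the result always has the same length as the input vector.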

Many online resources offer practice with regular expressions. They go into considerable depth and provide many examples.¹

¹ https://regexone.com/lesson/matching_characters
