Cleaning Data in R
Maggie Matsui
Content Developer @ DataCamp
| Data type | Example values |
|---|---|
| Names | "Veronica Hopkins", "Josiah", ... |
| Phone numbers | "6171679912", "(868) 949-4489", ... |
| Emails | "[email protected]", "[email protected]", ... |
| Passwords | "JZY46TVG8SM", "iamjosiah21", ... |
| Comments/Reviews | "great service!", "This product broke after 2 days", ... |
"6171679912" vs. "(868) 949-4489"
"9239 5849 3712 0039" vs. "4490459957881031"
+1 617-167-9912 vs. 617-167-9912
"Veronica Hopkins" vs. "Josiah"
Phone number "0492" is too short
Zip code "19888" doesn't exist
customers
# A tibble: 99 x 3
name company credit_card
<chr> <chr> <chr>
1 Galena In Magna Associates 5171 5854 8986 1916
2 MacKenzie Iaculis Ltd 5128-5078-8008-5824
3 Megan Acosta Semper LLC 5502 4529 0732 1744
4 Phoebe Delacruz Sit Amet Nulla Limited 5419-7308-7424-0944
5 Jessica Pellentesque Sed Ltd 5419 2949 5508 9530
# ... with 95 more rows
str_detect(customers$credit_card, "-")
FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE ...
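As a minimal sketch of what `str_detect()` returns, the two card numbers below are taken from the first rows of `customers`; the vector name `cards` is an assumption made for illustration only.

```r
library(stringr)

# Two card numbers from the table above, one per format
cards <- c("5171 5854 8986 1916", "5128-5078-8008-5824")

# str_detect() returns one logical value per element
has_hyphen <- str_detect(cards, "-")
has_hyphen

# Summing the logical vector counts how many values contain a hyphen
n_hyphen <- sum(has_hyphen)
n_hyphen
```

Because the result is a plain logical vector, it can be passed straight to `filter()` or summed to count the affected rows.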
customers %>%
  filter(str_detect(credit_card, "-"))
name company credit_card
1 MacKenzie Iaculis Ltd 5128-5078-8008-5824
2 Phoebe Delacruz Sit Amet Nulla Limited 5419-7308-7424-0944
3 Abel Lorem PC 5211-6023-0805-0217
...
customers %>%
  mutate(credit_card_spaces = str_replace_all(credit_card, "-", " "))
name company credit_card_spaces
1 Galena In Magna Associates 5171 5854 8986 1916
2 MacKenzie Iaculis Ltd 5128 5078 8008 5824
3 Megan Acosta Semper LLC 5502 4529 0732 1744
4 Phoebe Delacruz Sit Amet Nulla Limited 5419 7308 7424 0944
5 Jessica Pellentesque Sed Ltd 5419 2949 5508 9530
...
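A short sketch of why the `_all` variant is used here: `str_replace()` changes only the first match in each string, while `str_replace_all()` changes every match. The variable names are illustrative.

```r
library(stringr)

# One hyphenated card number from the table above
card <- "5128-5078-8008-5824"

# str_replace() replaces only the first hyphen
first_only <- str_replace(card, "-", " ")

# str_replace_all() replaces every hyphen
all_hyphens <- str_replace_all(card, "-", " ")

first_only
all_hyphens
```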
credit_card_clean <- customers$credit_card %>%
  str_remove_all("-") %>%
  str_remove_all(" ")
customers %>%
  mutate(credit_card = credit_card_clean)
name company credit_card
1 Galena In Magna Associates 5171585489861916
2 MacKenzie Iaculis Ltd 5128507880085824
3 Megan Acosta Semper LLC 5502452907321744
...
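The two chained `str_remove_all()` calls could also be written as a single call with a regular-expression character class. This is an alternative sketch, not the approach shown on the slide; `cards` is a hypothetical vector built from the first two rows of `customers`.

```r
library(stringr)

# Card numbers in both formats, taken from the table above
cards <- c("5171 5854 8986 1916", "5128-5078-8008-5824")

# Two-step version, as on the slide
two_step <- cards %>%
  str_remove_all("-") %>%
  str_remove_all(" ")

# One-step version: the character class [- ] matches a hyphen or a space
one_step <- str_remove_all(cards, "[- ]")

# Both versions give the same result
identical(two_step, one_step)
one_step
```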
str_length(customers$credit_card)
16 16 16 16 16 16 16 16 16 16 16 16 12 16 16 16 16 16 16 16 16 16 16 16 16 ...
customers %>%
  filter(str_length(credit_card) != 16)
name company credit_card
1 Jerry Russell Sed Eu Company 516294099537
2 Ivor Christian Ut Tincidunt Incorporated 544571330015
3 Francesca Drake Etiam Consulting 517394144089
customers %>%
  filter(str_length(credit_card) == 16)
name company credit_card
1 Galena In Magna Associates 5171585489861916
2 MacKenzie Iaculis Ltd 5128507880085824
3 Megan Acosta Semper LLC 5502452907321744
4 Phoebe Delacruz Sit Amet Nulla Limited 5419730874240944
5 Jessica Pellentesque Sed Ltd 5419294955089530
...
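The length check can be tried on a small stand-in data frame. In the sketch below, `customers_small` is a hypothetical two-row subset built from values shown above, and the names `valid` and `invalid` are assumptions for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical two-row subset: one 16-digit number and one 12-digit number
customers_small <- tibble(
  name = c("Galena", "Jerry Russell"),
  credit_card = c("5171585489861916", "516294099537")
)

# Keep the rows whose card number has exactly 16 characters
valid <- customers_small %>%
  filter(str_length(credit_card) == 16)

# Set aside the rows that fail the check so they can be inspected
invalid <- customers_small %>%
  filter(str_length(credit_card) != 16)

valid
invalid
```

Keeping the failing rows in a separate object, rather than only dropping them, makes it easier to review what was removed.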
Special characters in regular expressions: (, ), [, ], $, ., +, * and others
stringr functions use regular expressions
Searching for these characters requires fixed():
str_detect(column, fixed("$"))
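A brief sketch of why `fixed()` matters for a character such as `$`: as a regular expression, `$` anchors the end of the string, so it matches every value. The `prices` vector is hypothetical, and the escaped pattern `"\\$"` is shown only as an alternative to `fixed()`.

```r
library(stringr)

# Hypothetical values: only the first contains a literal dollar sign
prices <- c("$5.00", "5 dollars")

# As a regular expression, "$" means "end of string", so every value matches
as_regex <- str_detect(prices, "$")

# fixed() treats the pattern as literal text
as_fixed <- str_detect(prices, fixed("$"))

# Escaping the character in the regular expression has the same effect
as_escaped <- str_detect(prices, "\\$")

as_regex
as_fixed
as_escaped
```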
For more, see String Manipulation with stringr in R and Intermediate Regular Expressions in R