Data Privacy and Anonymization in Python
Rebeca Gonzalez
Data engineer
Non-sensitive PII can also be a quasi-identifier.

On its own it may not point to anyone, but combined with other quasi-identifiers it can identify an individual.
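The effect of combining quasi-identifiers can be checked by counting how many records share each combination of values: a combination that occurs only once points to a single person. A minimal sketch in plain Python, assuming hypothetical records with `zip`, `birth_year`, and `gender` fields (names and values are illustrative, not from the course data):

```python
from collections import Counter

# Hypothetical records: no single field identifies anyone on its own.
records = [
    {"zip": "10115", "birth_year": 1985, "gender": "F"},
    {"zip": "10115", "birth_year": 1990, "gender": "M"},
    {"zip": "20095", "birth_year": 1985, "gender": "M"},
    {"zip": "20095", "birth_year": 1990, "gender": "F"},
    {"zip": "10115", "birth_year": 1985, "gender": "F"},
]

def unique_combinations(rows, quasi_identifiers):
    """Return the value combinations of `quasi_identifiers` that occur in exactly one row."""
    counts = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]

# Each field alone is shared by at least two records...
for col in ["zip", "birth_year", "gender"]:
    print(col, unique_combinations(records, [col]))  # prints an empty list for every field

# ...but the combination singles out three of the five records.
print(unique_combinations(records, ["zip", "birth_year", "gender"]))
```

A combination with a count of 1 is exactly the situation the sentence above warns about.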


Anonymization techniques applied to quasi-identifiers reduce the risk of data disclosure.

Original DataFrame
name phone
0 Cassandra Nelson 4399406975395
1 Brian Moss 0389407128613
2 Melody Gill 8283308773967
3 Sandra Huber 4366608954250
4 Patricia Webster 4466462475574
DataFrame with masked names
name phone
0 xxxx 4399406975395
1 xxxx 0389407128613
2 xxxx 8283308773967
3 xxxx 4366608954250
4 xxxx 4466462475574
Replacing sensitive values with other characters, such as "x", is called data masking.
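In pandas, masking a whole column is a single assignment such as `df['name'] = 'xxxx'`. To show the idea without pandas, here is a minimal sketch that treats rows as a list of dicts (an assumption for self-containment); the helper name `mask_column` is hypothetical. It returns new rows so the original data stays available, as in the original and masked tables above.

```python
def mask_column(rows, column, mask="xxxx"):
    """Return copies of `rows` with every value in `column` replaced by `mask`."""
    # {**row, column: mask} copies the row and overwrites one key,
    # so the input rows are not modified.
    return [{**row, column: mask} for row in rows]

people = [
    {"name": "Cassandra Nelson", "phone": "4399406975395"},
    {"name": "Brian Moss", "phone": "0389407128613"},
]
masked = mask_column(people, "name")
for row in masked:
    print(row["name"], row["phone"])  # e.g. "xxxx 4399406975395"
```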
# Explore the DataFrame
df.head()
country card_number email
0 Finland 3546746666030419 [email protected]
1 Belarus 4303032415762821 [email protected]
2 Turkmenistan 4536883671157 [email protected]
3 Puerto Rico 3568819286614160 [email protected]
4 Angola 2514167462583016 [email protected]
# Uniformly mask the card number column
df['card_number'] = '****'
# See the resulting DataFrame
df.head()
country card_number email
0 Finland **** [email protected]
1 Belarus **** [email protected]
2 Turkmenistan **** [email protected]
3 Puerto Rico **** [email protected]
4 Angola **** [email protected]
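The uniform mask above hides every digit. A common variant, sketched here as an assumption rather than something shown on the slides, keeps the last few digits visible so a value can still be recognized by its owner or by support staff. The helper `mask_card_number` and its `visible` parameter are hypothetical.

```python
def mask_card_number(card_number, visible=4, mask_char="*"):
    """Mask all but the last `visible` digits of a card number."""
    digits = str(card_number)  # also accepts integer columns
    if visible <= 0:
        return mask_char * len(digits)
    hidden = max(len(digits) - visible, 0)
    return mask_char * hidden + digits[-visible:]

print(mask_card_number("3546746666030419"))  # twelve '*' followed by 0419
print(mask_card_number(4536883671157))       # nine '*' followed by 1157
```

With pandas, it could be applied as `df['card_number'].apply(mask_card_number)`. Keeping digits visible leaks some information, so how many to keep is a policy decision.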

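# df2: a copy of the original data (its card numbers are not masked)
# Mask the username in each email, keeping its first character and the domain
df2['email'] = df2['email'].apply(lambda s: s[0] + '****' + s[s.find('@'):])
# See the resulting pseudonymized data
df2.head()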
country card_number email
0 Finland 3546746666030419 f****@gmail.com
1 Belarus 4303032415762821 m****@gmail.com
2 Turkmenistan 4536883671157 a****@gmail.com
3 Puerto Rico 3568819286614160 k****@gmail.com
4 Angola 2514167462583016 d****@gmail.com
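The lambda above assumes every value is a non-empty string containing "@". When "@" is missing, `str.find` returns -1, so `s[-1:]` keeps the last character of the value. A slightly more defensive version, sketched as a plain function; the name `mask_email` and the choice to fully mask invalid values are assumptions:

```python
def mask_email(email, mask="****"):
    """Keep the first character of the username and the whole domain; mask the rest."""
    # Hypothetical policy: values that are not strings or lack '@' are fully masked.
    if not isinstance(email, str) or "@" not in email:
        return mask
    username, domain = email.split("@", 1)  # split at the first '@', like s.find('@')
    # username[:1] safely yields '' when the username part is empty (e.g. '@gmail.com')
    return username[:1] + mask + "@" + domain

original_mask = lambda s: s[0] + '****' + s[s.find('@'):]
print(original_mask("john"))             # j****n  (the last character leaks)
print(mask_email("john"))                # ****
print(mask_email("jane.doe@gmail.com"))  # j****@gmail.com
```

It can replace the lambda directly: `df2['email'].apply(mask_email)`.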

# Import the Faker class
from faker import Faker
# Create a fake data generator
fake_data = Faker()
# Generate a credit card number
fake_data.credit_card_number()
3542216874440804
# Mask card numbers with new data using a lambda function
df['card_number'] = df['card_number'].apply(lambda x: fake_data.credit_card_number())
# See the resulting pseudonymized data
df.head()
country card_number email
0 Finland 3596625386355448 [email protected]
1 Belarus 376297265347524 [email protected]
2 Turkmenistan 4377494880888682 [email protected]
3 Puerto Rico 30553931809810 [email protected]
4 Angola 4241735748382 [email protected]
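The lambda above ignores its argument `x`, so if the same real card number appeared in two rows, each row would receive a different fake number and the link between them would be lost. When consistency matters (for example, joining tables on the pseudonymized column), one option is to remember each replacement. A minimal sketch; `make_consistent_replacer` is a hypothetical helper, and a counter-based stand-in replaces Faker so the code runs without it:

```python
from itertools import count

def make_consistent_replacer(generate):
    """Wrap a zero-argument generator so each distinct real value always gets the same fake.

    The mapping links real and fake values, so it must be protected
    (or discarded) just like the original data.
    """
    mapping = {}

    def replace(value):
        if value not in mapping:
            mapping[value] = generate()
        return mapping[value]

    return replace

# Stand-in for fake_data.credit_card_number so the sketch runs without Faker.
counter = count(1)
fake_number = lambda: f"FAKE-{next(counter):04d}"

replace_card = make_consistent_replacer(fake_number)
cards = ["3546746666030419", "4303032415762821", "3546746666030419"]
print([replace_card(c) for c in cards])  # ['FAKE-0001', 'FAKE-0002', 'FAKE-0001']
```

With Faker, the same wrapper would be used as `df['card_number'].apply(make_consistent_replacer(fake_data.credit_card_number))`. Note that a random generator can occasionally produce the same fake value for two different inputs.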
fake_data.name()
'Kelly Clark'
fake_data.name_male()
'Antonio Henderson'
fake_data.name_female()
'Jennifer Ortega'
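The gender-specific generators are useful when a fake name should stay consistent with another column. A minimal sketch, assuming a hypothetical `gender` field with codes "M" and "F" (not part of the course data); the helper `replace_names` is also hypothetical, and fixed stand-in callables replace Faker so the code runs without it:

```python
def replace_names(rows, generators, fallback):
    """Return copies of rows whose 'name' is replaced by a fake name matching 'gender'.

    `generators` maps a gender code to a zero-argument callable; `fallback`
    is used for any other or missing code.
    """
    return [
        {**row, "name": generators.get(row.get("gender"), fallback)()}
        for row in rows
    ]

# Stand-ins for fake_data.name_male, fake_data.name_female and fake_data.name.
generators = {"M": lambda: "Antonio Henderson", "F": lambda: "Jennifer Ortega"}
fallback = lambda: "Kelly Clark"

rows = [
    {"name": "Brian Moss", "gender": "M"},
    {"name": "Melody Gill", "gender": "F"},
    {"name": "Sandra Huber", "gender": None},
]
for row in replace_names(rows, generators, fallback):
    print(row)
```

With Faker, the real callables would be passed instead: `{"M": fake_data.name_male, "F": fake_data.name_female}` and `fake_data.name` as the fallback.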