Data Privacy and Anonymization in Python
Rebeca Gonzalez
Instructor
# View the dataset
hr.head()
   Age     BusinessTravel              Department  EducationField  EmployeeNumber
0   41      Travel_Rarely                   Sales   Life Sciences               1
1   49  Travel_Frequently  Research & Development   Life Sciences               2
2   37      Travel_Rarely  Research & Development           Other               4
3   33  Travel_Frequently  Research & Development   Life Sciences               5
4   27      Travel_Rarely  Research & Development         Medical               7
Use the continuous distribution that best fits our data.

import scipy.stats

# Fit the genlogistic distribution to the continuous variable Age
params = scipy.stats.genlogistic.fit(hr['Age'])

# View the parameters of the fitted continuous distribution
print(params)
(4.9899067653418285, 22.32808853181744, 7.046590524738551)
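The tuple returned by `fit` holds the shape parameter `c`, followed by `loc` and `scale`. The slides do not show how genlogistic was chosen as the best fit, so the following is only a minimal sketch of one possible approach: fit several candidate distributions and keep the one with the smallest Kolmogorov-Smirnov statistic. The `ages` array is randomly generated stand-in data (not the real `hr['Age']` column), and the candidate list is an assumption for illustration.

```python
import numpy as np
import scipy.stats

# Hypothetical stand-in for hr['Age']; the real column is not reproduced here.
rng = np.random.default_rng(0)
ages = rng.normal(loc=37, scale=9, size=500)

# Candidate continuous distributions (an illustrative choice, not from the slides).
candidates = ["norm", "logistic", "genlogistic"]

results = {}
for name in candidates:
    dist = getattr(scipy.stats, name)
    # fit() returns the shape parameters (if any), then loc and scale.
    fitted = dist.fit(ages)
    # A smaller KS statistic means the fitted CDF is closer to the empirical CDF.
    ks_stat = scipy.stats.kstest(ages, name, args=fitted).statistic
    results[name] = (ks_stat, fitted)

best_name = min(results, key=lambda n: results[n][0])
print(best_name, results[best_name][1])
```

Other goodness-of-fit measures, such as the sum of squared errors against a histogram, could be substituted for the KS statistic in the same loop.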
# Sample from the fitted genlogistic distribution
hr['Age'] = scipy.stats.genlogistic.rvs(*params, size=len(hr.index))

# View the resulting dataset
hr.head()
         Age     BusinessTravel              Department  EducationField  EmployeeNumber
0  40.767259      Travel_Rarely                   Sales   Life Sciences               1
1  45.730504  Travel_Frequently  Research & Development   Life Sciences               2
2  41.910050      Travel_Rarely  Research & Development           Other               4
3  35.275320  Travel_Frequently  Research & Development   Life Sciences               5
4  40.198134      Travel_Rarely  Research & Development         Medical               7
# Round the sampled values to obtain discrete values
hr['Age'] = hr['Age'].round()
   Age     BusinessTravel              Department  EducationField  EmployeeNumber
0   41      Travel_Rarely                   Sales   Life Sciences               1
1   46  Travel_Frequently  Research & Development   Life Sciences               2
2   42      Travel_Rarely  Research & Development           Other               4
3   35  Travel_Frequently  Research & Development   Life Sciences               5
4   40      Travel_Rarely  Research & Development         Medical               7
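The fit, sample, and round steps can be put together in one self-contained sketch. Several parts of it are assumptions beyond the slides: `hr_demo` is randomly generated stand-in data, `random_state` is set only to make the run repeatable, and the sampled ages are clipped to the observed minimum and maximum so that no implausible ages appear. Because `.round()` leaves the column as floats, the sketch also casts to `int`.

```python
import numpy as np
import pandas as pd
import scipy.stats

# Hypothetical stand-in for the hr dataset (only an Age column).
rng = np.random.default_rng(42)
hr_demo = pd.DataFrame({"Age": rng.integers(18, 61, size=300)})

# Fit the distribution to the original ages.
params = scipy.stats.genlogistic.fit(hr_demo["Age"])

# Draw one synthetic value per row; random_state makes the draw repeatable.
synthetic = scipy.stats.genlogistic.rvs(*params, size=len(hr_demo), random_state=42)

# Assumption: keep synthetic ages inside the observed range.
lo, hi = hr_demo["Age"].min(), hr_demo["Age"].max()

# Work on a copy so the original data stays untouched.
anonymized = hr_demo.copy()
anonymized["Age"] = np.clip(np.round(synthetic), lo, hi).astype(int)

print(anonymized["Age"].head())
```

Working on a copy keeps the original column available for comparing the two distributions afterwards, for example with a pair of histograms.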