Converting unstructured data to a DataFrame

Parallel Programming with Dask in Python

James Fulton

Climate Informatics Researcher

Nested JSON data

  • Inside the example JSON file, example_0.json
{"name": "Beth", "employment": [{"role": "manager", "start_date": ...}, ...], ...}
{"name": "Omar", "employment": [{"role": "analyst", "start_date": ...}, ...], ...}
{"name": "Fang", "employment": [{"role": "engineer", "start_date": ...}, ...], ...}
...
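
Each line of the file holds one JSON object, so a line can be parsed on its own. Below is a minimal sketch using only the standard library. The two sample lines are made-up stand-ins for the elided file contents, and the start_date field is left out because its values are not shown above.

```python
import json

# Hypothetical sample lines; the real contents of example_0.json are elided above.
sample_lines = [
    '{"name": "Beth", "employment": [{"role": "manager"}, {"role": "analyst"}]}',
    '{"name": "Omar", "employment": [{"role": "analyst"}]}',
]

# Parse each line independently into a Python dictionary.
records = [json.loads(line) for line in sample_lines]

# The "employment" value is a list of dictionaries, which is what makes the data nested.
first_roles = [job["role"] for job in records[0]["employment"]]
```

With Dask, the same per-line parsing would typically be done by reading the file with dask.bag.read_text and mapping json.loads over the resulting bag.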

Restructuring a dictionary

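def add_number_of_jobs(employee_dict):
    # Count the entries in the nested employment list
    employee_dict['number_of_previous_jobs'] = len(employee_dict['employment'])
    return employee_dict

# Apply the function to every dictionary in the bag
dict_bag = dict_bag.map(add_number_of_jobs)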
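
The function above modifies the dictionary it receives. As a hedged alternative, the sketch below returns a new dictionary and leaves the input untouched, which can be easier to reason about when the same records are reused. The name add_number_of_jobs_copy and the sample record are hypothetical, introduced only for illustration.

```python
def add_number_of_jobs_copy(employee_dict):
    # Build a new dictionary instead of modifying the input in place.
    return {
        **employee_dict,
        "number_of_previous_jobs": len(employee_dict["employment"]),
    }


# Hypothetical record, for illustration only.
employee = {"name": "Beth", "employment": [{"role": "manager"}, {"role": "analyst"}]}
updated = add_number_of_jobs_copy(employee)
```

The function takes one dictionary and returns one dictionary, so it could be passed to dict_bag.map in the same way as the original.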

Removing parts of a dictionary

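def delete_dictionary_entry(dictionary, key_to_drop):
    # Remove the given key (raises KeyError if it is missing)
    del dictionary[key_to_drop]
    return dictionary

# Extra keyword arguments to map are passed on to the function
dict_bag = dict_bag.map(delete_dictionary_entry, key_to_drop='employment')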
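
Because del raises a KeyError when the key is absent, a single record without an 'employment' entry would make the mapped task fail. The sketch below shows one possible tolerant variant. The name drop_key_if_present and the sample record are hypothetical.

```python
def drop_key_if_present(dictionary, key_to_drop):
    # Copy first so the caller's dictionary is left untouched.
    trimmed = dict(dictionary)
    # pop with a default does nothing when the key is absent.
    trimmed.pop(key_to_drop, None)
    return trimmed


# Hypothetical record, for illustration only.
record = {"name": "Omar", "number_of_previous_jobs": 1, "employment": [{"role": "analyst"}]}
without_employment = drop_key_if_present(record, "employment")

# Dropping the same key a second time is harmless.
unchanged = drop_key_if_present(without_employment, "employment")
```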

Selecting parts of a dictionary

def filter_dictionary(dictionary, keys_to_keep):
    # Copy only the requested keys into a new dictionary
    new_dict = {}
    for k in keys_to_keep:
        new_dict[k] = dictionary[k]
    return new_dict

dict_bag = dict_bag.map(
    filter_dictionary,
    keys_to_keep=['name', 'number_of_previous_jobs']
)
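
The key-selection loop can also be written as a dictionary comprehension. The sketch below applies it to a plain list with the built-in map, using functools.partial to bind keys_to_keep, as a stand-in for passing the keyword argument through the bag's map method. The name filter_dictionary_compact and the sample records are hypothetical.

```python
from functools import partial


def filter_dictionary_compact(dictionary, keys_to_keep):
    # Keep only the requested keys, in the order they are listed.
    return {k: dictionary[k] for k in keys_to_keep}


# Bind the keys once so the resulting function takes a single record.
keep_summary = partial(
    filter_dictionary_compact,
    keys_to_keep=["name", "number_of_previous_jobs"],
)

# Hypothetical records, for illustration only.
records = [
    {"name": "Beth", "number_of_previous_jobs": 3, "employment": []},
    {"name": "Omar", "number_of_previous_jobs": 1, "employment": []},
]
summaries = list(map(keep_summary, records))
```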

Converting to a DataFrame

print(dict_bag.take(1))
({'name': 'Beth',
  'number_of_previous_jobs': 3},)
converted_bag_df = dict_bag.to_dataframe()

print(converted_bag_df)
                 name  number_of_previous_jobs
npartitions=3
               object                  float64
                  ...                      ...
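
The earlier steps leave every record as a flat dictionary with the same keys, which maps cleanly onto DataFrame columns. Before converting, it can be worth checking this on a small sample, for example the tuple returned by dict_bag.take(5). The helper below is a hypothetical sketch in plain Python, not part of Dask.

```python
def has_consistent_flat_records(records):
    # Return True if all records share the same keys and hold no nested dicts or lists.
    records = list(records)
    if not records:
        return True
    expected_keys = set(records[0])
    for record in records:
        if set(record) != expected_keys:
            return False
        if any(isinstance(value, (dict, list)) for value in record.values()):
            return False
    return True


# Hypothetical samples, for illustration only.
flat_sample = (
    {"name": "Beth", "number_of_previous_jobs": 3},
    {"name": "Omar", "number_of_previous_jobs": 1},
)
nested_sample = (
    {"name": "Fang", "employment": [{"role": "engineer"}]},
)
flat_ok = has_consistent_flat_records(flat_sample)
nested_ok = has_consistent_flat_records(nested_sample)
```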

Let's practice!
