Developing Machine Learning Models for Production
Sinan Ozdemir
Data Scientist, Entrepreneur, and Author
Allows us to establish processes for evaluating the quality of our data.
This also offers other benefits:
A structure that describes the organization of data.
For a relational database schema:
Database key | Data type | Data order |
---|---|---|
Person.name |
string |
nominal |
Person.survey_score |
integer |
ordinal |
Documenting how we labeled our response variable enhances:
Reproducibility of the training pipeline.
Model reliability through label quality.
Model performance through label improvement.
Labeling methods can evolve over time.
A visual representation of the different steps involved in building your machine learning model.
This often includes:
Documenting the process of experimentation and selection of the best model includes documenting:
To document our training environment, we should include:
scikit-learn==1.1.3
).Why?
Developing Machine Learning Models for Production