Writing effective ML documentation

Developing Machine Learning Models for Production

Sinan Ozdemir

Data Scientist, Entrepreneur, and Author

The components of excellent ML documentation

Documenting our data sources allows us to establish processes for evaluating the quality of our data.

This also offers other benefits:

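One way to act on this is to record each data source alongside a repeatable quality check. The minimal Python sketch below is illustrative only: the `DataSource` fields, the `missing_rate` helper, the sample rows, and the 25% threshold are hypothetical assumptions, not part of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Hypothetical record describing where a dataset came from."""
    name: str
    origin: str           # e.g. an internal database, vendor, or public dataset
    owner: str            # who to contact about the data
    refresh_cadence: str  # how often the data is updated
    known_issues: list = field(default_factory=list)


def missing_rate(rows, column):
    """Fraction of rows where `column` is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


survey_source = DataSource(
    name="customer_survey",
    origin="internal survey tool export",
    owner="analytics team",
    refresh_cadence="weekly",
)

# Hypothetical sample rows pulled from the source.
rows = [
    {"name": "Ada", "survey_score": 4},
    {"name": "Grace", "survey_score": None},
    {"name": "Alan", "survey_score": 5},
    {"name": "Edsger"},
]

rate = missing_rate(rows, "survey_score")
if rate > 0.25:  # hypothetical threshold agreed on for this source
    survey_source.known_issues.append(f"survey_score missing in {rate:.0%} of rows")
print(survey_source.known_issues)  # ['survey_score missing in 50% of rows']
```

Running the same check each time the source refreshes turns "evaluating data quality" into a documented, repeatable step rather than a one-off inspection.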

A data schema is a structure that describes how data is organized.

For example, a relational database schema might document each field's key, data type, and level of measurement:

Database key	Data type	Level of measurement
`Person.name`	`string`	`nominal`
`Person.survey_score`	`integer`	`ordinal`
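A documented schema can also be enforced in code. The sketch below is a hypothetical Python example that mirrors the `Person` fields from the table as a dictionary and checks records against it; the `PERSON_SCHEMA` name and the `validate` helper are assumptions for illustration.

```python
# Hypothetical schema for the Person table:
# column -> (expected Python type, nominal or ordinal)
PERSON_SCHEMA = {
    "name": (str, "nominal"),
    "survey_score": (int, "ordinal"),
}


def validate(record, schema):
    """Return a list of human-readable schema violations for one record."""
    errors = []
    for column, (expected_type, _level) in schema.items():
        if column not in record:
            errors.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            actual = type(record[column]).__name__
            errors.append(f"{column}: expected {expected_type.__name__}, got {actual}")
    for column in record:
        if column not in schema:
            errors.append(f"unexpected column: {column}")
    return errors


print(validate({"name": "Ada", "survey_score": 4}, PERSON_SCHEMA))
# []
print(validate({"name": "Ada", "survey_score": "4", "age": 36}, PERSON_SCHEMA))
# ['survey_score: expected int, got str', 'unexpected column: age']
```

Keeping the schema as data means the same definition can feed both the written documentation and the validation step.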


Documenting how we labeled our response variable enhances:


Labeling methods can evolve over time.
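Because labeling methods can change, one option is to store a method version with every label so that labels produced under different processes can be told apart later. A minimal Python sketch; the `LabelingMethod` and `Label` classes, the version names, dates, and the spam/not-spam labels are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LabelingMethod:
    """Hypothetical description of one version of a labeling process."""
    version: str
    description: str
    introduced: date


@dataclass(frozen=True)
class Label:
    example_id: str
    value: str
    method_version: str  # ties each label back to the method that produced it


METHODS = {
    "v1": LabelingMethod("v1", "single annotator per example", date(2022, 1, 10)),
    "v2": LabelingMethod("v2", "majority vote of three annotators", date(2022, 6, 1)),
}

labels = [
    Label("ex-001", "spam", "v1"),
    Label("ex-002", "not_spam", "v2"),
    Label("ex-003", "spam", "v2"),
]

# Every label should point at a documented method version.
unknown = [label.example_id for label in labels if label.method_version not in METHODS]

# How many labels each method version produced.
counts = dict(Counter(label.method_version for label in labels))
print(unknown)  # []
print(counts)   # {'v1': 1, 'v2': 2}
```

With the version attached, a change in labeling method shows up directly in the data instead of only in someone's memory.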

A visual representation of the different steps involved in building your machine learning model.

This often includes:
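Besides a drawn diagram, the same steps can be kept as a small list in code so the documentation is less likely to drift from the implementation. A minimal Python sketch; the step names and descriptions are hypothetical placeholders, not a prescribed pipeline.

```python
# Hypothetical pipeline steps; replace with the stages of your own model.
PIPELINE_STEPS = [
    ("ingest", "load raw survey exports"),
    ("clean", "drop rows that fail schema validation"),
    ("featurize", "encode nominal and ordinal columns"),
    ("train", "fit candidate models"),
    ("evaluate", "compare models on a held-out set"),
]


def render_pipeline(steps):
    """Render the step names as a one-line text diagram, e.g. 'a -> b -> c'."""
    return " -> ".join(name for name, _ in steps)


print(render_pipeline(PIPELINE_STEPS))
# ingest -> clean -> featurize -> train -> evaluate
for index, (name, purpose) in enumerate(PIPELINE_STEPS, start=1):
    print(f"{index}. {name}: {purpose}")
```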

Documenting how we experimented with candidate models and selected the best one includes recording:

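One way to keep this record is an experiment log that stores each run's model, hyperparameters, and evaluation metric, and notes why the winning run was chosen. A minimal Python sketch; the `ExperimentRun` class, model names, hyperparameters, and metric values are illustrative placeholders, not real results.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class ExperimentRun:
    """Hypothetical record of one modeling experiment."""
    model: str
    hyperparameters: dict
    metric_name: str
    metric_value: float
    notes: str = ""


# Illustrative values only.
runs = [
    ExperimentRun("logistic_regression", {"C": 1.0}, "val_f1", 0.81),
    ExperimentRun("random_forest", {"n_estimators": 200, "max_depth": 8}, "val_f1", 0.84),
    ExperimentRun("gradient_boosting", {"learning_rate": 0.1}, "val_f1", 0.83),
]


def select_best(runs, higher_is_better=True):
    """Pick the run with the best metric; all runs must use the same metric."""
    metrics = {run.metric_name for run in runs}
    if len(metrics) != 1:
        raise ValueError(f"runs use different metrics: {sorted(metrics)}")
    key = attrgetter("metric_value")
    return max(runs, key=key) if higher_is_better else min(runs, key=key)


best = select_best(runs)
best.notes = "selected: highest validation F1 among logged runs"
print(best.model, best.metric_value)  # random_forest 0.84
```

Refusing to compare runs scored with different metrics is a deliberate choice: it keeps the documented selection criterion honest.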

To document our training environment, we should include:

Packages used, with their versions (e.g., scikit-learn==1.1.3).
Any random seeds used for non-deterministic training steps (e.g., randomized dimensionality reduction algorithms).
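Both items can be captured automatically at training time. The sketch below uses only the Python standard library (`importlib.metadata`, `platform`); the seed value and the package list are hypothetical and should match what your training code actually uses.

```python
import json
import platform
import random
from importlib import metadata

SEED = 42  # hypothetical seed; record the value your training code really uses


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def capture_environment(packages, seed):
    """Collect details needed to recreate a training run."""
    versions = package_versions(packages)
    return {
        "python_version": platform.python_version(),
        # Pinned requirement strings, e.g. "scikit-learn==1.1.3"
        "requirements": [f"{n}=={v}" for n, v in versions.items() if v is not None],
        "missing_packages": [n for n, v in versions.items() if v is None],
        "random_seed": seed,
    }


random.seed(SEED)  # seed before any randomized step; seed other libraries the same way
env = capture_environment(["scikit-learn", "numpy"], SEED)
print(json.dumps(env, indent=2))
```

Saving this dictionary next to the trained model lets someone else reinstall the same versions and reuse the same seed.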

Why?
