Reproducibility and references

Data Communication Concepts

Hadrien Lacroix

Curriculum Manager

Written report

 

A report must be clear and reproducible.


Reproducibility example

 

  • Baking a cake

    • Recipe
    • Raw ingredients
    • Our oven and kitchen measuring gadgets
    • Cake with a similar flavor
  • Data project

    • Run analysis again - same results

 



Replicability example

  • Baking a cake
    • Own utensils
    • Own ingredients
  • Data project
    • Different environment

Reproducibility and replicability virtues

 

  • Prevents duplication of effort
  • Build upon pre-existing work
  • Focus on new challenges
  • Peer review
  • Tool agnostic

Best practices

  1. Keep track of how results were produced
    • Well-documented scripts
      • Comments in code
    • List packages and environment used
    • Version control
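
One way to "list packages and environment used" is to have the analysis script record them itself. The sketch below is only an illustration using the Python standard library; the function name `describe_environment` and the package names passed to it are hypothetical, not part of the course material.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Return the interpreter version, platform, and versions of the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# Hypothetical package list: replace with the packages the analysis imports.
env = describe_environment(["pip", "a-package-that-does-not-exist"])
print(json.dumps(env, indent=2))
```

Saving this output next to the results, and committing it to version control with the script, gives a reader what they need to rebuild a comparable environment.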

Best practices

  1. Keep track of how results were produced
  2. Avoid manual data manipulation
    • Data versioning
    • Store raw data and intermediate steps
    • Adapt and resolve problems
    • Example: data imputation
      • impute missing values with the mean
      • save and close editor
      • how to know which values were replaced in the first place?
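
The imputation example can be done in code instead of by hand, so the raw data stays untouched and the replaced values remain identifiable. This is a minimal sketch in plain Python; the function name `impute_with_mean` and the sample values are made up for illustration.

```python
from statistics import mean


def impute_with_mean(values):
    """Fill missing entries (None) with the mean of the observed ones.

    Returns a new list plus a parallel list of flags marking which
    positions were imputed. The input list is left unchanged.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return imputed, was_missing


# Made-up raw data; in a real project this would be read from the stored raw file.
raw_ages = [30, None, 40, None, 50]
ages, was_imputed = impute_with_mean(raw_ages)

print(ages)         # imputed copy used by later steps
print(was_imputed)  # answers "which values were replaced?"
print(raw_ages)     # raw data is still intact
```

Because the step is scripted, it can be rerun, reviewed, and versioned like any other part of the analysis.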

Best practices

  1. Keep track of how results were produced
  2. Avoid manual data manipulation
  3. Document randomness
    • Random seeds for ML pipelines
    • Controls confounding variables
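
As a small illustration of a random seed, the sketch below splits row indices into training and test sets with Python's standard `random` module. The seed value, the function name `split_indices`, and the sizes are arbitrary assumptions; a real ML pipeline would pass the same recorded seed to whichever library performs the split or the training.

```python
import random

SEED = 42  # arbitrary value; what matters is that it is recorded and reused


def split_indices(n_rows, test_fraction, seed):
    """Shuffle row indices with a seeded generator and split them into train/test."""
    rng = random.Random(seed)  # local generator, independent of global random state
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


train_a, test_a = split_indices(10, 0.3, SEED)
train_b, test_b = split_indices(10, 0.3, SEED)

# Same seed, same split: the analysis can be rerun with identical results.
print(test_a == test_b and train_a == train_b)
```

Writing the seed down in the report or the script is what lets someone else obtain the same split.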

Best practices

  1. Keep track of how results were produced
  2. Avoid manual data manipulation
  3. Document randomness
  4. Interpretability
    • Understand the cause of a decision or predict model results
    • Story with compelling narrative
    • Link with reproducibility
1. Molnar, C. (2019). Interpretable Machine Learning.

Best practices

  1. Keep track of how results were produced
  2. Avoid manual data manipulation
  3. Document randomness
  4. Interpretability
  5. Cite bibliography correctly

References

 

  • A citation is the basic information required to identify and locate a specific publication

References

 

  • Different styles but same underlying logic
    • Book: Author Name (Year). Title. Publisher.
    • Journal Article: Author Name. (Year) 'Article Title.' Journal Title, Volume Number, Issue Number, Page Numbers.
    • Website: Author Name. Date of Publication, 'Title of Page/Work.' Title of Website, Location

 

  • APA style:
    • In-text citations (author, date)
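
The "same underlying logic" can be seen by storing a reference as structured fields and rendering it in different forms. The sketch below follows the book pattern listed above and the (author, date) in-text form; the `Book` class, both function names, and the example entry are placeholders, not a real publication or a full implementation of any citation style.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    author: str  # written as "Surname, Initials"
    year: int
    title: str
    publisher: str


def format_book(book):
    """Render the pattern: Author Name (Year). Title. Publisher."""
    return f"{book.author} ({book.year}). {book.title}. {book.publisher}."


def in_text(book):
    """Render an (author, date) in-text citation from the surname and year."""
    surname = book.author.split(",")[0]
    return f"({surname}, {book.year})"


# Placeholder entry for illustration only.
example = Book(author="Doe, J.", year=2020, title="An Example Title", publisher="Example Press")
print(format_book(example))
print(in_text(example))
```

Reference management tools work on the same principle: the fields are stored once, and the style is applied when the reference is rendered.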

References

  • Reference management tools
    • Easier to keep track
    • Change between styles
    • Search for references online
    • Options:
      • EndNote
      • Mendeley
      • RefWorks

References

  • Business context
    • Less strict
    • Simpler (hyperlink)
    • ==> information available and retrievable

Let's practice!
