Range validation

Cleaning Data in Java

Dennis Lee

Software Engineer

Chocolate sales dataset

  • Work with a chocolate sales dataset throughout the chapter
  • Modify certain columns like amount and date to illustrate data validation

 

Salesperson      Country    Product              Date       Amount   Boxes Shipped
James Rudeforth  UK         Mint Chip Choco      4-Jan-22   $5,320   180
Van Tuxwell      India      85% Dark Bars        1-Aug-22   $7,896   94
Gigi Bohling     India      Peanut Butter Cubes  7-Jul-22   $4,501   91
Jan Morforth     Australia  Peanut Butter Cubes  27-Apr-22  $12,726  342
Jehu Fischer     UK         Peanut Butter Cubes  24-Feb-22  $13,685  184

Why we need range validation

// We extracted these sales amounts and dates from our dataset
List<Double> salesAmounts = Arrays.asList(5320.0, 7896.0, 4501.0,
                                          12726.0, 13685.0);
List<String> saleDates = Arrays.asList("4-Jan-22", "1-Aug-22", "7-Jul-22", 
                                       "27-Apr-22", "24-Feb-22");

System.out.println("Sales amounts: " + salesAmounts);
System.out.println("Sale dates: " + saleDates);
Sales amounts: [5320.0, 7896.0, 4501.0, 12726.0, 13685.0]
Sale dates: [4-Jan-22, 1-Aug-22, 7-Jul-22, 27-Apr-22, 24-Feb-22]
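
The dataset stores amounts as text such as "$5,320", while the list above holds plain doubles. A minimal sketch of one way that conversion might be done, assuming US-style formatting; the class AmountParser and the method parseAmount are hypothetical names, not part of the course code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AmountParser {
    // Hypothetical helper: strips "$" and "," and parses the remainder.
    // Assumes US-style amounts such as "$5,320"; other formats would need more care.
    static double parseAmount(String raw) {
        String cleaned = raw.replace("$", "").replace(",", "").trim();
        return Double.parseDouble(cleaned);
    }

    public static void main(String[] args) {
        List<String> rawAmounts = Arrays.asList("$5,320", "$7,896", "$4,501",
                                                "$12,726", "$13,685");
        List<Double> salesAmounts = new ArrayList<>();
        for (String raw : rawAmounts) {
            salesAmounts.add(parseAmount(raw));
        }
        // Prints: Sales amounts: [5320.0, 7896.0, 4501.0, 12726.0, 13685.0]
        System.out.println("Sales amounts: " + salesAmounts);
    }
}
```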

Finding range boundaries

// Get the minimum sales amount
Double minSale = Collections.min(salesAmounts);
// Get the maximum sales amount
Double maxSale = Collections.max(salesAmounts);

System.out.println("Range: $" + minSale + " - $" + maxSale);
Range: $4501.0 - $13685.0
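
Collections.min and Collections.max each scan the list once and throw NoSuchElementException on an empty list. As an alternative sketch (not the approach used on the slide), DoubleSummaryStatistics gathers both boundaries in a single pass; the class SalesRange and the method summarize are hypothetical names:

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;

class SalesRange {
    // Hypothetical helper: collects min, max, and count in one pass over the list.
    static DoubleSummaryStatistics summarize(List<Double> amounts) {
        return amounts.stream()
                      .mapToDouble(Double::doubleValue)
                      .summaryStatistics();
    }

    public static void main(String[] args) {
        List<Double> salesAmounts = Arrays.asList(5320.0, 7896.0, 4501.0,
                                                  12726.0, 13685.0);
        DoubleSummaryStatistics stats = summarize(salesAmounts);
        // Prints: Range: $4501.0 - $13685.0
        System.out.println("Range: $" + stats.getMin() + " - $" + stats.getMax());
    }
}
```

For an empty list, getCount() returns 0, so it may be worth checking the count before trusting the min and max.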

Setting a valid range

// Sales amount should not be less than 0.0
Double lowerThreshold = 0.0;
// Sales amount should not be greater than 15000.0
Double upperThreshold = 15000.0;

for (Double amount : salesAmounts) {
    // Check that each amount falls within range
    if (amount >= lowerThreshold && amount <= upperThreshold) {
        System.out.println(amount + " is within range");
    }
}

Setting a valid range: outputs

5320.0 is within range
7896.0 is within range
4501.0 is within range
12726.0 is within range
13685.0 is within range
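
Every amount in this sample passes, so the check never fires. A minimal sketch of how the same thresholds might flag bad values, using two made-up amounts (-250.0 and 99999.0) that are not in the dataset; the class RangeCheck and the method findOutOfRange are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RangeCheck {
    // Hypothetical helper: returns the amounts that fall outside [lower, upper].
    // Negating the in-range test also flags NaN, which fails every comparison.
    static List<Double> findOutOfRange(List<Double> amounts, double lower, double upper) {
        List<Double> invalid = new ArrayList<>();
        for (Double amount : amounts) {
            if (amount == null || !(amount >= lower && amount <= upper)) {
                invalid.add(amount);
            }
        }
        return invalid;
    }

    public static void main(String[] args) {
        // -250.0 and 99999.0 are invented values used only for illustration
        List<Double> amounts = Arrays.asList(5320.0, 7896.0, -250.0, 4501.0, 99999.0);
        // Prints: Out of range: [-250.0, 99999.0]
        System.out.println("Out of range: " + findOutOfRange(amounts, 0.0, 15000.0));
    }
}
```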

Defining range categories

import org.apache.commons.lang3.Range;

// Range.between includes both endpoints, so 5000.0 and 10000.0
// each belong to two adjacent categories
// Low sales category: $0 - $5000
Range<Double> lowSales = Range.between(0.0, 5000.0);
// Medium sales category: $5000 - $10000
Range<Double> mediumSales = Range.between(5000.0, 10000.0);
// High sales category: $10000 - $15000
Range<Double> highSales = Range.between(10000.0, 15000.0);

Checking range categories

for (Double amount : salesAmounts) {
    if (lowSales.contains(amount)) { // Is amount in low sales range?
        System.out.println("$" + amount + " - Low sales");
    } else if (mediumSales.contains(amount)) { // Is amount in medium sales range?
        System.out.println("$" + amount + " - Medium sales");
    } else if (highSales.contains(amount)) { // Is amount in high sales range?
        System.out.println("$" + amount + " - High sales");
    } else { // Is amount outside of expected range?
        System.out.println("$" + amount + " - Out of expected range");
    }
}

Checking range categories: outputs

Sales categorization:
$5320.0 - Medium sales
$7896.0 - Medium sales
$4501.0 - Low sales
$12726.0 - High sales
$13685.0 - High sales
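
Range.between includes both endpoints, so an amount of exactly 5000.0 matches both the low and medium ranges and the if/else order decides the result (low, in the loop shown). As an alternative sketch that swaps the library for plain comparisons, half-open intervals give each boundary value exactly one category; the class SalesCategory and the method categorize are hypothetical names:

```java
class SalesCategory {
    // Hypothetical helper using half-open intervals:
    // low = [0, 5000), medium = [5000, 10000), high = [10000, 15000]
    static String categorize(double amount) {
        if (Double.isNaN(amount) || amount < 0.0 || amount > 15000.0) {
            return "Out of expected range";
        }
        if (amount < 5000.0) {
            return "Low sales";
        }
        if (amount < 10000.0) {
            return "Medium sales";
        }
        return "High sales";
    }

    public static void main(String[] args) {
        // 5000.0 and 15000.5 are invented values used to show the boundaries
        double[] amounts = {4501.0, 5000.0, 12726.0, 15000.5};
        for (double amount : amounts) {
            System.out.println("$" + amount + " - " + categorize(amount));
        }
        // Prints:
        // $4501.0 - Low sales
        // $5000.0 - Medium sales
        // $12726.0 - High sales
        // $15000.5 - Out of expected range
    }
}
```

Note that this sketch assigns 5000.0 to medium sales, which differs from the inclusive Range version; which side a boundary belongs to is a business decision.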

Validating dates

// d-MMM-yy: day of month (d), abbreviated month name (MMM), two-digit year (yy)
DateTimeFormatter formatter = DateTimeFormatter.ofPattern("d-MMM-yy");
LocalDate startDate = LocalDate.of(2022, 1, 1); // Start of 2022
LocalDate endDate = LocalDate.of(2022, 12, 31); // End of 2022

for (String dateStr : saleDates) {
    LocalDate saleDate = LocalDate.parse(dateStr, formatter);
    // Valid dates occur after startDate and before endDate
    boolean isValid = saleDate.isAfter(startDate) && saleDate.isBefore(endDate);
    System.out.println(dateStr + " is valid: " + isValid);
}

Validating dates: outputs

4-Jan-22 is valid: true
1-Aug-22 is valid: true
7-Jul-22 is valid: true
27-Apr-22 is valid: true
24-Feb-22 is valid: true
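
isAfter and isBefore are strict comparisons, so a sale dated exactly 1-Jan-22 or 31-Dec-22 would be reported as invalid by the check shown. A minimal sketch of an inclusive variant, under two assumptions not stated on the slides: month names are English (hence Locale.ENGLISH), and unparseable strings should count as invalid. The class DateRangeCheck and the method isValid are hypothetical names:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class DateRangeCheck {
    // Assumption: month abbreviations are English, so the locale is pinned
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("d-MMM-yy", Locale.ENGLISH);

    // Hypothetical helper: true when dateStr parses and lies in [start, end]
    static boolean isValid(String dateStr, LocalDate start, LocalDate end) {
        try {
            LocalDate saleDate = LocalDate.parse(dateStr, FORMATTER);
            // "not before start" and "not after end" include both boundary dates
            return !saleDate.isBefore(start) && !saleDate.isAfter(end);
        } catch (DateTimeParseException e) {
            // Treat text that does not match the pattern as invalid
            return false;
        }
    }

    public static void main(String[] args) {
        LocalDate startDate = LocalDate.of(2022, 1, 1);
        LocalDate endDate = LocalDate.of(2022, 12, 31);
        // Invented inputs used only to exercise the edge cases
        String[] samples = {"31-Dec-22", "15-Mar-23", "unknown"};
        for (String dateStr : samples) {
            System.out.println(dateStr + " is valid: "
                    + isValid(dateStr, startDate, endDate));
        }
        // Prints:
        // 31-Dec-22 is valid: true
        // 15-Mar-23 is valid: false
        // unknown is valid: false
    }
}
```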

Putting it all together

  • Key import: org.apache.commons.lang3.Range
  • Get the min and max: Collections.min(salesAmounts), Collections.max(salesAmounts)
  • Define range categories: Range.between(0.0, 5000.0)
  • Check range categories: lowSales.contains(amount)
  • Validate dates: saleDate.isAfter(startDate), saleDate.isBefore(endDate)

Let's practice!
