Data type verification

Cleaning Data in Java

Dennis Lee

Software Engineer

Why check data types?

  • Need to parse text from external data sources
  • Invalid data types cause runtime errors
  • Proper validation prevents crashes

 

String priceText = "$30.61";
double price = Double.parseDouble(priceText);  // Fails: $ not valid in number
Exception in thread "main" java.lang.NumberFormatException
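
To observe this failure without crashing the program, the parse can be wrapped in a try/catch. The following is a minimal sketch using only the standard library; the class name ParseFailureDemo and the helper tryParse are illustrative names, not part of the course code.

```java
import java.util.OptionalDouble;

class ParseFailureDemo {
    // Hypothetical helper: returns an empty result instead of throwing on bad input
    static OptionalDouble tryParse(String text) {
        try {
            return OptionalDouble.of(Double.parseDouble(text));
        } catch (NumberFormatException e) {
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("30.61").isPresent());   // plain decimal text
        System.out.println(tryParse("$30.61").isPresent());  // $ makes the text invalid
    }
}
```

Returning an empty OptionalDouble lets the caller decide whether to skip, log, or repair the value.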

Validating numeric text

// Utility for number validation
import org.apache.commons.lang3.math.NumberUtils; 
String reviewCountText = "165";
System.out.println(NumberUtils.isParsable(reviewCountText));
true
String priceText = "$30.61"; // Invalid number
System.out.println(NumberUtils.isParsable(priceText)); // false: has $ symbol
false
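
If Apache Commons Lang is not on the classpath, a similar check can be approximated with a regular expression. This sketch assumes that only plain decimal numbers with an optional leading minus sign should pass. It is not the library's implementation, and edge cases may differ. NumericCheck and looksNumeric are hypothetical names.

```java
import java.util.regex.Pattern;

class NumericCheck {
    // Assumption: optional minus sign, digits, optional fractional part
    private static final Pattern PLAIN_DECIMAL = Pattern.compile("-?\\d+(\\.\\d+)?");

    // Hypothetical helper, not part of Commons Lang
    static boolean looksNumeric(String text) {
        return text != null && PLAIN_DECIMAL.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksNumeric("165"));     // digits only
        System.out.println(looksNumeric("$30.61"));  // rejected because of the $ symbol
    }
}
```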

Parsing numeric text

private static String validateNumeric(String value) {
    if (NumberUtils.isParsable(value)) return value;  // Return if valid number
    throw new IllegalArgumentException("Invalid: " + value);  // Else throw error
}

int reviews = Integer.parseInt(validateNumeric("165"));  // Validate then parse

// Remove $ first
double price = Double.parseDouble(validateNumeric("$30.61".replace("$", "")));
System.out.println("reviews: " + reviews + "\nprice in $: " + price);
reviews: 165
price in $: 30.61
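
One caveat with this pattern: NumberUtils.isParsable accepts decimal strings such as "4.8", so passing such a value through validateNumeric and then Integer.parseInt would still throw a NumberFormatException. A minimal standard-library sketch of an integer-specific check follows. validateInteger is a hypothetical name, and the regular expression assumes whole numbers with an optional minus sign.

```java
class IntegerCheck {
    // Hypothetical helper: accept only whole numbers such as "165" or "-3".
    // Values outside the int range would still fail later in Integer.parseInt.
    static String validateInteger(String value) {
        if (value != null && value.matches("-?\\d+")) return value;
        throw new IllegalArgumentException("Invalid integer: " + value);
    }

    public static void main(String[] args) {
        int reviews = Integer.parseInt(validateInteger("165"));
        System.out.println("reviews: " + reviews);
        try {
            validateInteger("4.8"); // decimal text is rejected before parsing
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```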

Validating date formats

import java.time.LocalDate;                // Date without a time component
import java.time.format.DateTimeFormatter; // For date pattern formatting

private static LocalDate parseDate(String dateStr) {
    DateTimeFormatter format = DateTimeFormatter.ofPattern("M/d/yy");
    return LocalDate.parse(dateStr, format); // Convert string to LocalDate
}

LocalDate date = parseDate("1/10/23"); // "1/10/23" converts to 2023-01-10
System.out.println("Date: " + date);
Date: 2023-01-10
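
LocalDate.parse throws a DateTimeParseException when the text does not match the pattern, so validating a date amounts to attempting the parse and catching that exception. The following is a minimal sketch; DateCheck and tryParseDate are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

class DateCheck {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("M/d/yy");

    // Hypothetical helper: empty Optional when the text does not match M/d/yy
    static Optional<LocalDate> tryParseDate(String dateStr) {
        try {
            return Optional.of(LocalDate.parse(dateStr, FORMAT));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParseDate("1/10/23").isPresent());     // matches the pattern
        System.out.println(tryParseDate("2023-01-10").isPresent());  // wrong format
    }
}
```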

Combining numeric and date validation

private static double parsePrice(String price) {
    return Double.parseDouble(validateNumeric(price.replace("$", ""))); // Remove $
}

// Convert CSV fields to BookSales
public static BookSales parseSalesData(String[] fields) {
    String title = fields[0];
    LocalDate publishDate = parseDate(fields[1]);  // Custom date parsing
    int reviews = Integer.parseInt(validateNumeric(fields[2]));
    double rating = Double.parseDouble(validateNumeric(fields[3]));
    double price = parsePrice(fields[4]);  // Handle currency format
    return new BookSales(title, publishDate, reviews, rating, price);
}
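
The BookSales type is not defined on these slides. Its printed form in the CSV example (BookSales[title=..., publishDate=..., reviewCount=..., rating=..., price=...]) suggests a Java record, so a plausible definition, assuming Java 16 or later, might look like this. The field types are inferred from how parseSalesData calls the constructor.

```java
import java.time.LocalDate;

class BookSalesDemo {
    public static void main(String[] args) {
        BookSales sale = new BookSales("Python Crash Course",
                LocalDate.of(2023, 1, 10), 165, 4.8, 30.61);
        System.out.println(sale); // Records generate toString() automatically
    }
}

// Assumed shape of BookSales, inferred from its printed output
record BookSales(String title, LocalDate publishDate, int reviewCount,
                 double rating, double price) {}
```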

Reading CSV files safely

List<BookSales> sales = new ArrayList<>(); // Store BookSales objects


try (BufferedReader reader = new BufferedReader(new FileReader("book_sales.csv"))) {
    String line = reader.readLine(); // Skip header row
    while ((line = reader.readLine()) != null) { // Read until end of file
        String[] fields = line.split(","); // Split CSV line into array
        sales.add(parseSalesData(fields)); // Convert fields to BookSales object
    }
}
sales.forEach(System.out::println); // Print each sale
BookSales[title=Python Crash Course, publishDate=2023-01-10, 
reviewCount=165, rating=4.8, price=30.61]
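
In the loop above, a single malformed row stops the whole read, because the exception propagates out of the try block. One possible variation is to catch the failure per row and set the bad line aside. The sketch below is self-contained and reads made-up sample rows from a string rather than from book_sales.csv. SafeCsvDemo and readReviewCounts are hypothetical names, and only the review count column is parsed to keep it short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class SafeCsvDemo {
    // Hypothetical helper: parse the second column, collecting rows that fail
    static List<Integer> readReviewCounts(String csvText, List<String> rejected) {
        List<Integer> counts = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(csvText))) {
            String line = reader.readLine(); // Skip header row
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",");
                if (fields.length < 2) {
                    rejected.add(line); // Missing column
                    continue;
                }
                try {
                    counts.add(Integer.parseInt(fields[1].trim()));
                } catch (NumberFormatException e) {
                    rejected.add(line); // Keep the bad row for later inspection
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for a real file
        String csv = "title,reviews\nBook A,165\nBook B,n/a\nBook C,42\n";
        List<String> rejected = new ArrayList<>();
        List<Integer> counts = readReviewCounts(csv, rejected);
        System.out.println("parsed: " + counts);
        System.out.println("rejected: " + rejected);
    }
}
```

Collecting rejected rows instead of discarding them makes it possible to report how much of the input was unusable.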

Summary: data type verification

  • Validate text before parsing to prevent runtime errors
  • Key imports
    • org.apache.commons.lang3.math.NumberUtils
    • java.time.format.DateTimeFormatter
  • Use NumberUtils.isParsable() to check numeric strings
  • Handle multiple date formats with DateTimeFormatter
  • Catching data type issues early helps protect data integrity
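
The slides parse a single pattern (M/d/yy). When a source mixes formats, one possible approach is to try several formatters in order and accept the first that succeeds. This is a sketch under that assumption; MultiFormatDates, parseFlexibleDate, and the choice of candidate patterns are illustrative.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

class MultiFormatDates {
    // Assumed candidate patterns; the slides only use M/d/yy
    private static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ofPattern("M/d/yy"),
            DateTimeFormatter.ISO_LOCAL_DATE); // e.g. 2023-01-10

    // Hypothetical helper: return the first successful parse, else throw
    static LocalDate parseFlexibleDate(String dateStr) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                return LocalDate.parse(dateStr, format);
            } catch (DateTimeParseException e) {
                // Fall through and try the next pattern
            }
        }
        throw new IllegalArgumentException("Invalid date: " + dateStr);
    }

    public static void main(String[] args) {
        System.out.println(parseFlexibleDate("1/10/23"));
        System.out.println(parseFlexibleDate("2023-01-10"));
    }
}
```

The order of the patterns matters when a string could match more than one of them, so ambiguous formats such as M/d/yy and d/M/yy should not be combined this way.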

Let's practice!
