Cleaning Data in Java
Dennis Lee
Software Engineer
// From assessing quality
DescriptiveStatistics stats = new DescriptiveStatistics(values);
System.out.println("Mean: " + stats.getMean());

// To transforming strings
String clean = text.trim().toLowerCase();

// To validating ranges
boolean isValid = Range.between(0, 100).contains(value);

// To cleaning tables
Table cleaned = data.where(sales.isGreaterThan(1000));
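// Check statistics
double mean = stats.getMean();

// Detect nulls
boolean isMissing = Optional.ofNullable(value).isEmpty();

// Verify types: parse() returns a LocalDate and throws DateTimeParseException on invalid input
boolean isValidDate = true;
try { LocalDate.parse(dateStr, formatter); }
catch (DateTimeParseException e) { isValidDate = false; }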
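The null and type checks above can be wrapped in small reusable helpers. This is a minimal sketch using only the standard library; the class name QualityChecks and both method names are hypothetical, not part of any library shown in these slides.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper class (name chosen for illustration only).
public class QualityChecks {

    // Returns true when the value is null, i.e. missing.
    public static boolean isMissing(Object value) {
        return Optional.ofNullable(value).isEmpty();
    }

    // Returns true when the text parses as a date with the given formatter.
    // A null string is treated as invalid rather than raising an exception.
    public static boolean isValidDate(String dateStr, DateTimeFormatter formatter) {
        if (dateStr == null) {
            return false;
        }
        try {
            LocalDate.parse(dateStr, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

One design note: formatters built with DateTimeFormatter.ofPattern resolve leniently by default (SMART style), so a strict formatter such as DateTimeFormatter.ISO_LOCAL_DATE may be the safer choice when impossible calendar dates should be rejected.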
// Normalize strings (e.g., remove non-letters)
String cleaned = dirtyName.replaceAll("[^a-zA-Z\\s]", "");

// Standardize categories
String standard = categoryMap.getOrDefault(value, "Other");

// Convert dates
LocalDateTime datetime = LocalDateTime.parse(dateStr, formatter);
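To see the string transformations above end to end, here is a small sketch with the standard library only. The class TextCleaning, its method names, and the sample category map are assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical helper class (name chosen for illustration only).
public class TextCleaning {

    // Removes every character that is not an ASCII letter or whitespace,
    // then trims leading and trailing whitespace.
    public static String normalizeName(String dirtyName) {
        return dirtyName.replaceAll("[^a-zA-Z\\s]", "").trim();
    }

    // Maps a raw category to its standard label, falling back to "Other"
    // when the raw value is not in the lookup map.
    public static String standardizeCategory(Map<String, String> categoryMap, String value) {
        return categoryMap.getOrDefault(value, "Other");
    }

    public static void main(String[] args) {
        Map<String, String> categoryMap = Map.of("elec", "Electronics", "furn", "Furniture");
        System.out.println(normalizeName("Ann3 Lee!"));                 // Ann Lee
        System.out.println(standardizeCategory(categoryMap, "elec"));   // Electronics
        System.out.println(standardizeCategory(categoryMap, "toys"));   // Other
    }
}
```

Note that the character class keeps only ASCII letters, so accented names such as "José" would lose characters; a Unicode-aware class like \p{L} may be a better fit for such data.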
// Check ranges
boolean inRange = Range.between(min, max).contains(value);

// Validate patterns
boolean matches = Pattern.matches(pattern, text);

// Enforce constraints
Set<ConstraintViolation<Data>> violations = validator.validate(data);
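The range and pattern checks above can also be sketched without external dependencies. The class Validation and its methods are hypothetical; inRange is a plain-Java stand-in for Range.between(min, max).contains(value) and assumes inclusive bounds, matching the Apache Commons Lang behavior.

```java
import java.util.regex.Pattern;

// Hypothetical helper class (name chosen for illustration only).
public class Validation {

    // Inclusive range check: true when min <= value <= max.
    public static boolean inRange(double value, double min, double max) {
        return value >= min && value <= max;
    }

    // True only when the whole text matches the regular expression.
    public static boolean matchesPattern(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        System.out.println(inRange(100, 0, 100));                        // true
        System.out.println(inRange(101, 0, 100));                        // false
        System.out.println(matchesPattern("[A-Z]{2}\\d{3}", "AB123"));   // true
        System.out.println(matchesPattern("[A-Z]{2}\\d{3}", "AB1234"));  // false
    }
}
```

Pattern.matches compiles the expression on every call, so for validating many rows it is usually worth compiling the Pattern once and reusing it.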
for (String colName : data.columnNames()) {
    // Check missing values
    System.out.println(data.column(colName).countMissing());
}

StringColumn cleaned = names.map(String::toLowerCase)  // Clean text
    .setName("Clean_Names");                            // Name new column

Table summary = data
    .where(sales.isGreaterThan(1000))  // Filter rows
    .summarize("Sales", mean)          // Aggregate data
    .by("Category");                   // Group results
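For readers without the table library on the classpath, the filter, aggregate, and group pipeline above can be approximated with Java streams. This is only a sketch of the same idea, not the table API itself; the Sale row type and the SalesSummary class are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in for the table pipeline, using plain Java streams.
public class SalesSummary {

    // Hypothetical row type standing in for one table row.
    public static final class Sale {
        final String category;
        final double amount;

        public Sale(String category, double amount) {
            this.category = category;
            this.amount = amount;
        }
    }

    // Keeps rows with amount above the threshold, then averages amount per category.
    public static Map<String, Double> meanSalesByCategory(List<Sale> rows, double threshold) {
        return rows.stream()
            .filter(r -> r.amount > threshold)               // Filter rows
            .collect(Collectors.groupingBy(
                r -> r.category,                             // Group results
                TreeMap::new,                                // Sorted by category name
                Collectors.averagingDouble(r -> r.amount))); // Aggregate data
    }
}
```

Categories whose rows are all filtered out do not appear in the result at all, which mirrors filtering before grouping in the table version.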