Cleaning Data in Java
Dennis Lee
Software Engineer
| Product Name | Category | Status | Date Received | Quantity |
|---|---|---|---|---|
| Eggplant | Fruits & Vegetables | Discontinued | 3/1/25 | 46 |
| VEGETABLE OIL | Oils & Fats | Backordered | 4/1/25 | 51 |
| Cheese | Dairy | Active | 6/1/25 | 78 |
| Fresh (Organic) *Carrots* | Fruits & Vegetables | Discontinued | 5/1/25 | 51 |
| Bell Pepper Fresh | Fruits & Vegetables | Active | 5/2/25 | 67 |
String[] messyProducts = {
    "Eggplant ",                  // Extra whitespace at the end
    "VEGETABLE OIL",              // Inconsistent case
    "Fresh (Organic) *Carrots*",  // Special characters
    "Bell  Pepper   Fresh",       // Extra whitespace between words
};
String[] products = {"Eggplant ", " Vegetable Oil", " Cheese "};
for (String product : products) {
    String cleaned = product.trim(); // Removes leading/trailing whitespace
    System.out.println(cleaned);
}
Eggplant
Vegetable Oil
Cheese
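One caveat worth a quick sketch: `trim()` only removes characters up to U+0020, so Unicode spaces such as an em space survive it. Since Java 11, `String.strip()` removes anything `Character.isWhitespace` recognizes. The class name, helper names, and the padded sample value below are hypothetical and not part of the inventory dataset.

```java
public class TrimVsStrip {
    // trim() removes only leading/trailing characters <= U+0020
    static String viaTrim(String s) {
        return s.trim();
    }

    // strip() (Java 11+) removes leading/trailing Unicode whitespace
    static String viaStrip(String s) {
        return s.strip();
    }

    public static void main(String[] args) {
        // Hypothetical value padded with EM SPACE (U+2003) on both sides
        String padded = "\u2003Cheese\u2003";
        System.out.println("[" + viaTrim(padded) + "]");  // em spaces remain
        System.out.println("[" + viaStrip(padded) + "]"); // em spaces removed
    }
}
```

For data that only contains ordinary spaces, tabs, and newlines, the two methods behave the same.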
import java.util.Arrays;
import java.util.List;
List<String> products = Arrays.asList("Eggplant", "VEGETABLE OIL", "cheese");
products.stream()
    .map(String::toLowerCase)       // Convert all to lowercase
    .forEach(System.out::println);  // Print each product
eggplant
vegetable oil
cheese
[^a-zA-Z\\s]: Finds any character that isn't a letter or whitespace
// This pattern matches any character that is NOT:
[ // Start a character set
^ // NOT - match anything not in this set
a-z // any lowercase letter
A-Z // any uppercase letter
\\s // any whitespace character (the extra \ escapes the backslash in the Java string, so the regex engine sees \s)
] // End character set
String dirtyName = "Fresh (Organic) *Carrots*"; // Special characters: (, ), *
String cleaned = dirtyName.replaceAll("[^a-zA-Z\\s]", ""); // Remove non-letters
System.out.println(cleaned); // Output: "Fresh Organic Carrots"
Fresh Organic Carrots
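Note that `[^a-zA-Z\\s]` also strips digits and accented letters, which may matter for some product names. The sketch below compares it with a Unicode-aware alternative, `[^\\p{L}\\p{N}\\s]`, which keeps letters and digits from any script. The class name, method names, and the sample name are hypothetical and do not come from the inventory dataset.

```java
public class SpecialCharFilter {
    // Pattern from the slides: keeps only ASCII letters and whitespace
    static String asciiLettersOnly(String name) {
        return name.replaceAll("[^a-zA-Z\\s]", "");
    }

    // Alternative (an assumption, not the course's pattern):
    // keeps any Unicode letter (\p{L}), any digit (\p{N}), and whitespace
    static String unicodeLettersAndDigits(String name) {
        return name.replaceAll("[^\\p{L}\\p{N}\\s]", "");
    }

    public static void main(String[] args) {
        // Hypothetical product name: "Crème Fraîche 2%" written with Unicode escapes
        String name = "Cr\u00e8me Fra\u00eeche 2%";
        System.out.println(asciiLettersOnly(name));        // accents and "2%" are dropped
        System.out.println(unicodeLettersAndDigits(name)); // only "%" is dropped
    }
}
```

Which pattern fits depends on whether digits and non-English letters are meaningful in the product names being cleaned.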
import java.util.regex.Pattern;
Pattern pattern = Pattern.compile("\\s+"); // Match one or more whitespace characters
String messyProduct = "Bell  Pepper   Fresh"; // Contains extra spaces
// Replace each run of whitespace with a single space
String cleanedProduct = pattern.matcher(messyProduct).replaceAll(" ");
System.out.println(cleanedProduct);
Bell Pepper Fresh
import java.util.Arrays;
import java.util.List;
List<String> messyProducts = Arrays.asList(
    "Eggplant ", "VEGETABLE OIL",
    "Fresh (Organic) *Carrots*", "Bell  Pepper   Fresh"
); // Product names extracted from our grocery inventory dataset
messyProducts.stream()
    .map(s -> s.trim()                      // Fix outer spaces
        .replaceAll("[^a-zA-Z\\s]", "")     // Remove special chars
        .replaceAll("\\s+", " ")            // Fix inner spaces
        .toLowerCase())                     // Standardize case
    .forEach(System.out::println);
eggplant
vegetable oil
fresh organic carrots
bell pepper fresh
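The same steps can be packaged as a reusable method. This is a minimal sketch, not the course's implementation: the class and method names are hypothetical, the patterns are precompiled once, trimming is moved to the end so that whitespace exposed by removing a trailing symbol is also dropped, and `Locale.ROOT` is assumed for lowercasing so the result does not depend on the machine's default locale.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ProductNameCleaner {
    // Compiled once and reused for every name
    private static final Pattern NON_LETTERS = Pattern.compile("[^a-zA-Z\\s]");
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Applies the four cleaning steps to a single product name
    static String clean(String name) {
        String lettersOnly = NON_LETTERS.matcher(name).replaceAll("");              // Remove special chars
        String singleSpaced = WHITESPACE_RUN.matcher(lettersOnly).replaceAll(" ");  // Fix inner spaces
        return singleSpaced.trim().toLowerCase(Locale.ROOT);                        // Fix outer spaces, standardize case
    }

    // Cleans every name in the list and returns a new list
    static List<String> cleanAll(List<String> names) {
        return names.stream()
                .map(ProductNameCleaner::clean)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> messyProducts = Arrays.asList(
                "Eggplant ", "VEGETABLE OIL",
                "Fresh (Organic) *Carrots*", "Bell  Pepper   Fresh");
        cleanAll(messyProducts).forEach(System.out::println);
    }
}
```

Returning a new list instead of printing inside the stream keeps the cleaned values available for later steps.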