Advanced data manipulation techniques

Importing Data in Java

Anthony Markham

VP Quant Developer

Column removal and addition

  • Drop unnecessary columns with .drop()
  • Add precomputed or categorical columns with .addColumns()
    • Each new column must have the same number of rows as the table ❗
// Remove specific columns
Table cleaned = dataTable.drop("TempID", "Notes");

// Add two new columns
StringColumn statusCol = StringColumn.create("Status", "Active", "Inactive", "Active");
IntColumn priorityCol = IntColumn.create("Priority", 1, 2, 3);
Table enhanced = dataTable.addColumns(statusCol, priorityCol);
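
The row-count rule can be checked before adding columns. Below is a minimal plain-Java sketch of that check. ColumnLengthCheck and sameRowCount are hypothetical names for illustration, not part of the table library, and plain lists stand in for columns.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of any table library.
final class ColumnLengthCheck {

    /** Returns true when every new column has exactly tableRows values. */
    static boolean sameRowCount(int tableRows, List<?>... newColumns) {
        for (List<?> column : newColumns) {
            if (column.size() != tableRows) {
                return false; // this column would not line up with the table
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> status = List.of("Active", "Inactive", "Active");
        List<Integer> priority = List.of(1, 2, 3);

        // Both columns hold three values, so they fit a three-row table.
        System.out.println(sameRowCount(3, status, priority));
        // The same columns do not fit a four-row table.
        System.out.println(sameRowCount(4, status, priority));
    }
}
```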

Row filtering with .dropWhere()

  • Removes matching rows
  • Uses Selection for filtering criteria
// Create a selection condition
Selection outliers = dataTable.doubleColumn("Value")
    .isLessThan(lowerBound)
    .or(dataTable.doubleColumn("Value")
        .isGreaterThan(upperBound));

// Remove rows matching the condition
Table cleanedData = dataTable.dropWhere(outliers);
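
The same idea can be sketched in plain Java, without the table library: combine two conditions with or, then remove every value that matches. OutlierFilterSketch and dropOutliers are hypothetical names, and a list of doubles stands in for the "Value" column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoublePredicate;

// Plain-Java analogue for illustration; not the table library's API.
final class OutlierFilterSketch {

    /** Returns a new list without the values below lowerBound or above upperBound. */
    static List<Double> dropOutliers(List<Double> values, double lowerBound, double upperBound) {
        DoublePredicate tooLow = v -> v < lowerBound;
        DoublePredicate tooHigh = v -> v > upperBound;
        DoublePredicate outlier = tooLow.or(tooHigh); // matches either condition

        List<Double> kept = new ArrayList<>(values); // copy, so the input stays untouched
        kept.removeIf(v -> outlier.test(v));         // drop the matching values
        return kept;
    }

    public static void main(String[] args) {
        List<Double> values = List.of(-5.0, 1.0, 2.0, 99.0);
        List<Double> cleaned = dropOutliers(values, 0.0, 10.0);

        System.out.println("Original rows: " + values.size());
        System.out.println("After dropping outliers: " + cleaned.size());
    }
}
```

Values equal to a bound are kept here, because the comparisons are strict.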

Row counting

  • .rowCount() - counts the number of rows in a table
// Compare row counts
System.out.println("Original rows: " + dataTable.rowCount());
System.out.println("After dropping outliers: " + 
    cleanedData.rowCount());
Original rows: 100
After dropping outliers: 95

Boolean filtering

  • Supports .and(), .or(), and .not() methods
// Complex boolean filtering
Selection techHighPaid = dataTable.stringColumn("Department")
    .isEqualTo("Technology")
    .and(dataTable.doubleColumn("Salary")
        .isGreaterThan(100000));

// Inverse selection (NOT)
Selection nonTechOrLowPaid = techHighPaid.not();
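
Why is the inverse named nonTechOrLowPaid? By De Morgan's law, NOT (A AND B) equals (NOT A) OR (NOT B). The sketch below shows this with plain Java predicates rather than the table library. BooleanFilterSketch and its fields are hypothetical names, and each predicate tests a single row's department and salary.

```java
import java.util.function.BiPredicate;

// Plain-Java analogue for illustration; the names below are hypothetical.
final class BooleanFilterSketch {

    // Each predicate looks at one row's (department, salary) pair.
    static final BiPredicate<String, Double> IS_TECH =
        (department, salary) -> department.equals("Technology");
    static final BiPredicate<String, Double> IS_HIGH_PAID =
        (department, salary) -> salary > 100000;

    // AND of the two conditions, then its inverse.
    static final BiPredicate<String, Double> TECH_HIGH_PAID = IS_TECH.and(IS_HIGH_PAID);
    static final BiPredicate<String, Double> NON_TECH_OR_LOW_PAID = TECH_HIGH_PAID.negate();

    // De Morgan's law: NOT (A AND B) is the same as (NOT A) OR (NOT B).
    static final BiPredicate<String, Double> DE_MORGAN_FORM =
        IS_TECH.negate().or(IS_HIGH_PAID.negate());

    public static void main(String[] args) {
        // A highly paid technology row is excluded by the inverse selection.
        System.out.println(NON_TECH_OR_LOW_PAID.test("Technology", 120000.0));
        // A non-technology row is included, whatever the salary.
        System.out.println(NON_TECH_OR_LOW_PAID.test("Sales", 120000.0));
    }
}
```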

Transformation with .map()

  • Applies a function to transform column values
  • Supports lambda expressions, making our code easy to read
// Transform an entire column with a predefined function
StringColumn upperNames = dataTable.stringColumn("Name").map(s -> s.toUpperCase());
// Transform values in a column
DoubleColumn prices = dataTable.doubleColumn("Price");
DoubleColumn discounted = prices.map(price -> price * 0.9);

// Set a name and add a column
discounted.setName("DiscountedPrice");
Table withDiscounts = dataTable.addColumns(discounted);
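
The mapping idea itself, applying one function to every value and collecting the results, can be sketched with plain Java streams. ColumnMapSketch and mapValues are hypothetical names, and a list stands in for a column. This sketch does not show how the table library implements .map().

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Plain-Java analogue for illustration; not the table library's API.
final class ColumnMapSketch {

    /** Applies fn to every value and returns the results as a new list. */
    static <T> List<T> mapValues(List<T> values, UnaryOperator<T> fn) {
        return values.stream().map(fn).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("ann", "bob");
        List<Double> prices = List.of(100.0, 50.0);

        // A method reference and a lambda both work as the mapping function.
        System.out.println(mapValues(names, String::toUpperCase));
        System.out.println(mapValues(prices, price -> price * 0.9));

        // The input lists are unchanged; mapValues builds new ones.
        System.out.println(names);
    }
}
```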

Summary

  • .drop() - Remove columns from table
  • .dropWhere() - Remove rows matching condition
  • .addColumns() - Add new columns to table
  • .map() - Transform column values
// Core advanced manipulation methods
dataTable.drop("TemporaryID");              
dataTable.dropWhere(selection);             
dataTable.addColumns(newColumn);            
doubleCol.map(value -> value * 2);

Let's practice!
