Data sanitization and masking
Using Data Stores in AWS
Dunieski Otano
AWS Solutions Architect
The log file that exposed everything
- Application logs user activity for debugging
- Logs contain full national IDs, credit cards, passwords
- Logs sent to CloudWatch (accessible to dev team)
- Compliance audit finds violation: $2M fine
What is data sanitization?
- Definition
- Removing or obscuring sensitive data
- When to sanitize
- Before logging
- Before displaying to users
- Before transmitting to third parties
- Goal
- Maintain utility while protecting privacy
Full masking technique
- Use for
- Passwords, API keys, tokens
- Implementation
- Replace all characters with asterisks
- Example
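A minimal sketch of full masking, assuming plain Python strings. The helper names `mask_full` and `mask_fixed` are hypothetical and not part of any library; the fixed-width variant is an addition for cases where even the length of a secret should not be revealed.

```python
def mask_full(value: str, mask_char: str = "*") -> str:
    """Replace every character with mask_char, keeping only the length."""
    return mask_char * len(value)


def mask_fixed(value: str, width: int = 8, mask_char: str = "*") -> str:
    """Return a fixed-width mask so the original length is not leaked."""
    return mask_char * width


print(mask_full("hunter2"))          # *******
print(mask_fixed("a-much-longer-api-key"))  # ********
```

For passwords, API keys, and tokens the fixed-width form is often the safer default, since nothing about the original value survives.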
Partial masking technique
- Use for
- National IDs, credit cards, phone numbers
- Implementation
- Show last 4 digits, mask the rest
- Example
- "123-45-6789" ==> "***-**-6789"
- "4532-1234-5678-9010" ==> "****-****-****-9010"
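One way to sketch partial masking is to mask digits from the left and leave separators untouched, so the format stays recognizable. The function name `mask_keep_last` and its parameters are assumptions made for illustration.

```python
def mask_keep_last(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask every digit except the last `visible` ones; keep separators."""
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - visible, 0)  # digits still to hide
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)  # separators and the trailing digits pass through
    return "".join(out)


print(mask_keep_last("123-45-6789"))          # ***-**-6789
print(mask_keep_last("4532-1234-5678-9010"))  # ****-****-****-9010
```

Note that an input with four or fewer digits is returned unchanged by this sketch, so very short values may need full masking instead.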
Hashing and tokenization
- Hashing
- One-way transformation; the original value cannot be recovered
- Tokenization
- Replace with random token
- Store mapping securely
- Use cases
- Password verification, duplicate detection, payments, compliance
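The difference between the two approaches can be sketched with the Python standard library. This is an illustration under stated assumptions: `keyed_hash` and `TokenVault` are hypothetical names, and the in-memory dictionary stands in for whatever secured, encrypted store would hold the mapping in practice.

```python
import hashlib
import hmac
import secrets


def keyed_hash(value: str, key: bytes) -> str:
    """One-way keyed hash (HMAC-SHA256): same input and key give the same digest."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


class TokenVault:
    """Toy token vault. Assumption: a real mapping lives in a secured store."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Swap the value for a random token that has no relation to it."""
        token = "tok_" + secrets.token_hex(16)
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        """Look up the original value; only possible with access to the vault."""
        return self._token_to_value[token]


key = b"demo-key-for-illustration-only"
digest = keyed_hash("123-45-6789", key)
print(digest == keyed_hash("123-45-6789", key))  # True: useful for duplicate detection

vault = TokenVault()
token = vault.tokenize("4532-1234-5678-9010")
print(vault.detokenize(token))  # 4532-1234-5678-9010
```

Hashing is irreversible, which suits duplicate detection; tokenization is reversible for whoever holds the mapping, which suits payments. For password verification specifically, a deliberately slow function such as `hashlib.pbkdf2_hmac` or `hashlib.scrypt` with a per-user salt is generally preferred over a single fast hash.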
Redaction technique
- Use for
- Medical diagnoses, legal information
- Implementation
- Remove the sensitive content entirely, leaving only a placeholder
- Example
- "Patient has diabetes" ==> "Patient has [REDACTED]"
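A minimal redaction sketch, assuming the sensitive terms are known in advance. The term list and the function name `redact_terms` are hypothetical; real systems typically rely on a maintained vocabulary or a detection service rather than a hard-coded list.

```python
import re

# Hypothetical vocabulary of terms to redact.
SENSITIVE_TERMS = ["diabetes", "hypertension"]


def redact_terms(text: str, terms=SENSITIVE_TERMS, placeholder: str = "[REDACTED]") -> str:
    """Replace each whole-word occurrence of a sensitive term with a placeholder."""
    if not terms:
        return text  # nothing to redact; avoids building an empty pattern
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(placeholder, text)


print(redact_terms("Patient has diabetes"))  # Patient has [REDACTED]
```

Unlike masking, nothing of the original value survives, not even its length.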
Implementing masking functions
- Create utility library
- mask_national_id(), mask_credit_card(), mask_email(), mask_phone()
- Apply consistently
- Before logging, displaying, transmitting
- Never store masked data as the source of truth
- Store original encrypted, mask on output
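The utility library and the "mask before logging" rule could be sketched as follows. The four function names come from the slide; their bodies, the regular expressions, and the `MaskingFilter` class are assumptions for illustration, and the patterns only cover the formats used in the earlier examples.

```python
import logging
import re


def _keep_last_digits(value: str, visible: int = 4) -> str:
    """Mask all digits except the last `visible`; separators are kept."""
    to_mask = sum(c.isdigit() for c in value) - visible
    out = []
    for c in value:
        if c.isdigit() and to_mask > 0:
            out.append("*")
            to_mask -= 1
        else:
            out.append(c)
    return "".join(out)


def mask_national_id(value: str) -> str:
    return _keep_last_digits(value)


def mask_credit_card(value: str) -> str:
    return _keep_last_digits(value)


def mask_phone(value: str) -> str:
    return _keep_last_digits(value)


def mask_email(value: str) -> str:
    """Keep the first character of the local part and the whole domain."""
    local, sep, domain = value.partition("@")
    if not sep or not local:
        return "*" * len(value)  # not a recognizable address: mask everything
    return local[0] + "*" * (len(local) - 1) + "@" + domain


# Assumed formats: 16-digit cards in groups of four, IDs shaped like 123-45-6789.
CARD_RE = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")
NATIONAL_ID_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class MaskingFilter(logging.Filter):
    """Mask card numbers and national IDs in a record before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # format first so arguments are covered
        message = CARD_RE.sub(lambda m: mask_credit_card(m.group()), message)
        message = NATIONAL_ID_RE.sub(lambda m: mask_national_id(m.group()), message)
        record.msg = message
        record.args = ()
        return True


logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.addFilter(MaskingFilter())  # runs before this handler writes the record
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Payment with card %s", "4532-1234-5678-9010")
```

Attaching the filter to the handler means every record passing through it is masked in one place, instead of relying on each call site to remember.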
Compliance and testing
- GDPR requirements
- Data minimization, purpose limitation
- HIPAA requirements
- Minimum necessary standard
- Testing
- Test with realistic data patterns and edge cases
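A table-driven check is one simple way to cover both typical formats and edge cases. The function under test, `mask_keep_last`, is a hypothetical helper repeated here so the example runs on its own; all inputs are synthetic values shaped like real ones.

```python
def mask_keep_last(value: str, visible: int = 4) -> str:
    """Mask every digit except the last `visible` ones; keep separators."""
    to_mask = sum(c.isdigit() for c in value) - visible
    out = []
    for c in value:
        if c.isdigit() and to_mask > 0:
            out.append("*")
            to_mask -= 1
        else:
            out.append(c)
    return "".join(out)


# (input, expected) pairs: common formats first, then edge cases.
CASES = [
    ("123-45-6789", "***-**-6789"),                  # dashed national ID
    ("123456789", "*****6789"),                      # same ID without separators
    ("4532 1234 5678 9010", "**** **** **** 9010"),  # card with spaces
    ("6789", "6789"),                                # too short to mask
    ("", ""),                                        # empty input
    ("no digits here", "no digits here"),            # nothing to mask
]


def run_cases() -> None:
    for raw, expected in CASES:
        actual = mask_keep_last(raw)
        assert actual == expected, f"{raw!r}: expected {expected!r}, got {actual!r}"
        # Property check: never more than four digits remain visible.
        assert sum(c.isdigit() for c in actual) <= 4


run_cases()
print("all masking cases passed")
```

The short-input case is worth a deliberate decision: this sketch returns it unchanged, which may or may not be acceptable under a given compliance policy.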
Let's practice!