Data Optimization

Overview

Data Optimization is a comprehensive suite of features within the Secure60 Collector that enables organisations to transform, enrich, protect, and control their log data before it reaches the Secure60 platform. Because these features run within your environment, data can be cleaned, masked, and filtered for quality and compliance before it ever leaves your network.

The Data Optimization framework includes:

  1. Field Extraction - Extract structured data from unstructured fields using regex patterns
  2. Content Redaction - Protect sensitive information using regex-based redaction
  3. Field Management - Add, map, and remove fields to optimize data structure
  4. Event Filtering - Control which events are ingested based on content
  5. Data Enrichment - Add contextual information to enhance analytical value

Execution Priority

Data optimization features are executed in a fixed order within the Secure60 Collector pipeline, so each stage operates on the output of the stage before it:

  1. Field Extraction (Blocks 1-5)
  2. Field Mapping
  3. Static Field Addition
  4. Field Removal Operations
  5. Event Filtering
  6. Content Redaction (Blocks 1-5)
  7. Data Enrichment (Subnet & Exact Match)

This order allows you to extract data first, then manipulate and protect it before final enrichment and transmission.
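
As a minimal sketch, the pipeline can be pictured as a chain of stages. The Rust below is illustrative only: the Event type and the stage functions are stand-ins, not the collector's actual internals.

use std::collections::HashMap;

type Event = HashMap<String, String>;

// Stage stubs standing in for the collector's internal implementation.
fn extract_fields(_event: &mut Event) {}
fn map_fields(_event: &mut Event) {}
fn add_static_fields(_event: &mut Event) {}
fn remove_fields(_event: &mut Event) {}
fn should_drop_event(_event: &Event) -> bool { false }
fn redact_content(_event: &mut Event) {}
fn enrich(_event: &mut Event) {}

// Stages run in the documented order; an event dropped at stage 5
// never reaches redaction, enrichment, or transmission.
fn process(mut event: Event) -> Option<Event> {
    extract_fields(&mut event);     // 1. Field Extraction (blocks 1-5)
    map_fields(&mut event);         // 2. Field Mapping
    add_static_fields(&mut event);  // 3. Static Field Addition
    remove_fields(&mut event);      // 4. Field Removal Operations
    if should_drop_event(&event) {  // 5. Event Filtering
        return None;
    }
    redact_content(&mut event);     // 6. Content Redaction (blocks 1-5)
    enrich(&mut event);             // 7. Data Enrichment (subnet & exact match)
    Some(event)
}

fn main() {
    assert!(process(Event::new()).is_some());
}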

Field Extraction

Overview

Field Extraction enables you to extract structured data from unstructured log fields using regular expressions with named capture groups. This is particularly useful for parsing custom log formats, extracting specific values from message fields, or breaking down complex data structures.

Configuration

Via Portal UI

  1. Navigate to Integrations → Secure60 Collector
  2. Click “Advanced Config”
  3. Find “Field Extraction” section
  4. Configure for each block (1-5):
    • Source Field: Field name containing data to extract from
    • Regex Pattern: Regular expression with named capture groups

Via Environment Variables

# Block 1
EXTRACT_FIELD_NAME=message
EXTRACT_FIELD_REGEX=r'User (?P<user_name>\w+) from (?P<source_ip>\d+\.\d+\.\d+\.\d+)'

# Block 2
EXTRACT_FIELD_NAME_2=raw_log
EXTRACT_FIELD_REGEX_2=r'Action: (?P<action>\w+), Result: (?P<result>\w+)'

# Additional blocks: _3, _4, _5

Regex Syntax

The Secure60 Collector uses the Rust regex engine, which supports named capture groups, Unicode-aware character classes, anchors, and repetition operators; it does not support backreferences or lookaround assertions.

For a complete syntax reference, see the Rust regex documentation.

Examples

Example 1: Extracting User and IP from SSH Logs

Source Field: message
Content: "Accepted password for admin from 192.168.1.100 port 22 ssh2"
Regex: r'Accepted password for (?P<user_name>\w+) from (?P<source_ip>\d+\.\d+\.\d+\.\d+)'
Result: Creates fields user_name="admin", source_ip="192.168.1.100"

Example 2: Parsing Application Logs

Source Field: raw_log
Content: "2024-01-15 10:30:15 [ERROR] Database connection failed for user john_doe"
Regex: r'(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<log_level>\w+)\] (?P<error_message>.*)'
Result: Creates fields timestamp, log_level="ERROR", error_message
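
Because the collector uses the Rust regex engine, extraction patterns can be validated locally with the regex crate before deployment. A minimal sketch reproducing Example 1 (the test harness is an assumption; only the pattern itself is taken from the example):

use regex::Regex; // regex = "1" in Cargo.toml

fn main() {
    let re = Regex::new(
        r"Accepted password for (?P<user_name>\w+) from (?P<source_ip>\d+\.\d+\.\d+\.\d+)",
    )
    .unwrap();
    let line = "Accepted password for admin from 192.168.1.100 port 22 ssh2";
    let caps = re.captures(line).expect("pattern should match the sample line");
    assert_eq!(&caps["user_name"], "admin");
    assert_eq!(&caps["source_ip"], "192.168.1.100");
}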

Content Redaction

Overview

Content Redaction protects sensitive information by applying regex-based filters to field content. Unlike field removal, redaction preserves the field structure while masking sensitive portions of the data.

Configuration

Via Portal UI

  1. Navigate to Integrations → Secure60 Collector
  2. Click “Advanced Config”
  3. Find “Content Redaction” section
  4. Configure for each block (1-5):
    • Target Field: Field name containing content to redact
    • Redaction Pattern: Regular expression pattern to identify sensitive content

Via Environment Variables

# Block 1
REDACT_CONTENT_FIELD_NAME=message
REDACT_CONTENT_REGEX=r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'

# Block 2
REDACT_CONTENT_FIELD_NAME_2=url
REDACT_CONTENT_REGEX_2=r'password=[^&\s]+'

# Additional blocks: _3, _4, _5

Examples

Example 1: Credit Card Number Redaction

Target Field: transaction_log
Original: "Transaction 12345: Card 4532-1234-5678-9012 approved"
Pattern: r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b'
Result: "Transaction 12345: Card **************** approved"

Example 2: Password Redaction in URLs

Target Field: request_url
Original: "https://api.example.com/login?user=john&password=secret123"
Pattern: r'password=[^&\s]+'
Result: "https://api.example.com/login?user=john&password=***"

Field Management

Static Fields

Add consistent metadata to all events flowing through the collector.

Configuration

# Format: field_name=value,field_name2=value2
STATIC_FIELDS=environment=production,app_name=web-portal,region=us-east-1

Example

Original Event: {"message": "User login", "timestamp": "2024-01-15T10:30:00Z"}
Static Fields: environment=production,app_name=web-portal
Result: {
  "message": "User login",
  "timestamp": "2024-01-15T10:30:00Z",
  "environment": "production",
  "app_name": "web-portal"
}
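
A minimal sketch of how a field_name=value list can be parsed (illustrative, not the collector's implementation):

use std::collections::HashMap;

// Parse a STATIC_FIELDS-style list: field_name=value,field_name2=value2
// (values containing ',' or '=' would need escaping, which this sketch ignores).
fn parse_pairs(raw: &str) -> HashMap<String, String> {
    raw.split(',')
        .filter_map(|pair| pair.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

fn main() {
    let statics = parse_pairs("environment=production,app_name=web-portal");
    assert_eq!(statics["environment"], "production");
    assert_eq!(statics["app_name"], "web-portal");
}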

Field Mapping

Rename fields to align with your desired schema or naming conventions.

Configuration

# Format: source_field=destination_field,source_field2=destination_field2
MAP_FIELDS=username=user_name,clientip=source_ip,app=application_name

Example

Original Event: {"username": "john", "clientip": "192.168.1.100", "app": "webapp"}
Field Mapping: username=user_name,clientip=source_ip,app=application_name
Result: {"user_name": "john", "source_ip": "192.168.1.100", "application_name": "webapp"}

Field Removal

Remove unwanted fields or fields containing specific content.

Remove Fields by Name

# Format: field_name1,field_name2,field_name3
DROP_FIELD_NAMED=debug_info,temp_data,internal_id

Remove Fields Containing Specific Content

# Format: field_name=search_string,field_name2=search_string2
DROP_FIELD_CONTAINING=message=DEBUG,log_level=TRACE
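
Both removal modes can be sketched against a simple map-shaped event (illustrative only):

use std::collections::HashMap;

fn main() {
    let mut event: HashMap<String, String> = HashMap::from([
        ("debug_info".to_string(), "trace id 42".to_string()),
        ("message".to_string(), "DEBUG cache warmed".to_string()),
        ("user".to_string(), "alice".to_string()),
    ]);
    // DROP_FIELD_NAMED: remove fields by exact name.
    for name in ["debug_info", "temp_data", "internal_id"] {
        event.remove(name);
    }
    // DROP_FIELD_CONTAINING: remove a field when its value contains the string.
    event.retain(|name, value| !(name == "message" && value.contains("DEBUG")));
    assert_eq!(event.len(), 1); // only "user" survives
}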

Event Filtering

Control which events are ingested by dropping entire events based on field content.

Configuration

# Format: field_name=search_string,field_name2=search_string2
DROP_EVENT_CONTAINING=log_level=DEBUG,message=heartbeat

Example

Event: {"log_level": "DEBUG", "message": "Detailed debug information"}
Filter: log_level=DEBUG
Result: Event is dropped and not sent to Secure60
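
The filter reads as a predicate: if any configured field's value contains its search string, the entire event is dropped. A minimal sketch of that rule (not the collector's code):

use std::collections::HashMap;

fn should_drop(event: &HashMap<String, String>, rules: &[(&str, &str)]) -> bool {
    rules.iter().any(|(field, needle)| {
        event.get(*field).map_or(false, |value| value.contains(needle))
    })
}

fn main() {
    let event = HashMap::from([
        ("log_level".to_string(), "DEBUG".to_string()),
        ("message".to_string(), "Detailed debug information".to_string()),
    ]);
    let rules = [("log_level", "DEBUG"), ("message", "heartbeat")];
    assert!(should_drop(&event, &rules)); // matched on log_level=DEBUG
}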

Data Enrichment

Subnet-Based Enrichment

Enhance events with contextual information based on source IP subnet matching.

Configuration

ENRICH_SUBNET_ENABLE=true
ENRICH_SUBNET_SOURCE_FIELD=source_ip  # Default: ip_src_address
ENRICH_SUBNET_LOOKUP_PREFIX=/24       # Default: /24
ENRICH_SUBNET_MAPPING_FIELDS=source_department,source_business_unit,source_location

CSV File Format

subnet,source_department,source_business_unit,source_location,source_criticality
192.168.1.0/24,IT,Technology,New York,High
10.0.1.0/24,Finance,Business,London,Critical
172.16.0.0/16,Development,Technology,San Francisco,Medium
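
With the default /24 lookup prefix, enrichment amounts to truncating the source IP to its network address and using that as the CSV key. A simplified sketch (IPv4 only, fixed /24 prefix, illustrative names):

use std::collections::HashMap;
use std::net::Ipv4Addr;

// Truncate an IPv4 address to its /24 network, mirroring the default
// ENRICH_SUBNET_LOOKUP_PREFIX=/24 (the prefix is hard-coded in this sketch).
fn subnet_key(ip: Ipv4Addr) -> String {
    let masked = u32::from(ip) & 0xFFFF_FF00;
    format!("{}/24", Ipv4Addr::from(masked))
}

fn main() {
    // Rows from the enrichment CSV, keyed by subnet.
    let table = HashMap::from([
        ("192.168.1.0/24".to_string(), ("IT", "Technology", "New York")),
        ("10.0.1.0/24".to_string(), ("Finance", "Business", "London")),
    ]);
    let ip: Ipv4Addr = "192.168.1.100".parse().unwrap();
    if let Some((dept, unit, location)) = table.get(&subnet_key(ip)) {
        println!("source_department={dept}, source_business_unit={unit}, source_location={location}");
    }
}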

Exact Field Matching Enrichment

Enhance events with contextual information based on exact field value matching.

Configuration

ENRICH_CUSTOM_EXACT_ENABLE=true
ENRICH_CUSTOM_EXACT_SOURCE_FIELD=host_name  # Default: host_name
ENRICH_CUSTOM_EXACT_MAPPING_FIELDS=source_department,source_business_unit,environment

CSV File Format

field_value,source_department,source_business_unit,environment,source_criticality
web-server-01,IT,Technology,Production,High
db-server-prod,Database,Technology,Production,Critical
app-server-dev,Development,Technology,Development,Low
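
Exact matching is a plain dictionary lookup keyed on the configured source field's value (host_name by default). A minimal sketch:

use std::collections::HashMap;

fn main() {
    // Rows from the exact-match CSV, keyed on field_value.
    let table = HashMap::from([
        ("web-server-01", ("IT", "Technology", "Production")),
        ("db-server-prod", ("Database", "Technology", "Production")),
    ]);
    // Look up the event's host_name value directly.
    if let Some((dept, unit, env)) = table.get("web-server-01") {
        println!("source_department={dept}, source_business_unit={unit}, environment={env}");
    }
}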

Best Practices

Performance Optimization

  1. Use Specific Regex Patterns: Avoid overly broad patterns that may impact performance
  2. Limit Extraction Blocks: Only use the number of blocks you actually need
  3. Order Fields by Frequency: Place most commonly used extractions in lower-numbered blocks

Security Considerations

  1. Test Redaction Patterns: Ensure sensitive data is properly masked before production deployment
  2. Use Multiple Redaction Blocks: Apply different patterns for different types of sensitive data
  3. Regular Pattern Updates: Keep redaction patterns updated as data formats evolve

Data Quality

  1. Validate Extractions: Test regex patterns with sample data before deployment
  2. Monitor Field Creation: Ensure extracted fields contain expected data types
  3. Use Consistent Naming: Follow your organisation’s field naming conventions

Getting Started

To implement data optimization in your Secure60 Collector:

  1. Plan Your Strategy: Identify which fields need extraction, redaction, or enrichment
  2. Configure via Portal UI: Use the Secure60 Portal’s Advanced Config for easy setup
  3. Test with Sample Data: Validate your configuration with representative log samples
  4. Monitor and Iterate: Use the Secure60 Portal to verify results and adjust as needed

For assistance with data optimization configuration, contact our integrations team at integrations@secure60.io

Integration with Existing Features

Data Optimization works seamlessly with existing Secure60 Collector features. Together, they ensure your data is optimized, protected, and enriched before reaching the Secure60 platform.
