Extract mode

Use regular expressions to extract text from fields in incoming Events.

Features 

  • Specify one or more Ruby-compatible regular expressions to match and extract text from incoming Events.

  • Any matched text will be included in an array in a new Event. If there are no matches, no Event will be emitted.

  • Specify the name of the key for each array of matched text.

Configuration Options 

  • mode: 'extract'

  • matchers: When using 'extract' mode, define an array of regular expression configuration blocks to match text in incoming Events. Each block takes the following keys:

      • path: Specify the wrapped JSON path for the field containing text to extract, for example <<text>>.

      • regexp: Specify the regular expression to be used to extract text.

      • to: Specify the name of the field to contain the array of matches.
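
To make the roles of path, regexp and to concrete, here is a minimal Python sketch of how a matcher block could be applied to an Event. It is an illustration only, not the product's implementation: the helper names `resolve_path` and `run_extract` are hypothetical, Python's `re` module stands in for the Ruby-compatible engine, and the rule that an Event is withheld only when every matcher comes back empty is an assumption about the "no matches, no Event" behaviour described above.

```python
import re
from typing import Any, Optional


def resolve_path(event: dict, path: str) -> Any:
    """Follow a dotted path such as '<<metadata.servers>>' through nested dicts."""
    path = path.strip()
    if path.startswith("<<") and path.endswith(">>"):
        path = path[2:-2]
    value: Any = event
    for key in path.split("."):
        value = value[key]
    return value


def run_extract(event: dict, matchers: list) -> Optional[dict]:
    """Return the emitted Event, or None when nothing matched (assumed behaviour)."""
    emitted = {}
    for matcher in matchers:
        text = str(resolve_path(event, matcher["path"]))
        # re.findall returns every non-overlapping match as a list of strings
        # (as long as the pattern has no capturing groups).
        emitted[matcher["to"]] = re.findall(matcher["regexp"], text)
    # Assumption: the Event is withheld only if every matcher came back empty.
    if not any(emitted.values()):
        return None
    return emitted


matchers = [{"path": "<<note>>", "regexp": r"\d+", "to": "numbers"}]
print(run_extract({"note": "ports 22 and 443"}, matchers))  # {'numbers': ['22', '443']}
print(run_extract({"note": "no digits here"}, matchers))    # None
```

Each key named by to holds a list even when only one match is found, which mirrors the arrays shown under Emitted Events.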

Emitted Events 

{
  "email_addresses": ["alice@example.com"],
  "ips": ["10.1.1.12"],
  "urls": ["http://example.com", "https://tines.com"]
}

Example Configuration Options 

Given the incoming Event below, extract all email addresses and store them in a field called 'email_addresses' in a new Event.

{
  "text": "You received an email to alice@example.com sent from bob@example.com"
}

{
  "mode": "extract",
  "matchers": [
    {
      "path": "<<text>>",
      "regexp": "\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b",
      "to": "email_addresses"
    }
  ]
}
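
As a rough check of what this matcher would collect, the pattern can be tried against the sample text in Python. This is a sketch under assumptions: Python's `re` module is used in place of the Ruby-compatible engine (the pattern uses only syntax common to both), the sample sentence is modelled on the incoming Event above, and the doubled backslashes required inside a JSON string are written once in the raw string.

```python
import re

# The "regexp" value from the configuration, with JSON's doubled backslashes
# written once, as a Python raw string.
EMAIL = r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b"

text = "You received an email to alice@example.com sent from bob@example.com"
emitted = {"email_addresses": re.findall(EMAIL, text)}
print(emitted)
# {'email_addresses': ['alice@example.com', 'bob@example.com']}

# {2,4} caps the top-level domain at four letters, so a longer one is skipped.
print(re.findall(EMAIL, "carol@example.technology"))  # []
```

If addresses with longer top-level domains need to be captured, the upper bound in {2,4} would have to be raised.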

Given the incoming Event below, extract all email addresses from 'text' and store them in a field called 'email_addresses'; extract all URLs from 'referrers' and store them in a field called 'urls'; and extract all IP addresses from 'servers' and store them in a field called 'ips'.

{
  "text": "You received an email to alice@example.com sent from bob@example.com",
  "metadata": {
    "servers": "10.1.1.1, 10.2.3.4, 10.15.6.8",
    "referrers": "https://tines.com and https://www.example.com"
  }
}

{
  "mode": "extract",
  "matchers": [
    {
      "path": "<<text>>",
      "regexp": "\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b",
      "to": "email_addresses"
    },
    {
      "path": "<<metadata.referrers>>",
      "regexp": "https?:\\/\\/[\\S]+",
      "to": "urls"
    },
    {
      "path": "<<metadata.servers>>",
      "regexp": "\\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b",
      "to": "ips"
    }
  ],
  "message": "Email addresses taken from text, URLs from referrers and IPs from servers."
}
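
The three matchers can be approximated in Python to see which arrays they would produce for the incoming Event above. This is a sketch rather than the product's behaviour: Python's `re` module stands in for the Ruby-compatible engine, the nested fields are read with plain dictionary lookups instead of wrapped JSON paths, and the key names follow the field names given in the description ('email_addresses', 'urls', 'ips').

```python
import re

event = {
    "text": "You received an email to alice@example.com sent from bob@example.com",
    "metadata": {
        "servers": "10.1.1.1, 10.2.3.4, 10.15.6.8",
        "referrers": "https://tines.com and https://www.example.com",
    },
}

# Output key -> (source text, pattern). The patterns are the configuration's
# "regexp" values with JSON's doubled backslashes written once.
matchers = {
    "email_addresses": (
        event["text"],
        r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b",
    ),
    "urls": (event["metadata"]["referrers"], r"https?:\/\/[\S]+"),
    "ips": (
        event["metadata"]["servers"],
        r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
        r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b",
    ),
}

# The IP pattern uses only non-capturing groups, so findall returns whole matches.
emitted = {key: re.findall(rx, text) for key, (text, rx) in matchers.items()}

print(emitted["email_addresses"])  # ['alice@example.com', 'bob@example.com']
print(emitted["urls"])             # ['https://tines.com', 'https://www.example.com']
print(emitted["ips"])              # ['10.1.1.1', '10.2.3.4', '10.15.6.8']
```

Note that the URL pattern stops at the first whitespace character, so any punctuation that directly follows a URL in the source text would be captured along with it.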