Use regular expressions to extract text from fields in incoming Events.
Features
Specify one or more Ruby-compatible regular expressions to match and extract text from incoming Events.
Any matched text will be included in an array in a new Event. If there are no matches, no Event will be emitted.
Specify the name of the key for each array of matched text.
Configuration Options
mode
: 'extract'
matchers
: When using 'extract' mode, define an array of regular expression configuration blocks to match text in incoming Events:
path
: Specify the wrapped JSON path for the field containing text to extract.
regexp
: Specify the regular expression to be used to extract text.
to
: Specify the name of the field to contain the array of matches.
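To make the relationship between 'path', 'regexp' and 'to' concrete, here is a minimal Ruby sketch of what a single extraction pass might look like. It is not the product's implementation: the helper names `resolve_path` and `extract`, the dot-separated handling of the wrapped path, and the rule that nothing is emitted only when every matcher comes back empty are all assumptions made for illustration.

```ruby
# Hypothetical helper: resolve a wrapped path such as "<<metadata.servers>>"
# by stripping the << >> wrapper and walking the dot-separated keys.
def resolve_path(event, wrapped_path)
  keys = wrapped_path.sub(/\A<</, "").sub(/>>\z/, "").split(".")
  event.dig(*keys)
end

# Hypothetical sketch of 'extract' mode: every match of `regexp` in the field
# at `path` is collected into an array stored under the `to` key.
def extract(event, matchers)
  result = matchers.each_with_object({}) do |matcher, out|
    text = resolve_path(event, matcher["path"]).to_s
    # Note: in Ruby, a pattern with capture groups makes scan return the
    # groups rather than the whole match, so non-capturing groups are safer.
    out[matcher["to"]] = text.scan(Regexp.new(matcher["regexp"]))
  end
  # Assumption: when no matcher found anything, no Event is emitted (nil here).
  result.values.all?(&:empty?) ? nil : result
end

# Illustrative input only; the "note" field and pattern are made up.
matchers = [{ "path" => "<<note>>", "regexp" => "ticket-\\d+", "to" => "tickets" }]
with_matches = extract({ "note" => "ticket-12 blocks ticket-40" }, matchers)
without_matches = extract({ "note" => "nothing to see" }, matchers)

puts with_matches["tickets"].join(", ") # prints: ticket-12, ticket-40
puts without_matches.inspect            # prints: nil
```

The `nil` return stands in for "no Event emitted"; how the real agent treats a mix of matching and non-matching matchers is not stated in this document.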
Emitted Events
{
"email_addresses": ["alice@example.com"],
"ips": ["10.1.1.12"],
"urls": ["http://example.com", "https://tines.com"]
}
Example Configuration Options
Given the incoming Event below, extract all email addresses and store them in a field called 'email_addresses' in a new Event.
{
"text": "You received an email to alice@example.com sent from bob@example.com"
}
{
"mode": "extract",
"matchers": [
{
"path": "<<text>>",
"regexp": "\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b",
"to": "email_addresses"
}
]
}
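Because the configuration is JSON, each backslash in the pattern is doubled: `\\b` in the options above becomes the regular expression `\b` once the JSON is parsed. The Ruby sketch below illustrates this with the standard `json` library and `String#scan`. It only mimics the matching step for this one matcher and is not the product's own code; the shape of the resulting hash is assumed from the Emitted Events sample.

```ruby
require "json"

# The matcher from the options above, as raw JSON text. The single-quoted
# heredoc keeps the doubled backslashes exactly as they appear in the config.
config = JSON.parse(<<~'JSON')
  {
    "path": "<<text>>",
    "regexp": "\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b",
    "to": "email_addresses"
  }
JSON

# After parsing, the pattern string contains single backslashes (\b, \.).
pattern = Regexp.new(config["regexp"])

# Sample sentence like the one in the incoming Event.
text = "You received an email to alice@example.com sent from bob@example.com"

# scan returns every non-overlapping match, in order of appearance.
emitted = { config["to"] => text.scan(pattern) }
puts JSON.generate(emitted)
# prints: {"email_addresses":["alice@example.com","bob@example.com"]}
```

Both addresses are returned because the matcher collects every match in the field, not just the first one.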
Given the incoming Event below, extract all email addresses from 'text' and store them in a field called 'email_addresses'; extract all URLs from 'referrers' and store them in a field called 'urls'; and extract all IP addresses from 'servers' and store them in a field called 'ips'.
{
"text": "You received an email to alice@example.com sent from bob@example.com",
"metadata": {
"servers": "10.1.1.1, 10.2.3.4, 10.15.6.8",
"referrers": "https://tines.com and https://www.example.com"
}
}
{
"mode": "extract",
"matchers": [
{
"path": "<<text>>",
"regexp": "\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}\\b",
"to": "email_addresses"
},
{
"path": "<<metadata.referrers>>",
"regexp": "https?:\\/\\/[\\S]+",
"to": "urls"
},
{
"path": "<<metadata.servers>>",
"regexp": "\\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\b",
"to": "ips"
}
],
"message": "Email addresses taken from text, URLs from referrers and IPs from servers."
}
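As a rough check on what the three matchers would pick out of the sample Event, the Ruby sketch below runs the same patterns with `String#scan`, reading the nested fields with `Hash#dig`. The output field names follow the description above ('email_addresses', 'urls', 'ips'). The shape of the resulting hash is an assumption based on the Emitted Events sample rather than confirmed behaviour.

```ruby
# Sample Event modelled on the one above.
event = {
  "text" => "You received an email to alice@example.com sent from bob@example.com",
  "metadata" => {
    "servers" => "10.1.1.1, 10.2.3.4, 10.15.6.8",
    "referrers" => "https://tines.com and https://www.example.com"
  }
}

# The same patterns as in the configuration, written as Ruby regexp literals
# (single backslashes, because no JSON escaping is involved here).
email = /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/
url   = /https?:\/\/[\S]+/
ip    = /\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b/

# Each field holds every match found in its source string.
emitted = {
  "email_addresses" => event["text"].scan(email),
  "urls" => event.dig("metadata", "referrers").scan(url),
  "ips" => event.dig("metadata", "servers").scan(ip)
}

emitted.each { |field, matches| puts "#{field}: #{matches.join(', ')}" }
# prints:
# email_addresses: alice@example.com, bob@example.com
# urls: https://tines.com, https://www.example.com
# ips: 10.1.1.1, 10.2.3.4, 10.15.6.8
```

Note that `[\S]+` stops only at whitespace, so a URL followed directly by punctuation (for example a trailing comma) would carry that punctuation along in the match.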