Extracting Text Patterns

There are two ways to extract information from a line of log data. You can use a regular expression pattern on free-form data, or the split function on more structured text such as CSV files. Both can be used on the Search page and on the Datatypes page. Each extracted group is numbered. On the Datatypes page a group can be assigned to a field name, and on the Search page it can be used as a search term within a search expression. The regex library used in Logscape is JRegex, a Perl5-compliant regex implementation.

The Search Page

Here are a few examples of patterns used on the Search page.

Example 1 - Search with a regex pattern

Each group defined in the regex pattern is labelled with a number.

cpu=(\d+)% | 1.max() chart(line)

The number representing the group can then be used as a search term. In this example the CPU percentage is extracted as group 1 and its maximum is plotted as a line chart.

 
2014-03-11 11:23:00 HPCSVR0001 Server Health Metrics cpu=34% mem=26% diskUtilization=0.1% 
2014-03-11 11:23:10 HPCSVR0001 Adding Node 0001 to cluster BigDataNode 
2014-03-11 11:23:12 HPCSVR0001 Adding Node 0002 to cluster BigDataNode 
2014-03-11 11:23:17 HPCSVR0001 Adding Node 0003 to cluster BigDataNode 
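
The extraction step can be pictured outside Logscape with a short Python sketch. It applies the same pattern to the sample lines above and takes the maximum of group 1. It uses Python's re module rather than JRegex, and the helper name extract_group is made up for this illustration, so treat it as an approximation of what the search does, not as Logscape's implementation.

```python
import re

# Sample lines copied from the example above.
LOG_LINES = [
    "2014-03-11 11:23:00 HPCSVR0001 Server Health Metrics cpu=34% mem=26% diskUtilization=0.1%",
    "2014-03-11 11:23:10 HPCSVR0001 Adding Node 0001 to cluster BigDataNode",
    "2014-03-11 11:23:12 HPCSVR0001 Adding Node 0002 to cluster BigDataNode",
    "2014-03-11 11:23:17 HPCSVR0001 Adding Node 0003 to cluster BigDataNode",
]

CPU_PATTERN = re.compile(r"cpu=(\d+)%")


def extract_group(lines, pattern, group):
    """Return the numbered group from every line the pattern matches."""
    values = []
    for line in lines:
        match = pattern.search(line)
        if match:  # lines without a cpu= value are skipped
            values.append(int(match.group(group)))
    return values


cpu_values = extract_group(LOG_LINES, CPU_PATTERN, 1)
print(cpu_values)       # only the first line carries a cpu= value
print(max(cpu_values))  # rough equivalent of 1.max()
```

Only lines that match the pattern contribute a value, which is why the three "Adding Node" lines do not appear in the result.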

Example 2 - Search with the split function

The split function is handy when working with delimited or CSV data. Often it is the simpler option when it comes to more structured text.

split(\|,7) | 7.avg(3,cpuMS) chart(stacked)

Each column is labelled with a number and can be used as a search term. Column 7 holds the CPU time of a task and column 3 holds the node ID. The resulting chart plots the average CPU time for each node ID.

 
#datetime|server|nodeId|taskname|startTime|endTime|avgCpuTImeMs
2014-03-11 11:23:00|HPCSVR0001|00abc0343da|riskcalc001|2014-03-11 11:20:00||300
2014-03-11 11:23:10|HPCSVR0001|007dfa333a2|riskcacl002|2014-03-11 11:20:00||370
2014-03-11 11:23:12|HPCSVR0001|001aaabe453|IRriskCalc002|2014-03-11 11:20:00||450
2014-03-11 11:23:17|HPCSVR0001|0022abefeaa|IRriskCal002|2014-03-11 11:20:00||600
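
As a rough illustration of what this search computes, the Python sketch below splits each sample row on '|' and averages column 7 per value of column 3. The function name average_by_column is hypothetical, and skipping the '#' header row is an assumption made here so the arithmetic works. How Logscape itself treats that row is not covered by this page.

```python
from collections import defaultdict

# Sample rows copied from the example above.
ROWS = [
    "#datetime|server|nodeId|taskname|startTime|endTime|avgCpuTImeMs",
    "2014-03-11 11:23:00|HPCSVR0001|00abc0343da|riskcalc001|2014-03-11 11:20:00||300",
    "2014-03-11 11:23:10|HPCSVR0001|007dfa333a2|riskcacl002|2014-03-11 11:20:00||370",
    "2014-03-11 11:23:12|HPCSVR0001|001aaabe453|IRriskCalc002|2014-03-11 11:20:00||450",
    "2014-03-11 11:23:17|HPCSVR0001|0022abefeaa|IRriskCal002|2014-03-11 11:20:00||600",
]


def average_by_column(rows, delimiter, value_col, group_col):
    """Average value_col for each distinct group_col value (columns are 1-based)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for row in rows:
        if row.startswith("#"):  # assumption: the header row is ignored
            continue
        columns = row.split(delimiter)
        key = columns[group_col - 1]
        totals[key][0] += float(columns[value_col - 1])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


averages = average_by_column(ROWS, "|", 7, 3)
for node_id, avg_cpu_ms in averages.items():
    print(node_id, avg_cpu_ms)
```

Each node ID appears once in the sample, so every average equals that node's single reading. With more rows per node the values would be true averages.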

The Datatypes Page

Any data extracted from a log line can be assigned to a field name in a data type. A type at its simplest is a collection of named field extractions. A regex pattern or the split function can be used on this page. Here are a few data type mappings that use a pattern or a split expression.

Example 1 - Datatype for a log4j style log

Each matched group is mapped sequentially to the field names defined in the data type. The example uses Perl regex syntax together with the shortcuts described under Regex Shortcuts. The ^ symbol is optional.

Pattern: ^(2*)\s+(INFO|DEBUG|WARN|ERROR|FATAL)\s+(*)\s+(*)\s+(**)

Example 2 - Split function

The split function works on CSV data. This example splits the text into 7 columns on the delimiter '|' and then maps the values sequentially to field names.

split(\|,7)

The first column is mapped to the datetime field, the second to the server name, the third to the nodeId field and so on.
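
The sequential mapping can be sketched in Python as a split followed by pairing each column with a field name. The field names are taken from the header row of the sample data shown earlier (with the spelling avgCpuTimeMs chosen here), split_to_fields is a hypothetical helper, and reading the second argument of split as "number of columns" is an assumption of this sketch.

```python
# Field names in column order, taken from the sample header row.
FIELD_NAMES = [
    "datetime", "server", "nodeId", "taskname",
    "startTime", "endTime", "avgCpuTimeMs",
]


def split_to_fields(line, delimiter, count, field_names):
    """Split line into at most `count` columns and pair them with field names in order."""
    columns = line.split(delimiter, count - 1)  # count - 1 splits give count columns
    return dict(zip(field_names, columns))


record = split_to_fields(
    "2014-03-11 11:23:00|HPCSVR0001|00abc0343da|riskcalc001|2014-03-11 11:20:00||300",
    "|",
    7,
    FIELD_NAMES,
)
print(record["datetime"], record["server"], record["nodeId"])
```

An empty column, such as endTime in the sample row, still occupies its position, so later fields stay aligned with their names.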

Regex Shortcuts

Logscape provides a few regex shortcuts that make patterns easier to read on the Search or Datatypes page. Here is a summary of the shortcuts and what they translate to.

  • (*) - Matches a block of non-whitespace text. It translates to (\S+)
  • (d) - Matches a numerical expression, including numbers with decimal points.
  • (**) - Matches the remaining text on a line. It translates to (.*)
  • (*,) - Matches all characters up to the ',' character.
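
To make the translations concrete, here is a minimal Python sketch that expands the shortcuts into plain regex and applies the result to a log4j style line. The translations for (*) and (**) come from the list above. The one used for (*,) is an assumption inferred from its description, and (d) is left out because its translation is not given. The helper expand_shortcuts, the sample log line, and the field names are all invented for this illustration, and the pattern is the tail of the log4j datatype pattern without its leading group.

```python
import re

# Shortcut -> regex. Only the first and last entries are documented translations.
SHORTCUTS = [
    ("(**)", r"(.*)"),
    ("(*,)", r"([^,]*)"),  # assumption: "everything up to the next comma"
    ("(*)", r"(\S+)"),
]


def expand_shortcuts(pattern):
    """Replace each shortcut in the pattern with its regex translation."""
    for shortcut, regex in SHORTCUTS:
        pattern = pattern.replace(shortcut, regex)
    return pattern


shortcut_pattern = r"(INFO|DEBUG|WARN|ERROR|FATAL)\s+(*)\s+(*)\s+(**)"
expanded = expand_shortcuts(shortcut_pattern)
print(expanded)

# Hypothetical log line and field names, used only to show the sequential mapping.
line = "2014-03-11 11:23:00,123 INFO main com.example.Service Started in 42 ms"
field_names = ["level", "thread", "class", "msg"]

match = re.search(expanded, line)
fields = dict(zip(field_names, match.groups()))
print(fields)
```

The expansion is plain text substitution, so the expanded pattern can be read and tested like any other regex before it is used in a search or a data type.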