I was testing the Text Mine Operation and think that the extraction of custom entities is a very nice feature. However, is it possible to extract regular expression (RegEx) or 'wildcard' patterns? For example, "Host * has failed" would extract all matching phrases/tuples such as "Host A has failed", "Host B has failed", and so on.
I haven't been able to get this to work yet unfortunately. If it is not possible, is there any other way to achieve something like this?
While wildcards are not yet supported in the text-mining formulas, you can use them in the Search/Replace block.
Option 1) Create a duplicate field in the Field Organiser (in DataManager) and search/replace "Host*has failed" with something like "match", then use that result as a filter or formula criterion in your next steps.
Option 2) In the Field Organiser block, use a formula to identify records that contain both "Host" and "has failed" and return the value in those cells, leaving the others blank (you can replace null with another value):
IF( (AND(CONTAINS([Value], "Host"),CONTAINS([Value], "has failed")))=true, [Value], null)
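To make the difference between the two options concrete, here is a rough Python sketch of the matching logic only. It is not the product's implementation: the sample records, the function names, the regex translation of "Host*has failed", and the case-sensitive comparison are all assumptions made for illustration, and the Option 1 analogue returns the matched phrase rather than replacing it with "match".

```python
import re

# Hypothetical sample records; not taken from the original thread.
records = [
    "Host A has failed",
    "Host B has failed",
    "Host C is healthy",
    "Backup has failed on Host D",
]

# "Host*has failed" written as a regex: "*" becomes ".*?" (any text, as little as possible).
WILDCARD = re.compile(r"Host.*?has failed")


def wildcard_match(value):
    """Option 1 analogue: return the phrase matching the wildcard, or None."""
    m = WILDCARD.search(value)
    return m.group(0) if m else None


def contains_both(value):
    """Option 2 analogue: keep the value if both substrings occur, in any order."""
    return value if ("Host" in value and "has failed" in value) else None


wildcard_results = [wildcard_match(r) for r in records]
contains_results = [contains_both(r) for r in records]

for record, w, c in zip(records, wildcard_results, contains_results):
    print(f"{record!r}: wildcard={w!r}, contains_both={c!r}")
```

In this sketch the last record is kept by the containment check but not by the wildcard, because the check as written here does not require "Host" to come before "has failed".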
Other useful text-mining functions are:
ENDSWITH(text, sub_text) - Returns true if [sub_text] occurs at the end of [text] (case insensitive).
STARTSWITH(text, sub_text) - Returns true if [sub_text] occurs at the beginning of [text] (case insensitive).
You could use them in the above formula instead of CONTAINS where appropriate.
FINDBETWEEN(all, before, after) - Returns the first shortest matching text surrounded by [before] and [after], or null if not found. For example, FINDBETWEEN("apple apple orange plum pear apple banana pear", "apple", "pear") would return " orange plum ".
FINDLASTBETWEEN(all, before, after) - Returns the last shortest matching text surrounded by [before] and [after], or null if not found. For example, FINDLASTBETWEEN("apple apple orange plum pear apple banana pear", "apple", "pear") would return " banana ".
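The "first shortest" and "last shortest" wording can be easier to follow with a small emulation. The Python sketch below reproduces the two examples above under one reading of those descriptions. The function names find_between and find_last_between are hypothetical stand-ins, not the product's code, and edge cases such as overlapping delimiters are not handled. The last line shows how the same idea would pull the host name out of "Host A has failed".

```python
def find_between(text, before, after):
    """First shortest text enclosed by `before` and `after`, or None if absent."""
    idx = text.find(after)
    while idx != -1:
        # Nearest `before` that ends at or before this `after` gives the shortest span.
        start = text.rfind(before, 0, idx)
        if start != -1:
            return text[start + len(before):idx]
        idx = text.find(after, idx + 1)
    return None


def find_last_between(text, before, after):
    """Last shortest text enclosed by `before` and `after`, or None if absent."""
    idx = text.rfind(after)
    if idx == -1:
        return None
    start = text.rfind(before, 0, idx)
    if start == -1:
        return None
    return text[start + len(before):idx]


sample = "apple apple orange plum pear apple banana pear"
print(repr(find_between(sample, "apple", "pear")))       # ' orange plum '
print(repr(find_last_between(sample, "apple", "pear")))  # ' banana '
print(repr(find_between("Host A has failed", "Host ", " has failed")))  # 'A'
```

If FINDBETWEEN behaves as described, using "Host " and " has failed" as the [before] and [after] arguments should return just the host name from each matching record.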