Poll: Improving Text Mining Capabilities?

tjbate · January 14, 2015 12:48PM

We are planning to improve the text-mining capabilities of Omniscope, using R libraries and implementing some aspects in Omniscope Java. Do you have a text-mining requirement? What tools are you currently using and why?

mamillerpa · January 25, 2015 9:12AM

Exciting!
I mostly use tm and openNLP in R. Other tools that I have dabbled with include GATE and @note2. I like the fact that tm can import text from many sources, such as existing data frames, directories, etc. I also use the XML and JSON libraries.

I like the fact that these tools are generally free of charge and that they have excellent documentation.

Tasks that are important to my business: term-vector text mining (especially dictionary-based), named entity extraction, and relationship extraction.
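
For readers unfamiliar with the term, dictionary-based term-vector mining can be sketched roughly as follows. This is only an illustration in Python, not the tm workflow used above: the dictionary, the sample documents and the `term_vector` helper are all hypothetical, and only single-word dictionary terms are handled.

```python
import re
from collections import Counter

def term_vector(text, dictionary):
    """Count how often each dictionary term occurs in the text."""
    # Lower-case the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(tokens)
    # One entry per dictionary term, in dictionary order; absent terms count as 0.
    return [counts[term] for term in dictionary]

# Hypothetical dictionary and documents, for illustration only.
dictionary = ["merger", "profit", "loss"]
docs = [
    "Profit rose after the merger; profit margins widened.",
    "The company reported a loss.",
]

vectors = [term_vector(doc, dictionary) for doc in docs]
for doc, vec in zip(docs, vectors):
    print(vec, doc)
```

Each document becomes a fixed-length vector of counts, one position per dictionary term, which can then be filtered or charted like any other numeric fields.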

nils · January 29, 2015 5:00AM

Hi Mark,

May I ask in what context you are using term-vector text mining, and what kind of dictionary you are using it against?

Thanks!
-Nils

carlosmartinmari · February 24, 2015 4:35AM

It'd be good to be able to extract the words that match a given regular expression, as a comma-separated list. I currently do it with R, but I'd prefer something cleaner.
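
A minimal sketch of the kind of extraction being asked for, written in Python rather than R. The `extract_matches` helper, the sample text and the pattern are assumptions made up for illustration; this is not an Omniscope feature.

```python
import re

def extract_matches(text, pattern):
    """Return the words in text that fully match pattern, joined by commas."""
    rx = re.compile(pattern)
    # Split the text into words, then keep only the words the pattern matches in full.
    words = re.findall(r"\w+", text)
    return ",".join(word for word in words if rx.fullmatch(word))

# Hypothetical example: keep codes made of one capital letter followed by digits.
result = extract_matches("Order A12 shipped with B7 and note C", r"[A-Z]\d+")
print(result)
```

Using `fullmatch` means a word is kept only when the whole word satisfies the expression; switching to `search` would also keep words that merely contain a match.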

tjbate · February 24, 2015 12:25PM

Carlos: We have already extended existing RegEx filtering to include Search/Replace as described here:

http://forums.visokio.com/discussion/2457

You may be able to use this to 'flag' records containing matches for your RegEx, and/or to rewrite the matches in a more useful/filterable/exportable way.

You can change any Sidebar filter device set to Text Search to apply RegEx by choosing 'Show Text Tools' then choosing 'Search Type > Regular Expressions'.

paola · February 24, 2015 1:47PM

A suggestion, depending on the complexity of your RegEx filtering: you could replace the criteria with multiple Record filter block rules, use Search/Replace to turn the spaces into commas, and tick the option marking the field as tokenised. Each word in the field will then be treated as an individual value.
You can now use this field to create charts, e.g. a bar view of the most frequently used words, a word cloud visualisation (the Tag View), or a Pivot view showing how many times words appeared in combination with other words.
Please see the attached demo file, which illustrates a few of these ideas.
You can also have a look at the Text-mining block tutorial video: http://tc.visokio.com/videos/?name=DataManagerTextMine&title=Text+mine&lang=gb
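
Outside Omniscope, the same idea (spaces replaced with commas, each word treated as its own value, then counted alone and in combination) can be sketched in Python. The records below are made up for illustration, and this says nothing about how the tokenised field option is implemented internally.

```python
from collections import Counter
from itertools import combinations

# Hypothetical text records, standing in for a text field in the data.
records = ["oil prices rise", "oil exports fall", "prices rise again"]

def tokenise(text):
    """Replace spaces with commas, then split so each word is a separate value."""
    return [token for token in text.replace(" ", ",").split(",") if token]

# Word frequencies across all records (what a bar view or word cloud would show).
word_counts = Counter(token for record in records for token in tokenise(record))

# How many records contain each pair of words (what a pivot of combinations would show).
pair_counts = Counter(
    pair
    for record in records
    for pair in combinations(sorted(set(tokenise(record))), 2)
)

print(word_counts.most_common(3))
print(pair_counts[("prices", "rise")])
```

Pairs are stored in alphabetical order, so a combination is looked up as `("prices", "rise")` rather than `("rise", "prices")`.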

Attachments:
Tokenised.JPG (84K)
Tokenised_ReutersNews.iok (76K)
