Update on 6/30/2015: added Proxem’s API “ontology based topic detection” to the list thanks to Tom’s comment.
Here is a list of machine learning resources that help extract key phrases from input text.
1. Azure Machine Learning’s Text Analytics service
Text Analytics API is a suite of text analytics services built with Azure Machine Learning. For key phrase extraction, the API returns a list of strings denoting the key talking points in the input text. The tool employs techniques from Microsoft Office’s Natural Language Processing toolkit.
2. Alchemy API
AlchemyAPI extracts topic keywords from HTML, text, or web-based content. It uses statistical algorithms and natural language processing technology to analyze the data, extracting keywords that can be used to index content, generate tag clouds, and more.
Semantria applies Text and Sentiment Analysis to tweets, Facebook posts, surveys, reviews or enterprise content.
The Content Analysis Web Service detects entities/concepts, categories, and relationships within unstructured content. It ranks the detected entities/concepts by their overall relevance, resolves them into Wikipedia pages where possible, and annotates tags with relevant metadata.
MeaningCloud extracts the meaning of all kinds of unstructured content: social conversation, articles, documents…
A text analysis service that finds out what any text is about by extracting the most relevant Wikipedia categories through a patented NLP technology.
Open Source Tools and Papers:
1. KEA – Keyphrase extraction algorithm
KEA is an algorithm for extracting keyphrases from text documents. It can be used either for free indexing or for indexing with a controlled vocabulary. KEA is implemented in Java and is platform independent. It is open-source software distributed under the GNU General Public License. It takes a supervised approach, using training data and, optionally, a controlled vocabulary.
2. maui indexer
Maui automatically identifies main topics in text documents. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, index terms or titles of Wikipedia articles. It is essentially an extension of KEA that adds the ability to use an encyclopedia for key phrase extraction.
Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize small collections of documents (search results but not only) into thematic categories. It is based on an unsupervised approach to key phrase extraction.
Topic models provide a simple way to analyze large volumes of unlabeled text. A “topic” consists of a cluster of words that frequently occur together. Using contextual clues, topic models can connect words with similar meanings and distinguish between uses of words with multiple meanings. For a general introduction to topic modeling, see for example Probabilistic Topic Models by Steyvers and Griffiths (2007).
The Stanford Topic Modeling Toolbox (TMT) brings topic modeling tools to social scientists and others who wish to perform analysis on datasets that have a substantial textual component.
7. Paper: TextRank
This paper introduces TextRank, a graph-based ranking model for text processing, and shows how this model can be successfully used in natural language applications. In particular, it proposes two innovative unsupervised methods for keyword and sentence extraction, and shows that the results obtained compare favorably with previously published results on established benchmarks.
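To give a feel for the graph-based idea, here is a minimal sketch of keyword ranking in the spirit of TextRank. It is a simplification and not the paper's exact procedure: the paper filters candidate words by part of speech, whereas this sketch assumes a tiny illustrative stopword list. The function name `textrank_keywords` and the parameters `window`, `damping`, `iterations` and `top_n` are hypothetical names chosen for this example.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; the paper uses a
# part-of-speech filter instead of stopwords).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank words with a PageRank-style score over a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Undirected graph: two different words are linked if they appear
    # at most `window` positions apart in the filtered word sequence.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                neighbors[word].add(words[j])
                neighbors[words[j]].add(word)

    # Iterate the ranking formula a fixed number of times.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word in neighbors:
            incoming = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * incoming
        scores = updated

    return sorted(scores, key=scores.get, reverse=True)[:top_n]


sample = ("Compatibility of systems of linear constraints over the set of "
          "natural numbers. Criteria of compatibility of a system of linear "
          "equations and inequations are considered.")
print(textrank_keywords(sample))
```

Words that co-occur with many other well-connected words end up with the highest scores. A fuller implementation would also merge adjacent top-ranked words into multi-word key phrases, as the paper describes.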
1. Phrase Breaker by Microsoft Research
Phrase breaking divides or segments a sequence of words into groups or “phrases” which tend to occur together in whichever body of text was used to train the phrase model. Since phrases often carry more meaning than individual words, an application may achieve improved results by using phrases in place of a simple “bag-of-words”. For instance, a machine learning system can use entire phrases instead of lone words as features. Or an application can treat certain phrases as topics for text categorization.
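As a rough illustration of the idea (and not Microsoft Research's actual method, which is not described here), the sketch below groups adjacent words into a phrase when that word pair occurred often enough in a training corpus. The function names `train_bigram_counts` and `break_into_phrases` and the `min_count` threshold are assumptions made for this example.

```python
from collections import Counter


def train_bigram_counts(corpus_sentences):
    """Count how often each pair of adjacent words occurs in the corpus."""
    counts = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def break_into_phrases(text, bigram_counts, min_count=2):
    """Keep adjacent words together when their pair was seen at least min_count times."""
    tokens = text.lower().split()
    if not tokens:
        return []
    phrases = [[tokens[0]]]
    for previous, current in zip(tokens, tokens[1:]):
        if bigram_counts[(previous, current)] >= min_count:
            phrases[-1].append(current)   # extend the current phrase
        else:
            phrases.append([current])     # start a new phrase
    return [" ".join(phrase) for phrase in phrases]


corpus = [
    "new york is big",
    "i love new york",
    "machine learning is fun",
    "machine learning works",
]
counts = train_bigram_counts(corpus)
print(break_into_phrases("new york loves machine learning", counts))
```

The resulting phrases could then be used as features in place of single words, which is the use case the description above mentions.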
2. Word Breaker by Microsoft Research
Word breaking is an established task in natural language processing (NLP). It is especially important for handling European languages that use compound words, such as German, Dutch, and Greek.
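For compound words, one simple way to picture word breaking is dictionary-based splitting. The sketch below is only an illustration under that assumption, not the Word Breaker's actual algorithm: it uses dynamic programming to find the split with the fewest parts, where every part must be in a known vocabulary. The function name `split_compound` and the sample vocabulary are hypothetical.

```python
def split_compound(word, vocabulary):
    """Split a compound into the fewest vocabulary words, or return None."""
    word = word.lower()
    # best[i] holds the shortest list of vocabulary words covering word[:i].
    best = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is not None and piece in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[-1]


# Small made-up vocabulary for the example.
vocabulary = {"kranken", "haus", "krank", "en"}
print(split_compound("Krankenhaus", vocabulary))
```

This sketch ignores linking elements between compound parts and has no way to choose between equally short splits, both of which a practical word breaker would need to handle.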
Please share in the comments any other key phrase extraction resources that you are aware of. I will add them to the list and appreciate your contributions.