Work@Microsoft    Live@Seattle

Best Key Phrase Extraction APIs in the Market

Update on 6/30/2015: Added Proxem’s “Ontology-Based Topic Detection” API to the list, thanks to Tom’s comment.

Here is a list of machine learning resources that help extract key phrases from input text.

Commercial APIs:

1. Azure Machine Learning’s Text Analytics service

Text Analytics API is a suite of text analytics services built with Azure Machine Learning. For key phrase extraction, the API returns a list of strings denoting the key talking points in the input text. The tool employs techniques from Microsoft Office’s sophisticated Natural Language Processing toolkit.

2. Alchemy API

AlchemyAPI is capable of extracting topic keywords from your HTML, text, or web-based content. We employ sophisticated statistical algorithms and natural language processing technology to analyze your data, extracting keywords that can be used to index content, generate tag clouds, and more.

3. Semantria API

Semantria applies Text and Sentiment Analysis to tweets, Facebook posts, surveys, reviews or enterprise content.

4. Yahoo Content Analysis API

The Content Analysis Web Service detects entities/concepts, categories, and relationships within unstructured content. It ranks those detected entities/concepts by their overall relevance, resolves those if possible into Wikipedia pages, and annotates tags with relevant meta-data. Please give our content analysis service a try to enrich your content.

5. MeaningCloud

MeaningCloud extracts the meaning of all kinds of unstructured content: social conversation, articles, documents…

6. Proxem API “Ontology-Based Topic Detection”

A text analysis service that finds out what any text is about by extracting the most relevant Wikipedia categories through a patented NLP technology.


Open Source Tools and Papers:

1. KEA – Keyphrase extraction algorithm

KEA is an algorithm for extracting keyphrases from text documents. It can be used either for free indexing or for indexing with a controlled vocabulary. KEA is implemented in Java and is platform independent. It is open-source software distributed under the GNU General Public License. It is based on a supervised approach that uses training data and a controlled vocabulary.

2. Maui indexer

Maui automatically identifies main topics in text documents. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, index terms or titles of Wikipedia articles. It is essentially an extension of KEA that adds the ability to use an encyclopedia for key phrase extraction.

3. Carrot2

Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize small collections of documents (search results, but not only those) into thematic categories. It is based on an unsupervised approach to key phrase extraction.

4. MALLET topic modeling module

Topic models provide a simple way to analyze large volumes of unlabeled text. A “topic” consists of a cluster of words that frequently occur together. Using contextual clues, topic models can connect words with similar meanings and distinguish between uses of words with multiple meanings. For a general introduction to topic modeling, see for example Probabilistic Topic Models by Steyvers and Griffiths (2007).

5. Stanford topic modeling tool

The Stanford Topic Modeling Toolbox (TMT) brings topic modeling tools to social scientists and others who wish to perform analysis on datasets that have a substantial textual component.

6. Mahout clustering algorithms

7. Paper: TextRank

This paper introduces TextRank – a graph-based ranking model for text processing, and shows how this model can be successfully used in natural language applications. In particular, it proposes two innovative unsupervised methods for keyword and sentence extraction, and shows that the results obtained compare favorably with previously published results on established benchmarks.
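To make the keyword side of TextRank a bit more concrete, here is a minimal Python sketch of the idea: build a co-occurrence graph over the words of a text, then score each word with a PageRank-style iteration. It is my own simplification, not the paper's reference implementation. It drops words from a tiny hard-coded stop word list instead of filtering by part of speech, and it skips the step that merges adjacent keywords into multi-word phrases. The names `textrank_keywords`, `STOPWORDS`, `window`, `damping`, `iterations` and `top_k` are all made up for this example.

```python
import re
from collections import defaultdict

# Tiny illustrative stop word list (an assumption; a real list would be longer).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def textrank_keywords(text, window=2, damping=0.85, iterations=30, top_k=5):
    """Return up to top_k words ranked by a PageRank-style score."""
    # Lowercase, keep alphabetic tokens, drop stop words.
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

    # Undirected co-occurrence graph: link words that appear within
    # `window` positions of each other in the filtered sequence.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Each word passes its score evenly to its neighbors on every round.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    sample = "graph ranking graph model graph text graph keyword"
    print(textrank_keywords(sample, top_k=3))
```

A fixed number of iterations keeps the sketch short; a more careful version would stop once the scores change by less than a small threshold.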


Other Tools

1. Phrase Breaker by Microsoft Research

Phrase breaking divides or segments a sequence of words into groups or “phrases” which tend to occur together in whichever body of text was used to train the phrase model. Since phrases often carry more meaning than individual words, an application may achieve improved results by using phrases in place of a simple “bag-of-words”. For instance, a machine learning system can use entire phrases instead of lone words as features. Or an application can treat certain phrases as topics for text categorization.
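As a rough illustration of the idea (and not of how the Microsoft Research tool works internally), here is a small Python sketch that "trains" a phrase model by counting adjacent word pairs in a corpus, then groups words in a new sentence whenever the pair was seen often enough. The function names `train_bigram_counts` and `break_phrases` and the `min_count` threshold are hypothetical choices for this example.

```python
from collections import Counter


def train_bigram_counts(corpus_sentences):
    """Count how often each adjacent word pair occurs in the training text."""
    counts = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def break_phrases(sentence, bigram_counts, min_count=2):
    """Greedily group adjacent words whose pair count reaches min_count."""
    tokens = sentence.lower().split()
    if not tokens:
        return []
    phrases = [[tokens[0]]]
    for prev, cur in zip(tokens, tokens[1:]):
        if bigram_counts[(prev, cur)] >= min_count:
            phrases[-1].append(cur)  # pair seen often enough: extend the phrase
        else:
            phrases.append([cur])  # otherwise start a new phrase
    return [" ".join(phrase) for phrase in phrases]


if __name__ == "__main__":
    corpus = [
        "machine learning is fun",
        "i study machine learning",
        "new york is big",
        "she lives in new york",
    ]
    counts = train_bigram_counts(corpus)
    print(break_phrases("machine learning in new york", counts))
```

The resulting phrases can then be used as features in place of single words, which is the use case described above.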

2. Word Breaker by Microsoft Research

Word breaking is an established task in natural language processing (NLP). It is especially important for handling European languages that use compound words, such as German, Dutch and Greek.
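For a feel of what word breaking involves, here is a minimal Python sketch that splits a compound into known words using a vocabulary and dynamic programming, preferring the split with the fewest parts. This is an assumption-laden toy, not the Microsoft Research approach: the function name `break_word`, the hand-made vocabulary and the "fewest parts" rule are all mine.

```python
def break_word(word, vocabulary):
    """Split word into vocabulary entries using the fewest parts.

    Returns a list of parts, or None if no full segmentation exists.
    """
    word = word.lower()
    # best[i] holds the best segmentation of word[:i], or None if there is none.
    best = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is not None and piece in vocabulary:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[-1]


if __name__ == "__main__":
    vocab = {"donau", "dampf", "schiff", "dampfschiff"}
    print(break_word("Donaudampfschiff", vocab))
```

Preferring fewer parts keeps longer known words intact; a statistical word breaker would instead pick the split that is most probable under a language model.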


Please share in the comments any other key phrase extraction resources that you are aware of.  I will add them to the list and appreciate your contributions.
