API Overview

Our API is powered by the Datumbox Machine Learning Framework. It currently offers 14 different functions as part of our Machine Learning platform. All of the following functions rely on sophisticated classification techniques and are accessible via our REST API. To call any of them, sign up for an API key and follow the guidelines in the Technical Details section. Please note that the API allows a limited number of calls (1000 per day). If you require more calls, consider using the Datumbox Machine Learning Framework instead.
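Because the API allows 1000 calls per day, a client may want to track its own usage before sending requests. The following is only a minimal sketch: the DailyQuota class is a hypothetical helper, not part of the Datumbox API, and it assumes the allowance resets with the local calendar day, which the service may not do.

```python
from datetime import date


class DailyQuota:
    """Client-side counter for a per-day call allowance (hypothetical helper)."""

    def __init__(self, limit=1000):
        self.limit = limit   # calls allowed per day (1000 per the overview)
        self.day = None      # day the counter currently refers to
        self.used = 0        # calls counted on that day

    def try_acquire(self, today=None):
        """Count one call and return True, or return False if the allowance is used up."""
        today = today or date.today()
        if today != self.day:
            # Assumption: the allowance resets when the local calendar day changes.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True

    def remaining(self):
        """Calls left for the day most recently passed to try_acquire()."""
        return self.limit - self.used
```

Calling try_acquire() before each request lets an application stop or queue work locally instead of discovering the limit through rejected calls.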

Sentiment Analysis

The Sentiment Analysis function classifies documents as positive, negative or neutral (lack of sentiment) depending on whether they express a positive, negative or neutral opinion.

Twitter Sentiment Analysis

The Twitter Sentiment Analysis function allows you to perform Sentiment Analysis on Twitter. It classifies the tweets as positive, negative or neutral depending on their context.

Subjectivity Analysis

The Subjectivity Analysis function categorizes documents as subjective or objective based on their writing style. Texts that express personal opinions are labeled as subjective and the others as objective.

Topic Classification

The Topic Classification function assigns documents to 12 thematic categories based on their keywords, idioms and jargon. It can be used to identify the topic of a text.

Spam Detection

The Spam Detection function labels documents as spam or nospam by taking into account their context. It can be used to filter out spam emails and comments.

Adult Content Detection

The Adult Content Detection function classifies the documents as adult or no-adult based on their context. It can be used to detect whether a document contains content unsuitable for minors.

Readability Assessment

The Readability Assessment function determines the degree of readability of a document based on its terms and idioms. Texts are classified as basic, intermediate or advanced depending on their difficulty.

Language Detection

The Language Detection function identifies the natural language of the given document based on its words and context. This classifier is able to detect 96 different languages.

Commercial Detection

The Commercial Detection function labels the documents as commercial or non-commercial based on their keywords and expressions. It can be used to detect whether a website is commercial or not.

Educational Detection

The Educational Detection function classifies the documents as educational or non-educational based on their context. It can be used to detect whether a website is educational or not.

Keyword Extraction

The Keyword Extraction function enables you to extract from an arbitrary document all the keywords and word-combinations along with their occurrences in the text.
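To illustrate what "keywords and word-combinations along with their occurrences" can look like, here is a small local sketch. It is not the algorithm used by the service: count_ngrams is a hypothetical function that simply counts single words and runs of consecutive words, with no stop-word filtering or ranking.

```python
import re
from collections import Counter


def count_ngrams(text, max_n=2):
    """Count each word and each run of up to max_n consecutive words.

    Returns a dict mapping n (1 = single words, 2 = word pairs, ...)
    to a Counter of the lowercase n-grams found in the text.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = {}
    for n in range(1, max_n + 1):
        # Slide a window of n words across the text and join each window.
        grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
        counts[n] = Counter(grams)
    return counts


result = count_ngrams("the cat and the hat")
print(result[1].most_common(1))  # most frequent single word with its count
```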

Text Extraction

The Text Extraction function enables you to extract the important information from a given webpage. Extracting the clear text of the documents is an important step before any other analysis.

Document Similarity

The Document Similarity function estimates the degree of similarity between two documents. It can be used to detect duplicate webpages or detect plagiarism.
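One common way to express a "degree of similarity" between two texts is the cosine similarity of their word-count vectors. The sketch below is only an illustration of that general idea. The overview does not say which measure the Document Similarity function uses, so this should not be read as its implementation.

```python
import math
import re
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity of two texts' word-count vectors, from 0.0 to 1.0."""
    a = Counter(re.findall(r"\w+", doc_a.lower()))
    b = Counter(re.findall(r"\w+", doc_b.lower()))
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    # An empty document has zero norm; treat it as sharing nothing.
    return dot / norm if norm else 0.0
```

Values close to 1.0 suggest near-duplicate texts, while 0.0 means the two texts share no words.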

Apply Machine Learning in...

Social Media

With the rise of Social Media, internet users can easily express and share their opinions about companies, products, services, events and more. As a result, companies have become interested in monitoring what people say about their brands in order to get feedback or enhance their marketing efforts.

Machine Learning has several interesting applications in Social Media Monitoring. It is used to evaluate the opinions of users and classify them as positive, negative or neutral (also known as Sentiment Analysis). In addition, it can be used to detect whether posts are objective or subjective, which natural language they are written in and whether they were written by a man or a woman. By using our API you can develop your own custom Social Media Monitoring tools in no time!


Online Marketing and SEO

Search Engine Traffic is one of the most important and profitable sources of traffic for websites. Because competition is intense, website owners and SEO Professionals use several SEM and SEO tools to improve their placement in Search Engine Results.

Machine Learning can be applied to develop intelligent Online Marketing and SEO Tools. It is used to extract the important information from webpages, to find the important keywords within documents, to detect cases of Duplicate Content and to assess the Quality and Readability of webpages. Check out our powerful API and start building your own intelligent SEO & Online Marketing Tools!

Quality Evaluation

Managing Online Communities or Services that accept user-generated content involves the time-consuming task of moderating that content. To keep quality high, moderators must detect and remove spam messages and content that is inappropriate for minors.

Machine Learning can automate this process by detecting spam and filtering out inappropriate content. Moreover, it makes it possible to categorize posts based on their topic or language and to assess their readability. Our service offers you the ability to automatically evaluate the Quality of the content; check out our API and build your own quality filters for your online community.
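A quality filter of this kind can be sketched as a small function that combines the labels of two classifiers. In the sketch below, is_acceptable and the classify callable are hypothetical names introduced for illustration, and the function-name strings are assumptions. Only the labels "spam" and "adult" come from the function descriptions above.

```python
def is_acceptable(text, classify):
    """Return True when the text is labeled neither spam nor adult.

    `classify` is a caller-supplied callable with the assumed signature
    classify(function_name, text) -> label string. It could wrap an
    API call or, as in the demo below, a local stand-in.
    """
    if classify("SpamDetection", text) == "spam":
        return False
    if classify("AdultContentDetection", text) == "adult":
        return False
    return True


def demo_classify(function_name, text):
    """Toy stand-in for a real classifier, used only to exercise the filter."""
    if function_name == "SpamDetection":
        return "spam" if "buy now" in text.lower() else "nospam"
    return "no-adult"


print(is_acceptable("Great article, thanks!", demo_classify))
print(is_acceptable("BUY NOW and win a prize", demo_classify))
```

Passing the classifier in as an argument keeps the moderation rule separate from how the labels are obtained, so the same filter can be tested offline and later pointed at the real service.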

Text Analysis

Many popular web applications use Text Analysis to provide targeted results to their users, automate time-consuming tasks or evaluate the quality of user-generated content. Machine Learning can make Text Analysis more accurate and targeted than ever.

Large Ecommerce systems use Machine Learning to label their products automatically, Parental Control services use it to detect inappropriate content, Online Marketing tools apply it to analyze and enhance their clients' campaigns, Online Community services use it to evaluate the quality of submitted content and Search Engines apply it to improve their rankings. The next idea is yours: start building it with our powerful API!

Technical Details

The Datumbox API is a web service which allows you to use our tools from your website, software or mobile application. The API gives you access to all of the supported functions of our service. On this page you will find all the information that you need in order to use our API, fully implemented code samples and the latest API Documentation.

Our Web Service uses "REST-like" RPC-style operations over HTTP POST requests. Parameters are URL-encoded in the request and the response is encoded in JSON. It is designed to be easy to use, and you can call it from any modern programming language that can issue web requests.
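The request and response format described above can be sketched with Python's standard library. The base URL, the ".json" suffix, the function name "SentimentAnalysis" and the parameter names api_key and text are assumptions made for illustration and should be checked against the API Documentation. The helper names build_request and parse_response are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL; confirm the real endpoint in the API Documentation.
BASE_URL = "http://api.datumbox.com/1.0"


def build_request(function_name, api_key, **params):
    """Build (but do not send) an HTTP POST request with URL-encoded parameters."""
    # Assumed parameter name "api_key"; extra parameters are passed through as-is.
    body = urlencode({"api_key": api_key, **params}).encode("utf-8")
    return Request(
        f"{BASE_URL}/{function_name}.json",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_response(raw_bytes):
    """Decode a JSON response body into Python objects."""
    return json.loads(raw_bytes.decode("utf-8"))


request = build_request("SentimentAnalysis", "YOUR_API_KEY", text="I love it")
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen and hand the bytes returned by read() to parse_response. The structure of the decoded JSON is not described in this overview, so the sketch does not assume any particular fields.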

How to use the API?

The current version of the API is 1.0. In order to use the API you must register for a Datumbox account and get your API Key from your profile. Once you have your key, you can immediately start sending API requests to our service. The code samples and the API Documentation can help you start using the API within minutes.

What Applications can you build?

Our API currently allows you to build applications that make use of Text Analysis and Natural Language Processing techniques, such as Online Marketing Tools, SEO Tools, Social Media Monitoring services, Anti-Spam filters and other Text Classification apps. The currently supported API functions are: Sentiment Analysis, Twitter Sentiment Analysis, Subjectivity Analysis, Topic Classification, Spam Detection, Adult Content Detection, Readability Assessment, Language Detection, Commercial Detection, Educational Detection, Keyword Extraction, Text Extraction and Document Similarity.
