Battle of the AI Titans Part 3 – Microsoft Azure’s AI Services
This is the last installment in a 4-part series about artificial intelligence (AI). The first article in this series focused on the basics of AI, followed by posts highlighting each of the main AI players: Google, Amazon and today, Microsoft Azure’s AI services.
There’s no denying that Microsoft Azure is taking artificial intelligence seriously. For example, their Cognitive Services group represents 28 distinct services – and that’s just one arm of their overall AI portfolio! So, we’ve got a lot to cover. Let’s get started!
Microsoft Azure divides its Cognitive Services into 5 different areas: Vision, Speech, Language, Knowledge, and Search.
Microsoft Azure’s AI Services – Vision
Computer Vision API
This service identifies the objects and actions inside of images and videos. Building on that basic capability, it does a lot of things, including:
- Reading printed and handwritten text.
- Recognizing celebrities and landmarks.
- Analyzing videos to identify objects within them.
Content Moderator
This service moderates text, images, and video. For text, Content Moderator detects potential profanity across more than 100 languages. It can also use custom lists of terms the developer considers inappropriate. For video, Content Moderator scores possible adult content. It also lets you build in human reviews, so its operations can be supervised.
Custom Vision Service (Preview)
This service is trainable. Users upload images that have been tagged (or they can upload untagged images and let Custom Vision Service tag them). Once Custom Vision Service has the tagged images, the user then “teaches” Custom Vision Service to recognize aspects of the image, like a particular room or food.
Face API
Face API can tell the likelihood that two images contain the same person. It also pulls details of a person from the picture, including age, gender, pose, smile, emotions, and facial hair. It detects where (by coordinates) the features of a face sit (e.g., the left eyebrow is at given X and Y coordinates). I tried it on an image of myself and found it eerily accurate.
What’s interesting though, is that when I uploaded an image taken several years ago and another from last year, I received the same age back. (Apparently I haven’t aged over the past several years!)
Emotion API
This service is set to be deprecated since its functionality is included in the Face API. As the name suggests, it detects the emotions of individuals in images.
Video Indexer (Preview)
One of the better aspects of Microsoft Azure’s AI services is that they tie together several underlying AI technologies to deliver a broader service. Video Indexer is a great example of this. This simple service packs a powerful punch with:
- Audio Transcription: Supports speech-to-text across English, Spanish, French, German, Italian, Chinese, Portuguese (Brazilian), Japanese, and Russian.
- Face tracking and identification: As the name suggests, this capability identifies faces (celebrities). Video Indexer can also be taught to recognize other faces, and identify them across video feeds.
- Speaker indexing: Notes who said what words and when.
- Visual text recognition: Extracts text in traffic signs, documents, etc. that are shown in a video.
- Voice activity detection: Identifies background noise and separates voices from it.
- Scene detection: Performs visual analysis on video to determine when a scene changes.
- Keyframe extraction: Automatically identifies keyframes in a video.
- Sentiment analysis: Self-explanatory.
- Translation: Translates transcript of video.
- Content moderation: Identifies any adult material in videos.
- Keyword extraction: Video Indexer identifies keywords, based on transcript.
- Annotation: Video Indexer annotates videos using a pre-defined model of 2000 objects. (Similar to many of the other AI Titans.)
Microsoft Azure’s AI Services – Speech
Translator Speech API
This service is similar to Google Cloud Translate. What’s notable about this service is that it’s embedded in some of Microsoft’s popular applications, including Skype and PowerPoint. More on that in a minute.
Translator Speech API, as the name suggests, translates speech. It does this through 5 steps:
- Recognizes something was spoken.
- Identifies the language.
- Transcribes the words to text.
- Translates the text.
- Speaks the translation.
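To make the five steps above concrete, here is a minimal sketch of how they chain together. This is not how Translator Speech API is implemented or called. Every name here (`Audio`, `detect_speech`, `PHRASES`, and so on) is hypothetical, and each stage is a stub standing in for a real model.

```python
from dataclasses import dataclass
from typing import Optional

# Toy phrase table standing in for a real translation model (illustrative only).
PHRASES = {("es", "en"): {"hola": "hello", "gracias": "thank you"}}


@dataclass
class Audio:
    """Stand-in for an audio clip; a real clip would hold waveform samples."""
    spoken_words: str  # what was said (an empty string means silence)
    language: str      # language tag the speaker used


def detect_speech(audio: Audio) -> bool:
    # Step 1: recognize that something was spoken.
    return bool(audio.spoken_words.strip())


def identify_language(audio: Audio) -> str:
    # Step 2: identify the language (here it is simply read from the stub).
    return audio.language


def transcribe(audio: Audio) -> str:
    # Step 3: transcribe the words to text.
    return audio.spoken_words.strip().lower()


def translate(text: str, source: str, target: str) -> str:
    # Step 4: translate the text; unknown phrases pass through unchanged.
    return PHRASES.get((source, target), {}).get(text, text)


def speak(text: str) -> str:
    # Step 5: speak the translation (a real system would synthesize audio).
    return f"<audio:{text}>"


def translate_speech(audio: Audio, target: str) -> Optional[str]:
    """Run the five stages in order; return None when nothing was spoken."""
    if not detect_speech(audio):
        return None
    source = identify_language(audio)
    text = transcribe(audio)
    return speak(translate(text, source, target))
```

Calling `translate_speech(Audio("Hola", "es"), "en")` walks through all five stages, while a silent clip stops at step 1.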
There are many use cases detailed on Microsoft’s website here. But one of the more intriguing use cases involves using what Microsoft calls the “live feature.”
With this capability, developers can build live, automatic language translation into their applications. It’s worth noting that Microsoft uses Translator Speech API in both Skype and PowerPoint. People speaking different languages on Skype can now have Translator Speech API translate their conversations in real time.
Giving a presentation to people who don’t speak your native language? No problem! Microsoft teams banded together to add the Translator Speech API to PowerPoint. This plugin lets you provide subtitles in another language as you give your presentation.
If you want to learn more about this technology, download the Microsoft Translator app on your phone. It’s available for both Android and iOS.
The app is powered by Translator Speech API and provides a live translation during a conversation between two individuals. Both my colleague (a native Spanish speaker) and I (a native English speaker) tried the app. We found it surprisingly accurate. Plus, we had great fun!
Speaker Recognition API
This service is well named! It identifies speakers based on their voice. If you remember Video Indexer from earlier in this article, this is the service underpinning Video Indexer’s ability to recognize who is speaking and when.
Custom Speech Service
At a high-level, this service allows you to create your own speech recognition model. Diving in a bit deeper, you can create both customized language and acoustic models. Microsoft does a good job of describing the difference between the models here. I’ve included an excerpt below:
“The acoustic model is a classifier that labels short fragments of audio into one of a number of phonemes, or sound units, in a given language. For example, the word ‘speech’ is comprised of four phonemes ‘s p iy ch.’ These classifications are made on the order of 100 times per second.
The language model is a probability distribution over sequences of words. The language model helps the system decide among sequences of words that sound similar, based on the likelihood of the word sequences themselves. For example, ‘recognize speech’ and ‘wreck a nice beach’ sound alike but the first hypothesis is far more likely to occur, and therefore will be assigned a higher score by the language model.”
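The language model idea in that excerpt can be illustrated with a toy example. The sketch below trains a bigram model with add-one smoothing on a tiny corpus I made up, then scores the two phrases from the quote. It is only an illustration of "a probability distribution over sequences of words" and says nothing about how Custom Speech Service builds its models.

```python
import math
from collections import Counter

# Tiny made-up corpus; a real language model is trained on far more text.
CORPUS = [
    "we recognize speech",
    "computers recognize speech well",
    "they recognize speech quickly",
    "a nice beach",
]


def train_bigrams(sentences):
    """Count word pairs, the contexts they follow, and the vocabulary size."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # <s> marks the sentence start
        vocab.update(words)
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(phrase, contexts, bigrams, vocab_size):
    """Log-probability of a word sequence under the smoothed bigram model."""
    words = ["<s>"] + phrase.split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        # Add-one smoothing keeps unseen word pairs from scoring zero.
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total


contexts, bigrams, vocab_size = train_bigrams(CORPUS)
likely = log_prob("recognize speech", contexts, bigrams, vocab_size)
unlikely = log_prob("wreck a nice beach", contexts, bigrams, vocab_size)
print(likely > unlikely)
```

On this corpus, "recognize speech" gets the higher score because the pair "recognize speech" appears often, while "wreck" never appears at all.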
Custom Speech Service is great for applications that have their own terms. So those of us looking to build applications may want to strongly consider it. Which is the perfect segue into the next block of Microsoft Azure’s AI services – Language.
Microsoft Azure’s AI Services – Language
Language Understanding (LUIS)
If you’re looking to incorporate language-based actions into your app, this is the service for you! It uses Bing Speech API to convert spoken words to text, which is then processed by LUIS. LUIS deciphers the intents (the actions a user intends to take) and the entities those actions apply to.
It returns a JSON response with all of the information you need to take action in your apps. This service also ties into Azure’s Bot Service so you can quickly create Bots to power your applications.
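Here is a rough sketch of what acting on that JSON might look like. The sample payload is hand-written and only modeled on the kind of response LUIS returns. The field names (`topScoringIntent`, `entities`, `score`), the `BookFlight` intent, and the `extract_action` helper are assumptions for illustration, so check the LUIS documentation for the exact schema.

```python
import json

# Hand-written sample shaped like a LUIS-style response (field names assumed).
SAMPLE_RESPONSE = """
{
  "query": "book a flight to Paris",
  "topScoringIntent": {"intent": "BookFlight", "score": 0.97},
  "entities": [
    {"entity": "paris", "type": "Location",
     "startIndex": 17, "endIndex": 21, "score": 0.94}
  ]
}
"""


def extract_action(raw_json, min_score=0.5):
    """Return (intent, entities), or (None, {}) when confidence is too low."""
    data = json.loads(raw_json)
    top = data.get("topScoringIntent", {})
    if top.get("score", 0.0) < min_score:
        return None, {}
    # Map each entity type to the text that was recognized for it.
    entities = {e["type"]: e["entity"] for e in data.get("entities", [])}
    return top.get("intent"), entities


intent, entities = extract_action(SAMPLE_RESPONSE)
print(intent, entities)
```

The confidence threshold is a design choice: below it, an app would typically ask the user to rephrase rather than guess at the intent.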
Know what’s really cool? They’ve taken a similar approach to Google’s DialogFlow and provide what they call “prebuilt domain models.” These models come with intents, entities, and what Azure calls “utterances.” So, building a Bot or adding language understanding to apps is much faster. They offer an impressive 21 domain models.
As a side note, kudos to Microsoft for doing an amazing job in documentation. Everything you need to get up and running with their services is available and easy to access – including an explanation of intents, entities, and utterances, should you be interested.
Bing Spell Check API
Most of Microsoft Azure’s AI Services have names that closely fit their intended purpose. This service is no different. Bing Spell Check ensures words in documents and in web searches are spelled correctly. You feed the service a string of text and it responds with a JSON message, highlighting misspelled words along with the confidence score for that result.
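As a sketch of what you might do with that JSON message, the snippet below applies the highest-scoring suggestion for each flagged word. The sample response is hand-written, and its field names (`flaggedTokens`, `offset`, `token`, `suggestions`, `score`) are my assumption about the response shape, so verify them against the Bing Spell Check reference before relying on them.

```python
# Hand-written sample shaped like a spell-check response (field names assumed).
SAMPLE_RESPONSE = {
    "flaggedTokens": [
        {"offset": 0, "token": "Helo",
         "suggestions": [{"suggestion": "Hello", "score": 0.9}]},
        {"offset": 5, "token": "wrold",
         "suggestions": [{"suggestion": "world", "score": 0.85},
                         {"suggestion": "would", "score": 0.4}]},
    ]
}


def apply_corrections(text, response, min_score=0.8):
    """Replace each flagged token with its best suggestion, if confident enough."""
    # Work from the end of the string so earlier offsets stay valid.
    tokens = sorted(response["flaggedTokens"], key=lambda t: t["offset"], reverse=True)
    for flagged in tokens:
        suggestions = flagged.get("suggestions", [])
        if not suggestions:
            continue
        best = max(suggestions, key=lambda s: s["score"])
        if best["score"] < min_score:
            continue  # leave low-confidence words untouched
        start = flagged["offset"]
        end = start + len(flagged["token"])
        text = text[:start] + best["suggestion"] + text[end:]
    return text


print(apply_corrections("Helo wrold", SAMPLE_RESPONSE))
```

Raising `min_score` makes the helper more conservative, which is usually what you want when correcting user-visible text automatically.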
Text Analytics API
I was just talking about how great Microsoft was at naming services. This service might be an exception to that rule :).
Text Analytics API does 3 things: sentiment analysis, key phrase extraction, and language detection – not exactly what I would have guessed from the name “Text Analytics.” The API returns a sentiment score between 0 and 1, where 1 is most positive.
Beyond returning the detected language, the API returns a confidence score between 0 and 1, where 1 is (you guessed it) 100% sure. To date, Microsoft only has a few languages supported for key phrase extraction. But they do have an impressive list supported in preview. Check out the full list here.
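Since both scores run from 0 to 1, an app usually has to decide what to do with them. The helpers below are a small sketch of that step. The function names and the cut-off values are my own assumptions, not anything Text Analytics API prescribes.

```python
def label_sentiment(score, neutral_band=0.1):
    """Turn a 0-to-1 sentiment score into a coarse label.

    Scores within `neutral_band` of 0.5 are treated as neutral; the band
    width is an arbitrary choice for this sketch.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("sentiment score must be between 0 and 1")
    if score > 0.5 + neutral_band:
        return "positive"
    if score < 0.5 - neutral_band:
        return "negative"
    return "neutral"


def accept_language(language, confidence, minimum=0.8):
    """Return the detected language only when the confidence is high enough."""
    return language if confidence >= minimum else None


print(label_sentiment(0.92), accept_language("Spanish", 0.95))
```

A wider neutral band means fewer messages get flagged as strongly positive or negative, which can be useful when routing customer feedback.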
Translator Text API
Translator Text API fits under the same umbrella as Translator Speech API – Microsoft Translator. The difference, as you could probably guess, is that this service deals only with text. It detects the language of the text and then performs the requested translation. Microsoft now uses Neural Machine Translation (NMT) for this service, putting them on par with AWS and Google – from a technical approach perspective.
Microsoft Azure’s AI Services – Knowledge
QnA Maker (Preview)
This is Azure’s FAQ service. QnA Maker makes (no pun intended) it refreshingly easy to create an FAQ. You simply point it to an online Q&A source or upload a document (supports .tsv, .pdf, .doc, .docx, and .xlsx) that is well formatted with a table of contents, name your service, and that’s it. Azure does the rest.
This service is meant to be paired with Azure’s Bot Service to power the answers a Bot may need to provide.
Custom Decision Service (Preview)
Suggesting content based on a user’s article reading history has become more and more common across different websites and content providers. Now this capability is available through a simple API call to mere developers like myself, thanks to Custom Decision Service.
Custom Decision Service works to understand the context for the information you need. It learns as it goes to provide better suggested content. It also goes one step further and experiments with new options, so it can adjust to emerging trends.
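The "learn as it goes, but keep experimenting" behavior can be illustrated with a classic epsilon-greedy strategy. To be clear, this is a swapped-in textbook technique that I am using for illustration, not a description of what Custom Decision Service runs internally. The class and method names are hypothetical.

```python
import random


class EpsilonGreedyRecommender:
    """Suggest the best-performing article most of the time, explore the rest."""

    def __init__(self, articles, epsilon=0.1, seed=None):
        self.epsilon = epsilon          # fraction of suggestions spent exploring
        self.rng = random.Random(seed)
        self.shows = {article: 0 for article in articles}
        self.clicks = {article: 0 for article in articles}

    def click_rate(self, article):
        shows = self.shows[article]
        return self.clicks[article] / shows if shows else 0.0

    def suggest(self):
        if self.rng.random() < self.epsilon:
            # Explore: try a random article so new trends can surface.
            return self.rng.choice(list(self.shows))
        # Exploit: pick the article with the best observed click rate.
        return max(self.shows, key=self.click_rate)

    def record(self, article, clicked):
        # Learn from feedback on what the user actually did.
        self.shows[article] += 1
        self.clicks[article] += int(clicked)


recommender = EpsilonGreedyRecommender(["ai-basics", "azure-ai", "aws-ai"], epsilon=0.0)
recommender.record("azure-ai", clicked=True)
recommender.record("ai-basics", clicked=False)
print(recommender.suggest())
```

With `epsilon=0.0` the recommender never explores, which makes the example deterministic. A small positive value is what lets it adjust to emerging trends.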
Project Knowledge Exploration
Targeted toward academics, this service sits in what Microsoft calls Cognitive Services Labs. It takes a natural language request and turns it into a structured query expression to search academic journals. It also has other capabilities that help researchers, including auto-completion.
Project Academic Knowledge
This service also sits in Cognitive Services Labs. It takes a natural language request, figures out the academic intent, and retrieves the information requested using the Microsoft Academic Graph (MAG). (More on MAG in a minute.)
This service also calculates the similarity between 2 papers, not just in terms of words but also intent.
You may be asking, what is MAG? Microsoft is probably best suited to answer that question:
“The Microsoft Academic Graph is a heterogeneous graph containing scientific publication records, citation relationships between those publications, as well as authors, institutions, journals, conferences, and fields of study. This graph is used to power experiences in Bing, Cortana, Word, and in Microsoft Academic. The graph is currently being updated on a weekly basis.”
Project Entity Linking
This service is great for providing context to information being provided. It identifies “entities” such as the Earth, USA, or University of Florida (my personal favorite 🙂 ). Then it automatically adds a link to each entity’s entry in Wikipedia, automatically giving context.
Microsoft Azure’s AI Services – Search
Bing Autosuggest API
This one is simple. It enables you to provide intelligent type-ahead capabilities in your application. Doesn’t get much simpler to describe than that!
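To show what type-ahead means mechanically, here is a bare-bones prefix matcher over a local list of phrases. It is only a sketch of the concept. The `TypeAhead` class is hypothetical, and the real service presumably ranks suggestions using far more signal than alphabetical order.

```python
from bisect import bisect_left


class TypeAhead:
    """Suggest stored phrases that start with what the user has typed so far."""

    def __init__(self, phrases):
        # Sorting lets us jump straight to the first candidate with bisect.
        self.phrases = sorted(phrase.lower() for phrase in phrases)

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        start = bisect_left(self.phrases, prefix)
        results = []
        for phrase in self.phrases[start:]:
            if not phrase.startswith(prefix):
                break  # sorted order means no later phrase can match
            results.append(phrase)
            if len(results) == limit:
                break
        return results


suggestions = TypeAhead(["Azure", "Azure Bot Service", "AWS", "Bing"])
print(suggestions.suggest("az"))
```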
Bing News Search API
Another relatively simple service. This API returns:
- An image for the news article.
- A URL to the article.
- News provider information.
Bing Web Search API
Another simple service. It searches the web and returns the number of articles matching the topic, along with a subset of those articles by title, link, and last crawled date. You can provide a number of parameters, including whether the search should be “safe” and what content you want returned (e.g., News or Images).
Bing Entity Search API
Keeping in line with the other Bing services, this is another focused service. It returns links and information on entities – businesses, places, people, books, etc – rather than links to websites that mention the entities.
Bing Image Search API
Bing Image Search API returns thumbnails, URLs, metadata, and more after a request has been submitted.
Bing Video Search API
This service returns the name of and link to videos matching the search terms. Bing Video Search API also allows for video previews in the results and returns a large amount of metadata, including an image, media attribution, the place the video was taken, and the publisher.
Bing Custom Search API
This service puts the power of Bing Search into developers’ hands. There are 2 primary ways to use Bing Custom Search: Site Search and Custom Vertical Search. Both ways are done in the Custom Search GUI.
Within this GUI you can add a list of websites you want searched by Bing, and only results from those websites will be returned. You can also “pin” a particular website. This ensures that results matching the search term from your “pinned” site are provided first, with other site responses after.
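The filtering and pinning behavior described above is configured in the GUI, not in code, but the logic is easy to picture. Here is a small sketch of it. The `rank_results` helper, the result dictionaries, and the example domains are all made up for illustration.

```python
from urllib.parse import urlparse


def rank_results(results, allowed_sites, pinned_site=None):
    """Keep only results from allowed sites, listing pinned-site results first."""
    allowed = [r for r in results if urlparse(r["url"]).netloc in allowed_sites]
    # sorted() is stable, so the original order is preserved within each group.
    # False sorts before True, so pinned-site results come first.
    return sorted(allowed, key=lambda r: urlparse(r["url"]).netloc != pinned_site)


RESULTS = [
    {"title": "Azure overview", "url": "https://docs.example.com/azure"},
    {"title": "Unrelated post", "url": "https://other.example.org/post"},
    {"title": "Our Azure guide", "url": "https://blog.example.com/azure-guide"},
]

ranked = rank_results(
    RESULTS,
    allowed_sites={"docs.example.com", "blog.example.com"},
    pinned_site="blog.example.com",
)
print([r["title"] for r in ranked])
```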
For a site search, you simply enter your website’s address, and Bing Custom Search does the rest! For custom vertical search, you simply add the particular websites you want crawled and, again, Bing Custom Search does the rest!
Microsoft even offers a hosted UI for you to use once you’ve built your search. You can make some minor customizations to the look and feel to better match your website. To learn more, watch this great demo video by Mahesh Balachandran.
Microsoft Azure’s AI Services – Machine Learning
Azure Machine Learning Workbench
Workbench is the ultimate tool for the data scientist. It’s a desktop application that allows data scientists to prepare data, build AI models, and review the results. It provides a number of helpful views of your data, and it comes with Jupyter Notebooks, a must for data scientists.
Workbench also supports what Azure calls “By Example Transformations.” This powerful feature allows you to look at a table with your data in it and, in a column next to your data, type how you want each data element to appear. It’s easier to understand if you see it in action: watch this video. I’m definitely a fanboy!
Azure Machine Learning Experimentation Service
When you combine this service, Workbench, and the model management service (detailed in the section below), you get something similar to AWS’s SageMaker. If you are not familiar with that product, see my post on AWS’s artificial intelligence capabilities for a quick summary.
The experimentation service works with Workbench and provides project management, Git integration, access control, roaming and sharing. It allows you to run your models in the exact same environment every time, just with different algorithms.
Then, it records the run history and gives you the ability to visually see the model that best fits your needs. For running experiments, it supports native machine, local Docker, Docker on VM, and a scaled out Spark cluster (more on Spark later).
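The core idea of running different algorithms against the same setup, recording each run, and then picking the best one can be sketched in a few lines. This is not the Experimentation Service API. The `RunHistory` class, the toy dataset, and the three "algorithms" are hypothetical stand-ins.

```python
from dataclasses import dataclass, field

# Fixed toy dataset of (feature, label) pairs, shared by every run.
DATA = [(1, 0), (2, 0), (3, 0), (4, 1), (6, 1), (7, 1), (8, 1)]


def always_zero(x):
    return 0


def threshold_at_five(x):
    return int(x >= 5)


def threshold_at_three(x):
    return int(x > 3)


def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


@dataclass
class RunHistory:
    """Records each run so the results can be compared afterwards."""
    runs: list = field(default_factory=list)

    def record(self, algorithm, score):
        self.runs.append({"algorithm": algorithm, "accuracy": score})

    def best(self):
        return max(self.runs, key=lambda run: run["accuracy"])


history = RunHistory()
for model in (always_zero, threshold_at_five, threshold_at_three):
    # Same data and same metric every time; only the algorithm changes.
    history.record(model.__name__, accuracy(model, DATA))

print(history.best()["algorithm"])
```

Keeping the data and metric fixed is what makes the recorded runs comparable, which is the point of running every experiment in the same environment.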
Azure Machine Learning Model Management Service
This service allows you to deploy your models in a wide variety of environments. It requires the use of CLI commands, which containerize your models into Docker images. Once that is done, you deploy your models to local machines, on-prem servers, the cloud, and IoT edge devices. This service has a lot to offer. You can learn more about it here.
Microsoft Machine Learning Library for Apache Spark (MMLSpark)
This library includes an integration of SparkML pipelines with the Microsoft Cognitive Toolkit and OpenCV. If you are using Workbench and are running your experiments on Docker, you don’t need to do anything else. Workbench automatically employs MMLSpark.
Data Science Virtual Machines (DSVM)
Just as AWS has deep learning AMIs pre-built with components for AI and Machine Learning tasks, Microsoft Azure’s AI Services include pre-built virtual machines ready for AI-work. Azure offers both Linux and Windows varieties with an impressive number of applications already installed, such as Jupyter notebooks, databases (SQL and PostgreSQL), R, Python, etc. You can view the full list at their website here.
Best of all, Azure’s DSVM offers a number of prebuilt examples and sample Jupyter notebooks to get you up and running fast!
This powerful service automates performing machine learning algorithms on multiple sets of data. It spins up resources needed to run calculations in parallel, and then immediately brings the resources down when the task is finished.
This is the go-to service for any programmer looking to run machine learning algorithms on large amounts of data!
And that’s a round-up of Microsoft Azure’s AI Services.
If you want to learn practical tips and strategies for getting the most out of AI, check out our on-demand webinar!