Battle of the AI Titans Part 2 – Google’s AI Services
This blog is the third installment of our AI series. The first two installments covered AI use cases and the technologies that support AI, followed by the AI services Amazon offers. Today we focus on Google’s AI services.
Google’s AI Services
As we mentioned in our last blog, each AI platform provider has strengths based on its heritage. One example is AWS’s Comprehend service, which is informed by perhaps the richest source of reviews and feedback there is: comments on products for sale on Amazon. When we think about Google’s heritage and recent investments, we can guess some obvious strengths: search, image recognition, and language translation. But there’s a lot more, and this blog discusses each of Google’s AI services.
Oh, a warning to those of you with sensitive stomachs: I geek out a bit in the section on Google TPUs. So brace yourselves.
Let’s dive in!
Cloud AutoML – Alpha
AutoML is currently in Alpha, so it is not a full-fledged product offering just yet. But don’t let that stop you! If you are looking for image recognition technology, AutoML is worth a look.
Google’s vision is to offer a suite of ML products/services giving developers the ability to create high-quality ML models. The first in the suite is AutoML Vision – image recognition. This service is built on top of Google’s proprietary image recognition technology – arguably one of the most tested in the world.
AutoML Vision doesn’t stop at AI. It brings in a human element to drive even more value. If you haven’t done the groundwork of supplying labeled images to train AutoML Vision, you can send the images you supply to human labelers, who generate the tags for you.
Cloud TPU
TPU stands for Tensor Processing Unit. This is the processor that powers Google’s internal AI capabilities, and Google is now making the power of TPUs available to the masses.
Excuse me while I geek out for a moment. If you are an electrical engineer or computer engineer, you’ll be able to follow along quite easily. For those who aren’t, I’ll try to put the beauty of the TPU in non-technical terms.
If you just want the “so what” without the back story: Google estimates that its TPUs deliver 15 – 30x higher performance than contemporary CPUs and GPUs, with much less energy use (a 30 – 80x improvement in performance per watt). For you as a user of a Google AI service, that should translate into more performance per dollar than with other services.
Want to know why? Follow me down the rabbit hole :)!
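There are four aspects of TPUs to consider: reduced-precision math (quantization), a CISC instruction set, matrix processing, and a systolic array.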
Let’s start with machine learning models in the context of a neural network. Think of a neural network as a group of connected nodes (“neurons”) that work together to make a decision.
At each node, we multiply the data by weights and add up the results. Then, based on the result, we decide whether that “neuron” is “on” or “off.” The simplest way to do that is with a step function that compares the result to a threshold. If the result is greater than the threshold, the neuron is “on.” If not, it is “off.”
For a variety of reasons, this all-or-nothing step function isn’t good enough. Its output jumps abruptly from “off” to “on,” so small changes to the weights give the network no useful signal to learn from. Instead, we use smoother functions called activation functions, such as the sigmoid or ReLU.
At each neuron, we multiply the data by the weight, add the results, then apply an activation function.
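To make that concrete, here is a minimal Python sketch of a single neuron. The input values, weights, and bias are made up for illustration, and the sigmoid is just one common choice of activation function. Frameworks such as TensorFlow compute the same thing for many neurons at once.

```python
import math


def weighted_sum(inputs, weights, bias=0.0):
    """Multiply each input by its weight and add up the results."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def step(z, threshold=0.0):
    """The simplest on/off rule: fire only if the sum exceeds the threshold."""
    return 1.0 if z > threshold else 0.0


def sigmoid(z):
    """A smooth activation function that squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Weighted sum followed by an activation function."""
    return activation(weighted_sum(inputs, weights, bias))


if __name__ == "__main__":
    inputs = [0.5, 0.8]    # hypothetical input data
    weights = [0.4, -0.2]  # hypothetical learned weights
    print(neuron(inputs, weights, activation=step))  # hard on/off answer
    print(neuron(inputs, weights))                   # graded answer between 0 and 1
```

The step version can only answer 0 or 1. The sigmoid version returns a value just above 0.5 here, which tells a training algorithm how close the neuron was to flipping.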
What’s interesting about this approach is that the level of detail needed at each neuron is not that high. As Google’s blog points out, if you are trying to decide whether it is raining outside, you don’t need to know how many droplets are falling per second, just whether they are falling or not. That means the level of precision (in math terms, think number of decimal places) at each neuron doesn’t need to be that great. Following me?
Typical CPUs and GPUs perform these calculations on 32-bit or 64-bit floating-point numbers. But if your calculations don’t need that many digits of precision, you don’t need 32 or 64 bits. You can get away with good old-fashioned 8-bit integers, a technique called quantization. That is the approach behind Google’s first-generation TPU.
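Here is a minimal sketch of the idea, assuming a simple symmetric scheme with one scale factor per list of numbers. This is a common textbook approach and not necessarily the exact scheme Google’s TPU toolchain uses. The input and weight values are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using one shared scale factor."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8-bit integers
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each value to the nearest integer step and clamp to the integer range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integers."""
    return [x * scale for x in q]


def quantized_dot(qa, scale_a, qb, scale_b):
    """Do the multiply-adds in integer arithmetic, then rescale once at the end."""
    acc = sum(a * b for a, b in zip(qa, qb))  # integer accumulator
    return acc * scale_a * scale_b


if __name__ == "__main__":
    inputs = [0.5, 0.25, 1.0]    # hypothetical activations
    weights = [0.82, 0.41, 0.3]  # hypothetical weights
    qa, sa = quantize(inputs)
    qw, sw = quantize(weights)
    exact = sum(x * w for x, w in zip(inputs, weights))
    print(qw, exact, quantized_dot(qa, sa, qw, sw))
```

All the expensive multiply-adds happen on small integers, and the result lands close to the full-precision answer. For a neuron deciding “raining or not,” that small loss of precision usually doesn’t change the decision.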
There are different architectural styles for designing processors. The widely used RISC (Reduced Instruction Set Computer) style focuses on simple instructions that cover what most applications need. Google’s TPU instead uses the CISC (Complex Instruction Set Computer) style, in which a single instruction can carry out a complex, high-level task such as a large matrix multiplication. This makes the chip less useful as a general-purpose processor, but very effective for the tasks it was designed for, like artificial intelligence.
CPU, GPU, TPU – oh my!
TPUs do what Google refers to as matrix processing. CPUs are scalar processors, performing one operation per instruction. GPUs are vector processors: they execute operations concurrently, resulting in hundreds to thousands of operations in a single clock cycle. TPUs are matrix processors, which perform hundreds of thousands of operations per clock cycle (many more than GPUs).
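One reason matrix processing fits AI so well: running a whole layer of neurons on a batch of inputs is exactly a matrix multiplication. The plain-Python sketch below does it the scalar way, one multiply-add at a time, and counts how many it needs. The function name and the layer sizes in the note that follows are our own illustrative choices.

```python
def matmul_count(a, b):
    """Multiply an n x k matrix by a k x m matrix one multiply-add at a time.

    Returns the product and the number of multiply-add operations performed.
    """
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    out = [[0.0] * m for _ in range(n)]
    macs = 0
    for i in range(n):          # each input example in the batch
        for j in range(m):      # each neuron in the layer
            acc = 0.0
            for p in range(k):  # each input feature
                acc += a[i][p] * b[p][j]
                macs += 1
            out[i][j] = acc
    return out, macs


if __name__ == "__main__":
    product, macs = matmul_count([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(product, macs)
```

Take a hypothetical batch of 8 inputs with 256 features each, feeding a layer of 256 neurons. That comes to 8 × 256 × 256 = 524,288 multiply-adds, and a scalar processor works through them roughly one at a time. Google has described its first-generation TPU’s 256 × 256 matrix unit as performing 65,536 multiply-adds per cycle. In the idealized case, ignoring data loading and pipeline delays, the whole layer is on the order of 8 cycles of matrix work.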
The Romantic Processor
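TPUs are built around what is known as a systolic array, in which data flows through a grid of simple processing units in waves, much like blood pumped through the body by a heart (“systolic” comes from the heart’s pumping rhythm). Hence the “Romantic Processor” reference :). Each unit passes its intermediate results directly to its neighbors instead of writing them to memory and reading them back. As a result, this approach needs far fewer memory accesses and much less power. (For a detailed explanation, check out Google’s blog post on the TPU.)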
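Here is a toy Python simulation of the systolic idea. It assumes a simplified “output-stationary” design: each cell in an n x m grid keeps one running sum, values of the first matrix flow left to right, and values of the second flow top to bottom. Inputs are staggered by one cycle per row and column so the right pairs meet. Google’s actual matrix unit keeps the weights stationary and streams the data through instead, so treat this as an illustration of the data-flow idea and not a model of the chip.

```python
def systolic_matmul(a, b):
    """Multiply an n x k matrix a by a k x m matrix b on a simulated n x m grid of cells.

    Returns the product and the number of clock cycles the simulation took.
    """
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # running sum held in each cell
    a_reg = [[0] * m for _ in range(n)]  # value of a each cell held last cycle
    b_reg = [[0] * m for _ in range(n)]  # value of b each cell held last cycle
    cycles = n + m + k - 2               # time for the last pair to reach the far corner
    for t in range(cycles):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # a values enter at the left edge (row i delayed by i cycles),
                # then move one cell to the right per cycle.
                if j == 0:
                    p = t - i
                    new_a[i][j] = a[i][p] if 0 <= p < k else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                # b values enter at the top edge (column j delayed by j cycles),
                # then move one cell down per cycle.
                if i == 0:
                    p = t - j
                    new_b[i][j] = b[p][j] if 0 <= p < k else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
                # a[i][p] and b[p][j] arrive at cell (i, j) on the same cycle.
                acc[i][j] += new_a[i][j] * new_b[i][j]
        a_reg, b_reg = new_a, new_b
    return acc, cycles


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each cell only ever reads from its left and upper neighbors, and each input value is fetched from memory once, at the edge of the grid. That neighbor-to-neighbor reuse is where the memory and power savings come from. In this sketch a 2 x 2 by 2 x 2 product finishes after n + m + k - 2 = 4 cycles.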
When you add all of those pieces together, you get a high-performing, energy-efficient processor, honed for artificial intelligence.
Google Video Intelligence
Video is also in Google’s wheelhouse, thanks to YouTube. Google Video Intelligence allows you to search your videos with specific terms. For example, if you are looking for videos that feature cats, you simply search for cats, and videos with highlights of where cats appear are delivered to you.
Google Video Intelligence is powered by over 20,000 tags, so it likely covers most classifications you need. Google Video Intelligence also provides recommended content and identifies adult-themed content.
Aside from detecting where specific objects appear in a video, the coolest feature of this service is its ability to help serve ads at the right moment, triggered by tags appearing in the video. The service can also transcribe videos.
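Here is roughly how the “search for cats” flow maps onto the REST API. You POST a JSON body naming a video in Cloud Storage and the features you want to the v1 `videos:annotate` endpoint, then poll the long-running operation it returns. When the operation is done, its `response` field holds the annotations. The sketch below assumes that v1 JSON format and leaves out HTTP and authentication code. It builds the request body and pulls out the time ranges where a given label was detected. The helper names are our own.

```python
VIDEO_ENDPOINT = "https://videointelligence.googleapis.com/v1/videos:annotate"


def build_annotate_request(input_uri, features=("LABEL_DETECTION",)):
    """JSON body for a videos:annotate call on a video stored in Cloud Storage."""
    return {"inputUri": input_uri, "features": list(features)}


def _seconds(offset):
    """Durations come back as strings such as '14.8s'; a missing value means 0."""
    return float(offset.rstrip("s")) if offset else 0.0


def find_label_segments(response, label):
    """Return (start, end) times, in seconds, where a label such as 'cat' was detected."""
    hits = []
    for result in response.get("annotationResults", []):
        # Video-level and shot-level labels share the same structure.
        for key in ("segmentLabelAnnotations", "shotLabelAnnotations"):
            for ann in result.get(key, []):
                if ann.get("entity", {}).get("description", "").lower() != label.lower():
                    continue
                for seg in ann.get("segments", []):
                    span = seg.get("segment", {})
                    hits.append((_seconds(span.get("startTimeOffset")),
                                 _seconds(span.get("endTimeOffset"))))
    return hits
```

Other feature names, such as `EXPLICIT_CONTENT_DETECTION`, can be added to the `features` list to request the other capabilities described above.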
Google Vision API
Google Vision API is the sister service of Video Intelligence. It brings intelligence to pictures—a lot of intelligence:
- Label Detection—identifies what is in the photo.
- Face Detection—can tell you whether someone is sad, angry, or surprised based on their face.
- Optical Character Recognition (OCR)—identifies not only the content of text in an image, but also the language and location of that text—even in handwritten notes.
- Explicit Content— self-explanatory.
- Landmark Detection—flags common landmarks in photos and returns not only the landmark’s name but also its latitude and longitude.
- Logo Detection— self-explanatory.
- Crop Suggestion—suggests where to crop an image.
- Web Annotations—search the web for more information on your image. The API returns annotations from across the web to enrich the information about the image. This capability is also useful for determining where the same image is used across the internet, to help with copyright concerns.
- Document Text Annotations—this capability is specifically designed for images of documents, and can provide more accurate annotations.
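Most of the capabilities above are requested the same way: one `images:annotate` call with a list of feature types. Here is a minimal sketch, assuming the v1 REST request format, that builds that JSON body. The short keys (“labels,” “faces,” and so on) and the bucket path are hypothetical names we made up. The feature type strings are the API’s own.

```python
import json

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

# Map the capabilities listed above to the API's feature type names.
FEATURES = {
    "labels": "LABEL_DETECTION",
    "faces": "FACE_DETECTION",
    "ocr": "TEXT_DETECTION",
    "explicit": "SAFE_SEARCH_DETECTION",
    "landmarks": "LANDMARK_DETECTION",
    "logos": "LOGO_DETECTION",
    "crop": "CROP_HINTS",
    "web": "WEB_DETECTION",
    "document": "DOCUMENT_TEXT_DETECTION",
}


def build_vision_request(image_uri, capabilities, max_results=5):
    """JSON body asking for several kinds of analysis on one image."""
    features = [{"type": FEATURES[c], "maxResults": max_results} for c in capabilities]
    return {"requests": [{"image": {"source": {"imageUri": image_uri}},
                          "features": features}]}


if __name__ == "__main__":
    body = build_vision_request("gs://my-bucket/photo.jpg", ["labels", "landmarks", "faces"])
    print(json.dumps(body, indent=2))  # POST this to VISION_ENDPOINT with your credentials
```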
Sara Robinson, a Developer Advocate at Google, highlights a lot of this in the following short demo:
Cloud Translation API
At the beginning of this blog, I referenced some of Google’s inherent strengths including its translation capability. Google has been offering translation services via Google Translate for years (I used the service in 2015 during a trip to Europe).
Over time, these abilities were honed, and now Google is making this same technology available to humble developers like you and me! Here’s a quick rundown of some of the notable features supported by Google Cloud Translation API:
- Dynamic Translation—provide a string of text in any supported language, and the API translates it in near real time.
- Support for more than 100 Languages—talk about global! Google’s Translation service supports over 100 different languages. It provides both phrase-based translation for all languages and neural machine translation for a subset of language pairs.
- Language Detection—self-explanatory.
If you are curious about the difference between phrase-based and neural machine translation, Systran (a specialist in language translation via machine learning) has an excellent blog article on the topic.
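Here is a minimal sketch of the request bodies involved, assuming the Cloud Translation v2 (“Basic”) REST API. That API accepts a `model` field of `"nmt"` (neural) or `"base"` (phrase-based). If you leave out the source language, the API detects it for you. Newer API versions use a different request shape, and the helper names here are our own.

```python
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"
DETECT_URL = TRANSLATE_URL + "/detect"


def build_translate_request(texts, target, source=None, model="nmt"):
    """JSON body for a v2 translate call.

    model="nmt" requests neural machine translation; model="base" requests
    the phrase-based model. Leaving out source lets the API detect the language.
    """
    if isinstance(texts, str):
        texts = [texts]  # "q" accepts one or many strings
    if model not in ("nmt", "base"):
        raise ValueError("model must be 'nmt' or 'base'")
    body = {"q": list(texts), "target": target, "format": "text", "model": model}
    if source:
        body["source"] = source
    return body


def build_detect_request(texts):
    """JSON body for a v2 language-detection call (POST to DETECT_URL)."""
    if isinstance(texts, str):
        texts = [texts]
    return {"q": list(texts)}
```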
Cloud Natural Language
This service extracts key data from a block of text. It can provide sentiment analysis as well as information about the key subjects of the text – people, places, events, etc. You could easily make the case that this is another one of Google’s strengths—considering that they power the most used search engine on the planet.
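Here is a minimal sketch of a sentiment or entity request, assuming the v1 REST format (`documents:analyzeSentiment` and `documents:analyzeEntities`). Sentiment comes back as a `documentSentiment.score` between -1.0 (negative) and 1.0 (positive). The 0.25 cut-off used to label it is an arbitrary choice for illustration, and the helper names are hypothetical.

```python
NL_BASE = "https://language.googleapis.com/v1/documents"
SUPPORTED = ("analyzeSentiment", "analyzeEntities")


def build_nl_request(text, method="analyzeSentiment"):
    """Return (url, json_body) for a plain-text Natural Language API call."""
    if method not in SUPPORTED:
        raise ValueError(f"unsupported method: {method}")
    body = {"document": {"type": "PLAIN_TEXT", "content": text}, "encodingType": "UTF8"}
    return f"{NL_BASE}:{method}", body


def describe_sentiment(response, threshold=0.25):
    """Turn documentSentiment.score (-1.0 to 1.0) into a rough label."""
    score = response.get("documentSentiment", {}).get("score", 0.0)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral or mixed"
```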
Cloud Speech API
Cloud Speech API is an audio-to-text service. I’d argue this is another natural area of strength for Google, thanks to the phrase “Ok Google.” Similar to Google Cloud Translation API, this service supports over 110 languages and variants. Key capabilities include:
- Speech in Transit and at Rest—Google Cloud Speech API can return partial results as it transcribes, so text appears as the words are spoken. If you want to do a bulk transcription, Google Speech API can accept and transcribe from audio files as well.
- Noisy Environments—supports speech coming from a wide variety of noisy environments.
- Context-Aware—supports “word hints” with each API call, which let developers pass along words and phrases likely to appear in the audio (for example, commands or product names specific to their app) to improve recognition accuracy.
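Here is a minimal sketch of a file-based transcription request with word hints, assuming the v1 REST format. In that format the hints go in `speechContexts[].phrases`, short clips use `speech:recognize`, and longer files use `speech:longrunningrecognize`. The audio is assumed to be 16-bit linear PCM in a Cloud Storage bucket, and the helper name and bucket path are hypothetical. Live partial results, the first capability above, go through a separate streaming method and are not shown here.

```python
SPEECH_BASE = "https://speech.googleapis.com/v1/speech"


def build_speech_request(gcs_uri, language_code="en-US", phrase_hints=(),
                         sample_rate_hertz=16000, long_audio=False):
    """Return (url, json_body) for transcribing an audio file in Cloud Storage.

    Assumes 16-bit linear PCM audio; adjust encoding and sample rate for other files.
    """
    config = {
        "encoding": "LINEAR16",
        "sampleRateHertz": sample_rate_hertz,
        "languageCode": language_code,
    }
    if phrase_hints:
        # Word hints: terms likely to appear in the audio, such as app-specific commands.
        config["speechContexts"] = [{"phrases": list(phrase_hints)}]
    # Short clips use the synchronous method; long files use the long-running one.
    method = "longrunningrecognize" if long_audio else "recognize"
    return f"{SPEECH_BASE}:{method}", {"config": config, "audio": {"uri": gcs_uri}}
```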
There’s a good demo of Google Cloud Speech API by Sara Robinson, included below. But beware. It’s all command-line and code-driven, developer-centric information. So, it’s not for the faint of heart :).
Dialogflow – Beta
Dialogflow is Google’s chatbot platform. It was originally developed by a start-up, API.ai, which Google acquired in 2016. Dialogflow supports both voice and text interaction across several messaging platforms (including Facebook Messenger, Kik, Slack, Telegram, Viber, and Skype).
For me, one of the best parts of Dialogflow’s approach is that they provide pre-built agents specializing in particular areas. You use the pre-built agents as the template for your custom agent.
For example, there are pre-built agents for interacting with a car’s electronic systems, converting currencies, calculating dates, and looking up holidays. The idea is to help developers get their chatbots up and running even faster, without having to code everything from scratch.
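Once an agent is set up, say one built from the currency-conversion template, your app sends each user message to it with a `detectIntent` call. Here is a minimal sketch assuming the Dialogflow v2 REST format. The project ID, session ID, and helper names are hypothetical, and authentication is left out.

```python
import uuid

DIALOGFLOW_BASE = "https://dialogflow.googleapis.com/v2"


def build_detect_intent(project_id, text, session_id=None, language_code="en"):
    """Return (url, json_body) for a Dialogflow v2 detectIntent call."""
    # A session ID groups the turns of one conversation; a random UUID starts a new one.
    session_id = session_id or str(uuid.uuid4())
    url = f"{DIALOGFLOW_BASE}/projects/{project_id}/agent/sessions/{session_id}:detectIntent"
    body = {"queryInput": {"text": {"text": text, "languageCode": language_code}}}
    return url, body


def reply_text(response):
    """Pull the agent's reply and the matched intent name out of a detectIntent response."""
    result = response.get("queryResult", {})
    return result.get("fulfillmentText", ""), result.get("intent", {}).get("displayName", "")
```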
Cloud Job Discovery
If you read my previous article on AWS’s AI services, you may remember the award I gave to one of their services: “closest-to-TV-show.” Google also earns a dubious honor: the “most-niche-AI-service” award. It’s an interesting application of Google’s machine learning technology, and it has its roots in an initiative called “Google for Jobs.”
The Google for Jobs initiative is, as Google says, “a Google-wide commitment to help people find jobs more easily.” It uses machine learning to better understand which jobs are available and to match them to the jobs users are seeking. For example, if a job posting lists “bus. development” instead of “business development,” Google’s Cloud Job Discovery API is smart enough to know that the posting means “business development.”
What this means is that job seekers and recruiters no longer have to agonize over coming up with the perfect terms. Google’s Cloud Job Discovery API helps recruiting platforms overcome human errors in the job search.
If you’re interested, you can try it out yourself, no coding required! Just go to Google Search and type in a job search. Results appear immediately, localized to your area.
That wraps up our tour of Google’s AI services.
Apologies for geeking out on Google Cloud TPUs. But you have to admit, they’re pretty cool! If you haven’t read it yet, I strongly encourage you to read the initial blog in this series outlining AI and ML.
Next week we’ll talk about the last giant in AI as a Service – Microsoft Azure. Stay tuned!