2023 is widely viewed as the year investment in AI computing power took off. On December 6, Google released Gemini, its most powerful general-purpose generative AI model. Google merged two of its teams, Google Brain and DeepMind, for this project.
What’s So Good About the Google Gemini Model?
The first version of Gemini supported multi-modality. Gemini’s most obvious advantage comes from its ability to understand and interact with video and audio. Unlike OpenAI, Google did not train separate models for images and speech; instead, it built a multimodal model from the start, based on a large amount of video and audio data from Google’s YouTube.
As a general-purpose model, Gemini is natively multimodal. In other words, it can directly recognize image or video information. A non-native model, by contrast, must first translate an image into text after seeing it, and only then can it interpret the content.
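The difference between the two paths can be illustrated with a toy sketch. This is not how Gemini works internally; the Image class, the caption function, and both "models" are hypothetical stand-ins, chosen only to show how detail can be lost when an image is first converted to text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Image:
    """Toy stand-in for an image: the objects in it and their colors."""
    objects: List[str]
    colors: Dict[str, str]


def caption(image: Image) -> str:
    """Hypothetical captioner: image-to-text step that drops color detail."""
    return "a photo of " + " and ".join(image.objects)


def non_native_color(image: Image, obj: str) -> str:
    """Non-native path: the model only ever sees the caption text."""
    known_colors = {"red", "green", "blue", "yellow"}
    words = caption(image).split()
    if obj in words:
        i = words.index(obj)
        # A color is only recoverable if the caption happened to mention it.
        if i > 0 and words[i - 1] in known_colors:
            return words[i - 1]
    return "unknown"


def native_color(image: Image, obj: str) -> str:
    """Native path: the model reads the image content directly."""
    return image.colors.get(obj, "unknown")


photo = Image(objects=["car", "tree"], colors={"car": "red", "tree": "green"})
print(non_native_color(photo, "car"))
print(native_color(photo, "car"))
```

In this toy setup the caption never mentions color, so the text-only path cannot answer the question, while the direct path can.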
What does this mean?
Let’s say you are running an anti-AIDS campaign. The Gemini model will not only give you the text version of the material but will also be able to generate the corresponding images and videos.
One thing that distinguishes humans from animals is the ability to use tools. Google also wants to train Gemini into a model that can use tools, although this capability is still in training. Google has integrated generative AI models into more than 25 of its existing products, including Gmail, Google Docs, the search engine, YouTube and more.
Gemini is very efficient in terms of tool and API integration. For example, suppose we want to find tourist information about Niagara Falls. If a search engine is integrated into the model, it will automatically search for relevant information. If Gemini also integrates a professional calculator, then, combined with the model’s understanding of each customer’s spending and travel habits, it can plan travel routes in advance and estimate the expenses required. If the model is also embedded with various text editors, image editors, or even video editors, the system can help you automatically generate travel photos and videos. Isn’t it exciting?
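The Niagara Falls example can be sketched as a small tool-dispatch loop. This is a minimal illustration, not the Gemini API: search_tool, calculator_tool, the TOOLS registry, and plan_trip are all hypothetical names, and the attractions and prices are made-up placeholder data. In a real system the model itself would decide which tool to call; here that decision is hard-coded.

```python
from typing import Callable, Dict, List


def search_tool(query: str) -> List[dict]:
    """Hypothetical search engine returning placeholder results."""
    return [
        {"name": "Boat tour", "cost": 30.0},          # made-up price
        {"name": "Observation deck", "cost": 15.0},   # made-up price
    ]


def calculator_tool(costs: List[float]) -> float:
    """Hypothetical calculator: adds up a list of costs."""
    return sum(costs)


# Registry the "model" consults to find a tool by name.
TOOLS: Dict[str, Callable] = {
    "search": search_tool,
    "calculator": calculator_tool,
}


def plan_trip(destination: str, travelers: int) -> dict:
    """Chain the tools: search for attractions, then estimate the total cost."""
    attractions = TOOLS["search"](f"things to do at {destination}")
    per_person = TOOLS["calculator"]([a["cost"] for a in attractions])
    return {
        "destination": destination,
        "itinerary": [a["name"] for a in attractions],
        "estimated_cost": per_person * travelers,
    }


plan = plan_trip("Niagara Falls", travelers=2)
print(plan)
```

Keeping the tools in a registry means that adding an image or video editor would only require registering another function, which matches the article's point that each integrated tool extends what the model can do.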
The Gemini Family
Gemini is not a single AI model, but a family. It has a lightweight version called Gemini Nano, which runs natively offline on Android devices. Gemini Nano comes in two tiers, Nano 1 (1.8B) and Nano 2 (3.25B), which are geared towards low- and high-memory devices, respectively. If you bought Google’s Pixel 8 Pro, Gemini Nano is built in, which means you already have an AI-powered smartphone.
There is a more powerful version in the Gemini family called Gemini Pro, whose main function is to power AI services. It is the backbone of Bard, Google’s large language model chatbot. Gemini Pro is comparable to GPT-3.5. Its features are not necessarily better, but Google has optimized it for “cost and latency”.
There is a third, more powerful model in the family called Gemini Ultra. The marketing claims about Gemini outperforming GPT-4 on benchmarks refer to Gemini Ultra. Google designed it specifically to run in data centres, so it won’t be installed on home computers. Gemini Ultra is still undergoing a red-team security review but is expected to launch in early 2024 in a new version of Google’s chatbot, Bard Advanced.
Case Study — Wendy’s
Wendy’s partnered with Google Cloud in 2021 to develop its generative AI platform, FreshAI. The AI system learns and adapts to the varied phrasings and accents of different customers, thereby enhancing the efficiency and accuracy of food ordering.
Generative AI is more responsive and flexible than traditional AI: it can understand a customer’s true intent, personalize each order, and make small talk with customers in a relaxed tone.
Currently, the platform is being tested in Columbus, Ohio. According to Wendy’s, FreshAI’s service time is 22 seconds faster than the market average, and orders are 86% accurate on average without employee interaction. If the test results improve further, Wendy’s will expand FreshAI to other channels, including the company’s app, in-restaurant kiosks, mobile devices, smart home devices and more.