On December 6, Alphabet, Google's parent company, introduced Gemini, its largest and most capable AI model to date, positioning it to compete with OpenAI's GPT-4 and Meta's Llama 2. It is the first model released since Alphabet consolidated its AI research divisions, DeepMind and Google Brain, into a single unit, Google DeepMind, led by DeepMind CEO Demis Hassabis.
Gemini was built from the ground up to be "multimodal," meaning it can understand and reason across different types of information, including text, code, audio, images, and video, at the same time. The model comes in three sizes: Ultra for highly complex tasks, Pro for a broad range of tasks, and Nano for on-device tasks. Sundar Pichai, Alphabet's CEO, called it one of the company's biggest science and engineering efforts, realizing the vision set out when Google DeepMind was formed.
Gemini Pro will be accessible to developers through the Gemini API on Google AI Studio and Google Cloud Vertex AI starting December 13. Gemini Nano will be available to Android developers via AICore, introduced in Android 14, initially on Pixel 8 Pro devices. Gemini Ultra is in early experimentation with select users, developers, partners, and safety experts, with broader availability to developers and enterprises expected in early 2024.
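For developers, access through the Gemini API boils down to an authenticated HTTPS request. The sketch below, a rough illustration rather than official sample code, builds the endpoint URL and JSON payload for a text prompt; the `v1beta` path, `generateContent` method, and payload shape are assumptions based on Google's published REST quickstart, and `YOUR_API_KEY` is a placeholder for a key issued by Google AI Studio.

```python
import json

# Hypothetical sketch of a Gemini API request (REST endpoint and payload
# shape assumed from Google's generativelanguage quickstart).
API_KEY = "YOUR_API_KEY"  # placeholder; obtain a real key from Google AI Studio
MODEL = "gemini-pro"
URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent?key={API_KEY}"
)

# Request body: a list of content turns, each holding one or more parts.
payload = {
    "contents": [
        {"parts": [{"text": "Summarize multimodal AI in one sentence."}]}
    ]
}

print(URL)
print(json.dumps(payload, indent=2))
```

Sending this payload via an HTTP POST to the URL above would return the model's generated text; the same request shape is what the Python and Android SDKs wrap for developers.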
Gemini is being integrated across Google's products: starting December 6, Bard uses a fine-tuned version of Gemini Pro for more advanced reasoning, planning, and understanding, and Gemini Nano powers new features on the Pixel 8 Pro. On performance, Google reports that Gemini Ultra exceeds current state-of-the-art results on 30 of 32 widely used academic benchmarks in large language model (LLM) research, and that it is the first model to outperform human experts on the MMLU (massive multitask language understanding) benchmark, which spans 57 subjects. In the company's evaluations, Gemini Pro outperformed GPT-3.5 on six of eight benchmarks.
Hassabis highlighted Gemini's flexibility: it runs efficiently on everything from data centers to mobile devices, changing how developers and enterprises can build with AI. Its multimodal reasoning lets it extract insights from large volumes of documents, understand nuanced information, answer questions on complex topics, and generate high-quality code in popular programming languages. These capabilities come with added safeguards: Gemini incorporates Google's safety policies and AI Principles to address potential risks, and the company is working with external experts and partners on comprehensive stress testing.