Table of Contents

Wed Apr 10 2024

1. Gemini 1.5 Pro

2. Google has also made improvements in the Gemini API

3. Three major open source tools

4. The first self-developed Arm processor Axion

5. Code completion and generation tool – CodeGemma

6. Open language model – RecurrentGemma

7. Video editing tool – Google Vids

8. Enterprise-focused code assistant – Gemini Code Assist

9. Agent building tool – Vertex AI Agent Builder

Google rolled out a wave of large-model product updates last night

On Tuesday local time, at Google Cloud Next 2024, Google released a series of AI model updates and products, including Gemini 1.5 Pro, which gains native audio (speech) understanding for the first time; a new code generation model, CodeGemma; and Google's first self-developed Arm processor, Axion.

✨ Gemini 1.5 Pro

Gemini 1.5 Pro, Google's most powerful generative AI model, is now available in public preview on Vertex AI, Google's enterprise-focused AI development platform. The context it can handle grows from 128,000 tokens to 1 million tokens. One million tokens is equivalent to approximately 700,000 words, or roughly 30,000 lines of code. That's about four times the amount of data Anthropic's flagship model Claude 3 can accept as input, and about eight times the maximum context of OpenAI's GPT-4 Turbo.

Official announcement: https://developers.googleblog.com/2024/04/gemini-15-pro-in-public-preview-with-new-features.html

This release adds native audio (speech) understanding for the first time, along with a new File API that makes file handling easier. Gemini 1.5 Pro's input modalities are being expanded, including audio (speech) understanding in the Gemini API and Google AI Studio. In addition, Gemini 1.5 Pro can now reason over both the images (frames) and the audio (speech) of videos uploaded in Google AI Studio.
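As an illustration, here is a minimal sketch of audio understanding through the new File API, using the google-generativeai Python SDK; the API key and file name are placeholders:

```python
# Minimal sketch: upload an audio file via the File API, then ask
# Gemini 1.5 Pro about it. API key and file path are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload the audio file through the File API and reference it in the prompt.
audio_file = genai.upload_file(path="meeting_recording.mp3")

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content([
    "Summarize the key decisions made in this recording.",
    audio_file,
])
print(response.text)
```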

✨ Google has also made improvements to the Gemini API, mainly in the following three areas (a short usage sketch follows the list):

  1. System instructions: System instructions can now be used in Google AI Studio and the Gemini API to steer the model's responses. Define roles, formats, goals, and rules to guide the model's behavior for your specific use case.
  2. JSON mode: Instructs the model to output only JSON objects. This mode makes it possible to extract structured data from text or images. It is available via cURL now, with Python SDK support coming soon.
  3. Improvements to function calling: You can now select modes that limit the model's output to improve reliability, choosing between text, function calls, or only the functions themselves.
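Here is a minimal sketch of system instructions and JSON mode with the google-generativeai Python SDK. The role text is a made-up example, and since JSON mode initially shipped via cURL, the response_mime_type field assumes a later SDK version:

```python
# Minimal sketch: system instruction plus JSON-only output.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel(
    "gemini-1.5-pro-latest",
    # System instruction: define the role and rules for the model.
    system_instruction="You are a release-notes assistant. Answer tersely.",
    # JSON mode: ask the model to emit only a JSON object
    # (initially exposed via cURL/REST; assumes an SDK version that supports it).
    generation_config={"response_mime_type": "application/json"},
)

response = model.generate_content(
    "List three launch items as JSON objects with 'name' and 'category' fields."
)
print(response.text)
```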

Additionally, Google is releasing a next-generation text embedding model that outperforms comparable models. Starting today, developers can access it through the Gemini API. The new model, text-embedding-004 (text-embedding-preview-0409 in Vertex AI), achieves stronger retrieval performance on the MTEB benchmark and outperforms existing models with comparable dimensions. On MTEB, text-embedding-004 (aka Gecko) with 256-dimensional output outperforms all larger models with 768-dimensional output.
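For illustration, a minimal sketch of calling the new embedding model through the Gemini API with the google-generativeai Python SDK; the 256-dimension setting mirrors the Gecko result mentioned above:

```python
# Minimal sketch: request a 256-dimensional embedding from text-embedding-004.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

result = genai.embed_content(
    model="models/text-embedding-004",
    content="Gemini 1.5 Pro is now in public preview on Vertex AI.",
    task_type="retrieval_document",
    output_dimensionality=256,  # smaller output, per the MTEB comparison above
)
print(len(result["embedding"]))  # -> 256
```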

However, it's important to note that Gemini 1.5 Pro is not available to anyone without access to Vertex AI or AI Studio. Currently, most people interact with Gemini language models through the Gemini chatbot. Gemini Ultra powers the Gemini Advanced chatbot, and while it is powerful and can understand long prompts, it is not as fast as Gemini 1.5 Pro.

✨ Three major open source tools

At the Google Cloud Next conference in 2024, the company launched several open source tools mainly to support generative AI projects and infrastructure.

1. MaxDiffusion

It is a collection of reference implementations of various diffusion models that run on XLA (Accelerated Linear Algebra) devices. GitHub address: https://github.com/google/maxdiffusion

2. JetStream

A new engine for running generative AI models. Currently, JetStream only supports TPUs, though GPU support may come later. Google claims JetStream can deliver up to 3x better price-performance for models like Google's own Gemma 7B and Meta's Llama 2. GitHub address: https://github.com/google/JetStream

3. MaxText

This is a collection of text generation AI models targeting TPUs and Nvidia GPUs in the cloud. MaxText now includes Gemma 7B, OpenAI’s GPT-3, Llama 2 and models from AI startup Mistral, all of which Google says can be customized and fine-tuned to developers’ needs. GitHub address: https://github.com/google/maxtext

✨ The first self-developed Arm processor Axion

Google Cloud has announced its first in-house developed Arm processor, called Axion. It is based on Arm's Neoverse V2 and is designed for data centers. Google says its Axion instances perform 30% better than other Arm-based instances from competitors such as AWS and Microsoft, and deliver up to 50% better performance and 60% better energy efficiency than comparable x86-based instances.

Google emphasized during Tuesday's launch that because Axion is built on an open foundation, Google Cloud customers will be able to bring their existing Arm workloads to Google Cloud without any modifications.

However, Google has not yet published detailed specifications for the chip.

✨ Code completion and generation tool – CodeGemma

CodeGemma is built on the Gemma models and brings powerful yet lightweight coding capabilities to the community. It comes in three variants: a 7B pretrained variant for code completion and code generation tasks, a 7B instruction-tuned variant for code chat and instruction following, and a 2B pretrained variant for fast code completion on a local machine.
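As a rough illustration, here is a minimal sketch of local code completion with the 2B variant via Hugging Face transformers; the checkpoint id and fill-in-the-middle tokens follow the public model card, but treat the exact prompt format as an assumption:

```python
# Minimal sketch: fill-in-the-middle code completion with CodeGemma-2B.
# Assumes access to the (gated) checkpoint on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/codegemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# FIM prompt: the model is asked to generate the code between prefix and suffix.
prompt = (
    "<|fim_prefix|>def mean(xs):\n    \"\"\"Return the arithmetic mean.\"\"\"\n"
    "<|fim_suffix|>\n<|fim_middle|>"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)

# Print only the newly generated completion.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```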

CodeGemma has the following advantages:

  1. Intelligent code completion and generation: complete lines, functions, and even generate entire code blocks, whether you are working locally or in the cloud;
  2. Higher accuracy: CodeGemma was trained primarily on 500 billion tokens of mostly English data from web documents, mathematics, and code. The code it generates is not only more syntactically correct but also more semantically meaningful, helping to reduce errors and debugging time;
  3. Multi-language capability: supports Python, JavaScript, Java and other popular programming languages;
  4. Streamline your workflow: Integrate CodeGemma into your development environment to write less boilerplate code and write important, interesting, and differentiated code faster.

For more technical details and experimental results, please refer to the paper released simultaneously by Google. Paper address: https://storage.googleapis.com/deepmind-media/gemma/codegemma_report.pdf

✨ Open language model – RecurrentGemma

Google DeepMind also released RecurrentGemma, a family of open-weight language models. RecurrentGemma is based on the Griffin architecture, which achieves fast inference on long sequences by replacing global attention with a mixture of local attention and linear recurrences. Technical report: https://storage.googleapis.com/deepmind-media/gemma/recurrentgemma-report.pdf

RecurrentGemma-2B achieves superior performance on downstream tasks, comparable to Gemma-2B (transformer architecture). At the same time, RecurrentGemma-2B achieves higher throughput during inference, especially on long sequences.
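For illustration, a minimal sketch of running the open RecurrentGemma-2B weights with Hugging Face transformers, assuming a transformers version that includes the Griffin-based architecture:

```python
# Minimal sketch: text generation with the open RecurrentGemma-2B weights.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/recurrentgemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer(
    "The Griffin architecture replaces global attention with", return_tensors="pt"
)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```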

✨ Video editing tool – Google Vids

Google Vids is an AI video creation tool that is a new feature added to Google Workspace. Google says that with Google Vids, users can create videos alongside other Workspace tools like Docs and Sheets, and collaborate with colleagues in real time.

✨ Enterprise-focused code assistant – Gemini Code Assist

Gemini Code Assist is an AI code completion and assistance tool for enterprises, positioned as a competitor to GitHub Copilot Enterprise. Code Assist will be available as a plug-in for popular editors such as VS Code and JetBrains IDEs.

Code Assist is powered by Gemini 1.5 Pro. Gemini 1.5 Pro has a million-token context window, allowing Google's tools to introduce more context than competitors. Google says this means Code Assist can provide more accurate code suggestions and the ability to reason about and change large chunks of code.

"Code Assist enables customers to make large-scale changes to their entire code base, enabling AI-assisted code transformations that were previously impossible," Google said.

✨ Agent building tool – Vertex AI Agent Builder

AI agents are one of the industry's hottest directions this year. Google has now announced a new tool to help enterprises build AI agents: Vertex AI Agent Builder.

Thomas Kurian, CEO of Google Cloud, said: “Vertex AI Agent Builder makes it extremely easy and fast to build and deploy production-ready, generative AI-powered conversational agents, and lets developers guide agents the same way humans do, improving the quality and accuracy of model-generated results.”

Reference links:

  • https://techcrunch.com/2024/04/09/google-open-sources-tools-to-support-ai-model-development/
  • https://developers.googleblog.com/2024/04/gemma-family-expands.html?utm_source=twitter&utm_medium=unpaidsoc&utm_campaign=fy24q2-googlecloudtech-blog-next_event-in_feed-no-brand-global&utm_content=-&utm_term=-&linkId=9603600