Multimodal AI models are bound to change everything


As we look back on one of the most significant years in the history of enterprise tech, we consider why all the excitement around multimodal AI models is the real deal, and how it’s going to have major impacts across industries and applications. To quote our author, Phil Moyer, Global VP for AI and Business Solutions, “This is the moment where more and more leaders will not only see new uses for gen AI — they’ll start using it themselves for nearly everything.”


As I look back at the start of the year, it’s hard to believe how much has happened since generative AI first exploded into the mainstream. This month alone, Google launched Gemini, the AI Hypercomputer, and Duet AI for Developers (now generally available), the latest among dozens of gen AI products and hundreds of gen AI updates we released in 2023. The pace is frankly astonishing.

This accelerated innovation is everywhere. At Google Cloud, the number of active gen AI projects on Vertex AI has grown more than 7X. Gemini is already supercharging the Vertex AI platform, giving developers the power to build sophisticated AI agents, and it’s coming soon to our Duet AI portfolio, so customers have AI assistance whenever and wherever they need it. There’s also been an explosion of activity in the open-source gen AI world along with many outstanding models from organizations across the industry — it’s truly an exciting time.

This year, much of the attention has been on novel consumer applications and clever experiments at larger enterprises. With multimodal models like Gemini, we expect to see more serious and significant advances across industries.

What’s more, we started 2023 with most models confined to their training data. Now we have robust solutions to fine-tune models and connect them to external and proprietary sources, letting organizations apply the intelligence of AI models across their data. These capabilities are enabling remarkable use cases, from question-answer chatbots that span a company’s enterprise data to tools that synthesize and analyze diverse information.

Not to sound hyperbolic, but the first few times I used Gemini felt like that magical “Eureka” moment. And I can’t wait for everyone else to have theirs. This is the moment where more and more leaders will not only see new uses for gen AI — they’ll start using it themselves for nearly everything.


Multimodality unlocks advanced reasoning

Gemini was built from the ground up to be multimodal, which means it can generalize, understand, operate across, and combine many different types of information simultaneously — whether it’s text, code, audio, image, or video.

For instance, you can ask Gemini: “Over the past five years, what was this bank or this online retailer’s cash dividend payout ratio?”

A payout ratio shows the proportion of a company’s total earnings that is paid to shareholders as dividends. To provide an answer, a model needs to understand all the different definitions of cash, cash equivalents, and dividends and be able to apply them within the mathematical concept of ratios. It also needs to accurately retrieve five years of financial information from outside systems and access other AI models to calculate the ratio.
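To make the arithmetic concrete, here is a minimal sketch in Python. It assumes the simplest definition of the ratio, cash dividends paid divided by net income; real filings may define either term differently. The function name and every figure below are hypothetical, invented purely for illustration, and are not taken from any company or from Gemini itself.

```python
def payout_ratio(dividends_paid: float, net_income: float) -> float:
    """Return cash dividends as a fraction of net income (assumed definition)."""
    if net_income == 0:
        raise ValueError("payout ratio is undefined when net income is zero")
    return dividends_paid / net_income


# Hypothetical figures in millions: year -> (cash dividends paid, net income).
yearly_figures = {
    2019: (40.0, 100.0),
    2020: (45.0, 120.0),
    2021: (30.0, 80.0),
    2022: (50.0, 125.0),
    2023: (60.0, 150.0),
}

# One ratio per year, which is what the example question asks for.
ratios = {
    year: payout_ratio(dividends, income)
    for year, (dividends, income) in yearly_figures.items()
}

# A single five-year figure: total dividends over total income.
total_dividends = sum(dividends for dividends, _ in yearly_figures.values())
total_income = sum(income for _, income in yearly_figures.values())
five_year_ratio = payout_ratio(total_dividends, total_income)

for year, ratio in ratios.items():
    print(f"{year}: {ratio:.1%}")
print(f"Five-year aggregate: {five_year_ratio:.1%}")
```

The calculation itself is trivial. The hard part, and the part the model has to get right, is deciding which reported line items count as "cash dividends" and "earnings" and retrieving them reliably for each year.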

Multimodality is the difference between models that can predict the next word (or words) in a sentence and more sophisticated models that both understand and act on information across different data types. To answer the question above, a model not only has to understand a question but also distinguish mathematical concepts like equations and retrieve the specific elements needed — two things that weren’t possible less than a year ago.

Models like Gemini indicate that we’re about to enter an entirely new era of gen AI that will take us closer to true language understanding, where systems can synthesize across many different types of data and create even more business value across industries.

It also means the applications across domains and real-world environments are that much stronger, since models like Gemini can tackle so many more situations. Gemini Nano, our mobile-sized model that can operate on-device, creates powerful opportunities to run AI at the edge, meaning data can be analyzed securely and acted on faster, even with limited connectivity. These mobile-first models can enhance tasks as diverse as emergency services, mobile banking, or augmented gaming.


Blending information to solve real-world problems

Multimodal capabilities also offer organizations new ways to merge different types of data to tackle challenges in the physical world. Many industries face unstructured, unexpected problems that may not be possible to solve through a single mode of analysis or limited data sources.

For instance, improving safety on construction sites requires analyzing and combining many different types of information. A company might have visual data like video feeds or images, incident reports from construction sites, or other types of data like financial costs or timeline delays. Multimodal gen AI models can help blend together all of this information and understand where, when, and how accidents are most likely to occur and create safer, more efficient approaches.

Or consider an airline mechanic evaluating an engine that makes an unusual sound when accelerating. The mechanic could record a video with sound and then describe a few other details by voice. A gen AI app could weigh all of these modalities and retrieve relevant information from that specific aircraft’s technical handbook, helping the mechanic quickly identify the problem and work out how to fix it.

Continue reading on Transform with Google Cloud to get even more examples on the potential uses of multimodal models across a range of industries.

