Introduction

Google Gemini, Google’s formidable new AI model, has received a lot of attention following a video presentation of its alleged multimodal capabilities. However, observers discovered that portions of the film contradicted Gemini’s actual performance, prompting concerns about the company’s marketing strategy. This article examines the expectations and reality of Google Gemini. For more about Bard vs. Gemini, read our blog post What’s Changed and Is it Better?

Misleading Perceptions

In the promotional video, Gemini was portrayed as an AI capable of reacting to voice instructions, recognizing user artwork, and interacting with the surroundings.

At first glance, this material appears rather amazing. The video showcases an intriguing sequence in which Google Gemini recognizes a user’s drawings: from the initial sketch it determines it’s a bird, then identifies the species (a duck), notices the duck is colored blue, and concludes that this is a rarity among ducks. The AI even shows a sense of humor; when a blue rubber duck appears on screen, Gemini responds with a playful “What the quack!” while explaining the material it’s made of. When the duck is placed on a world map, Gemini notes that a real duck wouldn’t survive there.

Additionally, the AI demonstrates creativity by inventing games, recognizing “rock-paper-scissors,” and following magic tricks. It excels at finding similarities between two related or seemingly unrelated objects. Its ability to track a paper ball hidden under a moving plastic cup, or to predict a dot-to-dot drawing of a crab before it is finished, is equally impressive.

In summary, Gemini is presented as an extremely intelligent system: it distinguishes objects, correlates shapes, predicts events, matches pictures of musical instruments with their sounds, and much more. These advanced capabilities would set it apart from other artificial intelligence models.

The truth, however, is quite different. The questions were scripted, and the replies were neither generated in real time nor voice-activated. As Parmy Olson reported for Bloomberg, Google admitted that things didn’t go exactly as portrayed. Instead, researchers fed static images into the model and stitched together its successful responses, selectively showcasing the model’s capabilities.

“We created the demo by recording material to test Gemini’s abilities in various challenges,” a Google representative explained. “Then we prompted Gemini using static images from the recorded material and encouraged it through text,” the representative added.

Google recorded human hand movements, showed them to Gemini one by one as static images, and communicated with the model via text (not voice). It then picked the best interactions and compiled them into the film, adding synthesized speech.
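
To make that workflow concrete, here is a minimal sketch of it – an illustration, not Google’s actual tooling. It pulls still frames out of a recorded clip with OpenCV and sends a single frame plus a text prompt to the public Gemini API via the google-generativeai Python SDK. The file name demo.mp4, the sampling interval, and the API key are placeholders.

```python
# Illustrative sketch only: static frames + text prompts, approximating the
# workflow Google described, using the public google-generativeai SDK.
import cv2                           # pip install opencv-python
import google.generativeai as genai  # pip install google-generativeai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")            # placeholder key
model = genai.GenerativeModel("gemini-pro-vision")

# Grab every 30th frame from a recorded clip as a static image.
cap = cv2.VideoCapture("demo.mp4")                 # hypothetical recording
frames, i = [], 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if i % 30 == 0:
        frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    i += 1
cap.release()

# Prompt the model with a single still image plus text -- no live video.
response = model.generate_content([frames[0], "Explain what I'm doing."])
print(response.text)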

The video’s description acknowledges that latency was reduced and Gemini’s responses were shortened for brevity, a caveat crucial to understanding this demonstration material. Oriol Vinyals, Vice President of Research and Deep Learning at Google DeepMind and a key figure leading the Gemini project, noted that the demo video illustrates ‘…how multimodal user experiences created with Gemini could look like.’

Processing static images and text through large language models currently requires enormous computing resources, making real-time video interpretation impractical. For AI researchers, this was a strong signal that the video depicted the situation in a misleading manner.

The edited sequences and “cherry-picked” moments created the impression that Gemini could do far more than it actually can.

Exploring image sequences lets you dig into how Gemini reasons over time. Here is the classic magic trick walkthrough from Google’s blog:

User (showing two open hands, a coin in the right palm): Explain what I'm doing.

Gemini: You are holding a coin in your right hand.

User (both hands now turned face down, the coin hidden under the right one): Where do you think the coin is?

Gemini: The coin is in your right hand.

User (lifting the right hand to reveal that the coin is gone): Look here, what happened?

Gemini: The coin is gone.

User: Summarize what may have happened, step by step, starting with the first image.

Gemini:
1. You started with a coin in your right hand.
2. You then made a motion with your hands, and the coin disappeared.
3. You then showed that the coin was gone.

Gemini is leveraging the entire context of this conversation – both text and image – to summarize its answer. Finally, let’s ask:

User: Can you explain how it disappeared?

Gemini: The coin was palmed.
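
This pattern can be approximated with the public API. The sketch below is a rough illustration, not Google’s demo code: it interleaves a few hypothetical stills of the trick with the transcript’s questions in a single google-generativeai request, so the final question is answered against the full image-and-text context. The file names are invented for the example.

```python
# Illustrative sketch: interleaving stills with questions, in the spirit of
# the coin-trick transcript above. File names are assumptions.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-pro-vision")

# Three hypothetical stills captured from the trick, in order.
coin_in_hand    = Image.open("coin_in_hand.png")
hands_face_down = Image.open("hands_face_down.png")
empty_hand      = Image.open("empty_hand.png")

# One request carries the whole interleaved sequence, so the final question
# is answered against everything that came before it.
response = model.generate_content([
    coin_in_hand,    "Explain what I'm doing.",
    hands_face_down, "Where do you think the coin is?",
    empty_hand,      "Look here, what happened?",
    "Summarize what may have happened, step by step, starting with the first image.",
])
print(response.text)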

Google Gemini: The Most Powerful AI Yet?

Gemini, according to Google, will be the most powerful AI ever created. It will be capable of language understanding, multimodal interactions, visual interpretation, code production, data management, and analytics. Google plans for Gemini to serve as a crucial component across the majority of its products and services.

Google highlighted multimodality as a fundamental aspect of Gemini from the start. This capacity to handle many sorts of information at the same time, such as text, graphics, and sound, constitutes a significant advancement over current AI models.

Gemini comes in three sizes: Ultra, Pro, and Nano.

Source: https://www.techopedia.com/google-gemini-goes-live-heres-what-to-expect

Google intends to give developers access to Gemini, allowing them to build their own AI apps via its APIs. This marks a departure from past models, whose access was restricted, by opening up to the development community.
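
As a minimal sketch of what that developer access could look like – assuming the google-generativeai Python SDK and a placeholder API key – a text-only call is just a few lines:

```python
# Minimal "hello world" sketch of developer access to Gemini, assuming the
# google-generativeai Python SDK; the API key is a placeholder.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro")  # text-only model variant

response = model.generate_content("Summarize what multimodal AI means in one sentence.")
print(response.text)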

An AI model’s parameter count is an important factor in determining its strength. While GPT-4 reportedly contains about 1.75 trillion parameters, Gemini is claimed to have between 30 and 65 trillion. If those claims hold, this increase would signify potentially revolutionary power for Gemini.

Google Gemini in Practice

Despite the marketing claims, Gemini’s true potency remains unknown. According to independent reports, Google spent heavily on training the model, using powerful hardware and vast volumes of data. However, how much real value people will notice in everyday use remains an open question.

Conclusion

While Google claims Gemini will revolutionize AI, the gap between marketing promises and the model’s demonstrated capabilities raises concerns. The question remains how much power Gemini has truly achieved.

Gemini’s picture recognition skills are noteworthy when viewed in isolation and presented accurately (as shown in this Google blog). They appear comparable to OpenAI’s multimodal GPT-4V (GPT-4 with vision) model, which can likewise recognize the content of static images. When edited together for promotional purposes, however, the footage creates the impression that Gemini is more capable than it actually is, and that impression is what generated so much excitement.

In short, a more accurate portrayal of Gemini’s specific skills, free of marketing gimmicks, would show them to be genuinely outstanding. The promotional editing, however, suggested greater powers than the model actually has, and that inflated impression is what piqued the public’s attention.

What exactly is Google Gemini?

Google Gemini is Google’s new multimodal AI model. It drew widespread attention after a video presentation showcased its claimed ability to work with text, images, and sound.

In the promotional video, how was Gemini portrayed?

The video portrayed Gemini as an AI with extraordinary powers, including responding to voice commands, recognizing user artwork, and displaying a sense of humor.

Were the talents depicted accurate?

The video gave false impressions. While the interactions appeared impressive, they were scripted and were not executed in real time or through voice activation.

How did Google respond to the video’s concerns?

Google acknowledged that the demo departed from what was presented, confirming it was assembled from static images and staged interactions.

What makes Gemini unique among AI models?

Gemini draws attention for its claimed intelligence: it distinguishes objects, predicts events, and matches images with their sounds.

Does Google intend to incorporate Gemini into its products?

Yes, Google intends to incorporate Gemini into the bulk of its products and services.

In terms of access, how does Gemini differ from previous models?

Unlike previous models with limited access, Google plans to give developers access to Gemini so they can build their own AI apps via its APIs.

What distinguishes Gemini in terms of parameters?

Gemini is claimed to have between 30 and 65 trillion parameters, implying revolutionary strength compared to models such as GPT-4 if those claims hold.

What is the current understanding of Gemini’s real potency?

Gemini’s full power is still unclear, raising questions about the real-world value users will actually get.

What is your conclusion on Gemini’s capabilities?

While Gemini’s image recognition abilities are impressive, there are reservations about how much power the model has genuinely gained, casting doubt on its promised capabilities.