Technology

Gemini Spark agent wowed—and then demanded oversight

Google’s new “24/7” Gemini Spark AI agent can draft emails, pull details from Drive, and set up calendar events after just a few minutes of prompting. But in a hands-on test, the agent also missed key personalization, linked to the wrong content, and required

On Friday afternoon, a “24/7” AI assistant did something that feels almost too slick to be real. In minutes, Gemini Spark assembled grocery totals from a 2026 spreadsheet, drafted an email in Gmail, and even addressed the recipient by her first name, despite the fact that her email address doesn’t include her first name.

It was a moment of genuine disbelief. “Wow, that’s actually nuts,” the tester said when the output arrived shortly after they ran the prompt.

Google has been selling Spark as an agent that can take on multi-step tasks in the background, so you can put your phone down and walk away from your computer. The company’s own Spark site pitches it as “always under your direction,” says “you choose to turn it on,” and adds that “it’s designed to check with you before taking major actions.” The promise is control plus autonomy.

In practice, the biggest surprise wasn’t that Spark could do tasks. It was how often the tester still felt forced to stay close to make sure it behaved.

The initial test: groceries, privacy questions, and a near-perfect draft

At Google I/O, VP Josh Woodward demonstrated Spark doing several things on command. In one example, he asked it to draft an email to a team at Google compiling Gemini Live launches and “wins from last week,” then used a special AI skill to make the email sound like him.

The tester tried to push the setup further at home. They asked Gemini to draft an email to their wife compiling their average monthly grocery spending in 2026.

The prompt was also designed as a three-part stress test: whether Spark could figure out who the wife was without being given her name, whether it could find the relevant budget spreadsheet in Drive even though the file name doesn’t include “budget,” and whether it could actually draft the email in Gmail.

Spark answered fast and specifically. It found the wife’s email address, pulled the right information from the 2026 budget spreadsheet, and took the monthly grocery totals, including incomplete data from May, a month that wasn’t finished when the test was run. It then averaged the totals and placed everything into a draft email in Gmail.
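
The averaging step is worth checking by hand, because a partial month pulls the average down. The sketch below is only an illustration of that effect: the dollar amounts and the function name are hypothetical, not figures from the test or anything Spark exposes.

```python
from statistics import mean

# Hypothetical monthly grocery totals in dollars (NOT figures from the test).
monthly_totals = {
    "Jan": 800.0,
    "Feb": 760.0,
    "Mar": 840.0,
    "Apr": 800.0,
    "May": 300.0,  # partial month: the test ran before May was over
}


def average_spend(totals, skip=()):
    """Average the monthly totals, leaving out any months named in `skip`."""
    values = [amount for month, amount in totals.items() if month not in skip]
    return round(mean(values), 2)


# Average over every month, the way the article says Spark did it.
avg_with_partial = average_spend(monthly_totals)

# Average over complete months only, for comparison.
avg_complete_only = average_spend(monthly_totals, skip=("May",))

print(avg_with_partial, avg_complete_only)
```

With these made-up numbers, including the unfinished month lowers the average from 800.0 to 700.0, which is the kind of detail a reader of the drafted email would want flagged.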

The draft addressed the wife by her first name, included a sign-off the couple uses for each other, and did it all without the wife’s first name being present in the email address used.

The same push, different result

When the tester moved to Woodward’s next onstage example—planning a block party—the attempt didn’t land.

Instead of producing a clean plan, Spark created a table of friends and family as a “highly realistic reference for who is bringing what,” drafted an email in Gmail that mentioned a shared sign-up sheet that doesn’t exist, and created an ugly deck with slides detailing information about city permits.

To see whether Spark could correct its own misfire, the tester asked it to create the missing sign-up sheet and add a link to the email that was already drafted. After a few minutes of figuring it out, Spark did complete that task: it created a spreadsheet and returned to the draft email text to drop in the link.

In other words, Spark could recover. But it also made the user pay for the recovery in time and attention.

Calendar color, family messages, and the parts that didn’t work

Woodward’s final demo at I/O centered on asking Spark to handle multiple items: making his meetings with CEO Sundar Pichai hot pink on his calendar, writing a note to a new neighbor to invite him to his block party, and creating a document to help with to-dos for his kids for the end of the school year.

The tester adapted the request to their own life. They asked Spark to make a calendar event each month ahead of their wife’s birthday and make it hot pink, draft an email to their family about sending them the first episode of the latest season of Taskmaster, and create a document with the top things they and their wife need to know about getting their toddler ready for preschool.

They started the request at 3:35PM PT on Friday. During the I/O keynote, Woodward made a show of putting his phone down and promising to check the results later, which he did. After one hiccup (Spark wanted to access the tester’s contacts, and the tester declined), the task finished about four minutes later.

The outcome was good, but not seamless.

The tester’s Google calendar received events from 9–10AM on the correct day of each month leading up to their wife’s birthday. The reminders were in what Google calls “flamingo,” which the tester said isn’t exactly “hot pink,” but close enough.
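
For readers who want to verify the dates themselves, the schedule can be sketched in a few lines. This is a rough illustration under stated assumptions, not how Spark works: the function name, the start date, and the birthday are all hypothetical, and the rule (same day of the month as the birthday, 9 to 10AM, clamped to the month's last day) is one reading of “the correct day of each month.”

```python
import calendar
from datetime import date, datetime, time


def monthly_reminders(today, birthday):
    """Return (start, end) pairs, one per month before the birthday month.

    Each event falls on the birthday's day of the month, clamped to the
    last day of shorter months, and runs from 9AM to 10AM.
    """
    events = []
    year, month = today.year, today.month
    while (year, month) < (birthday.year, birthday.month):
        last_day = calendar.monthrange(year, month)[1]
        event_date = date(year, month, min(birthday.day, last_day))
        if event_date >= today:  # skip a date that has already passed
            start = datetime.combine(event_date, time(9))
            end = datetime.combine(event_date, time(10))
            events.append((start, end))
        # Advance to the next month, rolling over the year if needed.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return events


# Hypothetical dates, chosen only for illustration.
reminders = monthly_reminders(date(2026, 6, 1), date(2026, 9, 15))
for start, end in reminders:
    print(start.isoformat(), "to", end.isoformat())
```

Comparing a list like this against what lands on the calendar is a quick way to confirm the agent picked the right days.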

Spark also grabbed emails of the tester’s immediate family and put them in a draft email. But the draft oddly didn’t include the wife.

The email text got the first episode of the latest season of Taskmaster right, but the link pointed to a trailer instead of the actual episode. The draft also included “loool,” which is something the tester writes in casual written conversation.

For the preschool planning document, Spark made a Google Doc in the tester’s Drive with a preschool preparation checklist. However, it only gave access to the tester. When asked if it could grant access to the wife, Spark said it isn’t currently able to do that.

The real question isn’t “can it do it?”—it’s “should it do it alone?”

The tester’s overall takeaway is less about whether Spark can deliver useful results and more about where the control boundaries should be.

As with all AI tools, they say, you still have to check the output to make sure it’s accurate. And that becomes harder when the system is pulling from personal information to prepare things you’re going to share with people you know.

Even though Google pitches Spark as something that can operate on its own, the tester found themselves constantly watching it or checking notifications coming to their phone. The unease is blunt: what good is an assistant if you end up micromanaging it instead of trusting it?

There’s also an energy question in the background. The tester asks why a power-hungry data center should be used for “relatively inconsequential tasks” when the same tasks could be done manually—just with more time.

What you pay, where it’s available, and the ecosystem trap

Spark comes with concrete limitations that matter for anyone tempted by the “24/7” pitch.

Currently, Spark is only available to subscribers of Google’s AI Ultra plan, which starts at $99.99 per month. It’s also only available to users in the US and only in English.

Google provided the tester with free access to test Spark, and they don’t believe it’s good enough to justify springing for those expensive plans on its own.

The system also works best if you’re already deep in the Google ecosystem and have Personal Intelligence on. The tester says they’ve had a Google account for around two decades, giving Spark a large pool of historical data to draw from.

Google, the tester notes, promises that Gemini “doesn’t train directly” on a Gmail inbox with Personal Intelligence turned on. Still, the tester says you’re left to trust Google to be a “good steward” of the data.

For now, they’re not sure the combination of cost and risk is worth it.

4 Comments

  1. Not gonna lie this sounds like it’s basically reading my stuff. If it can pull from Drive and set calendar stuff, what’s the point of me even being there?

  2. I don’t get the “required oversight” part… like isn’t that the whole Google thing? If it linked to the wrong content then yeah, maybe it’s just buggy, but people act like it’s evil. Also how did it get the first name if it’s not in the email?? maybe the spreadsheet already had it.

  3. This is why I don’t trust “24/7” anything. They say it checks with you before major actions but then it drafts and schedules like it’s doing chores for you. Missed personalization and grabbed the wrong stuff… so it’s like my cousin trying to order groceries. If Google wants oversight, then they should be the ones babysitting it, not me.
