Siri AI on macOS 27 Golden Gate stumbles early

1 5 minutes read

Siri AI on macOS 27 Golden Gate stumbles early

My first – In just over 24 hours with the macOS 27 developer beta’s new Siri AI, the author finds it promising inside Apple’s own apps but unreliable across workflows—especially when it can’t clearly index files, can’t run actions inside apps, and misreads data in spread

When a voice assistant fails the first real task you give it, you feel it right away.

This time. it happened during just over 24 hours of testing Siri AI in the macOS 27 developer beta for macOS 27 Golden Gate. The preview is early, and the author expects “lots of runway for improvements” before the feature releases later this year. Even so. the testing begins with a basic problem: they don’t know whether Siri AI is indexing files and folders on their review devices. which are an M5 MacBook Air and an M5 Max MacBook Pro.

On iOS 27’s developer beta, there had been an “indexing in progress” box in settings, but on the macOS beta there’s no such box. When the author asks Siri if it can tell them, Siri says to click a button in Settings that isn’t there.

The author’s skepticism isn’t new. They had turned off Siri on the Mac years ago and “never looked back. ” and they found Apple Intelligence “fruitless” enough that they never engaged with it. Colleagues. however. got a head start testing Siri AI on the iPhone and Apple Watch and reported positive feedback on its general vibe. On the Mac, the author’s feelings are more mixed.

They try to translate the hype into practical work. Benchmarks are a recurring grind in their laptop reviews: they benchmark repeatedly, screenshot results, then average scores later before logging everything in a spreadsheet.

Siri AI can launch apps, the author finds, but it can’t take actions inside them. They also explore automations via Shortcuts—described here as a new part of Apple Intelligence rather than a Siri AI feature. The author asks Shortcuts to run a test in either Geekbench or Cinebench. capture results in a screenshot. wait a few minutes. and repeat the process two more times. The automations don’t actually run the benchmarks. Apple Intelligence creates a Geekbench shortcut that opens Geekbench and takes screenshots, but forgets to run the benchmark. A Cinebench shortcut includes “Wait for you to run the test” as a step.

The author’s verdict is blunt: something important is missing, and this shortcut feels “passive aggressive.” If Siri can’t automate the heavy work, they turn to the smaller pain point—logging.

WWDC’s keynote showed someone using Ask Siri in Spotlight to analyze data in local files. The author follows that lead. They select batches of screenshots in Finder and ask Siri to calculate the average scores for them. Most of the time, it works.

Siri is smart enough to separate single-core CPU scores from multicore CPU scores and GPU scores. It averages the test results and arranges them into easy-to-read tables.

But the limits show up when the inputs get messy. Siri can be thrown off if the screenshots include too many different test types—especially if the author mixes synthetic score results like Geekbench and PugetBench with time-based results such as Blender render tests and the author’s 4K video export test. Sometimes it’s also disrupted by CPU rankings data visible in Cinebench screenshots.

The author says that what would save real time is accuracy across “the 15 or so averages” drawn from dozens of screenshots at once. For now, Siri can at best help a little, and it can also make mistakes—screwing up the numbers by pulling the wrong data.

When the author tests Siri’s broader screen understanding, they find it behaves more like a tool that’s strong within the Apple ecosystem than outside it.

Asked to find “pictures of cats or babies,” Siri pulls results from Apple’s Photos and Messages apps. That may be enough for many people. but not for this workflow: most of the author’s messaging is handled in Signal. and photos from their phone are uploaded to Google Photos rather than iCloud. Siri also misses the thousands of images in the author’s Lightroom Classic catalog. even though the files are stored locally in the Pictures folder and the author keeps telling Siri to access them directly. The author can’t tell whether those files are simply not indexed yet.

The author draws a direct comparison to another earlier tool test: the vibe is similar to when they tested Copilot Vision last year. In that comparison, Siri’s Visual Intelligence can answer questions about what’s on screen, but it’s limited. When the author asks Siri to evaluate benchmark results on a spreadsheet in Google Sheets. Siri can’t see all the data if it isn’t visible on screen at once.

A workaround does exist: the author can download the Google Sheet as an Excel file. point Siri at it in Finder. and get better coverage. But when they ask for the laptop with the highest single-core Geekbench score. Siri gives multicore data instead—again. not great. and the columns aren’t more clearly handled.

The failures aren’t confined to numbers. The author tests Siri inside Lightroom Classic using a black-and-white photo from a Ricoh GR IV Monochrome review. They ask Siri how to make the image look more like street photographer Alan Schaller’s work. Siri offers specific adjustment suggestions—exposure and contrast among them—and applying those values produces a decent result.

Then comes the moment that highlights another kind of mismatch. When the author asks Siri to judge the result, Siri goes sycophantic, saying the author has “nailed the look” and achieved an “almost timeless feel.” That kind of response contradicts what Apple says it isn’t supposed to do.

A separate test uses a classic Garry Winograd photo. The author asks for Lightroom settings changes to match it. Siri recommends setting the exposure to the value it’s already at—meaning the advice doesn’t move the author forward.

The photo, in the author’s own view, barely made it into their Ricoh GR IV Monochrome review. Instead of that nuance, Siri offers praise the author calls undeserved. The overall feeling is that Siri isn’t built to be a careful critique partner. It may be trying too hard to be agreeable.

Still, the author repeatedly circles back to timing and context. It’s still very early days for Siri AI. and a lot can change before the final release later this year. What’s already clear. though. is that the experience is likely to be very different between iPhone and Mac—because on an iPhone. far more data sits inside Apple’s own apps. while on a Mac the author is bouncing between many apps and ecosystems that constrain what Siri can do.

For this reviewer. that makes the early version feel like something promising but incomplete—its most useful and helpful behavior showing up in the areas where Apple’s ecosystem makes the information easier for Siri to access. They land on a cautious conclusion: faint praise. but still the most useful and helpful Siri they’ve tested—“baby’s first real AI steps for Apple.”.

Siri AI macOS 27 Golden Gate Apple Intelligence developer beta benchmarking Geekbench Cinebench Shortcuts Ask Siri Spotlight Visual Intelligence Lightroom Classic Google Sheets Copilot Vision

Ana Souza 2 hours ago

1 5 minutes read

Leave a Reply Cancel reply