AI brand recommendations vary, marketers warned on tracking tools

Large language models generate inconsistent brand and product recommendations, making precise ranking measurements unreliable, according to a study by market research firm SparkToro and AI tracking startup Gumshoe.ai.

The research found that repeated prompts to major AI systems produced widely varying lists of recommended brands, with differences in which brands appeared, their ranking order and the number of recommendations provided. Researchers said the results suggest marketers should be cautious about relying on tools that claim to track exact brand rankings in AI-generated responses.

The study was conducted by SparkToro cofounder Rand Fishkin and Gumshoe.ai cofounder and chief technology officer Patrick O’Donnell. The research has not been peer reviewed, and the authors said neither of them is a data scientist, according to WARC.

Researchers recruited 600 volunteers who asked a dozen questions across three AI systems — ChatGPT, Claude and Google’s AI search tools — generating 2,961 responses in total. Participants were given general instructions but were free to phrase prompts in their own language.

Chef’s knives and science-fiction novels

Questions covered sectors including healthcare, real estate, fashion and computing. Example prompts included requests for chef’s knives for an amateur cook with a budget under $300 and recommendations for recently published science-fiction novels.

When a large language model was asked for brand recommendations 100 times, nearly every response differed in the brands listed, their order and the number of brands suggested. Lists ranged from as few as two or three items to more than ten.

Researchers said there was less than a one-in-100 chance that ChatGPT or Google’s AI systems would produce identical brand lists across repeated queries, with Claude only slightly more likely to repeat results.

While rankings varied widely, the study found greater consistency in how frequently certain brands appeared in responses. The authors said measuring brand visibility — the percentage of responses in which a brand appears — may provide a more reliable metric than ranking position.
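As a rough illustration of the visibility metric described above, the sketch below computes the percentage of responses in which each brand appears. It is not the study's code: the function name `brand_visibility` and the sample data are hypothetical, and it assumes each AI response has already been reduced to an ordered list of brand names.

```python
from collections import Counter


def brand_visibility(responses):
    """Return {brand: percent of responses that mention it at least once}.

    `responses` is a list of brand lists, one list per AI response.
    Rank position and list length are deliberately ignored.
    """
    if not responses:
        return {}
    counts = Counter()
    for brands in responses:
        # Count each brand once per response, even if it is repeated.
        counts.update(set(brands))
    total = len(responses)
    return {brand: 100.0 * n / total for brand, n in counts.items()}


# Made-up responses for illustration only; not data from the study.
sample_responses = [
    ["Bose", "Sony", "Apple"],
    ["Sony", "Bose"],
    ["Sennheiser", "Sony", "Bose", "Apple"],
    ["Apple", "Sony"],
]

visibility = brand_visibility(sample_responses)
for brand, pct in sorted(visibility.items(), key=lambda kv: -kv[1]):
    print(f"{brand}: {pct:.0f}% of responses")
```

In this toy sample no two responses share the same list or order, yet each brand's share of appearances is still well defined, which is the property the authors point to.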

Don’t throw money at AI tracking products

Across the three models tested, the average visibility rate for the three most frequently mentioned brands was 64 percent for ChatGPT, 73 percent for Claude and 68 percent for Google AI.

In one example involving travel headphones, researchers ran 142 prompts generating 994 responses. Products from Bose, Sony, Sennheiser and Apple appeared in between 55 percent and 77 percent of responses.

Fishkin said marketers should be wary of AI tracking tools that claim to provide precise ranking data.

“If you do nothing else after reading this report, please, please, marketers, analysts, and execs: stop throwing money at AI tracking products that don’t provide stats-backed, publicly reviewable research,” Fishkin said.
