After months of hype, Google and Microsoft announced the imminent arrivals of Bard and a ChatGPT-integrated Bing search engine within 24 hours of one another. At first glance, both tech giants’ public demonstrations appeared to display potentially revolutionary products that could upend multiple industries. But it wasn’t long before even cursory reviews highlighted egregious flaws within Google’s Bard suggestions. Now, it’s Microsoft’s turn for some scrutiny, and the results are as bad as Bard’s, if not worse.
Independent AI researcher Dmitri Brereton published a blog post Monday detailing numerous glaring issues in their experience with a ChatGPT-powered Bing. Bing’s demo frequently contained shoddy information: it gave inaccurate details about recommended products, omitted or misstated details about travel stops, and even misrepresented seemingly straightforward financial reports. In the last instance, Bing’s AI summary of basic financial data (something that should be “trivial” for AI, per Brereton) contained completely false statistics that appeared out of nowhere.
But even when correct, Bing may have grossly sidestepped simple ethical guardrails. According to one report from PCWorld’s Mark Hachman, the AI provided Hachman’s children with a litany of ethnic slurs when asked for cultural nicknames. Although Bing prefaced its examples by cautioning that certain nicknames are “neutral or positive, while others are derogatory or offensive,” the chatbot didn’t appear to bother categorizing its results. Instead, it simply created a laundry list of good, bad, and extremely ugly offerings.
Microsoft’s director of communications, Caitlin Roulston, told The Verge that the company “expect[ed] that the system may make mistakes during this preview period, and the feedback is critical to help identify where things aren’t working well so we can learn and help the models get better.”
As companies inevitably rush to integrate “smart” chatbot capabilities into their ecosystems, critics argue it’s vital that these issues be tackled and resolved before widespread adoption. For Chinmay Hegde, an associate professor at NYU Tandon School of Engineering, the missteps were wholly unsurprising, and Microsoft debuted its technology far too early.
“At a high level, the reason why these errors are happening is that the technology underlying ChatGPT is a probabilistic [emphasis Hegde] large language model, so there is inherent uncertainty in its output,” he writes in an email to PopSci. “We can never be absolutely certain what it’s going to say next.” As such, programs like ChatGPT and Bard may be good for tasks where there is no unique answer, like writing jokes or suggesting recipe ideas, but not so much when precision is required, such as stating historical facts or constructing logical arguments, says Hegde.
“I am shocked that the Bing team created this pre-recorded demo filled with inaccurate information, and confidently presented it to the world as if it were good,” Brereton writes in their blog post before admonishing, “I am even more shocked that this trick worked, and everyone jumped on the Bing AI hype train without doing an ounce of due diligence.”