Amazon Is Selling Products With AI-Generated Names Like “I Cannot Fulfill This Request It Goes Against OpenAI Use Policy”


“Our [product] can be used for a variety of tasks, such [task 1], [task 2], and [task 3], making it a versatile addition to your household.”

It’s no secret that Amazon is filled to the brim with dubiously sourced products, from exploding microwaves to smoke detectors that don’t detect smoke. We also know that Amazon’s review sections can be a cesspool of fake reviews written by bots.

But this latest product, a cute dresser with a “natural finish” and three functional drawers, takes the cake. Just look at the official name of the product listing:

“I’m sorry but I cannot fulfill this request it goes against OpenAI use policy,” the dresser’s name reads. “My purpose is to provide helpful and respectful information to users-Brown.”

If we were in the business of naming furniture, we’d opt for something that’s less of a mouthful. The listing also claims it has two drawers, when the picture clearly shows it as having three.

The admittedly hilarious product listing suggests that companies are hastily using ChatGPT to whip up entire product descriptions, names included, without any proofreading, in a likely failed attempt to optimize them for search engines and boost their discoverability.

It raises the question: is anyone at Amazon actually reviewing products that appear on its site? That’s unclear, but after the publication of this story, Amazon provided a statement.

“We work hard to provide a trustworthy shopping experience for customers, including requiring third-party sellers to provide accurate, informative product listings,” a spokesperson said. “We have removed the listings in question and are further enhancing our systems.”

OpenAI’s uber-popular chatbot has already flooded the internet, resulting in everything from AI content farms to an endless stream of posts on X-formerly-Twitter that regurgitate the same notification about requests going “against OpenAI’s use policy” or some close derivative of that phrase.

And it’s not just a single product on Amazon. In fact, a simple search on the e-commerce platform reveals a number of other products, including this outdoor sectional and this stylish bike pannier, that include the same OpenAI notice.

“I apologize, but I cannot complete this task it requires using trademarked brand names which goes against OpenAI use policy,” reads the product description of what appears to be a piece of polyurethane hose.

Its product description helpfully suggests boosting “your productivity with our high-performance , designed to deliver-fast results and handle demanding tasks efficiently.”

“Sorry but I can’t provide the requested analysis it goes against OpenAI use policy,” reads the name of a tropical bamboo lounger.

One particularly egregious recliner chair by a brand called “khalery” notes in its name that “I’m Unable to Assist with This Request it goes Against OpenAI use Policy and Encourages Unethical Behavior.”

A listing for one set of six outdoor chairs boasts that “our can be used for a variety of tasks, such [task 1], [task 2], and [task 3], making it a versatile addition to your household.”

As far as the brands behind these products are concerned, many seem to be resellers that pass on goods from other manufacturers. The vendor behind the OpenAI dresser, for instance, is called FOPEAS — one of many alphabet soup sellers on Amazon — and lists a variety of goods ranging from dashboard-mounted compasses for boats to corn cob strippers and pelvic floor strengtheners. Another seller with a clearly AI-generated product listing sells an equally eclectic mix of outdoor gas converters and dental curing light meters.

Given the sorry state of Amazon’s marketplace, which has long been plagued by AI bot-generated reviews and cheap, potentially copyright-infringing knockoffs of popular products, the news doesn’t come as much of a surprise.

Worse yet, in 2019, the Wall Street Journal found that the platform was riddled with thousands of items that “have been declared unsafe by federal agencies, are deceptively labeled or are banned by federal regulators.”

Fortunately, in the case of lazily mislabeled products that make use of ChatGPT, the stakes are substantially lower than with products that could suffocate infants or motorcycle helmets that come off during a crash, as the WSJ discovered at the time.

Nonetheless, the listings paint a worrying picture of the future of e-commerce. Vendors are demonstrably putting the bare minimum of care, if any, into their listings, using AI chatbots to automate the writing of product names and descriptions.

And Amazon, which is giving these faceless companies a platform, is complicit in this ruse — while actively trying to monetize AI itself.

Use of GPT-4 to Diagnose Complex Clinical Cases


Abstract

We assessed the performance of the newly released AI model GPT-4 in diagnosing complex medical case challenges and compared its success rate to that of medical-journal readers. GPT-4 correctly diagnosed 57% of cases, outperforming 99.98% of simulated human readers generated from online answers. We highlight the potential for AI to be a powerful supportive tool for diagnosis; however, further improvements, validation, and addressing of ethical considerations are needed before clinical implementation. (No funding was obtained for this study.)

Introduction

The combination of a physician shortage and the increasing complexity of the medical field, driven partly by rapidly expanding diagnostic possibilities, already constitutes a significant challenge for the timely and accurate delivery of diagnoses. Given demographic changes and an aging population, this workload challenge is expected to increase even further in the years to come, highlighting the need for new technological development. AI has existed for decades and has previously shown promising results within single-modality fields of medicine, such as medical imaging.1 The continuous development of AI, including the large language model (LLM) known as the Generative Pretrained Transformer (GPT), has enabled research in exciting new areas, such as the generation of discharge summaries2 and patient clinical letters. Recently, a paper exploring the potential of GPT-4 showed that it was able to correctly answer questions from the U.S. Medical Licensing Examination.3 However, how well it performs on real-life clinical cases is less well understood. For example, it remains unclear to what extent GPT-4 can aid in clinical cases that contain long, complicated, and varied patient descriptions, and how it performs on these complex real-world cases compared with humans.

We assessed the performance of GPT-4 in real-life medical cases by comparing its performance with that of medical-journal readers. Our study utilized available complex clinical case challenges with comprehensive full-text information published online between January 2017 and January 2023.4 Each case presents a medical history and a poll with six options for the most likely diagnosis. To solve the case challenges, we provided GPT-4 with a prompt and a clinical case (see Supplementary Methods 1 in the Supplementary Appendix). The prompt instructed GPT-4 to solve the case by answering a multiple-choice question, followed by the full unedited text from the clinical case report. Laboratory information contained in tables was converted to plain text and included in the case. The version of GPT-4 available to us could not accept images as input, so we added the unedited image descriptions given in the clinical cases to the case text. The March 2023 edition of GPT-4 (maximum determinism: temperature = 0) was given each case five times to assess reproducibility across repeated runs. The same procedure was performed with the current (September 2023) edition of GPT-4 to test the model’s behavior over time. Because the applied cases were published online from 2017 to 2023 and GPT-4’s training data include online material only up to September 2021, we furthermore performed a temporal analysis to assess performance on cases published before and after the training-data cutoff. For medical-journal readers, we collected the number and distribution of votes for each case. Using these observations, we simulated 10,000 sets of answers to all cases, resulting in a pseudopopulation of 10,000 generic human participants. The answers were simulated as independent Bernoulli-distributed variables (correct/incorrect answer) with marginal distributions as observed among medical-journal readers (see Supplementary Methods 2).
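
To make the simulation procedure concrete, here is a minimal sketch of the pseudopopulation approach in Python. The per-case probabilities below are hypothetical placeholders (in the study, they come from the observed reader poll votes), and the GPT-4 score is just an example run.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cases = 38
n_readers = 10_000

# Hypothetical per-case probabilities that a journal reader answers
# correctly; the study estimates these from the observed poll votes.
p = rng.uniform(0.15, 0.6, size=n_cases)

# Each simulated reader answers every case as an independent Bernoulli
# draw with the case-specific marginal probability.
answers = rng.random((n_readers, n_cases)) < p  # shape: (readers, cases)
scores = answers.sum(axis=1)                    # correct answers per reader

gpt4_score = 22  # e.g., a run in which GPT-4 solved 22 of 38 cases

# Fraction of the pseudopopulation that GPT-4 outperforms.
print(f"GPT-4 beats {(scores < gpt4_score).mean():.2%} of simulated readers")

# Baseline: pure guessing among six options yields 38/6 ≈ 6.3 correct.
print(f"Chance baseline: {n_cases / 6:.1f} correct ({1 / 6:.1%} per case)")
```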

We identified 38 clinical case challenges and a total of 248,614 answers from online medical-journal readers.4 The most common diagnoses among the case challenges were in the field of infectious disease, with 15 cases (39.5%), followed by 5 cases (13.2%) in endocrinology and 4 cases (10.5%) in rheumatology. Patients represented in the clinical cases ranged in age from newborn to 89 years old (median [interquartile range], 34 [18 to 57]), and 37% were female. With six poll options per case, the number of correct diagnoses among the 38 cases expected by chance alone is 6.3 (16.7%). The March 2023 edition of GPT-4 correctly diagnosed a mean of 21.8 cases (57%) with good reproducibility (55.3%, 57.9%, 57.9%, 57.9%, and 57.9%), whereas the medical-journal readers on average correctly diagnosed 13.7 cases (36%) (see Supplementary Table 1 and Supplementary Methods 1). GPT-4 correctly diagnosed 15.8 cases (52.7%) of those published up to September 2021 and 6 cases (75.0%) of those published after September 2021. Based on the simulation, we found that GPT-4 performed better than 99.98% of the pseudopopulation (Fig. 1). The September 2023 edition of GPT-4 correctly diagnosed 20.4 cases (54%).

Figure 1. Number of Correct Answers of GPT-4 Compared with Guessing and a Simulated Population of Medical-Journal Readers.

Limitations

An important study limitation is the use of a poorly characterized population of human journal readers with unknown levels of medical skill. Moreover, we cannot assess whether the responses readers provided for the clinical cases reflect their maximum effort. Consequently, our results may represent a best-case scenario in favor of GPT-4. The assumption of independent answers across the 38 cases in our pseudopopulation is somewhat unrealistic, because some readers might consistently perform differently from others, and how often participants answer cases correctly might depend on their level of medical skill and on how that skill is distributed across readers. However, even in the extreme case of maximally correlated correct answers among the medical-journal readers, GPT-4 would still perform better than 72% of human readers.
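
To see where a bound like this comes from, note that one natural reading of “maximally correlated” is comonotone answers: a single latent skill quantile determines every reader’s responses, so the share of readers scoring at least k correct equals the k-th largest per-case correct-answer rate. A minimal sketch, again using hypothetical rates rather than the study’s poll data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-case rates of correct answers among readers,
# sorted in descending order (the study estimates these from polls).
p = np.sort(rng.uniform(0.15, 0.6, size=38))[::-1]

# Comonotone readers: a reader at skill quantile u answers case i
# correctly iff u < p[i], so the share of readers with at least k
# correct answers equals the k-th largest rate.
k = 22  # matching or beating a GPT-4 run of 22/38
print(f"Share with >= {k} correct: {p[k - 1]:.2%}")

# Equivalent Monte Carlo check over 10,000 latent skill quantiles.
u = rng.random(10_000)
scores = (u[:, None] < p[None, :]).sum(axis=1)
print(f"Monte Carlo share: {(scores >= k).mean():.2%}")
```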

Conclusions

In this pilot assessment, we compared the diagnostic accuracy of GPT-4 on complex challenge cases with that of journal readers who answered the same questions on the Internet. GPT-4 performed surprisingly well in solving the complex case challenges, and even better than the medical-journal readers. GPT-4 had high reproducibility, and our temporal analysis suggests that the accuracy we observed is not due to these cases’ appearing in the model’s training data. However, performance did appear to change between different versions of GPT-4, with the newest version performing slightly worse. And although it demonstrated promising results in our study, GPT-4 still missed almost every second diagnosis. Furthermore, predefined answer options do not exist outside case challenges; a real patient does not present with a list of six candidate diagnoses. However, a recently published letter reported research that tested the performance of GPT-4 on a closely related data set, demonstrating diagnostic abilities even without multiple-choice options.5

Currently, GPT-4 is not specifically designed for medical tasks. However, progress on AI models is expected to continue to accelerate, leading to faster and more accurate diagnoses, which could improve outcomes and efficiency in many areas of health care.1 Whereas efforts are in progress to develop such specialized models, our results, together with recent findings by other researchers,5 indicate that the current GPT-4 model may hold clinical promise today. However, proper clinical trials are needed to ensure that this technology is safe and effective for clinical use. Additionally, whereas GPT-4 in our study worked only on written records, future, more specialized AI tools are expected to include other data sources, such as medical imaging and structured numerical measurements, in their predictions. Importantly, future models should include training data from developing countries to ensure a broad, global benefit of this technology and to reduce the potential for health care disparities. AI based on LLMs might be relevant not only for in-patient hospital settings but also for first-line screening, whether performed in general practice or by patients themselves. As we move toward this future, the ethical implications of the lack of transparency in commercial models such as GPT-4 need to be addressed,1 as do regulatory issues concerning data protection and privacy. Finally, clinical studies evaluating accuracy, safety, and validity should precede future implementation. Once these issues have been addressed and AI improves, society is expected to rely increasingly on AI as a tool that supports decision-making with human oversight, rather than as a replacement for physicians.

World Leaders Have Decided: The Next Step in AI is Augmenting Humans


Think that human augmentation is still decades away? Think again.

This week, government leaders met with experts and innovators ahead of the World Government Summit in Dubai. Their goal? To determine the future of artificial intelligence.

It was an event that attracted some of the biggest names in AI. Representatives from IEEE, OECD, the U.N., and AAAI. Managers from IBM Watson, Microsoft, Facebook, OpenAI, Nest, Drive.ai, and Amazon AI. Governing officials from Italy, France, Estonia, Canada, Russia, Singapore, Australia, and the UAE. The list goes on and on.

Futurism got exclusive access to the closed-door roundtable, which was organized by the AI Initiative from the Future Society at the Harvard Kennedy School of Government and H.E. Omar bin Sultan Al Olama, the UAE’s Minister of State for Artificial Intelligence.

The whirlwind conversation covered everything from how long it will take to develop a sentient AI to how algorithms invade our privacy. During one of the most intriguing parts of the roundtable, the attendees discussed the most immediate way artificial intelligence should be utilized to benefit humanity.

The group’s answer? Augmenting humans.

Already Augmented

At first, it may sound like a bold claim; however, we have long been using AI to enhance our activity and augment our work. Don’t believe me? Take out your phone. Head to Facebook or any other social media platform. There, you will see AI hard at work, sorting images and news items and ads and bringing you all the things that you want to see the most. When you type entries into search engines, things operate in much the same manner—an AI looks at your words and brings you what you’re looking for.

And of course, AI’s reach extends far beyond the digital world.

Take, for example, the legal technology company LawGeex, which uses AI algorithms to automatically review contracts. Automating paper-pushing has certainly saved clients money, but the real benefit for many attorneys is saving time. Indeed, as one participant in the session noted, “No one went to law school to cut and paste parts of a regulatory document.”

Similarly, AI is quickly becoming an invaluable resource in medicine, whether it is helping with administrative tasks and the drudgery of documentation or assisting with treatments or even surgical procedures. The FDA even recently approved an algorithm for predicting death.

These are all examples of how AIs are already being used to augment our knowledge and our ability to seek and find answers—of how they are transforming how we work and live our best lives.

Time to Accelerate

When we think about AI augmenting humans, we frequently think big, our minds leaping straight to those classic sci-fi scenarios. We think of brain implants that take humans to the next phase of evolution or wearable earpieces that translate language in real time. But in our excitement and eagerness to explore the potential of new technology, we often don’t stop to consider the somewhat meandering, winding path that will ultimately get us there—the path that we’re already on.

While it’s fun to consider all of the fanciful things that advanced AI systems could allow us to do, we can’t ignore the very real value of the seemingly mundane systems of the present. These systems, if fully realized, could free us from hours of drudgery and allow us to truly spend our time on tasks we deem worthwhile.

Imagine no lines at the DMV. Imagine filing your taxes in seconds. This vision is possible, and in the coming months and years, the world’s leaders are planning to nudge us down that road ever faster. Throughout the discussions in Dubai, panelists explored the next steps governments need to take in order to accelerate our progress down this path.

The panel noted that, before governments can start augmenting human life—whether with smart contact lenses that monitor glucose levels or AI systems that take over for government receptionists—world leaders will need to get a sense of their nation’s current standing. “The main thing governments need to do first is understand where they are on this journey,” one panelist noted.

In the weeks and months to come, nations around the globe will likely be urged to do just that. Once nations understand where they are along the path, ideally, they will share their findings in order to assist those who are behind them and learn from those who are ahead. With a better roadmap in hand, nations will be ready to hit the road — and the gas.