- Microsoft, Amazon, and OpenAI have launched AI health tools, with Copilot handling 50 million daily health questions, indicating massive unmet demand.
- Researchers warn that products are being released before independent experts can evaluate them, a serious risk in a field as high-stakes as healthcare.
- Users lacking medical expertise may not know how to effectively use health chatbots, a gap that lab-based testing might miss.
- The absence of third-party data leaves it unclear if these tools help more than they harm, driving calls for transparency standards.
The artificial intelligence sector is witnessing an unprecedented surge in health-focused tool development. In recent weeks, tech giants including Microsoft, Amazon, and OpenAI have unveiled products designed to deliver medical advice through advanced chatbots. Microsoft introduced Copilot Health, a dedicated space within its app where users can link their medical records and ask specific health questions. Amazon expanded access to Health AI, a large language model-based tool previously limited to members of its One Medical service. These join OpenAI's ChatGPT Health, launched in January, and Anthropic's Claude, which can access user health records with permission.
This matters because AI is reshaping how people access healthcare: without rigorous, independent evaluation, millions of users could come to rely on unproven tools, with direct consequences for public safety.
Demand Drives the Boom
The driving force behind this trend is massive, unmet demand. Microsoft reports that its Copilot platform fields 50 million health-related questions daily, making health the most popular discussion topic on its mobile app. Karan Singhal, who leads OpenAI's Health AI team, confirms a rapid increase in ChatGPT usage for medical queries even before the specialized products debuted. This flood of inquiries reflects an uncomfortable reality: traditional healthcare is difficult to navigate, costly, and, for many populations, effectively out of reach. Girish Nadkarni, chief AI officer at the Mount Sinai Health System, notes that these tools have found their niche precisely because they fill a critical gap in care.
The Independent Evaluation Gap
Despite corporate enthusiasm, the six academic researchers interviewed for this analysis voiced the same concern: products are reaching the public before independent experts can rigorously assess their safety and efficacy. Andrew Bean, a doctoral candidate at the Oxford Internet Institute, argues that while it's plausible models have reached a point worth deploying, the evidence base must be solid. The risk lies in trusting companies to evaluate their own tools in a high-stakes area like health, especially if those evaluations aren't available for external review. Even rigorous internal research, such as that conducted by OpenAI, may have blind spots that the broader scientific community could identify.
Without trusted third-party evaluations, it remains genuinely unclear whether today's AI health tools help more than they harm.
Limitations of Lab Testing
Current studies suggest that real users, who lack medical expertise, may not know how to phrase questions in a way that draws useful answers from health chatbots. This gap between controlled lab conditions and real-world use is exactly the kind of problem that lab-based evaluations can miss. Dominic King, vice president of health at Microsoft AI and a former surgeon, attributes Copilot Health's launch to advances in generative AI's ability to answer health questions, though he acknowledges that demand is the other half of the equation. The ideal vision is that these chatbots improve users' health while easing pressure on the healthcare system, for example by helping with triage and deciding whether a symptom needs urgent medical attention.
Implications and What's Next
The proliferation of AI health tools raises urgent questions about regulation and transparency. No one demands perfection, but the absence of independent data leaves users and regulators unable to gauge the risks. As more companies join the race, pressure will grow to establish standards for testing these tools and publishing the results. The future of AI in health will depend on balancing rapid innovation with the responsibility to ensure these systems are not only popular but also safe and effective for all users.