AI chatbot maker Babylon Health attacks clinician in PR stunt after he goes public with safety concerns

U.K. startup Babylon Health pulled app data on a critical user in order to create a press release in which it publicly attacks the UK physician who has spent years raising patient safety concerns about its symptom triage chatbot service.

In the press release, issued late Monday, Babylon refers to Dr. David Watkins, via his Twitter handle, as a “troll” and claims he’s “targeted members of our staff, partners, clients, regulators and journalists and tweeted defamatory content about us.”

It also writes that Watkins has clocked up “hundreds of hours” and 2,400 tests of its service in a bid to discredit his safety concerns, saying he’s raised “fewer than 100 test results which he considered concerning”.

Babylon’s PR also claims that in only 20 instances did Watkins find “genuine errors in our AI,” whereas other instances are couched as “misrepresentations” or “mistakes,” per an unnamed “panel of senior clinicians,” which the startup’s PR says “investigated and re-validated every single one”, suggesting the error rate Watkins identified was just 0.8%.

Screengrab from Babylon’s press release which refers to Dr Watkins’ “Twitter troll tests”

Responding to the attack in a telephone interview with TechCrunch, Watkins described Babylon’s claims as “absolute rubbish”, saying, for example, that he has not carried out anywhere near 2,400 tests of its service.

Asked how many tests he thinks he did complete, Watkins suggested it’s likely to be between 800 and 900 full runs through “complete triages”. (“Many” of these, he said, would have been repeat tests to reproduce a problem, to record the problem for the purpose of posting it to Twitter, or to check whether the company had fixed issues he’d previously noticed. Watkins also pointed out that some of the triage runs would have been carried out to demonstrate problems to journalists who wanted to check the issues he was flagging for themselves.)

He said he identified problems in about one in three instances of testing the bot, though he says that in 2018 he was finding far more problems, claiming it was “one in one” at that stage for an earlier version of the app.

Watkins suggests that to arrive at the 2,400 figure Babylon is likely counting instances where the chatbot was opened but a full triage wasn’t completed, for a variety of reasons, such as lagging or glitching and errors in the responses entered.

“They’ve manipulated data to try and discredit someone raising patient safety concerns,” he said.

He said he has undertaken “relatively focused testing mostly concentrated on common presentations or red flag cases.”

“I know what I’m looking for because I’ve done this for the past three years, and I’m looking for the same issues which I’ve flagged before to see whether they’ve fixed them. So trying to suggest that my testing is actually any indication of the accuracy of the chatbot overall is absurd in itself,” he added.

In another pointed attack, Babylon writes that Watkins has “posted over 6,000 misleading attacks”, without specifying exactly what sort of attacks it’s referring to (or where they’ve been posted).

Watkins told us he hasn’t even tweeted 6,000 times in total since joining Twitter four years ago, though he has spent three years using the platform to raise concerns about diagnosis problems with Babylon’s chatbot.

Such as this series of tweets in which he shows a triage for a female patient failing to pick up a potential heart attack.

Watkins told us he has no idea what the 6,000 figure refers to, accusing Babylon of having a culture that is “trying to silence criticism” rather than engage with genuine clinician concerns.

“Not once have Babylon actually approached me and said, ‘Hey Dr Murph or Dr Watkins, what you’ve tweeted there is misleading’,” he added. “Not once.”

Instead, he said the startup has consistently taken a “dismissive approach” to the safety concerns he’s raised. “My overall concern with the way they’ve approached this is that yet again they have taken a dismissive approach to criticism and again tried to smear and discredit the person raising concerns,” he said.

Watkins, a specialist oncologist at The Royal Marsden NHS Foundation Trust who has for several years gone by the online (Twitter) name of @DrMurphy11, tweeting videos of Babylon’s chatbot triage that he says illustrate the bot failing to correctly identify patient presentations, made his identity public on Monday when he attended a debate at the Royal Society of Medicine.

There he gave a presentation calling for less hype and more independent verification of claims being made by Babylon, as such digital systems continue to elbow their way into the healthcare space.

In the case of Babylon, the app has a major cheerleader in the current U.K. Secretary of State for Health, Matt Hancock, who has revealed he’s a personal user of the app.

At the same time, Hancock is pushing the National Health Service to overhaul its infrastructure to enable the plugging in of “healthtech” apps and services, so you can see the political synergies.

Watkins argues the sector needs more of a focus on robust evidence gathering and independent testing, versus mindless ministerial support and partnership “endorsements” as a stand-in for due diligence.

He points to the example of Theranos, the disgraced blood-testing startup whose co-founder is now facing fraud charges, saying this should provide a major red flag about the need for independent testing of ‘novel’ health product claims.

“[Over-hyping of products] is a tech industry issue which unfortunately seems to have infected healthcare in a couple of situations,” he told us, referring to the startup ‘fake it til you make it’ playbook of hype marketing and scaling without waiting for external verification of heavily marketed claims.

In the case of Babylon, he argues the company has failed to back up its puffy marketing with robust evidence of the sort of comprehensive clinical testing and validation which he says should be essential for a health app that’s out in the wild being used by patients. (The suggestion is that the academic studies it has referred to have not been backed up by giving outsiders access to datasets so they can independently verify the claims.)

“They’ve got backing from the founders of Google DeepMind, [partnerships with] Bupa, Samsung, Tencent … the Saudis have invested hundreds of millions and they’re a [2] billion dollar company. They’ve got the [personal] support of Matt Hancock … Everything looks trustworthy,” Watkins went on. “But there is no basis for that trustworthiness. You’re basing the trustworthiness on the ability of a company to partner. And you’re making the assumption that those partners have carried out due diligence.”

Babylon has also recently inked a 10-year deal with the Royal Wolverhampton NHS Trust, which includes remote access to GPs and hospital specialists, live monitoring of patients with chronic conditions and customised care plans underpinned by its AI.

For its part, Babylon claims the opposite, saying its app meets existing regulatory requirements and pointing to high “patient satisfaction ratings” and a lack of reported harm by users as evidence of safety, writing in the same PR in which it lays into Watkins:

Our track record speaks for itself: our AI has been used millions of times, and not one single patient has reported any harm (a far better safety record than any other health assessment in the world).

But proposing to judge the efficacy of a health-related service by a patient’s ability to complain if something goes wrong seems, at the very least, an unorthodox approach, flipping the Hippocratic oath principle of ‘first do no harm’ on its head. (Plus, speaking theoretically, someone who’s dead would literally be unable to complain, which could open a rather large loophole in any ‘safety bar’ being claimed via such an evaluation methodology.)

On the regulatory point, Watkins argues that the current UK regime does not provide adequate assurances that a development like AI chatbots is as reliable and safe as advertised.

He says complaints he’s filed with the MHRA (Medicines and Healthcare products Regulatory Agency) appear to result in it simply asking Babylon to look into issues. The responses have often been very slow, with minimal feedback on any changes, he adds, while noting that confidentiality clauses limit what the regulator can disclose.

(On that point, when TechCrunch contacted the MHRA in 2018 to ask about concerns being raised about the Babylon chatbot at that time, we were told the regulator could not discuss the app because information relating to the compliance of medical devices with the Medical Devices Regulations 2002 and underlying European legislation is confidential between the MHRA and the company. It would only provide us with the following general statement: “We routinely carry out post-market surveillance and maintain dialogue with manufacturers about their compliance with the Regulations; this forms part of our routine work at MHRA. Patient safety is our highest priority and should anything be identified during our post-market surveillance we take action as appropriate to protect public health.”)

All of that might look like a plum opportunity for a certain kind of startup ‘disruptor’, of course.

And Babylon’s app is one of several now applying AI-type technologies as a diagnostic aid in chatbot form, across a number of global markets.

Yet, says Watkins, if you read certain headlines and claims made for the company’s product in the media you might be forgiven for coming away with a very different impression, and it’s this level of hype that has him worried.

Other, less hype-dispensing chatbots are available, he suggests, name-checking Berlin-based Ada Health as taking a more thoughtful approach on that front.

Asked whether there are specific tests he would like to see Babylon carry out to back up its hype, Watkins told us: “The starting point is getting a technology which you feel is safe to actually be in the public domain.”

Notably, the European Commission is working on a risk-based regulatory framework for AI applications, including for use-cases in sectors such as healthcare, which would require such systems to be “transparent, traceable and guarantee human oversight”, as well as to use unbiased data for training their AI models.

“Because of Babylon’s previous hyperbolic claims [with regard to the capabilities and safety of the chatbot] there’s a big problem,” Watkins suggested, raising concerns about what he said is misleading wording used in the app and about media coverage of the AI that has flowed from Babylon’s hyped marketing. “How do they now roll back and make sure people understand the purpose of the chatbot?”

Specifically, he finds the chatbot’s disclaimer confusing. It says the service is for ‘information only’ and notes: ‘The service does not diagnose your own health condition or make treatment recommendations for you. Our triage and information service is not a substitute for a doctor or other healthcare professional.’

“The chatbot presents itself as offering patients a suggested diagnosis and suggests what action they should consider. But at the same time they have a disclaimer saying this isn’t giving you any healthcare advice, it’s just information. It doesn’t make sense. I don’t know what a patient’s supposed to make of that,” he added, suggesting different wording should be used to tell patients what the technology should be used for.

“Babylon always present themselves as very patient-facing, very patient-focused: we listen to patients, we hear their feedback.

“There are other chatbots which I think have defined their purpose far more clearly [he also cites MayaMD], where they are very clear in their intent, saying we’re not here to provide you with healthcare advice; we will provide you with information which you can take to your doctor to allow you to have a more informed conversation with them. And when you put it in that context, as a patient I think that makes perfect sense. This machine is going to give me information so I can have a more informed discussion with my doctor. Great. So there are simple things which they just haven’t done.”

Watkins also aired his frustration at feeling he has to raise these issues: “It drives me nuts. I’m an oncologist; it shouldn’t be me doing this.”

He suggested Babylon’s response to his raising “good faith” patient safety concerns is a worrying sign, pointing to a deeper malaise within the company’s culture.

“What they’ve done is use their chatbot app data to intimidate me as an identifiable individual,” he said of the company’s attack on him, adding that “troll” is an inappropriate term for someone raising safety concerns and that Babylon’s use of it “constitutes a personal attack on me as an individual”.

“I’m concerned that there will be clinicians in that company who, if they see this happening, are going to think twice about raising concerns, because you’ll just get discredited in the organization. And that’s really dangerous in healthcare,” Watkins added. “You have to be able to speak up when you see problems because otherwise patients are at risk of harm and things don’t change. You have to learn from error when you see it. You can’t just carry on doing the same thing again and again and again.”

“It is disappointing that Babylon have chosen to attack me as an individual, instead of outlining their plans to address the alarmingly flawed triage algorithms,” he added.

Others in the medical community have been quick to criticize Babylon for targeting Watkins in such a personal way and for disclosing details of his use of its (medical) service.

As one Twitter user, Sam Gallivan, also a doctor, put it: “Can other high frequency Babylon Health users look forward to having their private medical queries broadcast in a press release?”

Can other high frequency @babylonhealth users look forward to having their private medical queries broadcast in a press release?

— Sam Gallivan (@samgal) February 25, 2020

The move certainly raises questions about Babylon’s approach to sensitive health data, if it’s accessing patient information for the purpose of trying to steamroller informed criticism.

We’ve seen similarly ugly stuff in tech before, of course, such as when Uber kept a ‘god view’ of its ride-hailing service and used it to keep tabs on critical journalists. In that case the misuse of platform data pointed to a toxic culture problem that Uber has had to spend subsequent years sweating to reverse (including changing its CEO).

Babylon’s selective data dump on Watkins is also an illustrative example of a digital service’s ability to access and shape individual data at will, pointing to the underlying power asymmetries between these data-capturing technology platforms (which are gaining increasing agency over our decisions) and their users, who only get highly mediated, hyper-controlled access to the databases they help to feed.

Watkins, for example, told us he is no longer able to access his query history in the Babylon app, providing a screenshot of an error screen that he says he now sees when he tries to access chat history in the app. He said he does not know why, in recent months, he has no longer been able to access his historical usage information, cutting off a reference point that could help with further testing.

If it’s a bug it’s a convenient one for Babylon PR …

We contacted Babylon to ask it to respond to criticism of its attack on Watkins. The company defended its use of his app data to generate the press release, arguing that the “volume” of queries he had run means the usual data protection rules do not apply, and further claiming it had only shared “non-personal statistical data”, even though this was linked in the PR to his Twitter identity (and therefore, since Monday, to his real name).

In a statement, a Babylon spokesperson told us:

If safety related claims are made about our technology, our medical professionals are required to look into these matters to ensure the accuracy and safety of our products. The data shared by us was non-personal statistical data, and Babylon has complied with its data protection obligations throughout.

We also asked the UK’s data protection watchdog about the episode and Babylon making Watkins’ app usage public. The ICO told us: “People can expect that organisations will handle their personal information responsibly and securely. If anyone is concerned about how their data has been handled, they can contact the ICO and we will look into the details.”

Babylon’s medical innovation director, Dr Keith Grimes, attended the same Royal Society debate as Watkins on Monday, which was entitled Recent developments in AI and digital health 2020 and billed as a conference that will “cut through the hype around AI”.

So it seems no accident that its attack press release was timed to follow hard on the heels of a presentation it would have known (since at least last December) was coming that day, and in which Watkins argued that, where AI chatbots are concerned, “validation is more important than valuation”.

Last summer Babylon announced a $550M Series C raise, at a $2BN valuation.

Investors in the company include Saudi Arabia’s Public Investment Fund, an unnamed U.S.-based health insurance company, Munich Re’s ERGO Fund, Kinnevik, Vostok New Ventures and DeepMind co-founder Demis Hassabis, among others helping to fund its marketing.

“They came with a story,” said Watkins of Babylon’s message to the Royal Society.

The clinician’s counter-message to the event was to pose a question EU policymakers are only just starting to consider, calling for the AI maker to publish datasets that back up its safety claims.

This report was updated with additional comment.
