If used, these algorithms should not be binding: healthcare professionals should always decide who should or could be treated and who should not.

In the collective imagination, talking about artificial intelligence inevitably takes us back to those science fiction movies that show us hyperrational beings, devoid of the emotions that would cloud their analytical skills, free of biases and preferences, 100% objective. This collective image can make us look favorably on using algorithms to make difficult decisions. It is a widespread assumption that an algorithm will not only make the best decisions and never be wrong, but that it will also be incapable of favoritism.

Unfortunately, things are not that simple. Roughly speaking, we can distinguish two types of algorithms: machine learning algorithms, which use mathematical models to find patterns and regularities in a large data set, and those designed by a team of experts who themselves specify the criteria for making a decision. What both have in common is that they are human creations, in which a set of rules to follow, objectives, and results considered minimally acceptable are defined. No algorithm is neutral.

When these algorithms are used to make critical decisions about the well-being of human beings, we must ensure that, beyond working correctly at the technical level, they are also fair. The use of automated systems in a hospital to decide, when resources are scarce, which patients should receive treatment for Covid-19 and which should not, is a very clear example of this situation.

If a database has been used to “train” the algorithm (something inevitable in machine learning), and even more so if the data have been collected in a hurry and under pressure, as in the current situation, the algorithm will probably be biased.

A database is never a mirror of reality. It is always, and necessarily, a partial selection, carried out with some objectives and based on a series of preconceived ideas and structured for specific purposes.

Thus, the database may be biased because a relevant part of the population was not included in it, or because another part was over-represented. This type of bias is the result of sampling error and, although it is a frequent and very serious problem, it is conceptually less complex and well known to statisticians.
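To give a concrete, and entirely hypothetical, sense of what such a check looks like, the short Python sketch below compares the share of each group in a dataset against its share in a reference population. The group names, shares, and tolerance are invented for illustration only.

```python
from collections import Counter

# Hypothetical illustration: compare the share of each demographic group in a
# training dataset against its share in the reference population, flagging
# groups that are clearly under- or over-represented.

def representation_gaps(samples, population_shares, tolerance=0.05):
    """samples: list of group labels in the dataset;
    population_shares: dict group -> expected share (0..1);
    tolerance: maximum acceptable absolute difference."""
    counts = Counter(samples)
    total = len(samples)
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = {"observed": round(observed, 3), "expected": expected}
    return gaps

# Made-up numbers: patients over 70 make up 40% of the relevant population,
# but only 15% of the hastily collected dataset.
dataset = ["under_70"] * 85 + ["over_70"] * 15
print(representation_gaps(dataset, {"under_70": 0.60, "over_70": 0.40}))
```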

We are more concerned with the bias that results from the attitudes and beliefs (implicit or explicit) of the people who decide how an algorithm “learns” and what counts as a “good” result. A very clear example of this problem is COMPAS, a program currently used by US judges that calculates the probability of recidivism in order to decide whether a person can be released on bail while awaiting trial. COMPAS is under suspicion of making biased decisions because many of the decisions in its historical data were made by judges who, contrary to the evidence, tend to consider that people of African-American ethnicity have a greater tendency to re-offend than people of Caucasian origin if they are released on bail.

Methods

How can we establish whether an algorithm is biased or not? Fortunately, there are various techniques for doing so. But to be sure that we are dealing with a fair algorithm, experts need access to the algorithm’s source code, to the database, and to the mathematical model used to train the program. This request often collides with the intellectual property rights of the algorithm’s developers. Naturally, we consider that, when evaluating programs that decide on people’s lives, guaranteeing justice and respect for rights is more important than the protection of intellectual property.
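As an illustration of one such technique, the sketch below compares false positive rates across groups, in the spirit of the public analyses of COMPAS. The data, group labels, and function names are invented for the example; a real audit would of course be far more thorough.

```python
# Hypothetical illustration of one common auditing technique: comparing error
# rates across groups. All names and numbers here are invented.

def false_positive_rate(y_true, y_pred):
    """Share of truly negative cases that were flagged as positive."""
    negatives = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    if not negatives:
        return 0.0
    return sum(p for _, p in negatives) / len(negatives)

def audit_by_group(y_true, y_pred, groups):
    """Return the false positive rate for each group separately."""
    rates = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        rates[g] = false_positive_rate([y_true[i] for i in idx],
                                       [y_pred[i] for i in idx])
    return rates

# Toy data: 1 = "flagged as high risk".
y_true = [0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(audit_by_group(y_true, y_pred, groups))  # a large gap suggests bias
```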

A few days ago, Fernando Simón, director of the Center for Health Alerts and Emergencies, acknowledging the pressure that hospital ICUs are and will probably continue to be subjected to, commented that work was under way to establish algorithms and criteria for access to them.

It is not clear to us what kind of algorithms he was referring to. If Simón was referring to classical algorithms that are not the result of machine learning, such as the decision trees used to structure medical action protocols, we have little to add to the vast discussion already taking place in the field of bioethics.

Rather, we are concerned that he was referring to algorithms that use machine learning on databases. In other words, we are concerned that less transparent tools will be introduced that, for example, assign an indicator or score to a specific patient on the basis of a multitude of statistical data. This indicator would reflect, for example, a probabilistic estimate of the benefit that a patient could obtain from a treatment, based on a series of statistical criteria. Such systems already exist and are widely used whenever we apply for a loan or a mortgage. Through scoring systems, banks determine whether a customer has a good chance of repaying the money or is more likely to default. Often the indicator is not merely informative but acquires a prescriptive character that results in the automatic denial (or granting) of the requested loan: “Sorry, but the system does not allow us to grant you the loan.”
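To make the mechanics concrete, here is a minimal sketch, with invented weights and an invented cutoff, of how a scoring system of this kind collapses several criteria into a single number and then into an automatic verdict.

```python
# Hypothetical illustration of how a score becomes prescriptive: a continuous
# score is reduced to an automatic yes/no by a fixed cutoff, hiding the
# underlying criteria behind one number. Weights and cutoff are invented.

WEIGHTS = {"income": 0.4, "years_employed": 0.35, "previous_defaults": -0.25}
CUTOFF = 0.5

def credit_score(applicant):
    """Weighted sum of normalised features (0..1) -> a single opaque number."""
    return sum(WEIGHTS[k] * applicant[k] for k in WEIGHTS)

def automatic_decision(applicant):
    # "Sorry, the system does not allow us to grant you the loan."
    return "granted" if credit_score(applicant) >= CUTOFF else "denied"

applicant = {"income": 0.7, "years_employed": 0.8, "previous_defaults": 1.0}
print(credit_score(applicant), automatic_decision(applicant))
```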

Based on this brief analysis, and considering that there is no guarantee that this type of intelligent algorithm is truly objective, neutral, and fair, we propose the following measures for the hypothetical case in which algorithms are applied in crisis situations such as the current one, for example for the triage of infected patients or for prescribing treatment when key resources are scarce:

  1. Establish two types of audits: first, a technical audit that confirms that, as far as possible, the database represents the general population and that the results do not discriminate as a result of sampling error; and second, an ethical audit that confirms that, in addition to being statistically correct, the decisions taken are also fair.
  2. Ensure that the final output of the algorithm is not simply a score. A score can be deceptive: it is usually endowed with a halo of supposed neutrality and objectivity, and therefore acquires a prescriptive character that is difficult to ignore, even for the experts themselves, as was observed during tests with Watson, IBM’s artificial intelligence system designed to assist doctors. Instead, we propose specifying a series of criteria or informative dimensions that help healthcare personnel make better decisions. This information could be visualized by means of infographics, for example in a kind of control panel, to facilitate the decision without predetermining or conditioning it through a score.
  3. To achieve the previous point, it is key to check that the way the algorithm makes a decision or a recommendation is explainable. In other words, the criteria and rules used by the algorithm must be capable of being made explicit. If the algorithm “decides” or “recommends” that one person should be treated and another should not, it is imperative that the reason for that decision or recommendation can be explained. In other words: on what criteria does the algorithm base a recommendation? How are the common good, costs, justice, individual rights, and equity weighted and ordered? On the one hand, health personnel must know the reasons why a patient is, for example, considered to have a chance of survival or not. This would allow professionals to question the adequacy of the recommendation (a minimal sketch of what such an explainable recommendation could look like follows this list). On the other hand, the humanity of these patients and of their loved ones, but also that of society as a whole, demands a minimum and inalienable requirement: a justification of the reason why a patient is deprived of treatment and, possibly, left to die. The inability to offer an explanation on these issues would be completely incompatible with the principle of human dignity as a fundamental ethical requirement.
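As promised in point 3, here is a minimal and purely hypothetical sketch of an explainable, non-binding recommendation. The rules, thresholds, and variable names are invented for illustration and have no clinical value; what matters is that every recommendation comes with the explicit criteria that produced it, so that clinicians can inspect and contest them.

```python
# Hypothetical sketch of an explainable recommendation: instead of returning an
# opaque score, the system returns the explicit criteria behind its suggestion.
# All rules and thresholds below are invented, not clinical guidance.

def explainable_recommendation(patient):
    reasons = []
    if patient["oxygen_saturation"] < 0.90:
        reasons.append("oxygen saturation below 90%: treatment urgency high")
    if patient["expected_icu_days"] > 21:
        reasons.append("expected ICU stay above 21 days: high resource use")
    if patient["estimated_survival_benefit"] < 0.2:
        reasons.append("estimated survival benefit below 20%")
    recommendation = "review with clinical team"  # never an automatic verdict
    return {"recommendation": recommendation, "criteria_triggered": reasons}

patient = {"oxygen_saturation": 0.87,
           "expected_icu_days": 25,
           "estimated_survival_benefit": 0.35}
print(explainable_recommendation(patient))
```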

Given the haste with which these algorithms would be developed and implemented, it is difficult to expect that they will be able to explain their decisions in a minimally acceptable way. Our recommendation is modest: if used, these algorithms should not be binding. It should always be health professionals, and in no case algorithms, who decide who should or can be treated and who should not. When it comes to crucial decisions for human well-being, and especially in cases of life and death, we cannot let “the system” decide.

Ariel Guersenzvaig is a professor at ELISAVA Barcelona University School of Design and Engineering.

David Casacuberta is a professor in the Department of Philosophy at the Autonomous University of Barcelona.
