A Beginner’s Guide To Responsible Machine Learning & A.I.

Vachan Anand
7 min read · May 7, 2022


Photo by h heyerlein on Unsplash

Quite frequently, as a data scientist, I am asked a question inspired by sci-fi movies: "Can A.I.-powered robots take over the world and rule the human race?" As interesting as this question sounds, we are a long way from artificial intelligence systems having the "intelligence", or the capability, to think beyond the mathematical functions they compute today. Although such capabilities (if ever possible) may be worth dreading in the future, right now we face dangers that require our immediate attention.

In this blog, we explore two major areas. First, we look at some concerns with artificial intelligence (A.I.) in general. Second, we try to understand the reasons behind the failure of certain A.I. applications and consider how those failures could have been avoided.

As we all might know, A.I. systems can add great value to a business. They can recommend what a customer, Anthony, might want to watch, or, if Anthony bought shoes, suggest a nice pair of socks to go with them. In the financial industry, A.I. is already used extensively to predict which borrowers might default on their loans. It can also be extremely useful in medicine, for example detecting tumours at an early stage so cancer patients can be treated sooner. However, even with so many use cases, A.I. systems are not magic bullets that solve every problem, and in no case should they be treated like one.

Note:

By no means does this article try to discredit A.I. applications. The purpose of the blog is to make people aware of some of the issues with artificial intelligence and machine learning so that we can take steps towards using them ethically.

Steps toward responsible artificial intelligence (A.I.)

There have been numerous occasions when an A.I. system built to solve a problem blew up in our faces and made matters worse. In this blog, we are going to look at some of those instances, uncover what led to the disasters, and see how they could have been avoided.

1. Right Sample Distribution of Data for Model Training

Firstly, let us look at Google's image recognition system, which classified a Black couple as gorillas.

To understand the problem, we need a little background on how A.I. systems learn.

A.I. systems are usually trained on huge amounts of data. The more data an A.I. system is trained on, the better the results! Although this statement floats around widely, things are not as simple as they seem.

One potential reason behind such a wrongful classification is that, even though Google might have trained the algorithm on a huge amount of data, the model may not have been trained on enough samples of people from different races, making it difficult for it to correctly classify people of colour. Hence, A.I. systems need to be trained on a representative sample distribution to reduce the chances of such 'racist' errors.
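
To make this concrete, here is a minimal sketch in Python of how one might audit the class and group distribution of a training set before training. The file and column names ("training_labels.csv", "class_label", "demographic_group") are hypothetical placeholders for illustration, not the data Google actually used.

```python
import pandas as pd

# Hypothetical training-label file; the file and column names are
# illustrative assumptions, not Google's actual data.
labels = pd.read_csv("training_labels.csv")

# How are the examples spread across the target classes?
print(labels["class_label"].value_counts(normalize=True))

# Within each class, how well is each demographic group represented?
group_distribution = (
    labels.groupby("class_label")["demographic_group"]
    .value_counts(normalize=True)
    .rename("proportion")
)
print(group_distribution)

# Flag groups that make up less than 5% of a class as a rough warning sign
# that the model may perform poorly on them.
print(group_distribution[group_distribution < 0.05])
```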

Likewise, another A.I. system was found to be 'sexist' when it classified men and women into different categories. To know more, click on the link below.

2. Well Defined Problem

Although the above problem was unfortunate, it was not as bad as the case we will discuss now, where the application of A.I. almost got a man killed, TWICE!

https://www.theverge.com/c/22444020/chicago-pd-predictive-policing-heat-list

In the article, the key challenge the A.I. application was trying to solve was reducing crime. As good and noble as the application sounds, not having a well-defined problem, i.e. whether the model predicts the possibility of a person being the victim of a crime or the one committing it, had a catastrophic result. It is important to understand that artificial intelligence is not a magic bullet for any and all problems and hence should not be treated like one.

As the above article reads:

An algorithm built by the Chicago Police Department predicted — based on his proximity to and relationships with known shooters and shooting casualties — that McDaniel would be involved in a shooting. That he would be a “party to violence,” but it wasn’t clear what side of the barrel he might be on. He could be the shooter, he might get shot. They didn’t know.

3. Model / Data Bias

Bias in a model can be defined as the systematic error the model makes. Higher bias implies lower accuracy, since the error is larger.

However, in a classification problem, even if the model has a low overall bias, it is advisable to check for bias within each category (for example, each demographic group) to detect whether the model is operating fairly. Not doing so may lead to issues like those described in the article below.

The above article talks about a program aimed at easing the return of incarcerated people to society, with release regulated by a machine learning model. This becomes a particular problem when the data used to train the model is inherently biased; as a result, the model was found to have a racial bias.

As the above article quotes:

But thousands of others may still remain behind bars because of fundamental flaws in the Justice Department’s method for deciding who can take the early-release track. The biggest flaw: persistent racial disparities that put Black and brown people at a disadvantage.
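
One practical way to catch this kind of problem is to slice evaluation metrics by group instead of reporting a single overall number. Below is a minimal sketch assuming a hypothetical evaluation file with true labels ("y_true"), model predictions ("y_pred") and a sensitive attribute ("race"); it does not reflect the Justice Department's actual data or algorithm.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Hypothetical evaluation results; column names are assumptions for illustration.
eval_df = pd.read_csv("evaluation_results.csv")

# The overall number can look perfectly fine...
print("Overall accuracy:", accuracy_score(eval_df["y_true"], eval_df["y_pred"]))

# ...while hiding large disparities between groups.
def group_metrics(group):
    negatives = group[group["y_true"] == 0]
    return pd.Series({
        "accuracy": accuracy_score(group["y_true"], group["y_pred"]),
        # How often is someone who should be eligible flagged as high risk?
        "false_positive_rate": (negatives["y_pred"] == 1).mean() if len(negatives) else float("nan"),
        "n": len(group),
    })

print(eval_df.groupby("race").apply(group_metrics))
```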

4. Model Explainability

One of the things we have to be very careful about when using an application is that every A.I. system will produce a result; however, just because a result is produced by no means implies that the problem has been solved.

A thorough evaluation is crucial for any successful A.I. application. This means we need to dissect the application and understand the reasons behind the results it produces.

This is made more evident by the article attached below, which quotes:

After an audit of the algorithm, the resume screening company found that the algorithm found two factors to be most indicative of job performance: their name was Jared, and whether they played high school lacrosse.

Thanks to the evaluation of the model, the application was scrapped.
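
The resume-screening audit above is essentially an exercise in model explainability. As a rough illustration of how such an audit might be run, the sketch below uses scikit-learn's permutation importance on a hypothetical, numerically encoded resume dataset ("resumes.csv" with a "hired" target column); it is not the vendor's actual system.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical, already numerically encoded resume data; names are illustrative.
data = pd.read_csv("resumes.csv")
X = data.drop(columns=["hired"])
y = data["hired"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Shuffle each feature in turn and measure how much performance drops:
# features the model leans on heavily (e.g. a "first_name_is_jared" flag)
# will rise to the top.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
importances = pd.Series(result.importances_mean, index=X.columns)
print(importances.sort_values(ascending=False))
```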

While some organisations are working towards more ethical use of A.I., others are still not making enough effort in that direction. The problem becomes even more significant when an A.I. application is used for policing. This is evident from the article below, where a man was arrested purely because the model suggested he had committed a crime, with no further evidence.

ShotSpotter’s proprietary algorithms are the company’s primary selling point, and it frequently touts the technology in marketing materials as virtually foolproof. But the company guards how its closed system works as a trade secret, a black box largely inscrutable to the public, jurors and police oversight boards.

Whereas A.I. in policing should be used to enable or support an ongoing investigation, the black-box nature of the application made both it and its outcomes inscrutable.

5. Training Application Users

One thing that is often not stressed, even after an A.I. application has been successfully built and evaluated, is the importance of user training.

More importantly, users need to understand that A.I. applications are usually based on correlation, not causation.

Correlation is not the same as Causation.
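
A tiny simulation makes the point: two variables can be strongly correlated simply because a hidden third factor drives both of them. The example below uses the classic textbook case (ice cream sales and drowning incidents, both driven by summer temperature) with entirely synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hidden confounder (summer temperature) drives both variables.
temperature = rng.normal(size=10_000)
ice_cream_sales = temperature + rng.normal(scale=0.5, size=10_000)
drowning_incidents = temperature + rng.normal(scale=0.5, size=10_000)

# The two are strongly correlated...
print("correlation:", np.corrcoef(ice_cream_sales, drowning_incidents)[0, 1])

# ...yet neither causes the other: banning ice cream would not prevent drownings.
```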

This becomes even more crucial in the area of policing. The claim is supported by the facial recognition case below, where a man was jailed without a thorough investigation just because the A.I. system matched a fake I.D. to him.

As the article below quotes:

According to a police report obtained by CNN, the evidence presented by the police officers that led to Parks’ arrest was a “high profile comparison” from a facial recognition scan of a photo from what was determined to be a fake ID left at the crime scene that witnesses connected to the suspect. The facial recognition match was enough for prosecutors and a judge to sign off on his arrest.

Evidently, it is very important for organisations using A.I. systems to train the application users so as to avoid unethical or unintended use. The above case could have been avoided by a proper investigation by the police officers. Instead, blind trust in the system by the officers and the judge gravely affected an innocent man's life.

Conclusion

Artificial intelligence systems, although touted as the answer to any and all business problems, should be used with care. Moreover, when A.I. is used in critical domains such as healthcare and policing, there should be a framework around its development, evaluation and use.

As many believe, the technology itself is very beneficial; however, it falls upon application developers and organisations to enforce its ethical use.


Vachan Anand

A consultant with an interest in Data Science, Data Engineering and Cloud Technology.