
How To Identify Less Discriminatory Lending Alternatives and Why the Search Matters



A new generation of machine learning tools is available to help lenders build fair lending models. While fairness in lending has become a topic of broad interest, it is particularly crucial for compliance and fair-lending teams to be aware of novel methods that can help them succeed in their roles. This article explains the inner workings of these new approaches and analyzes why reducing bias in decision making is critical.  

When it comes to fair lending, regulators have sent a clear message to lenders with their recent actions: Demonstrate that you have made a reasonable effort to find “less discriminatory alternatives” (LDAs). In a nutshell, an LDA is an alternative decisioning model for lending that maintains accuracy, but is less discriminatory, and ultimately less biased. 

The Equal Credit Opportunity Act (ECOA) and its implementing Regulation B, which govern how lenders must handle credit applications, require lenders to provide explanations for credit denials and other adverse credit decisions and to ensure there is no discrimination against protected groups. 

These days, most lenders use sophisticated decisioning models based on hundreds or even thousands of input variables. All too often, these variables carry information about protected groups, which makes it challenging to be sure there was no disparate treatment in the decision process.  

Lenders would like to employ advanced machine learning algorithms to help, given their strong predictive power. But regulators require lenders to search for LDAs that reduce bias to a minimum¹ and to ensure that the chosen approach is sufficiently interpretable to provide transparent explanations in case of adverse actions. In addition, financial institutions are subject to UDAAP (Unfair, Deceptive, or Abusive Acts or Practices) regulations. Regulators such as the Consumer Financial Protection Bureau (CFPB) use this regulatory framework to set expectations regarding fairness in lending.  

 

Quality of a Decisioning Model 

The quality of any decisioning model can be assessed in various ways, and for business reasons, companies strive to make their decisioning models as accurate as possible. In lending, this means that the model computes its best estimate of the likelihood of customer default. Based on this, the lender may deny loans to customers who are likely to default and grant loans to customers who are unlikely to default.  

As more loans mature, the lender accumulates data on a large number of loans, including some that have been fully repaid and some that have defaulted. Using this data set, the lender can assess the accuracy of its model and, if necessary, reassess its modeling approach within a framework of profitability goals and risk appetite. 
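
For readers who want to see what this assessment can look like in practice, here is a minimal sketch in Python. The column names and metric choices are illustrative assumptions, not a description of any particular lender's process:

from sklearn.metrics import roc_auc_score, brier_score_loss

def assess_accuracy(predicted_default_risk, observed_default):
    """Two common checks once outcomes are known: ranking power (AUC)
    and calibration (Brier score). observed_default is 1 for loans that
    defaulted and 0 for loans that were fully repaid."""
    return {
        "auc": roc_auc_score(observed_default, predicted_default_risk),
        "brier_score": brier_score_loss(observed_default, predicted_default_risk),
    }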

Accuracy is not the only measure of model quality. Fairness has now become a critical goal for lending models. One way to quantify fairness is to analyze lending decision rates for disparities between protected groups and a reference group. Here, the term “protected group” refers to those protected by law from discrimination in the context of credit decisions, for example with respect to race, gender, or age.² The term “reference group” typically refers to a majority group or the complementary part of the protected group in question. 

In assessing fairness, lenders typically need to analyze their decisioning system to see if there is disparate impact in the process. If they find any, they need to assess whether it can be justified, for example by demonstrating that it originates from business necessity. Then, they need to look for LDAs. After all, there might be algorithms that are equally accurate and in line with the business goal, but less discriminatory, with a much lower (or no) disparity in acceptance rates. While disparity in acceptance rates is a common measure of fairness, there are many different ways to quantify bias in general.³  
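
As a concrete illustration, the acceptance-rate comparison described above can be computed in a few lines. The sketch below is a simplified example with made-up column names and toy data, not a regulatory standard:

import pandas as pd

def acceptance_rate_disparity(df, decision_col="approved", group_col="group",
                              protected="protected", reference="reference"):
    """Compare approval rates for a protected group and a reference group."""
    rates = df.groupby(group_col)[decision_col].mean()
    protected_rate = rates[protected]
    reference_rate = rates[reference]
    return {
        "protected_rate": protected_rate,
        "reference_rate": reference_rate,
        # Absolute gap in approval rates (zero means demographic parity).
        "rate_difference": reference_rate - protected_rate,
        # Ratio form, often compared against the informal "80% rule" of thumb.
        "adverse_impact_ratio": protected_rate / reference_rate,
    }

# Toy data: four applicants per group, 1 = approved, 0 = declined.
toy = pd.DataFrame({
    "group": ["protected"] * 4 + ["reference"] * 4,
    "approved": [1, 0, 0, 1, 1, 1, 0, 1],
})
print(acceptance_rate_disparity(toy))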

 

Accuracy and Fairness Together 

Imagine a company with a decisioning tool that has great accuracy but shows disparity in acceptance rates. Regulators want to know whether the company could use a different model that is similarly accurate but with less disparate acceptance rates. Finding this LDA can be daunting without the right tools.  

Over the last decade, there has been a big increase in the use of machine learning-based decisioning models. Consequently, lenders have a variety of algorithms to choose from when developing their own, often proprietary, approach. While discussing them in detail is beyond the scope of this article, it is noteworthy that almost all approaches possess some tuning parameters.⁴ These tunable parameters, combined with the fact that the models all compute the default likelihood based on the input data for a particular customer, have given rise to ideas on how to search for LDAs. Let’s discuss some basic examples.  

 

Dropping Variables 

In this approach, input variables are excluded one by one to generate a family of models to evaluate. The aim is to isolate and drop variables that correlate with a protected class (zip code, for instance, is sometimes correlated with race). Although regulations prohibit using the protected class itself in a model, correlated variables can serve as a proxy for the protected class, allowing the model to introduce disparity. The (naive) idea is that by removing proxy variables from the model, one could eliminate the model’s disparity. This approach, however, is often problematic.  

Imagine that in a biased model, one of the input variables used to estimate creditworthiness is a customer’s annual income and that dropping this feature significantly reduces bias. Typically in this situation, leaving out annual income as an explanatory variable will also greatly reduce the model’s accuracy. The lender would have to choose between a model that is fair but inaccurate and the original model, which is accurate but not fair. In fact, most alternative scoring models generated in this way will either be much less accurate than the original model or, if their accuracy is close to the original, offer only marginal improvements in the bias of the decision making. In today’s world, this and similar approaches are hardly sufficient to demonstrate a reasonable effort to search for an LDA. 
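
Mechanically, such a drop-one-variable search is easy to script, which is part of why it is tempting. The sketch below is a simplified illustration that assumes scikit-learn, a binary default label, and a group column that is held out of the model features:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def drop_one_variable_search(X, y, groups, threshold=0.5):
    """Fit a family of models, each omitting one candidate proxy variable,
    and report accuracy (AUC) next to the approval-rate gap.
    y is 1 for default; applicants are approved when predicted risk < threshold."""
    X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
        X, y, groups, test_size=0.3, random_state=0)
    ref = (g_te == "reference").to_numpy()
    prot = (g_te == "protected").to_numpy()
    rows = []
    for dropped in [None] + list(X.columns):
        cols = [c for c in X.columns if c != dropped]
        model = LogisticRegression(max_iter=1000).fit(X_tr[cols], y_tr)
        risk = model.predict_proba(X_te[cols])[:, 1]
        approved = risk < threshold
        rows.append({
            "dropped": dropped if dropped is not None else "none (baseline)",
            "auc": roc_auc_score(y_te, risk),
            "approval_rate_gap": approved[ref].mean() - approved[prot].mean(),
        })
    return pd.DataFrame(rows)

In practice, the resulting table tends to confirm the trade-off described above: the variants that meaningfully reduce the gap usually also give up accuracy.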

 

Tuning Parameters  

We’ll explain this approach using one particular algorithm, logistic regression, which is used by a wide range of companies and institutions because of its relative simplicity and interpretability. Typically, in logistic regression, a model is created by choosing a set of parameters (“regression coefficients”) intended to make the resulting model as accurate as possible. Once the optimal configuration of these coefficients is found, one can investigate whether changing the parameter configuration reduces bias without drastically changing accuracy.⁵ 
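
One simple version of this idea is to scan a hyper-parameter, such as the regularization strength, and record the accuracy and disparity that each configuration produces; the fitted coefficients could be perturbed in a similar loop. The following sketch assumes scikit-learn and the same illustrative data layout as before:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def scan_regularization(X_tr, y_tr, X_te, y_te, groups_te, threshold=0.5):
    """Fit one model per regularization strength C and record the
    accuracy/disparity pair that each configuration produces."""
    ref = (groups_te == "reference").to_numpy()
    prot = (groups_te == "protected").to_numpy()
    results = []
    for C in np.logspace(-3, 2, 11):  # hyper-parameter grid
        model = LogisticRegression(C=C, max_iter=1000).fit(X_tr, y_tr)
        risk = model.predict_proba(X_te)[:, 1]
        approved = risk < threshold
        results.append({
            "C": C,
            "auc": roc_auc_score(y_te, risk),
            "approval_rate_gap": approved[ref].mean() - approved[prot].mean(),
        })
    return results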

Parameter tuning is a more sophisticated approach than dropping variables. The downside is that accuracy often falls short, particularly in the presence of nonlinear decision boundaries. In other words, it is highly likely that more attractive modeling approaches in terms of fairness and accuracy exist (random forests, gradient-boosted trees, and support vector machines, to name a few).  

And it is important to note that accuracy and fairness are not the only aspects lenders consider when choosing a modeling approach. Interpretability, the comfort of regulators, and implementation constraints might also play a role. 

 

Probabilistic Rules Models - A New Generation of Tools 

There are a variety of other approaches for seeking out LDAs. A common one is to train multiple models and compare their performance in terms of accuracy and fairness. A second, quite different approach is adversarial debiasing: a second model is trained to predict membership in a protected group based on the output of the first model. If the first model can be modified in such a way that this prediction becomes impossible, the first model has been debiased.  
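
In its full form, adversarial debiasing couples the two models during training. Even without that machinery, the adversary idea yields a useful diagnostic: train a second model to predict protected-group membership from the first model's scores and see whether it beats chance. A minimal sketch, with scikit-learn and illustrative inputs:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

def adversary_auc(primary_scores, is_protected):
    """Cross-validated AUC of an adversary that predicts protected-group
    membership from the primary model's risk score alone. An AUC near 0.5
    suggests the score carries little protected-group information."""
    adversary = LogisticRegression(max_iter=1000)
    preds = cross_val_predict(adversary,
                              np.asarray(primary_scores).reshape(-1, 1),
                              np.asarray(is_protected).astype(int),
                              cv=5,
                              method="predict_proba")[:, 1]
    return roc_auc_score(is_protected, preds)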

Clearly, considering a variety of modeling algorithms is necessary to craft a modeling approach that (a) is flexible enough to cope with nonlinear decision boundaries to ensure accuracy; (b) possesses tunable parameters to allow for bias mitigation; and (c) is genuinely transparent so that lenders will feel confident in their decision-making when using it.  

A class of algorithms that ticks these boxes is models built on probabilistic rules. While still unknown to some, these models have become increasingly popular over the last decade and offer an elegant and efficient way to search for LDAs. 

Rooted in the mathematical theory of probabilistic graphical networks (in particular, so-called Markov logic and Markov random fields⁶), models built on probabilistic rules connect the input variables (loan amount, annual income, debt-to-income ratio) with the output (for example, the likelihood of default) via rules that, in structure, resemble business rules used by many companies. A possible example of such a rule could be:  

IF (income < $30,000 AND debt-to-income > 40%) THEN (default with certainty factor of 10%) 

A probabilistic model can include many such rules, some of which—unlike typical knock-out criteria—won’t actually disqualify customers from loans immediately but rather lower their lendability score. Other rules might reflect favorably on the decision score. Together, these rules determine the overall creditworthiness of the customer.  

The probabilistic rules approach satisfies the acceptance criteria mentioned above by providing flexibility to define rules accounting for many non-linear decision surfaces; by offering tunable parameters, like the 10% in the rule above, to search for LDAs; and by defining the rules in business terms, which makes the model interpretable. 
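
To make the idea tangible, here is a deliberately simplified sketch of rule-based scoring in Python. Real probabilistic-rules engines combine rule weights within a probabilistic (Markov logic) framework rather than by simple addition; the point here is only that the rules read like business rules and that their certainty factors are tunable parameters:

# Each rule: (human-readable label, condition, certainty factor toward default).
RULES = [
    ("low income and high DTI",
     lambda a: a["income"] < 30_000 and a["dti"] > 0.40, +0.10),
    ("long clean repayment history",
     lambda a: a["years_clean_history"] >= 5, -0.05),
]

def default_risk_score(applicant, base_default_risk=0.05):
    """Start from a base risk and add the certainty factor of every rule
    that fires for this applicant; clip the result to [0, 1]."""
    risk = base_default_risk
    for label, condition, certainty in RULES:
        if condition(applicant):
            risk += certainty  # tunable, like the 10% in the rule above
    return max(0.0, min(1.0, risk))

print(default_risk_score({"income": 28_000, "dti": 0.45, "years_clean_history": 2}))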

  

Probabilistic Rules in Replica Models 

For many reasons, lenders might not be interested in immediately rebuilding their models using probabilistic rules. They can still get the advantages of probabilistic rules models in a variety of ways. 

They can use their current model to train a probabilistic rule model as a replica and work from this replica to mitigate bias, with minimal impact on performance. The replica probabilistic rule model mimics the decisions of the model used by the lender. Therefore, it can be used to investigate whether it is possible to mitigate disparities in the acceptance rate.  

Note that when people talk about replica models, they usually mean a simplified model that only approximately represents the original model. In the case of probabilistic rule models built as replicas, we are talking about an interpretable model that is as accurate as the original. 
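
The general recipe for building a replica is the same regardless of the model family: train an interpretable surrogate on the primary model's own decisions and check how faithfully it reproduces them before using it to explore bias mitigation. In the sketch below, a shallow decision tree stands in for the probabilistic-rules replica, purely for illustration:

from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier, export_text

def fit_replica(X, primary_decisions, max_depth=4):
    """Train a surrogate on the primary model's approve/decline decisions
    and report how faithfully it mimics them (its "fidelity")."""
    replica = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    replica.fit(X, primary_decisions)
    fidelity = accuracy_score(primary_decisions, replica.predict(X))
    return replica, fidelity

# Usage (X and primary_model_decisions are whatever the lender already has):
# replica, fidelity = fit_replica(X, primary_model_decisions)
# print(f"replica agrees with the primary model on {fidelity:.1%} of cases")
# print(export_text(replica, feature_names=list(X.columns)))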

Still, there are practical challenges with this approach. As mentioned previously, acceptance-rate disparities do not always mean discrimination. Alternative metrics include, for instance, equalized odds and equal opportunity, and some of these metrics require not only protected-class information but also performance information (for example, who actually defaulted on the loan). Protected-class information is not always present in the data, which leads to additional uncertainties in the bias estimation.  
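
For concreteness, here is what one of those outcome-based metrics, equal opportunity, looks like when both kinds of information are available. The array names are illustrative:

import numpy as np

def equal_opportunity_gap(approved, repaid, is_protected):
    """Among applicants who repaid (or would have repaid), compare the
    approval rate of the reference group with that of the protected group.
    All three inputs are boolean (or 0/1) arrays of equal length."""
    approved, repaid, is_protected = (np.asarray(a, dtype=bool)
                                      for a in (approved, repaid, is_protected))
    good = repaid
    tpr_reference = approved[good & ~is_protected].mean()
    tpr_protected = approved[good & is_protected].mean()
    return tpr_reference - tpr_protected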

If it turns out that bias mitigation via the replica model is successful, there are ways to deploy this model to make the decision process fairer—for example, by providing a “second look” alternative for a certain group of customers who were declined by the lender’s primary model. The probabilistic rules model can indicate which customers to choose and provide further information on how the results will impact the bias of the overall process. 
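
A “second look” deployment can be as simple as a small piece of decision logic in front of the two models. The sketch below is purely illustrative; the model objects, their predict_risk method, and the thresholds are assumptions for the example:

def second_look_decision(applicant, primary_model, replica_model,
                         primary_threshold=0.5, second_look_threshold=0.4):
    """Approve if the primary model approves; otherwise give declined
    applicants a second look with the debiased replica model, which here
    uses a stricter risk threshold."""
    if primary_model.predict_risk(applicant) < primary_threshold:
        return "approved"
    if replica_model.predict_risk(applicant) < second_look_threshold:
        return "approved on second look"
    return "declined"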

Appropriately recalibrating the probabilistic rule model—or even redesigning the model as a whole—might even increase accuracy (while still being fair). This might happen if the lender has not recently updated its model or if its model is based on less advanced algorithmic techniques. In this case, the search for an LDA can not only make the decisioning process more equitable but also increase profitability. 

 

Taking the Search Further 

So far, our analysis of the search for LDAs has focused on models operating on the same data as the lender’s original model. Remember that the model’s accuracy is determined by a data set obtained from loans that were actually granted. Data from customers rejected by the decision maker are not part of this set because they were never given a loan in the first place.⁷  

From this point of view, it is entirely possible that extending loans to some of these rejected candidates would have an impact on the disparity in acceptance rates. In addition, this might open the possibility of obtaining loans for customers who were excluded by the decision-making algorithms currently in place. From the lender’s perspective, however, it is crucial to be able to assess the risk of opening up loans to a different borrower market. Doing so with traditional models can be challenging, because lenders do not necessarily possess actual data for this market segment with which to recalibrate their models. 

A crucial, and maybe less obvious, aspect of probabilistic rule models is that the rules used in the model do not necessarily have to stem from data but can also be specified by domain experts. This opens an entirely different avenue to look for LDAs, again with the potential to gain access to new markets and, therefore, end up with an improved model that is both profitable and fair. 

 

Final Thoughts 

When it comes to searching for less discriminatory alternatives (LDAs), many lenders might rely on third-party models. Being able to ask the right questions of those models is crucial. Modern decisioning models need to be accurate, fair, and interpretable. It is worth spending the time to understand how these models meet these goals, in particular regarding which alternative models have been considered as LDAs. After all, making the right choice among these models might not only ensure fairness but also increase revenue. 

 

Footnotes

  1. Note that the presence of some (often residual) bias may be justified by a legitimate non-discriminatory business purpose. 
  2. In practice, information about these attributes is often not part of the data and needs to be estimated. A commonly used method to address this is Bayesian Improved Surname Geocoding (BISG), which predicts the race or ethnicity of an individual using the individual's surname and geocoded location. 
  3. This metric is also called “demographic parity.” Examples of other metrics are “equalized odds” or “equal opportunity.” 
  4. Note that, in general, there are two types of parameters we are talking about—hyper-parameters and intrinsic model parameters. Hyper-parameters (available to the user) define the high level model configuration, whereas the intrinsic parameters (typically not visible to the user, and automatically determined by the model fitting process) ultimately determine the model properties, such as accuracy and bias. 
  5. Note that, in the context of logistic regression, several parameters and hyper-parameters are tunable. Most models use a regularization hyper-parameter (typically known as an L1- or L2-parameter, characterizing the strength of the chosen regularization method). The outputs of the model are the beta coefficients, which depend on the regularization. These coefficients can also be tuned, but changing them from their optimal values results in a loss of accuracy. 
  6. For more details about Markov random fields and their applications in the context of machine learning, in particular probabilistic graphical networks, we refer the reader to the book by Koller and Friedman [Koller, D. and Friedman, N. (2009): Probabilistic Graphical Models: Principles and Techniques. Cambridge, MA, MIT Press.] as well as work by Domingos et al. [e.g., Richardson, M. and Domingos, P. (2006): Markov Logic Networks. Mach Learn 62, 107–136.]. 
  7. The process of incorporating this data into the analysis is known as reject inference. It has been studied widely in recent years but has been implemented in practice much less often. 

 


Tobias Schaefer’s research focuses on modeling complex systems with applications in applied mathematics, data science, and finance. He is a professor of mathematics and physics as well as an advisor to Stratyfy Inc., a company that focuses on accelerating financial inclusion for people and mitigating risk for financial institutions.  

Dmitry Lesnik is a co-founder and chief researcher at Stratyfy Inc. His main research area covers probabilistic logic, symbolic reasoning, and interpretable machine learning. Dmitry has a background in theoretical physics and financial mathematics.