Model Performance Monitoring Adjustments: A Framework to Respond to COVID-19

By: Liming Brotcke

The COVID-19 pandemic has caused unprecedented model failures and forecasting concerns across the financial industry. The widespread model deterioration cannot easily be resolved by immediate redevelopment and recalibration, largely because outcome data will not be available until the economic recovery reaches a stable state. Banks typically resort to overlays and management adjustments to correct large forecasting variances. This approach can introduce additional noise and make model estimates more volatile if the model cannot absorb shocks in time or if the overlay is poorly designed. In this article, I present a framework for evaluating overlay necessity using existing performance monitoring results and other factors, and I identify opportunities to adjust performance monitoring as a transitory response to model deterioration until more permanent solutions can be implemented.


Ongoing model performance monitoring is standard practice at large banks and an effective way to evaluate the need for model adjustment, redevelopment, or replacement. Appropriately selected metrics and thresholds define probable model break points upfront, enabling timely capture of model degradation. Well-known changes that contribute to model performance breaches include, but are not limited to, changes in model inputs, assumptions, products, exposures, activities, clients, or market and environmental conditions. Regardless of the source, these changes enlarge forecasting errors and hence lead to under- or over-prediction.

The COVID-19 pandemic has had a dramatic effect on economies across the globe and hit the United States with unprecedented speed and severity. Record swings in key economic indicators such as unemployment and GDP triggered massive emergency intervention from the government. The Federal Reserve cut the federal funds rate effectively to zero and announced a $700 billion round of quantitative easing (QE), and numerous emergency lending programs were launched. On March 25, the U.S. Senate unanimously passed HR 748, the Coronavirus Aid, Relief, and Economic Security Act (CARES Act), a $2 trillion emergency relief bill and the largest economic stimulus in U.S. history, intended to arrest the financial disruption caused by COVID-19. Lenders also reacted quickly to the CARES Act, announcing payment deferral and forbearance plans to assist customers under financial stress.

COVID-19, and the recession it triggered, is generally considered by the financial industry to be the leading cause of the widespread model degradation. Bankers and regulators have devoted a great deal of effort to mitigating this unprecedented model deterioration, including identifying specific root causes and quantifying the impact both at the individual model level and in aggregate.

This article first discusses the dynamic relationship among model error, model risk, ongoing performance monitoring, and overlays and adjustments. It then explains the three key components essential to an effective ongoing monitoring evaluation. Finally, it provides a framework for leveraging performance results from an established monitoring plan, together with additional factors, to identify overlay necessity, along with suggestions on how to modify current monitoring plans as an alternative means of coping with continued performance deterioration.

Dynamic Relationship Among Model Error, Model Risk, Ongoing Performance Monitoring, and Overlays and Adjustments 

Model risk arises during model development and use, manifesting first as model errors before appearing in other forms. Fundamental errors can occur at any stage of a model build, from target design, data sampling, theory application, variable selection, model estimation, and diagnostic testing through implementation. Model errors present themselves in various ways; for example, a series of weak or failed hypothesis tests may suggest multicollinearity, autocorrelation, selection bias, or overfitting. If not properly corrected, these errors cause the model to produce incorrect or less accurate estimates, and models with such errors are prone to faster deterioration and even breakdown. Applying erroneous estimates to business decisions not only elevates model risk but, through the model’s supporting role in the business, can also increase other risks such as credit, market, interest rate, and liquidity risk. Model risk stemming from fundamental errors can be mitigated, though never eliminated, during development through sample selection, data manipulation, testing alternative model specifications, or applying mechanical and technical solutions. Understanding and quantifying model risk helps improve model performance and extend model longevity. The magnitude of this inherent model risk is a common driver of performance deterioration and determines the speed and size of breaches.
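As one illustration of a diagnostic that can reveal autocorrelation in model errors (our choice of test, not necessarily one the author uses), the Durbin-Watson statistic can be computed directly from backtest residuals. The sketch below assumes residuals are ordered in time; the 1.5 to 2.5 acceptance band is a commonly quoted rule of thumb rather than the formal test, which relies on tabulated critical values. The function names and sample residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for first-order autocorrelation in residuals.

    Values near 2 suggest no first-order autocorrelation; values toward 0
    suggest positive autocorrelation and values toward 4 negative autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared differences between successive residuals
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    # Residual sum of squares
    den = sum(e ** 2 for e in residuals)
    if den == 0:
        raise ValueError("all residuals are zero")
    return num / den


def autocorrelation_flag(residuals, lower=1.5, upper=2.5):
    """Hypothetical rule of thumb: flag when the statistic falls outside [lower, upper]."""
    dw = durbin_watson(residuals)
    return dw, not (lower <= dw <= upper)


# Illustrative backtest residuals (actual minus estimate), not real data
resid = [0.5, 0.6, 0.4, 0.7, 0.5, 0.6]
dw, flagged = autocorrelation_flag(resid)
print(f"DW = {dw:.3f}, flagged = {flagged}")
```

Residuals that are persistently on one side of zero, as in the example, drive the statistic toward 0, which is the kind of pattern that warrants a closer look before the model's errors compound.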

A regression model is developed to establish relationships between target and explanatory variables using realized events. Put differently, we analyze the past in order to predict the future. The degree to which the future differs from the past is the primary reason most models degrade over time and must eventually be replaced. Properly chosen metrics and thresholds make it possible to measure how far current data have drifted from historical observations and, in turn, to capture material deterioration in a timely manner.

Rapid advances in machine-learning algorithms in recent years have led the financial industry to begin adopting alternative modeling methods that allow nonconventional pattern recognition and live in-production adjustment. Significant improvements in information technology have allowed computing power to grow exponentially (Moore’s Law), and recent cloud-based computing platforms can process and store massive amounts of data at a scale that was not available for financial modeling five years ago. Although such progress can make a model more robust, more accurate, and more stable, it does not change the fact that predictions of the future still rely largely on historical experience, nor does it significantly mitigate model risk. Performance monitoring therefore becomes even more important in this new era of model development.

Banks use model overlays and adjustments to compensate for model, data, or other known limitations. Overlays and adjustments take different forms, but all ultimately modify model outputs to minimize the forecasting variance between model estimates and actuals. They can be created along with a model’s development or during its production use, when limitations and weaknesses result in material backtest errors or performance deterioration that the model cannot self-correct. The COVID-19 pandemic has led to greater use of overlays as a compensating factor to mitigate performance breaches.
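Overlays take many forms, often informed by sensitivity analysis or management judgment. The simplest to illustrate is an additive shift calibrated to the average recent forecasting error. The sketch below is a hypothetical example of that one form, not the method any particular bank or the author uses; the function names and loss rates are illustrative assumptions.

```python
def calibrate_additive_overlay(actuals, estimates):
    """Hypothetical overlay: the mean forecasting error (actual - estimate) over a recent window."""
    if not actuals or len(actuals) != len(estimates):
        raise ValueError("actuals and estimates must be non-empty and equal length")
    return sum(a - e for a, e in zip(actuals, estimates)) / len(actuals)


def apply_additive_overlay(estimates, overlay):
    """Shift every raw model estimate by the same overlay amount."""
    return [e + overlay for e in estimates]


# Illustrative loss rates for a recent window (not real data)
recent_actuals = [0.050, 0.060, 0.070]
recent_estimates = [0.030, 0.040, 0.050]
overlay = calibrate_additive_overlay(recent_actuals, recent_estimates)

# Apply the overlay to the next period's raw model output
next_quarter_raw = [0.045, 0.048]
print(apply_additive_overlay(next_quarter_raw, overlay))
```

A constant shift assumes the error is a stable level bias; if the error instead scales with the estimate, a multiplicative overlay would be the more natural choice. Either way, the raw estimates are kept so that performance with and without the overlay can still be compared.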

Key Components of Ongoing Monitoring

Effective ongoing model monitoring, analysis, and reporting are critical to identifying, controlling, and managing risk. A comprehensive ongoing monitoring program evaluates estimate accuracy, input and output stability, and model robustness.

As Cox [1] observes, the role of statistical models can be described as substantive, empirical, or indirect, although a specific application may combine these roles. The majority of models developed and used by the financial industry are probability based and therefore fall into the empirical category. Accuracy measurement is thus critical for models with a predictive nature: the smaller the variance between model estimates and actual observations, the more accurate the model. Accuracy can be gauged by several well-known metrics such as Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Deviation (MAD), Mean Absolute Percentage Error (MAPE), and Mean Absolute Percentage Deviation (MAPD). MAE is the most natural and unambiguous measure of average error magnitude, whereas RMSE squares each error before averaging and therefore gives disproportionate weight to large errors.
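A minimal sketch of three of these metrics, assuming estimates and actuals are paired period by period. MAD is omitted because the term is used inconsistently (sometimes as a synonym for MAE); the function names are our own, not from any particular library.

```python
import math


def _check(actuals, estimates):
    if not actuals or len(actuals) != len(estimates):
        raise ValueError("actuals and estimates must be non-empty and equal length")


def mae(actuals, estimates):
    """Mean Absolute Error: average magnitude of the errors."""
    _check(actuals, estimates)
    return sum(abs(a - e) for a, e in zip(actuals, estimates)) / len(actuals)


def rmse(actuals, estimates):
    """Root Mean Square Error: squaring weights large errors more heavily."""
    _check(actuals, estimates)
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actuals, estimates)) / len(actuals))


def mape(actuals, estimates):
    """Mean Absolute Percentage Error, in percent; undefined if any actual is zero."""
    _check(actuals, estimates)
    if any(a == 0 for a in actuals):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - e) / a) for a, e in zip(actuals, estimates)) / len(actuals)


# Illustrative values only
actual = [2.0, 4.0, 6.0, 8.0]
estimate = [1.0, 5.0, 6.0, 10.0]
print(mae(actual, estimate), rmse(actual, estimate), mape(actual, estimate))
```

RMSE is always at least as large as MAE, and the gap widens as errors become more uneven, which is why reporting both helps distinguish a uniform bias from a few large misses.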

A model, or equation, is “stable” if it can be applied to data from different time periods without significant loss of prediction accuracy. A stability measure evaluates whether the distribution and value range of inputs and outputs have shifted over time, for example between the production data and the development sample, or across production periods. The Population Stability Index (PSI) is commonly used to evaluate model stability and is also a useful tool for detecting population shifts caused by business strategy changes or shocks to macroeconomic drivers. Re-estimating coefficients on production data is another common approach to assessing model stability and is particularly useful for models that require regular refitting with updated input data.
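A minimal PSI sketch, assuming the development and production samples have already been binned on the same cut points (for example, development-sample score deciles). The small floor for empty bins and the 0.10 and 0.25 cut-offs are commonly quoted conventions, not regulatory thresholds; each institution sets its own. The names and counts are hypothetical.

```python
import math


def population_stability_index(expected_counts, actual_counts, floor=1e-6):
    """PSI = sum over bins of (actual% - expected%) * ln(actual% / expected%).

    expected_counts: bin counts from the development (reference) sample
    actual_counts:   counts in the same bins from the production sample
    floor:           small proportion substituted for empty bins to avoid log(0)
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must use the same bins")
    exp_total, act_total = sum(expected_counts), sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / exp_total, floor)
        a_pct = max(a / act_total, floor)
        psi += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return psi


def stability_band(psi):
    """Commonly cited rule of thumb; thresholds are an assumption, set per institution."""
    if psi < 0.10:
        return "stable"
    if psi < 0.25:
        return "moderate shift"
    return "significant shift"


# Hypothetical score-band counts: development sample vs. a production month
dev = [200, 300, 300, 200]
prod = [120, 240, 320, 320]
value = population_stability_index(dev, prod)
print(f"PSI = {value:.4f} -> {stability_band(value)}")
```

The same function can be run on each model input as well as on the output score, which helps locate whether a shift originates in a macroeconomic driver or in the portfolio mix.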

Model robustness analysis is also important because it checks whether the developers’ conclusions hold under different assumptions. Put differently, robustness is the model’s ability to perform effectively under internal and external disturbances, and it can be viewed as the model’s resilience, or sensitivity, to dramatic changes in its inputs. Robustness is typically assessed as part of diagnostic testing during model development, but selected statistical tests should be repeated periodically during the model’s ongoing use to measure changes in robustness.

Leveraging Performance Monitoring and Other Factors to Determine Overlay Necessity and Performance Monitoring Enhancement Need 

COVID-19 has caused many models used by banks to break in record time, casting doubt on model reliability and on banks’ reliance on models to navigate the crisis. Econometric models are developed with historical data to infer relationships between the target and explanatory variables. When indicators of labor force performance and economic growth suddenly diverge sharply from recent and past trends, forecasts of those variables become highly uncertain. If such forecasts are used as input variables or key assumptions by downstream models, model performance is likely to deteriorate further because the input data are less reliable. Uncertainty also leads consumers and commercial borrowers to alter their behavior in response to changes in their financial status or in their expectations for the economic outlook. Conventional statistical regressions and machine-learning models are losing credibility because they rely on the same underlying data and assumptions, which have fallen outside their normal ranges during the recent economic upheaval.

Model deterioration resulting from an economic shock does not necessarily call the conceptual soundness of a model into question, but it does call for immediate attention to confirm that the model remains viable. Large banks tend to conduct sensitivity analyses to quantify the magnitude of forecasting variance and then use those analyses to form model overlays and other adjustments, including in-model adjustments that directly change the value or range of the explanatory variables. The most basic sensitivity analysis estimates changes in model output by shocking individual or combined independent variables, including macroeconomic indicators and portfolio risk drivers. Model developers can also perform a more sophisticated sensitivity analysis that quantifies changes in business risk exposure caused by a shift in portfolio mix driven by the overall market and economic outlook. This exercise amounts to a re-evaluation of model assumptions. For example, if losses from newly originated accounts are expected to rise during the pandemic, model developers can re-estimate the portfolio probability of default (PD) by increasing the percentage of high-risk applicants. Similarly, a sensitivity analysis can be designed to quantify changes in loss magnitude by assuming higher default rates for the same underlying risk score bands.
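The two examples at the end of the paragraph can be sketched as a weighted-average portfolio PD under alternative assumptions. This is a simplified illustration assuming the portfolio PD is the mix-weighted average of segment PDs; the segment names, PDs, mix weights, and the 1.5x multiplier are hypothetical.

```python
def portfolio_pd(mix, segment_pd):
    """Weighted-average PD across risk segments; mix weights are normalized to sum to 1."""
    total = sum(mix.values())
    return sum(weight / total * segment_pd[seg] for seg, weight in mix.items())


def scale_pds(segment_pd, multiplier, cap=1.0):
    """Assume higher default rates within the same risk bands, capped at 100%."""
    return {seg: min(p * multiplier, cap) for seg, p in segment_pd.items()}


# Hypothetical segment PDs and origination mixes (illustrative only)
segment_pd = {"low": 0.01, "medium": 0.04, "high": 0.12}
baseline_mix = {"low": 0.60, "medium": 0.30, "high": 0.10}
stressed_mix = {"low": 0.50, "medium": 0.30, "high": 0.20}  # more high-risk applicants

base = portfolio_pd(baseline_mix, segment_pd)
# Scenario 1: mix shift toward high-risk applicants, same band PDs
mix_shift = portfolio_pd(stressed_mix, segment_pd)
# Scenario 2: same mix, higher default rates within each band
rate_shift = portfolio_pd(baseline_mix, scale_pds(segment_pd, 1.5))
print(f"baseline {base:.3%}, mix shift {mix_shift:.3%}, 1.5x band PDs {rate_shift:.3%}")
```

Keeping the two scenarios separate shows how much of the projected increase comes from who is being booked versus how each band is expected to perform, which matters when deciding what an overlay should target.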

It is generally recognized that the underlying economic theories are not fundamentally broken. Rather, the relationships estimated from past empirical analyses are heavily distorted, diluted, or delayed by massive government and bank intervention to keep the economy stable. In the recent COVID-19 response, federal and state stimulus checks and lenders’ generous forbearance programs sustained obligors’ cash flow and hence, to a certain extent, disguised the magnitude of borrowers’ financial stress. Among the attempts to address the loss of model accuracy and reliability, the most frequently sought-after approach is applying overlays and adjustments to correct the widening forecasting errors.

Current model performance results can be used along with other factors to evaluate the necessity of overlays and adjustments effectively. The framework proposed in this article is transparent and repeatable and can also be incorporated into the aggregate model risk assessment.

The overlay necessity evaluation framework consists of three factors and a qualitative adjustor. The three factors are model materiality, performance monitoring results, and the pre-existence of an effective overlay. The qualitative adjustor captures additional information that is likely to affect model performance through direct dependencies or spillover effects. In other words, the evaluation starts from known facts but also incorporates forward-looking expectations. To apply the framework effectively, materiality must first be defined, and the definition should be tied to specific business objectives such as capital planning, fraud identification, or loan origination. The framework can be used as a one-time assessment or as a continuous monitoring tool. The idea is depicted in the diagram below.

[Figure 1: Overlay Necessity Evaluation Framework]
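The diagram’s exact decision paths are not reproduced here, so the following is only a hypothetical encoding of how the three factors and the qualitative adjustor might be combined into a repeatable recommendation. The rule, the -1/0/+1 scale for the adjustor, and the output labels are all our assumptions, not the author’s framework.

```python
from dataclasses import dataclass


@dataclass
class OverlayInputs:
    material: bool            # materiality, defined against a business objective
    breach: bool              # current performance monitoring results breach thresholds
    effective_overlay: bool   # an effective overlay is already in place
    qualitative: int = 0      # hypothetical adjustor: -1 improving, 0 neutral, +1 worsening

    def __post_init__(self):
        if self.qualitative not in (-1, 0, 1):
            raise ValueError("qualitative adjustor must be -1, 0, or +1")


def overlay_recommendation(x: OverlayInputs) -> str:
    """Hypothetical decision rule combining the three factors and the adjustor."""
    if x.effective_overlay:
        # Avoid stacking overlays; keep tracking the one already in place.
        return "monitor existing overlay"
    # Known evidence (breach) nudged up or down by forward-looking information.
    signal = int(x.breach) + x.qualitative
    if x.material and signal >= 1:
        return "consider overlay"
    if x.breach or signal >= 1:
        return "enhanced monitoring"
    return "no action"


print(overlay_recommendation(OverlayInputs(material=True, breach=True, effective_overlay=False)))
```

In this sketch a breach on a material model that forward-looking information suggests will self-correct is routed to enhanced monitoring rather than an overlay, reflecting the concern about overreacting to transitory shocks.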

A carefully designed overlay or adjustment should be accompanied by a reporting and monitoring plan that allows timely evaluation of the overlay’s effectiveness and helps determine when to retire it. Limitations addressed by overlays and adjustments should eventually be incorporated into future model recalibration and redevelopment efforts.
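One way to make that evaluation routine is to score each monitoring cycle with and without the overlay and flag the overlay for retirement review once it stops helping. The sketch assumes MAE as the comparison metric and a hypothetical "two consecutive cycles" rule; both choices, the function names, and the sample cycles are illustrative assumptions.

```python
def mae(actuals, estimates):
    """Mean Absolute Error for one monitoring cycle."""
    return sum(abs(a - e) for a, e in zip(actuals, estimates)) / len(actuals)


def overlay_helped(cycles):
    """For each cycle of (actuals, raw estimates, overlay-adjusted estimates),
    report whether the overlay reduced MAE relative to the raw model."""
    return [mae(act, adj) < mae(act, raw) for act, raw, adj in cycles]


def retirement_candidate(helped, patience=2):
    """Hypothetical rule: flag the overlay once it has failed to reduce MAE
    in each of the last `patience` cycles."""
    return len(helped) >= patience and not any(helped[-patience:])


# Illustrative monitoring cycles (not real data)
cycles = [
    ([5, 6], [3, 4], [5, 6]),  # overlay closes the gap
    ([4, 4], [4, 4], [6, 6]),  # raw model has recovered; overlay now overshoots
    ([3, 3], [3, 4], [5, 5]),
]
history = overlay_helped(cycles)
print(history, retirement_candidate(history))
```

Requiring more than one failing cycle avoids retiring an overlay on a single noisy period, mirroring the caution against reacting to transitory movements.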

It is worth noting that some shocks to econometric models are short lived and self-correcting, and careless application of overlays can further worsen model performance and increase model risk. Evaluating performance outcomes therefore becomes more critical during the pandemic. Instead of reacting to every breach, model developers and users should first assess whether existing metrics and thresholds, and their monitoring and reporting frequency, need to be modified to prevent overreaction to severe but transitory shocks. For instance, they may need to add ultra-short-term metrics or change thresholds to reveal the impact of short-term shocks on model estimates. Changing the monitoring frequency will not mitigate model deterioration but may allow more timely detection of model misbehavior. For example, if the state-level unemployment rate is a model input that has been assessed quarterly, a month-over-month measure can be added to provide more granular information on the unemployment rate within each quarter. Model performance with and without adjustments should also be compared to further evaluate the overall effectiveness of model overlays and overrides.
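The unemployment example can be sketched by computing month-over-month changes alongside the quarterly averages a quarterly review would see. The rates, the month labels, and the 1.0 percentage-point threshold are hypothetical values chosen for illustration, not actual state data or a recommended threshold.

```python
def month_over_month(rates):
    """Change in percentage points from each month to the next."""
    return [curr - prev for prev, curr in zip(rates, rates[1:])]


def flag_large_moves(months, rates, threshold_pp=1.0):
    """Return the months whose month-over-month change exceeds the threshold in absolute value."""
    if len(months) != len(rates):
        raise ValueError("months and rates must align")
    changes = month_over_month(rates)
    return [months[i + 1] for i, d in enumerate(changes) if abs(d) > threshold_pp]


def quarterly_averages(rates):
    """Average of each complete block of three months (the coarser quarterly view)."""
    return [sum(rates[i:i + 3]) / 3 for i in range(0, len(rates) - len(rates) % 3, 3)]


# Hypothetical state unemployment rates, in percent (illustrative only)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
rates = [4.0, 4.1, 4.3, 12.0, 11.5, 10.0]
print(quarterly_averages(rates))        # quarterly view: one large step up
print(flag_large_moves(months, rates))  # monthly view: the April spike and the June decline
```

The quarterly view shows only a single jump between quarters, while the monthly measure shows when the shock hit and whether it has begun to ease, which is the information needed to judge whether a breach is transitory.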


If the 2007–09 financial crisis raised awareness of the complexity of modeling probability of default, the COVID-19 pandemic has increased skepticism among financial institutions about the accuracy of econometric model estimates and the reliability of those models. Until outcome data become available to support model redevelopment, performance monitoring can be leveraged to evaluate the need for overlays and adjustments following the framework proposed in this article. Additionally, re-evaluating the current monitoring plan can identify enhancements that more accurately capture the source and magnitude of model deterioration.

The views presented in this research are solely those of the author and do not necessarily represent those of Ally Financial Inc. (AFI) or any subsidiaries of AFI.


1. Cox, D. R. “Role of Models in Statistical Analysis.” Statistical Science, 1990, 5(2), pp. 169-174.

LIMING BROTCKE, PhD, leads the Model Validation Group at Ally. Her extensive industry experience in financial modeling and model validation is enriched by a deep understanding of regulatory expectations for large bank supervision, stress testing, and model risk management. Prior to joining Ally, she worked at the Federal Reserve Bank of Chicago, Citigroup, and Discover Financial Services.
