
Intelligent Pricing Model: Powered by ML, Guided by GenAI

Introduction

Assessing an individual’s creditworthiness has always relied on a complex blend of financial, behavioral, and market-driven factors. These signals shift constantly, making manual prediction both time-consuming and inconsistent. Modern ML models offer lenders and underwriters a more scalable alternative, providing fast, explainable, and maintainable credit insights that balance fair pricing for borrowers with profitable decisioning for institutions.

To ground these concepts in a real example, this case study explores mortgage pricing using Freddie Mac’s 2024 Q1 Single-Family Loan-Level Dataset. While external economic forces beyond this dataset also influence rates, it provides a strong foundation for demonstrating how an ML-driven pricing pipeline operates in practice.

How we will carry out our investigation:
  1. Data exploration
    The first step is to understand the data we have, including feature correlations and any missing values. The overall objective of this phase is to distil the data to its essence so it can guide the feature engineering phase.
  2. Feature Engineering
    This stage prepares the features we settle on for the model training phase. Here we curate the dataset for the mortgage pricing model, selecting the most prominent features.
  3. Model Training, Evaluation & GenAI
    This is where the work so far comes together. Using the engineered feature set, we train and evaluate multiple machine learning models to identify the strongest performer. Each model is tracked through MLflow so we can compare accuracy, stability and overall fit.

Once the champion model is selected, we integrate it into a GenAI layer designed specifically for mortgage lenders. This final step transforms raw model outputs into tailored, easy-to-understand explanations that support real-time pricing conversations and decision-making.

Data Exploration

Before handing anything to a model, we need to get a feel for what the raw Freddie Mac data is actually telling us. This step is all about sanity-checking the dataset, understanding how rates are distributed, and seeing how the main drivers (credit score, LTV, DTI, etc.) behave.
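As a minimal sketch of this sanity check, assume the 2024 Q1 tape has already been parsed into a pandas DataFrame named df, with columns named as they appear throughout this post:

# df: the parsed 2024 Q1 origination tape, with columns named as in this
# post (original_interest_rate, credit_score, original_ltv, ...)
print(f"Records: {len(df):,} | Features: {df.shape[1]}")

# Target distribution: min / max / mean / std of the quoted rate
print(df['original_interest_rate'].describe())

# Missing values per column, worst offenders first
print(df.isna().sum().sort_values(ascending=False).head(10))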

Summary

  • Dataset size: 214,929 records
  • Total features: 31

Target variable – Interest Rate (%)

  • Min: 2.250%
  • Max: 9.250%
  • Mean: 6.736%
  • Std: 0.541%

From the histogram and box plot of interest rates, we can see that most loans are clustered tightly around the ~6.7% mark, with a relatively small spread (standard deviation just over half a percent). The distribution looks roughly bell-shaped with a slight right tail, which lines up with a few higher-rate outliers pushing towards 9% on the upper end and a handful of older/legacy low-rate loans on the lower end.

The box plot confirms this story: the interquartile range (IQR) is fairly narrow, meaning most borrowers are being priced in a tight band. There are visible outliers both below and above the main cluster. These are important to flag because they could represent special programs, data entry issues, or edge-case borrowers that may distort the model if we don’t handle them carefully.
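The plots themselves are easy to reproduce; a short matplotlib sketch, continuing from the df assumed above:

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Histogram: the tight cluster around ~6.7% and the slight right tail
ax1.hist(df['original_interest_rate'], bins=50)
ax1.set_xlabel('Interest Rate (%)')
ax1.set_ylabel('Loan count')

# Box plot: narrow IQR, with outliers flagged on both sides
ax2.boxplot(df['original_interest_rate'].dropna())
ax2.set_ylabel('Interest Rate (%)')

plt.tight_layout()
plt.show()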
Next, we start looking at how some key features relate to rate: 

  • Credit Score vs Interest Rate:
    The scatter shows the classic pattern we’d expect: as credit scores improve, rates generally trend lower, but with noise. There are pockets where similar scores receive slightly different pricing, likely due to other risk factors (LTV, DTI, product type, etc.) that the model will eventually capture. In hindsight, narrowing the credit score axis would have yielded a clearer visualisation (see the plotting sketch after this list).
  • LTV (Loan-to-Value) vs Interest Rate:
    As LTV goes up (borrower has less equity), rates tend to creep higher. The scatter is more “cloud-like” than a sharp line, but you can still see a gentle upward slope at higher LTVs, especially close to 80–97% where risk is higher.
  • DTI (Debt-to-Income) vs Interest Rate:
    DTI shows a similar soft relationship: higher DTIs are generally associated with slightly higher rates, but again with a lot of overlap in the middle. This tells us DTI matters, but it’s not the only thing driving pricing.
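The scatter views follow the same pattern; a small sketch for the three drivers, sampling the ~215k loans so the point cloud stays readable:

# Sample to keep the scatter readable; low alpha exposes density
sample = df.sample(20_000, random_state=42)

fig, axes = plt.subplots(1, 3, figsize=(15, 4), sharey=True)
for ax, col in zip(axes, ['credit_score', 'original_ltv', 'original_dti']):
    ax.scatter(sample[col], sample['original_interest_rate'], s=2, alpha=0.2)
    ax.set_xlabel(col)
axes[0].set_ylabel('Interest Rate (%)')
plt.tight_layout()
plt.show()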
At this stage, the main takeaways are:
  1. Our target variable is well-behaved (no wild multimodal distribution, reasonable spread).
  2. Credit Score, LTV, and DTI all show meaningful but noisy relationships with rate, which makes them strong candidates for our feature set.
  3. We’ve identified outliers and potential edge cases that we’ll want to treat or at least track when we move into feature engineering and model training.

Feature Engineering

With the raw Freddie Mac tape explored, the next step is to reshape it into something a model can actually learn from. The aim here is simple: keep the economic story of the loan, strip away noise, and add structure where underwriters naturally think in buckets and interactions.

We start from the original loan-level fields and separate them into two groups: original features that we keep largely as-is, and engineered features that encode underwriting logic.

Feature Type        | Count | Notes
Original features   | 11    | Core credit, loan size, term, and high-level loan attributes
Engineered features | 13    | Risk score, interactions, buckets, and simplified categories
Total (ex-target)   | 24    | Before final selection / pruning

The feature engineering stage reduces the raw loan tape into a clean, structured view the model can learn from.

  • Core numeric fields
    credit_score, original_ltv, original_dti, original_upb, num_units and original_loan_term
    These describe borrower strength, leverage and basic loan structure.
  • Core categorical fields
    occupancy_status, property_type, loan_purpose and property_state
    These define how the property is used and the context of the loan.
  • Engineered risk features
    risk_score, ltv_dti_interaction, credit_ltv_interaction and loan_per_unit
    These capture how multiple risk factors combine and help the model recognise higher-risk profiles.
  • Category groupings
    credit_score_category, ltv_category, dti_category and loan_size_category
    These reflect the breakpoints underwriters use in real pricing decisions.
  • Simplified label fields
    property_type_simple, occupancy_simple and loan_purpose_simple
    These reduce complexity while keeping economic meaning.
  • Timing features
    first_payment_year and first_payment_quarter
    These help the model learn changes in market conditions over time.

Once combined, these features form a final set of 20 predictors plus the target original_interest_rate. After removing any remaining nulls, the dataset is saved as a Delta table and becomes the foundation for model training.
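As a condensed sketch of how these fields might be derived (the weights, breakpoints and date format below are illustrative assumptions, not the production definitions):

import pandas as pd

# Interaction terms: stacked-risk files score high on both components
df['ltv_dti_interaction'] = df['original_ltv'] * df['original_dti']
df['credit_ltv_interaction'] = df['credit_score'] * df['original_ltv']
df['loan_per_unit'] = df['original_upb'] / df['num_units']

# Composite risk score: illustrative weighting only
df['risk_score'] = ((850 - df['credit_score']) / 850
                    + df['original_ltv'] / 100
                    + df['original_dti'] / 100)

# Underwriter-style buckets (breakpoints are assumptions)
df['credit_score_category'] = pd.cut(df['credit_score'],
                                     bins=[0, 620, 680, 740, 850],
                                     labels=['subprime', 'fair', 'good', 'excellent'])
df['ltv_category'] = pd.cut(df['original_ltv'], bins=[0, 60, 80, 95, 200],
                            labels=['low', 'medium', 'high', 'very_high'])

# Timing features, assuming first_payment_date is an integer like 202404
df['first_payment_year'] = df['first_payment_date'] // 100
df['first_payment_quarter'] = (df['first_payment_date'] % 100 - 1) // 3 + 1

# Persist for the training phase (spark session assumed, as on Databricks)
spark.createDataFrame(df).write.format('delta') \
     .mode('overwrite').saveAsTable('mortgage_data_features')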

Final Modeling Dataset | Value
Records                | 214,929
Features (predictors)  | 20
Target                 | original_interest_rate
Table name             | mortgage_data_features

This gives us a compact, model-ready view of each loan that still feels very close to how a human underwriter would describe the file.

Correlation Analysis – What Drives Rate?

Before throwing models at the data, it’s worth sanity-checking how these features move with the interest rate. The simple Pearson correlations with original_interest_rate look like this:

Feature                | Correlation with Rate
original_loan_term     | +0.251
risk_score             | +0.124
original_ltv           | +0.114
ltv_dti_interaction    | +0.098
num_units              | +0.078
credit_ltv_interaction | +0.073
original_dti           | +0.046
original_upb           | −0.037
loan_per_unit          | −0.046
credit_score           | −0.184
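These numbers come straight from pandas; a short sketch over the feature table:

# Pearson correlation of each predictor with the target rate
numeric_cols = ['original_loan_term', 'risk_score', 'original_ltv',
                'ltv_dti_interaction', 'num_units', 'credit_ltv_interaction',
                'original_dti', 'original_upb', 'loan_per_unit', 'credit_score']

correlations = (df[numeric_cols + ['original_interest_rate']]
                .corr(numeric_only=True)['original_interest_rate']
                .drop('original_interest_rate')
                .sort_values(ascending=False))
print(correlations)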

The picture is reassuring. Longer terms tend to price higher, which comes through as the strongest direct linear relationship. Credit score behaves exactly as expected: as scores improve, rates come down, giving us a clear negative correlation.

Leverage and affordability show up with positive correlations: higher original_ltv, higher original_dti and, more importantly, their interaction ltv_dti_interaction all point towards higher pricing. The interaction terms are doing what they were designed to do: highlighting the stacked-risk files where a borrower is both highly leveraged and already carrying a heavy debt load. risk_score pulls these ingredients together, and the positive correlation with rate confirms that this composite view is aligned with the way loans are priced.

Overall, the correlation analysis tells us two important things:

  • the engineered features behave in a way that matches domain intuition; and
  • no single feature completely dominates the target, leaving room for the model to pick up richer, non-linear combinations.

With feature engineering complete and the relationships to rate looking sensible, we’re in a good position to move on to the model training and MLflow tracking phase.

 

Model Training, Evaluation & GenAI

 

Model Training

 Prepare features and target variables, handle categorical encoding, and create train/test splits. 

 

# Define feature groups
numeric_features = [
    'credit_score', 'original_ltv', 'original_dti', 'original_upb',
    'num_units', 'original_loan_term', 'risk_score',
    'ltv_dti_interaction', 'credit_ltv_interaction', 'loan_per_unit'
]

categorical_features = [
    'credit_score_category', 'loan_size_category', 'ltv_category',
    'dti_category', 'property_type_simple', 'occupancy_simple',
    'loan_purpose_simple', 'property_state'
]

temporal_features = ['first_payment_year', 'first_payment_quarter']

target = 'original_interest_rate'

print(f"Numeric: {len(numeric_features)}, Categorical: {len(categorical_features)}, "
      f"Temporal: {len(temporal_features)}")

# Create feature dataframe
X = df[numeric_features + categorical_features + temporal_features].copy()
y = df[target].copy()
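The original encoding step isn’t shown, so as one reasonable assumption for the tree models, the categoricals can be mapped to integer category codes (for the linear baseline, one-hot encoding via pd.get_dummies would be the more natural choice):

# Encode categoricals as integer codes; gradient-boosted trees can
# split on these directly
for col in categorical_features:
    X[col] = X[col].astype('category').cat.codes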

Split the data into training (80%) and testing (20%) sets, using a fixed random seed for reproducibility.

# Train-test split with a fixed random state for reproducibility
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42
)

Then we set up the MLflow experiment.

Initialise MLflow experiment tracking to log all model training runs, parameters, metrics and artifacts. 

import mlflow

# Set MLflow experiment
experiment_name = "/Users/xxxxx/mortgage_pricing_models"

# Try to create the experiment, or fall back to the existing one
try:
    experiment_id = mlflow.create_experiment(experiment_name)
    print(f"Created new experiment: {experiment_name}")
except Exception:
    experiment = mlflow.get_experiment_by_name(experiment_name)
    experiment_id = experiment.experiment_id
    print(f"Using existing experiment: {experiment_name}")

mlflow.set_experiment(experiment_name)

print(f"   Experiment ID: {experiment_id}")
print("\n📊 All runs will be tracked in MLflow UI")
print(f"   Access at: https://{your_url}/ml/experiments/{experiment_id}")
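The training loop itself isn’t reproduced above; a condensed sketch of how the three candidates might be trained and logged, with illustrative hyperparameters rather than the tuned values:

import mlflow
import mlflow.sklearn
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

models = {
    'linear_regression': LinearRegression(),
    'xgboost': XGBRegressor(n_estimators=500, max_depth=6,
                            learning_rate=0.05, random_state=42),
    'lightgbm': LGBMRegressor(n_estimators=500, learning_rate=0.05,
                              random_state=42),
}

for name, model in models.items():
    with mlflow.start_run(run_name=name):
        model.fit(X_train, y_train)
        preds = model.predict(X_test)

        # Out-of-sample metrics used in the comparison table below
        mlflow.log_metrics({
            'test_rmse': float(np.sqrt(mean_squared_error(y_test, preds))),
            'test_mae': float(mean_absolute_error(y_test, preds)),
            'test_r2': float(r2_score(y_test, preds)),
            'test_mape': float(np.mean(np.abs((y_test - preds) / y_test)) * 100),
        })
        mlflow.log_params(model.get_params())
        mlflow.sklearn.log_model(model, 'model')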

 

Evaluation

For the MLflow experiment we compared three models for predicting mortgage interest rates: a baseline Linear Regression, and two gradient boosting models, XGBoost and LightGBM. MLflow was used to track runs, metrics and parameters so we could pick a champion model based on out-of-sample performance.

Model             | Test RMSE | Test MAE | Test R²  | Test MAPE (%)
XGBoost           | 0.468013  | 0.357783 | 0.261302 | 5.417410
LightGBM          | 0.468177  | 0.357707 | 0.260783 | 5.416796
Linear Regression | 0.503227  | 0.385939 | 0.145956 | 5.828756

XGBoost edges out the others with the lowest Root Mean Squared Error (RMSE) and the highest R² (LightGBM’s MAE and MAPE are marginally lower), so it is selected as the champion model. The difference between XGBoost and LightGBM is extremely small and not practically meaningful, but both clearly outperform the linear baseline, which struggles to capture the complexity in the pricing relationships.

Practical impact of model accuracy

  • XGBoost’s MAE of ~0.36 percentage points means that, for a typical 6.5% mortgage rate, the model is usually within about ±0.36 percentage points of the actual rate.
  • With an RMSE of ~0.47, most predictions fall within roughly ±0.9–1.0 percentage points of the true rate.
  • On a $300,000 mortgage, a 0.36 percentage point error equates to around $65 per month difference in payment (reproduced in the sketch after this list).
  • This level of accuracy is good enough for preliminary pricing and underwriting discussions, while still leaving room for a human underwriter to fine-tune the final offer.
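The monthly-payment figure follows from the standard amortised payment formula; a quick sketch, assuming a 30-year term, which lands in the same rough $65–70 per month ballpark:

def monthly_payment(principal: float, annual_rate_pct: float, months: int = 360) -> float:
    # Standard amortised payment: P * r / (1 - (1 + r)**-n)
    r = annual_rate_pct / 100 / 12
    return principal * r / (1 - (1 + r) ** -months)

base = monthly_payment(300_000, 6.50)   # ~$1,896/month
high = monthly_payment(300_000, 6.86)   # ~$1,968/month (rate off by the ~0.36pp MAE)
print(f"Difference: ${high - base:,.0f}/month")  # roughly $70/month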

Limitations and what the model misses

  • An R² of 0.26 shows that the model explains only about 26% of the variation in interest rates.
  • The remaining 74% is likely driven by factors not captured in this dataset, such as:
    • real-time market conditions (e.g. bond yields, macro signals),
    • lender-specific pricing overlays and strategy,
    • local negotiation and competitive behaviour,
    • and long-term borrower relationship history.
  • This underlines that the model is a decision support tool, not a full replacement for pricing desks, credit policy, or human judgment.

The reason the gradient boosting models outperform Linear Regression comes down to how mortgage pricing really works. The relationship between credit score, LTV, DTI and rate is highly non-linear and full of thresholds: a small change around an 80% LTV or a particular FICO band can move the price more than a simple linear slope would suggest. Gradient boosting handles these kinks and feature interactions naturally, whereas a linear model can only fit straight lines unless we manually engineer a large number of interaction and non-linear terms.

Overall, the takeaway from this MLflow run is that the dataset needs to grow beyond the current features, drawing on external sources, to improve accuracy. Even so, the models, XGBoost in particular, provide a solid, business-interpretable starting point for a mortgage pricing engine: accurate enough to guide offers, transparent enough to monitor, and still complemented by human oversight for final rate setting.

 

Explainable Pricing with SHAP & GenAI

To move beyond “black box” predictions, we add an explainability layer on top of the XGBoost model. This combines SHAP values for technical transparency with a GenAI chat interface that turns those numbers into human language for brokers and borrowers.

SHAP-Based Explainability

SHAP (SHapley Additive exPlanations) values quantify how each feature pushes a prediction up or down relative to the portfolio average rate. For our XGBoost model, SHAP lets us see both global patterns and the story behind a single quote.

  • At the portfolio level, SHAP highlights the features that most consistently move pricing: LTV, credit score, DTI, occupancy, loan purpose and term.
  • At the individual loan level, each prediction is decomposed into a base rate (6.736% in this dataset) plus a series of feature contributions that sum to the final quoted rate.
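A minimal sketch of how these values are computed; xgb_model stands in for the trained XGBoost champion from the training phase:

import shap

# TreeExplainer is exact and fast for tree ensembles like XGBoost
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test)

# Global view: which features most consistently move pricing
shap.summary_plot(shap_values, X_test)

# Single-quote story: base rate + per-feature contributions = prediction
i = 0  # first test loan
top_factors = sorted(zip(X_test.columns, shap_values[i]),
                     key=lambda t: abs(t[1]), reverse=True)[:5]
print(f"Base rate: {explainer.expected_value:.3f}%")
for feature, impact in top_factors:
    print(f"  {feature:<25s} {impact:+.3f}")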

 

To show how this works in practice, we walk through two real loan applications. In each case, the model starts from the average rate of 6.736% and then adjusts up or down based on the borrower profile.

Sample 1 – High LTV, standard owner-occupied file
The predicted rate is 6.889%, about 0.15% above the base rate. The main upward pressure comes from a 95% LTV and a mid-tier credit score around 706, both of which increase perceived risk. This is partially offset by the loan being owner-occupied with a standard purpose and a 360-month term, which pull the rate back down a little. Overall, this is priced as a higher-risk, high-leverage loan with some positive mitigating factors.

Sample 2 – Strong borrower offsetting high LTV
Here the model predicts 6.519%, roughly 0.22% below the base rate. The borrower’s DTI of 50% and excellent credit score of 782 both have strong negative SHAP values, reducing the rate. Although the LTV is again 95% and the term is 360 months, which push the rate up, the combination of very strong credit and a DTI the model treats as acceptable more than compensates, leading to a cheaper rate than average.

Together, these examples show that the model’s behaviour is consistent with underwriting intuition: riskier leverage and purposes push rates up; strong credit, equity and owner-occupancy pull them down.

GenAI-Powered Explainability Layer

SHAP gives us numbers and charts; the final step is to turn those into explanations a broker can read in a few seconds and repeat to a customer. For that, we add a GenAI layer on top of the XGBoost + SHAP pipeline.

The workflow is:

  • For each pricing request, we take the predicted rate, base rate and top SHAP features (values and impacts).
  • These are passed into a prompted Large Language Model (LLM), which is instructed to write a short, professional explanation: what rate was predicted, the main reasons it’s above or below the market base rate, and one actionable suggestion (for example, “reducing the LTV by increasing the deposit could lower the rate”).
  • The generated text is returned to the pricing application and stored as an MLflow artifact so that explanations are auditable and reproducible.
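A stripped-down sketch of this workflow; the prompt wording and the call_llm helper are hypothetical stand-ins for whichever LLM endpoint the pricing application uses:

import mlflow

def build_prompt(predicted_rate, base_rate, top_factors):
    # Assemble the LLM prompt from the prediction and top SHAP factors
    lines = "\n".join(f"- {name}: {impact:+.3f} percentage points"
                      for name, impact in top_factors)
    return ("You are a mortgage pricing assistant. Write a short, professional "
            f"explanation for a broker.\nPredicted rate: {predicted_rate:.3f}% "
            f"(market base rate: {base_rate:.3f}%).\nMain drivers:\n{lines}\n"
            "Finish with one actionable suggestion for the borrower.")

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical: swap in your provider's chat client

prompt = build_prompt(6.889, 6.736, top_factors)  # top_factors from the SHAP sketch
explanation = call_llm(prompt)

# Store the generated text so explanations are auditable and reproducible
with mlflow.start_run(run_name="pricing_explanation"):
    mlflow.log_text(explanation, "explanation.txt")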

This turns the model into a conversational tool: a broker can request a quote, immediately see the numerical breakdown, and also receive a ready-made explanation that is consistent, compliant and easy to share with the borrower.

Conclusion

Together, the ML pricing model, SHAP explainability and the GenAI explanation layer give us a pricing system that is accurate, transparent and ready for real-world use. It allows brokers, auditors and borrowers to understand not just the rate, but the reasoning behind it, turning intelligent pricing into a clear, confident part of the lending process.

 


Author

Toyosi Babayeju