Assessing an individual’s creditworthiness has always relied on a complex blend of financial, behavioral, and market-driven factors. These signals shift constantly, making manual prediction both time-consuming and inconsistent. Modern ML models offer lenders and underwriters a more scalable alternative, providing fast, explainable, and maintainable credit insights that balance fair pricing for borrowers with profitable decisioning for institutions.
To ground these concepts in a real example, this case study explores mortgage pricing using Freddie Mac’s 2024 Q1 Single-Family Loan-Level Dataset. While external economic forces also influence rate accuracy, this dataset provides a strong foundation for demonstrating how an ML-driven pricing pipeline operates in practice.
Once the champion model is selected, we integrate it into a GenAI layer designed specifically for mortgage lenders. This final step transforms raw model outputs into tailored, easy-to-understand explanations that support real-time pricing conversations and decision-making.
Before handing anything to a model, we need to get a feel for what the raw Freddie Mac data is actually telling us. This step is all about sanity-checking the dataset, understanding how rates are distributed, and seeing how the main drivers (credit score, LTV, DTI, etc.) behave.
Summary
Target variable – Interest Rate (%)
From the histogram and box plot of interest rates, we can see that most loans are clustered tightly around the ~6.7% mark, with a relatively small spread (standard deviation just over half a percent). The distribution looks roughly bell-shaped with a slight right tail, which lines up with a few higher-rate outliers pushing towards 9% on the upper end and a handful of older/legacy low-rate loans on the lower end.
The box plot confirms this story: the interquartile range (IQR) is fairly narrow, meaning most borrowers are being priced in a tight band. There are visible outliers both below and above the main cluster. These are important to flag because they could represent special programs, data entry issues, or edge-case borrowers that may distort the model if we don’t handle them carefully.
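For reference, a minimal EDA sketch along these lines, assuming the loan tape has already been loaded into a pandas DataFrame named `df` with the `original_interest_rate` column:

```python
import matplotlib.pyplot as plt

# Assumes the Freddie Mac loan tape is loaded as a pandas DataFrame `df`
rate = df["original_interest_rate"].dropna()

# Headline stats: mean, spread, and the tails discussed above
print(rate.describe())

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Histogram: roughly bell-shaped, clustered around ~6.7% with a slight right tail
ax1.hist(rate, bins=50, edgecolor="black")
ax1.set_xlabel("Interest Rate (%)")
ax1.set_ylabel("Loan Count")
ax1.set_title("Distribution of Original Interest Rate")

# Box plot: narrow IQR with outliers on both sides worth flagging
ax2.boxplot(rate, vert=False)
ax2.set_xlabel("Interest Rate (%)")
ax2.set_title("Interest Rate Box Plot")

plt.tight_layout()
plt.show()
```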
With the raw Freddie Mac tape explored, the next step is to reshape it into something a model can actually learn from. The aim here is simple: keep the economic story of the loan, strip away noise, and add structure where underwriters naturally think in buckets and interactions.
We start from the original loan-level fields and separate them into two groups: original features that we keep largely as-is, and engineered features that encode underwriting logic.
| Feature Type | Count | Notes |
|---|---|---|
| Original features | 11 | Core credit, loan size, term, and high-level loan attributes |
| Engineered features | 13 | Risk score, interactions, buckets, and simplified categories |
| Total (ex-target) | 24 | Before final selection / pruning |
The feature engineering stage reduces the raw loan tape into a clean, structured view the model can learn from.
Once combined, these features form a final set of 20 predictors plus the target original_interest_rate. After removing any remaining nulls, the dataset is saved as a Delta table and becomes the foundation for model training.
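To make the pattern concrete, here is a sketch of how the engineered columns and the Delta write might look. This assumes a Databricks notebook where `spark` is available; the bucket edges and risk-score weights shown are illustrative stand-ins, not the pipeline’s actual definitions:

```python
import pandas as pd

# Illustrative engineered features -- the real pipeline's exact bucket edges
# and risk-score weights may differ; this sketches the pattern.
df["ltv_dti_interaction"] = df["original_ltv"] * df["original_dti"]
df["credit_ltv_interaction"] = df["credit_score"] * df["original_ltv"]
df["loan_per_unit"] = df["original_upb"] / df["num_units"]

# Bucketed views that mirror how underwriters think in bands
df["ltv_category"] = pd.cut(
    df["original_ltv"], bins=[0, 60, 80, 95, 200],
    labels=["low", "standard", "high", "very_high"],
).astype(str)
df["credit_score_category"] = pd.cut(
    df["credit_score"], bins=[0, 620, 680, 740, 850],
    labels=["subprime", "near_prime", "prime", "super_prime"],
).astype(str)

# A simple composite risk score (illustrative weighting)
df["risk_score"] = (
    df["original_ltv"] / 100
    + df["original_dti"] / 100
    - (df["credit_score"] - 300) / 550
)

# Persist the model-ready view as a Delta table (Databricks provides `spark`)
spark.createDataFrame(df.dropna()).write.format("delta") \
    .mode("overwrite").saveAsTable("mortgage_data_features")
```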
| Final Modeling Dataset | Value |
|---|---|
| Records | 214,929 |
| Features (predictors) | 20 |
| Target | original_interest_rate |
| Table name | mortgage_data_features |
This gives us a compact, model-ready view of each loan that still feels very close to how a human underwriter would describe the file.
Correlation Analysis – What Drives Rate?
Before throwing models at the data, it’s worth sanity-checking how these features move with the interest rate. The simple Pearson correlations with original_interest_rate look like this:
| Feature | Correlation with Rate |
|---|---|
| original_loan_term | +0.251 |
| risk_score | +0.124 |
| original_ltv | +0.114 |
| ltv_dti_interaction | +0.098 |
| num_units | +0.078 |
| credit_ltv_interaction | +0.073 |
| original_dti | +0.046 |
| original_upb | −0.037 |
| loan_per_unit | −0.046 |
| credit_score | −0.184 |
The picture is reassuring. Longer terms tend to price higher, which comes through as the strongest direct linear relationship. Credit score behaves exactly as expected: as scores improve, rates come down, giving us a clear negative correlation.
Leverage and affordability show up with positive correlations: higher original_ltv, higher original_dti and, more importantly, their interaction ltv_dti_interaction all point towards higher pricing. The interaction terms are doing what they were designed to do: highlighting the stacked-risk files where a borrower is both highly leveraged and already carrying a heavy debt load. risk_score pulls these ingredients together, and its positive correlation with rate confirms that this composite view is aligned with the way loans are priced.
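These figures fall out of a simple pandas computation; a minimal sketch, assuming the feature table has been loaded as the pandas DataFrame `df`:

```python
# Pearson correlation of every numeric feature with the target,
# sorted from most positive to most negative
numeric_cols = df.select_dtypes(include="number").columns
correlations = (
    df[numeric_cols]
    .corr()["original_interest_rate"]
    .drop("original_interest_rate")
    .sort_values(ascending=False)
)
print(correlations.round(3))
```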
Overall, the correlation analysis tells us two important things: the features all move in the direction underwriting intuition would expect, so the engineered view of the loan is economically sensible; and no single linear correlation is dominant, which suggests the real pricing signal lives in non-linear effects and interactions rather than straight-line relationships.
With feature engineering complete and the relationships to rate looking sensible, we’re in a good position to move on to the model training and MLflow tracking phase.
Prepare features and target variables, handle categorical encoding, and create train/test splits.
```python
# Define feature groups
numeric_features = [
    'credit_score', 'original_ltv', 'original_dti', 'original_upb',
    'num_units', 'original_loan_term', 'risk_score',
    'ltv_dti_interaction', 'credit_ltv_interaction', 'loan_per_unit'
]
categorical_features = [
    'credit_score_category', 'loan_size_category', 'ltv_category',
    'dti_category', 'property_type_simple', 'occupancy_simple',
    'loan_purpose_simple', 'property_state'
]
temporal_features = ['first_payment_year', 'first_payment_quarter']

target = 'original_interest_rate'

# Create feature dataframe
X = df[numeric_features + categorical_features + temporal_features].copy()
y = df[target].copy()
```
Split the data into training (80%) and testing (20%) sets, using a fixed random seed so the split is reproducible.
```python
from sklearn.model_selection import train_test_split

# Train-test split with random state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```
Then we set up the MLflow experiment.
Initialise MLflow experiment tracking to log all model training runs, parameters, metrics and artifacts.
```python
import mlflow
from mlflow.exceptions import MlflowException

# Set MLflow experiment
experiment_name = "/Users/xxxxx/mortgage_pricing_models"

# Try to create the experiment, or fall back to the existing one
try:
    experiment_id = mlflow.create_experiment(experiment_name)
    print(f"✅ Created new experiment: {experiment_name}")
except MlflowException:
    experiment = mlflow.get_experiment_by_name(experiment_name)
    experiment_id = experiment.experiment_id
    print(f"✅ Using existing experiment: {experiment_name}")

mlflow.set_experiment(experiment_name)

print(f"   Experiment ID: {experiment_id}")
print("\n📊 All runs will be tracked in MLflow UI")
# `your_url` is a placeholder for your workspace hostname
print(f"   Access at: https://{your_url}/ml/experiments/{experiment_id}")
```
For the MLflow experiment, we compared three models for predicting mortgage interest rates: a baseline Linear Regression and two gradient boosting models, XGBoost and LightGBM. MLflow was used to track runs, metrics, and parameters so we could pick a champion model based on out-of-sample performance.
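A minimal sketch of the training-and-logging loop behind these runs, assuming the split from above with categoricals one-hot encoded; the hyperparameters shown are illustrative defaults, not the tuned values:

```python
import mlflow
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

# One-hot encode categoricals so all three models see numeric inputs
X_train_enc = pd.get_dummies(X_train, columns=categorical_features)
X_test_enc = pd.get_dummies(X_test, columns=categorical_features)
X_test_enc = X_test_enc.reindex(columns=X_train_enc.columns, fill_value=0)

models = {
    "linear_regression": LinearRegression(),
    "xgboost": XGBRegressor(n_estimators=300, max_depth=6, random_state=42),
    "lightgbm": LGBMRegressor(n_estimators=300, random_state=42),
}

for name, model in models.items():
    with mlflow.start_run(run_name=name):
        model.fit(X_train_enc, y_train)
        preds = model.predict(X_test_enc)

        # Out-of-sample metrics used to pick the champion
        rmse = np.sqrt(mean_squared_error(y_test, preds))
        mae = mean_absolute_error(y_test, preds)
        r2 = r2_score(y_test, preds)
        mape = np.mean(np.abs((y_test - preds) / y_test)) * 100

        mlflow.log_params(model.get_params())
        mlflow.log_metrics(
            {"test_rmse": rmse, "test_mae": mae, "test_r2": r2, "test_mape": mape}
        )
```

The tracked test metrics for the three runs came out as follows: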
| Model | Test RMSE | Test MAE | Test R² | Test MAPE (%) |
|---|---|---|---|---|
| XGBoost | 0.468013 | 0.357783 | 0.261302 | 5.417410 |
| LightGBM | 0.468177 | 0.357707 | 0.260783 | 5.416796 |
| Linear Regression | 0.503227 | 0.385939 | 0.145956 | 5.828756 |
XGBoost edges out the others with the lowest Root Mean Squared Error (RMSE) and the highest R², so it is selected as the champion model. The difference between XGBoost and LightGBM is extremely small (LightGBM actually posts marginally lower MAE and MAPE) and not practically meaningful, but both clearly outperform the linear baseline, which struggles to capture the complexity in the pricing relationships.
Practical impact of model accuracy
Limitations and what the model misses
The reason the gradient boosting models outperform Linear Regression comes down to how mortgage pricing really works. The relationship between credit score, LTV, DTI and rate is highly non-linear and full of thresholds: a small change around an 80% LTV or a particular FICO band can move the price more than a simple linear slope would suggest. Gradient boosting handles these kinks and feature interactions naturally, whereas a linear model can only fit straight lines unless we manually engineer a large number of interaction and non-linear terms.
Overall, the takeaway from this MLflow run is that further accuracy gains will require expanding the dataset beyond the current loan-level features to external sources. Even so, our champion model, XGBoost, provides a solid, business-interpretable starting point for a mortgage pricing engine: accurate enough to guide offers, transparent enough to monitor, and still complemented by human oversight for final rate setting.
To move beyond “black box” predictions, we add an explainability layer on top of the XGBoost model. This combines SHAP values for technical transparency with a GenAI chat interface that turns those numbers into human language for brokers and borrowers.
SHAP (SHapley Additive exPlanations) values quantify how each feature pushes a prediction up or down relative to the portfolio average rate. For our XGBoost model, SHAP lets us see both global patterns and the story behind a single quote.
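A minimal SHAP sketch for the champion model, assuming the fitted XGBoost model and the encoded test frame from the training step above:

```python
import shap

# TreeExplainer is exact and fast for tree ensembles like XGBoost
explainer = shap.TreeExplainer(models["xgboost"])
shap_values = explainer.shap_values(X_test_enc)

# Global view: which features move rates the most across the portfolio
shap.summary_plot(shap_values, X_test_enc)
```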
To show how this works in practice, we walk through three real loan applications. In each case, the model starts from the average rate of 6.736% and then adjusts up or down based on the borrower profile.
Sample 1 – High LTV, standard owner-occupied file
The predicted rate is 6.889%, about 0.15% above the base rate. The main upward pressure comes from a 95% LTV and a mid-tier credit score around 706, both of which increase perceived risk. This is partially offset by the loan being owner-occupied with a standard purpose and a 360-month term, which pull the rate back down a little. Overall, this is priced as a higher-risk, high-leverage loan with some positive mitigating factors.
Sample 2 – Strong borrower offsetting high LTV
Here the model predicts 6.519%, roughly 0.22% below the base rate. The borrower’s DTI of 50% and excellent credit score of 782 both have strong negative SHAP values, reducing the rate. Although the LTV is again 95% and the term is 360 months, which push the rate up, the combination of very strong credit and behaviourally acceptable DTI more than compensates, leading to a cheaper rate than average.
Together, these examples show that the model’s behaviour is consistent with underwriting intuition: riskier leverage and purposes push rates up; strong credit, equity and owner-occupancy pull them down.
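Each of these narratives is read straight off a local SHAP decomposition. A sketch of how one quote breaks down, reusing the explainer from the snippet above:

```python
import pandas as pd

# Decompose a single quote: base rate plus signed feature contributions
loan = X_test_enc.iloc[[0]]
contribs = pd.Series(explainer.shap_values(loan)[0], index=loan.columns)

base = float(explainer.expected_value)  # portfolio-average rate (~6.736% here)
predicted = base + contribs.sum()       # SHAP values are additive by design

print(f"Base rate:      {base:.3f}%")
print(f"Predicted rate: {predicted:.3f}%")
# Top drivers, largest absolute push first (positive = raises the rate)
print(contribs.sort_values(key=abs, ascending=False).head(5))
```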
SHAP gives us numbers and charts; the final step is to turn those into explanations a broker can read in a few seconds and repeat to a customer. For that, we add a GenAI layer on top of the XGBoost + SHAP pipeline.
The workflow is:
1. The broker requests a quote for a borrower profile.
2. The XGBoost model predicts the rate.
3. SHAP decomposes the prediction into per-feature contributions.
4. The GenAI layer turns that numerical breakdown into a short, plain-language explanation (sketched below).
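As an illustration of step 4, here is a sketch of the prompt construction, using an OpenAI-style chat client as a stand-in for whichever LLM endpoint the deployment actually uses; the model name and helper are illustrative:

```python
import pandas as pd
from openai import OpenAI

client = OpenAI()  # stand-in; any chat-completion endpoint works the same way

def explain_quote(base: float, predicted: float, contribs: pd.Series) -> str:
    """Turn a SHAP breakdown into a broker-ready explanation (illustrative helper)."""
    drivers = "\n".join(
        f"- {feat}: {val:+.3f}%"
        for feat, val in contribs.sort_values(key=abs, ascending=False).head(5).items()
    )
    prompt = (
        f"You are a mortgage pricing assistant. The portfolio base rate is "
        f"{base:.3f}% and this borrower's quoted rate is {predicted:.3f}%. "
        f"The main feature contributions (in percentage points) are:\n{drivers}\n"
        "Explain the quote to a broker in three plain-English sentences, without jargon."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(explain_quote(base, predicted, contribs))
```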
This turns the model into a conversational tool: a broker can request a quote, immediately see the numerical breakdown, and also receive a ready-made explanation that is consistent, compliant and easy to share with the borrower.
Together, the ML pricing model, SHAP explainability and the GenAI explanation layer give us a pricing system that is accurate, transparent and ready for real-world use. It allows brokers, auditors and borrowers to understand not just the rate, but the reasoning behind it, turning intelligent pricing into a clear, confident part of the lending process.