Transforming Customers, Predicting Value: A New Approach to Lifetime Value.

Published on 01/31/2025 by Joseph Ani
Introduction
Around 2019, as I pivoted my career toward human-centered design after working with customers across finance, hospitality, and marketing, I came across the book "Who Do You Want Your Customers to Become?" The book solidified my understanding that designing impactful solutions goes beyond features and services; true innovation involves customer transformation, empowering customers with new behaviors, values, skills, and aspirations. Advocating for a paradigm shift, it urges businesses to view customers not merely as transactions but to invest in their human capital by targeting the imagination, thereby redefining the business in the eyes of the customer.
"Transformation isn't merely about adopting new technologies or platforms. It requires a fundamental shift in how brands communicate, engage and provide value to their audience. By embracing their brand and focusing on consumer-centric strategies, marketers can position themselves for massive success in 2025 and beyond." - Vladimer Botsvadze
Models for Customer Transformation
Advertisers find it challenging to tailor their commercials to individuals or groups sharing similar interests, often missing their most valuable customers. The Pareto Principle, or 80/20 rule, predicts that 20% of customers often generate 80% of sales. By leveraging this principle, companies can improve efficiency and target high-value customers: focusing actionable strategies on a smaller, more impactful group, identifying top performers, and concentrating engagement on the most valuable customers improves customer lifetime value (CLV).
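As a toy illustration of the 80/20 check, the revenue share of the top 20% of customers can be computed directly from per-customer revenue (the figures below are made up):

```python
# Hypothetical per-customer revenue figures (illustrative only).
revenues = [5200, 3100, 900, 750, 600, 480, 320, 250, 180, 120]

# Rank customers by revenue and take the top 20%.
ranked = sorted(revenues, reverse=True)
top_n = max(1, int(len(ranked) * 0.20))
top_share = sum(ranked[:top_n]) / sum(ranked)

print(f"Top {top_n} customers generate {top_share:.0%} of revenue")
```

On this made-up data, 2 of 10 customers account for roughly 70% of revenue; real datasets often come even closer to the 80/20 split.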
The types of customer transformation
Predictive models for CLV calculation
a) Probabilistic Models
- BG/NBD Model (Buy Till You Die): This model predicts the number of future purchases and assumes customer value can be inferred from historical data. It's particularly effective for businesses with non-contractual customer relationships.
- Pareto/NBD: This model predicts which customers will make purchases at any time and the rate at which they stop patronizing the business. It is especially useful for businesses looking to identify and target their most valuable customers based on their historical transaction patterns.
b) Gamma-Gamma model: This model predicts the monetary value of future transactions. The assumption is that values vary between customers but remain consistent over time for each individual customer.
c) Customer segmentation integrated with RFM: This approach provides a more nuanced segmentation of customer behavior based on Recency, Frequency, and Monetary value.
- Recency: The duration between the most recent purchase and the current date, represented by the gap between the rightmost circle and the vertical dotted line marked "Now". The model predicts which customers are most likely to make purchases in the near future based on their recency score.
- Frequency: The time interval between successive purchases, depicted as the spacing between circles along the single horizontal line.
- Monetary: The total amount spent by a customer, represented by the height of the circle. This amount may represent the average order value or the quantity of products the customer purchased. The model predicts which customers are likely to be most valuable based on their monetary score.
- Machine Learning Models: These models use regression techniques such as linear regression, random forests, and gradient boosting (XGBoost), incorporating multiple features for more accurate forecasting, as well as neural networks for large datasets with nonlinear relationships.
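As a minimal sketch of the regression approach, ordinary least squares can be fitted on synthetic RFM-style features with plain NumPy (all numbers below are illustrative, not from the article's dataset):

```python
import numpy as np

# Synthetic example: predict 3-month spend from recency/frequency/monetary.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))          # [recency, frequency, monetary]
true_w = np.array([-50.0, 120.0, 300.0])      # assumed "ground truth" weights
y = X @ true_w + rng.normal(0, 5, size=200)   # target with noise

# Fit ordinary least squares with an intercept column via np.linalg.lstsq.
X1 = np.hstack([X, np.ones((200, 1))])
w, *_ = np.linalg.lstsq(X1, y, rcond=None)

print("fitted weights:", np.round(w[:3], 1))
```

Tree ensembles and neural networks replace the linear map with a more flexible function, but the feature/target setup stays the same.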

Goal of Predictive Customer Lifetime Value
Predicting CLV is a powerful machine learning technique with significant business impact. By combining a customer's forecasted future value with their historical value, we can estimate the monetary value a customer will bring to the business over a defined future time frame. Businesses can then develop positive return on investment (ROI) strategies for allocating resources between acquiring new customers and retaining existing ones, growing revenue and profits through informed decisions. A successful ML model will identify high-value customers who are likely to spend more frequently than others and respond favorably to your offers and discounts. These high-CLV customers should become the key focus of your marketing efforts, helping to maximize revenue and growth potential.
PART 1: PROBABILISTIC MODELS
Training the models
The models are trained on the Online Retail dataset from the UCI Machine Learning Repository. It covers transactions between 2010-12-01 and 2011-12-09 (YYYY-MM-DD) for a UK-based and registered non-store online retailer, and includes information about customers, their purchases, and other relevant fields. The model types described above are trained on this data to predict future customer behavior.
Data Preparation
The data is cleaned and preprocessed to remove duplicates, missing values, and outliers in order to get a set of workable fields and records that are useful for probabilistic models. The data is then split into training and testing sets: the training set is used to fit the models, and the testing set is used to evaluate them.
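A minimal pandas sketch of this cleanup and split, assuming Online Retail-style column names (`CustomerID`, `Quantity`, `UnitPrice`) and toy rows:

```python
import pandas as pd

# Toy stand-in for the Online Retail data (column names assumed).
df = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 3, None, 4, 5],
    "Quantity":   [2, -1, 5, 1, 3, 2, 4, 1],
    "UnitPrice":  [10.0, 10.0, 4.0, 3.0, 2.0, 5.0, 0.0, 6.0],
})

# Drop rows with no customer, returns (negative quantity) and zero prices.
clean = df.dropna(subset=["CustomerID"])
clean = clean[(clean["Quantity"] > 0) & (clean["UnitPrice"] > 0)]

# Simple 80/20 train/test split on the cleaned rows.
train = clean.sample(frac=0.8, random_state=42)
test = clean.drop(train.index)
print(len(clean), len(train), len(test))
```

On real data you would also deduplicate and handle outliers, but the filtering pattern is the same.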
Data gathering and preprocessing
• Keep only records that have positive order quantities and monetary values - Removing negative or zero values ensures we're analyzing actual purchases rather than returns or errors.
• Aggregate transactions by Customer ID and compute recency, frequency, monetary features as well as the prediction target - This final step creates customer-level metrics:
- Customer identification to differentiate individual customers
- Purchase amount showing how much individual customers spent over a given period
- Dates showing each purchase
- Invoice number uniquely assigned to each transaction
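The aggregation step above can be sketched in pandas; the column names and the observation-end date are assumptions for illustration:

```python
import pandas as pd

# Minimal customer-level RFM aggregation (field names assumed).
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-05", "2011-03-10", "2011-02-01", "2011-02-15", "2011-04-01"]),
    "Amount": [100.0, 50.0, 20.0, 30.0, 25.0],
})
now = pd.Timestamp("2011-12-09")   # last date in the dataset

rfm = tx.groupby("CustomerID").agg(
    recency=("InvoiceDate", lambda d: (now - d.max()).days),  # days since last buy
    frequency=("InvoiceDate", "count"),                       # number of purchases
    monetary=("Amount", "sum"),                               # total spend
).reset_index()
print(rfm)
```

Note that the lifetimes-style convention counts only repeat purchases as frequency and measures recency from first to last purchase; the simpler definitions here are enough to show the groupby pattern.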

Analysis of the Data Preparation Process
The above diagram illustrates a robust data preparation pipeline for accurate customer lifetime value (CLV) calculations, with several key stages:
- Data Collection: Raw data is gathered from various sources including customer transactions and interactions
- Data Cleaning: Removal of duplicates, addressing missing values, and formatting data standardizing
- Feature Engineering: Creating relevant features and transforming data into useful formats
- Data Validation: Ensuring data quality and consistency before analysis
- Data Integration: Combining quality data and transformed data into a unified dataset
This systematic approach ensures that the data is properly prepared for accurate customer lifetime value (CLV) calculations and predictive modeling. The quality of data preparation directly impacts the accuracy of our predictive analytics models and subsequent marketing ROI optimization.
Average value of each customer

Key Insights & Potential Actions:
The visualization provides a comprehensive look at customer behavior data, allowing for the identification of key customer segments and areas of potential improvement. It enables data-driven decisions by providing various perspectives of the data.
Customer Segmentation: The data shows customers can be segmented into groups based on their RFM characteristics. For example, customers with high recency and high monetary value could be targeted with loyalty programs. The data also shows that customer 12349 is a great customer due to their high monetary value, while customer 12347 is also important due to their high frequency and recency values.
Outlier Detection: The high monetary value of customer 12349 stands out. It should be checked if it is a data entry mistake, or a very valuable customer.
Possible correlation between variables: The plot of T vs Frequency, and the plot of Frequency vs Monetary show that there seems to be a direct relationship between T and Frequency, and Frequency and Monetary.
Making the frequency greater than 1
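In this workflow the monetary-value model later only makes sense for repeat buyers, so this step reduces to a simple frequency filter (toy data below):

```python
import pandas as pd

# Toy RFM table; keep only customers with more than one purchase.
rfm = pd.DataFrame({"CustomerID": [1, 2, 3, 4],
                    "frequency": [1, 3, 2, 1],
                    "monetary": [20.0, 150.0, 90.0, 10.0]})

repeat = rfm[rfm["frequency"] > 1]
print(repeat["CustomerID"].tolist())
```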

Distribution Patterns:
• Recency illustrates a bimodal distribution with groups near 0 and 300-350 days, indicating both recent buyers and inactive customers
• T-values cluster around 4745 with outliers at 4787, 4778 and 4716, showing a small variation in observation periods
• Frequency has distinct peaks at 2, 3, 4, 7, and 11 purchases per customer
• Monetary distribution reveals spending groups around 0, 500, 1000 and 2000
2D Distribution & Key Insights:
• Frequency vs Monetary shows a slight positive correlation
• Clear customer segments emerge based on purchase patterns
• Bimodal recency suggests need for inactive customer re-engagement
Recommendations:
• Implement targeted segmentation using RFM metrics
• Develop re-engagement campaigns for inactive customers
• Focus resources on high-value, frequent purchasers
• Conduct deeper analysis of purchase frequency drivers
• Update time series analysis to use actual timestamps instead of customer IDs
Converting the Recency value to weekly
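Converting day-based recency and customer age (T) to weekly units is a one-line division; a small sketch with made-up values:

```python
import pandas as pd

# Convert recency and customer age (T) from days to weeks
# (day values here are illustrative).
rfm = pd.DataFrame({"recency_days": [274, 252, 7], "T_days": [350, 300, 14]})
rfm["recency_weeks"] = rfm["recency_days"] / 7
rfm["T_weeks"] = rfm["T_days"] / 7
print(rfm)
```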

Values
This section shows the progression of RFM values across different customers.
• Recency: The value of recency starts with low values for customer 12346 and increases, to fall back to average values for customer 12356.
• T: The value of T increases linearly across all customers.
• Frequency: The plot shows the frequency value across all customers, with high values for 12352.
• Monetary: The monetary value increases across all customers.
Key Observations from Values:
• Some customers with very low recency, such as customer 12346 have low values for other variables like frequency and monetary values.
• Some customers with high frequency also have high monetary values.
Overall Interpretation
This visualization offers a comprehensive look at customer behavior. The combination of raw data, distributions, 2-d relationships, and time series analysis provides valuable insights:
• Customer Segmentation: The data could be used to segment customers into groups based on their RFM scores for targeted marketing or customer relationship management.
• Identify Key Customers: Customers with high monetary values and frequent purchases should be prioritized for retention efforts.
• Understand Trends: The time series plots highlight changes in customer behavior, including purchase patterns and spending habits, over time.
• Business Insights: This analysis could inform business decisions related to product development, marketing strategies, and customer service improvements.
Possible Next Steps:
• RFM Scoring: Assign RFM scores to each customer to create segments (e.g., “High Value,” “At Risk”).
• Further Analysis: Investigate the reasons behind the bimodal distribution in recency and the specific peaks and dips in each plot.
Translating the age of the customer on a weekly basis

This section visualizes the values of recency, T, frequency, and monetary across time. The specific timestamps are hard to read from the image; the values may in fact be plotted against customer ID rather than actual time.
• Recency Time Series: Shows the recency of purchases over time, it shows that the recency has varied a lot, and it ended up quite high on the last customer.
• T Time Series: Shows that the T has varied a lot, and it ended up quite high on the last customer.
• Frequency Time Series: The frequency has some variations through time, but ended lower on the last customers.
• Monetary Time Series: The monetary also has high variations, and ended up lower on the last customer.
Key Takeaways & Potential Insights
• Customer Segmentation Potential: The data suggests the potential for customer segmentation based on RFM. For instance, the business might want to target customers with high monetary value with specific actions.
• Recency Varies Widely: The distribution and time series of recency shows that purchases are not very stable across time.
• Frequency Correlation: The 2d plot of frequency vs monetary value shows that these two variables have strong correlation.
• Time Series Trend: The time series plot allows us to see the variations in values of RFM, so it could help to understand how these variables change with time.
Recommendations for Further Analysis
• Deeper Segmentation: Consider applying clustering algorithms (like k-means) to group customers based on their RFM scores to create more targeted marketing strategies.
• Time-based Analysis: Exploring time-based trends, such as seasonal purchasing patterns.
• Correlation Exploration: A deeper look into the correlation between different variables.
• Further Exploration: Explore the data using other types of visualizations that could give even more information.
Establishing the BG/NBD Model
The BG/NBD model is a probabilistic model that predicts the number of future purchases and assumes customers value can be inferred from historical data. It's particularly effective for businesses with non-contractual customer relationships.
We will be using a model fitting of the frequency, recency and time to predict the number of future purchases and the customer value.
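This fitting is typically done with a library such as lifetimes (`BetaGeoFitter`). As a library-free sketch, the model's expected-purchases formula can be written out directly; the parameter values below are illustrative and not fitted to this article's dataset:

```python
def hyp2f1(a, b, c, z, tol=1e-12, max_terms=1000):
    """Gauss hypergeometric series 2F1(a, b; c; z); converges for |z| < 1."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) < tol:
            break
    return total

def bgnbd_expected_purchases(t, r, alpha, a, b):
    """Expected number of purchases a random customer makes in (0, t]
    under the BG/NBD model (Fader, Hardie & Lee, 2005)."""
    z = t / (alpha + t)
    return (a + b - 1) / (a - 1) * (
        1 - (alpha / (alpha + t)) ** r * hyp2f1(r, b, a + b - 1, z))

# Illustrative parameter values, not fitted to this article's dataset.
print(round(bgnbd_expected_purchases(t=39, r=0.243, alpha=4.414, a=0.793, b=2.426), 3))
```

With fitted parameters, evaluating this for each customer's horizon gives the "expected purchases next week" ranking used below.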

Predicting the customers we expect to buy more during the week

The visualization provides a useful overview of the data, but there are clear issues that need to be addressed to understand the data properly. The dataset is very small, and it contains unusual time series plots that show abrupt changes in recency, frequency, and monetary, which are usually expected to monotonically change over time. The expected number of purchases is also strangely 0 for all customers. With a few adjustments, this is a powerful tool for customer analysis.
Potential Improvements and Questions:
• Dataset Representation: The dataset shown is very sparse, containing only 5 customers, which is not enough to make meaningful predictions; this may be why the predicted purchases are 0 for all customers.
• Expected Purchases: The model is not able to predict the number of purchases for the customers, and it is predicting 0 for all customers.
• Data Preprocessing: The abrupt changes in the recency, frequency, and monetary plots are suspicious and may indicate an issue in the data processing.
• Time Series Analysis: Are there errors in the time-series data? The same abrupt changes suggest the time series was not constructed properly, since these values are usually expected to change monotonically over time.
• Visualization: The visualization is not done properly, and it is not showing the actual timestamp of the values.
Evaluation of forecast results

Overall Theme:
The chart displays a comparison between the actual number of customers and the number of customers predicted by a model based on the frequency of repeat transactions. It categorizes customers based on the number of transactions they've made during the "Calibration Period."
Observations:
• High Initial Frequency: Both the actual and modeled data show a high number of customers who made 1 transaction in the calibration period and the number of customers steadily declines with the increase in number of calibration period transactions. This suggests the majority of the customers make 1 or 2 transactions in the calibration period and very few make 7 or 8 transactions.
• Model's Underestimation of the initial frequency: The model seems to underestimate the customer count for the first and second transaction categories. For the first category, the actual number of customers is over 800, whereas the model estimates around 750, and the actual count of customers with 2 transactions is also much higher than the model's estimate.
• Model's Overestimation of the later frequency: On the other hand, for later transaction categories the model tends to overestimate the customer counts.
• Similar Trends: Despite the differences in exact values, both the actual and model bars show a downward trend. This means that as the number of transactions goes up, the number of customers making that many transactions in the calibration period declines, as is to be expected.
Insights:
• Model Performance: The model appears to have some predictive power, following the decreasing trend in the data. However, it needs to be tuned better since it underestimates at lower frequencies and overestimates at higher frequencies.
• Customer Behavior: The chart reveals that most customers have very few transactions in the calibration period.
• Business Implications: The insights could be used for customer segmentation, predicting customer lifetime value, or targeting personalized offers/promotions.
Potential Improvements:
• Clearer Y-Axis Label: While "Customers" is fine, adding more context (e.g., "Number of Customers") could enhance clarity.
Establishing the Gamma-Gamma Model
This model predicts monetary value of future transactions. The assumption is that values vary between customers but remain consistent over time for each individual customer.
We will be using the Gamma-Gamma model, fitted on the frequency and monetary values of 3,059 subjects, to predict customer lifetime value and derive the expected average profit per customer.
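The model's conditional expected average transaction value has a simple closed form: a weighted average of the population mean spend and the customer's observed mean (this is what lifetimes' `GammaGammaFitter` computes). A sketch with illustrative, not fitted, parameters:

```python
def gg_expected_avg_profit(p, q, v, x, mean_spend):
    """Conditional expected average transaction value under the
    Gamma-Gamma model: a weighted average of the population mean
    p*v/(q-1) and the customer's observed mean spend over x purchases."""
    weight = p * x / (p * x + q - 1)       # more purchases -> trust the customer
    population_mean = p * v / (q - 1)      # prior mean for a customer with no data
    return weight * mean_spend + (1 - weight) * population_mean

# Illustrative parameters (assumed, not the fitted values from this article).
print(round(gg_expected_avg_profit(p=6.25, q=3.74, v=15.44, x=4, mean_spend=35.0), 2))
```

With no observed purchases (x = 0) the estimate falls back to the population mean; as x grows it converges to the customer's own average spend.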

Calculating the customer lifetime value with BG/NBD and Gamma-Gamma models
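Combining the two models is straightforward arithmetic: the BG/NBD purchase forecast times the Gamma-Gamma average-profit forecast, optionally discounted per period. A toy sketch with assumed inputs:

```python
# Illustrative inputs (assumed, not fitted values).
expected_purchases_3m = 1.8      # from the BG/NBD model
expected_avg_profit = 35.0       # from the Gamma-Gamma model
monthly_discount_rate = 0.01

# Spread the purchase forecast evenly over 3 months and discount each month.
clv_3m = sum(
    (expected_purchases_3m / 3) * expected_avg_profit
    / (1 + monthly_discount_rate) ** m
    for m in range(1, 4)
)
print(round(clv_3m, 2))
```

This mirrors what lifetimes' `customer_lifetime_value` helper does per customer, summed over the chosen horizon.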

Possible Uses
These visualizations are likely being used to:
• Understand Customer Base: The distributions and scatter plots are crucial for gaining a broad understanding of the characteristics and behaviors of customers.
• Customer Segmentation: The results can help the business segment customers based on their value (e.g., high-frequency, high-monetary, recent purchasers vs. low-frequency, low-monetary, less recent purchasers).
• Marketing Strategy: These segmentation results could inform tailored marketing strategies to target different customer groups more effectively (e.g., reactivation campaigns for less recent customers, rewards for loyal customers).
• Predictive Modeling: Variables like these are commonly used as inputs for customer lifetime value (CLTV) or churn prediction models.
Further Exploration
If available, further analysis could be improved by:
• Time Series Analysis: Examine how customer purchasing behavior changes over time.
• Correlation Matrix: Calculate correlation coefficients between variables to quantify relationships.
• Clustering: Apply clustering techniques (e.g., K-means) to group customers based on their RFM profiles.
Customer Segmentation integrated with RFM
This is initialized by dividing the 6-month-old customers into 4 groups based on their RFM scores and labeled as "New", "Promising", "Loyal" and "At Risk".
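One way to sketch this segmentation is quartile-based RFM scoring in pandas; the scoring rule and the mapping of scores to the four labels here are assumptions for illustration, not the article's exact boundaries:

```python
import pandas as pd

# Toy RFM table (values made up).
rfm = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5, 6, 7, 8],
    "recency":   [5, 40, 10, 200, 15, 90, 300, 2],
    "frequency": [8, 2, 5, 1, 6, 3, 1, 7],
    "monetary":  [900, 120, 500, 40, 700, 200, 30, 800],
})

# Score each dimension 1-4 by quartile (recency reversed: recent = better).
rfm["R"] = pd.qcut(rfm["recency"], 4, labels=[4, 3, 2, 1]).astype(int)
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 4,
                   labels=[1, 2, 3, 4]).astype(int)
rfm["M"] = pd.qcut(rfm["monetary"], 4, labels=[1, 2, 3, 4]).astype(int)
score = rfm[["R", "F", "M"]].sum(axis=1)

# Map combined scores to the four segments (label order is an assumption).
rfm["segment"] = pd.qcut(score, 4,
                         labels=["At Risk", "New", "Promising", "Loyal"])
print(rfm[["CustomerID", "segment"]])
```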


Analysis of key customer profiles from the data:
• Customer 0: Shows a promising customer with very recent activity (recency: 0), with moderate purchase frequency (2), and low monetary value (1). Despite showing engagement through recent purchases, they currently have a low CLV of 0, suggesting potential for growth.
• Customer 1: Shows another promising customer characterized by high monetary value but moderate recency (50). Their CLV of 0 combined with high-value purchases indicates opportunity for relationship development.
• Customer 2: Categorized as a loyal customer with balanced metrics - medium recency and frequency, although low monetary value. Their CLV of 0 suggests potential for value increase through targeted engagement and promotions.
• Customer 3: Shows a new customer showing interesting patterns with medium recency but high frequency and medium monetary value. Their CLV of 0 indicates they're still in early relationship stages.
• Customer 4: Shows an at-risk customer profile with concerning metrics and liable to churn - low recency and frequency, medium monetary value, and minimal CLV. This segment requires immediate attention for retention strategies.
In Summary
The image shows a robust customer analysis workflow, combining tabular data with visualizations. The goal is to analyze customer behavior through the metrics of recency, frequency, and monetary value, to predict future value and segment customers accordingly. The combination of plots is useful to quickly visualize the main characteristics of the customer data and the fact that there are some promising, loyal, new and at-risk customers, depicts an indication that some segmentation is being performed, mostly driven by the RFM metrics. The ability to generate code and view recommended plots indicates that this is part of a larger analysis tool or pipeline.
PART 2: MACHINE LEARNING MODELS
Machine Learning Model
Machine learning platforms like Vertex AI, Azure ML, and AutoML are solutions to train and deploy machine learning models. These platforms provide a significant advantage in scaling workflow and decision making, as they offer a wide range of tools for data preprocessing, model training, and deployment.
Data Ingestion
Using the same publicly available Online Retail dataset, containing 541,909 records over the same timeline, we ingest the data and upload it into the BigQuery platform using the following steps:

Data Preprocessing
The data preprocessing is done using BigQuery, and it is done in the following steps:
• Keep only records that have a Customer ID - This step filters out any transactions without a valid customer identifier, ensuring data quality and traceability.
• Aggregate transactions by day from Invoices - Daily transaction aggregation helps identify purchase patterns and frequency while reducing data noise.
• Keep only records that have positive order quantities and monetary values - Removing negative or zero values ensures we're analyzing actual purchases rather than returns or errors.
• Aggregate transactions by Customer ID and compute recency, frequency, monetary features as well as the prediction target - This final step creates customer-level metrics:
- Recency: Days or Timestamp since last purchase.
- Frequency: Number of purchases by each customer.
- Monetary: Total amount spent by customer.
These features form the basis for customer segmentation and lifetime value prediction.
Model Training
Deep neural networks are a type of machine learning model designed to learn complex patterns in data. The DNN model is used to predict a 3-month-ahead prediction of the customer lifetime value (CLV).
Factors that make the DNN model a good fit for this problem:
• Incorporate additional features: Incorporating additional features into the model, such as customer acquisition costs, discount rates, and profit margins, better predicts future customer cash flow.
• Large amounts of data: DNN's are capable of learning from large amounts of data.
• Complex data: DNN's are able to learn from complex data, and they are able to identify patterns and relationships in the data that are not immediately apparent.
Defining and Training the Target Variable
Calculating the expected future profit from the customer's transactions over a 3-month period acts as the target for the model's customer lifetime value (CLV) prediction. The lifetimes library separates the orders into two partitions.
The model is trained using the following steps:
• Define the target variable: Purchasing orders before the 3-month period are used as the training data.
• Train the model: Purchasing orders after the 3-month cutoff are used to compute the target value.
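The two-step split above can be sketched in pandas: transactions up to a cutoff become the features, and spend in the following 3 months becomes the target (column names and dates assumed):

```python
import pandas as pd

# Toy transaction log (column names assumed).
tx = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2, 2],
    "InvoiceDate": pd.to_datetime(
        ["2011-03-01", "2011-08-15", "2011-10-01", "2011-05-20", "2011-11-10"]),
    "Amount": [100.0, 40.0, 60.0, 80.0, 30.0],
})
cutoff = pd.Timestamp("2011-09-09")
horizon_end = cutoff + pd.DateOffset(months=3)

# Features come from orders before the cutoff; the target is the spend
# in the 3 months after it.
features = tx[tx["InvoiceDate"] <= cutoff]
future = tx[(tx["InvoiceDate"] > cutoff) & (tx["InvoiceDate"] <= horizon_end)]
target = future.groupby("CustomerID")["Amount"].sum().rename("target_monetary_value_3M")
print(target)
```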

Exploratory data analysis (EDA) in BigQuery
Exploratory data analysis is done using BigQuery to understand the data and the available features, clean the data, and identify opportunities for data transformation and feature engineering.
Recency: How recently have customers made purchases?

Customer recency is a critical metric for understanding customer behavior and identifying high-value customers who are likely to churn.
Key Takeaways:
• Customer Recency Distribution: The data reveals that the distribution of "days since last purchase" is not uniform. There is a wide range (from 0 to 274 days), and the standard deviation is high, indicating a significant spread in customer recency.
• Average Recency: On average, the mean of 92.52 days suggests that customers last made a purchase approximately 3 months ago.
• Potentially Inactive Customers: Customers who haven't purchased anything in a much longer period of time, up to 274 days (over 9 months), require specific marketing or re-engagement strategies.
• Data-Driven Analysis: Using BigQuery to retrieve data and Pandas for analysis demonstrates how data can be used to explore and quantify customer behavior.
Frequency: How often do customers make purchases?

The pandas summary shows there are 3330 customers: the minimum number of purchases is 1, 25% of customers have made only 1 purchase, 50% have made 2 or fewer purchases (the median), 75% have made 3 or fewer, and the maximum is 81 purchases, suggesting a small number of highly frequent purchasers.

Key Takeaways:
Data Skewness: The frequency data is highly skewed to the right. This is not unique but a common phenomenon in e-commerce, where a small percentage of loyal customers make a disproportionate number of purchases while the majority make only a few.
Customer Segmentation: The histogram suggests the existence of different customer segments based on purchase frequency:
• One-time Purchasers: A very large segment of customers only make one purchase.
• Low-Frequency Purchasers: The bulk of the customer base, made just a few purchases.
• High-Frequency Purchasers: A small segment of loyal customers that makes many purchases.
Implications for CLV Modeling:
• Model Complexity: The skewed data may require a different approach than normal models. A model that performs well on customers with many purchases will likely not perform well for customers with few purchases and vice-versa.
• Feature Engineering: Feature engineering is essential to capture customer behavior accurately, including segmentation, bucketing/binning, or creating interaction features so the model can properly learn the different patterns.
• Alternative Modeling: A different modeling approach might be needed for new customers with very little purchase history compared to loyal customers.
Actionable Insights:
• Targeted Marketing: Different customer segments should be targeted with tailored marketing strategies to increase the likelihood of purchase.
• Retention Strategies: A focus on retention efforts for high-value customers, can significantly impact revenue.
• Acquisition Strategies: Analyzing low-frequency purchasers and one-time purchasers to identify opportunities in increasing their engagement.
• Investigate Wholesalers: The “wholesale” customer segment might require separate analysis and potentially different business strategies.
Monetary: How much are customers spending?

Actionable Insights:
• Targeted Marketing: High-value customers should be targeted with specific marketing strategies to promote loyalty and increase purchase frequency.
• Personalization: The personalized customer experience on the platform should differ for high-value versus low-value customers, as their needs will be different.
• Retention Strategies: Specific retention strategies should be created to retain the high value customers, as the loss of those customers will have a huge impact on the overall revenue.
• High-value Customers: Like in the purchase frequency analysis, there might be some outliers or customers who are actually wholesalers, these would also be high spenders, so it might be a good idea to analyze them separately.
Predicted 3-month monetary value

The scatter plot concentrates most points in the lower range, showing a tendency for the model to underpredict the actual values in the high range, along with some outliers.
Interpretation of the Plot
General Trend: The trend of the data points rising from bottom left to top right indicates a positive correlation, meaning that higher predicted values tend to correspond with higher actual values.
Model Accuracy: The majority of the data points are close to the bottom left corner of the plot, suggesting the predictions are quite good for small values, but the farther away from the origin, the less accurate the model becomes. There also appears to be a tendency for the model to underpredict actual monetary values, especially in the high range, as we see points spread above the diagonal. There are also some outliers (the topmost-left points) that significantly underpredict the actual monetary value.
Establishing a simple performance baseline model

Model Performance Metrics:
• MAE of 1762.06: On average, predictions are off by about 1762 monetary units, showing the model is significantly inaccurate.
• MSE of 81502420.93: Squaring the errors heavily penalizes large misses, so this large value indicates some very large individual errors.
• RMSE of 9027.87: The square root of MSE, expressed in the original units; being far larger than the MAE, it confirms that a few large errors dominate.
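These metrics are simple to recompute by hand. Applied to only the five example rows evaluated below, they give different (worse) figures than the full-test-set metrics above, since two of those rows carry very large errors:

```python
import math

# Predictions and targets from the five example rows shown below.
pred   = [4088.74, 3722.52, 513.67, 277.11, 533.30]
actual = [2044.37, 28754.11, 4971.00, 2504.13, 533.30]

errors = [p - a for p, a in zip(pred, actual)]
mae = sum(abs(e) for e in errors) / len(errors)    # mean absolute error
mse = sum(e * e for e in errors) / len(errors)     # mean squared error
rmse = math.sqrt(mse)                              # root mean squared error
print(round(mae, 2), round(rmse, 2))
```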

Row Evaluation
Let's evaluate the values in the rows shown, one by one:
Row 0:
customer_id: 17597.0
predicted_monetary_value_3M: 4088.74
target_monetary_value_3M: 2044.37
The model significantly overpredicted the spending value of this customer
Row 1:
customer_id: 14680.0
predicted_monetary_value_3M: 3722.52
target_monetary_value_3M: 28754.11
The model significantly underpredicted the spending value of this customer
Row 2:
customer_id: 14688.0
predicted_monetary_value_3M: 513.67
target_monetary_value_3M: 4971.00
The model significantly underpredicted the spending value of this customer
Row 3:
customer_id: 13695.0
predicted_monetary_value_3M: 277.11
target_monetary_value_3M: 2504.13
The model significantly underpredicted the spending value of this customer
Row 4:
customer_id: 15433.0
predicted_monetary_value_3M: 533.30
target_monetary_value_3M: 533.30
The model accurately predicted the spending value of this customer
Overall Observations
Prediction Discrepancy: The predictions for the monetary values are noticeably different from the target values in most cases, suggesting the predictive model may have accuracy issues. It is important to note that the errors are not consistent: for some customers (such as customer 17597.0) the model significantly overpredicted, while in other cases (customers 14680.0, 14688.0, 13695.0) it significantly underpredicted.
Model Evaluation: Based on the first few rows, the model might not be highly reliable without improvement, as it either over or under predicts with large differences in most of the cases shown.
Monetary Value: Monetary values in the "target_monetary_value_3M" column vary significantly, suggesting different spending habits among customers.
Actionable Next Steps
Based on the initial data preview, here are potential follow-up actions:
• Model Improvement: Analyze the prediction errors on a larger dataset. We could try different machine learning algorithms, feature engineering, and/or hyperparameter optimization.
• Error Analysis: Explore the reasons for the model's under/over-predictions in greater detail. Look for any outliers or patterns in customers with high or low prediction errors.
• Feature Understanding: Gain a deeper insight into the features influencing the predictions to further improve the model.
• Data Preprocessing: Examine for any missing values or data quality problems that might be impacting the prediction quality.
Deploying a Tensorflow Model

We split the data into training, test and validation sets with 2638 examples for training, 339 examples for testing and 353 examples for validation.

General Observations
Model Behavior: The model appears to have a tendency of under-predicting the monetary value of high-value customers, as many points corresponding to high actual values tend to fall below the diagonal line. The model may be struggling to capture these specific cases. This may be due to a lack of data or a non-optimal model fitting for this type of data distribution.
Data Distribution: The data looks highly skewed, with a large number of low-value customers and a relatively few high-value customers, which is a fairly common situation in business contexts.
Generalization: The model tends to perform the best on the Test data, which seems to be closer to a “real-world” scenario, with slightly less accuracy on the Dev dataset.
Data Quality: The chart does not tell us anything about the data quality. There are a few outliers in each of the plots, that may or may not be errors.
Further Analysis: Based on these plots, further analysis is warranted. Analyzing the errors of higher-value customers will be helpful to understand if there is a systematic bias, or if those are just outliers.
Potential Next Steps
• Model Improvement: Investigate why the model struggles to predict high-value customers and potentially consider techniques such as weighting the higher values, or using a more appropriate model.
• Performance Metrics: Assess the model's performance using metrics that are more robust on imbalanced data, such as MAE (mean absolute error) or RMSE (root mean squared error).
• Alternative Approaches: Evaluate different modeling approaches to see if better performance on all data sets is possible.
Training Loss
The training and validation loss curves for the model below highlight its learning progress, showing a gradual decrease in both the training loss and the validation loss. However, the validation loss being lower than the training loss should be investigated further.

• Training Progress: The plot shows that the model is indeed learning, as the training loss is decreasing over 10 epochs.
• Validation Performance: The validation loss, also decreases over iterations of 10 epochs. This is a good sign because it indicates the model is generalizing well.
• Potential Overfitting: Since the training loss is higher than the validation loss, this is most likely not a case of overfitting; instead, the validation set may contain significantly easier cases, suggesting it is either not representative of the population or simply easier to fit.
We have trained a model using Vertex AI, BigQuery, and TensorFlow locally. As the charts above indicate, there are still feature engineering and data cleaning opportunities to improve the model's performance on high-CLV customers. Options include handling these customers as a separate prediction task, applying a log transformation to the target, clipping their values, or dropping these customers altogether.
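The target-handling options mentioned above can be sketched with NumPy (values illustrative):

```python
import numpy as np

# A skewed monetary target with one extreme customer (values made up).
target = np.array([50.0, 120.0, 300.0, 900.0, 28754.11])

# Option 1: train on log1p of the target and invert predictions with expm1.
log_target = np.log1p(target)

# Option 2: clip the target at, say, its 95th percentile.
clipped = np.clip(target, None, np.quantile(target, 0.95))

print(np.round(log_target, 2))
print(clipped.max())
```

Both moves compress the long tail so a squared-error loss is no longer dominated by a handful of extreme customers.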
Conclusion
The models that are used in this series are not suitable for businesses in which customer attrition is directly observable. Instead, these models assume discretionary customer engagement, as observed in e-commerce settings. For example, you shouldn't use these models for businesses based on subscriptions, customer accounts, or cancellable contracts. In addition, the models described in this series are best applied to predicting the future value of existing customers who have at least a moderate amount of transaction history.