When you start integrating data into your marketing strategy, the first questions that need to be answered are often: who’s going to churn in the next couple of months? To whom should we sell which product? Does that person need this product?
To answer these types of questions, one can build a model based on historical data. We look for customers who demonstrated the desired behavior in the past (churning, buying a product, etc.) and at what they looked like (characteristics and behavior). The assumption is that customers with similar characteristics will behave similarly in the near future. These types of models are called propensity models. The customers with the highest propensity scores are then typically chosen as the target of a marketing campaign.
These models work very well to kick off a data-driven marketing strategy, but they quickly show their limitations. We will discuss these limitations briefly and propose a different approach.
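To make this concrete, here is a minimal sketch of a propensity model: a small logistic regression fitted by gradient descent on historical data, then used to score and rank current customers. The feature names, the data and the helper functions (`fit_propensity`, `propensity`) are made up for illustration; in practice you would more likely rely on an existing machine learning library.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_propensity(rows, labels, lr=0.05, epochs=1000):
    """Fit a logistic regression by batch gradient descent; returns (weights, bias)."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def propensity(weights, bias, x):
    """Propensity score in [0, 1] for one customer's features."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical history: [months_since_last_purchase, support_tickets], label 1 = churned
history = [[1, 0], [2, 1], [8, 3], [9, 4], [3, 0], [10, 5]]
churned = [0, 0, 1, 1, 0, 1]
weights, bias = fit_propensity(history, churned)

# Score today's customers and rank them from highest to lowest propensity
current = {"alice": [2, 0], "bob": [9, 3]}
scores = {name: propensity(weights, bias, x) for name, x in current.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

The customers at the top of `ranked` would be the ones selected for the campaign.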
Multi-products or not multi-products?
Propensity models are very specific: each one predicts a single task and that's it! Each prediction task requires its own model, and it is quite difficult to combine multiple models into a decision-making tool. Different models can lead to counterintuitive, sometimes even contradictory, results. For example: if a customer is at risk of churning and also scores high for a new product, how should we target them? In which order should we target the customers? Can we combine the products? One way to solve this problem is to combine the models using business acumen and rules. Another is to use a recommender system, which is able to recommend multiple products in an integrated way.
Nudge or sludge?
Propensity models do not model the correct target! They predict the natural flow of adoption or churn. Indeed, these models are based on the assumption that similar people will behave similarly. The second assumption is that if people with a high propensity score are targeted by a marketing campaign, there is a strong probability that the desired behavior will happen. Ideally, the target of the model should be: given that we nudge the customer, will he/she respond positively (as expected by the marketer)? Usually, we create a feedback loop to record which customers have been targeted and when. Then, within a given timeframe, we check whether the action has led to the expected result (no churn, a purchase, etc.). This allows us to analyze the success rate of a campaign and determine who should or should not be nudged. To draw statistically relevant conclusions, these experiments should be carefully designed to extract the most value out of them. They fall under the umbrella of A/B testing: the customer population is split into multiple groups and different actions are applied to each group (treatment, control, random). It is then easy to calculate the actual uplift of the targeting campaign: how much higher the response rate is in the treatment group than in the control group. This brings us one step closer to causal inference!
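As a rough illustration of the uplift calculation, the sketch below compares the response rates of a treatment and a control group and adds a standard two-proportion z-test to check whether the difference is likely to be more than noise. The function name `uplift_report` and the campaign numbers are invented for the example.

```python
import math

def uplift_report(treated_n, treated_conv, control_n, control_conv):
    """Uplift (difference in response rates) with a two-sided two-proportion z-test."""
    rate_treated = treated_conv / treated_n
    rate_control = control_conv / control_n
    uplift = rate_treated - rate_control
    # Pooled response rate under the null hypothesis "the nudge has no effect"
    pooled = (treated_conv + control_conv) / (treated_n + control_n)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / treated_n + 1 / control_n))
    z = uplift / std_err
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {"uplift": uplift, "z": z, "p_value": p_value}

# Hypothetical campaign: 1000 customers nudged, 1000 held out as control
report = uplift_report(treated_n=1000, treated_conv=120, control_n=1000, control_conv=90)
```

Here the treatment group responds at 12% against 9% for the control group, so the uplift is 3 percentage points. The sketch assumes the groups were assigned at random and that at least one customer, but not all of them, converted.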
Propensity models do not make any recommendation on which channel to use! Should we contact this customer by phone? By mail? To solve this problem, one could repeat the above strategy: include the channel through which the customer was contacted in the feedback loop, and have treatment and control groups for each channel. This approach quickly multiplies the number of groups to manage and leads to a complex framework.
Time is all
Propensity models do not really take timing into account! Timing only enters through the definition of the target: who is going to be inclined to buy this product within the next XX months? The implicit assumption is that all customers react on a very similar timescale. This construct leads to batch campaigns that target many customers at once. That is actually very far from the era of extreme customization and personalization that we live in.
The future looks bright
Long-term strategy is completely omitted in these types of models! Each model proposes the candidates with the highest propensity to show the desired behavior in the near future. In marketing, each customer’s worth can be approximated and extrapolated into the future to calculate their expected lifetime value. However, propensity models do not directly integrate the customer's value in their calculations, nor how multiple actions could lead to an increase in worth. Usually, the expected customer value is combined with the propensity score a posteriori, and most marketing resources are concentrated on the worthiest customers. The potential that lies in dormant customers usually remains untapped.
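One simple way to do this a posteriori combination, shown here only as a sketch, is to rank customers by expected gain: propensity times lifetime value, minus the cost of contacting them. The customers, the values and the flat `CONTACT_COST` are all hypothetical.

```python
# Hypothetical scored customers: propensity from a model, lifetime value from finance
customers = [
    {"id": "c1", "propensity": 0.60, "lifetime_value": 200.0},
    {"id": "c2", "propensity": 0.20, "lifetime_value": 1500.0},
    {"id": "c3", "propensity": 0.05, "lifetime_value": 300.0},
]
CONTACT_COST = 5.0  # assumed flat cost per contact

def expected_gain(customer, contact_cost=CONTACT_COST):
    """Expected value of targeting this customer, net of the contact cost."""
    return customer["propensity"] * customer["lifetime_value"] - contact_cost

ranked = sorted(customers, key=expected_gain, reverse=True)
ranking = [c["id"] for c in ranked]
```

With these numbers, `c2` moves ahead of `c1` despite its lower propensity, because its lifetime value is much higher. Note that the score is still a one-shot estimate: it says nothing about how a sequence of actions could raise a customer's value over time.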
No risks, no reward
If we only target the customers that the model advises, and only collect feedback about them, we are blind to the false negatives: customers to whom the model assigns a low propensity but who, given the right nudge, would actually convert! To solve this problem, randomization needs to be introduced in the strategy, i.e. randomly targeting some customers with a low propensity score. Like everything in life, bigger risks can lead to bigger rewards. There might well be a niche market of customers with very specific needs.
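A minimal sketch of this idea, assuming a fixed campaign budget: most of the budget goes to the top-scoring customers, and a small share `epsilon` is spent on customers drawn at random from the rest. The function `select_targets` and the scores are illustrative only.

```python
import random

def select_targets(scores, budget, epsilon=0.1, rng=None):
    """Return (exploit, explore): top scorers plus a random sample of the others."""
    rng = rng or random.Random()
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_explore = int(budget * epsilon)      # share of the budget spent on random picks
    n_exploit = budget - n_explore
    exploit = ranked[:n_exploit]           # customers the model recommends
    remaining = ranked[n_exploit:]         # everyone the model would have skipped
    explore = rng.sample(remaining, min(n_explore, len(remaining)))
    return exploit, explore

# Hypothetical propensity scores for 100 customers
scores = {f"customer_{i}": i / 100 for i in range(100)}
exploit, explore = select_targets(scores, budget=10, epsilon=0.2, rng=random.Random(42))
```

Keeping the two lists separate matters: when the feedback comes in, you want to know which customers were targeted on the model's advice and which were targeted at random.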
Change the paradigm
The dream, the final model, should optimize everything mentioned above: the right timing, the right action, the right channel, the right message, etc.
Why don’t we change the paradigm, the way of thinking?
Instead of answering the question "who is going to buy this product in the next XX months?", the question should be: given the current situation of this customer (state), what is the action that will optimize his/her future value?
By doing this, we model the flow of a customer through a journey rather than proposing a single next best action.
So how do we create a marketing strategy that answers that question, and much more?
Reinforcement learning is a family of machine learning algorithms that learn a policy. The policy determines, for each state (all the information about the customer), which action maximizes an objective (the future value of the customer). The reinforcement learning algorithm learns the best policy by finding a good balance between exploration (randomized actions) and exploitation (actions known to bring value).
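To give a feel for what learning a policy means, here is a toy sketch using tabular Q-learning, one of the simplest reinforcement learning algorithms. Everything in it is an assumption made for illustration: the three customer states, the three actions, and the deterministic transitions and rewards in `ENV`. A real customer environment is neither this small nor deterministic.

```python
import random

STATES = ["dormant", "active", "loyal"]
ACTIONS = ["wait", "email", "call"]

# Hypothetical toy environment: (state, action) -> (next_state, reward).
# Rewards are the margin earned in the period minus the cost of the action.
ENV = {
    ("dormant", "wait"): ("dormant", 0.0),
    ("dormant", "email"): ("dormant", -1.0),
    ("dormant", "call"): ("active", -5.0),
    ("active", "wait"): ("dormant", 2.0),
    ("active", "email"): ("loyal", 2.0),
    ("active", "call"): ("loyal", -2.0),
    ("loyal", "wait"): ("active", 8.0),
    ("loyal", "email"): ("loyal", 9.0),
    ("loyal", "call"): ("loyal", 5.0),
}

def train(episodes=3000, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = "dormant"  # every simulated journey starts with a dormant customer
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # exploration: random action
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploitation
            next_state, reward = ENV[(state, action)]
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted value of the next state
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q_table = train()
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in STATES}
```

With these made-up rewards, the learned policy should settle on calling dormant customers and emailing active and loyal ones. The call to a dormant customer costs money in the short term, and the policy only picks it because of the value it unlocks later in the journey, which is exactly what a propensity model cannot express.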
Here are a few tips on the steps to build such a framework:
- Start launching actions with simple propensity models as a guideline on whom to target
- Design the target group and control group to collect informative data
- Create a feedback loop as soon as possible and gather as much information as you can about the targeted client, the message, the purpose, the channel, etc.
- Be willing to randomize and explore untapped potential (a little randomization goes a long way) - for example, instead of launching all campaigns on the same day of the month, shift the date for some customers, send some customers a different message than others, etc.
- In short, all of the above serves to collect as much information as possible in order to train a first reinforcement learning model
- Build an objective function that takes into account the end goal for each customer - for example, the value of a customer is their current net worth plus their future worth, minus the cost of targeting them, etc.
- Build the datasets with the time frame that suits your business: do we want to be able to send messages every minute, day, week, on a customer trigger?
- Have some deterministic scenarios (customer journeys) with expected outcomes. These will be used to compare models and determine which one is currently the best to roll out
- Be patient: the reinforcement learning model takes time to learn, and the more randomization you allow, the faster it’ll learn
- Create automatic pipelines to retrain the reinforcement learning model, evaluate it on the scenarios and redeploy it
- Think about a cold start strategy, for when there will be a new product, or a new customer for which there is little data to start with
- Be patient again, it’ll take time before you have captured enough variation and enough information for the best policy to emerge
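The feedback loop and the objective function from the list above can be sketched together as follows. The `Interaction` record and its fields are hypothetical, as is the choice of a discount factor `gamma` to weigh future worth against present worth; the point is only to show what needs to be logged for each action and how a journey can be turned into a single number to optimize.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    """One row of the feedback loop: what we did and what happened next."""
    customer_id: str
    period: int        # index of the decision period (day, week, ...)
    action: str        # e.g. "email", "call", "wait"
    cost: float        # cost of the action
    revenue: float     # margin observed during that period
    explored: bool     # True if the action was chosen at random

def journey_value(interactions, gamma=0.95):
    """Discounted sum of (revenue - cost) over a customer's journey."""
    if not interactions:
        return 0.0
    ordered = sorted(interactions, key=lambda i: i.period)
    start = ordered[0].period
    return sum(gamma ** (i.period - start) * (i.revenue - i.cost) for i in ordered)

# Hypothetical journey: an email, a pause, then a call that leads to a purchase
journey = [
    Interaction("c42", 0, "email", cost=1.0, revenue=0.0, explored=False),
    Interaction("c42", 1, "wait", cost=0.0, revenue=0.0, explored=False),
    Interaction("c42", 2, "call", cost=5.0, revenue=50.0, explored=True),
]
value = journey_value(journey, gamma=0.9)
```

The same function can be applied to the deterministic scenarios to compare candidate models on equal footing.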
It’s not easy, and you need to design the steps carefully. Yet it is worth it: if well implemented, a reinforcement learning framework will be able to handle your marketing strategy in an automatic and integrated way.
You can even think bigger and automatically generate the content of mails, add price sensitivity into the model, etc. We won’t dive into this topic right now, but maybe it is a good topic for a future post?
In this very interesting article, the authors go even further, stating that:
"Intelligence, and its associated abilities, can be understood as subserving the maximisation of reward"
So RL and agent models are definitely just starting to be used, and their full potential is far from being reached.
We'll soon have a post on how we analysed and built a model for a next-best-action problem.