From Behavioural Signals to Smarter Recommendations: Redesigning an Affinity Algorithm for Better Personalisation

By Seppe Dooms

Understanding User Behaviour to Drive Personalisation at Scale

Modern digital platforms thrive on their ability to understand and predict user preferences. Behind every personalised homepage, targeted notification or relevant recommendation is the ability to translate behavioural signals into meaningful representations of user interests.

This post explores how a large-scale media platform redesigned its affinity model, the core system responsible for mapping user behaviour to content preferences, to make it more stable, efficient and interpretable.

Capturing Behaviour Through Touchpoints

Every time a user interacts with the digital ecosystem, whether by watching a video, reading an article or clicking on a headline, a data signal is created. These signals, collected from various touchpoints such as the video, news and sports platforms, provide the basis for understanding users.

To capture and unify these events, the data platform uses Opensnowcat, an open-source behavioural data pipeline. Developers configure backend systems to emit structured events for every significant interaction, such as:

  • Page impressions: when a user views a page or piece of content
  • List impressions and clicks: capturing engagement with content feeds
  • Session details: device type, browser version, screen size, and other contextual data

Each of these events adds to the behavioural dataset, providing a comprehensive picture of how users consume content.
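
To make this concrete, here is a minimal sketch of what one of these structured events might look like as it leaves a backend system. The field names and payload shape are illustrative assumptions, not the platform's actual tracking schema:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json
import uuid


@dataclass
class BehaviouralEvent:
    """Illustrative shape of one tracked interaction; field names are assumptions."""
    event_type: str      # e.g. "page_impression", "list_click"
    user_id: str
    content_id: str
    context: dict = field(default_factory=dict)  # device type, browser version, screen size, ...
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


# A backend system would emit an event like this for every significant interaction.
event = BehaviouralEvent(
    event_type="page_impression",
    user_id="user-123",
    content_id="article-456",
    context={"device": "mobile", "browser": "Firefox 128", "screen": "1170x2532"},
)
print(json.dumps(asdict(event), indent=2))
```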

Complementary Data: Resume Points and Engagement Tracking

In addition to event tracking, the platform also monitors resume points (signals that indicate when users begin, pause and finish consuming content). These are processed in near real time through event streams and stored in DynamoDB.

When combined with Opensnowcat events, this data creates a comprehensive behavioural timeline across all products, enabling the platform to accurately quantify user engagement patterns and preferences.
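
A resume-point write to DynamoDB might look roughly like the following. The table name, key design and attribute names are assumptions for illustration; only the boto3 `put_item` call itself is standard:

```python
import boto3

# Hypothetical table and attribute names; the real schema is not part of this post.
dynamodb = boto3.resource("dynamodb")
resume_points = dynamodb.Table("resume_points")

def store_resume_point(user_id: str, content_id: str, position_seconds: int, status: str) -> None:
    """Persist a begin/pause/finish signal so engagement can be reconstructed later."""
    resume_points.put_item(
        Item={
            "user_id": user_id,           # partition key (assumed)
            "content_id": content_id,     # sort key (assumed)
            "position_seconds": position_seconds,
            "status": status,             # "started" | "paused" | "finished"
        }
    )

store_resume_point("user-123", "episode-42", position_seconds=1350, status="paused")
```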

Translating Behaviour into Insights: The Role of Affinities

All captured behavioural data feeds into a system that produces affinity scores, numerical representations of users' interests and habits.

These affinities describe how strongly a user is associated with particular topics, formats or creators. For instance, frequently interacting with political news will strengthen a user’s affinity for politics. These scores are widely used across the ecosystem to:

  • Power recommendation engines
  • Enable push notifications for relevant content
  • Support churn prediction models
  • Drive data analytics and segmentation

Behind the scenes, each user’s affinities are updated continuously through event-based pipelines. These pipelines process incoming events containing metadata such as tags, categories and authors, updating user profiles in the affinity store as they do so.
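
As a rough sketch of what such a pipeline does per event (the flat additive update and the dimension names are simplifications, not the production scoring logic):

```python
from collections import defaultdict

# In-memory stand-in for the affinity store; the real system persists per-user profiles.
affinity_store: dict = defaultdict(lambda: defaultdict(float))

def process_event(user_id: str, metadata: dict, weight: float = 1.0) -> None:
    """Fold one event's tags, categories and authors into the user's affinity profile."""
    profile = affinity_store[user_id]
    for dimension in ("tags", "categories", "authors"):
        for value in metadata.get(dimension, []):
            profile[f"{dimension}:{value}"] += weight

process_event("user-123", {"categories": ["politics"], "tags": ["election"], "authors": ["jane-doe"]})
print(dict(affinity_store["user-123"]))
```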

Using Affinities Across Systems

The affinity model acts as the central intelligence layer for personalization. Downstream systems consume affinity data for various use cases:

  • Recommender Systems: Graph-based models use affinities as input features to predict what content users are most likely to engage with next.
  • Segmentation Tools: An internal interface allows teams to define audience segments (“users who love documentaries,” “fans of a specific show”) using affinity percentiles.
  • Push Campaigns: Notifications are targeted to segments with high affinities when new seasons or special events go live.
  • Homepage Personalization: Affinities influence how each user’s landing page is constructed, prioritizing the most relevant content categories.

Through these mechanisms, affinities became a cornerstone of personalization across the organization.
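
The segmentation use case above, for instance, boils down to thresholding one affinity dimension at a percentile. A minimal sketch, assuming a plain mapping of user IDs to scores for a single topic:

```python
import numpy as np

def segment_by_percentile(affinities: dict, percentile: float = 90.0) -> set:
    """Return the users whose affinity for one topic is at or above the given percentile."""
    threshold = np.percentile(list(affinities.values()), percentile)
    return {user for user, score in affinities.items() if score >= threshold}

# Hypothetical scores for "category:documentaries".
documentary_affinity = {"user-1": 0.12, "user-2": 0.87, "user-3": 0.64, "user-4": 0.91}
print(segment_by_percentile(documentary_affinity, percentile=50))  # user-2 and user-4 make the top half
```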

When Growth Becomes a Problem: The Limits of Exponential Models

The original affinity model rewarded both recency and frequency of user interactions using an exponential inflation constant. While conceptually sound, this approach had a hidden flaw: numbers grew uncontrollably.

With each new event, affinity scores inflated exponentially, quickly reaching astronomical values, sometimes exceeding a trillion. This led to a cascade of issues:

  • Storage inefficiency: The affinity table ballooned to terabytes in size.
  • Index overload: Database indices no longer fit in memory.
  • Data explosion: Users averaged 700 affinities each, some surpassing 2,000 due to metadata proliferation.
  • Loss of meaning: Scores became too large to interpret or normalize for downstream models.

The model had become mathematically unstable, making it increasingly difficult to extract actionable insights or perform consistent analysis.
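
To make the failure mode concrete, here is a purely illustrative simulation of a compounding update rule; the constant and the exact formula are assumptions, not the platform's original model:

```python
# Illustrative only: an update rule that inflates the existing score on every event so
# that newer interactions always dominate older ones. The constant is an assumption.
INFLATE = 1.1

score = 1.0
for day in range(1, 301):           # one interaction per day for roughly ten months
    score = score * INFLATE + 1.0   # old signal is inflated, the new event adds on top
print(f"{score:.2e}")
# Roughly 1.5e5 after 100 days, 2.1e9 after 200, and 2.9e13 (tens of trillions) after 300.
```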

A Rethink: From Inflation to Decay

The solution came from flipping the logic entirely. Instead of inflating scores with each interaction, the new model applies exponential decay (where the importance of past interactions gradually decreases over time).

The approach uses a half-life of 30 days: interactions that occurred today have full value, those 30 days old are worth half as much, and those 60 days old a quarter. Because older contributions shrink geometrically, the total score stays bounded no matter how active a user is, while still reflecting both engagement frequency and recency.
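
A minimal sketch of such a decay-based score, assuming each interaction contributes equally and only its timestamp matters:

```python
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 30.0

def decayed_affinity(interaction_times: list, now: datetime) -> float:
    """Sum past interactions, halving each contribution for every 30 days of age."""
    score = 0.0
    for ts in interaction_times:
        age_days = (now - ts).total_seconds() / 86400.0
        score += 0.5 ** (age_days / HALF_LIFE_DAYS)  # today: 1.0, 30 days ago: 0.5, 60 days ago: 0.25
    return score

now = datetime.now(timezone.utc)
history = [now, now - timedelta(days=30), now - timedelta(days=60)]
print(decayed_affinity(history, now))  # 1.0 + 0.5 + 0.25 = 1.75
```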

Benefits of the Decay-Based Model

  • Numerical stability: Values remain within a manageable range.
  • Comparability: New and long-term users can be evaluated on the same scale.
  • Efficiency: Only recent data (roughly the last 200 days, by which point an interaction’s weight has decayed below about 1%) is required to maintain accuracy.
  • Simplicity: The system tracks minimal per-user data (just timestamps and content IDs), retrieving metadata from a cache when needed.

This inversion eliminated the runaway growth problem while retaining behavioral richness and temporal sensitivity.

Engineering Efficiency: Smarter Data Handling

The redesign also included major architectural optimizations:

  • Metadata caching: Content attributes are stored in-memory for instant retrieval.
  • Key-value storage: User interactions are saved in compressed byte arrays for lightweight querying.

These changes dramatically reduced storage costs and improved query performance, turning a once-terabyte-scale bottleneck into a scalable, fast pipeline.
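
A sketch of that key-value layout, assuming each interaction is reduced to a (timestamp, content ID) pair of integers; the record format and compression choice are illustrative, not the production encoding:

```python
import struct
import zlib

# Each interaction stored as (unix timestamp, content id): two unsigned 32-bit ints, 8 bytes per record.
RECORD = struct.Struct("<II")

def pack_interactions(interactions: list) -> bytes:
    """Serialise a user's (timestamp, content_id) history into one compressed byte blob."""
    raw = b"".join(RECORD.pack(ts, content_id) for ts, content_id in interactions)
    return zlib.compress(raw)

def unpack_interactions(blob: bytes) -> list:
    raw = zlib.decompress(blob)
    return [RECORD.unpack_from(raw, offset) for offset in range(0, len(raw), RECORD.size)]

blob = pack_interactions([(1735689600, 42), (1736294400, 7)])
print(len(blob), unpack_interactions(blob))
```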

Managing Change Across Teams

An algorithmic fix is only half the challenge; the real test lies in organizational adoption. Since affinities power numerous systems, updates required careful change management. Guardrails were introduced to control which affinity categories exist, ensuring consistency and removing unused definitions.

All values are now normalized between 0 and 1, simplifying integration with recommender systems, clustering models, and analytics tools. Migration is ongoing, with legacy pipelines gradually being replaced as teams transition to the new standard.
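
One plausible way to do this, sketched below as an assumption rather than the exact production scheme, is to scale each user's profile by its strongest affinity (percentile ranks or a fixed theoretical maximum would work equally well):

```python
def normalise(profile: dict) -> dict:
    """Scale one user's decayed affinity scores into [0, 1] by dividing by the largest one."""
    top = max(profile.values(), default=0.0)
    if top == 0.0:
        return {k: 0.0 for k in profile}
    return {k: v / top for k, v in profile.items()}

print(normalise({"category:politics": 3.2, "category:sport": 0.8}))
# {'category:politics': 1.0, 'category:sport': 0.25}
```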

Outcomes and Learnings

Initial results demonstrate improved clustering behavior and richer signal density in downstream tasks. The refined decay-based approach preserves more granularity in mid-range affinity values, enabling models to detect subtle differences between user groups.

Beyond the numbers, the new system is simpler, faster, and easier to reason about, ensuring that personalization remains sustainable as data and traffic continue to grow.

Key Takeaways for Data and ML Teams

  • Behavioral data pipelines are the foundation of meaningful personalization.
  • Affinity models should balance recency and frequency, but avoid unbounded inflation.
  • Exponential decay provides mathematical stability and interpretability.
  • Minimal data storage paired with caching can achieve massive efficiency gains.
  • Change management and stakeholder alignment are critical when algorithms sit at the core of business processes.

By combining solid mathematical modeling with pragmatic engineering, it’s possible to transform behavioral data into scalable, transparent personalization systems that truly understand their users.

Interested in building smarter recommendation systems? Get in touch! We’d love to collaborate.