Building a Personalized Article Recommendation System with TensorFlow Recommenders
Have you ever spent 10 minutes scrolling a news or blog site, only to leave frustrated because none of the content matched your interests? Or conversely, found a hidden gem of an article that felt like it was written exactly for you? The difference between those two experiences is almost always a well-built personalized recommendation system.
For content platforms, personalized article recommendations drive 30-50% higher engagement, reduce churn by 20%, and increase ad revenue by up to 40% according to 2026 content industry benchmarks. But building a production-grade recommender from scratch is notoriously complex, requiring expertise in deep learning, distributed systems, and similarity search.
Enter TensorFlow Recommenders (TFRS): Google's open-source library that abstracts away most of the complexity of building, evaluating, and deploying recommender systems. In this guide, we'll walk you through building a hybrid article recommendation system end-to-end with TFRS, including production deployment tips, 2026 updates to the ecosystem, and best practices to avoid common pitfalls.
Table of Contents#
- What is TensorFlow Recommenders (TFRS)?
- Core Recommender System Concepts You Need to Know
- Our Use Case: Personalized Recommendations for a Developer Blog
- Step-by-Step: Build Your Article Recommendation System with TFRS
- Upgrade to a Multi-Task Model (Retrieval + Ranking) for Better Accuracy
- Production Deployment for Scale
- Best Practices & Common Pitfalls to Avoid
- 2026 Updates for Recommendation Systems
- Conclusion
- References
What is TensorFlow Recommenders (TFRS)?#
TensorFlow Recommenders (TFRS) is an open-source library built and maintained by Google, designed specifically for end-to-end recommender system development. Built on TensorFlow 2.x and Keras, it provides high-level, modular APIs that eliminate the need to write low-level code for training loops, similarity metrics, or deployment integrations.
TFRS supports the full lifecycle of recommender systems:
- Rapid prototyping of custom model architectures
- Built-in evaluation metrics for offline testing
- Native integration with scalable serving tools like ScaNN and TensorFlow Serving
- Support for multi-task learning to optimize multiple business objectives at once
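As of 2026, TFRS remains one of the most widely used libraries for production recommendation systems. Keras 3 and multi-backend support are provided by the separate KerasRS library (see the 2026 updates section below) rather than by TFRS itself, which is built on TensorFlow 2.x and Keras.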
Core Recommender System Concepts You Need to Know#
Before we dive into building our model, let's define key concepts that underpin all TFRS workflows:
1. User and Item Embeddings#
Embeddings are dense, low-dimensional vector representations of users and items (in our case, articles) that capture their core characteristics in a shared continuous vector space. For example, one dimension of a user embedding might represent their interest in Python content, while a matching dimension in an article embedding represents how focused the article is on Python. The closer a user and article embedding are in this space, the more relevant the article is to the user.
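To make the idea of "closeness" concrete, here is a minimal sketch using hand-written 3-dimensional vectors. The axis meanings, article IDs, and values are invented for illustration; real embeddings are learned and their dimensions are rarely this interpretable.

```python
# Toy embeddings; the axes are illustrative only: [python, devops, web]
user_embedding = [0.9, 0.1, 0.2]

article_embeddings = {
    "intro-to-asyncio": [0.8, 0.0, 0.1],
    "kubernetes-basics": [0.1, 0.9, 0.0],
    "css-grid-guide": [0.0, 0.1, 0.9],
}


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Score every article against the user, then sort from most to least relevant.
scores = {aid: dot(user_embedding, emb) for aid, emb in article_embeddings.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The Python-heavy user scores highest against the Python-heavy article, which is the behavior a trained model is meant to reproduce at scale.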
2. Two-Tower Model Architecture#
The standard architecture for modern recommenders, the two-tower model uses separate neural networks (towers) for users and items. Each tower outputs an embedding for its respective input, and the similarity (typically a dot product) between the two embeddings is used to calculate relevance.
3. Retrieval Stage#
The first step in a recommendation pipeline: filter a large catalog of 10k+ articles down to a small set of ~100 relevant candidates using fast approximate nearest neighbor (ANN) search. This stage prioritizes speed over precision.
4. Ranking Stage#
The second step: score each of the 100 retrieved candidates on the likelihood that the user will interact with it (e.g., click, read for >2 minutes, like). This stage prioritizes precision over speed, and can use richer user and article features.
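The two stages can be sketched without any framework. In this toy version, retrieval is a brute-force dot product over the whole catalog (a real system would use ANN search), and ranking re-scores the survivors with a made-up formula that adds a freshness bonus. The `freshness` field and the 0.5 weight are hypothetical, chosen only to show that ranking can use signals retrieval ignores.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_emb, catalog, k):
    """Stage 1: score every article with a cheap dot product, keep the top k."""
    return heapq.nlargest(k, catalog, key=lambda art: dot(user_emb, art["emb"]))


def rank(user_emb, candidates, n):
    """Stage 2: re-score the few candidates with a richer (hypothetical) formula."""
    def score(art):
        # Illustrative blend: embedding match plus a small freshness bonus.
        return dot(user_emb, art["emb"]) + 0.5 * art["freshness"]
    return sorted(candidates, key=score, reverse=True)[:n]


catalog = [
    {"id": "a1", "emb": [0.9, 0.1], "freshness": 0.0},
    {"id": "a2", "emb": [0.8, 0.2], "freshness": 1.0},
    {"id": "a3", "emb": [0.1, 0.9], "freshness": 1.0},
    {"id": "a4", "emb": [0.2, 0.1], "freshness": 0.5},
]
user = [1.0, 0.0]

candidates = retrieve(user, catalog, k=2)
top = rank(user, candidates, n=1)
print([a["id"] for a in candidates], [a["id"] for a in top])
```

Note that `a3` is very fresh but never reaches the ranking stage: anything retrieval drops cannot be recovered later, which is why retrieval is tuned for recall and ranking for precision.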
5. TFRS Tasks#
Pre-built abstractions that encapsulate loss functions, metrics, and training logic for common recommendation use cases. TFRS includes tfrs.tasks.Retrieval for candidate generation and tfrs.tasks.Ranking for scoring.
6. Recommendation Approaches#
- Content-Based Filtering: Recommends items similar to what a user has previously interacted with, using item features (e.g., article text, category, tags). Works well for new items with no interaction history.
- Collaborative Filtering: Leverages user-item interaction patterns (clicks, reads, likes) to learn shared embeddings. Works well for users with existing interaction history.
- Hybrid: Combines both approaches for better performance and to mitigate cold start issues. We will use a hybrid approach for our article recommender.
Our Use Case: Personalized Recommendations for a Developer Blog#
We will build a recommendation system for a mid-sized developer blog with:
- 120k monthly active users
- 6,200 published articles across categories like Python, Machine Learning, DevOps, and Web Development
- Available data:
  - User interaction data (user ID, article ID, click, read time, like status) from the past 12 months
  - Article metadata (article ID, title, full text, category, tags, author)
  - Optional user profile data (self-reported preferred topics, signup date)
Our goal is to show 10 personalized articles on each user's homepage that are most likely to drive long reads and engagement.
Step-by-Step: Build Your Article Recommendation System with TFRS#
We will start with a hybrid retrieval model, then upgrade to a multi-task retrieval + ranking model later.
Step 1: Install Dependencies#
pip install tensorflow-recommenders tensorflow-datasets tensorflow-hub pandas scann
Step 2: Import Required Libraries#
import tensorflow as tf
import tensorflow_recommenders as tfrs
import tensorflow_hub as hub
import pandas as pd
Step 3: Load and Preprocess Data#
We use time-based train/test splits (not random splits) to avoid data leakage, since recommendation data is temporal: we train on data from the first 10 months, test on data from the last 2 months.
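As a framework-free illustration of the split, the sketch below partitions a tiny interaction log around a cutoff date. The `timestamp` field name, the sample rows, and the cutoff are assumptions made for the example; substitute whatever your interaction log actually records.

```python
from datetime import datetime


def time_based_split(rows, cutoff):
    """Return (train, test): rows strictly before the cutoff, and the rest."""
    train = [r for r in rows if r["timestamp"] < cutoff]
    test = [r for r in rows if r["timestamp"] >= cutoff]
    return train, test


# Hypothetical interaction log; "timestamp" is an assumed field name.
rows = [
    {"user_id": "u1", "article_id": "a1", "timestamp": datetime(2025, 2, 10)},
    {"user_id": "u2", "article_id": "a7", "timestamp": datetime(2025, 6, 3)},
    {"user_id": "u1", "article_id": "a3", "timestamp": datetime(2025, 11, 20)},
    {"user_id": "u3", "article_id": "a1", "timestamp": datetime(2025, 12, 5)},
]
cutoff = datetime(2025, 11, 1)

train_rows, test_rows = time_based_split(rows, cutoff)
print(len(train_rows), len(test_rows))
```

Every training row predates every test row, so the model is never evaluated on interactions that happened before something it trained on. The actual data loading for our blog follows.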
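# Load raw data
interactions = pd.read_csv("user_interactions.csv")
articles = pd.read_csv("article_metadata.csv")
# Use string IDs throughout so joins and vocabulary lookups agree
interactions["user_id"] = interactions["user_id"].astype(str)
interactions["article_id"] = interactions["article_id"].astype(str)
articles["article_id"] = articles["article_id"].astype(str)
# Attach the title and category that the article tower needs to each interaction
interactions = interactions.merge(
    articles[["article_id", "title", "category"]], on="article_id", how="inner"
)
# Sort chronologically so the take/skip split in Step 6 is time-based.
# Assumption: the interaction log has a "timestamp" column; adjust to your schema.
interactions = interactions.sort_values("timestamp")
# Convert to TensorFlow Datasets
interactions_ds = tf.data.Dataset.from_tensor_slices({
    "user_id": interactions["user_id"].values,
    "article_id": interactions["article_id"].values,
    "title": interactions["title"].values,
    "category": interactions["category"].values,
    "read_time": interactions["read_time"].astype("float32").values,
})
articles_ds = tf.data.Dataset.from_tensor_slices({
    "article_id": articles["article_id"].values,
    "title": articles["title"].values,
    "category": articles["category"].values,
})
# Get unique values for embedding layers
unique_user_ids = interactions["user_id"].unique()
unique_article_ids = articles["article_id"].unique()
unique_categories = articles["category"].unique()
Step 4: Build User and Article Towers#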
We build hybrid towers that use both collaborative (ID) and content (text, category) features. Note that both towers must output embeddings of the same dimensionality so their dot product is valid.
class UserModel(tf.keras.Model):
    def __init__(self, embedding_dim=64):
        super().__init__()
        self.user_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_user_ids, mask_token=None
            ),
            tf.keras.layers.Embedding(
                len(unique_user_ids) + 1, embedding_dim
            )
        ])

    def call(self, user_id):
        return self.user_embedding(user_id)

class ArticleModel(tf.keras.Model):
    def __init__(self, embedding_dim=64):
        super().__init__()
        # Collaborative embedding for article ID
        self.id_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_article_ids, mask_token=None
            ),
            tf.keras.layers.Embedding(
                len(unique_article_ids) + 1, embedding_dim
            )
        ])
        # Content embedding for article title using Universal Sentence Encoder
        self.text_embedding = hub.KerasLayer(
            "https://tfhub.dev/google/universal-sentence-encoder/4",
            trainable=False,
            input_shape=[],
            dtype=tf.string,
            name="use_encoder"
        )
        # Content embedding for article category
        self.category_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_categories, mask_token=None
            ),
            tf.keras.layers.Embedding(len(unique_categories) + 1, 32)
        ])
        # Project combined features to match user embedding dimension
        self.dense_projection = tf.keras.layers.Dense(embedding_dim)

    def call(self, inputs):
        id_emb = self.id_embedding(inputs["article_id"])
        text_emb = self.text_embedding(inputs["title"])
        cat_emb = self.category_embedding(inputs["category"])
        combined = tf.concat([id_emb, text_emb, cat_emb], axis=1)
        return self.dense_projection(combined)
Step 5: Define the Full Retrieval Model#
class ArticleRetrievalModel(tfrs.Model):
    def __init__(self):
        super().__init__()
        self.user_model = UserModel()
        self.article_model = ArticleModel()
        # Use FactorizedTopK metric to measure retrieval accuracy
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=articles_ds.batch(128).map(self.article_model)
            )
        )

    def compute_loss(self, features, training=False):
        # features must carry user_id plus the article_id, title, and category
        # of the article the user interacted with
        user_embeddings = self.user_model(features["user_id"])
        article_embeddings = self.article_model(features)
        # Skip the expensive top-K metrics while training; evaluate() computes them
        return self.task(
            user_embeddings, article_embeddings, compute_metrics=not training
        )
Step 6: Compile and Train the Model#
model = ArticleRetrievalModel()
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
# Time-based split: this assumes interactions_ds is in chronological order, so
# the first 800,000 rows are the training window and the rest is the test window.
# Adjust the count so the boundary falls at your 10-month cutoff.
train_ds = interactions_ds.take(800_000).shuffle(100_000).batch(8192).cache()
test_ds = interactions_ds.skip(800_000).batch(4096).cache()
# Train for 5 epochs
model.fit(train_ds, epochs=5)
Step 7: Evaluate Offline#
We use recall@10 as our primary offline metric: it measures the percentage of times the article a user actually interacted with appears in their top 10 recommendations.
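To make the metric concrete, here is a small framework-free sketch of the computation. The user IDs, article IDs, and recommendation lists are invented; `k=2` is used only to keep the example short.

```python
def recall_at_k(recommendations, held_out, k=10):
    """Fraction of held-out (user, article) pairs found in that user's top k."""
    if not held_out:
        return 0.0
    hits = 0
    for user_id, article_id in held_out:
        if article_id in recommendations.get(user_id, [])[:k]:
            hits += 1
    return hits / len(held_out)


# Ranked recommendations per user (best first) and the test-period interactions.
recommendations = {
    "u1": ["a1", "a2", "a3"],
    "u2": ["a4", "a5", "a6"],
}
held_out = [("u1", "a2"), ("u2", "a9"), ("u1", "a3"), ("u2", "a4")]

score = recall_at_k(recommendations, held_out, k=2)
print(score)
```

Two of the four held-out articles fall inside the top 2, so the score is 0.5. TFRS reports the corresponding top-k accuracy through `model.evaluate`: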
eval_results = model.evaluate(test_ds, return_dict=True)
print(
    f"Retrieval Recall@10: "
    f"{eval_results['factorized_top_k/top_10_categorical_accuracy']:.2f}"
)
Step 8: Generate Recommendations for a User#
We use ScaNN (Scalable Nearest Neighbors) for fast, low-latency recommendation generation:
# Build ScaNN index
scann_index = tfrs.layers.factorized_top_k.ScaNN(model.user_model, k=10)
scann_index.index_from_dataset(
    articles_ds.batch(100).map(
        lambda x: (x["article_id"], model.article_model(x))
    )
)
# Get top 10 recommendations for user ID "1234"
_, recommended_article_ids = scann_index(tf.constant(["1234"]))
print(
    f"Recommended article IDs for user 1234: "
    f"{recommended_article_ids.numpy()[0]}"
)
Upgrade to a Multi-Task Model (Retrieval + Ranking) for Better Accuracy#
For even better performance, we can build a multi-task model that jointly optimizes for both retrieval (candidate generation) and ranking (predicting read time). This approach leverages both implicit feedback (clicks, read time) and explicit feedback (likes), and can improve ranking metrics such as NDCG@10 compared to a single-task retrieval model, though the size of the gain depends on your data and on how the task losses are weighted.
Key benefits of multi-task models for article recommendations:
- Combine multiple objectives (e.g., maximize clicks and long read time)
- Reduce overfitting by sharing representations across tasks
- Improve performance for low-data users and items
Here is a sketch of how a multi-task model differs from our retrieval-only model:
class MultiTaskArticleModel(tfrs.Model):
    def __init__(self):
        super().__init__()
        self.user_model = UserModel()
        self.article_model = ArticleModel()
        # Retrieval task
        self.retrieval_task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=articles_ds.batch(128).map(self.article_model)
            )
        )
        # Ranking task: predict read time
        self.ranking_task = tfrs.tasks.Ranking(
            loss=tf.keras.losses.MeanSquaredError(),
            metrics=[tf.keras.metrics.RootMeanSquaredError()]
        )

    def compute_loss(self, features, training=False):
        user_embeddings = self.user_model(features["user_id"])
        article_embeddings = self.article_model(features)
        retrieval_loss = self.retrieval_task(user_embeddings, article_embeddings)
        # For ranking, compute predicted read time from embedding similarity
        predicted_read_time = tf.reduce_sum(
            user_embeddings * article_embeddings, axis=1, keepdims=True
        )
        ranking_loss = self.ranking_task(
            labels=features["read_time"],
            predictions=predicted_read_time
        )
        # Weighted combination of both losses
        return retrieval_loss + 0.5 * ranking_loss
You can find a full implementation of a multi-task recommendation model in the official TFRS multi-task tutorial.
Production Deployment for Scale#
Once your model is performing well offline, deploy it to production with these tools and strategies:
Serving Infrastructure#
- ScaNN Index: For low-latency (sub-10ms) nearest neighbor search even for catalogs with 1M+ articles. ScaNN uses advanced quantization and partitioning techniques to achieve near-exact recall at a fraction of the brute-force cost.
- TensorFlow Serving: Expose your model as a REST or gRPC endpoint for integration with your blog backend. Supports model versioning and hot-swapping without downtime.
- TFLite: For on-device recommendations on mobile apps, enabling offline personalization without sending user data to the server.
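As a rough sketch of the TensorFlow Serving integration, the helper below builds the URL and JSON body for a REST predict call without sending it. The host, the model name `article_recommender`, and the assumption that the exported model accepts a batch of user ID strings are all hypothetical; port 8501 is the conventional REST port, but your deployment may differ.

```python
import json


def build_predict_request(host, model_name, user_ids, version=None, port=8501):
    """Return (url, body) for a TensorFlow Serving REST predict call."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        # Pin a specific model version instead of the latest one.
        path += f"/versions/{version}"
    url = f"http://{host}:{port}{path}:predict"
    # Row format: one entry in "instances" per user to score.
    body = json.dumps({"instances": list(user_ids)})
    return url, body


url, body = build_predict_request("localhost", "article_recommender", ["1234"])
print(url)
print(body)
```

The blog backend would POST `body` to `url` and read the recommended article IDs from the `predictions` field of the response. Pinning `version` is useful when an A/B test needs two model versions served side by side.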
Operational Practices#
- Real-Time Pipeline: Update user embeddings in near-real-time as users interact with content, to reflect changing interests immediately. Streaming frameworks like Apache Kafka or Apache Flink can feed new interaction events into your embedding update pipeline.
- A/B Testing: Run online A/B tests to compare new model versions against your existing system, tracking metrics like click-through rate (CTR), average read time, and bounce rate.
- Regular Retraining: Retrain your model every 1-2 weeks to handle new articles, new users, and data drift. Automate retraining with scheduled pipelines.
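When reading A/B test results, it helps to check whether a CTR difference is larger than chance would explain. The sketch below uses a standard two-proportion z-test; the click and view counts are invented, and a real experiment would also fix the sample size and significance threshold in advance.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the CTR difference of B versus A."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis that both variants perform the same.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical results: control (A) at 5.0% CTR, new model (B) at 5.6% CTR.
z, p_value = two_proportion_z_test(500, 10_000, 560, 10_000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With these numbers the lift looks promising but falls just short of the common 0.05 threshold, which is a signal to keep the test running rather than ship.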
Best Practices & Common Pitfalls to Avoid#
Best Practices#
- Start simple, iterate: Begin with a basic collaborative filtering model, then add content features, then move to multi-task models. Don't build the most complex model first.
- Use time-based train/test splits: Random splits cause data leakage and overestimate model performance.
- Monitor both offline and online metrics: Offline metrics like recall@10 don't always correlate with real-world engagement. Always validate changes with A/B tests.
- Add diversity to recommendations: Include a small diversity penalty in your ranking score to avoid showing users the same type of content repeatedly.
- Handle cold start explicitly: For new articles with no interaction history, use only content embeddings (text, category) to generate recommendations. For new users, recommend popular trending items until you have interaction data.
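The diversity penalty mentioned above can be sketched as a greedy re-rank: each time a category is picked, later candidates from that category lose a little score. The candidate scores and the `penalty` value of 0.2 are illustrative assumptions, not tuned recommendations.

```python
def rerank_with_diversity(candidates, n, penalty=0.2):
    """Greedily pick n items, discounting categories that were already picked."""
    remaining = list(candidates)
    picked = []
    seen = {}  # category -> number of items already picked from it

    while remaining and len(picked) < n:
        best = max(
            remaining,
            key=lambda c: c["score"] - penalty * seen.get(c["category"], 0),
        )
        picked.append(best)
        remaining.remove(best)
        seen[best["category"]] = seen.get(best["category"], 0) + 1
    return picked


candidates = [
    {"id": "a1", "category": "Python", "score": 0.90},
    {"id": "a2", "category": "Python", "score": 0.85},
    {"id": "a3", "category": "DevOps", "score": 0.80},
    {"id": "a4", "category": "Python", "score": 0.75},
]
top3 = [c["id"] for c in rerank_with_diversity(candidates, n=3)]
print(top3)
```

Sorting by raw score would return three Python articles; with the penalty, the DevOps article moves into second place. Setting `penalty=0` recovers the plain score ordering, which makes the effect easy to A/B test.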
Common Pitfalls#
- Popularity bias: Models tend to over-recommend popular articles, which reduces discovery of niche content. Mitigate this by down-weighting popular items in your loss function or applying inverse-propensity scoring.
- Data sparsity: Most users only interact with 2-3 articles per month. Add content features to fill in gaps in interaction data.
- Slow inference: Naive KNN search is O(n) per query, which is too slow for large catalogs. Always use ScaNN or another ANN library for production serving.
- Ignoring fairness: Models may under-recommend content from underrepresented categories or authors. Use fairness metrics to audit your model regularly.
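One simple way to down-weight popular items is to give each training interaction a weight inversely related to how often its article appears. The sketch below uses interaction counts as a crude propensity estimate; the `power` smoothing knob and the sample data are assumptions for illustration, and more careful propensity models exist.

```python
from collections import Counter


def inverse_propensity_weights(article_ids, power=0.5):
    """Map each article to a weight that shrinks as its popularity grows.

    Weights are normalized so the average weight across interactions is 1.
    """
    counts = Counter(article_ids)
    total = len(article_ids)
    raw = {aid: (total / count) ** power for aid, count in counts.items()}
    mean = sum(raw[aid] for aid in article_ids) / total
    return {aid: weight / mean for aid, weight in raw.items()}


# Hypothetical log: one popular article and one niche article.
interactions = ["hit"] * 8 + ["niche"] * 2
weights = inverse_propensity_weights(interactions)
print({aid: round(w, 3) for aid, w in weights.items()})
```

Looking up `weights[article_id]` for each training row yields per-example weights that could be passed as `sample_weight` when calling the `tfrs.tasks.Retrieval` task. A `power` below 1 softens the correction so niche articles are boosted without dominating training.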
2026 Updates for Recommendation Systems#
The recommendation system ecosystem has evolved significantly in the past two years:
- KerasRS: A new library built on Keras 3 with multi-backend support (TensorFlow, JAX, PyTorch). It includes state-of-the-art transformer-based recommendation models and built-in fairness tools. Google's 2025 developer blog demonstrates building a production-ready recommender in 10 minutes with KerasRS.
- Transformer-Based Recommenders: Libraries like RecBole 2.0 ship transformer-based sequential models (such as SASRec and BERT4Rec) that capture long-range user interest patterns, improving performance for users with long reading histories.
- Real-Time Personalization: Integration with streaming frameworks like Apache Flink enables model updates in seconds, rather than days, for hyper-personalized experiences.
- Bias Mitigation Tools: Modern TFRS and KerasRS include built-in metrics to measure and reduce bias against niche content and underrepresented user groups.
Conclusion#
TensorFlow Recommenders makes building production-grade personalized article recommendation systems accessible to developers of all skill levels. By using a hybrid two-tower architecture, you get the best of both content-based and collaborative filtering, while multi-task learning lets you optimize for multiple business goals at once.
Key takeaways:
- Start with a simple retrieval model, then add ranking and multi-task capabilities as needed
- Always use time-based splits for training and testing
- Validate all model changes with online A/B tests
- Leverage ScaNN for low-latency serving, and newer tools like KerasRS for multi-backend support
With these steps, you can build a recommendation system that boosts user engagement, reduces churn, and helps your readers find content they love.
References#
- TensorFlow Recommenders Official Documentation
- TensorFlow Recommenders GitHub Repository
- Basic Retrieval Tutorial with TFRS
- Multi-Task Recommenders Tutorial
- Google Developers Blog: Build a Recommender in 10 Minutes with KerasRS
- GeeksforGeeks TFRS Guide
- TensorFlow Recommendation Systems Resources
- TFRS PyPI Page