Data Skeptic

Data Skeptic

Author: Vários
Narrator: Vários
Publisher: Podcast
Duration: 316:03:14

More information

Synopsis

Data Skeptic is a data science podcast exploring machine learning, statistics, artificial intelligence, and other data topics through short tutorials and interviews with domain experts.

Show more

Episodes

News Recommendations

02/07/2026 Duration: 46min

News recommendation algorithms influence far more than what stories we click—they can shape our understanding of the world. In this episode, Kyle Polich speaks with Andreea Iana about responsible AI, filter bubbles, multilingual news recommendation, and her open-source NewsRecLib framework for evaluating recommender systems. They explore why bigger models aren't always better and how future recommendation systems can balance personalization with diversity and societal impact.

Listen

Listen
Give Users the Wheel

23/06/2026 Duration: 35min

What if you could simply tell a recommendation system what you want instead of relying on likes, dislikes, and watch history? Kyle Polich talks with Fuyuan Lyu about the DPR framework, which combines large language models and traditional recommender systems to give users direct control over recommendations through natural language. Together they explore how conversational interfaces could transform platforms like YouTube, TikTok, and news feeds while preserving the strengths of modern recommendation algorithms.

Listen

Listen
AutoLike

17/06/2026 Duration: 35min

How can researchers audit recommendation systems when the algorithms are hidden from view? Hieu Le joins Kyle Polich to discuss Auto-Like, a reinforcement learning framework that systematically explores how platforms like TikTok personalize content feeds. The conversation covers recommendation transparency, black-box auditing, and the future of platform accountability.

Listen

Listen
Student Spotlight: Aaron Payne, Data Analyst

01/05/2026 Duration: 25min

Aaron Payne, an MBA student at Georgia Tech studying business analytics and a Senior Insights Analyst at Chick-fil-A, joins Kyle Polich to talk about turning analytics into decisions that matter. They unpack a real-world forecasting project with Comfama in Colombia, including messy data realities, interpretability tradeoffs, and why "data science for good" starts with the people impacted.

Listen

Listen
The Future is Agentic in Recommender Systems

25/04/2026 Duration: 49min

Kyle Polich sits down with Yashar Deldjoo, research scientist and Associate Professor at the Polytechnic University of Bari, to explore how recommender systems have evolved and why trustworthiness matters. They unpack key dimensions of responsible AI, including robustness to adversarial attacks, privacy, explainability, and fairness, and discuss how LLMs introduce new risks like hallucinations. The episode closes with a look at "agentic" recommender systems, where tools and memory shift recommendations from ranked lists to end-to-end task completion.

Listen

Listen
Book Ratings and Recommendations

27/03/2026 Duration: 39min

Goodreads star ratings can be misleading as measures of "book quality," and research from Hannes Rosenbusch suggests that for many professionally published books, differences between readers often matter more than differences between books. The episode also explores how to model reader preferences, why reviews often reveal more about the reviewer than the text, and how LLMs can aid computational literary research while still falling short of human editors in creative writing.

Listen

Listen
Disentanglement and Interpretability in Recommender Systems

10/03/2026 Duration: 30min

Listen

Listen
Collective Altruism in Recommender Systems

27/02/2026 Duration: 54min

Ekaterina (Kat) Fedorova from MIT EECS joins us to discuss strategic learning in recommender systems—what happens when users collectively coordinate to game recommendation algorithms. Kat's research reveals surprising findings: algorithmic "protest movements" can paradoxically help platforms by providing clearer preference signals, and the challenge of distinguishing coordinated behavior from bot activity is more complex than it appears. This episode explores the intersection of machine learning and game theory, examining what happens when your training data actively responds to your algorithm.

Listen

Listen
Niche vs Mainstream

18/02/2026 Duration: 34min

Anas Buhayh discusses multi-stakeholder fairness in recommender systems and the S'mores framework—a simulation allowing users to choose between mainstream and niche algorithms. His research shows specialized recommenders improve utility for niche users while raising questions about filter bubbles and data privacy.

Listen

Listen
Healthy Friction in Job Recommender Systems

02/02/2026 Duration: 26min

In this episode, host Kyle Polich speaks with Roan Schellingerhout, a fourth-year PhD student at Maastricht University, about explainable multi-stakeholder recommender systems for job recruitment. Roan discusses his research on creating AI-powered job matching systems that balance the needs of multiple stakeholders—job seekers, recruiters, HR professionals, and companies. The conversation explores different types of explanations for job recommendations, including textual, bar chart, and graph-based formats, with findings showing that lay users strongly prefer simple textual explanations over more technical visualizations. Roan shares insights from his "healthy friction" study, which tested whether users could distinguish between real AI-generated explanations and randomly generated ones, revealing that participants often used explanations as information sources rather than decision-making tools. The discussion delves into the technical architecture behind these systems, including the use of knowledge graphs b

Listen

Listen
Fairness in PCA-Based Recommenders

26/01/2026 Duration: 49min

In this episode, we explore the fascinating world of recommender systems and algorithmic fairness with David Liu, Assistant Research Professor at Cornell University's Center for Data Science for Enterprise and Society. David shares insights from his research on how machine learning models can inadvertently create unfairness, particularly for minority and niche user groups, even without any malicious intent. We dive deep into his groundbreaking work on Principal Component Analysis (PCA) and collaborative filtering, examining why these fundamental techniques sometimes fail to serve all users equally. David introduces the concept of "power niche users" - highly active users with specialized interests who generate valuable data that can benefit the entire platform. We discuss his paper "When Collaborative Filtering Is Not Collaborative," which reveals how PCA can over-specialize on popular content while neglecting both niche items and even failing to properly recommend popular artists to new potential fans. David

Listen

Listen
Video Recommendations in Industry

26/12/2025 Duration: 38min

In this episode, Kyle Polich sits down with Cory Zechmann, a content curator working in streaming television with 16 years of experience running the music blog "Silence Nogood." They explore the intersection of human curation and machine learning in content discovery, discussing the concept of "algatorial" curation—where algorithms and editorial expertise work together. Key topics include the cold start problem, why every metric is just a "proxy metric" for what users actually want, the challenge of filter bubbles, and the importance of balancing familiarity with discovery. Cory shares insights on why TikTok's algorithm works so well (clean data and massive interaction volume), the crucial role of homepage curation, and how human curators help by contextualizing content, cleaning data, and identifying positive feedback loops that algorithms might miss. The conversation covers practical challenges like measuring "surprise and delight," the content deluge created by democratized creation tools, and why trust in

Listen

Listen
Eye Tracking in Recommender Systems

18/12/2025 Duration: 52min

In this episode, Santiago de Leon takes us deep into the world of eye tracking and its revolutionary applications in recommender systems. As a researcher at the Kempelin Institute and Brno University, Santiago explains the mechanics of eye tracking technology—how it captures gaze data and processes it into fixations and saccades to reveal user browsing patterns. He introduces the groundbreaking RecGaze dataset, the first eye tracking dataset specifically designed for recommender systems research, which opens new possibilities for understanding how users interact with carousel interfaces like Netflix. Through collaboration between psychologists and AI researchers, Santiago's work demonstrates how eye tracking can uncover insights about positional bias and user engagement that traditional click data misses. Beyond the technical aspects, Santiago addresses the ethical considerations surrounding eye tracking data, particularly concerning pupil data and privacy. He emphasizes the importance of questioning assumpti

Listen

Listen
Cracking the Cold Start Problem

08/12/2025 Duration: 39min

In this episode of Data Skeptic, we dive deep into the technical foundations of building modern recommender systems. Unlike traditional machine learning classification problems where you can simply apply XGBoost to tabular data, recommender systems require sophisticated hybrid approaches that combine multiple techniques. Our guest, Boya Xu, an assistant professor of marketing at Virginia Tech, walks us through a cutting-edge method that integrates three key components: collaborative filtering for dimensionality reduction, embeddings to represent users and items in latent space, and bandit learning to balance exploration and exploitation when deploying new recommendations. Boya shares insights from her research on how recommender systems impact both consumers and content creators across e-commerce and social media platforms. We explore critical challenges like the cold start problem—how to make good recommendations for brand new users—and discuss how her approach uses demographic information to create informa

Listen

Listen
Designing Recommender Systems for Digital Humanities

23/11/2025 Duration: 36min

In this episode of Data Skeptic, we explore the fascinating intersection of recommender systems and digital humanities with guest Florian Atzenhofer-Baumgartner, a PhD student at Graz University of Technology. Florian is working on Monasterium.net, Europe's largest online collection of historical charters, containing millions of medieval and early modern documents from across the continent. The conversation delves into why traditional recommender systems fall short in the digital humanities space, where users range from expert historians and genealogists to art historians and linguists, each with unique research needs and information-seeking behaviors. Florian explains the technical challenges of building a recommender system for cultural heritage materials, including dealing with sparse user-item interaction matrices, the cold start problem, and the need for multi-modal similarity approaches that can handle text, images, metadata, and historical context. The platform leverages various embedding techniques an

Listen

Listen
DataRec Library for Reproducible in Recommend Systems

13/11/2025 Duration: 32min

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Maria Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon reviews. The conversation covers Alberto's research journey through knowledge graphs, graph-based recommenders, privacy considerations, and recommendation novelty. He explains why small modifications in datasets can significantly impact research outcomes, the importance of offline evaluation, and DataRec's vision as a lightweight library that integrates with existing frameworks rather than replacing them. Whether you're benchm

Listen

Listen
Shilling Attacks on Recommender Systems

05/11/2025 Duration: 34min

In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a friend's ska band on Spotify to inflating product ratings on e-commerce platforms, shilling attacks represent a significant threat in an industry where approximately 4% of reviews are fake, translating to $800 billion in annual sales in the US alone. The discussion delves deep into collaborative filtering, explaining both u

Listen

Listen
Music Playlist Recommendations

29/10/2025 Duration: 52min

In this episode, Rebecca Salganik, a PhD student at the University of Rochester with a background in vocal performance and composition, discusses her research on fairness in music recommendation systems. She explores three key types of fairness—group, individual, and counterfactual—and examines how algorithms create challenges like popularity bias (favoring mainstream content) and multi-interest bias (underserving users with diverse tastes). Rebecca introduces LARP, her multi-stage multimodal framework for playlist continuation that uses contrastive learning to align text and audio representations, learn song relationships, and create playlist-level embeddings to address the cold start problem. A significant contribution of Rebecca's work is the Music Semantics dataset, created by scraping Reddit discussions to capture how people naturally describe music using atmospheric qualities, contextual comparisons, and situational associations rather than just technical features. This dataset, available on Hugging Fac

Listen

Listen
Bypassing the Popularity Bias

15/10/2025 Duration: 34min

Listen

Listen
Sustainable Recommender Systems for Tourism

09/10/2025 Duration: 38min

In this episode, we speak with Ashmi Banerjee, a doctoral candidate at the Technical University of Munich, about her pioneering research on AI-powered recommender systems in tourism. Ashmi illuminates how these systems can address exposure bias while promoting more sustainable tourism practices through innovative approaches to data acquisition and algorithm design. Key highlights include leveraging large language models for synthetic data generation, developing recommendation architectures that balance user satisfaction with environmental concerns, and creating frameworks that distribute tourism more equitably across destinations. Ashmi's insights offer valuable perspectives for both AI researchers and tourism industry professionals seeking to implement more responsible recommendation technologies.

Listen

Listen

|<
<<
>>
>|

page 1 from 31