Synopsis
Data science, big data, artificial intelligence, machine learning… they’re all the rage. In this podcast, Jessi Cisewski-Kehe and Susan Wang, two statisticians, give you a perspective on what’s happening in the realm of all things data. Random banter included. Support this podcast: https://anchor.fm/databytes/support
Episodes
-
#50: Extreme Classification: All You Need Is Some Hash (Functions)
24/01/2020 Duration: 21 min
In part 2 of this saga on extreme classification, we get into the weeds on how MACH manages to handle such massive classification problems. The title says it all: hash functions are the magic ingredient. We walk step by step through how one might arrive at the MACH algorithm from first principles. --- Send in a voice message: https://anchor.fm/databytes/message Support this podcast: https://anchor.fm/databytes/support
-
#49: Extreme Classification: Going at MACH Speed (Part 1)
17/01/2020 Duration: 16 min
In this episode, Dr. Derek Feng drops by to chat about a recent paper on Merged-Averaged Classifiers via Hashing (MACH), a divide-and-conquer approach to massive classification problems. In part 1 of 2, we describe the problem MACH solves and the strategy it takes: breaking the original large classification problem down into smaller classification problems. Next week, in part 2, we get into the technical details of how this division of labor works, and why it works.
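For listeners who want something concrete, here is a minimal Python sketch of the divide-and-merge idea as summarized in these episodes. R random hash functions each map the K original classes into B buckets. A small B-class classifier would then be trained for each hash on the relabeled data (training itself is omitted here). Finally, a class's score is the average of the probabilities its buckets receive. The hash family, the function names (`make_hashes`, `relabel`, `mach_scores`, `predict`) and the plain averaging rule are our own assumptions for illustration. This is not the paper's exact implementation.

```python
import random

# Large prime for the universal hash family (an arbitrary choice for this sketch).
PRIME = 2_147_483_647

def make_hashes(num_reps, num_buckets, seed=0):
    """Draw num_reps independent hash functions h(k) = ((a*k + b) mod p) mod B."""
    rng = random.Random(seed)
    hashes = []
    for _ in range(num_reps):
        a = rng.randrange(1, PRIME)
        b = rng.randrange(0, PRIME)
        # Bind a and b as defaults so each lambda keeps its own parameters.
        hashes.append(lambda k, a=a, b=b: ((a * k + b) % PRIME) % num_buckets)
    return hashes

def relabel(labels, h):
    """Divide step: turn a K-class label list into a B-bucket label list."""
    return [h(y) for y in labels]

def mach_scores(bucket_probs, hashes, num_classes):
    """Merge step: average each class's bucket probability across repetitions.

    bucket_probs[r][b] is small classifier r's estimate of P(bucket b | x).
    """
    reps = len(hashes)
    return [
        sum(bucket_probs[r][hashes[r](k)] for r in range(reps)) / reps
        for k in range(num_classes)
    ]

def predict(bucket_probs, hashes, num_classes):
    """Return the class with the highest merged score."""
    scores = mach_scores(bucket_probs, hashes, num_classes)
    return max(range(num_classes), key=scores.__getitem__)
```

The appeal of this design is memory. Classifier `r` would be an ordinary B-class model fit on `relabel(y, hashes[r])`, so the R small models have R × B outputs in total rather than K. Using several independent hashes makes it unlikely that two classes share a bucket in every repetition.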
-
#48: Where Moneyball Meets Footy
14/12/2019 Duration: 16 min
We've long heard about the waves statistics has made in baseball. But what about soccer? In this episode, we summarize a few applications of statistics in European football (soccer, to American listeners).
-
#47: Domoic Acid Testing -- A Crabshoot?
30/11/2019 Duration: 18 min
Domoic acid has plagued shellfish and other wildlife along the Pacific coast in recent years. Regular testing of domoic acid concentrations in crabs has become important for determining when crabs and their viscera are safe to eat. Unlike many common hypothesis tests, the domoic acid testing setup is based on the sample maximum rather than the sample mean. In this episode, we critique this testing methodology.
-
#46: Finding Your (Niche) Board Games
08/11/2019 Duration: 12 min
In this episode, we discuss how two statisticians used data from BoardGameGeek.com to put together their own board game recommendation engine, specifically designed to stay away from mainstream recommendations.
-
#45: Learning Publicly, with Private Data
01/11/2019 Duration: 16 min
In this episode, Dr. Derek Feng discusses the general issue of data privacy in the age of big data, including differential privacy and federated learning.
-
#44: A Conversation with Jon Krohn
25/10/2019 Duration: 33 min
We sit down with Dr. Jon Krohn to chat about his work as Chief Data Scientist at untapt, his newly published bestseller "Deep Learning Illustrated", and his teaching and research.
-
#43: To Google and Back
04/10/2019 Duration: 29 min
In this episode, Professor Albert Y. Kim of Smith College describes his post-PhD journey, which included a stint at Google AdWords before academic posts at Reed College, Middlebury College, Amherst College, and Smith College.
-
#42: Black in the Box
27/09/2019 Duration: 22 min
Dr. Derek Feng joins us again to discuss the two criteria by which we judge statistical and machine learning methods: interpretability versus predictive ability. In a world where black-box methods reign supreme, what does learning mean?
-
#41: What to do with Outliers
20/09/2019 Duration: 22 min
Guest Dylan O'Connell joins us today to talk about a recent surprising but legitimate result from a Democratic primary poll conducted by Monmouth University. We discuss different perspectives on how to approach a data point that doesn't fit in with the others.
-
#40: Making a DIY ML-Controlled Cat Door
13/09/2019 Duration: 10 min
Outdoor-cat owners know all too well the unpleasantness of dealing with what the cat dragged in. A self-proclaimed machine learning novice proves that you don't need to be a pro to build a smart cat door that keeps the cat from bringing prey into your home.
-
#39: Rolling in the Deep Patient
06/09/2019 Duration: 25 min
We take a deep dive into the poster child for black-box machine learning methods, Deep Patient: an unsupervised learning method that uses denoising autoencoders to extract salient features from electronic health records, which can then be used to predict health outcomes. We do our best to explain what on earth the previous sentence means.
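To make "denoising autoencoder" a little less mysterious, here is a minimal NumPy sketch of the building block Deep Patient is described as using. Each training step corrupts the input by randomly zeroing entries and learns to reconstruct the clean input. The learned hidden layer then serves as the extracted features, and stacked layers are trained greedily, each on the previous layer's features. The tied weights, masking noise, layer sizes, learning rate and the names `DenoisingAutoencoder` and `stack_features` are our own illustrative assumptions, not Deep Patient's actual settings. Inputs are assumed to be scaled to [0, 1].

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class DenoisingAutoencoder:
    """One tied-weight denoising autoencoder layer with masking noise."""

    def __init__(self, n_visible, n_hidden, corruption=0.3, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.1, size=(n_hidden, n_visible))
        self.b = np.zeros(n_hidden)   # hidden-layer bias
        self.c = np.zeros(n_visible)  # reconstruction bias
        self.corruption = corruption

    def encode(self, X):
        """Map inputs in [0, 1] to hidden features."""
        return sigmoid(X @ self.W.T + self.b)

    def train_step(self, X, lr=0.1):
        """One gradient step on cross-entropy between clean X and the
        reconstruction of a corrupted copy of X; returns the loss."""
        n = X.shape[0]
        # Masking noise: zero each entry independently with prob `corruption`.
        X_tilde = X * (self.rng.random(X.shape) >= self.corruption)
        H = sigmoid(X_tilde @ self.W.T + self.b)  # encode corrupted input
        Z = sigmoid(H @ self.W + self.c)          # decode with tied weights
        eps = 1e-12
        loss = -np.mean(np.sum(X * np.log(Z + eps)
                               + (1 - X) * np.log(1 - Z + eps), axis=1))
        dZ = (Z - X) / n                      # grad w.r.t. decoder pre-activation
        dH = (dZ @ self.W.T) * H * (1 - H)    # grad w.r.t. encoder pre-activation
        # Tied weights receive both a decoder term and an encoder term.
        self.W -= lr * (H.T @ dZ + dH.T @ X_tilde)
        self.b -= lr * dH.sum(axis=0)
        self.c -= lr * dZ.sum(axis=0)
        return loss

def stack_features(X, layer_sizes, epochs=50, lr=0.1):
    """Greedy layer-wise training: each layer learns from the previous layer's features."""
    features, layers = X, []
    for size in layer_sizes:
        dae = DenoisingAutoencoder(features.shape[1], size)
        for _ in range(epochs):
            dae.train_step(features, lr)
        features = dae.encode(features)
        layers.append(dae)
    return layers, features
```

The final `features` matrix is what a separate supervised model would consume to predict outcomes. That downstream predictor is not shown here.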
-
#38: The Misuse of Statistics in Court
30/08/2019 Duration: 11 min
In this episode, we talk about how a statistical concept you would learn in an introductory course was misused in court. The error had dire consequences for Sally Clark, who was charged in the deaths of two of her children.
-
#37: Susan Starts a New Job
23/08/2019 Duration: 14 min
In this episode, we talk about Susan's new job as a Data Scientist! She recently transitioned from academia to industry, and we discuss her experience searching for positions, interviewing, and her first few weeks in the new role.
-
#36: What's New in Machine Learning Startups
16/08/2019 Duration: 11 min
In this episode, we talk about some machine learning startups worth watching this year.
-
#35: You Look How You Sound
09/08/2019 Duration: 14 min
Deep learning has proven useful for many prediction tasks. One more example: using a short clip of someone's speech to predict what the speaker's face looks like.
-
#34: Protecting Kids' Digital Privacy
02/08/2019 Duration: 10 min
In this episode, we talk about protecting kids' digital privacy.
-
#33: Statisticians Hate Post-Hoc Power
26/07/2019 Duration: 9 min
Statistics is key to demonstrating the effectiveness of new advances in science and medicine, but when a study fails to achieve statistical significance, is post-hoc power a valid justification?
-
#32: Amazon's 3D Body Scan Study
21/07/2019 Duration: 13 min
In this episode, we talk about Amazon's 3D body scan study.
-
#31: What Data Visualizations Do You Care About? It's Personal
12/07/2019 Duration: 13 min
In this episode, we talk about how data are personal for those in a rural Pennsylvania community.