Thank you for attending the inaugural PeRSonAl tutorial. We are excited to host this tutorial to encourage interdisciplinary research, spanning algorithms, datasets, and systems, into efficient and responsible personalized recommendation.

The inaugural PeRSonAl tutorial will be held as an online, real-time webinar on Monday, March 23, 2020, from 11:00am to 2:00pm EST. You can join the webinar with this link to a Zoom conference. In the meantime, we will upload pre-recorded talks from our invited speakers below, and you can join the Slack discussion channel provided by ASPLOS 2020 (under the #tutorial-understanding-system-implications-for-neural-recommendation channel). Finally, we ask that you indicate your intent to participate on this Google sheet.

Program

Please find prepared slides and recordings of all the talks in this Google Drive folder. Contact info.personal.tutorial@gmail.com for any questions.

Welcome to the Inaugural PeRSonAl Tutorial!
  • Time: 11:00am – 11:15am
  • Presenters: Carole-Jean Wu and Udit Gupta (Facebook AI Research and Harvard)
  • Abstract: Deep learning-based personalized recommendation systems and algorithms form the building blocks for producing high-quality content suggestions. Compared to more classic approaches, such as collaborative filtering, deep learning-based approaches have demonstrated higher-quality user engagement because of their ability to exploit a wider range of signals and to model meaningful interactions and histories more effectively. Although this important class of deep learning workloads is widely used in industry for product and advertisement suggestions, research into deep learning-based personalized recommendation remains severely under-invested in the systems and architecture communities (more in our recent article, Deep Learning: It’s Not All About Recognizing Cats and Dogs, on the ACM SIGARCH Blog). To enable more interdisciplinary research into building efficient and responsible personalized recommendation, we are organizing this tutorial on PeRSonAl: Personalized Recommendation Systems and Algorithms. PeRSonAl-asplos2020 features an exciting panel of speakers from academia and industry covering the perspectives of algorithms, datasets, and system design.
  • Recorded Presentation: [Slides – PDF] [Video – MP4]
Recommendation Science in the Criteo AI Lab: From Practical Applications to Theoretical Research and Back
  • Time: 11:15am – 11:45am
  • Presenter: David Rohde (Criteo AI Labs)
  • Abstract: Criteo runs one of the largest-scale recommender systems on the planet, with approximately one billion items recommended to over one billion users. Recommendations must also be made in under 50ms. The Criteo AI Lab works on developing new large-scale machine learning solutions so that we are able to make the right recommendation to the right user at the right time. This talk will cover the recommendation problem at Criteo and what we mean by ‘Criteo scale’, as well as some of the academic and applied research we do. In particular, I will outline how we are questioning the conventional notion of recommendation and proposing solutions using the RecoGym simulation environment. I will also outline some of our new approaches to recommendation, including the BLOB model, which combines organic and bandit signals, and the doubly robust formulation, which extends counterfactual risk minimization to include pessimism, allowing us to mitigate the optimizer’s curse. Finally, I will explain how we can use online KNN so we can deploy as well as train our deep models.
  • Recorded Presentation: [Slides – PPT] [Video – MP4]
Design Implications of Memory Systems and Near-Memory Processing for Personalized Recommendation
  • Time: 11:45am – 12:15pm
  • Presenter: Xuan (Silvia) Zhang (Washington University in St. Louis)
  • Abstract: Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance can be dominated by memory-bound sparse embedding operations whose irregular memory access patterns pose a fundamental challenge to acceleration. In this talk, I will share initial results from our in-depth characterization of production-grade recommendation models. They show that embedding operations with high model-, operator-, and data-level parallelism lead to memory bandwidth saturation, limiting recommendation inference performance. I will further discuss the design implications that such embedding access characteristics impose, in order to gain more insight into potential optimizations of memory system design. Finally, I will explore the novel approach of applying near-memory processing to recommendation systems and lay out practical strategies for its scalable implementation in production datacenter environments.
  • Recorded Presentation: [Slides – PDF]
Training Deep Learning Recommendation Models
  • Time: 12:15pm – 12:45pm
  • Presenter: Maxim Naumov (Facebook)
  • Abstract: Recommendation systems form the backbone of most internet services – search engines use recommendation to order results, social networks to suggest friends and content, shopping websites to suggest purchases, and video streaming services to recommend movies. In this talk we give a brief overview of the evolution of recommendation systems, culminating in state-of-the-art deep learning recommendation models (DLRMs). Unlike their computer vision and natural language processing counterparts, DLRMs exercise all parts of the HW infrastructure – memory, compute, storage, and network. They are well suited for both asynchronous and synchronous distributed training, the latter requiring high-performance interconnects with optimal topology and efficient fabric to support All-to-all and All-reduce communication patterns. We finish with a discussion of the open-source implementation of DLRM in the PyTorch framework and how it can be leveraged for HW/SW co-design. (An illustrative sketch of the DLRM structure appears after this entry.)
  • Recorded Presentation: [Slides – PDF]
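To make the DLRM structure described in the abstract above concrete, here is a minimal, self-contained sketch of a DLRM-style model in PyTorch. It is not the official open-source facebookresearch/dlrm implementation; the table sizes, feature counts, and layer widths are illustrative assumptions only. Sparse categorical features go through sum-pooled EmbeddingBag lookups (the memory-bound part), dense features go through a bottom MLP (compute-bound), and the two are combined through a pairwise dot-product interaction feeding a top MLP.

# A minimal DLRM-style model (illustrative only; all sizes are made up).
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    def __init__(self, num_dense=13, table_sizes=(1000, 1000, 1000), dim=16):
        super().__init__()
        # One EmbeddingBag per categorical feature: gather rows, reduce by sum.
        self.tables = nn.ModuleList(
            [nn.EmbeddingBag(n, dim, mode="sum") for n in table_sizes])
        # Bottom MLP maps dense features into the embedding dimension.
        self.bottom = nn.Sequential(nn.Linear(num_dense, 64), nn.ReLU(),
                                    nn.Linear(64, dim), nn.ReLU())
        # Top MLP consumes the dense vector plus all pairwise dot products.
        num_feat = len(table_sizes) + 1
        num_pairs = num_feat * (num_feat - 1) // 2
        self.top = nn.Sequential(nn.Linear(dim + num_pairs, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, dense, sparse_ids, sparse_offsets):
        x = self.bottom(dense)                                    # [B, dim]
        embs = [t(ids, offs) for t, ids, offs                     # memory-bound
                in zip(self.tables, sparse_ids, sparse_offsets)]  # lookups
        feats = torch.stack([x] + embs, dim=1)                    # [B, F, dim]
        inter = torch.bmm(feats, feats.transpose(1, 2))           # [B, F, F]
        i, j = torch.triu_indices(feats.size(1), feats.size(1), offset=1)
        pairs = inter[:, i, j]                                    # pairwise dots
        return torch.sigmoid(self.top(torch.cat([x, pairs], dim=1)))

# Toy usage: a batch of two samples, one lookup id per sample per table.
model = TinyDLRM()
dense = torch.randn(2, 13)
ids = [torch.tensor([3, 7]) for _ in range(3)]
offsets = [torch.tensor([0, 1]) for _ in range(3)]
print(model(dense, ids, offsets).shape)   # torch.Size([2, 1])

This split is what makes DLRMs exercise memory, compute, and, in distributed training, network simultaneously: the embedding tables can be orders of magnitude larger than the MLPs, while the MLPs dominate the arithmetic.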
Building Production-ready Recommendation Systems at Scale with Microsoft Recommenders
  • Time: 12:45pm – 1:15pm
  • Presenter: Andreas Argyriou (Microsoft)
  • Abstract: Recent decades have witnessed a great proliferation of recommendation systems across many business verticals. From earlier algorithms such as similarity-based collaborative filtering to the latest deep neural network-based methods, recommendation technologies have evolved dramatically. As a result, it is often challenging for practitioners to select and customize the optimal algorithms for specific business scenarios. Moreover, the lifecycle of developing a recommendation system is broader than training an algorithm and includes operations such as data preprocessing, model evaluation, and system operationalization. In this talk, I will review the key tasks in building recommendation systems and discuss best practices for making recommendation systems accessible to every organization and the wider community. The talk is based on extensive experience productizing recommendation systems in a variety of real-world application domains, and on an open-source GitHub repository, Microsoft/Recommenders, that our team has developed. This repository is designed to help data scientists quickly grasp basic concepts in a hands-on fashion and has gained significant visibility within the community, with 7,000 stars on GitHub. We believe that the best-practice examples shared in this repository will help developers, scientists, and researchers quickly build production-ready recommendation systems as well as prototype and evaluate novel ideas using the provided utility functions.
  • Recorded Presentation: [Slides – PDF] [Video – MP4]
At-scale Inference for Recommendation Systems
  • Time: 1:15pm – 1:45pm
  • Presenter: Udit Gupta (Harvard University/Facebook AI Research)
  • Abstract: The widespread application of deep learning has changed the landscape of computation in the data center. In particular, personalized recommendation for content ranking is now largely accomplished leveraging deep neural networks.  This talk will cover the unique system and hardware challenges, compared to vision and NLP use cases, of deploying recommendation models for at-scale inference. Throughout the talk we will highlight new opportunities for future systems and hardware solutions that are customized specifically for deep learning recommendation inference.
  • Recorded Presentation: [Slides – PDF] [Video – MP4]
Concluding Remarks
  • Time: 1:45pm – 2:00pm
  • Presenters: Carole-Jean Wu and Udit Gupta (Facebook AI Research and Harvard)
  • Abstract: Thank you for attending the inaugural PeRSonAl webinar. We would also like to thank our speakers for sharing their work and perspectives on personalized recommendation systems. The slide deck below provides a list of the open-source materials that were shared throughout the webinar. Finally, PeRSonAl will also be hosted in conjunction with ISCA 2020!
  • Recorded Presentation: [Slides – PDF]

Speaker Bios

Udit Gupta is a fourth-year PhD student in computer science at Harvard University; he received his B.S. in ECE from Cornell University in 2016. His research focuses on improving the performance and energy efficiency of emerging applications in computer systems and architecture by co-designing solutions across the computing stack. His recent work explores the characterization and optimization of at-scale deployment of deep learning-based personalized recommendation systems.
Carole-Jean Wu is a Research Scientist at Facebook AI Research. Her research interests are in computer architecture, with a particular focus on energy- and memory-efficient systems. More recently, her research has pivoted to designing systems for machine learning execution at scale, such as personalized recommender systems, and for mobile deployment. Carole-Jean chairs the MLPerf Recommendation Benchmark Advisory Board and co-chairs MLPerf Inference. Carole-Jean holds tenure from ASU and received her M.A. and Ph.D. from Princeton and B.Sc. from Cornell. She is the recipient of the NSF CAREER Award, the Facebook AI Infrastructure Mentorship Award, the IEEE Young Engineer of the Year Award, the Science Foundation Arizona Bisgrove Early Career Scholarship, and the Intel PhD Fellowship, as well as a number of Best Paper awards.
Maxim Naumov is a research scientist at Facebook. His interests include deep learning, parallel algorithms, and numerical methods. In the past, he held positions on the NVIDIA Research, Emerging Applications, and Platform teams. He has also worked in Intel Corporation’s Microprocessor Technology and Computational Software Labs. Maxim received his PhD in computer science (with a specialization in computational science and engineering) in 2009 and his BS in computer science and mathematics in 2003, both from Purdue University – West Lafayette.
David Rohde is a research scientist at Criteo. His research focuses on Bayesian statistics and causality, especially applied to marketing problems. He is one of the original creators of the RecoGym environment and regularly presents at machine learning venues such as the REVEAL workshop and the causality workshops at NeurIPS 2018 and 2019. David has numerous publications on applied and theoretical aspects of machine learning, on topics ranging from variational approximations, causal inference, and doubly intractable models to astronomy, analyzing massive public transport datasets, and evaluating recommender systems.
Andreas Argyriou is a Senior Data Scientist with the Azure Global Commercial Industry team at Microsoft. Before joining Microsoft, he was a Senior Data Scientist at Kayak.com and held various positions in academic research. He has published work on multitask and kernel-based learning, sparse regularization, and convex optimization in top machine learning conferences and journals. He obtained a PhD in machine learning from University College London and a BSc and MEng in computer science from MIT. His current work focuses on algorithms for machine learning and their applications to real-life use cases across industry sectors.
Dr. Xuan ‘Silvia’ Zhang (S’08, M’15) is an Assistant Professor in the Preston M. Green Department of Electrical and Systems Engineering at Washington University in St. Louis. She is currently visiting the Facebook AI Research (FAIR) System for Machine Learning (SysML) team as a research scientist, investigating the fundamental implications of diverse AI workloads on memory system design. Before joining Washington University, she was a Postdoctoral Fellow in Computer Science at Harvard University. She received her B.Eng. degree in Electrical Engineering from Tsinghua University in China, and her MS and PhD degrees in Electrical and Computer Engineering from Cornell University. She works across the fields of VLSI, computer architecture, and autonomous cyber-physical systems. Her research interests include hardware/software co-design for efficient machine learning and artificial intelligence, adaptive power and resource management for autonomous systems, and hardware security primitives in the analog and mixed-signal domain. Dr. Zhang is the recipient of the NSF CAREER Award in 2020, the DATE Best Paper Award in 2019, and the ISLPED Design Contest Award in 2013, and her work has also been nominated for Best Paper Awards at DATE 2019 and DAC 2017.

Frequently Asked Questions

Here we summarize questions frequently asked by attendees during the online webinar.

Question: Can you recommend tutorials or introductory materials on probability math to better understand recommendation algorithms?
Answer: Chris Bishop’s book is a good one; David MacKay’s book is also worth a read.

Question: Are algorithms still changing rapidly for recommendation systems?
Answer: Yes; what the best algorithm is remains an open question.

Question: Regarding RecNMP – Are all tasks and latency measurements about inference? How do the trends hold for training?
Answer: Yes, the tasks and latency measurements described are for inference. Applying RecNMP to training recommendation models is an interesting direction as well, where we would have to consider different memory demands and datatype precisions.

Question: Regarding RecNMP – What are typical datatypes or precisions for the operations supported by RecNMP?
Answer: The sparse-lengths family of operators uses two data types: (1) 32-bit floating point in SparseLengthsSum and (2) quantized int8 in SparseLengthsSumFused8BitRowwise. Both types are supported in RecNMP.
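To make the two datatypes concrete, here is a small sketch in PyTorch of what a sum-pooled embedding lookup and an 8-bit rowwise quantization look like. The table size, the ids, and the quantization formula below are illustrative assumptions, not the exact production kernels.

import torch

table = torch.randn(1000, 16)                 # fp32 embedding table
ids = torch.tensor([3, 7, 42, 8])             # rows to gather
lengths = torch.tensor([2, 2])                # pooling segments: [3,7] and [42,8]
offsets = torch.cat([torch.tensor([0]), lengths.cumsum(0)[:-1]])

# (1) fp32 gather + segment sum, the SparseLengthsSum-style computation.
fp32_out = torch.nn.functional.embedding_bag(ids, table, offsets, mode="sum")

# (2) 8-bit rowwise quantization: each row is stored as int8 plus a per-row
# scale and bias, and dequantized on the fly during the lookup.
row_min = table.min(dim=1, keepdim=True).values
row_max = table.max(dim=1, keepdim=True).values
scale = (row_max - row_min).clamp(min=1e-8) / 255.0
q_rows = torch.round((table - row_min) / scale).to(torch.uint8)
deq = q_rows.float() * scale + row_min        # ~4x smaller rows, small error
int8_out = torch.nn.functional.embedding_bag(ids, deq, offsets, mode="sum")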

Question: Regarding RecNMP – In your production workload study, were you able to observe the impact of Rank-NMP on memory accesses by non-recommendation operations (not FC; other non-AI memory accesses)?
Answer: Rank-NMP supports the Gather-Reduce compute pattern. Since we focus mainly on recommendation workloads, RecNMP primarily targets embedding tables, but the benefits of Rank-NMP should be observable wherever the Gather-Reduce pattern exists; MapReduce, for example, may exhibit a similar Gather-Reduce computation.
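For readers unfamiliar with the term, the Gather-Reduce pattern in its simplest form is an irregular gather of rows from a large, memory-resident table followed by a cheap reduction; the array sizes below are arbitrary and only illustrate the memory-bound shape of the computation.

import numpy as np

table = np.random.rand(1_000_000, 64).astype(np.float32)   # large, memory-resident table
indices = np.random.randint(0, table.shape[0], size=80)    # irregular, sparse accesses
pooled = table[indices].sum(axis=0)                         # gather rows, reduce by sum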

Question: Are algorithms and modeling techniques still evolving rapidly for personalized recommendation? What do you think is the time cadence for taking the latest recommendation algorithms from research to production deployment?
Answer: The pace of advances in recommendation systems seems to have accelerated compared to the cadence of ten years ago. There is a lot of work using DNNs, and the use cases in practical scenarios are diverse. The choice of recommendation algorithm is highly dependent on the use case: the type of data, the KPIs, the computational requirements, and so on. It remains an open challenge to identify the most appropriate algorithm and take it from research to production deployment.

Question: What is the main difference between task and data level parallelism for recommendation inference serving?
Answer: Task-level parallelism involves running separate queries across hardware resources (e.g., different CPU cores), while data-level parallelism increases the batch size, i.e., the size of an individual query, processed by a single hardware unit (e.g., one CPU core). In DeepRecSys we show that in order to maximize latency-bounded throughput, a key optimization target for neural recommendation inference, it is crucial to carefully balance task- versus data-level parallelism.
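The toy sketch below contrasts the two forms of parallelism; predict() is a hypothetical stand-in for a recommendation model, not the DeepRecSys code, and the query and batch sizes are made up.

import numpy as np
from multiprocessing import Pool

def predict(batch):
    # Score a batch of candidate items for one query; larger batches amortize
    # per-call overheads and vectorize better on a single core.
    return batch @ np.ones(batch.shape[1])

if __name__ == "__main__":
    queries = [np.random.rand(64, 32) for _ in range(8)]   # 8 queries, 64 items each

    # Data-level parallelism: merge items into one large batch on one worker.
    scores = predict(np.concatenate(queries))               # one 512 x 32 call

    # Task-level parallelism: score the queries concurrently on separate CPU
    # cores, trading per-core efficiency for lower per-query latency.
    with Pool(processes=4) as pool:
        per_query_scores = pool.map(predict, queries)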