Thank you for attending the second iteration of the PeRSonAl tutorial.

The program consists of three invited plenary talks and a Q&A session for submitted work. Prior to the real-time webinar, authors will submit pre-recorded presentations of their talks (see the call for participation below). During the Q&A session, authors will have the chance to answer questions about their work. The real-time webinar will be free and open to all.

Program

Introduction to PeRSonAl
  • Time: 12:00pm – 12:15pm (EDT)
  • Presenters: Carole-Jean Wu and Udit Gupta (FAIR/Harvard)
  • Abstract: Personalized recommender algorithms are deployed widely to power a variety of production use cases. Their primary goal is to maximize user engagement by balancing immediate click-through with long-term value. Recommender algorithms have advanced significantly over the past few decades, evolving from naive rule-based techniques to deep learning approaches, and significant compute cycles in cloud-scale infrastructures are now devoted to personalized recommendation. In this talk, we will first examine the evolution of recommender systems and their implications for system design and optimization for at-scale recommendation deployment. We will then introduce recent work on systems and infrastructure tailored for personalized recommendation at Facebook AI Research. Despite the importance of deep learning-based personalized recommendation, it has received comparatively little attention from the systems community. We hope this tutorial advances innovation in the personalized recommendation space through close academic-industry collaboration.
Plenary Talk 1: Time, Context and Causality in Recommender Systems
  • Time: 12:15pm – 12:45pm (EDT)
  • Presenter: Yves Raimond (Netflix)
  • Abstract: After a description of the recommendation problem in the context of Netflix, we will focus on one of the most important questions when designing a recommendation algorithm: what makes a good recommendation? A very powerful framing for that question is to focus on predicting the likelihood that a user will interact with an item. We will discuss recent trends and research under that framing, but also highlight some fundamental limitations. In particular, these algorithms are not able to capture the impact a recommendation can have on the outcome, which can lead to a number of issues (e.g. offline/online mismatches, biases and confounds, …). We will then focus on a causal framing of the recommendation problem, as well as recent trends and research in that area.
Plenary Talk 2: Ins and Outs of Using GPUs for Training Recommendation Models
  • Time: 12:45pm – 1:15pm (EDT)
  • Presenter: Bilge Acun (FAIR)
  • Abstract: The use of GPUs has proliferated for machine learning tasks and is considered mainstream for many deep learning models. Meanwhile, for training state-of-the-art personalization models, which consume the highest number of compute cycles at Facebook, using GPUs is not straightforward. The GPU performance and efficiency of training personalization models are largely affected by model configuration: batch size, the mix of dense and sparse features, and the size of the embedding tables. Furthermore, these models often use large embedding tables that do not fit into limited GPU memory. This talk will explain the intricacies of using GPUs for training ranking and recommendation models.
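As a back-of-the-envelope illustration of why embedding tables strain GPU memory, the sketch below uses our own hypothetical numbers (not figures from the talk): a table with 10^8 rows of 128-dimensional fp32 vectors already exceeds a typical GPU's memory, and a multi-hot sparse feature is served by pooling a handful of its rows.

```python
import numpy as np

# Hypothetical table shape for illustration only.
num_rows, dim = 100_000_000, 128
bytes_per_param = 4  # fp32
table_gb = num_rows * dim * bytes_per_param / 1e9
print(f"one embedding table: {table_gb:.0f} GB")  # ~51 GB, more than a 40 GB GPU

# A multi-hot sparse feature is looked up as a pooled sum of table rows.
table = np.random.rand(1000, 8).astype(np.float32)  # tiny stand-in table
multi_hot_ids = [3, 42, 7]                          # the few nonzero indices
pooled = table[multi_hot_ids].sum(axis=0)           # pooled embedding
print(pooled.shape)                                 # (8,)
```

Production models hold many such tables, which is why a single table already exceeding one device's memory makes multi-GPU and hierarchical placement necessary.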
Plenary Talk 3: Training Massive Scale Deep Learning Ads Systems with GPUs and SSDs
  • Time: 1:15pm – 1:45pm (EDT)
  • Presenter: Weijie Zhao (Baidu Research)
  • Abstract: Neural networks in ads systems usually take input from multiple sources, e.g. query-ad relevance, ad features, and user portraits. These inputs are encoded into one-hot or multi-hot binary features, with typically only a tiny fraction of nonzero feature values per example. Deep learning models in the online advertising industry can have terabyte-scale parameters that fit in neither the GPU memory nor the CPU main memory of a computing node. For example, a sponsored online advertising system can contain more than $10^{11}$ sparse features, making the neural network a massive model with around 10 TB of parameters. In this talk, we introduce a distributed GPU hierarchical parameter server for massive-scale deep learning ads systems. We propose a hierarchical workflow that utilizes GPU high-bandwidth memory, CPU main memory, and SSD as a 3-layer hierarchical storage, with all neural network training computations contained in GPUs. Extensive experiments on real-world data confirm the effectiveness and scalability of the proposed system: a 4-node hierarchical GPU parameter server can train a model more than 2X faster than a 150-node in-memory distributed parameter server in an MPI cluster, and the price-performance ratio of our system is 4-9 times better than the MPI-cluster solution.
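The 3-layer storage hierarchy described above can be sketched as a tiered lookup that promotes hot parameters toward faster memory on each miss. Everything below (class name, capacities, eviction policy) is a hypothetical minimal illustration, not Baidu's actual parameter server:

```python
# Sketch of a 3-tier parameter store:
# GPU HBM (small, fast) -> CPU DRAM (larger) -> SSD (largest, slowest).
class HierarchicalParamStore:
    def __init__(self, hbm_capacity, dram_capacity):
        self.hbm = {}    # tier 0: GPU high-bandwidth memory (cache)
        self.dram = {}   # tier 1: CPU main memory (cache)
        self.ssd = {}    # tier 2: SSD-backed, authoritative parameter table
        self.hbm_capacity = hbm_capacity
        self.dram_capacity = dram_capacity

    def put(self, key, value):
        # New parameters land on SSD, the full table.
        self.ssd[key] = value

    def get(self, key):
        # Check the fastest tier first; promote the value on a miss.
        if key in self.hbm:
            return self.hbm[key]
        if key in self.dram:
            value = self.dram[key]
        else:
            value = self.ssd[key]  # slowest path
            self._promote(self.dram, self.dram_capacity, key, value)
        self._promote(self.hbm, self.hbm_capacity, key, value)
        return value

    def _promote(self, tier, capacity, key, value):
        # Naive eviction: drop an arbitrary entry when the tier is full.
        if len(tier) >= capacity:
            tier.pop(next(iter(tier)))
        tier[key] = value

store = HierarchicalParamStore(hbm_capacity=2, dram_capacity=4)
for i in range(10):
    store.put(i, [0.1 * i])
print(store.get(5))    # served from SSD, then cached in DRAM and HBM
print(5 in store.hbm)  # True: the hot parameter now lives in fast memory
```

A real system would batch lookups, track access frequency for eviction, and overlap SSD I/O with GPU compute; the point here is only the promotion flow between tiers.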
Research Talk Q&A Session
  • Time: 1:45pm – 2:15pm (EDT)
  • Discussion chairs: Bahar Asgari, Ramyad Hadidi, Youngeun Kwon, Liu Ke, Samuel Hsia
  • Research paper 1: “Mixed Dimension Embedding with Application to Memory-Efficient Recommendation Systems” (Tony Ginart/Stanford University). [Video]
  • Research paper 2: “A Hands On Tutorial Using DeepRecSys to Optimize At-Scale Neural Recommendation Inference” (Udit Gupta/Harvard University and FAIR, Samuel Hsia/Harvard University). [Video]
  • Research paper 3: “Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems” (Hao-Jun Michael Shi/Northwestern University). [Video]

Speaker Bios

Carole-Jean Wu is a Research Scientist at Facebook AI Research. Her research interests are in computer architecture, with a particular focus on energy- and memory-efficient systems. More recently, her research has pivoted to designing systems for machine learning execution at scale, such as personalized recommender systems and mobile deployment. Carole-Jean chairs the MLPerf Recommendation Benchmark Advisory Board and co-chairs MLPerf Inference. She is a tenured faculty member at ASU and received her M.A. and Ph.D. from Princeton and her B.Sc. from Cornell. She is the recipient of the NSF CAREER Award, the Facebook AI Infrastructure Mentorship Award, the IEEE Young Engineer of the Year Award, the Science Foundation Arizona Bisgrove Early Career Scholarship, and the Intel PhD Fellowship, as well as a number of Best Paper awards.
Yves Raimond is a Director of Machine Learning at Netflix, where he leads a mixed team of researchers and engineers building the next generation of Machine Learning algorithms used to drive the Netflix experience. Before that, he was a Lead Research Engineer in BBC R&D, working on information extraction from Multimedia content. He holds a PhD from Queen Mary, University of London.
Bilge Acun is a Research Scientist at Facebook AI Research (FAIR). She works on end-to-end performance optimization of distributed machine learning applications, e.g. computer vision, speech recognition, and personalization models, on large-scale data centers with GPUs. She received her Ph.D. in 2017 from the Department of Computer Science at the University of Illinois at Urbana-Champaign, advised by Professor Laxmikant V. Kale. Before joining Facebook, she worked at the IBM Thomas J. Watson Research Center as a Research Staff Member.
Weijie Zhao is a researcher at Baidu Research USA. At Baidu, Weijie investigates exciting problems in scalable machine learning systems, approximate nearest neighbor search, scientific data processing, and database systems. Before that, Weijie received his Ph.D. from the University of California, Merced in 2018.

Call for Participation

Personalized recommendation is the process of ranking and recommending content based on users’ personal preferences. Recommendation algorithms are central to providing personalized search results, marketing strategies, e-commerce product suggestions, and entertainment content. Given the pervasive use of personalized recommendations across many Internet services, state-of-the-art recommendation algorithms use increasingly sophisticated machine learning approaches. These advances have led to personalized recommendation algorithms consuming a large fraction, and in many cases the majority, of AI cycles and datacenter capacity. Thus, the unique demands of recommendation algorithms must be met with innovative solutions across the computing stack.

The PeRSonAl tutorial invites submissions across all sub-areas of algorithms, datasets, systems, and hardware related to personalized recommendation. Topics of interest include but are not limited to:

  • Emerging algorithms for personalized recommendation
  • Datasets to train and test recommendation algorithms
  • Specialized systems and hardware
  • Novel applications of recommendation algorithms
  • Case studies and prototypes of training and deploying recommendation systems

As the tutorial will be hosted virtually, it will comprise pre-recorded 30-minute presentations based on authors’ submissions. Submissions can be up to 2 pages (following the same formatting guidelines as the conference) and should be sent to ugupta@g.harvard.edu.

Important Dates

  • Paper submission deadline: May 7, 2020
  • Paper notification: May 11, 2020
  • Pre-recorded presentation deadline: May 26, 2020
  • PeRSonAl tutorial: May 29, 2020

Organizers

  • Udit Gupta
  • Carole-Jean Wu
  • Gu-Yeon Wei
  • David Brooks