PODS 2023: Keynote Talk
Databases as Graphs: Predictive Queries for Declarative Machine Learning
Speaker: Jure Leskovec (Stanford)
Abstract
The era of intelligent systems and applications is here, and AI has been disrupting science and industry. The speed of change has been breathtaking. However, building AI-based solutions is hard: training machine learning models and putting them into production takes highly skilled teams months, if not years. There is a real need to drastically simplify the machine learning workflow, bring AI closer to users, and make it accessible to a wide range of audiences. In this talk, I present Kumo, a no-code machine learning platform that enables data scientists to solve a wide range of machine learning problems over relational tables in a database in a simple declarative way. Kumo provides a SQL-like Predictive Query language that allows for declarative specification of machine learning problems in a wide array of applications, from sales and marketing to customer retention and recommender systems. The key insight in Kumo is that a relational schema can be represented as a heterogeneous hypergraph. Such graphs are emerging as an abstraction for representing complex data, and deep Graph Neural Networks (GNNs) can then be used to learn optimal feature representations for any entity of interest. Automatically learning to encode graph structure into low-dimensional embeddings brings several benefits: (1) automatic learning from the entire data spread across multiple relational tables; (2) no manual feature engineering, as the system learns optimal embeddings; and (3) state-of-the-art predictive performance. Building such a distributed system for GNN training and inference poses several interesting algorithmic and data processing challenges, which we address through innovative machine learning methods and careful algorithm/architecture codesign. Kumo has already been successfully deployed at several major companies.
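To make the schema-as-graph idea concrete, here is a minimal sketch of how rows and foreign keys might be turned into a typed graph. It is not Kumo's implementation: the tables, the column names, and the helper `build_hetero_graph` are all hypothetical, and the sketch builds a plain heterogeneous graph with binary edges rather than a hypergraph (an `orders` row that references both a customer and a product could equally be read as one hyperedge joining the two).

```python
from collections import defaultdict

# Hypothetical toy tables (not from the talk); each row is a dict of column -> value.
tables = {
    "customers": [
        {"customer_id": 1, "name": "Ada"},
        {"customer_id": 2, "name": "Bo"},
    ],
    "products": [
        {"product_id": 10, "title": "Lamp"},
    ],
    "orders": [
        {"order_id": 100, "customer_id": 1, "product_id": 10},
        {"order_id": 101, "customer_id": 2, "product_id": 10},
    ],
}

# Assumed schema metadata: one primary key per table, and foreign keys
# given as (table, column) -> referenced table.
primary_keys = {
    "customers": "customer_id",
    "products": "product_id",
    "orders": "order_id",
}
foreign_keys = {
    ("orders", "customer_id"): "customers",
    ("orders", "product_id"): "products",
}


def build_hetero_graph(tables, primary_keys, foreign_keys):
    """Map each table to a node type and each foreign key to an edge type."""
    # One node per row, grouped by table name and keyed by primary key.
    nodes = {
        table: {row[primary_keys[table]]: row for row in rows}
        for table, rows in tables.items()
    }

    # One edge per foreign-key reference, grouped by
    # (source table, foreign-key column, destination table).
    edges = defaultdict(list)
    for (src_table, fk_col), dst_table in foreign_keys.items():
        src_pk = primary_keys[src_table]
        for row in tables[src_table]:
            target = row[fk_col]
            if target in nodes[dst_table]:  # skip dangling references
                edges[(src_table, fk_col, dst_table)].append((row[src_pk], target))
    return nodes, dict(edges)


nodes, edges = build_hetero_graph(tables, primary_keys, foreign_keys)
for edge_type, pairs in edges.items():
    print(edge_type, pairs)
```

In a structure like this, a GNN could pass messages along the typed edges, so that a customer's representation draws on its orders and, through them, on the products involved, without hand-written join features.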