Dr Julian Shun, Massachusetts Institute of Technology
14 January 2021
Abstract: This talk presents new parallel algorithms for density-based
spatial clustering (DBSCAN) on point sets and structural clustering
(SCAN) on graphs, two problems that have received significant
attention due to their applicability in a variety of data analysis
tasks.
Existing parallel algorithms for DBSCAN require much more work than
their sequential counterparts, making them inefficient for large
datasets. We bridge the gap between theory and practice of parallel
DBSCAN by presenting new parallel algorithms for Euclidean exact
DBSCAN and approximate DBSCAN that match the work bounds of their
sequential counterparts, and are highly parallel (polylogarithmic
depth). For graphs, we present the first parallel index-based SCAN
algorithm, based a recent sequential algorithm, which enables users to
efficiently explore many different parameter settings for cluster
generation. Our parallel algorithm has a better work bound than the
sequential algorithm, and achieves logarithmic depth. We also apply
locality-sensitive hashing to design a novel approximate SCAN
algorithm and prove guarantees for its clustering quality. We present
optimized implementations of our algorithms which achieve good
parallel scalability and outperform existing parallel implementations
by up to several orders of magnitude.
Bio: Julian Shun is the Douglas T. Ross Career Development Assistant
Professor of Software Technology in EECS and CSAIL at MIT. Prior to
coming to MIT, he was a Miller Research Fellow at UC Berkeley. He
received his Ph.D. from Carnegie Mellon University and his B.A. from
UC Berkeley. His research focuses on the theory and practice of
parallel algorithms and programming frameworks. He has received the
NSF CAREER Award, DOE Early Career Award, ACM Doctoral Dissertation
Award, CMU School of Computer Science Doctoral Dissertation Award,
Facebook Graduate Fellowship, Google Faculty Research Award, and best
paper awards at PLDI, SPAA, and DCC.