Abstract:
Data summarization is an effective approach to dealing with the “big data” problem. While data summarization problems have traditionally been studied in the streaming model, the focus is starting to shift to distributed models, as distributed/parallel computation seems to be the only viable way to handle today’s massive data sets. In this talk, I will show how some fundamental data summaries can be computed over distributed data with low communication costs.
Biography:
Ke YI is an Associate Professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. He obtained his B.E. from Tsinghua University and Ph.D. from Duke University, in 2001 and 2006 respectively, both in computer science. Before joining HKUST, he was a researcher in the database department at AT&T Labs. His research focus is on massive data algorithms and their applications in database systems.