Time: May 29, 1:30 - 3:00 PM
Venue: Academic Lecture Hall, 4th Floor, Information Building
Speaker: Dr. Xiangliang Zhang
Title:
Large-scale Streaming Data Mining: Model Design and Application
Abstract:
In this "big data" era, vast amounts of continuously arriving data can be found in various fields, such as sensor networks, network management, web and financial applications. Keeping terabytes to petabytes of data in memory is infeasible. Therefore, the development of algorithms that process large-scale streaming data on the fly, as it arrives, has become highly important. In this talk, the main problems of streaming data analysis will be discussed, including online data stream clustering, online linear approximation, online dynamic density estimation, and online change detection. Novel models proposed for solving these problems will be introduced, specifically: 1) StrAP, an online clustering algorithm, which summarizes streaming data and extracts the main patterns with quasi-linear complexity; 2) online Piecewise Linear Representation (PLR), which constructs a sequence of consecutive line segments to approximate the data stream; 3) KDE-Track, an online density estimator, which builds models characterizing the dynamic density of the data stream; and 4) CD-Area, an online change detection framework, which efficiently reports changes occurring in data streams with a low false-alarm rate and short detection delay. Applications to sensor networks and grid/cloud resource management will be demonstrated.
Bio:
Dr. Xiangliang Zhang is an Assistant Professor of Computer Science and directs the Machine Intelligence and kNowledge Engineering (http://mine.kaust.edu.sa) group at King Abdullah University of Science and Technology (KAUST), Saudi Arabia. Prior to joining KAUST, she was a European ERCIM research fellow at the Norwegian University of Science and Technology, Norway, in 2010. She earned her Ph.D. degree in computer science from INRIA-Université Paris-Sud, France, in July 2010. She received her M.S. and B.S. degrees from Xi’an Jiaotong University, China, in 2006 and 2003, respectively. Dr. Zhang's research focuses mainly on learning from complex and large-scale streaming data.