Renmin University of China has won the 2024 ACM SIGMOD Research Highlight Award. This award is presented by the Special Interest Group on Management of Data (SIGMOD) of the Association for Computing Machinery (ACM) to recognize projects that exemplify core database research.
The first author of the award-winning paper, titled “Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration,” is Jianhong Tu, a master's student of the class of 2020 at the School of Information, Renmin University of China. Her advisors are Prof. Ju Fan and Prof. Xiaoyong Du from the same institution. This marks the first time Renmin University of China has won this award as the primary institution. The paper, originally published at SIGMOD 2023, is a collaborative effort with the Beijing Big Data Center, and tools developed from this research have been successfully deployed in real-world applications in Beijing.
The SIGMOD Research Highlight Award, established in 2016, aims to showcase a set of research projects that exemplify core database research. In particular, these projects address an important problem, represent a definitive milestone in solving the problem, and have the potential for significant impact. The SIGMOD Research Highlights initiative also aims to make the selected works widely known in the database community, to industry partners, and potentially to the broader ACM community. The selection process is highly competitive. The award selection committee requested nominations from all top database conferences held in 2023 as well as from the SIGMOD community. The committee then discussed all nominated papers and selected the subset that best met the selection criteria. Only about ten papers receive this award annually. Renmin University of China is the third university in mainland China to win this award, following Tsinghua University and Shanghai Jiao Tong University.
Paper Overview:
Data integration, a fundamental problem in data management, plays a crucial role in many applications, such as big data analytics, knowledge graph construction, and data preparation for AI systems. Data matching is one of the most challenging problems in data integration; it aims to determine the semantic equivalence of heterogeneous data from multiple sources. Over the past four decades, fields such as databases, AI, the semantic web, and data mining have studied data matching from different perspectives, introducing a variety of tasks such as schema matching, entity matching, ontology alignment, and column type annotation. However, existing research mainly focuses on designing specialized models for a single matching task or dataset and lacks a general solution for different types of matching tasks. This paper introduces Unicorn, a unified model that supports multiple matching tasks by integrating them into a single end-to-end model. Its multi-task learning strategy enables knowledge sharing and mutual enhancement among tasks. Experiments across seven common data matching tasks show that Unicorn outperforms task-specific and dataset-specific models in terms of both matching accuracy and generalization capability.
Introduction to the Research Group:
Professor Ju Fan's research group, affiliated with the School of Information and the Key Laboratory of Data Engineering and Knowledge Engineering of the Ministry of Education at Renmin University of China, has conducted extensive research on data management under the guidance of Professor Xiaoyong Du. The group's research interests lie in the general area of data management, with a current focus on building next-generation data preparation systems. In recent years, the team has published more than 60 papers at top conferences and journals, including SIGMOD, VLDB, ICDE, and the VLDB Journal, and has actively collaborated with industrial partners such as the Beijing Big Data Center, Huawei, Alibaba, and Tencent.