Poznań University of Technology, Poland
Long-tail and missing labels in extreme multi-label classification
In real-world machine learning applications we often encounter the problem of imperfect feedback. The imperfection can take the form of noise or various types of bias. The labels can be delayed, missing, or very sparse. In this talk, we will focus on extreme multi-label classification (XMLC), the task of selecting, for a given instance, a small subset of relevant labels from a very large set of possible labels. XMLC problems are characterized by a long-tailed label distribution, meaning that most labels have very few positive instances. Furthermore, relevant labels might be missing from the observed training data, since it is nearly impossible to verify all labels when their number is very large. We will discuss different approaches to dealing with long-tailed and missing labels, pointing out their advantages and drawbacks.
Prior to joining Yahoo Research in 2019, Krzysztof Dembczyński was an Assistant Professor at Poznań University of Technology (PUT), Poland. He received his PhD in 2009 and his Habilitation in 2018, both from PUT. During his PhD studies he worked mainly on preference learning and boosting-based decision rule algorithms. During his postdoc at Marburg University, Germany, he started working on multi-target prediction problems, with a main focus on multi-label classification. Currently, his main scientific activity concerns extreme classification, i.e., classification problems with an extremely large number of labels. He also works on noisy labels, counterfactual learning, and rare event estimation in application domains such as online advertising and mail security.
His articles have been published at premier conferences (ICML, NeurIPS, ECML) and in leading journals (JMLR, MLJ, DAMI) in the field of machine learning. As a co-author, he won best paper awards at ECAI 2012 and ACML 2015. He serves as an Area Chair for ICML, NeurIPS, and ICLR, and as an Action Editor for MLJ.