Traditionally, relevance assessments for expert search have been gathered through self-assessment or from the opinions of co-workers. In our 2013 CSTA workshop paper, we introduced three benchmark datasets for expert search that use conference workshop program committees for relevance assessment. Our datasets cover entire research domains rather than single institutions and provide a larger number of topic-person associations than earlier collections.

This page contains links to the test collections used in our submission on benchmarking domain-focused expert search:

  • collection.IR.xml.gz contains a filtered version of the augmented DBLP dataset by Tang et al. (2008). It was filtered using a list of curated publications and has disambiguated author IDs matching the topic set included further down on this page, as well as full text for ~55% of all publications.
  • The ACL Anthology Reference Corpus contains the documents for the CL-focused test collection.
  • The Semantic Web Conference Corpus contains the documents for the SW-focused test collection (update: unfortunately, this dataset no longer seems to be available).
  • topics.IR.xml contains the topic set for the IR-focused test collection.
  • topics.CL.xml contains the topic set for the CL-focused test collection.
  • topics.SW.xml contains the topic set for the SW-focused test collection.
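The collection and topic files above are distributed as (optionally gzipped) XML. As a starting point, a small helper like the following can load either form with Python's standard library; note that the internal element names of these files are not documented here, so anything beyond the root element is an assumption you should verify against the actual files.

```python
import gzip
import xml.etree.ElementTree as ET

def load_xml(path):
    """Parse an XML file such as topics.IR.xml or collection.IR.xml.gz,
    transparently decompressing files with a .gz suffix.

    Returns the root Element; tag names inside the file (topics,
    publications, etc.) are assumptions and should be checked against
    the downloaded data.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return ET.parse(f).getroot()
```

From the returned root you can then iterate over child elements (e.g. `root.iter("topic")`, assuming a `topic` tag) to extract topic-person associations.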

If you want to use these test collections, please cite the paper for which they were constructed:

  • Georgeta Bordea, Toine Bogers, Paul Buitelaar. Benchmarking Domain-specific Expert Search Using Workshop Program Committees. In: Proceedings of the 2013 CIKM Workshop on Computational Scientometrics: Theory & Applications, pages 19-24, October 2013.