E-service Description, Discovery and Invocation: Issues, Tools and a Prototype

Master's thesis
Toine Bogers
Master's thesis, Tilburg University, September 2001

No abstract available.

Who’s who in E-service Land? Requirements and Tools for Developing E-services

Technical report
Toine Bogers, Bart Orriëns
CentER AR, Tilburg University, October 2001

No abstract available.

Dutch Named Entity Recognition: Optimizing Features, Algorithms, and Output

Master's thesis
Toine Bogers
Master's thesis, Tilburg University, September 2004

Abstract

Named entity recognition is a subproblem of information extraction and involves processing structured and unstructured documents and identifying expressions that refer to people, places, organizations and companies, and so forth. For humans, named entity recognition is intuitively simple. Many named entities are proper names and have initial capital letters and can easily be recognized that way. Memorizing (lists of) names can assist the human recognition process in case of ambiguity.

However, this large amount of ambiguity in natural language makes it difficult to attain human levels of recognition performance. In this thesis we investigate how to optimize the performance of named entity recognition for Dutch texts: what are the best indicators of the named entity type and what approaches maximize the generalization performance? In order to determine the best features for named entity recognition we extract a large number of potentially useful features and perform feature selection experiments using the SFFS algorithm to discover the suboptimal feature sets. To investigate the best approaches to named entity recognition, we select two popular machine learning approaches, memory-based learning and maximum entropy modeling, for our experiments. We also investigate the benefits of splitting the recognition process into separate identification and classification phases as opposed to a one-step process. Finally, we examine the influence of seedlist features and classifier stacking on the generalization performance and attempt to boost this performance by using handcrafted error-correcting rules, essentially creating a hybrid approach with the emphasis on machine learning but with elements from handcrafted approaches.

Morphological features (prefixes and suffixes) turn out to be good indicators of named entity type as well as orthographic features that represent the capitalization characteristics of a word. Seedlist features and stacking features are also very helpful albeit that the actual improvement is highly dependent on the classification algorithm used. The best approach to named entity recognition is a one-step approach using each and every feature in combination with the memory-based learner which outperformed the maximum entropy algorithm in most of the experiments. Nevertheless, splitting the recognition process into separate identification and classification phases appears to be promising as well. Using handcrafted rules to correct frequent errors made by the classifier also turns out to be a successful technique for boosting the generalization performance on the task of named entity recognition.

Applying Spelling Error Correction Techniques for Improving Semantic Role Labelling

Conference paper
Erik Tjong Kim Sang, Sander Canisius, Antal van den Bosch, Toine Bogers
In: Proceedings of the Ninth Conference on Natural Language Learning (CoNLL-2005), pages 229-232, June 2005

Abstract

This paper describes our approach to the CoNLL-2005 shared task: semantic role labelling. We do many of the obvious things that can be found in the other submissions as well. We use syntactic trees for deriving instances, partly at the constituent level and partly at the word level. On both levels we edit the data down to only the predicted positive cases of verb-constituent or verb-word pairs exhibiting a verb-argument relation, and we train two next-level classifiers that assign the appropriate labels to the positively classified cases. Each classifier is trained on data in which the features have been selected to optimize generalization performance on the particular task. We apply different machine learning algorithms and combine their predictions.

Authoritative Re-ranking in Fusing Authorship-based Subcollection Search Results

Workshop paper
Toine Bogers, Antal van den Bosch
In: DIR 2006: Proceedings of the 6th Belgian-Dutch Information Retrieval Workshop, Enschede, pages 49-55, March 2006

Abstract

We examine the use of authorship information to divide IR test collections into subcollections and we apply techniques from the field of distributed information retrieval to enhance the baseline search results. We base an estimate of an author’s expertise on the content of his documents and use this knowledge to construct rankings of the different author subcollections for each query. We go on to demonstrate that these rankings can then be used to re-rank baseline search results and improve performance significantly. We also perform experiments in which we base expertise ratings only on first authors or on all except the final authors and find that these limitations do not further improve our re-ranking method.

Authoritative Re-ranking of Search Results

Conference paper
Toine Bogers, Antal van den Bosch
In: ECIR 2006: Proceedings of the 28th European Conference on Information Retrieval, Lecture Notes in Computer Science, vol. 3936, Springer Verlag, Berlin, pages 519-522, April 2006

Abstract

We examine the use of authorship information in information retrieval for closed communities by extracting expert rankings for queries. We demonstrate that these rankings can be used to re-rank baseline search results and improve performance significantly. We also perform experiments in which we base expertise ratings only on first authors or on all except the final authors, and find that these limitations do not further improve our re-ranking method.

Dependency Parsing by Inference over High-recall Dependency Predictions

Conference paper
Sander Canisius, Toine Bogers, Antal van den Bosch, Jeroen Geertzen, Erik Tjong Kim Sang
In: Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), June 2006

Abstract

As more and more syntactically-annotated corpora become available for a wide variety of languages, machine learning approaches to parsing gain interest as a means of developing parsers without having to repeat some of the labor-intensive and language-specific activities required for traditional parser development, such as manual grammar engineering, for each new language. The CoNLL-X shared task on multi-lingual dependency parsing (Buchholz et al., 2006) aims to evaluate and advance the state-of-the-art in machine learning-based dependency parsing by providing a standard benchmark set comprising thirteen languages. In this paper, we describe two different machine learning approaches to the CoNLL-X shared task.

À Propos: Pro-Active Personalization for Professional Document Writing (long abstract)

Conference paper
Toine Bogers
In: IIiX 2006: Proceedings of the First IIiX Symposium on Information Interaction in Context, page 303, October 2006

Abstract

The goal of the À Propos project is to develop an IMA that supports users in the daily process of writing professional documents, such as scientific articles and technical or business reports. It aims to reduce the time spent searching for information by pro-actively searching for relevant, personalized, and trustworthy information.

À Propos focuses on developing methods for generating search profiles that enable effective, trustworthy, and high-precision information retrieval with regard to the user’s current information need. This information need is influenced not only by the document the user is writing, but also by the context: personal characteristics and those of the user’s workgroup.

Search profiles are generated on the basis of a collection of documents previously written by the user and his or her workgroup. The profiles must also be able to adapt to the information needs of individual users and of their workgroup. The À Propos agent integrates these search profiles with a parallel interface to public domain and proprietary internal search engines, as well as the user’s own pool of documents. The final step is fusing and filtering the search results from all the different sources.

Expertise Classification: Collaborative Classification vs. Automatic Extraction

Workshop paper
Toine Bogers, Willem Thoonen, Antal van den Bosch
In: Proceedings of the 17th ASIS&T SIG/CR workshop on Social Classification, November 2006

No abstract available.

What a Proactive Recommendation System Needs: Relevance, Non-Intrusiveness, and a New Long-Term Memory

Conference paper
Mari Carmen Puerta Melguizo, Toine Bogers, Anita Deshpande, Antal van den Bosch
In: Proceedings of ICEIS 2007, 2007

Abstract

The goal of the project À Propos is to develop a proactive, just-in-time recommendation system for professional writers. While authors are writing, the proactive system searches for relevant information to what is being written, and presents this information to the writers in a manner that is perceived as timely, non-intrusive, and trustworthy. In this paper we present our ideas and the first steps performed in order to reach this goal. Writing a professional document is a complex and highly demanding task that can be seriously affected by interruptions from the environment. Consequently, a proactive system should be 1) able to present highly relevant information consequently, 2) able to identify in what stage of writing the author is involved, and what are the moments in which information needs are more important and less disruptive, and 3) serve as an external long-term memory for the writer. In this paper we describe the steps and first results of our project À Propos in order to develop a proactive recommendation system that covers these goals.

UvT Expert Collection documentation

Technical report
Toine Bogers, Krisztian Balog
ILK Research Group Technical Report Series, no. 07–06, 2007

Abstract

The UvT Expert collection is based on the Webwijs (“Webwise”) system developed at Tilburg University (UvT) in the Netherlands. Webwijs is a publicly accessible database of UvT employees involved in research or teaching. Currently, Webwijs contains information about 1168 experts, each of whom has a page with contact information and, if made available by the expert, a research description and a list of publications. In addition, each expert can select expertise areas from a list of 1491 topics and can suggest new topics that need to be approved by the Webwijs editor. Each topic has a separate page that shows all experts associated with that topic and, if available, a list of related topics. The majority of the collection was crawled in October 2006 and each section lists when the crawling took place exactly.

Broad Expertise Retrieval in Sparse Data Environments

Conference paper
Krisztian Balog, Toine Bogers, Leif Azzopardi, Maarten de Rijke, Antal van den Bosch
In: SIGIR ’07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, pages 551-558, July 2007

Abstract

Expertise retrieval has been largely unexplored on data other than the W3C collection. At the same time, many intranets of universities and other knowledge-intensive organisations offer examples of relatively small but clean multilingual expertise data, covering broad ranges of expertise areas. We first present two main expertise retrieval tasks, along with a set of baseline approaches based on generative language modeling, aimed at finding expertise relations between topics and people. For our experimental evaluation, we introduce (and release) a new test set based on a crawl of a university site. Using this test set, we conduct two series of experiments. The first is aimed at determining the effectiveness of baseline expertise retrieval methods applied to the new test set. The second is aimed at assessing refined models that exploit characteristic features of the new test set, such as the organizational structure of the university, and the hierarchical structure of the topics in the test set. Expertise retrieval models are shown to be robust with respect to environments smaller than the W3C collection, and current techniques appear to be generalizable to other settings.

Comparing and Evaluating Information Retrieval Algorithms for News Recommendation

Conference paper
Toine Bogers, Antal van den Bosch
In: RecSys ’07: Proceedings of the 2007 ACM Conference on Recommender Systems, pages 141-144, October 2007

Abstract

In this paper, we argue that the performance of content-based news recommender systems has been hampered by using relatively old and simple matching algorithms. Using more current probabilistic retrieval algorithms results in significant performance boosts. We test our ideas on a test collection that we have made publicly available. We perform both binary and graded evaluation of our algorithms and argue for the need for more graded evaluation of content-based recommender systems.

Using Citation Analysis for Finding Experts in Workgroups

Workshop paper
Toine Bogers, Klaas Kox, Antal van den Bosch
In: DIR 2008: Proceedings of the 8th Belgian-Dutch Information Retrieval Workshop, pages 21-28, April 2008

Abstract

We compare expert finding approaches that use and combine different types of expertise evidence: content-based expert finding using academic papers, and expert finding using a social citation network between the documents and authors. We evaluate our approaches on a test collection that represents the research output of a typical average-sized academic workgroup. We find that expert finding using static rankings achieves the same performance as a query-dependent approach. Of the different approaches, the most effective method of performing expert finding in an academic workgroup is ranking workgroup members by citation indegree.
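
As a rough illustration of the static ranking mentioned in this abstract, the sketch below ranks people by citation indegree, i.e. by how many citations they receive. The input format (pairs of citing and cited author), the function name, the exclusion of self-citations, and the toy data are all assumptions made for this example; they are not taken from the paper.

```python
from collections import Counter


def rank_by_citation_indegree(citations):
    """Rank authors by the number of citations they receive.

    `citations` is an iterable of (citing_author, cited_author) pairs.
    This pair-based format is a hypothetical simplification.
    """
    indegree = Counter()
    for citing, cited in citations:
        if citing != cited:  # assumption: self-citations are not counted
            indegree[cited] += 1
    # Highest indegree first; ties are broken alphabetically.
    return sorted(indegree.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy citation pairs, invented for illustration only.
toy_citations = [
    ("ann", "bob"), ("cat", "bob"), ("dan", "bob"),
    ("bob", "ann"), ("cat", "ann"),
    ("bob", "bob"),
]
ranking = rank_by_citation_indegree(toy_citations)
print(ranking)
```

Because such a ranking does not depend on the query, it can be computed once and reused for every search.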

A Personalized Recommender System for Writing in the Internet Age

Workshop paper
Mari Carmen Puerta Melguizo, Olga Muñoz Ramos, Lou Boves, Toine Bogers, Antal van den Bosch
In: Proceedings of the LREC 2008 workshop on Natural Language Processing Resources, Algorithms, and Tools for Authoring Aids, pages 21-28, May 2008

Abstract

Writing is a complex task and several computer systems have been developed in order to support writing. Most of these systems, however, are mainly designed with the purpose of supporting the processes of planning, organizing and connecting ideas. In general, these systems help writers to formulate external visual representations of their ideas and connections of the main topics that should be addressed in the paper, sequence of the sections, etc. With the advent of the world wide web, writing and finding information for the written text has become increasingly intertwined. Consequently, it is necessary to develop systems able to support the task of finding relevant information during writing, without interfering with the writing process proper. In this paper we present the Proactive Recommender System: À Propos. This system is being developed in order to support writers in the difficult task of finding appropriate relevant information during writing. We raise the question whether the tendency to interleave (re)search and writing implies a need for developing more comprehensive models of the cognitive processes involved in writing scientific and policy papers.

Integrating Contextual Factors into Topic-centric Retrieval Models for Finding Similar Experts

Workshop paper
Katja Hofmann, Krisztian Balog, Toine Bogers, Maarten de Rijke
In: Proceedings of ACM SIGIR 2008 Workshop on Future Challenges in Expert Retrieval, pages 29-36, July 2008

Abstract

Expert finding has been addressed from multiple viewpoints, including expertise seeking and expert retrieval. The focus of expertise seeking has mostly been on descriptive or predictive models, for example to identify what factors affect human decisions on locating and selecting experts. In expert retrieval the focus has been on algorithms similar to document search, which identify topical matches based on the content of documents associated with experts.

We report on a pilot study on an expert finding task in which we explore how contextual factors identified by expertise seeking models can be integrated with topic-centric retrieval algorithms and examine whether they can improve retrieval performance for this task. We focus on the task of similar expert finding: given a small number of example experts, find similar experts. Our main finding is that, while topical knowledge is the most important factor, human subjects also consider other factors, such as reliability, up-to-dateness, and organizational structure. We find that integrating these factors into topical retrieval models can significantly improve retrieval performance.

Efficient Context-Sensitive Word Completion for Mobile Devices

Conference paper
Antal van den Bosch, Toine Bogers
In: MobileHCI 2008: Proceedings of the 10th International Conference on Human-Computer Interaction with Mobile Devices and Services, IOP-MMI special track, pages 465-470, September 2008

Abstract

Word completion is a basic technology for reducing the effort involved in text entry on mobile devices and in augmentative communication devices, where efficiency and ease of use are needed, but where a low memory footprint is also required. Standard solutions compress a lexicon into a suffix tree with a small memory footprint and high retrieval speed. Keystroke savings, a measurable correlate of text entry effort gain, typically improve when the algorithm would also take into account the previous word; however, this comes at the cost of a large footprint. We develop two word completion algorithms that encode the previous word in the input. The first algorithm utilizes a character buffer that includes a fixed number of recent keystrokes, including those belonging to previous words. The second algorithm includes the complete previous word as an extra input feature. In simulation studies, the first algorithm yields marked improvements in keystroke savings, but has a large memory footprint. The second algorithm can be tuned by frequency thresholding to have a small footprint, and be less than one order of magnitude slower than the baseline system, while its keystroke savings improve over the baseline.

Using Language Models for Spam Detection in Social Bookmarking

Workshop paper
Toine Bogers, Antal van den Bosch
In: Proceedings of 2008 ECML/PKDD Discovery Challenge Workshop, pages 1-12, September 2008

Abstract

This paper describes our approach to the spam detection task of the 2008 ECML/PKDD Discovery Challenge. Our approach focuses on the use of language models and is based on the intuitive notion that similar users and posts tend to use the same language. We compare using language models at two different levels of granularity: at the level of individual posts, and at an aggregated level for each user separately. To detect spam users in the system, we let the users and posts that are most similar to incoming users and their posts determine the spam status of those new users. We first rank all users in the system by KL-divergence of the language models of their posts—separately and combined into user profiles—and the language model of the new post or user. We then look at the spam labels assigned to the most similar users in the system to predict a spam label for the new user. We evaluate on a snapshot of the social bookmarking system BibSonomy made available for the Discovery Challenge. Our approach achieved an AUC score of 0.9784 on an internal validation set and an AUC score of 0.9364 on the official test set of the Discovery Challenge.
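
The general idea in this abstract (compare language models by KL-divergence, then let the most similar labelled examples vote) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the unigram models, the additive smoothing, the direction of the divergence, the value of k, the function names, and the toy posts are all choices made here for illustration.

```python
import math
from collections import Counter


def language_model(text, vocab, alpha=1.0):
    """Unigram model over `vocab` with additive smoothing (an assumption)."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def predict_spam(new_text, labelled, k=2):
    """Return the fraction of spam labels among the k most similar texts.

    `labelled` is a list of (text, label) pairs with label 1 for spam
    and 0 for non-spam.
    """
    vocab = set(new_text.lower().split())
    for text, _ in labelled:
        vocab |= set(text.lower().split())
    new_lm = language_model(new_text, vocab)
    # Lower divergence means more similar language use.
    ranked = sorted(
        labelled,
        key=lambda item: kl_divergence(new_lm, language_model(item[0], vocab)),
    )
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


# Toy posts, invented for illustration only.
posts = [
    ("cheap pills buy now", 1),
    ("buy cheap watches now", 1),
    ("language models for retrieval", 0),
    ("retrieval evaluation with language models", 0),
]
spam_score = predict_spam("buy cheap pills", posts)
ham_score = predict_spam("language models retrieval", posts)
print(spam_score, ham_score)
```

The same functions work at the user level if all posts of one user are first concatenated into a single profile text.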

Recommending Scientific Articles using CiteULike

Conference paper
Toine Bogers, Antal van den Bosch
In: RecSys ’08: Proceedings of the 2008 ACM Conference on Recommender Systems, pages 287-290, October 2008

Abstract

We describe the use of the social reference management website CiteULike for recommending scientific articles to users, based on their reference library. We test three different collaborative filtering algorithms, and find that user-based filtering performs best. A temporal analysis of the data indexed by CiteULike shows that it takes about two years for the cold-start problem to disappear and recommendation performance to improve.

Using Language Modeling for Spam Detection in Social Reference Manager Websites

Workshop paper
Toine Bogers, Antal van den Bosch
In: DIR 2009: Proceedings of the 9th Belgian-Dutch Information Retrieval Workshop, pages 87-94, February 2009

Abstract

We present an adversarial information retrieval approach to the automatic detection of spam content in social bookmarking websites. Our approach focuses on the use of language modeling, and is based on the intuitive notion that similar users and posts tend to use the same language. We compare using language modeling at two different levels of granularity: at the level of individual posts, and at an aggregated user level, where all posts of one user are merged into a single profile. We evaluate our approach on two spam-annotated data sets based on snapshots of the social bookmarking websites CiteULike and BibSonomy, and achieve promising results.

Design and Implementation of a University-wide Expert Search Engine

Conference paper
Ruud Liebregts, Toine Bogers
In: ECIR 2009: Proceedings of the 31st European Conference on Information Retrieval, Lecture Notes in Computer Science, vol. 5478, Springer Verlag, Berlin, pages 587-594, April 2009

Abstract

We present an account of designing, implementing, and evaluating a university-wide expert search engine. We performed system-based evaluation on multiple query sets to determine the optimal retrieval settings and performed extensive user-based evaluation with three different user groups: scientific researchers, students looking for a thesis supervisor or topic experts, and outside visitors of the website looking for experts. Our search engine significantly outperformed the old search system in terms of effectiveness, efficiency, and user satisfaction.

Collaborative and Content-based Filtering for Item Recommendation on Social Bookmarking Websites

Workshop paper
Toine Bogers, Antal van den Bosch
In: Proceedings of the ACM RecSys ’09 workshop on Recommender Systems and the Social Web, pages 9-16, October 2009

Abstract

Social bookmarking websites allow users to store, organize, and search bookmarks of web pages. Users of these services can annotate their bookmarks by using informal tags and other metadata, such as titles, descriptions, etc. In this paper, we focus on the task of item recommendation for social bookmarking websites, i.e. predicting which unseen bookmarks a user might like based on his or her profile. We examine how we can incorporate the tags and other metadata into a nearest-neighbor collaborative filtering (CF) algorithm, by replacing the traditional usage-based similarity metrics by tag overlap, and by fusing tag-based similarity with usage-based similarity. In addition, we perform experiments with content-based filtering by using the metadata content to recommend interesting items. We generate recommendations directly based on Kullback-Leibler divergence of the metadata language models, and we explore the use of this metadata in calculating user and item similarities. We perform our experiments on three data sets from two different domains: Delicious, CiteULike and BibSonomy.

Recommender Systems for Social Bookmarking

Ph.D. thesis
Toine Bogers
Ph.D. Thesis, Tilburg University, December 2009

Abstract

Recommender systems belong to a class of personalized information filtering technologies that aim to identify which items in a collection might be of interest to a particular user. Recommendations can be made using a variety of information sources related to both the user and the items: past user preferences, demographic information, item popularity, the metadata characteristics of the products, etc. Social bookmarking websites, with their emphasis on open collaborative information access, offer an ideal scenario for the application of recommender systems technology. They allow users to manage their favorite bookmarks online through a web interface and, in many cases, allow their users to tag the content they have added to the system with keywords. The underlying application then makes all information sharable among users. Examples of social bookmarking services include Delicious, CiteULike, and BibSonomy.

In my Ph.D. thesis I describe the work I have done on item recommendation for social bookmarking, i.e., recommending interesting bookmarks to users based on the content they bookmarked in the past. In my experiments I distinguish between two types of information sources. The first one is usage data contained in the folksonomy, which represents the past selections and transactions of all users, i.e., who added which items, and with what tags. The second information source is the metadata describing the bookmarks or articles on a social bookmarking website, such as title, description, authorship, tags, and temporal and publication-related metadata. I compare and combine the content-based aspect with the more common usage-based approaches. I evaluate my approaches on four data sets constructed from three different social bookmarking websites: BibSonomy, CiteULike, and Delicious. In addition, I investigate different combination methods for combining different algorithms and show which of those methods can successfully improve recommendation performance.

Finally, I consider two growing pains that accompany the maturation of social bookmarking websites: spam and duplicate content. I examine how widespread each of these problems is for social bookmarking and how to develop effective automatic methods for detecting such unwanted content. I also investigate the influence spam and duplicate content can have on item recommendation.

Design and Implementation of a University-wide Expert Search Engine (Abstract)

Workshop paper
Toine Bogers, Ruud Liebregts
In: DIR 2010: Proceedings of the 10th Dutch-Belgian Information Retrieval Workshop, pages 65-66, January 2010

Abstract

We present an account of designing, implementing, and evaluating a university-wide expert search engine. We performed system-based evaluation on multiple query sets to determine the optimal retrieval settings and performed extensive user-based evaluation with three different user groups: scientific researchers, students looking for a thesis supervisor or topic experts, and outside visitors of the website looking for experts. Our search engine significantly outperformed the old search system in terms of effectiveness, efficiency, and user satisfaction.

Contextual Factors for Finding Similar Experts

Journal article
Katja Hofmann, Krisztian Balog, Toine Bogers, Maarten de Rijke
Journal of the American Society for Information Science and Technology, vol. 61, no. 5, pages 994-1014, April 2010

Abstract

Expertise-seeking research studies how people search for expertise and choose whom to contact in the context of a specific task. An important outcome are models that identify factors that influence expert finding. Expertise retrieval addresses the same problem, expert finding, but from a system-centered perspective. The main focus has been on developing content-based algorithms similar to document search. These algorithms identify matching experts primarily on the basis of the textual content of documents with which experts are associated. Other factors, such as the ones identified by expertise-seeking models, are rarely taken into account. In this article, we extend content-based expert-finding approaches with contextual factors that have been found to influence human expert finding. We focus on a task of science communicators in a knowledge-intensive environment, the task of finding similar experts, given an example expert. Our approach combines expertise-seeking and retrieval research. First, we conduct a user study to identify contextual factors that may play a role in the studied task and environment. Then, we design expert retrieval models to capture these factors. We combine these with content-based retrieval models and evaluate them in a retrieval experiment. Our main finding is that while content-based features are the most important, human participants also take contextual factors into account, such as media experience and organizational structure. We develop two principled ways of modeling the identified factors and integrate them with content-based retrieval models. Our experiments show that models combining content-based and contextual factors can significantly outperform existing content-based models.

Assessors’ Search Result Satisfaction Associated with Relevance in a Scientific Domain

Conference paper
Peter Ingwersen, Toine Bogers, Birger Larsen, Marianne Lykke, Haakon Lund
In: IIiX 2010: Proceedings of the 3rd IIiX Symposium on Information Interaction in Context, pages 283-287, August 2010

Abstract

In this poster we investigate the influence between perceived ease of assessment of situational relevance by a four-point scale, perceived satisfaction with retrieval results and the actual relevance assessments made by test collection assessors based on their own genuine information tasks. Ease of assessment, satisfaction and number of relevant documents are cross tabulated with retrieval performance measured by Normalized Discounted Cumulated Gain. Results show that when assessors find small numbers of relevant documents they tend to regard the search results with dissatisfaction and, in addition, they obtain lower performance for all document types involved.

Physicists’ Information Tasks: Structure, Length and Retrieval Performance

Conference paper
Marianne Lykke, Peter Ingwersen, Toine Bogers, Birger Larsen, Haakon Lund
In: IIiX 2010: Proceedings of the 3rd IIiX Symposium on Information Interaction in Context, pages 347-352, August 2010

Abstract

In this poster, we describe central aspects of 65 natural information tasks from 23 senior researchers, PhDs, and experienced MSc students from three different university departments of physics. We analyze 1) the main purpose of the information task, 2) which and how many search facets were used to describe the tasks, 3) what semantic categories were used to express the search facets, and 4) retrieval performance. Results show variety in structure and length across task descriptions and task purposes. The results indicate effect of length and, in particular, of task purpose on retrieval performance of different document description levels that should be examined further.

On the Evaluation of Entity Profiles

Conference paper
Maarten de Rijke, Krisztian Balog, Toine Bogers, Antal van den Bosch
In: CLEF 2010: Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation, Lecture Notes in Computer Science, vol. 6360, pages 94-99, September 2010

Abstract

Entity profiling is the task of identifying and ranking descriptions of a given entity. The task may be viewed as one where the descriptions being sought are terms that need to be selected from a knowledge source (such as an ontology or thesaurus). In this case, entity profiling systems can be assessed by means of precision and recall values of the descriptive terms produced. However, recent evidence suggests that more sophisticated metrics are needed that go beyond mere lexical matching of system-produced descriptors against a ground truth, allowing for graded relevance and rewarding diversity in the list of descriptors returned. In this note, we motivate and propose such a metric.

DataTEL – Issues and Considerations regarding Sharable Data Sets for Recommender Systems in Technology Enhanced Learning

Journal article
Hendrik Drachsler, Toine Bogers, Riina Vuorikari, Katrien Verbert, Erik Duval, Nikos Manouselis, Guenther Beham, Stephanie Lindstaedt, Hermann Stern, Martin Friedrich, and Martin Wolpers
Procedia Computer Science, vol. 1, no. 2, pages 2849-2858, September 2010

Abstract

This paper raises the issue of missing data sets for recommender systems in Technology Enhanced Learning that can be used as benchmarks to compare different recommendation approaches. It discusses how suitable data sets could be created according to some initial suggestions, and investigates a number of steps that may be followed in order to develop reference data sets that will be adopted and reused within a scientific community. In addition, policies are discussed that are needed to enhance sharing of data sets by taking into account legal protection rights. Finally, an initial elaboration of a representation and exchange format for sharable TEL data sets is carried out. The paper concludes with future research needs.

Movie Recommendation using Random Walks over the Contextual Graph

Workshop paper
Toine Bogers
In: Proceedings of the 2nd RecSys Workshop on Context-Aware Recommender Systems, September 2010

Abstract

Recommender systems have become an essential tool in fighting information overload. However, the majority of recommendation algorithms focus only on using ratings information, while disregarding information about the context of the recommendation process. We present ContextWalk, a recommendation algorithm that makes it easy to include many different types of contextual information. It models the browsing process of a user on a movie database website by taking random walks over the contextual graph. We present our approach in this paper and highlight a number of future extensions with additional contextual information.

Does Degree of Work Task Completion Influence Retrieval Performance?

Conference paper
Peter Ingwersen, Marianne Lykke, Toine Bogers
In: ASIST 2010: Proceedings of the 73rd ASIS&T Annual Meeting, vol. 47, October 2010

Abstract

In this contribution we investigate the potential influence between assessors’ perceived completion of their work task at hand and their actual assessment of usefulness of the retrieved information. The results indicate that the number of useful documents found by assessors does not influence their perception of task completion. Also, with the exception of full text records and across all document types, both measured at rank 10, no statistically significant correlation is observed with respect to retrieval performance influenced by degrees of perceived work task completion or individual types of documents.

Fusing Recommendations for Social Bookmarking Websites

Journal article
Toine Bogers and Antal van den Bosch
International Journal of Electronic Commerce, vol. 15, no. 3, pages 33-75, April 2011

Abstract

Social bookmarking Web sites are rapidly growing in popularity. Recommender systems, a promising remedy to the information overload accompanying the explosive growth in content, are designed to identify which unseen content might be of interest to a particular user, based on his or her past preferences. Most previous work in recommendation for social bookmarking suffers from lack of comparisons between the different available approaches. In this paper we address this issue by comparing and evaluating eight recommendation approaches on four data sets from two domains. We find that approaches that use tag overlap and metadata provide better results for social bookmarking data sets than the transaction patterns that are used traditionally in recommender systems research. In addition, we investigate how to fuse different recommendation approaches to further improve recommendation accuracy. We find that fusing recommendations can indeed produce significant improvements in recommendation accuracy. We also find that it is often better to combine approaches that use different data representations, such as tags and metadata, than to combine approaches that only vary in the algorithms they use. The best results are obtained when both of these aspects of the recommendation task are varied in the fusion process. Our findings can be used to improve the quality of recommendations not only on social bookmarking Web sites but conceivably also on Web sites that offer annotated commercial content.
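To make the idea of fusing recommendation runs concrete, here is a minimal sketch of one standard score-based fusion technique: weighted CombSUM over min-max normalized scores. The abstract does not say which fusion methods were used, so the technique, the function names, and the example runs are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1] so that runs become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 1.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def weighted_combsum(
    runs: Sequence[Dict[str, float]], weights: Sequence[float]
) -> List[Tuple[str, float]]:
    """Sum the weighted, normalized scores of each item across all runs."""
    fused: Dict[str, float] = defaultdict(float)
    for run, weight in zip(runs, weights):
        for item, score in min_max(run).items():
            fused[item] += weight * score
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical runs from a tag-based and a metadata-based recommender.
tag_run = {"a": 0.9, "b": 0.5, "c": 0.1}
metadata_run = {"b": 10.0, "c": 8.0, "d": 2.0}
fused_ranking = weighted_combsum([tag_run, metadata_run], [0.5, 0.5])
```

In this toy example, item "b" rises to the top because both runs support it, which is the kind of effect that combining different data representations is meant to exploit.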

An Exploration of Retrieval-Enhancing Methods for Integrated Search in a Digital Library

Workshop paper
Diana Ransgaard Sørensen, Toine Bogers, Birger Larsen
In: Proceedings of the ECIR 2012 Workshop on Task-Based and Aggregated Search (TBAS2012), pages 4-8, April 2012

Abstract

Integrated search is defined as searching across different document types and representations simultaneously, with the goal of presenting the user with a single ranked result list containing the optimal mix of document types. In this paper, we compare various approaches to integrating three different types of documents (bibliographic records for articles and books as well as full-text articles) using the iSearch collection: combining all document types in a single index, weighting the different document types using priors, and using collection fusion techniques to merge the retrieval results on three separate indexes corresponding to each of the document types. We find that a properly optimized retrieval model on a single combined index containing all documents without any special treatment performs no worse than our weighting and fusion methods, suggesting that more work is needed on alternative approaches to integrated search.

RSLIS at INEX 2011: Social Book Search Track

Workshop paper
Toine Bogers, Kirstine Wilfred Christensen, Birger Larsen
In: INEX 2011: Proceedings of the 10th International Workshop of the Initiative for the Evaluation of XML Retrieval, Lecture Notes in Computer Science, vol. 7424, Springer Verlag, Berlin, Heidelberg, pages 45-56, December 2012

Abstract

In this paper, we describe our participation in the INEX 2011 Social Book Search track. We investigate the contribution of different types of document metadata, both social and controlled, and examine the effectiveness of re-ranking retrieval results using social features. We find that the best results are obtained using all available document fields and topic representations.

CHAOS: User-driven Development of a Metadata Scheme for Radio Broadcast Archives

Conference paper
Haakon Lund, Toine Bogers, Birger Larsen, and Marianne Lykke
In: Proceedings of iConference 2013, pages 990-994, February 2013

Abstract

CHAOS (Cultural Heritage Archive Open System) is a digital platform for Danish radio broadcasts. Radio broadcasts are an important and vibrant part of our cultural heritage, but providing efficient and effective access to such archives is challenging for lack of a solid digital infrastructure. The Danish LARM project aims to meet this challenge by making one million hours of radio programs available to humanities researchers through the digital platform CHAOS. CHAOS is being built in close cooperation with the researchers involved in LARM. In this paper, we present the user-driven development of the multi-tiered metadata scheme used in CHAOS.

Measuring Serendipity in the Lab: The Effects of Priming and Monitoring

Conference paper
Toine Bogers, Rune Rosenborg Rasmussen, Louis Sebastian Bo Jensen
In: Proceedings of iConference 2013, pages 703-706, February 2013

Abstract

While the phenomenon of serendipity has proven to be a popular research topic, the issue of how to measure it effectively is still relatively unexplored. We present an exploratory study that contributes to our understanding of this issue by examining the effect of (1) priming people about the concept of serendipity and (2) monitoring participants on how they experience serendipity when searching for information in a controlled environment. Our experiments indicate that it is best to keep such controlled experiments as natural as possible: priming participants about serendipity and monitoring them during their experiments seem to have a negative influence on experiencing serendipity, as they are more likely to induce participants to stay on task instead of exhibiting diverging information behavior.

Micro-serendipity: Meaningful Coincidences in Everyday Life Shared on Twitter

Conference paper
Toine Bogers, Lennart Björneborn
In: Proceedings of iConference 2013, pages 196-208, February 2013

Abstract

While the phenomenon of serendipity has proven to be a popular research topic, the issue of how to measure it effectively is still relatively unexplored. We present an exploratory study that contributes to our understanding of this issue by examining the effect of (1) priming people about the concept of serendipity and (2) monitoring participants on how they experience serendipity when searching for information in a controlled environment. Our experiments indicate that it is best to keep such controlled experiments as natural as possible: priming participants about serendipity and monitoring them during their experiments seem to have a negative influence on experiencing serendipity, as they are more likely to induce participants to stay on task instead of exhibiting diverging information behavior.

Memory-based Named Entity Recognition in Tweets

Workshop paper
Antal van den Bosch, Toine Bogers
In: MSM 2013: Proceedings of the 3rd WWW Workshop on Making Sense of Microposts, pages 40-43, May 2013

Abstract

We present a memory-based named entity recognition system that participated in the MSM-2013 Concept Extraction Challenge. The system expands the training set of annotated tweets with part-of-speech tags and seedlist information, and then generates a sequential memory-based tagger comprised of separate modules for known and unknown words. Two taggers are trained: one on the original capitalized data, and one on a lowercased version of the training data. The intersection of named entities in the predictions of the two taggers is kept as the final output.
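The final step described in this abstract, keeping only the entities that both taggers agree on, can be sketched as a set intersection. The span representation (start token, exclusive end token, label) and the function name are assumptions made for illustration; the actual system's data format is not described here.

```python
from typing import List, Set, Tuple

# Assumed representation: (start_token, end_token_exclusive, entity_label).
Span = Tuple[int, int, str]


def intersect_entities(cased: List[Span], lowercased: List[Span]) -> List[Span]:
    """Keep only entities predicted with identical boundaries and label by both taggers."""
    agreed: Set[Span] = set(cased) & set(lowercased)
    return sorted(agreed)


# Hypothetical predictions for one tweet from the two taggers.
cased_predictions = [(0, 2, "PER"), (5, 6, "LOC"), (8, 9, "ORG")]
lowercased_predictions = [(0, 2, "PER"), (5, 6, "ORG"), (8, 9, "ORG")]
final_entities = intersect_entities(cased_predictions, lowercased_predictions)
```

Requiring agreement on both boundaries and label favors precision over recall: the span at tokens 5-6 is dropped because the two taggers disagree on its type.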

On the Assessment of Expertise Profiles

Journal article
Richard Berendsen, Maarten de Rijke, Krisztian Balog, Toine Bogers, Antal van den Bosch
Journal of the American Society for Information Science, vol. 64, no. 10, pages 2024-2044, October 2013

Abstract

Expertise retrieval has received significant interest in the field of information retrieval. Expert finding has been the predominantly studied task, with somewhat less attention going to the complementary task of expert profiling: automatically identifying what topics a person knows about. In this work we describe a test collection for expert profiling, in which expert users have self-selected their own knowledge areas. Motivated by the sparseness of this self-selected set of knowledge areas, we report on an assessment experiment in which academic experts judge a profile that has been automatically generated by state-of-the-art expert profiling algorithms; optionally, experts are able to indicate a level of expertise for relevant areas. In addition, experts may give feedback comments on the quality of the system-generated knowledge areas; we report on a content analysis of these comments and gain new insights into what aspects of topical profiles matter to experts. Next, we provide an error analysis of the system-generated profiles, identifying factors that help explain why certain experts may be harder to profile than others. We also analyze the impact on evaluating expert profiling systems of using self-selected vs. judged system-generated knowledge areas as ground truth; they rank systems somewhat differently but detect about the same amount of pairwise significant differences, despite the fact that judged system-generated assessments are available for fewer experts than self-selected knowledge areas.

Benchmarking Domain-specific Expert Search Using Workshop Program Committees

Workshop paper
Georgeta Bordea, Toine Bogers, Paul Buitelaar
In: Proceedings of the 2013 CIKM Workshop on Computational Scientometrics: Theory & Applications, New York, NY, USA, pages 19-24, October 2013

Abstract

Traditionally, relevance assessments for expert search have been gathered through self-assessment or based on the opinions of co-workers. We introduce three benchmark datasets for expert search that use conference workshops for relevance assessment. Our data sets cover entire research domains as opposed to single institutions. In addition, they provide a larger number of topic-person associations and allow a more objective and fine-grained evaluation of expertise than existing data sets do. We present and discuss baseline results for a language modelling and a topic-centric approach to expert search. We find that the topic-centric approach achieves the best results on domain-specific datasets.

How ‘Social’ are Social News Sites? Exploring the Motivations for Using Reddit.com

Conference paper
Toine Bogers, Rasmus Nordenhoff Wernersen
In: Proceedings of iConference 2014, pages 329-344, March 2014

Abstract

Social news sites allow their users to submit and vote on online news stories, thereby bypassing the authority and power of traditional newspaper editors. In this paper we explore what motivates users of social news sites, such as Reddit, to participate in this collaborative editorial process. We present a tiered framework of motivational factors for participating on social news sites, based on a comprehensive literature review, drawn from fields like social media research, sociology, (social) psychology, and behavioral economics. We then validate this framework through a survey deployed on Reddit and use the results of this survey to focus the motivational framework for the social news domain. The recreational value of the information posted to Reddit, along with the powerful possibilities for customization, appears to be the most powerful incentive for using Reddit. Perhaps surprisingly, the social aspect of social news sites is not a motivating factor for the majority of Reddit users. Influencing the placement and reception of news stories in their niche communities of interest is what draws people to sites such as Reddit.

Overview of INEX 2014

Conference paper
Patrice Bellot, Toine Bogers, Shlomo Geva, Mark Hall, Hugo Huurdeman, Jaap Kamps, Gabriella Kazai, Marijn Koolen, Veronique Moriceau, Josiane Mothe, Michael Preminger, Eric SanJuan, Ralf Schenkel, Mette Skov, Xavier Tannier, David Walsh
In: CLEF 2014: Proceedings of the Conference on Multilingual and Multimodal Information Access Evaluation, Lecture Notes in Computer Science, vol. 8685, Springer Verlag, Berlin, Heidelberg, pages 212-228, September 2014

Abstract

INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2014 evaluation campaign, which consisted of three tracks: The Interactive Social Book Search Track investigated user information seeking behavior when interacting with various sources of information, for realistic task scenarios, and how the user interface impacts search and the search experience. The Social Book Search Track investigated the relative value of authoritative metadata and user-generated content for search and recommendation using a test collection with data from Amazon and LibraryThing, including user profiles and personal catalogues. The Tweet Contextualization Track investigated tweet contextualization, helping a user to understand a tweet by providing him with a short background summary generated from relevant Wikipedia passages aggregated into a coherent summary. INEX 2014 was an exciting year for INEX in which we for the third time ran our workshop as part of the CLEF labs. This paper gives an overview of all the INEX 2014 tracks, their aims and task, the built test-collections, the participants, and gives an initial analysis of the results.

Overview of the INEX 2014 Social Book Search Track

Workshop paper
Marijn Koolen, Toine Bogers, Jaap Kamps, Gabriella Kazai, Michael Preminger
In: Proceedings of the CLEF2014 Working Notes, CEUR Workshop Proceedings, vol. 1180, pages 462-479

Abstract

The goal of the INEX 2014 Social Book Search Track is to evaluate approaches for supporting users in searching collections of books based on book metadata and associated user-generated content. The track investigates the complex nature of relevance in book search and the role of traditional and user-generated book metadata in retrieval. We extended last year’s investigation into the nature of book suggestions from the LibraryThing forums and how they compare to book relevance judgements. Participants were encouraged to incorporate rich user profiles of both topic creators and other LibraryThing users to explore the relative value of recommendation and retrieval paradigms for book search. We found further support that such suggestions are a valuable alternative to traditional test collections that are based on top-k pooling and editorial relevance judgements.

Proceedings of the 1st Workshop on New Trends in Content-based Recommender Systems

Workshop proceedings
Toine Bogers, Marijn Koolen, Iván Cantador
co-located with ACM Conference on Recommender Systems (RecSys 2014), October 2014

Abstract

While content-based recommendation has been applied successfully in many different domains, it has not seen the same level of attention as collaborative filtering techniques have. In recent years, competitions like the Netflix Prize, CAMRA, and the Yahoo! Music KDD Cup 2011 have spurred on advances in collaborative filtering and how to utilize ratings and usage data. However, there are many domains where content and metadata play a key role, either in addition to or instead of ratings and implicit usage data. For some domains, such as movies, the relationship between content and usage data has seen thorough investigation already, but for many other domains, such as books, news, scientific articles, and Web pages, we do not know if and how these data sources should be combined to provide the best recommendation performance.

The CBRecSys 2014 workshop aims to address this by providing a dedicated venue for papers dedicated to all aspects of content-based recommendation. We issued a Call for Papers asking for submissions of novel research papers (both long and short) addressing recommendation in domains where textual content is abundant (e.g., books, news, scientific articles, jobs, educational resources, Web pages, etc.) as well as dedicated comparisons of content-based techniques with collaborative filtering in different domains. Other relevant topics included opinion mining for text/book recommendation, semantic recommendation, content-based recommendation to alleviate cold-start problems, as well as serendipity, diversity and cross-domain recommendation.

Workshop on New Trends in Content-based Recommender Systems (CBRecSys 2014)

Conference paper
Toine Bogers, Marijn Koolen, Iván Cantador
In: RecSys ‘14: Proceedings of the 2014 ACM Conference on Recommender Systems, pages 379-380, October 2014

Abstract

While content-based recommendation has been applied successfully in many different domains, it has not seen the same level of attention as collaborative filtering techniques have. However, there are many recommendation domains and applications where content and metadata play a key role, either in addition to or instead of ratings and implicit usage data. For some domains, such as movies, the relationship between content and usage data has seen thorough investigation already, but for many other domains, such as books, news, scientific articles, and Web pages, we still do not know if and how these data sources should be combined to provide the best recommendation performance. The CBRecSys 2014 workshop aims to address this by providing a dedicated venue for papers dedicated to all aspects of content-based recommendation.

Tagging vs. Controlled Vocabulary: Which is More Helpful for Book Search?

Conference paper
Toine Bogers and Vivien Petras
In: Proceedings of iConference 2015, March 2015

Abstract

The popularity of social tagging has sparked a great deal of debate on whether tags could replace or improve upon professional metadata as descriptors of books and other information objects. In this paper we present a large-scale empirical comparison of the contributions of individual information elements like core bibliographic data, controlled vocabulary terms, reviews, and tags to retrieval performance. Our comparison is done using a test collection of over 2 million book records with information elements from Amazon, the British Library, the Library of Congress, and LibraryThing. We find that tags and controlled vocabulary terms do not actually outperform each other consistently, but seem to provide complementary contributions: some information needs are best addressed using controlled vocabulary terms whereas others are best addressed using tags.

Searching for Movies: An Exploratory Analysis of Movie-related Information Needs

Conference paper
Toine Bogers
In: Proceedings of iConference 2015, March 2015

Abstract

Despite a surge in popularity of work on casual leisure search, some leisure domains are still relatively underrepresented. Movies are a good example of such a domain, which is peculiar given the popularity of movie-centered websites and discovery services such as IMDB, RottenTomatoes, and Netflix. In this paper, we present an exploratory analysis of IMDB movie discussion threads that contain requests for movies to watch. Through emergent coding we produce a taxonomy of relevance aspects for movie search and selection. Our analysis shows that topical aspects, such as content, metadata, and known-item search, are important for movie selection practices. Other requests focus more on recommendation and feature many subjective relevance aspects, such as the tone of a movie or its intended audience. This suggests efficient access to movies is likely to require different information access paradigms to satisfy all the movie-related information needs expressed in the threads.

Looking for Books in Social Media: An Analysis of Complex Search Requests

Conference paper
Marijn Koolen, Toine Bogers, Jaap Kamps, and Antal van den Bosch
In: ECIR 2015: Proceedings of the 37th European Conference on Information Retrieval, Lecture Notes in Computer Science, vol. 9022, Springer Verlag, Berlin, pages 184-196, March 2015

Abstract

Real-world information needs are generally complex, yet almost all research focuses instead on either relatively simple search based on queries or recommendation based on profiles. It is difficult to gain insight into complex information needs from observational studies with existing systems; potentially complex needs are obscured by the systems’ limitations. In this paper we study explicit information requests in social media, focusing on the rich area of social book search. We analyse a large set of annotated book requests from the LibraryThing discussion forums. We investigate 1) the comprehensiveness of book requests on the forums, 2) what relevance aspects are expressed in real-world book search requests, and 3) how different types of search topics are related to types of users, human recommendations, and results returned by retrieval and recommender systems. We find that book search requests combine search and recommendation aspects in intricate ways that require more than only traditional search or (hybrid) recommendation approaches.

A Longitudinal Analysis of Search Engine Index Size

Accepted Conference paper
Antal van den Bosch, Toine Bogers, and Maurice de Kunder
In: ISSI 2015: Proceedings of the 15th International Conference on Scientometrics and Informetrics

Abstract

One of the determining factors of the quality of Web search engines is the size of their index. In addition to its influence on search result quality, the size of the indexed Web can also tell us something about which parts of the WWW are directly accessible to the everyday user. We propose a novel method of estimating the size of a Web search engine’s index by extrapolating from document frequencies of words observed in a large static corpus of Web pages. In addition, we provide a unique longitudinal perspective on the size of Google and Bing’s indexes over a nine-year period, from March 2006 until January 2015. We find that index size estimates of these two search engines tend to vary dramatically over time, with Google generally possessing a larger index than Bing. This result raises doubts about the reliability of previous one-off estimates of the size of the indexed Web. We find that much if not all of this variability can be explained by changes in the indexing and ranking infrastructure of Google and Bing. This casts further doubt on whether Web search engines can be used reliably for cross-sectional webometric studies.
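The extrapolation idea in this abstract can be sketched as follows: if a word occurs in a known fraction of documents in a static corpus, then a search engine's reported hit count for that word, divided by that fraction, gives one estimate of the index size. Averaging over several words is an assumption of this sketch, as are all names and numbers; the paper's actual word selection and aggregation procedure are not described here.

```python
from statistics import mean
from typing import Dict


def estimate_index_size(
    corpus_size: int,
    corpus_df: Dict[str, int],
    engine_hits: Dict[str, int],
) -> float:
    """Extrapolate index size from per-word hit counts and corpus document frequencies."""
    estimates = []
    for word, hits in engine_hits.items():
        df = corpus_df.get(word, 0)
        if df == 0:
            continue  # word unseen in the corpus: no basis for extrapolation
        # hits / (df / corpus_size): scale the hit count by the word's corpus share.
        estimates.append(hits * corpus_size / df)
    if not estimates:
        raise ValueError("no word with a non-zero corpus document frequency")
    return mean(estimates)


# Hypothetical numbers: a one-million-page corpus and reported hit counts for two words.
size = estimate_index_size(
    corpus_size=1_000_000,
    corpus_df={"the": 600_000, "house": 50_000},
    engine_hits={"the": 12_000_000_000, "house": 1_100_000_000},
)
```

Here the two words yield estimates of 20 and 22 billion pages, which average to 21 billion. Spread between per-word estimates is one visible source of the variability that the abstract discusses.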

The iSchool Community: A Case Study of iConference Reviews

Accepted Conference paper
Toine Bogers and Elke Greifeneder
Accepted for: Proceedings of iConference 2016, March 2015

Abstract

A fair review process is essential to the success of any scientific conference. In this paper we present an analysis of the reviewing process of the 2014-2015 iConferences as well as a demographic analysis of the iConference community as a whole. The results show a clear need for making the reviewer pool more representative of the iSchool community as a whole by including more women and more researchers from Asian institutions. Other recommendations are to improve the continuity of the reviewer pool and to provide clearer instructions to reviewers to ensure that written reviews explicitly cover all the aspects represented by the review scores. The results of our study provide the iSchool community with a descriptive analysis of its community and a better understanding of its review process.

Analyzing the influence of Language Proficiency on Interactive Book Search Behavior

Accepted Conference paper
Toine Bogers, Maria Gäde, Mark Hall, and Mette Skov
Accepted for: Proceedings of iConference 2016, March 2015

Abstract

English content still dominates in many online domains and information systems, despite native English speakers being a minority of their users. However, we know little about how language proficiency influences search behavior in these systems. In this paper, we describe preliminary results from an interactive IR experiment with book search behavior and examine how language skills affect this behavior. A total of 97 users from 21 different countries participated in this experiment, resulting in a rich data set including usage data as well as questionnaire feedback. Although participants reported feeling language constraints, a preliminary analysis of native and non-native English speakers indicates little to no meaningful difference in their search behavior.