This page contains our annotations for two of the data sets described in our paper containing tweets tagged with #serendipity:

Both data sets are in comma-separated-value format (which is easily imported into many applications, including Excel) and contain four columns:

  • Our internally assigned IDs
  • Twitter tweet ID
  • Twitter user name
  • Our annotations
    • Five columns for Topsy-150 (COMM/LINK/NAME/REFL/PERS)
    • Two columns for Topsy-Winter (PERS/MISC)

For copyright reasons, we are not allowed to share the full text of the tweets (or other information). However, the information in these data sets is enough to locate and download all the tweets yourself. Tweets can be accessed via the URL[username]/status/[tweet-id], where username and tweet-id are provided by our data sets. For example, internal ID id-J395 corresponds to tweet ID 163676779159093248 and username ajslavin, which can then be accessed at
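
The lookup described above can be sketched in Python. This is only a minimal illustration under several assumptions: the first three columns are taken to be our internal ID, the tweet ID and the user name, in that order; the presence of a header row is left as a switch; and the base address of the URL is passed in by the caller (https://example.org below is a placeholder, not the real address). The function name tweet_urls and the annotation values in the sample row are made up for the example.

```python
import csv
import io


def tweet_urls(csv_text, base_url, has_header=False):
    """Map each internal ID to base_url + /[username]/status/[tweet-id].

    Assumes (not confirmed by the data sets) that the first three columns
    are: internal ID, tweet ID, user name. Any further columns are ignored.
    """
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    urls = {}
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        # Keep the tweet ID as a string: 18-digit IDs lose precision
        # if they are converted to floating-point numbers.
        internal_id, tweet_id, username = (field.strip() for field in row[:3])
        urls[internal_id] = f"{base_url.rstrip('/')}/{username}/status/{tweet_id}"
    return urls


# One sample row; the trailing annotation values are placeholders.
sample = "id-J395,163676779159093248,ajslavin,1,0\n"
urls = tweet_urls(sample, "https://example.org")
print(urls["id-J395"])
```

To process a downloaded file, read its contents into a string first and pass that string as csv_text, together with the actual base address.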

If you want to use these data sets, please cite the following paper, for which they were constructed: