On the evening of May 1, 2011, when it was announced that Osama bin Laden had been killed, we started running repeated fetches against the Twitter API for the terms “osama” and “bin laden”. On May 3, we posted more than 1.2 million tweets in XML format. Since then, the live feed collection on DiscoverText has kept rolling along.
The Twitter API serves a maximum of 1500 items per fetch. The DiscoverText live feed scheduler can fetch as often as every five minutes. During the peak of the Tweet storm, a single repeated fetch could not capture 100% of the Tweets. The workaround that produced these two large collections was to set up several repeating fetches in DiscoverText that all fed the same archive. The result is frankly more Tweets than anyone might ever need to understand this slice of the micro-blogging public sphere during a critical juncture in world history.
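To put rough numbers on that limit, here is a minimal Python sketch of the arithmetic, along with one way several fetches could feed a single archive. The two constants come from the figures above. Everything else is an assumption made for illustration: the function names are hypothetical, tweets are modeled as dictionaries with an `id` key, and dropping duplicates by ID is our guess at a sensible merge step, not a description of how DiscoverText works internally.

```python
# Limits stated in the post.
MAX_ITEMS_PER_FETCH = 1500   # most items the API returns for one fetch
MIN_INTERVAL_MINUTES = 5     # shortest interval the scheduler allows


def max_tweets_per_hour(num_fetches: int = 1) -> int:
    """Upper bound on tweets collected per hour by `num_fetches` repeating fetches.

    This is a ceiling only: it assumes every fetch comes back full and that
    no two fetches return the same tweet.
    """
    fetches_per_hour = 60 // MIN_INTERVAL_MINUTES
    return num_fetches * fetches_per_hour * MAX_ITEMS_PER_FETCH


def merge_into_archive(archive: dict, batch: list) -> int:
    """Add a batch of tweets to a shared archive, skipping IDs already stored.

    `archive` maps tweet ID to tweet (hypothetical structure).
    Returns the number of tweets that were new.
    """
    added = 0
    for tweet in batch:
        if tweet["id"] not in archive:
            archive[tweet["id"]] = tweet
            added += 1
    return added


if __name__ == "__main__":
    print(max_tweets_per_hour(1))  # 18000

    archive = {}
    merge_into_archive(archive, [{"id": 1}, {"id": 2}])
    merge_into_archive(archive, [{"id": 2}, {"id": 3}])  # ID 2 is skipped
    print(len(archive))  # 3
```

Under these assumptions a single fetch tops out at 18,000 tweets per hour, so any hour busier than that loses tweets unless more fetches run in parallel. Overlapping fetches will return some of the same tweets, which is why the sketch merges by ID.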
1:46 PM UPDATE: Approaching 1 million tweets per archive