On October 1st, Texifter staff will begin testing the GNIP Firehose for Twitter, which delivers 100% of the Tweets that match the criteria users provide. This is a remarkable tool, and it will greatly contribute to the evolution of DiscoverText as a major social media text analytics toolkit.
Currently, Twitter restricts its public API to 150 unauthenticated calls per hour, per IP address. Going over this limit results in the user being presented with “Error 420”, which simply means that the user is being rate limited; the feeds will resume harvesting after a break. This most recently hampered DiscoverText users in August, when users began seeing near-constant rate limiting.
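To make the arithmetic behind that limit concrete, here is a minimal sketch of how a client could pace its own calls to stay under 150 per hour. It is not how DiscoverText schedules its feeds; the `Pacer` class and its parameters are hypothetical names used only for illustration, and even spacing is just one simple way to respect an hourly budget.

```python
import time

CALLS_PER_HOUR = 150  # the limit quoted in the text above


def min_interval_seconds(calls_per_hour: int) -> float:
    """Smallest even spacing between calls that stays within the hourly limit."""
    return 3600.0 / calls_per_hour


class Pacer:
    """Hypothetical helper: spaces calls evenly so the hourly budget is never exceeded."""

    def __init__(self, calls_per_hour: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(calls_per_hour)
        self._clock = clock          # injectable so the pacing can be tested
        self._sleep = sleep
        self._next_allowed = None    # no call has been made yet

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay
```

At 150 calls per hour the spacing works out to one call every 24 seconds. Because the limit applies per IP address, every process sharing that address would have to share a single budget.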
While Twitter has always included these restrictions in its public API rules, it may have become more cognizant of those harvesting large amounts of data (not just us) and, as a result, is cracking down on heavy users.
The GNIP “Power Track” Firehose will eliminate all the rate limits of the Twitter public API and allow DiscoverText users to harvest unthrottled data with extremely robust metadata. The Firehose allows users to run a search for which GNIP guarantees that ALL matching Tweets will be harvested. This is an improvement upon the current 1,500 Tweet limit imposed by Twitter. In addition, searches can use certain operators to restrict the harvest to specific kinds of results.
For example, Twitter searches on DiscoverText will allow users to specify a Klout score (or range of scores) as a filter. Only the Tweets with that score, or falling within that range, will be archived. Texifter staff have also been working to ensure that bandwidth and processing constraints do not hinder searches, so that data streams unimpeded into DiscoverText databases.
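The Klout filter described above boils down to an inclusive range check. The sketch below shows that logic on plain Python dictionaries. The field name `klout_score` and the function `filter_by_klout` are assumptions made for illustration; they are not the GNIP payload format or a DiscoverText API, and in practice the filtering would happen in the search itself rather than after the fact.

```python
from typing import Iterable, List, Optional


def filter_by_klout(tweets: Iterable[dict], low: float,
                    high: Optional[float] = None) -> List[dict]:
    """Keep tweets whose (hypothetical) 'klout_score' lies in [low, high].

    Passing only `low` selects a single exact score.
    Tweets with no score are dropped rather than guessed at.
    """
    if high is None:
        high = low
    kept = []
    for tweet in tweets:
        score = tweet.get("klout_score")
        if score is not None and low <= score <= high:
            kept.append(tweet)
    return kept


# Made-up sample records, for illustration only.
sample = [
    {"user": "alice", "klout_score": 72},
    {"user": "bob", "klout_score": 35},
    {"user": "carol"},  # no score available
]
influential = filter_by_klout(sample, 50, 100)
```

With the sample data, only the record for `alice` falls inside the 50 to 100 range.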
The metadata harvested via GNIP is extremely robust, and it will allow users to make deeper, more insightful inferences about their data. Currently, the public API only provides basic information, such as user, time, and date, along with the Tweets. The GNIP-enabled Full Firehose opens the door to more metadata, which will allow for more complete and insightful analysis. Here are some of the new metadata features:
- Filter on Klout scores. Data can be analyzed according to a person’s internet presence, or influence, as determined by their Klout score.
- All #Hashtags will be harvested, allowing users to find associations between hashtags, specific users, text, and re-tweet patterns.
- Find the actual number of re-tweets in a collection.
- Specify the language of posts. Want only English posts? Use the operator “en”.
- Leverage location data, which may include the country, city, and coordinates. Soon, using the Google Maps API, DiscoverText will have the ability to map Tweets, allowing users to find hotspots.
- Determine the number of tweets per user.
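As a rough illustration of the kind of analysis this metadata supports, the sketch below counts Tweets per user, counts re-tweets, and tallies which hashtags appear together. The record fields (`user`, `hashtags`, `is_retweet`) and the function names are hypothetical; they stand in for whatever fields an actual harvest provides.

```python
from collections import Counter
from itertools import combinations


def tweets_per_user(tweets):
    """Count how many tweets each user contributed to the collection."""
    return Counter(t["user"] for t in tweets)


def retweet_count(tweets):
    """Count the tweets flagged as re-tweets."""
    return sum(1 for t in tweets if t.get("is_retweet"))


def hashtag_pairs(tweets):
    """Count how often two hashtags appear in the same tweet (case-insensitive)."""
    pairs = Counter()
    for t in tweets:
        # Lowercase and de-duplicate, then sort so each pair has one canonical order.
        tags = sorted({h.lower() for h in t.get("hashtags", [])})
        pairs.update(combinations(tags, 2))
    return pairs


# Made-up sample records, for illustration only.
collection = [
    {"user": "alice", "hashtags": ["Data", "text"], "is_retweet": False},
    {"user": "bob", "hashtags": ["data", "Text", "nlp"], "is_retweet": True},
    {"user": "alice", "hashtags": [], "is_retweet": True},
]
per_user = tweets_per_user(collection)
pairs = hashtag_pairs(collection)
```

Sorting the lowercased tags before pairing means `#Data` with `#text` and `#data` with `#Text` are counted as the same association.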
These are just a few of the powerful options. The advanced metadata filters within DiscoverText will be modified so that users can search on all of these within the Advanced Search. All these features will be available on DiscoverText in private beta beginning October 8, 2011. Starting October 1, Texifter personnel will be internally testing the Firehose. On the 8th, a small number of users will be granted first access to the Firehose through DiscoverText. Depending on how smoothly the beta proceeds, additional users will be granted access each day. We are currently holding a sign-up for the beta, and we are still taking applications.