The First Draft of ‘Twistory’ Revisited – An Update on (Not) Sharing Twitter Collections

Researchers like new datasets. Many of us build tools and techniques that work nicely with existing data but may perform poorly with “out-of-sample” datasets. The ability to generate new and interesting big datasets, especially ones that draw a crowd of researchers and are of public interest, grew directly out of 10 years of eRulemaking research.

Then along come Twitter and Facebook. People tell me, “Hey Stu, there is useful information mixed in with very large quantities of not-so-useful information.” Where have I heard this? Right, this reminds me of the mass email campaigns that may do more harm than good when they drown out the legitimate voice of the informed citizen.

Twitter itself doesn’t provide a search engine capable of generating accurate, complete, targeted tweet datasets for research or analysis. So we wired up DiscoverText for harvesting this data. People asked for it, so we built it. Now Twitter tells us not to share large collections. In their view, this is proprietary data. In my view, this is prime historical data in the public domain (Twitter-History, a.k.a. ‘Twistory’) that yearns to be free.

About Stuart Shulman

Stuart Shulman is a political science professor, software inventor, entrepreneur, and garlic growing enthusiast who coaches U13 boys club soccer and in the Olympic Development Program with a national D-license. He is Founder & CEO of Texifter, LLC, Director of QDAP-UMass, and Editor Emeritus of the Journal of Information Technology & Politics. Stu is the proud owner of a Bernese/Shepherd named "Colbert" who is much better known as 'Bert. You can follow his exploits @stuartwshulman.