DiscoverText Introduces Tools for Random Sampling

DiscoverText is rolling out an addition to its analytical toolkit: random sampling. The Web service already offers an array of tools for text analytics and rigorous, team-based qualitative data analysis. These functions include the ability to code and annotate text, measure inter-rater reliability, adjudicate coder validity, attach memos to text, cluster duplicate and near-duplicate documents, share documents, and classify text using an active-learning Naive Bayesian classifier. While still in beta, random sampling is a key new addition.

Once DiscoverText users have amassed large volumes of social media data (for example via the Public Twitter API, the GNIP Powertrack, or the Facebook Social Graph), they can now more easily extract a random sample for analysis. The user chooses the sample size, which leaves room for iteration, experimentation, and other scientific methods. The option is built into the dataset creation process: the new dataset creation page shows a sample size prompt.
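For readers who want a feel for what a user-sized random sample means in practice, here is a minimal Python sketch of simple random sampling without replacement. It is an illustration only and not DiscoverText's implementation: the function name `draw_random_sample`, the `seed` parameter, and the toy archive of strings are all assumptions made for this example.

```python
import random


def draw_random_sample(documents, sample_size, seed=None):
    """Return a simple random sample (without replacement) of `documents`.

    Hypothetical helper for illustration; not part of DiscoverText.
    If `sample_size` exceeds the archive size, the whole archive is
    returned in random order.
    """
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    k = min(sample_size, len(documents))
    return rng.sample(documents, k)


# Toy archive standing in for a large collection of harvested items.
archive = [f"document {i}" for i in range(10_000)]

# The user picks the sample size, here 500 items.
sample = draw_random_sample(archive, 500, seed=42)
print(len(sample))
```

Fixing the seed is one way to make a sample reproducible, so that different coders or different analytic methods can be compared against the same subset.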

This additional method for data prep and analysis augments current information retrieval techniques, such as search with advanced filtering. It also extends our framework for expanding the available NLP methods: from straightforward Bayesian classification, which aims to analyze substantial quantities of data in their original bulk form, to a menu of computationally intensive methods that can iterate more quickly and effectively against random data samples. For example, the LDA topic model tool we are releasing will be faster and more effective against smaller random samples.

This new feature accommodates both an additional analytical approach and the opportunity to easily compare results between competing (or complementary) analytic methods. We look forward to experimenting with this new tool and hearing about how random sampling will enhance the research of our current and future users.

Special Note to DT Users: We need to turn this feature on one account at a time while we are testing it. Drop us a line if you want to try the tool.

We’ll keep you posted on the launch as more dataset modifications are pushed live. As always, if you have any questions, feel free to email us anytime. Your feedback is crucial. Sign up and try it out for yourself.

About Josh Sowalsky

Josh Sowalsky is the Director of User Support at Texifter, where he has worked since September 2010. He holds two degrees in Political Science and Middle Eastern Studies from UMASS Amherst, where he minored in History, Arabic, and International Relations. While at UMASS Josh designed and taught an advanced course that examined the intersection of technological development and national identity formation. Serving also as a research assistant in the UMASS Political Science department, he researched and published articles on electoral politics and political dissent in Jordan. Josh has conducted and presented multilingual field research on civil society development, democratization, and national identity formation throughout the Middle East - namely in Israel, Lebanon, and Syria. His honors thesis was entitled, "The Role of Women's Rights NGOs in Syrian Democratization." When not managing projects in QDAP or harvesting Arabic protest tweets in DiscoverText, Josh can be found strumming a ukulele, exploring Netflix, or swinging aimlessly at tennis balls.