The Latest Texifter Video Tutorials

To quickly learn the latest about gathering tweets for research, importing SurveyMonkey data, or using state-of-the-art text analytics tools, check out our most recent Texifter tutorial videos. While they have a bit of a “home brew” flavor, we’ve been told they do help jump-start the process of learning about exciting new tools for text.

Posted in DiscoverText, product, Social Media

Texifter Social Data & Tools June Prize Winners

As part of getting new users to test our Sifter beta, every month this summer we are awarding 12 #datagrants to academics. All you need to do to be included in the July drawing is submit a valid historical Twitter estimate request using Sifter and then send us your CV. These prizes can shave thousands of dollars off your research costs. The June social data and tools prize winners were:

Kelly Fincham
The Department of Journalism, Media Studies, and Public Relations at Howard University

“I will use the data and software prize to further my research and analysis of journalism practice on Twitter. My research agenda explores journalists’ evolving norms and practices on social media, specifically Twitter, in the U.S. and Ireland. This grant will help me to research and analyze this subject area in more depth.” @kellyfincham

Martina Wengenmeir
PhD candidate in the Media and Communication Department at the University of Canterbury, Christchurch, New Zealand

“I am hoping to use the data and software prize for my PhD research on the recovery and rebuild after the Christchurch earthquake of 2011. I am particularly interested in framing and sentiment of tweets and am hoping to compare a historical data set during disaster response and recovery to the conversation about the rebuild of the city which is still ongoing today. I am hoping to study the differences and similarities of conversations on Twitter now and then.” @tinserella

Carmina Godoy
Postgraduate student at the Universidad Complutense de Madrid

“I would like to integrate the collected data (tweets) in my final essay in order to get my Master’s degree. The subject of my essay is: racism online.” @CarminaGodoy

Warren Allen
Assistant Professor at the iSchool at Florida State University

“This award will be used to collect and analyze select data from the early group stages of the 2014 World Cup. Social media – including but not limited to Twitter – are increasingly integrated into traditional (TV, radio, print) media campaigns. At the 2014 World Cup, the hashtags #becausefootball and #becausefutbol were promoted throughout the televising of the games. Exploratory thematic analysis of these Tweets – enabled by Sifter and DiscoverText – will describe how these commercially oriented hashtags are used in comparison to what we know about live event Twitter usage in the current body of research.” @warrensallen

Bryce Newell
PhD Student in the Information School at the University of Washington

“I plan to use the prize to capture and analyze online discussion and commentary about police use of automated license plate recognition (ALPR) systems and wearable cameras.  In particular, I hope to examine discussions related to the public disclosure of data generated by these systems under freedom of information laws.” @newmedialaw

Jae Eun Chung
Assistant Professor in the School of Communications at Howard University 

“This project will survey the current use of online social media by health organizations for health campaigns and analyze the reach and diffusion of campaign messages. Despite the ever-growing number of online social media-based health campaigns, little work has been done to understand how the interactive nature of online social media is used for public health promotion. For this project, Twitter data will be analyzed to enhance our understanding of how health organizations use social media for public health promotion and how such uses of online media platforms are received by the public.”

Abhay Gupta
Lecturer at Fairleigh Dickinson University

“I plan to use it to understand the dynamics of public opinion. In particular, I want to test various hypotheses on how major events (e.g. election wins, market crash, sports results) impact the sentiment and whether pre-event opinion analysis has any predictive power in explaining actual outcomes.”

Victor Barger
Assistant Professor of Marketing at the University of Wisconsin-Whitewater

“I am looking forward to using the Texifter data and software to investigate how consumers and brands communicate on social media. In particular, I’m interested in how language use affects consumer behavior in online contexts. Given the extent to which consumers have and are continuing to adopt social media, this research should have important implications for marketing practitioners.” @vabarger

Jamie Baxter
Assistant Professor in the Schulich School of Law at Dalhousie University

“I am studying the influence of social movements on changes in the law — specifically land law. I hope to use the prize to access Twitter data that can tell me about the relationships between movement actors, how they form their interests, and how these change over time.” @jrgbaxter

Stephen Jeffares
Professor in the School of Government and Society at the University of Birmingham

“I will use the software and data to continue my study of the lifecycle of policy initiatives. I used DiscoverText in my latest book Interpreting Hashtag Politics (Palgrave Macmillan, 2014). Historic Twitter data reveals the first mention of policies that enjoy several months of widespread attention before disappearing without trace. To understand why and how this occurs, I will continue to use DiscoverText to de-duplicate the data and develop thematic code sets with a team of research assistants.” @SRJeffares

Cristian Vaccari
Lecturer in Politics at Royal Holloway, University of London

“I am planning on using the data and software to analyze how politically motivated users of social media engage with mediated political events, such as televised leader debates and high-profile interviews, to better understand the interplay between television and social media in the flow of political messages.” @25lettori

Bill D. Herman

Remember: All you need to do to be included in the July drawing is submit a valid historical Twitter estimate request using Sifter and then send us your CV.

Posted in Disqus, research, Social Media, Texifter, Tumblr, Twitter, WordPress

Texifter is Plugged In to Gnip

We could not be happier to announce that Texifter, a developer of advanced text data analytics software, is partnering with Gnip, the world’s largest provider of social data. Our Plugged In to Gnip partnership certifies Texifter as an industry leader committed to building innovative analytics solutions on top of reliable, sustainable, and complete social data. In joining Gnip’s partner program, Texifter joins the list of leading analytics providers like Microsoft, Salesforce, and Adobe.

“The Plugged In program was created to really highlight the companies that are doing the most innovative things in social data,” according to Chris Moody, CEO of Gnip, “and Texifter is a great example of that.”

Texifter’s DiscoverText platform provides advanced data analytics solutions for social researchers in public and private institutions. Combining powerful tools with accessible interfaces, DiscoverText provides “five pillars of text analytics” – search, filtering, clustering, human-coding, and machine-learning.

By partnering with Gnip, Texifter has access to historical Twitter data. Texifter recently launched “Sifter”, a tool to help users estimate Twitter volume associated with historical searches. The Sifter product gives users a free estimate of Twitter volume over a specific date range using advanced Gnip PowerTrack filtering. Customers who license historical Twitter data from Gnip can then access it for text analytics via a 30-day trial of DiscoverText.

“Texifter welcomes this opportunity to work even more closely with a company that we have admired and worked with for years,” said Stu Shulman, CEO of Texifter. “Gnip is an exceptionally reliable provider of social data products and services. Texifter customers will continue to see more benefit as we work with Gnip to deliver high quality products and services.”

Posted in DiscoverText, GNIP, Social Media, Texifter

Texifter Announces Strategic Partnership with SurveyMonkey

Texifter Announces Strategic Partnership with SurveyMonkey to Improve Survey Data Analytics

Combining the power and versatility of Texifter’s DiscoverText analytics with the reach of the world’s largest survey website.

AMHERST, MA, May 27, 2014 – Texifter, a developer of social data and text analysis tools, today announced a new strategic partnership with SurveyMonkey, the world’s largest survey website, to provide advanced text analytics capabilities to SurveyMonkey users through its cloud-based platform, DiscoverText.

SurveyMonkey is known for intuitive interfaces and communications features that allow researchers to collect millions of survey responses every day. When surveys produce very large numbers of responses to open-ended questions, it can be a challenge to analyze all of the verbatim data. This is especially true for those relying on spreadsheet software as their primary text analytics tool.

DiscoverText provides an accessible “point and click” solution for these and other analytic challenges. Starting today, all DiscoverText users will be able to log in to SurveyMonkey to easily import existing survey data. Researchers can use a 30-day free trial to apply the full range of DiscoverText’s powerful software tools to both the open-ended answers and the structured survey metadata. Texifter’s “five pillars of text analytics” approach combines search, filtering, clustering, human-coding, and machine-learning.

Once registered on DiscoverText, newcomers have access to a wide spectrum of online data feeds. Facebook, Tumblr, YouTube, WordPress, Disqus, and Twitter data can be gathered, managed, and analyzed in DiscoverText alongside SurveyMonkey responses, email, and other forms of text data.

“The Texifter team is excited to be introducing SurveyMonkey users to the powerful and flexible text analytics tools in DiscoverText,” said founder and CEO Stuart Shulman. “We are confident that once people try out features like clustering and custom machine-learning, they’ll begin to see new possibilities for generating insights from bigger and more diverse collections of unstructured free text.”

This strategic partnership signals the latest phase in the evolution of DiscoverText. Originally built for federal agencies sorting large-scale public comment collections, the four-year-old collaborative research platform now serves a wide variety of public and private sector clients, as well as the academic research community.

Texifter is a spin-out company based on information science research by Dr. Stuart W. Shulman, who directed the development of numerous human language tools for reviewing large numbers of public comments.

Texifter Contact
Stuart Shulman

Posted in DiscoverText, product

School Bullying Research Using DiscoverText

Our Vanderbilt University team uses DiscoverText (DT) to support qualitative text analysis of 8,531 high school students’ responses about their in-school experiences of bullying. DiscoverText has offered us powerful ways to perform key steps throughout our coding process. Fundamentally, DT supports parsing our large data set into archives, buckets, and datasets. Thus, we are able to focus on key portions of our large data set to hone our initial hierarchical coding structure while retaining the ability to return to an untouched dataset for final coding. We use the diverse annotation tools in DiscoverText to mark singular problematic items for discussion at meetings. Our team was able to develop a complex coding structure with 58 codes (at one point we had 128), and begin coding in a month and a half. Undoubtedly, DiscoverText’s robust organizational and annotation tools, within an easy-to-use user interface, supported expediency.

Following the development of our coding structure, we employed DiscoverText’s analytic tools to better understand and improve our team’s inter-coder reliability. DT’s real-time coding analytics support decision-making in meetings. Through the use of these tools, we raised our coding reliability from a Kappa value of 0.2 to a Kappa value of 0.82 after five training rounds. Given that four coders are using 58 hierarchical codes to code over 8,000 free-response items, the numbers represent a phenomenal increase in reliability. Presently, we are halfway through coding the 8,531 items using overlapping coding patterns to ensure reliability. Our team members share their experiences below:
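For readers new to the Kappa statistic mentioned above: it measures how much coders agree beyond what chance alone would produce. The post does not say which variant DiscoverText reports, and with four coders a multi-rater form may well be in use. The sketch below is only the simplest two-coder case (Cohen's kappa), and the code labels in it are invented for illustration.

```python
from collections import Counter

def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(coder_a)
    # Observed agreement: share of items given the same code by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal rates, summed over codes.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for ten responses (not taken from the study).
a = ["phys", "verbal", "verbal", "none", "phys", "none", "verbal", "none", "phys", "none"]
b = ["phys", "verbal", "none", "none", "phys", "none", "verbal", "verbal", "phys", "none"]
print(round(cohen_kappa(a, b), 3))
```

Here the two coders agree on 8 of 10 items, but because some of that agreement would occur by chance, the kappa value comes out lower than the raw 80 percent.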

“I am currently working with a research team that must code students’ responses about their bullying experiences. I had never coded before and was introduced to DiscoverText only a few months ago. Fortunately, I have found DiscoverText to be very user-friendly and easy to navigate. Despite my lack of formal coding experience, I have found the program to run smoothly and have already learned a great deal in such a short period of time. My favorite feature thus far would have to be the code-by-code comparisons. This allows us to discuss any discrepancies among the research team and to increase our reliability. I have enjoyed exploring the features of this program and look forward to discovering what more it can do.” – Abbie, undergraduate, Human and Organizational Development, honors track.

“My team is using DiscoverText to code thousands of brief responses to a survey question about bullying. As someone who is new to qualitative research and coding programs, I have found DiscoverText easy to use. The coding process was very easy for me to learn, and I quickly became efficient at coding responses. Our initial looks at code comparisons have been fairly straightforward for me to figure out as well. As we move forward with more analysis, I anticipate other functions and features of DiscoverText will be similarly straightforward, and I will see more of the power of the program.” – Brian, master’s student, Human Development Counseling.

“I’m working with DiscoverText as part of an academic research team analyzing high school students’ qualitative responses to questions about bullying. As we have been coding responses, we have found the coding process fairly smooth, although not without a few features that we would have done differently. Still, the process of coding is similar to that of other qualitative coding software (I’ve used NVivo). We haven’t yet gotten into any sophisticated filtering or analysis, but I’m expecting that it will be really useful. The biggest impression I’m left with after my three months of using DiscoverText is that it’s a powerful tool, and we’ve only scratched the surface of what it can do.” – Ben, doctoral student, Community Research and Action.

Overall, DiscoverText enabled our team’s timely progress through a complex research process. Following coding, we intend to make use of DT’s metadata “tagging” capabilities so that we can meaningfully export coded response summaries to their respective “tagged” schools. Finally, we intend to continue to explore the useful capabilities of DT in our research. We find DiscoverText easy to use and helpful – our questions have been kindly answered by the Texifter support team or solved by working through the helpful material on DT’s support site!

Thanks a lot DiscoverText!
Joseph H. Gardella

Posted in Coding, research

Five Pillars of Text Analytics

Document relevance is a key challenge for social media research. The specific problem of “word sense disambiguation” is widespread. If I am interested in “banks” where money is stored, I want to exclude mentions of river banks. If I am “Delta” airlines, I do not want to see social data about Delta faucets, Delta force, or those pesky river deltas. If I run a sports team like the Pittsburgh Penguins, the massive numbers of Facebook posts and Tweets about flightless but adorable birds are equally problematic. There are very few social media analytics projects that can easily avoid the challenge of sorting relevant and irrelevant documents.

At Texifter, we have refined a powerful set of tools and techniques for doing word sense disambiguation. This 5-minute video uses the example of Governor Chris Christie to illustrate how the five pillars of text analytics can help anyone to identify and remove irrelevant documents from an ambiguous social data collection. The principles are very similar to spam filtering in email; we use the same mathematics. Using DiscoverText, we argue an individual or small collaborative team can create a custom machine classifier for the task in just a few hours. Someday, we hope to get this down to a few minutes.
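To make the spam-filtering comparison concrete, here is a minimal sketch of one classic technique from that field, a multinomial naive Bayes classifier trained on human-coded examples. Whether DiscoverText uses exactly this model is an assumption on our part; the class name, training sentences, and labels below are all hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        counts = self.word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus summed log likelihoods avoids float underflow.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical human-coded examples for the "Delta" case described above.
clf = NaiveBayes()
clf.train("delta flight delayed at the airport gate", "relevant")
clf.train("delta airlines lost my luggage again", "relevant")
clf.train("new delta faucet installed in the kitchen", "irrelevant")
clf.train("sediment builds up in the river delta", "irrelevant")

print(clf.classify("my delta flight was delayed"))
print(clf.classify("the delta faucet is leaking in my kitchen"))
```

With only four training items this is a toy, but it shows the general idea: every item a person codes as relevant or irrelevant shifts the word statistics the classifier relies on, which is why a few hours of coding can go a long way.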

Posted in DiscoverText, general, product, research, Social Media

Big Data TechCon


Posted in general

DiscoverText: A Vital Research Tool for Social Media

Longtime DiscoverText User Jacob Groshek

I’ve been using DiscoverText for several years, primarily in an academic research capacity but also working with journalists to help them reach broader audiences through social media. From an academic standpoint, DiscoverText was the backbone of collecting Facebook and Twitter data for a study on the 2012 Presidential election that was published in Social Science Computer Review. When working with the New England Center for Investigative Reporting, we use DiscoverText to collect social data and mine it to find users interested in topics being covered by the center and to share stories with them. Raw data can be exported for use in third-party software, as in the case of this work on co-mentions about flooding.

Altogether, DT is a vital tool not only to collect data but also to code and analyze it. It is simply the best place to begin with social data, and offers utilities many other entities do not, including the ability to clean data and minimize redundancies such as those created by bots. DiscoverText and Texifter personnel have my highest endorsement. It is a model enterprise for users at all levels who are looking to engage in a rich and thorough analysis of social media data.
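As a rough illustration of what minimizing redundancies can involve, the sketch below removes posts that become identical once links, retweet prefixes, case, and punctuation are stripped. This is a hypothetical approach for illustration only; the post does not describe how DiscoverText performs de-duplication, and real tools may also group near-duplicates that differ by a few words.

```python
import re

def normalize(text):
    """Reduce a post to a comparison key: lowercase, no URLs, no RT prefix."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "", text)   # drop links, which often vary
    text = re.sub(r"^rt @\w+:\s*", "", text)    # drop a leading retweet marker
    text = re.sub(r"[^\w\s#@]", "", text)       # drop punctuation, keep tags
    return " ".join(text.split())               # collapse whitespace

def deduplicate(posts):
    """Keep the first post for each normalized key, preserving input order."""
    seen = set()
    unique = []
    for post in posts:
        key = normalize(post)
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique

# Invented sample posts; the first three differ only superficially.
posts = [
    "Flooding closes Main Street http://example.com/a",
    "RT @newsbot: Flooding closes Main Street http://example.com/b",
    "flooding closes main street!",
    "River levels are dropping this morning",
]
print(deduplicate(posts))
```

Only the first and last posts survive, since the second and third reduce to the same key as the first.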

Posted in DiscoverText, Facebook, product, research, Social Media, Twitter

DiscoverText as a Teaching and Research Tool

Conducting research on the impact of large projects and events is difficult as each undertaking is unique. Traditional quantitative techniques face limitations of internal validity while qualitative research faces challenges of external validity. However, projects and events generate a massive amount of social media traffic that can be used to understand stakeholder interactions before, during and after delivery. In addition to research, they also provide an avenue to enhance teaching and learning activities as students can collect social media data to apply new research techniques such as text mining. At Bournemouth University, we’ve launched a project called Festim that aims to develop research and teaching using data from social media networks.

For research, the initial objective is to enable the evaluation of social impacts, an area that is difficult to assess using conventional qualitative and quantitative approaches. In the teaching domain, we wish to develop Reusable Learning Objects that can guide future graduate researchers seeking to apply social media data. We also wish to widen the range of research options available to undergraduate students to include social media analysis.

We were fortunate to get a trial enterprise subscription to DiscoverText, which we used to support all of these activities. For research, DiscoverText enables us to understand the online narratives around events on Facebook, Google+, and Twitter. So far, we have been able to create a taxonomy that compares festivals by online stakeholder engagement. Our team is also exploring the nature of discussions that generate engagement across multiple platforms. We’ve used DiscoverText to uncover the nature of the temporary communities of interest that form on social media around discussions of festivals.

Undergraduate researchers have also deployed DiscoverText. One student has used the platform to compare the impact of music events while another has explored how social media is used to recruit volunteers. For teaching, our students have been using DiscoverText to understand the content of discussions on Facebook pages of case study companies as a way of illuminating current issues.

Posted in DiscoverText, Facebook, general, research, Twitter

Tools for Text – Lecture at Northeastern University Monday March 10, 2014

Tools for Text

Dr. Stuart W. Shulman
Founder & CEO of Texifter
Research Associate Professor of Political Science
University of Massachusetts Amherst

12pm – 1:15pm, Monday, March 10
Center for Complex Network Research
5th floor Dana Building, Northeastern University (take elevator on left)

Tools for reviewing, coding, and retrieving text found in qualitative data analysis packages carry with them no particular attributes for ensuring the reliability or accuracy of the recorded observations. Based on 13 years of multidisciplinary experience, this presentation guides researchers through key aspects of measuring coder validity and reliability as part of building custom machine classifiers. The presentation demonstrates how text mining and related analytic tools focus attention on unexpected or difficult to code concepts, which in many cases will constitute the most interesting terrain for deeper investigation.

Posted in general

Texifter News: Migration to Azure and the Big Boulder Initiative

A brief follow-up on Texifter. We successfully migrated “DiscoverText” to Microsoft’s Azure. It was very smooth, though we are going through a period of diminished search and filtering capabilities while the data re-indexes. Otherwise, the other capabilities appear stable.

We also launched a new beta product on Azure to allow users to get free estimates (and buy the data) self-serve from the full history of Twitter. The live prototype is “Sifter”.

Finally, I have been elected a board member and Treasurer for the Big Boulder Initiative. In that capacity, I will be playing a role helping to organize the social data industry association that will launch in June at Big Boulder.

2014 is looking good for Texifter. On January 31, 2014, the company re-acquired all assets and intellectual property related to DiscoverText, including the Sifter stack of language technologies for de-duplication, clustering, coding, and machine-learning, as well as the “CoderRank” patent. Going forward, we believe these tools can make a significant impact on the history of information.

Posted in general, Texifter

Collecting Facebook & Twitter Data

This is an updated 4-minute tutorial on how to collect public Facebook data via the Open Graph API using DiscoverText.

This is an even shorter 75-second tutorial on how to collect Twitter data via the public API.

Posted in API, Facebook, Social Media, Twitter