Texifter, LLC. Blog

Building a Social Sifter

Posted on March 5, 2012 by Stuart Shulman

Check out this video introducing our latest experiments creating custom social “sifters” to winnow down a lake of social media data and leave behind only those items that are truly responsive to your search. Great tool, or greatest tool ever? You be the judge.

Posted in DiscoverText, research | Tagged analytics, crowdsourcing, DiscoverText, Machine Classifiers, Machine Learning, SIFTER, Social Media, social sifter | Comments Off

Why Try GNIP?

Posted on March 1, 2012 by Stuart Shulman

We asked folks signing up for the 2nd GNIP beta using DiscoverText why they were doing it. Here is a nice Wordle showing some of the common themes:

We also asked for job titles. No surprise the professors lead the way.

Sign-up remains open. Jump in and let us know if you like our Enterprise solution for social media analytics.

Posted in DiscoverText, GNIP, research, Twitter | Tagged Data Mining, DiscoverText, GNIP, Klout, Machine Learning, NLP, Power Track, Social Media, social media monitoring, Text Analytics, Tweets, Twitter Mining | 1 Comment

New Facebook Meta Data

Posted on February 9, 2012 by Stuart Shulman

DiscoverText is deploying new capabilities to work with geographic information and social media. Part of this work is now centered on Facebook and the Open Graph API. This one-minute video shows some of the new meta data publicly available from user “Check Ins” as well as the like count and liking user names. More to come in the weeks ahead.

Posted in DiscoverText, Facebook, research | Tagged API Graph, Data Mining, DiscoverText, Facebook, Facebook API, R&D, social media monitoring, Text Analysis | Comments Off

Get the Twitter Fat Pipe

Posted on February 2, 2012 by Stuart Shulman

Texifter is launching a second beta test period using “Power Track for Twitter” fire hose filtering, a service provided by GNIP. We have streamlined the process of providing Enterprise-class access to the beta test. This beta includes access to an expanding set of tools for archiving, filtering, coding, validating and machine classifying text. You can train a custom machine classifier in about 30 minutes.

The GNIP Power Track, in partnership with Twitter, provides users with unrestricted, real-time filtering of the Twitter fire hose. This enriched feature for DiscoverText provides a valuable analytical tool to our users. Not only will the GNIP Power Track provide users with access to the full stream of fire hose data, it will also provide Klout scores, language data, re-tweet frequency, geographic coordinates, and all #hashtags where available in the results. Taken together, this quantity of data and rich metadata fields will allow users to perform valuable social media analysis within DiscoverText.

For more information: info@DiscoverText.com

Posted in DiscoverText, GNIP, Twitter | Tagged #bigdata, beta test, Code Text, Data Mining, DiscoverText, GNIP, Twitter API | 1 Comment

Tracking FEMA Emergency Hashtags

Posted on January 27, 2012 by Joseph Delfino

On November 9th, the Federal Emergency Management Agency (FEMA) conducted its first national test of the Emergency Alert System. In some communities this meant full involvement, with teams responding to mock emergencies and managers monitoring the execution. In the deaf community, the response worth monitoring centered on two Twitter hashtags, #SMEM and #DEMX. The #SMEM hashtag is specific to the emergency response community and was created over a year ago; the #DEMX hashtag is specific to the deaf community and was created specifically for this event. Monitoring the usage of these hashtags was Steph Jo Kent, a PhD candidate in Communications at the University of Massachusetts. Steph’s goal was to monitor the spread of these hashtags throughout the deaf community and the emergency response community, and to see how they crossed channels. To do this, she used DiscoverText, which is how I was lucky enough to become involved in the project.

Monitoring these specific Tweets adds to the already diverse functionality of DiscoverText. To start the project, we simply used the Twitter API to harvest uses of #SMEM and #DEMX beginning on November 2. After the event on November 9, we continued to harvest uses of the hashtags. By early December, we had archived nearly 800 Tweets using the hashtag #DEMX, and nearly 8,000 Tweets using the hashtag #SMEM. From these two archives, it is possible to break down Tweets by time and person, giving us valuable information about key individuals and how they spread the hashtag. For Steph’s research, it was particularly valuable to isolate the crossover between the two hashtags. Using our search feature, we were able to isolate cases of crossover and bucket those results. This allows us to move from noisy data to a more manageable and germane grouping of Tweets.

From here, we used the newly optimized TopMeta feature to break down the occurrences by day and by user. We were able to discover which days and individuals produced the most Tweets. The information we found allowed us to better visualize how the Tweets broke down before and after the event. The results showed that a small number of users produced the majority of Tweets, and that there was more usage of the hashtags prior to the event. Unfortunately, the mass crossover of Tweets that we had envisioned did not occur. There was a minimal amount of crossover, meaning the message did not travel well through the two communities. Steph has posted a detailed analysis of her findings on her blog, where she uses her expertise to analyze the project.
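
For readers who want to try the same kind of breakdown outside of DiscoverText, here is a minimal sketch in Python. It is not how the search or TopMeta features are implemented; it only illustrates the two steps described above, isolating Tweets that use both hashtags and counting Tweets by user and by day. The record layout, the sample Tweets, and the function names are all hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical sample records; a real archive would hold thousands of Tweets.
tweets = [
    {"user": "alice", "day": date(2011, 11, 8), "text": "Getting ready for the test #SMEM"},
    {"user": "alice", "day": date(2011, 11, 9), "text": "EAS test today #SMEM #DEMX"},
    {"user": "bob", "day": date(2011, 11, 9), "text": "Captions missing again #demx"},
    {"user": "carol", "day": date(2011, 11, 10), "text": "Lessons learned #SMEM"},
]


def hashtags(text):
    """Return the set of lowercased hashtags found in a Tweet's text."""
    return {
        word.strip(".,!?:;").lower()
        for word in text.split()
        if word.startswith("#")
    }


def crossover(records, tag_a, tag_b):
    """Keep only the Tweets that use both hashtags (case-insensitive)."""
    wanted = {tag_a.lower(), tag_b.lower()}
    return [r for r in records if wanted <= hashtags(r["text"])]


def top_counts(records, field):
    """Count Tweets per value of a metadata field, most frequent first."""
    return Counter(r[field] for r in records).most_common()


both = crossover(tweets, "#SMEM", "#DEMX")  # the crossover "bucket"
by_user = top_counts(tweets, "user")        # who produced the most Tweets
by_day = top_counts(tweets, "day")          # which days were busiest
```

Running the two counts separately on the full archive and on the crossover bucket gives a quick sense of whether the same few users account for most of the activity in both.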

In the future, this same methodology can be applied to hashtags that have been created for marketing or other purposes, such as hashtags for television shows and large events. There is valuable information in these hashtags; they reflect an emergent folksonomy that influences how ideas, links and memes spread over Twitter.

Using the GNIP Power Track, these hashtags can be leveraged as metadata, broken down over time and used to display how well information did or did not travel. Overall, this was a great experiment, and I am happy to have had the opportunity to collaborate with Steph, and to have participated in a project that has the power to influence the way social media is used to interact with those in the deaf community.

Posted in DiscoverText, GNIP, research | Tagged Active Learning, Data Mining, DEMX, DiscoverText, Emergency Management, FEMA, Hashtag Monitoring, Machine Learning, public comments, SMEM, Texifter, Text Mining, Twitter API, Twitter Mining | 1 Comment

GNIP Twitter Firehose: beta v2

Posted on January 19, 2012 by Stuart Shulman

We are very close to launching v2 of the GNIP-enabled Power Track for Twitter, aka the full fire hose. Sign up here!

If you missed v1, or even if you were part of the first beta test, we need your help. Sign up today to discover why we are huge fans of GNIP and their powerful API.

Posted in general | 1 Comment

Archiving Google+

Posted on January 15, 2012 by Stuart Shulman

As part of developing our suite of live feed options, we are currently beta testing a new Google+ API. If you would like to join the beta group, let us know. This video outlines the way we have set it up.

Posted in general | Tagged #bigdata, API, Google, Research | Comments Off

DiscoverText Tutorial Videos

Posted on January 8, 2012 by Stuart Shulman

Screencast.com, a web service that has delivered tutorial videos reliably and affordably for years, has a nice “Playlist” feature for sharing a set of related videos. An updated collection of DiscoverText tutorials is now posted with an RSS feed as well as an iTunes podcast option. The videos look great in iTunes; enjoy!

Posted in general | Comments Off

Stanford Lecture and Demo

Posted on January 7, 2012 by Stuart Shulman

The Institute for Research in the Social Sciences (iriss.stanford.edu) will be hosting a talk and software demo by Dr. Stuart Shulman, Founding Director of the Qualitative Data Analysis Program at UMass Amherst.

The talk abstract is copied below. We invite you to share this announcement with anybody who may be interested in the topic. The talk is open to all.

Date: Jan 12, 2012, 3-5pm
Location: Green Library, SSRC Seminar Rm.

Measuring Validity in Annotation
This lecture and software demonstration with Q&A prepares faculty and students from a wide range of disciplines who are conducting, or may be contemplating, a qualitative or text analytic study. Opportunities for interesting qualitative research abound in traditional forms (e.g., interview and focus group data, or open-ended survey answers) and novel sources (e.g., databases, listservs, social media, other web content, email interviews, or blog posts and comments).

Each attendee will learn about the Coding Analysis Toolkit (CAT is free/open source) and DiscoverText (free trial/commercial) and how these packages can be used to perform critical research tasks, such as social media archiving, human annotation, validity and reliability measurement, as well as machine classification. The skills and methods developed in this session can be extended to the task of preparing and executing a successful thesis or dissertation, or to carrying out critical professional activities, such as publication and program or project evaluation.

Bio: Dr. Stuart W. Shulman is founder & CEO of Texifter, LLC and an Assistant Professor of Political Science at the University of Massachusetts Amherst. He is the founding Director of the Qualitative Data Analysis Program (QDAP) at the University of Pittsburgh and at UMass Amherst, as well as Associate Director of the National Center for Digital Government and Editor Emeritus of the Journal of Information Technology & Politics.

Posted in general | 1 Comment

TopMeta as Filter

Posted on December 22, 2011 by Stuart Shulman

Click on the First Frame below to view the video via Screencast.

Posted in general | Comments Off

Next DiscoverText Webinar

Posted on December 22, 2011 by Joseph Delfino

On Friday, January 13th, 2012, at 2:00 PM EST, the 2012 DiscoverText Webinar Series will begin. Kicking off the series will be an in-depth discussion of how to use DiscoverText for HR Analytics.

To sign up for the webinar, click here.

Over the past year, Texifter has developed innovative HR Analytics methodologies using DiscoverText, which extract crucial information from employee engagement surveys. The webinar will show how these methodologies are put to use, and the type of information that has been extracted as a result. It will also include a brief overview of the software, a question and answer period, and discussion. Hosting the webinar will be Josh Sowalsky and Joe Delfino, both members of the business development and analysis team.

Starting January 13th, the Webinar series will be held every 2 weeks on Fridays. Webinars will last approximately 60 minutes. In the future, the team will hold webinars on specific aspects of DiscoverText, such as social media analysis, brand analysis, public opinion monitoring, and general overviews of the software. A full schedule of dates and topics is coming soon. Until then, mark your calendars for January 13th at 2 PM EST. If you have any questions, please contact me at joe@discovertext.com.

Posted in general, webinars | Tagged DiscoverText, HR, Text Analysis, Text Analytics, Webinar | Comments Off

What is DiscoverText?

Posted on December 20, 2011 by Stuart Shulman

A 9-minute high-level overview of DiscoverText.

Posted in DiscoverText, product | Tagged analytics, API, crowdsourcing, Data Mining, DiscoverText, Facebook, GNIP, Machine Learning, Social Media, social media monitoring, twitter, Twitter API, Twitter Mining, Video | 1 Comment

Free Text Analytics Workshop @UBC 1.19.12

Posted on December 20, 2011 by Stuart Shulman

Unlock the Power of Text
Open Lecture and Free Software Training on Text and Social Media Analysis

Date: Thursday, January 19, 2012
Time: 9:00 AM – 12:00 Noon
Location: Irving K. Barber Centre, School of Library, Archival and Information Studies, Room 458 (4th Floor), University of British Columbia

Presentation: “Measuring Validity in Annotation”
Tools for reviewing, coding, and retrieving text found in qualitative data analysis packages carry with them no particular attributes for ensuring the reliability or validity of the recorded observations. Based on more than 10 years of multidisciplinary experience doing qualitative research, this presentation guides researchers through all aspects of coder validity and reliability. It demonstrates how text mining and analytic tools can be used to focus attention on difficult-to-code concepts and themes, which in many cases constitute the most interesting terrain for investigation.

Training: DiscoverText Software
A PhD-holding political scientist, Stu knows the importance of easy-to-use, powerful text analytic software. As founder of a technology startup (http://texifter.com) and the QDAP labs (http://www.umass.edu/qdap), Stu advances text mining and natural language processing research. His software demonstrations link these worlds via straightforward and easy-to-understand explanations of software features that can be tailored for all experience levels and industries. For the training, participants may bring their own laptops or use the desktop machines in the lab to connect to the UBC wireless network and learn to use DiscoverText (http://discovertext.com).

Dr. Stuart Shulman (PhD U Oregon 1999) is founder & CEO of Texifter, LLC and an Assistant Professor of Political Science at the University of Massachusetts Amherst. He is the founding Director of the Qualitative Data Analysis Program (QDAP) at the University of Pittsburgh and at UMass Amherst, as well as Associate Director of the National Center for Digital Government. Dr. Shulman is Editor Emeritus of the Journal of Information Technology & Politics, which is the official journal of the Information Technology & Politics section of the American Political Science Association.

All interested participants are requested to reserve a spot by sending an RSVP to luanne.freund@ubc.ca, including your name and academic unit.

Posted in general | Tagged Active Learning, DiscoverText, Machine Learning Loop, NLP, Social Media Analysis, Software Training, Stu Shulman, Texifter, Text Analytics, UBC | 2 Comments

Upload SurveyMonkey Output

Posted on December 5, 2011 by Joseph Delfino

For the past couple of months the Texifter blog has featured many posts about the power of DiscoverText to harvest Twitter data and make sense of the noisy and often confusing data. This process has been amplified by our recent GNIP Firehose beta, which allows users to leverage even more metadata and Tweets. Just a few months ago we wrote about our use of crowd sourcing to find the funniest bin Laden Tweet, and recently we have been monitoring holiday shopping through Tweets. With all the commotion around Twitter data, it is easy to forget traditional data forms, such as surveys, which hold crucial information for many individuals and organizations.

Every day, thousands of free surveys are administered using the popular platform SurveyMonkey. We often encounter folks who analyze the open-ended answers by simply reading through posts and looking at word clouds. This work can be time-consuming and does not always produce the best results. Using DiscoverText, it is fast and easy to upload a .csv (Comma Separated Values) or .xls (Microsoft Excel) file. By doing so, you can start using a state-of-the-art text tool in minutes. Regardless of size and scope, you can try uploading your survey data to DiscoverText and use our Professional text analytic toolkit for two weeks on a trial basis. If you have any questions about methodologies, or need help getting started, feel free to drop me a note at joe@discovertext.com.
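
Before uploading, it can help to take a quick look at the open-ended column in your export. The sketch below uses Python's standard csv module to pull the non-empty answers out of one column. The sample data and the column names are made up for illustration and are not SurveyMonkey's actual export format; substitute the header from your own file.

```python
import csv
import io

# Hypothetical export; the column names are assumptions for illustration only.
raw = io.StringIO(
    "RespondentID,Satisfaction,What could we improve?\n"
    "1,4,More training sessions\n"
    "2,5,\n"
    '3,2,"Better communication, especially between teams"\n'
)


def open_ended_answers(handle, column):
    """Return the non-empty, whitespace-trimmed answers from one column."""
    reader = csv.DictReader(handle)
    return [
        row[column].strip()
        for row in reader
        if row[column] and row[column].strip()
    ]


answers = open_ended_answers(raw, "What could we improve?")
```

With a real file, replace the in-memory sample with `open("export.csv", newline="", encoding="utf-8")`, where the file name is again a placeholder.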

Posted in general | Tagged Code Text, CSV Files, Data Mining, DiscoverText, Survey, SurveyMonkey, Texifter, Text Mining, Text Toolkit, Twitter Mining, Upload | 1 Comment

Black Friday Divide

Posted on November 28, 2011 by Joseph Delfino

Despite record-size crowds, gunfire, and one epic fight over waffle irons, people still ventured out to their local shopping malls and opened their wallets this Black Friday. With overall spending numbers up over 6%, Black Friday 2011 can be hailed as the best ever. The money spent on discounted items fits with a recent Rasmussen Poll, in which 62% of respondents claimed they would be spending less on gifts this year, meaning that taking advantage of the sales could have been priority number one for consumers. The positive sales numbers come at a time of high tension, as consumer sentiment is near lows and unemployment remains high at 9%. Metrics aside, it was a bang-up day for retailers, and a great day for my “Black Friday” archive, which harvested over 35,000 Tweets, adding to the already massive 250,000 Tweet archive.

Taking a sample of 2,000 Tweets from Black Friday, I wanted to find out what people were saying about the most anticipated shopping day of the year. While the majority of Tweets referenced the shopping deals, there was an interesting battle looming below the surface. In the Twittersphere, there is a clear divide between those who chose to shop and those who did not. People who did not venture out seem to be nearly as vocal as those who did. In many ways, it begins to look much like a class battle when one looks deeper into the Tweets. Many of the jokes took a hostile jab at people willing to stand in the long lines, wake up early, and brave the crowds. A fair number of Tweets directly describe shoppers as “crazy” and “insane.” Wal-Mart and Target are singled out as low-end stores, just discounting their “mediocre” goods. One Tweet even went so far as to say, “Black Friday is the Special Olympics of Capitalism.”

Regardless of the divide, the numbers are strong. Nobody can deny this year was a success for retailers, who in one day handled 54 billion dollars’ worth of transactions. However, there are still 4 busy shopping weeks until Christmas. Check back to the Texifter Blog next week to see if the positive sentiment holds and willing shoppers are still eagerly out spending money.

Posted in general | Tagged big data, Black Friday, DiscoverText, Machine Learning, Market Research, Marketing Analysis, Retail Analysis, Texifter, Twitter API, Twitter Mining | 1 Comment
