Tag Archives: Data Mining

Coding Text Using the “QDAP Method” – Part One

We did it! The free, open source, Web-based, university-hosted, FISMA-compliant “Coding Analysis Toolkit” (CAT) recorded its one millionth coding choice. Pretty much all the credit goes to Texifter CTO and chief CAT architect Mark Hoy, who has put in many … Continue reading

Posted in general | 1 Comment

CAT on the Brink of 1 Million Recorded Coding Choices

Texifter manages the Coding Analysis Toolkit (CAT), which is a free, open source, Web-based and FISMA-compliant system launched in the fall of 2007 and hosted by the University of Pittsburgh. CAT is the precursor to PCAT and DiscoverText. This is … Continue reading

Posted in general | Comments Off

Twitter and History March On

The Twistory saga continues. The tech news is full of stories these days about firms delivering social media brand management and public opinion results, all magically derived from massive tweet collections. Whether prudent or not, people want to know what … Continue reading

Posted in general | 1 Comment

Throttled, or Not?

I was watching TweetDeck rattle off reactions to the end of “Twistory” and the “Dustin of Twistory” when my son came in and asked: Why is your tweet stream being throttled? I had missed the message in small print, but … Continue reading

Posted in general | Comments Off

Data We Can Share

Since we had to take down our 1.2 million “Osama bin Laden” tweets, we substituted data we can share gathered from Facebook to satisfy the curiosity of researchers who don’t normally handle big data but might want to dip their … Continue reading

Posted in general | Comments Off

The First Draft of ‘Twistory’ Revisited – An Update on (Not) Sharing Twitter Collections

Researchers like new datasets. Many of us build tools and techniques that work nicely with existing data, but may perform poorly with “out-of-sample” datasets. The ability to generate new and interesting big datasets, especially ones that draw a crowd of … Continue reading

Posted in general | 2 Comments

1.2 Million “osama” & “bin laden” Tweets…and Counting

On the evening of May 1, 2011, when it was announced that Osama bin Laden had been killed, we started running repeated fetches against the Twitter API for the terms “osama” and “bin laden”. On May 3, we posted more than 1.2 … Continue reading

Posted in general | 6 Comments

Harvesting Twitter Tweets

Using the Twitter API, we have made it easy to archive massive numbers of Twitter tweets. This short video demonstrates just how easy we have made it for you to do research on Twitter.

Posted in general | Comments Off

Scraping Facebook

Using a core element of the Facebook architecture (the “Graph API”), we have enabled DiscoverText users to “Connect with Facebook” to register their new accounts. You can learn about this option and how to collect data from Facebook by watching … Continue reading

Posted in general | 1 Comment

Text Analysis during the 2011 State of the Union Address

As part of the underlying research Texifter is doing on sentiment and topic analysis, we collected data from various Twitter and Facebook feeds in the 48 hours before and after the 2011 State of the Union address on Tuesday, January 25, 2011. Texifter … Continue reading

Posted in general, research | 1 Comment