Tag Archives: DiscoverText

Dwindling Osama bin Laden Tweets and the RT Champs

The running count in my DiscoverText “bin Laden” project is ~4.5 million unsharable Tweets. Though we can’t share them, we can describe them. One of the interesting features of this dataset is the rapidly dwindling Tweet rate over the month …

Posted in general | 2 Comments

Connect Existing Facebook & DiscoverText Accounts

Many people have asked us, “How do I import Facebook data if I have a regular DiscoverText account?” The short answer is that there is no way to pull in Facebook feeds within DiscoverText unless you register and log in …

Posted in DiscoverText, product | 1 Comment

New DiscoverText Import Available: Congressional Bills Via GovTrack

Tonight we’ve added a new import option to DiscoverText: any user with a Professional or Enterprise license (or the 30-day free trial license) can now directly import data on federal congressional bills. Thanks to the …

Posted in DiscoverText, product | 2 Comments

Coding Text – Part Three

Researchers interested in large text collections and their itinerant coders tend to muddle through with limited collaborative, cross-disciplinary resources upon which to draw. The generic criteria for high-quality codebook construction and effective coding are underdeveloped, even as the tools and …

Posted in general | 1 Comment

Coding Text – Part Two

In Part One of the series “Coding Text the QDAP Way,” I wrote about the problem of idiosyncratic annotation and the lack of diverse, interesting and reusable annotated datasets. Providing data for replication (when possible) is a requisite for …

Posted in general | Comments Off

The Return of Google Reader Feeds in DiscoverText

Several months ago, Google Reader removed the ability to get an RSS feed of the feeds in your Reader account. As a result, we had to take Google Reader feed ingestion offline inside of …

Posted in DiscoverText, product | Comments Off

Coding Text Using the “QDAP Method” – Part One

We did it! The free, open source, Web-based, university-hosted, FISMA-compliant “Coding Analysis Toolkit” CAT recorded its one millionth coding choice. Pretty much all the credit goes to Texifter CTO and chief CAT architect Mark Hoy who has put in many …

Posted in general | 1 Comment

CAT on the Brink of 1 Million Recorded Coding Choices

Texifter manages the Coding Analysis Toolkit (CAT), which is a free, open source, Web-based and FISMA-compliant system launched in the fall of 2007 and hosted by the University of Pittsburgh. CAT is the precursor to PCAT and DiscoverText. This is …

Posted in general | Comments Off

Twitter and History March On

The Twistory saga continues. The tech news is full of stories these days about firms delivering social media brand management and public opinion results, all magically derived from massive tweet collections. Whether prudent or not, people want to know what …

Posted in general | 1 Comment

Throttled, or Not?

I was watching TweetDeck rattle off reactions to the end of “Twistory” and the “Dustin of Twistory” when my son came in and asked: Why is your tweet stream being throttled? I had missed the message in small print, but …

Posted in general | Comments Off

Data We Can Share

Since we had to take down our 1.2 million “Osama bin Laden” tweets, we substituted data we can share gathered from Facebook to satisfy the curiosity of researchers who don’t normally handle big data but might want to dip their …

Posted in general | Comments Off

The First Draft of ‘Twistory’ Revisited – An Update on (Not) Sharing Twitter Collections

Researchers like new datasets. Many of us build tools and techniques that work nicely with existing data, but may perform poorly with “out-of-sample” datasets. The ability to generate new and interesting big datasets, especially ones that draw a crowd of …

Posted in general | 2 Comments

1.2 Million “osama” & “bin laden” Tweets…and Counting

On the evening of May 1, 2011, when it was announced that Osama bin Laden had been killed, we started running repeated fetches against the Twitter API for the terms “osama” and “bin laden”. On May 3, we posted more than 1.2 …

Posted in general | 6 Comments

Harvesting Twitter Tweets

Using the Twitter API, we have made it easy to archive massive numbers of tweets. This short video demonstrates just how easy we have made it for you to do research on Twitter.

Posted in general | Comments Off

Scraping Facebook

Using a core element of the Facebook architecture (the “Graph API”), we have enabled DiscoverText users to “Connect with Facebook” to register their new accounts. You can learn about this option and how to collect data off Facebook by watching …

Posted in general | 1 Comment