During my recent visit to the Digital Methods Initiative (DMI) summer school, hosted by my good friend Richard Rogers, I had the pleasure of spending two days teaching and working with 35 exceptionally bright students who were new to the tools and techniques that are part of DiscoverText.
They were an excellent group, highly motivated and digitally fluent. As part of the class, students put forward project ideas and formed small teams to hack out a solution to some research problem. Many of these ideas involved scraping content off Facebook via the Graph API. I watched eagerly as teams of students furiously tested out many of the “shiny new toy” functionalities they found in DiscoverText. Very quickly, they helped to articulate some of the key mysteries of the permissions managed via the social Graph.
Several data collection questions came up immediately. For example:
- Why is there a numerical discrepancy between what appears on the actual public Facebook pages and groups and what is delivered via the Graph?
- By what combination of criteria do different users get slightly (or vastly) different results for the same query?
- Why is there often a substantial gap between the number of items the API delivers and the number of items a user of DiscoverText actually gets in the downloaded archive?
As the experiments at the DMI continue, and users of DiscoverText all over the world start asking some of the same questions, we hope to document here on the blog precisely how your credentials, and the settings of diverse Facebook users, affect the data collection made possible by the DiscoverText-Facebook API.