Texifter Releasing AC/DC-Inspired Shirts at Big Data TechCon

As a part of exhibiting at Big Data TechCon next week, Texifter will be releasing an AC/DC-inspired t-shirt. Stop by our booth (#302) to get yours while they last!

Posted in general

Texifter Social Data and Tools: August Prize Winners

As a part of getting new users to test our sifter beta, every month this summer we are awarding 12 #datagrants to academics. These prizes shave thousands of dollars of costs off of your research. The August social data and tools prize winners were:

Kelli S. Burns, Ph.D.
University of South Florida School of Mass Communications
“I will look at the #icebucketchallenge during a particularly active time in the campaign (mid-August 2014) when several celebrities were creating a lot of attention for their videos. I plan to explore the celebrity impact on tweets as well as specific mentions of ALS in tweets about the campaign. I am also interested in conversation themes related to the campaign and how other organizations hijacked the hashtag for their own gain.” @KelliSBurns

Kathleen PJ Brennan
PhD Candidate at the University of Hawai’i at Manoa Political Science
“I hope to use my data and software prize to study the influence of internet memes on political interest and awareness. This particular analysis will form part of a dissertation chapter on internet memes, which examines such memes as emergent agents in the overlaps of online and offline spaces. This will be my first opportunity to incorporate such data into my dissertation, and I can’t wait to get started!” @katiepbrennan

Aminu Bello
PhD Research Student, Marketing
“To analyse data from social media. To find out the role of social media in CRM. Data will be collected primarily from Facebook and Twitter pages.”

Ann Pegoraro
Laurentian University School of Sports Administration; Director of the Institute for Sport Marketing, a research center at the university
“I plan on using the Texifter data and software to further my research work in social media use in sport. In particular, the historical data will be used by my colleagues and me to investigate how the use of Twitter by athletes, teams/organizations and fans has evolved over time.” @SportMgmtProf

Susan Currie Sivek
Linfield College Mass Communication
“I will use the prize to continue to study the relationship between journalism and social media. I am especially interested in how magazines use these media to connect to their audiences.” @profsivek

Dimitrinka Atanasova
Research Associate (CascEff) and PhD student Media and Communication, University of Leicester
“I plan to study information sharing about obesity; specifically, I hope to identify the sources behind the web links that are shared most. For my recently submitted PhD I analysed obesity-related news articles from selected online newspapers, and while it can be expected that content from these should be among the most shared, I would like to see what other information sources are read/shared.” @dbatanasova

Hassan Zamir
University of South Carolina School of Library and Information Science
“The Texifter data prize will be primarily used as the data for writing my dissertation, which focuses on how and what citizens and expatriates of Bangladesh reported about the Shahbag Movement during 2013 on Twitter. A content analysis of these tweets will be helpful to get an insight about the protest, its primary issues, protesters, and their concerns. The data will be useful for understanding how social media tools like Twitter increase democracy, civic engagement, and social empowerment. A potential outcome of this research will be designing a computer-supported tool for better understanding worldwide social movements and mitigating social crisis issues quickly.” @hassan_zamir

Jacob Groshek
Boston University, Emerging Media
“I plan to look at how people use social media in a smoking cessation program, or follow other emergent social situations, like Ferguson or Gaza.” @jgroshek

Yunkang Yang
University of Washington Department of Communication
“I would use it to extract historical posts to study online discourse regarding a major public event in China in 2012, as well as the access to DiscoverText to cleanse, code and visualize the data. I hope to group those posts into categories to show the levels of contention in discourse and to reflect the role social media play in facilitating public debate.” @yangyunkang

Will Frankenstein
Carnegie Mellon University Dept. Engineering & Public Policy / Center for Computational Analysis of Social and Organizational Systems
“I will be using the data to explore how individuals communicate and discuss technological risk as expressed on social media. I will be focusing on discussions of nuclear proliferation. The prize is especially helpful for gauging and distinguishing the immediate social media response vs. the long-term response of major events related to nuclear materials, such as Fukushima and New START.”

Micah Altman
MIT Libraries; Program on Information Science
“We will experiment with PowerTrack to pilot the integration of dynamic corrections into official statistics. We will experiment with DiscoverText to perform collaborative evaluation of transparency in government data and websites.” @drmaltman

 

Posted in general

Updates: Crowd Sourcing the FCC Open Internet Data

Texifter will post periodic updates here on the effort to use DiscoverText to crowd source the coding of some of the #NetNeutrality public comments received by the FCC in response to the Open Internet rulemaking effort. Everyone is invited to join in.

Our 1st project update:
http://www.screencast.com/t/wK4giTZP

Update #2 on coding the exploratory dataset:
http://www.screencast.com/t/Gj3xknItaT7

Update #3: Tips for Crowd Sourcing the FCC Data in DiscoverText
http://www.screencast.com/t/050jCmEE

There is an elastic search option:
http://elasticsearch.texifter.com
username: fcc
password: fcc

Check out this video on being a good coder:
https://vimeo.com/69834903

thanks,
The DiscoverText Team

Posted in general

Texifter Social Data and Tools: July Prize Winners

As a part of getting new users to test our sifter beta, every month this summer we are awarding 12 #datagrants to academics. All you need to do to be included in the August drawing is submit a valid historical Twitter estimate request using sifter and then send us your CV. These prizes shave thousands of dollars of costs off of your research. The July social data and tools prize winners were:

Enrique Castro Sanchez
Centre for Infection Prevention and Management at Imperial College London

“I am interested in exploring how antibiotics and antibiotic resistance are discussed on Twitter, focusing on opinion leaders driving particular perceptions. The data will allow me to explore collective Twitter responses to news and events related to antibiotics, in an effort to understand how best to mobilise public opinion.” @castrocloud

Stephen Barnard
Department of Sociology at St. Lawrence University

“I plan to use the Texifter #datagrant and DiscoverText software package to extend my research on the significance of Twitter in American journalism. This may include collecting both real-time and historical tweets relating to major events in the journalistic field. Additionally, I am also hoping to use the Texifter/DiscoverText package as a grading tool, given that I often incorporate social media projects and Twitter discussion in my classes and have been searching for an efficient way to collect and grade them. This prize provides an ideal opportunity for me to experiment with new grading protocols.” @socsavvy

Gonzalo Bacigalupe
Counseling Psychology at the University of Massachusetts Boston

“Do ehealth, innovation in healthcare and technology, mhealth, and other forms of ehealth ideas emerge in association with the question of health equity, social determinants of health, and overall with concerns about social justice?” @bacigalupe

James Reade
Department of Economics at the University of Reading

“As an economist I am interested in how economic agents interact with each other; in particular how networks (formally or informally – hence Twitter and other social networks) influence decision-making. I hope to use this data award to learn more about the ways in which decisions are impacted by the position somebody has within a network.” @jjreade

Zachary Steinert-Threlkeld
Political Science at the University of California – San Diego

“I am researching how individuals use Twitter to organize contentious action in authoritarian regimes. Because I have too many tweets to hand code, creating topic models is a core part of my research. Access to an Enterprise level DiscoverText account will prove invaluably productive.” @ZacharyST

Omar Jaafor
Department of Computer Science at Université Jean Monnet in Saint-Etienne

“I will be using the data for community detection and anomaly detection. I am building algorithms that allow for community and anomaly detection in networks using both the attributes of nodes (country, age, messages…) and relationships between nodes.” @lmhasher

Libby Hemphill
Communication and Information Studies at the Illinois Institute of Technology

“With Prof. Ed Lee in the IIT Chicago-Kent College of Law, I’m studying how to evaluate online protests and their achievements. We use the case study method to examine tweets related to protests of NSA surveillance. Our goal is to develop a set of metrics by which we can better evaluate the success of online protests and what they may achieve, particularly in protests whose objectives do not involve revolution or overthrow of the government. The results of the project will be useful for Internet activists, businesses, media, policymakers, and software programmers in designing, evaluating, or utilizing social media for political purposes.” @libbyh

Bill Warters
Department of Communication at Wayne State University

“I’m exploring social media commentary about the use of conflict resolution programming in schools, with a special focus on peer mediation. I’ve been gathering tweets related to peer mediation and find some interesting back-channel conversations going on that school staff probably are not aware of.” @bwarters

Nigel L. Williams
FestIM Research Project, School of Tourism at Bournemouth University

“My research examines Digital Engagement by stakeholders with Projects and Events. I’m especially interested in applying Social Network Analysis and Text Analysis to understand conversations on Social Media about Projects and Events. In the Project Domain, I will look at online narratives discussing Crossrail, a London transport project. For Events, I will apply the data and software to examine the impact of online narratives on a coastal destination.” @Org_PM

Meredith Clark
Journalism & Mass Communication at UNC-CH

“I will use the prize to extend my research into digital media use and connectivity among minorities.” @meredithclark

Stephen K Tagg
Marketing at Strathclyde Business School 

“To produce academic articles on dynamic modelling of sentiments in the Scottish Independence Referendum debate. This is in cooperation with a colleague in the school of government (Dr Mark Shephard). Techniques for the analysis of unstructured data in the R software environment will be used: qdap, tm and Austin.” @stephenktagg

Bill Wilkerson
Political Science at SUNY Oneonta 

“I am interested in learning about how the US Supreme Court is discussed on Twitter. What cases draw interest? What network patterns exist in this discussion? I hope that there is sufficient geo-location data to use this as part of the research as well.” @bill_wilkerson

Remember: All you need to do to be included in the August drawing is submit a valid historical Twitter estimate request using sifter and then send us your CV.

Posted in general

Update #1: Sorting the FCC’s Open Internet Public Comments

This is the first update on Texifter’s effort to support the crowd source review of the FCC’s Open Internet (a.k.a. “Net Neutrality”) public comments. Help review and report on what the public has said about the future of the Internet.

If you are getting ready to code some public comments, please take a look at this introduction to the first experimental coding task.

Posted in general

Open Data on Net Neutrality: Help Crowd Source Analysis of Comments to the FCC

Yesterday the FCC released the public comments on Net Neutrality. The FCC has asked the public to help make “visualizations” to help surface substantive comments and key themes. Quoting the FCC:

“We recognize that not everyone may have the requisite technical skills to build visualizations and analyze raw XML data. (Members of the public will, of course, still have the option of reviewing and searching the record via ECFS). However, we’re hoping that those who do have the technical know-how will develop and share these tools for the public to use.”

Texifter has the right tools to allow anyone not versed in raw XML data extraction to search and code this data, then export the results as a CSV file including the relevant metadata. We have loaded the data and started a project using DiscoverText, which was built specifically for crowd source public comment review by US federal agencies.

We invite you to join our collaborative, web-based effort to find substantive comments and visualize what the public said about Net Neutrality. You can work directly with me and others to crowd source the review of the non-duplicate comments, or you can conduct your own parallel project with the same data.

To get involved, sign up for the free trial DiscoverText account and please note in the comment box that you want to work with the FCC data.

You might be interested in these preliminary stats based on what we downloaded yesterday:

  • 446,667 items posted to the FCC web site
  • 300,172 items after de-duplication
  • The largest group of exact duplicates is 105,320 identical items that say:

“Net neutrality is the First Amendment of the Internet, the principle that Internet service providers (ISPs) treat all data equally. As an Internet user, net neutrality is vitally important to me. The FCC should use its Title II authority to protect it. Most Americans have only one choice for truly high speed Internet: their local cable company. This is a political failure, and it is an embarrassment. America deserves competition and choice. Without net neutrality, a bad situation gets even worse. These ISPs will now be able to manipulate our Internet experience by speeding up some services and slowing down others. That kills choice, diversity, and quality. It also causes tremendous economic harm. If ISPs can speed up favored services and slow others, new businesses will no longer be able to rely on a level playing field. When ISPs can slow your site and destroy your business at will, how can any startup attract investors? My friends, family, and I use the Internet for conversation and fun, but also for work and business. When you let ISPs mess with our Internet experience, you are attacking our social lives, our entertainment, and our economic well being. We won’tstand [sic] for it. ISPs are opposing Title II so that they can destroy the FCC’s net neutrality rules in court. This is the same trick they pulled last time. Please, let’s not be fooled again. Title II is the strong, legally sound way to enforce net neutrality. Use it.”
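The duplicate counts above can be reproduced in spirit with a few lines of code. The following is a minimal sketch of exact-duplicate grouping, not the method DiscoverText itself uses (the post does not describe it). It assumes the comments are already available as plain strings, and the whitespace and case normalization in `normalize` is an illustrative choice; the function names and sample comments are hypothetical.

```python
import hashlib
from collections import Counter


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase the text.

    Assumption: copies differing only in spacing or case count as
    duplicates. Return `text` unchanged for strict byte-level matching.
    """
    return " ".join(text.split()).lower()


def duplicate_stats(comments):
    """Return (total items, items after de-duplication, largest duplicate group)."""
    # Hash each normalized comment so long texts are compared by digest.
    counts = Counter(
        hashlib.sha256(normalize(c).encode("utf-8")).hexdigest()
        for c in comments
    )
    total = sum(counts.values())
    unique = len(counts)
    largest = max(counts.values()) if counts else 0
    return total, unique, largest


# Hypothetical sample standing in for the FCC comment collection.
sample = [
    "Net neutrality is vitally important to me.",
    "Net neutrality   is vitally important to me.",
    "net neutrality is vitally important to me.",
    "Please protect the open Internet.",
]
print(duplicate_stats(sample))
```

Hashing keeps memory use modest on a collection of several hundred thousand items, since only one digest per distinct comment is stored rather than the full text.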

Posted in general

The Latest Texifter Video Tutorials

To quickly learn the latest about gathering Twitter tweets for research, importing SurveyMonkey data, or using state of the art text analytics tools, check out our most recent Texifter tutorial videos. While they have a bit of a “home brew” flavor, we’ve been told they do help jump start the process of learning about exciting new tools for text.

Posted in DiscoverText, product, Social Media

Texifter Social Data & Tools June Prize Winners

As a part of getting new users to test our sifter beta, every month this summer we are awarding 12 #datagrants to academics. All you need to do to be included in the July drawing is submit a valid historical Twitter estimate request using sifter and then send us your CV. These prizes shave thousands of dollars of costs off of your research. The June social data and tools prize winners were:

Kelly Fincham
The Department of Journalism, Media Studies, and Public Relations at Hofstra University

“I will use the data and software prize to further my research and analysis of journalism practice on Twitter. My research agenda explores journalists’ evolving norms and practices on social media, specifically Twitter, in the U.S. and Ireland. This grant will help me to research and analyze this subject area  in more depth.” @kellyfincham

Martina Wengenmeir
PhD candidate in the Media and Communication Department at the University of Canterbury, Christchurch, New Zealand

“I am hoping to use the data and software prize for my PhD research on the recovery and rebuild after the Christchurch earthquake of 2011. I am particularly interested in framing and sentiment of tweets and am hoping to compare a historical data set during disaster response and recovery to the conversation about the rebuild of the city which is still ongoing today. I am hoping to study the differences and similarities of conversations on Twitter now and then.” @tinserella

Carmina Godoy
Postgraduate student in the Universidad Complutense de Madrid

“I would like to integrate the collected data (tweets) in my final essay in order to get my Masters degree. The subject of my essay is: racism online.” @CarminaGodoy

Warren Allen
Assistant Professor at the iSchool at Florida State University

“This award will be used to collect and analyze select data from the early group stages of the 2014 World Cup. Social media – including but not limited to Twitter – are increasingly integrated into traditional (TV, radio, print) media campaigns. At the 2014 World Cup, the hashtags #becausefootball and #becausefutbol were promoted throughout the televising of the games. Exploratory thematic analysis of these Tweets – enabled by Sifter and DiscoverText – will describe how these commercially oriented hashtags are used in comparison to what we know about live event Twitter usage in the current body of research.” @warrensallen

Bryce Newell
PhD Student in the Information School at the University of Washington

“I plan to use the prize to capture and analyze online discussion and commentary about police use of automated license plate recognition (ALPR) systems and wearable cameras.  In particular, I hope to examine discussions related to the public disclosure of data generated by these systems under freedom of information laws.” @newmedialaw

Jae Eun Chung
Assistant Professor in the School of Communications at Howard University 

“This project will survey the current use of online social media by health organization for health campaign and analyze the reach and diffusion of campaign messages. Despite the ever growing number of online social media-based health campaigns, little work has been done to understand how interactive natures of online social media are used for public health promotion. For this project, Twitter data will be analyzed to enhance our understanding of how health organizations use social media for public health promotion and how such uses of online media platforms are received by the public.”

Abhay Gupta
Lecturer at Fairleigh Dickinson University

“I plan to use it to understand the dynamics of public opinion. In particular, I want to test various hypotheses on how major events (e.g. election wins, market crash, sports results) impact the sentiment and whether pre-event opinion analysis has any predictive power in explaining actual outcomes.” @EmpForesights

Victor Barger
Assistant Professor of Marketing at the University of Wisconsin-Whitewater

“I am looking forward to using the Texifter data and software to investigate how consumers and brands communicate on social media. In particular, I’m interested in how language use affects consumer behavior in online contexts. Given the extent to which consumers have and are continuing to adopt social media, this research should have important implications for marketing practitioners.” @vabarger

Jamie Baxter
Assistant Professor in the Schulich School of Law at Dalhousie University

“I am studying the influence of social movements on changes in the law — specifically land law. I hope to use the prize to access Twitter data that can tell me about the relationships between movement actors, how they form their interests, and how these change over time.” @jrgbaxter

Stephen Jeffares
Professor in the School of Government and Society at the University of Birmingham

“I will use the software and data to continue my study of the lifecycle of policy initiatives. I used DiscoverText in my latest book Interpreting Hashtag Politics (Palgrave Macmillan, 2014). Historic Twitter data reveals the first mention of policies that enjoy several months of widespread attention before disappearing without trace. To understand why and how this occurs, I will continue to use DiscoverText to de-duplicate the data and develop thematic code sets with a team of research assistants.” @SRJeffares

Cristian Vaccari
Lecturer in Politics at Royal Holloway, University of London

“I am planning on using the data and software to analyze how politically motivated users of social media engage with mediated political events, such as televised leader debates and high-profile interviews, to better understand the interplay between television and social media in the flow of political messages.” @25lettori

Bill D. Herman

Remember: All you need to do to be included in the July drawing is submit a valid historical Twitter estimate request using sifter and then send us your CV.

Posted in Disqus, research, Social Media, Texifter, Tumblr, Twitter, WordPress

Texifter is Plugged In to Gnip

We could not be happier to announce that Texifter, a developer of advanced text data analytics software, is partnering with Gnip, the world’s largest provider of social data. Our Plugged In to Gnip partnership certifies Texifter as an industry leader committed to building innovative analytics solutions on top of reliable, sustainable, and complete social data. In joining Gnip’s partner program, Texifter joins the list of leading analytics providers like Microsoft, Salesforce, and Adobe.

“The Plugged In program was created to really highlight the companies that are doing the most innovative things in social data,” according to Chris Moody, CEO of Gnip, “and Texifter is a great example of that.”

Texifter’s DiscoverText platform provides advanced data analytics solutions for social researchers in public and private institutions.  Combining powerful tools with accessible interfaces, DiscoverText provides “five pillars of text analytics” – search, filtering, clustering, human-coding, and machine-learning.

By partnering with Gnip, Texifter has access to historical Twitter data. Texifter recently launched “Sifter”, a tool to help users estimate Twitter volume associated with historical searches. The Sifter product gives users a free estimate of Twitter volume over a specific date range using advanced Gnip PowerTrack filtering. Customers who license historical Twitter data from Gnip can then access it for text analytics via a 30-day trial of DiscoverText.

“Texifter welcomes this opportunity to work even more closely with a company that we have admired and worked with for years,” said Stu Shulman, CEO of Texifter. “Gnip is an exceptionally reliable provider of social data products and services. Texifter customers will continue to see more benefit as we work with Gnip to deliver high quality products and services.”


Posted in DiscoverText, GNIP, Social Media, Texifter

Texifter Announces Strategic Partnership with SurveyMonkey

Texifter Announces Strategic Partnership with SurveyMonkey to Improve Survey Data Analytics

Combining the power and versatility of Texifter’s DiscoverText analytics with the reach of the world’s largest survey website.

AMHERST, MA., May 27 2014—Texifter, a developer of social data and text analysis tools, today announced a new strategic partnership with SurveyMonkey, the world’s largest survey website, to provide advanced text analytics capabilities to SurveyMonkey users through its cloud-based platform, DiscoverText.

SurveyMonkey is known for intuitive interfaces and communications features that allow researchers to collect millions of survey responses every day. When surveys produce very large numbers of responses to open-ended questions, it can be a challenge to analyze all of the verbatim data. This is especially true for those relying on spreadsheet software as their primary text analytics tool.

DiscoverText provides an accessible “point and click” solution for these and other analytic challenges. Starting today, all DiscoverText users will be able to log in to SurveyMonkey to easily import existing survey data. Researchers can use a 30-day free trial to apply the full range of DiscoverText’s powerful software tools to both the open-ended answers and the structured survey metadata. Texifter’s “five pillars of text analytics” approach combines search, filtering, clustering, human-coding, and machine-learning.

Once registered on DiscoverText, newcomers have access to a wide spectrum of online data feeds. Facebook, Tumblr, YouTube, WordPress, Disqus, and Twitter data can be gathered, managed, and analyzed in DiscoverText alongside SurveyMonkey responses, email, and other forms of text data.

“The Texifter team is excited to be introducing SurveyMonkey users to the powerful and flexible text analytics tools in DiscoverText,” said founder and CEO Stuart Shulman. “We are confident that once people try out features like clustering and custom machine-learning, they’ll begin to see new possibilities for generating insights from bigger and more diverse collections of unstructured free text.”

This strategic partnership signals the latest phase in the evolution of DiscoverText. Originally built for federal agencies sorting large-scale public comment collections, the four-year-old collaborative research platform now serves a wide variety of public and private sector clients, as well as the academic research community.

Texifter is a spin-out company based on information science research by Dr. Stuart W. Shulman, who directed the development of numerous human language tools for reviewing large numbers of public comments.

Texifter Contact
Stuart Shulman
http://texifter.com
stu@texifter.com

Posted in DiscoverText, product

School Bullying Research Using DiscoverText

Our Vanderbilt University team uses DiscoverText (DT) to support qualitative text analysis of 8,531 high school students’ responses about their in-school experiences of bullying. DiscoverText has offered us powerful ways to perform key steps throughout our coding process. Fundamentally, DT supports parsing our large data set into archives, buckets, and datasets. Thus, we are able to focus on key portions of our large data set to hone our initial hierarchical coding structure while retaining the ability to return to an untouched dataset for final coding. We use the diverse annotation tools in DiscoverText to mark singular problematic items for discussion at meetings. Our team was able to develop a complex coding structure with 58 codes (at one point we had 128), and begin coding in a month and a half. Undoubtedly, DiscoverText’s robust organizational and annotation tools, within an easy-to-use user interface, supported expediency.

Following the development of our coding structure we employed DiscoverText’s analytic tools to better understand and improve our team’s inter-coder reliability. DT’s real-time coding analytics support decision-making in meetings. Through the use of these tools, we raised our coding reliability from a .2 Kappa value to a .82 Kappa value after five training rounds. Given that four coders are using 58 hierarchical codes to code over 8,000 free-response items, the numbers represent a phenomenal increase in reliability. Presently, we are halfway through coding the 8,531 items using overlapping coding patterns to ensure reliability. Our team members share their experiences below:

“I am currently working with a research team that must code students’ responses about their bullying experiences. I had never coded before and was introduced to DiscoverText only a few months ago. Fortunately, I have found DiscoverText to be very user-friendly and easy to navigate. Despite my lack of formal coding experience, I have found the program to run smoothly and have already learned a great deal in such a short period of time. My favorite feature thus far would have to be the code-by-code comparisons. This allows us to discuss any discrepancies among the research team and to increase our reliability. I have enjoyed exploring the features of this program and look forward to discovering what more it can do.” – Abbie, undergraduate, Human and Organizational Development, honors track.

“My team is using DiscoverText to code thousands of brief responses to a survey question about bullying. As someone who is new to qualitative research and coding programs, I have found DiscoverText easy to use. The coding process was very easy for me to learn, and I quickly became efficient at coding responses. Our initial looks at code comparisons have been fairly straightforward for me to figure out as well. As we move forward with more analysis, I anticipate other functions and features of DiscoverText will be similarly straightforward, and I will see more of the power of the program.” - Brian, master’s student, Human Development Counseling.

“I’m working with DiscoverText as part of an academic research team analyzing high school students’ qualitative responses to questions about bullying. As we have been coding responses, we have found the coding process fairly smooth, although not without a few features that we would have done differently. Still, the process of coding is similar to that of other qualitative coding software (I’ve used NVivo). We haven’t yet gotten into any sophisticated filtering or analysis, but I’m expecting that it will be really useful. The biggest impression I’m left with after my three months of using DiscoverText is that it’s a powerful tool, and we’ve only scratched the surface of what it can do.” - Ben, doctoral student, Community Research and Action.
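For readers unfamiliar with the Kappa values mentioned above, here is a minimal sketch of how one common variant, Cohen's kappa for two coders, is computed. The post does not say which kappa statistic DiscoverText reports, and a four-coder team may well use a multi-rater variant, so treat this as an illustration only. The function name, code labels, and sample data are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(coder_a)

    # Observed agreement: share of items given identical codes.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: for each code, the product of the two coders'
    # marginal proportions, summed over all codes.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical code for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes applied by two coders to six survey responses.
coder_1 = ["physical", "verbal", "verbal", "none", "none", "none"]
coder_2 = ["physical", "verbal", "none", "none", "none", "verbal"]
print(round(cohens_kappa(coder_1, coder_2), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is a stricter measure than percent agreement when some codes are used far more often than others.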

Overall, DiscoverText enabled our team’s timely progress through a complex research process. Following coding, we intend to make use of DT’s metadata “tagging” capabilities such that we can meaningfully export coded response summaries to their “tagged” respective schools. Finally, we intend to continue to explore the useful capabilities of DT in our research. We find DiscoverText easy-to-use and helpful – our questions have been kindly answered by the Texifter support team or solved through processing the helpful support material on DT’s support site!

Thanks a lot DiscoverText!
Joseph H. Gardella

Posted in Coding, research

Five Pillars of Text Analytics

Document relevance is a key challenge for social media research. The specific problem of “word sense disambiguation” is widespread. If I am interested in “banks” where money is stored, I want to exclude mentions of river banks. If I am “Delta” airlines, I do not want to see social data about Delta faucets, Delta force, or those pesky river deltas. If I run a sports team like the Pittsburgh Penguins, the massive numbers of Facebook posts and Tweets about flightless but adorable birds are equally problematic. There are very few social media analytics projects that can easily avoid the challenge of sorting relevant and irrelevant documents.

At Texifter, we have refined a powerful set of tools and techniques for doing word sense disambiguation. This 5-minute video uses the example of Governor Chris Christie to illustrate how the five pillars of text analytics can help anyone to identify and remove irrelevant documents from an ambiguous social data collection. The principles are very similar to spam filtering in email; we use the same mathematics. Using DiscoverText, we argue an individual or small collaborative team can create a custom machine classifier for the task in just a few hours. Someday, we hope to get this down to a few minutes.
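To make the spam-filtering analogy concrete, here is a small sketch of a multinomial naive Bayes classifier, a technique widely used for email spam filtering. The post does not name the algorithm behind DiscoverText's machine classifiers, so this is an assumption chosen for illustration, and the class name, labels, and training sentences are all hypothetical. It sorts posts about the Pittsburgh Penguins from posts about the birds, based on a handful of human-coded examples.

```python
import math
from collections import Counter


class RelevanceClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    LABELS = ("relevant", "irrelevant")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        return text.lower().split()

    def train(self, text, label):
        """Add one human-coded document to the training data."""
        if label not in self.LABELS:
            raise ValueError(f"Unknown label: {label}")
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def classify(self, text):
        """Return the label with the highest log-probability score."""
        if any(self.doc_counts[label] == 0 for label in self.LABELS):
            raise ValueError("Train at least one document per label first.")
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in self.tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


clf = RelevanceClassifier()
clf.train("penguins win the hockey game tonight", "relevant")
clf.train("penguins score late goal in overtime", "relevant")
clf.train("penguins waddle across the antarctic ice", "irrelevant")
clf.train("baby penguins hatch at the zoo", "irrelevant")

print(clf.classify("penguins hockey goal"))
print(clf.classify("penguins at the zoo"))
```

In practice the training examples would come from human coding of a sample of the collection, and the classifier would then be applied to the remaining uncoded documents.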

Posted in DiscoverText, general, product, research, Social Media

Big Data TechCon

 

Posted in general