Texifter in MassChallenge Semifinal

Texifter moved into the MassChallenge semifinals earlier this week. Coming right after our pitches to River Valley Investors and LaunchPad Investors, this will be our third pitch in less than a month.

MassChallenge was lauded by President Obama at the 2012 State of the Union address. It is a start-up accelerator located in downtown Boston and the first program allowing start-ups to compete for resources. Start-ups chosen as finalists receive free office space, world-class mentoring, media coverage, and a share of $1 million in cash awards.

The “challenge” started in January and drew 1,237 applicants from 35 countries and 36 states. The next round commences on Tuesday, May 15th. If you are interested, please take a look at our MassChallenge profile and give us a thumbs up.

Posted in general

Sina Weibo Working Group

Texifter-Sponsored Sina Weibo Working Group

Contact josh@discovertext.com to join the working group. Free Enterprise software for active group members.

The Sina Weibo Working Group is off to a wonderful start. We invite other students of Chinese micro-blogging to join the group. Active group members receive a free individual Enterprise DiscoverText license.

Contact josh@discovertext.com for more information.

Posted in DiscoverText, research

Election Analytics Part I: Romney Trends in April

[In the coming months leading up to the U.S. presidential election, the Texifter blog will feature a multi-part series on analytical insights into the election that can be gained by applying a host of social media analytics (SMA) methods within DiscoverText. The following is the first post in this series.]

The month of April saw dramatic shifts in the landscape of the Republican presidential nomination. When the month began, Mitt Romney and Rick Santorum were still battling for support of the Republican base. And while many believed Rick Santorum’s ardent conservatism  would enable him to remain in the race for quite some time, he ultimately conceded on April 10th, giving way to Romney’s presumptive Republican nomination. Meanwhile, and in spite of their inevitable loss to the Romney establishment, Newt Gingrich and Ron Paul remained in the race through April – largely on principle.

Throughout April, millions of individuals utilized social media to express their feelings about this heated Republican primary race as well as about specific Republican candidates. So as April ultimately became the month during which Romney’s nomination solidified, the question on many minds is: will Conservatives, Independents, and undecideds coalesce around the moderate Romney? Social media provides a wealth of data from which one might begin to answer that question (which will continue to be answered in subsequent blog posts regarding sentiments, topics, and natural language processing).

For this first experiment, I kept things simple: I decided to investigate which hashtags were most commonly and most persistently associated with “Romney” on Twitter throughout the month.

To do this, I began by setting up a live feed from the Public Twitter API in DiscoverText on April 9th. (My dataset therefore spans from mid-day on April 9th until 11:59pm on the 30th.) Once my data collection was complete, I created individual “buckets” (or saved search results), each containing a full day’s worth of “Romney” Tweets. Next, I used the software’s “TopMeta Discovery” function to view the most frequently occurring hashtags from each day in April. I then exported each list of hashtags with the number of times each occurred per day.

Each list of hashtags-by-day contained between 2,000 and 4,000 unique hashtags. Since my interest – for this post – is to examine the volume of the most common and most consistent hashtags, I decided to examine only the top 25 hashtags from each day. This brought the total number of hashtags to analyze from over 50,000 down to 525, of which 93 were unique. Of those 93 hashtags, only 13 appeared among the top 25 on every day. The following charts demonstrate the proportional change in volume that each of these 13 hashtags experienced in April:


All 13 Hashtags:


A closer look at the top 5:

A closer look at the next 8:
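For readers who want to reproduce the top-25 filtering outside DiscoverText, here is a minimal Python sketch. It is not the TopMeta Discovery function itself; it assumes the exported lists have already been loaded into a dictionary that maps each day to a dictionary of hashtag counts, and the function names are illustrative only.

```python
from collections import Counter


def top_hashtags_by_day(daily_counts, n=25):
    """Map each day to the set of its n most frequent hashtags."""
    return {
        day: {tag for tag, _ in Counter(counts).most_common(n)}
        for day, counts in daily_counts.items()
    }


def unique_and_consistent(daily_counts, n=25):
    """Return (hashtags in any day's top n, hashtags in every day's top n)."""
    tops = list(top_hashtags_by_day(daily_counts, n).values())
    if not tops:
        return set(), set()
    unique = set().union(*tops)            # appeared in at least one daily top n
    consistent = set.intersection(*tops)   # appeared in the top n on every day
    return unique, consistent
```

Applied to the data described above, the first set would correspond to the 93 unique hashtags and the second to the 13 that made the top 25 every day.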


Initial Observations
With even a quick glance at the five most consistent and most common hashtags (those that co-occur in Romney tweets in April), it is easy to see a pattern in the daily volume of “GOP” and “Obama” hashtags. The volume of one rarely deviates from the other, whereas the hashtags “Romney,” “P2,” and “TCOT” differ substantially in volume while still deviating in a discernible pattern.

The daily volume of the remaining eight hashtags – on the other hand – is rather sporadic. With a cursory glance, one can plainly see the dramatic range of both “withNEWT” and “RonPaul” hashtags. Perhaps due to the concession of Rick Santorum, “withNEWT” hashtags saw an early spike in volume on the 10th, only to plummet and remain low in volume until a sudden rise and fall between the 20th and the 23rd. And while “withNEWT” saw a rapid fall in volume on the 22nd, the volume of “RonPaul” tweets rose rapidly on that same day.

Hashtags only reveal so much insight. They may provide topic (and co-occurring topic) volume, but they provide little to no true context. Until the text of these Tweets is closely analyzed, one can only speculate as to why these rapid volume shifts took place and what meaning these labels truly hold. Keep a lookout for a follow-up post about how you can use DiscoverText to uncover more nuanced context with other text analysis methods. In the meantime, if you’re interested in getting started with the software or have any questions, you can email me at Josh@discovertext.com.

Note: All tweet volume here is proportionate to the sample of tweets provided by the Public Twitter API and does not reflect the true volume of Romney Tweets. The sample volume of these “Romney” tweets is shown below:

Posted in DiscoverText, research, Twitter

Texifter @ #IHRIMconf

Most of the company came to Chicago this week to make our debut presenting and exhibiting at IHRIM, an international meeting for Human Resources professionals.

I smell business leads in my sleep, not that I sleep, I'm just sayin'.

We were pleasantly surprised to find we were the only text analytics firm making a play for the business. Our conversations with firms (large and small) taught us a great deal about the space and the needs of HR professionals. Plus, we had a wicked battle of nerf darts with the good folks @SPARC.

One of the big discoveries for us is that many firms, large and small, shy away from asking open-ended questions in their employee engagement surveys. The reason? They cannot imagine going through thousands or tens of thousands of responses by hand. Well, you should have seen their eyes go wide as we explained our work with Google, JetBlue and Marsh on precisely that sort of unstructured text data. Our team is bringing a rigorous and well-researched set of methods, as well as bleeding-edge tools, into a part of corporate America that is currently trying to make do with one of the amazing software creations of the previous century: the spreadsheet!

#LongStoryShort – Though my university teaching career just ended, the 20 years of devotion to being an educator goes on. This is my new classroom and I love it.

What do I need to do to have you drive this software off the lot?

The team poses; Stu plays yoyo

Posted in DiscoverText

“You Don’t Say”

I am looking forward to making a joint presentation with JetBlue employee Jeremy Kasle at the upcoming 2012 SIOP conference in San Diego, CA. Our panel also has experts from Dell & Google and we anticipate sharing best practices with a wide range of industry professionals. In addition, Texifter is rolling out our brand new People Analytics brochure at this event, and again the following week at IHRIM.

This is an exciting time for the start-up. The DiscoverText development team keeps pumping out popular new features at the request of our clients and each week the tools and techniques become more powerful and easier to use.

Posted in DiscoverText, general

As Syria Falls, Russia Rises…on Twitter

Bordering Turkey, Israel, Lebanon, Iraq, and Jordan, Syria has served as a linchpin of geopolitical stability in the Middle East for decades. In spite of its many years of single-party dictatorial rule, its track record of Lebanese encroachment, and its continuous support of organizations resisting Israel, Syria’s minority-led dictatorship has ensured that foreign political influence over the state (and, by extension, the region) remains largely static. By way of example, Syria has – for decades – allowed Iranian money and military supplies to pass through the country, while also housing Russia’s sole Mediterranean naval base – a critical strategic position it has held since 1971.

The Arab uprisings that have taken place since the end of 2010 have rocked this dynamic of static influence. Syria, since its initial uprisings in January of 2011, has grown unstable. With the country now at the cusp of civil war, Russian and Iranian interests are gravely threatened by the prospect of a new regime and new foreign influencers – namely those supported by the United States, Europe, Saudi Arabia and the Muslim Brotherhood.

In response, Russia has increased its military presence in the country and has signaled a growing alliance with the Bashar al-Assad regime. A critical question to ask at this time is: How does the Arab world perceive this foreign influence? If this can be answered, one can more accurately predict the likelihood of an extended proxy war in Syria.

To comprehensively answer this question, one would likely have to gather a wide variety of data, reflecting all of the geopolitical realities of the region. Instead (and as a starting point) I have pinpointed and analyzed Arabic discussion of Russians on Twitter in hopes of gaining a greater appreciation of 1) how Arabs are discussing Russians, 2) the volume of Arabic social media about Russians that is focused upon Syrian intervention, and 3) the likelihood of Arab resistance to growing Russian influence.

For this experiment, I utilized DiscoverText – the commercial text analytics solution from Texifter – to capture nearly three months of Tweets containing the word “روسية” (“Russian” in Arabic). I then created a topic model (using natural language processing) to classify the content of those tweets according to their relevance to the Syrian uprisings as well as the presence of Islamist rhetoric. The following classification charts demonstrate how these trends have shifted between February and April on Twitter.

First week of February:

First week of March:

First week of April:

Analysis: 

While Arabic social media witnessed a spike in Russian news and pop culture content in March, it should be noted that the proportion of “روسية” Tweets regarding Russian political and military action in Syria nearly returned to its February proportion (of 44%) in April at 41%. (Tangentially, one might also suggest that this fluctuation in “irrelevant” Russian content is indicative of a burgeoning cultural exchange between Russia and the Arab world.) In any case, Arabs on social media are plainly recognizing that Russians are a burgeoning intervening force. One will also notice a slight drop in general statements about Syria (such as neutral statements regarding Syria’s historical relationship with Russia). This is not surprising considering that regional tensions have sharply risen since February and Russia’s political influence and military presence in the country have steadily increased.
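Proportions like the 44% and 41% quoted here are simply each category's share of one week's classified Tweets. As a minimal sketch (not the DiscoverText classifier), assuming the classification step has already produced one label per Tweet, the shares could be tallied in Python as follows; the function name and any label names are hypothetical.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the total as a whole-number percentage."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total) for label, n in counts.items()}
```

Running this once per week of labels gives the figures needed to compare February, March, and April side by side.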

Perhaps most intriguing is the drop in Islamist rhetoric displayed since February. Granted, religious-based calls for external fundraising and rebel rearmament can still be found, but what explains this decrease? This question gives rise to a host of theories. First, it is distinctly possible that as the Syrian regime grows desperate for control, the Islamist support for the “Free Syrian Army” and the “Syrian National Council” has grown quiet, driving its social media conversation deeper underground. Second, one might also suggest that this decrease in Islamist rhetoric is – in part – due to the recent influx of money and weaponry provided by Western nations as well as Saudi Arabia since March. In other words, the decrease in rebel desperation for weaponry has – perhaps – softened those Islamist cries for aid on social media.

Ultimately, the relationship between the decreasing Islamist rhetoric about Russians in the region and the increasing Russian intervention in Syria merits further study, as it will shed light on the probability of conflict between those two influences. Text Analytics can serve a critical purpose here, as sentiment analyses and further topic modeling will be essential for predicting the behavior of states and non-state actors in a time of great geopolitical change in Syria and the region.


Research for this project was contributed by Lina Shaikhouni. She can be reached at linashaikhouni@gmail.com.

Posted in general

The DiscoverText API

Many have asked for it, and your requests have not fallen on deaf ears. We are proud to announce the DiscoverText development team is testing an Application Programming Interface (API) to our internal “sifter” data structures. The new DiscoverText API will allow authorized DiscoverText users to access their data and perform DiscoverText functions from third party client applications.

The DiscoverText API uses standard JSON for all object output and conforms to the OAuth 1.0 standard for authentication to ensure your data is safe. More information can be found on the DiscoverText API Documentation website.
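To give a feel for what OAuth 1.0 authentication involves on the client side, here is a minimal Python sketch of request signing using only the standard library. It follows the HMAC-SHA1 method from the OAuth 1.0 specification (RFC 5849); whether the DiscoverText API uses that particular signature method is an assumption, and any URL or parameter passed in is a placeholder rather than a real DiscoverText endpoint.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode a value, leaving only RFC 3986 unreserved characters."""
    return quote(str(value), safe="")


def signature_base_string(method, url, params):
    """Build the OAuth 1.0 signature base string.

    `url` is the base URL without a query string; `params` holds the query,
    body, and oauth_* parameters (except oauth_signature).
    """
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(url), pct(normalized)])


def sign_hmac_sha1(base_string, consumer_secret, token_secret=""):
    """Sign the base string with the consumer and token secrets."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

The resulting signature is sent as the oauth_signature parameter. In practice most clients would rely on an existing OAuth library rather than hand-rolling this step.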

Still in its early alpha phase, the DiscoverText API offers access to projects, archives, buckets and datasets. Functionality is available for obtaining an archive’s de-duplicated units, near-duplicate clustering, as well as machine classification scores.

If you need functionality that is not currently included in the API, please let us know. At the moment, we are offering the API only to DiscoverText Enterprise customers. If you are interested, please contact us.

Posted in API, DiscoverText

Machine Translate Archives

I’ve been experimenting with our machine translation beta and the new Sina Weibo feed. For example, I have a small archive responsive to the search 柳时元.

After running the Microsoft-enabled machine translation, this is what I saw:

If you would like to join the machine translation beta test group, please let me know. Just send a short email to stu@texifter.com. We can use the help of experts in a variety of languages as well as non-fluent students of online engagement who would like to sample content machine translated from a variety of social media in various languages.

Posted in DiscoverText, product

Tracking Anti-Smoking Campaign Effects Online

The use of social media has grown exponentially over the last several years.  In fact, most television programs and televised advertising have a social media component, designed to expand reach and engagement with the audience.   To date, the tobacco control community has relied on traditional media—paid television, radio, billboard and print media advertising—to promote their messages.

On March 19, 2012, the Centers for Disease Control and Prevention (CDC) launched Tips from Former Smokers. This campaign is the CDC’s largest anti-smoking campaign ever and its first national advertising effort. The campaign will last four months and consist of both traditional and social media. The Health Media Collaboratory at the University of Illinois at Chicago, directed by Sherry Emery, PhD, will measure and evaluate a key social media component of the campaign: its Twitter reach and impact.

Using DiscoverText with GNIP’s Power Track provides full access to Twitter’s Firehose. This is in contrast to Twitter’s publicly available API stream, which provides only a 1% sample of tweets. Because the volume of tweets for health social media campaigns is relatively low, every tweet matters. Access to GNIP’s premium Twitter feed allows us to capture all tweets and metadata for the campaign.

The use of DiscoverText to sift through tweets and code for content provides a useful tool for measuring online public engagement, audience sentiment, and campaign discourse.

The Collaboratory will report on the overall reach and audience engagement of the campaign through an analysis of unique users reached, number of retweets, and mentions. This information will not only track the engagement of individual users but also measure the engagement of state tobacco control programs in the campaign. A sentiment analysis will be conducted on tweets to gauge the emotional valence of the campaign and individual television ads. Finally, using root keywords for quitting and smoking uptake, the number of Twitter users who express interest in quitting or prevention will be reported.
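The basic reach counts described above (unique users, retweets, and mentions) can be illustrated with a short Python sketch. This is not the Collaboratory's actual method; it assumes each tweet is a dictionary with hypothetical "user" and "text" fields, and it treats any text beginning with "RT @" as a retweet.

```python
import re

# An @handle that is not preceded by a word character (so e-mail addresses are skipped).
MENTION = re.compile(r"(?<!\w)@(\w+)")


def reach_summary(tweets):
    """Count unique users, retweets, and @mentions in a list of tweets."""
    users = {t["user"].lower() for t in tweets}
    retweets = sum(1 for t in tweets if t["text"].startswith("RT @"))
    # The handle credited in a retweet is counted as a mention too.
    mentions = sum(len(MENTION.findall(t["text"])) for t in tweets)
    return {"unique_users": len(users), "retweets": retweets, "mentions": mentions}
```

A fuller analysis would work from the retweet and mention metadata delivered with each tweet rather than from the text alone.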

For more information about this project, visit http://go.uic.edu/HealthMediaCollab or follow @GLENszczypka for updates. Research funded by the National Cancer Institute (Grant No. 1U01CA154254).

Posted in DiscoverText, general, GNIP, product, Twitter

Browsing “Disqus” Clusters

We are experimenting with the new “Disqus” API in house. In this video, I show results from a 5-day query against the public API for the term Syria. The 50,000+ results are drawn from a variety of mass media discussion boards, including Al Jazeera, Fox News & CNN. When the results are de-duplicated and machine clustered, an interesting pattern of form-letter commenting is revealed. Further research is needed to see if this is spontaneous or coordinated. Either way, it is a novel parallel to mass e-mail campaigns.
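To illustrate how de-duplication and clustering can surface form-letter comments, here is a minimal Python sketch. It is not DiscoverText's clustering algorithm; it swaps in a simple, well-known technique (word shingles compared by Jaccard similarity, with greedy grouping), and the threshold and shingle size are arbitrary assumptions.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in the text, lowercased."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_near_duplicates(comments, threshold=0.6, k=3):
    """Drop exact duplicates, then greedily group near-duplicates."""
    unique = list(dict.fromkeys(c.strip() for c in comments))  # keeps first-seen order
    clusters = []  # each entry: (shingles of the first member, list of members)
    for comment in unique:
        s = shingles(comment, k)
        for representative, members in clusters:
            if jaccard(s, representative) >= threshold:
                members.append(comment)
                break
        else:
            clusters.append((s, [comment]))
    return [members for _, members in clusters]
```

Clusters with many members that differ by only a few words are the kind of form-letter pattern described above.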

Posted in API, DiscoverText, Disqus, product

Decoding the Hunger Games

Essentially, everybody is trying to tell a story these days. Unfortunately, what we’ve been finding is that the stories are getting increasingly prescribed. That’s true for media and brands. There’s a confidence that gets lost when the question of who owns the narrative comes up.

So, when we saw that the Hunger Games series was being made into a film, we decided it was the perfect property to track. Will the movie live up to the codes the book established? Will the die-hard readers find the movie believable? Will it bridge the gap to the movie-only goers? Will the movie-first viewers go back and read the books? What stories might evolve? What stories should evolve? Those answers lie in the surrounding cultures and looking at what real people are putting out there for us to find. That said, we set out on a mission to uncover the grammar and signs surrounding the Hunger Games before the movie released.

We started the process by using DiscoverText to source and analyze related social media. We filtered those findings through our Culture Mapping process.

We’ve found DiscoverText to be an engine that lets us get in and dig. It’s what you’re trained to do as a hunter of information. We don’t like walls. We don’t measure the loudest signals, only the most intriguing ones. Loud squelches the source behind the impact and you miss the kinetics of human nature.

You can read our initial Hunger Games hypothesis here (and below).



Posted in general

Texifter, JetBlue & Google @SIOP

Prior to heading to IHRIM in Chicago at the beginning of May, Texifter will be making an appearance in San Diego at the 2012 Society for Industrial and Organizational Psychology (SIOP) Conference. SIOP is a division of the American Psychological Association and an Organizational Affiliate of the Association for Psychological Science. The San Diego event is the largest gathering of Industrial Psychologists in the country. The event, held over four days, will include an exhibition, educational talks, workshops, and even a 5K road race.

On Saturday, April 28th, at 10:30AM, Texifter CEO Dr. Stuart W. Shulman will make his debut at SIOP on a panel discussing innovative approaches to employee engagement survey analytics. Dr. Shulman will be speaking with members of the HR teams at Dell Computers, Google, and JetBlue Airways, all companies that have used DiscoverText for text analytics. Other members of the panel include professionals from CLC Genesee and CEB Valtera, both employee engagement survey providers. Specifically, the panel will speak about their experiences in harvesting, organizing, and deriving insight from qualitative data, and how qualitative data can be used in tandem with quantitative data points.

For information on getting together at SIOP or IHRIM, you may contact me at joe@discovertext.com.

Posted in general

Weibo Arrives @DiscoverText

After months of user anticipation, the Weibo API is now integrated into DiscoverText, our “do-it-yourself” text analytic solution for deriving insight from unstructured text.

With over 300 million users, Weibo is the largest micro-blogging website in China. Its integration into DiscoverText will empower social media analysts in the United States, China, and around the world. Fused with DiscoverText’s range of functions (including advanced metadata filtering, near-duplicate text detection, auto-translation, and machine classification), the Weibo integration opens the door to exciting new multi-lingual research possibilities. If you’re interested in test driving the Weibo API, email me at josh@discovertext.com and Texifter personnel can help get you started.

Posted in DiscoverText, general, product, research, Twitter