Most of the company came to Chicago this week to make our debut presenting and exhibiting at IHRIM, an international meeting for Human Resources professionals.
I smell business leads in my sleep, not that I sleep, I'm just sayin'.
We were pleasantly surprised to find we were the only text analytics firm making a play for the business. Our conversations with firms (large and small) taught us a great deal about the space and the needs of HR professionals. Plus, we had a wicked battle of nerf darts with the good folks @SPARC.
One of our big discoveries is that many firms shy away from asking open-ended questions in their employee engagement surveys. The reason? They cannot imagine going through thousands or tens of thousands of responses by hand. Well, you should have seen their eyes go wide as we explained our work with Google, JetBlue and Marsh on precisely that sort of unstructured text data. Our team is bringing a rigorous and well-researched set of methods, as well as bleeding-edge tools, into a part of corporate America that currently makes do with one of the amazing software creations of the previous century: the spreadsheet!
#LongStoryShort – Though my university teaching career has just ended, my 20 years of devotion to being an educator go on. This is my new classroom and I love it.
What do I need to do to have you drive this software off the lot?
Due to popular demand, the product development team at Texifter is proud to announce that “TopMeta” is now exportable! What does this mean, you might ask?
What is TopMeta?
When you import either your own data or live social media feeds into DiscoverText, that data often includes various “metadata” that reveal a wealth of information about the Tweet, Facebook post, public comment, or survey response you will be analyzing. “TopMeta Explorer” is the DiscoverText function that lists the most (or least) frequently occurring metadata values, with their counts, and lets you filter your data by those values. Given how much metadata your data may contain, the ability to organize and filter it easily may turn out to be the difference between substantive and inadequate research.
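TopMeta Explorer does this inside DiscoverText. The underlying idea is simple: tally every value of one metadata field, rank the values by frequency, and then keep only the items that carry a chosen value. The Python sketch below shows that idea. The record layout and the field names (`meta`, `screen_name`, `hashtags`) are hypothetical placeholders, not DiscoverText's internal schema.

```python
from collections import Counter

def _values(record, field):
    """Return a field's metadata values as a list (multi-valued fields stay lists)."""
    value = record["meta"].get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def top_meta(records, field, n=5, least=False):
    """Return the n most (or, with least=True, least) frequent values of one field."""
    counts = Counter()
    for record in records:
        counts.update(_values(record, field))
    ranked = counts.most_common()  # highest count first
    if least:
        ranked.reverse()
    return ranked[:n]

def filter_by_meta(records, field, wanted):
    """Keep only the records whose metadata field contains the wanted value."""
    return [r for r in records if wanted in _values(r, field)]

# Hypothetical sample records, not DiscoverText's storage format.
tweets = [
    {"text": "first", "meta": {"screen_name": "alice", "hashtags": ["syria", "news"]}},
    {"text": "second", "meta": {"screen_name": "bob", "hashtags": ["syria"]}},
    {"text": "third", "meta": {"screen_name": "alice", "hashtags": []}},
]

print(top_meta(tweets, "hashtags"))                         # [('syria', 2), ('news', 1)]
print(len(filter_by_meta(tweets, "screen_name", "alice")))  # 2
```

Multi-valued fields such as hashtags are counted once per value. That is why a single tweet can contribute to several rows of the ranking.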
Metadata is Power
When might the organization of metadata come in handy, you may also ask? The answer is easy to imagine when you consider the kinds of metadata you may collect from live feeds such as the public Twitter API or GNIP PowerTrack. From those feeds alone, you may collect any of the following metadata (depending on your search method):
1) the time and date of a tweet,
2) the account name of the tweet’s sender,
3) the real name of the tweet’s sender,
4) the “hashtags” in a tweet,
5) the account name(s) “mentioned” in a tweet,
6) the shortened URL in a tweet,
7) the expanded URL in a tweet,
8) a link to the tweet itself,
9) a direct link to the media in a tweet,
10) the geo-coordinates from which a tweet is sent,
11) the number of “followers” of a tweet’s sender,
12) the number of accounts a tweet’s sender is “following”,
13) the date that a tweet sender’s account was created,
14) the city of the tweet sender, and
15) the “Klout” score of the tweet’s sender.
Exporting TopMeta
Until now, the “TopMeta Explorer” function has allowed users to sort this kind of metadata only within DiscoverText.
As of this week, this metadata can be exported as a .CSV file, making it easier for Enterprise DiscoverText users to use DiscoverText in tandem with their other research tools. We’ll continue to keep you posted about exciting new developments in DiscoverText as they launch. If you are interested in trying DiscoverText for yourself, sign up at discovertext.com and email me at josh@discovertext.com. I’ll be happy to get you started.
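An exported metadata table can be read by spreadsheets and statistics packages. The sketch below uses Python's standard `csv` module to write that kind of table and then read it back. The three-column layout (`field`, `value`, `count`) is an assumption made for illustration; it is not the actual column layout of DiscoverText's export.

```python
import csv
import io
from collections import Counter

def export_top_meta_csv(counts, field, out):
    """Write one (field, value, count) row per metadata value, most frequent first."""
    writer = csv.writer(out)
    writer.writerow(["field", "value", "count"])
    for value, count in counts.most_common():
        writer.writerow([field, value, count])

# Illustrative counts; an in-memory buffer stands in for a file on disk.
hashtag_counts = Counter({"syria": 120, "russia": 45, "un": 12})
buffer = io.StringIO()
export_top_meta_csv(hashtag_counts, "hashtags", buffer)

# Reading the CSV back, as a spreadsheet or statistics package would.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows[1])  # ['hashtags', 'syria', '120']
```

To write to a real file, open it with `newline=""` (for example `open("top_meta.csv", "w", newline="", encoding="utf-8")`), as the `csv` module documentation recommends. This also keeps non-ASCII values such as Arabic or Chinese text intact.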
I am looking forward to making a joint presentation with JetBlue employee Jeremy Kasle at the upcoming 2012 SIOP conference in San Diego, CA. Our panel also includes experts from Dell and Google, and we anticipate sharing best practices with a wide range of industry professionals. In addition, Texifter is rolling out our brand-new People Analytics brochure at this event and again the following week at IHRIM.
This is an exciting time for the start-up. The DiscoverText development team keeps pumping out popular new features at the request of our clients and each week the tools and techniques become more powerful and easier to use.
Bordering Turkey, Israel, Lebanon, Iraq, and Jordan, Syria has served as a linchpin of geopolitical stability in the Middle East for decades. In spite of its many years of single-party dictatorial rule, its track record of encroachment in Lebanon, and its continuous support of organizations resisting Israel, Syria’s minority-led dictatorship has ensured that foreign political influence over the state (and, by extension, the region) remains largely static. For example, Syria has for decades allowed Iranian money and military supplies to pass through the country, while also housing Russia’s sole Mediterranean naval base, a critical strategic position that Moscow has held since 1971.
The Arab uprisings that have taken place since the end of 2010 have rocked this dynamic of static influence. Since its initial uprisings in January 2011, Syria has grown unstable. With the country now on the cusp of civil war, Russian and Iranian interests are gravely threatened by the prospect of a new regime backed by new foreign influencers, namely the United States, Europe, Saudi Arabia and the Muslim Brotherhood.
In response, Russia has increased its military presence in the country and has signaled a growing alliance with the Bashar al-Assad regime. A critical question to ask at this time is: How does the Arab world perceive this foreign influence? If this can be answered, one can more accurately predict the likelihood of an extended proxy war in Syria.
To answer this question comprehensively, one would likely have to gather a wide variety of data reflecting all of the geopolitical realities of the region. Instead (and as a starting point), I have pinpointed and analyzed Arabic discussion of Russians on Twitter in hopes of gaining a greater appreciation of 1) how Arabs are discussing Russians, 2) the volume of Arabic social media about Russians that is focused on Syrian intervention, and 3) the likelihood of Arab resistance to growing Russian influence.
For this experiment, I used DiscoverText, the commercial text analytics solution from Texifter, to capture nearly three months of Tweets containing the word “روسية” (“Russian” in Arabic). I then created a topic model (using natural language processing) to classify the content of those tweets according to their relevance to the Syrian uprisings and the presence of Islamist rhetoric. The following classification charts show how these trends shifted on Twitter between February and April.
First week of February:
First week of March:
First week of April:
Analysis:
While Arabic social media witnessed a spike in Russian news and pop culture content in March, the proportion of “روسية” Tweets about Russian political and military action in Syria nearly returned to its February level (44%) in April, at 41%. (Tangentially, one might also suggest that this fluctuation in “irrelevant” Russian content points to a burgeoning cultural exchange between Russia and the Arab world.) In any case, Arabs on social media plainly recognize that Russia is a growing intervening force. One will also notice a slight drop in general statements about Syria (such as neutral statements about Syria’s historical relationship with Russia). This is not surprising, considering that regional tensions have risen sharply since February and that Russia’s political influence and military presence in the country have steadily increased.
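Percentages like the 44% and 41% above are shares of each week's coded tweets. The sketch below assumes each tweet has already been assigned exactly one category, whether by hand or by a classifier, and computes the per-week shares from those labels. The category names and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter, defaultdict

def weekly_shares(coded):
    """Turn (week, category) labels into per-week category percentages."""
    per_week = defaultdict(Counter)
    for week, category in coded:
        per_week[week][category] += 1
    shares = {}
    for week, counts in per_week.items():
        total = sum(counts.values())
        # Each category's share of that week's coded tweets, in percent.
        shares[week] = {cat: round(100 * n / total, 1) for cat, n in counts.items()}
    return shares

# Toy labels for illustration only (hypothetical weeks and categories).
sample = [
    ("W1", "syria_intervention"), ("W1", "syria_intervention"),
    ("W1", "pop_culture"), ("W1", "islamist"),
    ("W2", "syria_intervention"), ("W2", "pop_culture"),
    ("W2", "pop_culture"), ("W2", "pop_culture"), ("W2", "other"),
]
result = weekly_shares(sample)
print(result["W1"]["syria_intervention"])  # 50.0
```

Each week is normalized by its own total. A share can therefore stay flat even when raw volume changes, as it did with the March spike in news and pop culture content.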
Perhaps most intriguing is the drop in Islamist rhetoric since February. Granted, religiously based calls for external fundraising and rebel rearmament can still be found, but what explains this decrease? This question gives rise to a host of theories. First, it is distinctly possible that as the Syrian regime grows desperate for control, Islamist support for the “Free Syrian Army” and the “Syrian National Council” has grown quieter, driving its social media conversation deeper underground. Second, one might also suggest that this decrease in Islamist rhetoric is due in part to the influx of money and weaponry that Western nations and Saudi Arabia have provided since March. In other words, the rebels’ easing need for weaponry has perhaps softened those Islamist cries for aid on social media.
Ultimately, the relationship between the decreasing Islamist rhetoric about Russians in the region and the increasing Russian intervention in Syria merits further study, as it will shed light on the probability of conflict between those two influences. Text analytics can serve a critical purpose here, as sentiment analyses and further topic modeling will be essential for predicting the behavior of states and non-state actors in a time of great geopolitical change in Syria and the region.
Many have asked for it, and your requests have not fallen on deaf ears. We are proud to announce that the DiscoverText development team is testing an Application Programming Interface (API) to our internal “sifter” data structures. The new DiscoverText API will allow authorized DiscoverText users to access their data and perform DiscoverText functions from third-party client applications.
The DiscoverText API uses standard JSON for all object output and conforms to the OAuth 1.0 standard for authentication to ensure your data is safe. More information can be found on the DiscoverText API Documentation website.
Still in its early alpha phase, the DiscoverText API offers access to projects, archives, buckets and datasets. Functions are available for obtaining an archive’s de-duplicated units, its near-duplicate clusters, and its machine classification scores.
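The post does not describe how DiscoverText de-duplicates units. One common approach is to normalize each text (lower-case it and collapse whitespace), hash it, and keep the first copy of each hash along with a count. The sketch below shows that approach as an assumption, not as DiscoverText's method. It only catches copies that are identical after normalization; near-duplicates need a fuzzier comparison.

```python
import hashlib
import re

def normalize(text):
    # Lower-case and collapse whitespace so trivial differences don't block a match.
    return re.sub(r"\s+", " ", text.strip().lower())

def deduplicate(units):
    """Group exact duplicates (after normalization); keep the first copy and a count."""
    groups = {}  # hash -> [first text seen, number of copies]
    for unit in units:
        key = hashlib.sha1(normalize(unit).encode("utf-8")).hexdigest()
        if key in groups:
            groups[key][1] += 1
        else:
            groups[key] = [unit, 1]
    return [(text, count) for text, count in groups.values()]

units = ["RT Stop the war", "rt  stop the WAR ", "Something else"]
print(deduplicate(units))  # [('RT Stop the war', 2), ('Something else', 1)]
```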
If you need functionality that is not currently included in the API, please let us know. At the moment, we are only offering the API to DiscoverText enterprise customers. If you are interested, please contact us.
I’ve been experimenting with our machine translation beta and the new Sina Weibo feed. For example, I have a small archive responsive to the search 柳时元.
After running the Microsoft-enabled machine translation, this is what I saw:
If you would like to join the machine translation beta test group, please let me know. Just send a short email to stu@texifter.com. We can use the help of experts in a variety of languages as well as non-fluent students of online engagement who would like to sample content machine translated from a variety of social media in various languages.
The use of social media has grown exponentially over the last several years. In fact, most television programs and televised advertising have a social media component, designed to expand reach and engagement with the audience. To date, the tobacco control community has relied on traditional media (paid television, radio, billboard and print advertising) to promote its messages.
On March 19, 2012, the Centers for Disease Control and Prevention (CDC) launched Tips from Former Smokers. This campaign is the CDC’s largest anti-smoking campaign ever and its first national advertising effort. The campaign will last four months and consist of both traditional and social media. The Health Media Collaboratory at the University of Illinois at Chicago, directed by Sherry Emery, PhD, will measure and evaluate a key social media component of the campaign: its Twitter reach and impact.
Using DiscoverText with GNIP’s PowerTrack provides full access to Twitter’s Firehose. This is in contrast to Twitter’s publicly available API stream, which provides only a 1% sample of tweets. Because the volume of tweets for health social media campaigns is relatively low, every tweet matters. Access to GNIP’s premium Twitter feed allows us to capture all tweets and metadata for the campaign.
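A quick calculation shows why a 1% sample is a poor fit for low-volume campaigns. Suppose, as a simplifying assumption, that the sampled stream keeps each tweet independently with probability 0.01. Then a day with N campaign tweets yields 0.01·N tweets on average, and the chance that none of them is captured is 0.99^N. The daily volumes below are hypothetical.

```python
def sampled_capture(n_tweets, rate=0.01):
    """Expected tweets captured, and chance of capturing none, when each tweet
    is kept independently with probability `rate` (a simplifying assumption)."""
    expected = n_tweets * rate
    p_none = (1 - rate) ** n_tweets
    return expected, p_none

for n in (50, 200, 1000):
    expected, p_none = sampled_capture(n)
    print(f"{n} tweets/day: expect {expected:.1f} captured, P(none) = {p_none:.3f}")
```

At 200 campaign tweets a day, the sample would hold only about 2 of them, and roughly 13% of days would contain none at all. That is far too little to measure sentiment or reach.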
Using DiscoverText to sift through tweets and code their content provides a useful way to measure online public engagement, audience sentiment, and campaign discourse.
The Collaboratory will report on the overall reach and audience engagement of the campaign through an analysis of unique users reached, number of retweets, and mentions. This information will not only track the engagement of individual users but also measure the engagement of state tobacco control programs in the campaign. A sentiment analysis will be conducted on tweets to gauge the emotional valence of the campaign and of individual television ads. Finally, using root keywords for quitting and smoking uptake, the Collaboratory will report the number of Twitter users who express interest in quitting or prevention.
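The Collaboratory's actual pipeline is not described here. The sketch below shows one way the counts listed above could be computed from a set of captured tweets: unique users, retweets, @-mentions, and users whose tweets match a keyword root such as "quit" (which also matches "quitting" and "quitter"). The tweet fields (`user`, `text`, `retweet`), the root list, and the sample tweets are all hypothetical.

```python
import re

MENTION = re.compile(r"@(\w+)")

def campaign_metrics(tweets, roots=("quit", "cessat")):
    """Summarize reach and engagement for a list of tweet dicts (hypothetical fields)."""
    # A root matches at a word start and absorbs any suffix: quit, quits, quitting...
    root_pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, roots)) + r")\w*", re.IGNORECASE)
    users = {t["user"] for t in tweets}
    retweets = sum(1 for t in tweets if t["retweet"])
    mentions = sum(len(MENTION.findall(t["text"])) for t in tweets)
    interested = {t["user"] for t in tweets if root_pattern.search(t["text"])}
    return {
        "unique_users": len(users),
        "retweets": retweets,
        "mentions": mentions,
        "users_interested_in_quitting": len(interested),
    }

sample = [
    {"user": "ann", "text": "Trying to quit after seeing the @campaign ad", "retweet": False},
    {"user": "ben", "text": "RT @ann: Trying to quit after seeing the @campaign ad", "retweet": True},
    {"user": "ann", "text": "Day 3 of quitting!", "retweet": False},
    {"user": "cal", "text": "Powerful ad tonight", "retweet": False},
]
metrics = campaign_metrics(sample)
print(metrics)
```

Counting interested users as a set, rather than counting matching tweets, keeps one prolific user from inflating the number. Whether a retweet should count as expressed interest is a coding decision left to the analyst.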
We are experimenting in-house with the new “Disqus” API. In this video, I show results from a 5-day query against the public API for the term Syria. The 50,000+ results are drawn from a variety of mass media discussion boards, including Al Jazeera, Fox News and CNN. When the results are de-duplicated and machine clustered, an interesting pattern of form-letter commenting emerges. Further research is needed to determine whether this is spontaneous or coordinated. Either way, it is a novel parallel to mass e-mail campaigns.
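Form-letter comments are rarely exact copies; a word added or changed defeats exact matching. Near-duplicate clustering catches them by comparing overlapping word sequences ("shingles") with Jaccard similarity. The sketch below is a simple greedy, single-pass version with an assumed threshold of 0.6. It illustrates the general technique and is not DiscoverText's clustering algorithm. The sample comments are made up.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word sequences) in a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster_near_duplicates(comments, threshold=0.6):
    """Greedy single pass: each comment joins the first cluster whose seed is similar enough."""
    clusters = []  # each entry: (seed shingles, list of member comments)
    for comment in comments:
        sig = shingles(comment)
        for seed, members in clusters:
            if jaccard(sig, seed) >= threshold:
                members.append(comment)
                break
        else:  # no existing cluster was close enough: start a new one
            clusters.append((sig, [comment]))
    return [members for _, members in clusters]

comments = [
    "The West must stay out of Syria and let Syrians decide their future",
    "The West must stay out of Syria and let Syrians decide their own future",
    "Great reporting, thanks",
]
groups = cluster_near_duplicates(comments)
print([len(g) for g in groups])  # [2, 1]
```

The first two comments share 10 of 13 distinct three-word shingles (similarity about 0.77), so they land in one cluster. Comparing every comment against every cluster seed gets slow at 50,000+ comments; MinHash with locality-sensitive hashing is the usual way to scale the same idea.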
Essentially, everybody is trying to tell a story these days. Unfortunately, what we’ve been finding is that the stories are getting increasingly prescribed. That’s true for media and brands. There’s a confidence that gets lost when the question of who owns the narrative comes up.
So, when we saw that the Hunger Games series was being made into a film, we decided it was the perfect property to track. Will the movie live up to the codes the book established? Will the die-hard readers find the movie believable? Will it bridge the gap to the movie-only goers? Will the movie-first viewers go back and read the books? What stories might evolve? What stories should evolve? Those answers lie in the surrounding cultures and in what real people are putting out there for us to find. With that in mind, we set out on a mission to uncover the grammar and signs surrounding the Hunger Games before the movie was released.
We started the process by using DiscoverText to source and analyze related social media. We filtered those findings through our Culture Mapping process.
We’ve found DiscoverText to be an engine that lets us get in and dig. It’s what you’re trained to do as a hunter of information. We don’t like walls. We don’t measure the loudest signals, only the most intriguing ones. Loud squelches the source behind the impact and you miss the kinetics of human nature.
You can read our initial Hunger Games hypothesis here (and below).
On Saturday, April 28th, at 10:30AM, Texifter CEO Dr. Stuart W. Shulman will make his debut at SIOP on a panel discussing innovative approaches to employee engagement survey analytics. Dr. Shulman will be speaking with members of the HR teams at Dell Computers, Google, and JetBlue Airways, all companies that have used DiscoverText for text analytics. Other members of the panel include professionals from CLC Genesee and CEB Valtera, both employee engagement survey providers. Specifically, the panel will speak about their experiences in harvesting, organizing, and deriving insight from qualitative data, and how qualitative data can be used in tandem with quantitative data points.
For information on getting together at SIOP or IHRIM, you may contact me at joe@discovertext.com.
After months of user anticipation, the Weibo API is now integrated into DiscoverText, our “do-it-yourself” text analytic solution for deriving insight from unstructured text.
With over 300 million users, Weibo is the largest micro-blogging website in China. Its integration into DiscoverText will empower social media analysts in the United States, China, and around the world. Fused with DiscoverText’s range of functions (including advanced metadata filtering, near-duplicate text detection, auto-translation, and machine classification), the Weibo integration opens the door to exciting new multilingual research possibilities. If you’re interested in test-driving the Weibo API, email me at josh@discovertext.com and Texifter personnel can help get you started.
On May 1st and 2nd, Texifter personnel will be bringing DiscoverText to IHRIM in Chicago, Illinois. The event is the largest gathering of Human Resources (HR) professionals in the country, and it facilitates the exchange of new ideas and technologies for use in the HR information management profession. In 2012, Texifter will continue building on successful HR analytics engagements with companies like JetBlue, Google, and Marsh.
Texifter will be hosting a booth in the exhibition hall (Number 915), where we will be displaying our technology for employee engagement survey analysis. In addition to the exhibition, the 2012 conference offers over 50 educational sessions, open discussion forums, and networking events for professionals and vendors. Texifter founder and CEO Dr. Stuart Shulman will be participating in the educational sessions. In Educational Session #227, Stu will be presenting with Christina Harris from Marsh, Inc. Together, they will highlight the innovative ways Marsh has used text analytics to find actionable insights in employee engagement surveys. Mark your calendars, HR professionals: the session is at 4:00 PM on Tuesday, May 1st.
Also in attendance at IHRIM will be some of the market leaders in the HR technology space, including Oracle, Infosys, and PeopleSoft. The IHRIM conference is a great opportunity to learn more about DiscoverText. If you have any questions or you are interested in meeting at the conference, please contact me via email at joe@discovertext.com.
DiscoverText is pleased to announce it has added Disqus to its repertoire of APIs for harvesting content produced online. Disqus, the world’s largest comment platform operator, has over 70 million users who produce roughly 500,000 comments per day. There are currently 1.3 million websites which utilize the service, with just about 1,500 more websites registering for Disqus each day.
Disqus allows websites to take user comments to a new level, giving users the opportunity to interact directly with other users, post photos, and even comment from a mobile device. The ability to harvest comments from the Disqus platform opens new possibilities for DiscoverText users. Ever wonder what the conversation is about in response to controversial articles and news stories? CNN, Time Magazine, FoxNews, and Bloomberg are among the many news sites that power their discussion boards with Disqus. If politics is your thing, the Obama/Biden campaign is powering its forums with Disqus. If you are not as into politics as we are, there are millions of other sites, such as IGN, the Hollywood Reporter, and Engadget, that use Disqus.
To start harvesting Disqus with DiscoverText, sign up for a 14-day free trial at DiscoverText.com. If you require assistance, you can reach the Director of User Support by emailing josh@discovertext.com.
What do people “Disqus” about Syria?
I have a “Disqus” feed going on Syria. The “TopMeta” shows the distribution of online forum sources.
Much of this content is from Al Jazeera blogs, but there are also many comments from CNN and Fox News sites.
Below is a visualization of the most frequent non-stop words, based on an archive of 107,197 non-duplicate comments.
This visualization is produced by the “CloudExplorer” feature in DiscoverText.
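A word cloud like this starts from a frequency count that skips stop words (very common words such as "the" or "and"). The sketch below shows that counting step. The tiny stop-word list and the sample comments are illustrative assumptions. Real tools ship much longer, language-specific lists, and CloudExplorer's own tokenization may differ.

```python
import re
from collections import Counter

# A tiny illustrative stop list; real tools use longer, language-specific lists.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "for", "that", "this"}

def top_words(comments, n=10, stop_words=STOP_WORDS):
    """Count words across comments, skipping stop words, and return the n most frequent."""
    counts = Counter()
    for comment in comments:
        # \w+ matches Unicode letters, so Arabic or Chinese tokens are kept as well.
        for word in re.findall(r"\w+", comment.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts.most_common(n)

comments = ["The regime must go", "The regime and the army", "Assad regime"]
print(top_words(comments, n=1))  # [('regime', 3)]
```

Running the count on de-duplicated comments, as with the archive above, keeps a single form letter that was posted thousands of times from dominating the cloud.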