Using the CloudExplorer

Tag clouds have gained popularity as a shortcut to discovering frequently occurring terms in a text collection. Much of the inspiration comes from the remarkable Wordle site at http://www.wordle.net. Building on that inspiration, we engineered an interactive word cloud tool in DiscoverText that we call the CloudExplorer. This video is a short introduction to how it works.

Click here to see a 3-minute video on how to use the CloudExplorer

Posted in DiscoverText, general | Comments Off

Interactive Custom Machine Classifier Histograms

This is the latest DiscoverText filtering feature designed to speed up the creation of accurate custom machine classifiers. This video shows how we use an interactive display of classifier scores to isolate items in a dataset that require further human coding to improve the accuracy of the classifier. Click on the screenshot below to start the video.

Click here to learn how and why these interactive histograms are powerful tools.
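As a rough illustration of the idea behind the histogram filter, the sketch below bins classifier scores and pulls out the items whose scores fall in an uncertain middle band, since those are natural candidates for further human coding. It is not DiscoverText's implementation: the 0-to-1 score scale, the band limits, and the names score_histogram and items_needing_review are all assumptions made for the example.

```python
def score_histogram(scores, bins=10):
    """Count how many scores fall in each equal-width bin (scores assumed in [0, 1])."""
    counts = [0] * bins
    for score in scores:
        # A score of exactly 1.0 is placed in the last bin.
        index = min(int(score * bins), bins - 1)
        counts[index] += 1
    return counts


def items_needing_review(scored_items, low=0.4, high=0.6):
    """Return items whose score sits in the uncertain band [low, high]."""
    return [item for item, score in scored_items if low <= score <= high]


# Hypothetical (item, score) pairs standing in for classified documents.
scored = [("a", 0.05), ("b", 0.42), ("c", 0.58), ("d", 0.97), ("e", 0.5)]

histogram = score_histogram([score for _, score in scored])
review = items_needing_review(scored)
```

Here the histogram shows two confident extremes and a cluster in the middle, and review holds the three middle items that a human coder would look at next.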

Posted in DiscoverText, general, product | Comments Off

Start-up Showcase

Joe and I had fun fielding questions, passing out literature and giving away DiscoverText t-shirts. Many thanks once again for the excellent facilities and services from our MassChallenge hosts. We met some leaders of cool start-ups, a number of interesting VCs and overall had a great night.

 

Posted in DiscoverText, general, Texifter | Comments Off

Fun with Text Analytics at Start-Up Showcase

Tomorrow (7/31), DiscoverText will be on display at the MassChallenge Start-Up Showcase. The event, hosted by the New England Venture Capital Association, will be held at One Marina Park from 6PM-9PM and will feature 90 of the MassChallenge Finalists.

On display at the showcase will be Liam’s custom machine text classifier optimized for identifying statements about the Pittsburgh Penguins, sales pitches by Joe, and of course DiscoverText & QDAP XL t-shirts from Stu, along with his latest insights on Text Analytics. Drop by for some fun with text analytics and maybe even a sweet treat!

Posted in general | Comments Off

Shout Out to Boston New Tech Meetup #BNT19

Stu and Aleksandra at the NERD Center. There was approval for ladies-cut DiscoverText t-shirts.

A big thanks goes out to the organizers of the Boston New Technology Meetup, held at the Microsoft NERD Center on Tuesday, July 25th (#BNT19). The Citigroup-sponsored event gathered a nice crowd of entrepreneurs, technologists, and start-up enthusiasts, all of whom enjoyed networking, Domino’s Pizza, fast-paced technology discussions, and free drinks thanks to Microsoft.

Texifter participated in the event, along with 5 other young companies, presenting new technology in front of a highly engaged Cambridge, MA crowd. Founder & CEO Stu Shulman taught the crowd a few things about human coding and machine classifiers, which prompted the question from the audience, “Does the CIA know about this?” The DiscoverText brochures and postcards were a big hit, as were the “sold-out” free t-shirts.

Posted in DiscoverText, general, Texifter | 1 Comment

Boston New Technology Meetup July 2012 #bnt19

Come learn about 6 innovative and exciting technology products and network with the Boston/Cambridge startup community! Each presenter gets 5 minutes for product demonstration and 5 minutes for Q&A. Looks like a full house for this Citibank-sponsored event, and @Texifter is excited to show the latest power tools for text.

 

Posted in general | Comments Off

Coding Off a List

We are always looking for ways to speed up the process of creating accurate custom machine classifiers. Coding off a list is one such method. This video demonstrates a simple approach that can be applied to a variety of text analytic problems.

Posted in DiscoverText, general | Comments Off

Texifter in 60 Seconds

This 60-second video introduces our start-up. Click on the image below to see the Screencast and let us know what you think.

1-minute overview of Texifter

This is the Texifter elevator pitch. Feedback welcome!

Posted in Texifter | Comments Off

Texifter in MassChallenge Finals

Following a sublime second-round pitch made by Stu in borrowed shoes and on less than 5 hours of sleep, Texifter moved into the finals of the MassChallenge. This afternoon, MassChallenge revealed the 125 finalists, 46 of which are high-tech companies, after a journey of 4 months, beginning with a simple application in February. The competition, fresh off a mention by President Obama in the State of the Union, called for entrants in late January. The organization fielded applications from over 30 states and foreign countries, totaling more than 1300 start-ups.

The remaining start-ups will compete for $1 million in funding, and all will be able to take advantage of office space in the accelerator, located at One Marina Park on the Fan Pier in Boston, MA. The accelerator will commence at the end of June with a week-long boot camp for start-up team members. Throughout the summer, MassChallenge will engage start-ups through information sessions and expert mentors.

This development comes at a busy time for Texifter, which has been making the rounds in the local angel scene, most recently coming off a successful second-round meeting with LaunchPad Angel Group. Over the summer, Texifter will continue rapid development and outreach, growing the client list, speaking at conferences, and developing new features for the platform. If you have any questions about Texifter and DiscoverText, please feel free to reach me at joe@discovertext.

Posted in general | Comments Off

Analyzing Co-Occurrences

A common question we hear at Texifter is, “How can one derive co-occurrences in a slice of data?” DiscoverText makes this process relatively easy – when starting with one known variable. For instance, one might like to see which hashtags, links, or keywords co-occur with other hashtags, links, and keywords in a given archive. In the example below, it is done with an archive of Romney tweets.

To derive this insight, DiscoverText users can use the “TopMeta Discovery” function to quickly “slice” a timeframe of their data into a bucket (a saved search result).

Once the “slice” is bucketed, users can perform the same TopMeta function across the hashtags (or other metadata fields) of that specific data.

Once the hashtag (or other metadata item) is selected, users can – once again – perform a quick filter, and bucket the data that matches a particular metadata item.

Finally, with the known variable isolated to a bucket, users can easily utilize functions such as TopMeta or the interactive cloud explorer in DiscoverText to pull out co-occurring links, hashtags, keywords, and other substantive insight and analyze the tweets that reference such data.
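Outside of DiscoverText, the same logic can be sketched in a few lines of Python: isolate the tweets that contain one known hashtag, then count what else appears alongside it. This is only an illustration of the workflow above, not the TopMeta function itself; the function name cooccurring_hashtags, the simple hashtag pattern, and the sample tweets are assumptions made for the example.

```python
import re
from collections import Counter

# Simplified hashtag pattern; real tweet metadata would be more reliable.
HASHTAG = re.compile(r"#(\w+)")


def cooccurring_hashtags(tweets, known_tag):
    """Count hashtags that appear in the same tweet as known_tag."""
    known = known_tag.lower().lstrip("#")
    counts = Counter()
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if known in tags:  # this tweet belongs in the "bucket" for the known variable
            counts.update(tags - {known})
    return counts


# Hypothetical sample tweets, not drawn from the archive discussed above.
sample_tweets = [
    "Romney leads in new poll #GOP #tcot",
    "Debate night #GOP #Obama",
    "Nothing to see here #p2",
    "#gop #tcot rally today",
]

cooccurrence = cooccurring_hashtags(sample_tweets, "#GOP")
```

Calling cooccurrence.most_common() then lists the co-occurring hashtags from most to least frequent, which is roughly the view a second TopMeta pass gives on a bucket.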

If you have any questions, feel free to email me at josh@discovertext.com.

Posted in DiscoverText, general, research | 1 Comment

Using Twitter Bios & GNIP Rules to Find Domain Specific Twitter Users and Important Tweets

Thanks Lee! Here is the video link:

Posted in DiscoverText, general, GNIP, product | 1 Comment

Texifter in MassChallenge Semifinal

Texifter moved into the MassChallenge semifinals earlier this week. We are fresh off pitching the River Valley Investors and LaunchPad Investors, so this new pitch will be our third in less than a month.

MassChallenge was lauded by President Obama at the 2012 State of the Union address. It is a start-up accelerator located in downtown Boston and the first program allowing start-ups to compete for resources. If chosen as a finalist, start-ups receive free office space, world-class mentoring, media coverage, and a share of $1 million in cash awards.

The “challenge” started in January and drew 1,237 applicants from 35 countries and 36 states. The next round commences on Tuesday, May 15th. If you are interested, please take a look at our MassChallenge profile and give us a thumbs up.

Posted in general | Comments Off

Sina Weibo Working Group

Texifter-Sponsored Sina Weibo Working Group

Contact josh@discovertext.com to join the working group. Free Enterprise software for active group members.

The Sina Weibo Working Group is off to a wonderful start. We invite other students of Chinese micro-blogging to join the group. Active group members receive free individual Enterprise DiscoverText licenses.

Contact josh@discovertext.com for more information.

Posted in DiscoverText, research | Comments Off

Election Analytics Part I: Romney Trends in April

[In the coming months leading up to the U.S. presidential election, the Texifter blog will be featuring a multi-part series on various analytical insights into the election, which can be gained by utilizing a host of social media analytics (SMA) methods within DiscoverText. The following is the first post in this multi-part series.]

The month of April saw dramatic shifts in the landscape of the Republican presidential nomination. When the month began, Mitt Romney and Rick Santorum were still battling for support of the Republican base. And while many believed Rick Santorum’s ardent conservatism would enable him to remain in the race for quite some time, he ultimately conceded on April 10th, giving way to Romney’s presumptive Republican nomination. Meanwhile, and in spite of their inevitable loss to the Romney establishment, Newt Gingrich and Ron Paul remained in the race through April – largely on principle.

Throughout April, millions of individuals utilized social media to express their feelings about this heated Republican primary race as well as about specific Republican candidates. So as April ultimately became the month during which Romney’s nomination solidified, the question on many minds is: will Conservatives, Independents, and undecideds coalesce around the moderate Romney? Social media provides a wealth of data from which one might begin to answer that question (which will continue to be answered in subsequent blog posts regarding sentiments, topics, and natural language processing).

For this first experiment, I kept things simple: I decided to investigate which hashtags were most commonly and most consistently associated with “Romney” on Twitter throughout the month.

To do this, I began by setting up a live feed from the Public Twitter API in DiscoverText on April 9th. (My dataset therefore spans from mid-day on April 9th until 11:59pm on the 30th.) Once my data collection was complete, I created individual “buckets” (or saved search results) – each containing a full day’s worth of “Romney” Tweets. Next, I used the software’s “TopMeta Discovery” function to view the most frequently occurring hashtags from each day in April. I then exported each list of hashtags with the number of times each occurred per day.
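For readers working outside DiscoverText, the per-day bucketing and hashtag counting described here can be approximated with a short script. This is a minimal sketch under stated assumptions: tweets are supplied as (ISO timestamp, text) pairs, hashtags are extracted with a simple pattern, and the function name hashtags_by_day and the sample data are made up for the example.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Simplified hashtag pattern, assumed for illustration.
HASHTAG = re.compile(r"#(\w+)")


def hashtags_by_day(tweets):
    """Group tweets into one bucket per calendar day and count hashtags in each.

    tweets: iterable of (iso_timestamp, text) pairs.
    Returns a dict mapping "YYYY-MM-DD" to a Counter of lowercased hashtags.
    """
    buckets = defaultdict(Counter)
    for timestamp, text in tweets:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        buckets[day].update(tag.lower() for tag in HASHTAG.findall(text))
    return dict(buckets)


# Hypothetical sample, not the collected dataset.
sample = [
    ("2012-04-09T13:05:00", "Romney speaks #GOP #tcot"),
    ("2012-04-09T18:30:00", "More on Romney #GOP"),
    ("2012-04-10T09:00:00", "Santorum out #withNEWT #GOP"),
]

daily = hashtags_by_day(sample)
```

Each Counter in daily can then be exported as a list of hashtags with their daily counts, for instance with daily["2012-04-09"].most_common().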

Each list of hashtags-by-day contained between 2,000 and 4,000 unique hashtags. Since my interest – for this post – is to examine the volume of the most common and most consistent hashtags, I decided to examine only the top 25 hashtags from each day. Having reduced the total number of hashtags to analyze from over 50,000 to 525, I was left with 93 unique hashtags. Of those 93 hashtags, only 13 were among the most frequently used (top 25) hashtags every day. The following charts demonstrate the proportional change in volume that each of these 13 hashtags experienced in April:
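The narrowing step (top hashtags per day, then the unique set, then the subset present every day) amounts to a set union and a set intersection. The sketch below shows one way to do it, along with one possible reading of "proportional" volume as a hashtag's share of all hashtag mentions that day. The function names, the toy counts, and that reading of proportion are assumptions, not the method used to produce the charts.

```python
from collections import Counter


def top_tags(day_counts, n):
    """Return the set of the n most frequent hashtags for one day."""
    return {tag for tag, _ in day_counts.most_common(n)}


def consistent_top_tags(daily_counts, n=25):
    """Return (all tags seen in any day's top n, tags in every day's top n).

    Assumes daily_counts contains at least one day.
    """
    per_day = [top_tags(counts, n) for counts in daily_counts.values()]
    unique = set().union(*per_day)
    consistent = set.intersection(*per_day)
    return unique, consistent


def daily_share(daily_counts, tag):
    """Share of each day's hashtag mentions that belong to tag."""
    shares = {}
    for day, counts in daily_counts.items():
        total = sum(counts.values())
        shares[day] = counts[tag] / total if total else 0.0
    return shares


# Toy per-day counts (hypothetical), using n=2 instead of 25 to keep it small.
daily = {
    "2012-04-09": Counter({"gop": 5, "obama": 4, "tcot": 1}),
    "2012-04-10": Counter({"gop": 6, "withnewt": 5, "obama": 2}),
    "2012-04-11": Counter({"obama": 7, "gop": 3, "p2": 1}),
}

unique, consistent = consistent_top_tags(daily, n=2)
gop_share = daily_share(daily, "gop")
```

In this toy data three hashtags reach a daily top 2 at least once, but only "gop" does so every day, mirroring the 93-to-13 reduction on a small scale.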

[Click on any graph to enlarge image.]

All 13 Hashtags:


A closer look at the top 5:

A closer look at the next 8:


Initial Observations

With even a quick glance at the five most consistent and most common hashtags (those that co-occur in Romney tweets in April), it is easy to see a pattern in the daily volume of the “GOP” and “Obama” hashtags: the volume of one rarely deviates from the other. The hashtags “Romney,” “P2,” and “TCOT,” by contrast, differ substantially in volume, although their deviations also follow a pattern.

The daily volume of the remaining eight hashtags – on the other hand – is rather sporadic. With a cursory glance, one can plainly see the dramatic range of both “withNEWT” and “RonPaul” hashtags. Perhaps due to the concession of Rick Santorum, “withNEWT” hashtags saw an early spike in volume on the 10th, only to plummet and remain low in volume until a sudden rise and fall between the 20th and the 23rd. And while “withNEWT” saw a rapid fall in volume on the 22nd, the volume of “RonPaul” tweets rose rapidly on that same day.

Hashtags only reveal so much insight. They may provide topic (and co-occurring topic) volume, but they provide little to no true context. Until the text of these Tweets is closely analyzed, one can only speculate as to why these rapid volume shifts took place and what meaning these labels truly hold. Keep a lookout for a follow-up post about how you can use DiscoverText to uncover more nuanced context with other text analysis methods. In the meantime, if you’re interested in getting started with the software or have any questions, you can email me at Josh@discovertext.com.

Note: All tweet volume here is proportionate to the sample of tweets provided by the Public Twitter API and does not reflect the true volume of Romney Tweets. The sample volume of these “Romney” tweets is shown below:

Posted in DiscoverText, research, Twitter | 1 Comment