What the [bleep] Does Joyent Manta Have to Do with Twitter?

September 27, 2013 - by Mark Cavage

Yesterday, I hosted a webinar that walked through the concepts of Manta and ran some demos showing how you interact with the basics of storage, along with (at least in my opinion) an interesting analysis of public Twitter data.

As a different side project, I had written a "Twitter2Manta" data harvester (you can get the source here), so I already had approximately 150GB of public tweets lying around. As an individual interested in random things, I was mildly curious about how much profanity was in tweets, and how it trended. For example, was it highly variable and tied to public events? Or was it just some kind of fixed percentage? Ben is a big fan of "show me the money!" up front, so here's the resulting graph of 1.5 weeks of public tweets:

Twitter profanity analysis using Joyent Manta

The Node.js scripts (for map/reduce) quantized all tweets into 15-minute buckets, and then checked each word in the tweet against a known dictionary of profanity. I kept counters for all tweets and for profane tweets, and then sent those to a reducer that sorted by time and produced an HTML graph using D3.js, which of course was written back to Manta.
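To make the bucketing and counting concrete, here is a minimal sketch of the two phases in plain Node.js. It is not the code from the webinar. It assumes each input is newline-delimited JSON with a `created_at` timestamp that `Date.parse` understands and a `text` field, and the function names (`bucketStart`, `isProfane`, `mapTweets`, `reduceCounts`) and the tiny word list are made up for illustration. The D3.js graph and the Manta job wiring are left out.

```javascript
// Hypothetical sketch only; names and record layout are assumptions.
const BUCKET_MS = 15 * 60 * 1000; // 15-minute buckets, as in the post

// Placeholder word list standing in for a real profanity dictionary.
const PROFANITY = new Set(['darn', 'heck']);

// Round a millisecond timestamp down to the start of its bucket.
function bucketStart(ms) {
  return Math.floor(ms / BUCKET_MS) * BUCKET_MS;
}

// True if any word in the tweet text appears in the dictionary.
function isProfane(text, dictionary) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9']+/)
    .some((word) => dictionary.has(word));
}

// Map phase: one chunk of newline-delimited JSON -> per-bucket counters.
function mapTweets(ndjson, dictionary = PROFANITY) {
  const counts = new Map(); // bucket start (ms) -> { total, profane }
  for (const line of ndjson.split('\n')) {
    if (!line.trim()) continue;
    let tweet;
    try {
      tweet = JSON.parse(line);
    } catch (e) {
      continue; // skip malformed records
    }
    if (tweet === null || typeof tweet !== 'object') continue;
    const ms = Date.parse(tweet.created_at);
    if (Number.isNaN(ms) || typeof tweet.text !== 'string') continue;
    const key = bucketStart(ms);
    const c = counts.get(key) || { total: 0, profane: 0 };
    c.total += 1;
    if (isProfane(tweet.text, dictionary)) c.profane += 1;
    counts.set(key, c);
  }
  return counts;
}

// Reduce phase: merge all mapper outputs and sort the buckets by time.
function reduceCounts(partials) {
  const merged = new Map();
  for (const partial of partials) {
    for (const [key, c] of partial) {
      const m = merged.get(key) || { total: 0, profane: 0 };
      m.total += c.total;
      m.profane += c.profane;
      merged.set(key, m);
    }
  }
  return [...merged.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([key, c]) => ({
      bucket: new Date(key).toISOString(),
      total: c.total,
      profane: c.profane,
      ratio: c.profane / c.total,
    }));
}
```

Calling `reduceCounts([mapTweets(chunkA), mapTweets(chunkB)])` yields one row per 15-minute bucket in time order, which is the shape of data a charting step would need. Keeping only two counters per bucket means the reducer merges a handful of numbers rather than re-reading any tweets.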

Hopefully this webinar plus the demo give you greater insight into use cases for Joyent Manta. Internally at Joyent, Manta powers almost all of our analytics for the public cloud (and for Manta itself!), so the TSD-demo really is indicative of at least one "home run" use case. As we've seen with things like image/video transcoding, there are lots more as well. We'd love to hear what you use it for!