After creating the Free Wtr bot using Tweepy and Python and this code, I wanted a way to see how Twitter users were perceiving the bot and what their sentiment was. So I created a simple data analysis program that takes a given number of tweets, analyzes them, and displays the data in a scatter plot.
Twitter is a popular social network where users can share short SMS-like messages called tweets. Users share thoughts, links and pictures on Twitter, journalists comment on live events, companies promote products and engage with customers. The list of different ways to use Twitter could be really long, and with 500 millions of tweets per day, there’s a lot of data to analyse and to play with.
This is the first in a series of articles dedicated to mining data on Twitter using Python. In this first part, we’ll see different options to collect data from Twitter. Once we have built a data set, in the next episodes we’ll discuss some interesting data applications.
“Twitter runs multiple large Hadoop clusters that are among the biggest in the world. Hadoop is at the core of our data platform and provides vast storage for analytics of user actions on Twitter. In this post, we will highlight our contributions to ViewFs, the client-side Hadoop filesystem view, and its versatile usage here.
ViewFs makes the interaction with our HDFS infrastructure as simple as a single namespace spanning all datacenters and clusters. HDFS Federation helps with scaling the filesystem to our needs for number of files and directories while NameNode High Availability helps with reliability within a namespace. These features combined add significant complexity to managing and using our several large Hadoop clusters with varying versions. ViewFs removes the need for us to remember complicated URLs by using simple paths. Configuring ViewFs itself is a complex task at our scale. Thus, we run TwitterViewFs, a ViewFs extension we developed, that dynamically generates a new configuration so we have a simple holistic filesystem view…”
“In this post, we explore LDA an unsupervised topic modeling method in the context of twitter timelines. Given a twitter account, is it possible to find out what subjects its followers are tweeting about?
Knowing the evolution or the segmentation of an account’s followers can give actionable insights to a marketing department into near real time concerns of existing or potential customers. Carrying topic analysis of followers of politicians can produce a complementary view of opinion polls.
The goal of this post is to explore my own followers, 698 at time of writing and find out what they are tweeting about through Topic Modeling of their timelines…”
Yao has worked at Twitter for a few years. She’s seen some things. She’s watched the growth of the cache service at Twitter explode from it being used by just one project to nearly a hundred projects using it. That’s many thousands of machines, many clusters, and many terabytes of RAM.
It’s clear from her talk that’s she’s coming from a place of real personal experience and that shines through in the practical way she explores issues. It’s a talk well worth watching.
As you might expect, Twitter has a lot of cache…”
“With open-source technologies proliferating as “Big Data” and analytics explode, we thought it would be beneficial to let our users and friends utilize a script that takes care of the nitty gritty and allows them to explore what makes MongoDB great. We’re excited to present Twitter-Harvest, a Python script that utilizes the Twitter REST API v1.1 to retrieve tweets from a user’s timeline and inserts them into a MongoDB database…”
“twemproxy (pronounced “two-em-proxy”), aka nutcracker is a fast and lightweight proxy for memcached protocol. It was primarily built to reduce the connection count on the backend caching servers…”
“Twemcache is the Twitter Memcached…”
“We built Twemcache because we needed a more robust and manageable version of Memcached, suitable for our large-scale production environment. Today, we are open-sourcing Twemcache under the New BSD license. As one of the largest adopters of Memcached, a popular open source caching system, we have used Memcached over the years to help us scale our ever-growing traffic. Today, we have hundreds of dedicated cache servers keeping over 20TB of data from over 30 services in-memory, including crucial data such as user information and Tweets. Collectively these servers handle almost 2 trillion queries on any given day (that’s more than 23 million queries per second). As we continued to grow, we needed a more robust and manageable version of Memcached suitable for our large scale production environment…”
“Without open source, Twitter wouldn’t exist. Every Tweet you send and receive touches open source software on its journey between computers and mobile devices. We were curious about how much open source is used at Twitter. Beyond that, we wanted to discover how open source may influence the culture at Twitter, Inc.
We asked Chris Aniszczyk, Open Source Manager at Twitter, to share the company’s open source story. Aniszczyk will be keynoting at this month’s LinuxCon, August 29 through 31, in San Diego, CA. His topic: The open source technology behind a Tweet.
See what Aniszczyk (@cra on Twitter) had to say about open source and the open culture at Twitter…”