timeu.se: What Do People Do All Day?
timeu.se is a web tool for investigating how millions of people report spending their time. This is done by collecting and aggregating millions of messages from Twitter.
Some people say that Twitter conversations are "pointless" or "babble" or that people tweet about "what they had for breakfast." But if you think about it, most casual conversations are just as inconsequential — we just don't record them forever on the web!
I think it's great that people tweet what they had for breakfast, or that they're stuck in traffic or are mowing the lawn. Because what sociologists have always wanted to be able to do is get very detailed measures of how people spend their time without bothering them too much by asking them. Now, by simply examining the conversations that they're having in public, we can do it without bothering them at all!
Visit the Examples page to get an idea of the kinds of interesting questions you can ask.
Note that the tweets in the data were collected at the beginning of 2010, so more recent things (e.g. new TV shows) won't show up in the trends.
N.B.: Our paper using this data was published in Science in the 30 September 2011 issue. The paper is "Diurnal and Seasonal Mood Vary with Work, Sleep and Daylength Across Diverse Cultures" by Scott A. Golder and Michael W. Macy.
timeu.se has four kinds of plots:
A Week Plot graphs a single series over 24 hours, with one line for each day of the week.
A Line Plot will graph up to five different series, each as a separate line, over 168 hours (24x7).
A Scatterplot compares two series, fits a line, and displays the correlation between their activity levels. Series that vary together will have a correlation close to 1 (or -1). Series that are independent will have a correlation close to zero.
A Heat Calendar graphs a single series on a calendar-like grid, 24 hours a day by 7 days a week. An hour is "hottest" at its highest point of activity.
Example: email as a line plot
Example: email as a week plot
Example: email as a heat map
Two-word phrases are separated with a space, but are otherwise just like single-word phrases.
Example: email address
Some plot types can plot multiple trends at a time. The week plot and heat map require exactly one trend. The scatterplot requires exactly two trends, one for each axis. The line plot is the most versatile and will plot up to five trends at a time. To plot multiple trends, separate them with commas.
Example: breakfast , lunch , dinner
Sometimes, you might want to combine phrases. This is helpful when including tenses or plurals. To combine phrases, use a plus sign.
Example: boy + boys or eat + ate + eating
It is okay to sum phrases and have multiple series in the same query.
Example: breakfast + breakfasts , lunch + lunches + lunching , dinner + dinners
This is valid as a line plot, which will include three series, one for all the breakfast words, one for all the lunch words, and one for all the dinner words.
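If it helps to see the query rules spelled out, here is a minimal sketch of how a query could be split into series and phrases. It is not the site's actual code: the function names, the plot-type keys, and the whitespace handling are all assumptions made for illustration. Only the comma and plus rules and the per-plot trend counts come from the description above.

```python
# Trend counts per plot type, taken from the text above.
# The keys ("week", "heat", ...) are made-up names for this sketch.
TREND_LIMITS = {"week": (1, 1), "heat": (1, 1), "scatter": (2, 2), "line": (1, 5)}


def parse_query(query):
    """Split a query into series (commas), each a list of summed phrases (plus signs)."""
    series = []
    for part in query.split(","):
        # Collapse runs of whitespace so "email   address" becomes "email address".
        phrases = [" ".join(p.split()) for p in part.split("+")]
        phrases = [p for p in phrases if p]  # drop empty pieces such as "a + "
        if phrases:
            series.append(phrases)
    return series


def check_query(plot_type, query):
    """Parse a query and make sure the plot type accepts that many series."""
    series = parse_query(query)
    low, high = TREND_LIMITS[plot_type]
    if not low <= len(series) <= high:
        raise ValueError(f"{plot_type} plot takes {low} to {high} series, got {len(series)}")
    return series


meals = check_query("line", "breakfast + breakfasts , lunch + lunches + lunching , dinner + dinners")
print(meals)
```

Each inner list is one series whose phrases get summed, so the meals query yields three series for the line plot, while the same query would be rejected for a scatterplot.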
Lagged Variables. Scatterplots show how strongly two series are correlated in time. For example, "breakfast" and "lunch" are not correlated in time (r=0.04). This means that lunch and breakfast aren't popular at the same time.
However, if you lag the X series (breakfast) by 4 hours, the correlation goes up (r=0.7585). This means that peak activity of "lunch" is about 4 hours after that of "breakfast."
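To make the lag idea concrete, here is a small sketch that computes a Pearson correlation between two 168-hour series before and after shifting one of them. The series are synthetic bumps invented for this example (a "breakfast" peaking at 8am and a "lunch" peaking at noon every day), so the numbers it prints will not match the r values quoted above. Wrapping the shifted series around the end of the week is also an assumption; the site may handle the edges differently.

```python
import math

HOURS_PER_WEEK = 168


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def lag(series, hours):
    """Delay a series by `hours`, wrapping around the week (an assumption)."""
    hours %= len(series)
    if hours == 0:
        return list(series)
    return series[-hours:] + series[:-hours]


def daily_bump(peak_hour, width=2.0):
    """Synthetic activity: one bell-shaped bump per day, centered on peak_hour."""
    return [
        math.exp(-((h % 24 - peak_hour) ** 2) / (2 * width ** 2))
        for h in range(HOURS_PER_WEEK)
    ]


breakfast = daily_bump(8)   # made-up series, not real data
lunch = daily_bump(12)      # made-up series, not real data

r_unlagged = pearson(breakfast, lunch)
r_lagged = pearson(lag(breakfast, 4), lunch)
best_lag = max(range(24), key=lambda k: pearson(lag(breakfast, k), lunch))
print(r_unlagged, r_lagged, best_lag)
```

Scanning over lags and keeping the one with the highest correlation, as `best_lag` does, is one simple way to estimate how many hours one activity trails another.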
Scaling. Line plots with multiple series can either plot all the series on the same Y axis, or plot each on its own axis. Plotting them all on the same axis lets you see which is more popular, but might obscure one if they are of wildly different popularity levels. Plotting them on their own axes is useful if you want to see how the shape of their trends compare, regardless of their popularity.
Example: bacon , sausage on their own (different) scales, showing that bacon and sausage follow the exact same temporal rhythm.
Example: bacon , sausage on the same scale, showing that bacon is much more popular than sausage.
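One way to think about "own axes" is that each series gets stretched to fill its own range, so only the shape is left. The sketch below does this with a simple min-max rescaling. That choice is an assumption rather than a description of how the site draws its axes, and the numbers are invented to make the point, not taken from the data.

```python
def rescale(series):
    """Map a series onto the range 0..1 so that only its shape remains."""
    low, high = min(series), max(series)
    if high == low:
        return [0.0 for _ in series]  # a flat series has no shape to show
    return [(value - low) / (high - low) for value in series]


# Invented activity levels: "bacon" is twice as popular but has the same rhythm.
bacon = [2.0, 4.0, 8.0, 4.0]
sausage = [1.0, 2.0, 4.0, 2.0]

print(max(bacon) / max(sausage))          # same scale: bacon towers over sausage
print(rescale(bacon), rescale(sausage))   # own scales: the shapes line up
```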
The dataset comes from public Twitter accounts and was collected using the Twitter API over the course of several weeks in early 2010. The dataset comprises 2.4M user accounts and over 500M tweets. The data were stored and processed on the Cornell Center for Advanced Computing's Hadoop cluster. I love Hadoop.
The text of each message was normalized by stripping out non-alphanumeric characters (except apostrophe). The activity level for a given phrase (we currently support one- and two-word phrases) is measured as the average number of occurrences of that phrase per message, in a given hour of the week. Phrases are excluded if they occur fewer than 300 times in the corpus.
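For the curious, the measurement described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Hadoop job that was actually run: lowercasing, treating only ASCII letters and digits as alphanumeric, and replacing stripped characters with a space are my guesses, and the function names are made up.

```python
import re
from collections import Counter, defaultdict


def normalize(text):
    """Strip everything except letters, digits and apostrophes, then split into words.

    Lowercasing and the ASCII-only character class are assumptions.
    """
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()


def phrases(tokens):
    """Yield every one-word and two-word phrase in a token list."""
    yield from tokens
    for first, second in zip(tokens, tokens[1:]):
        yield f"{first} {second}"


def activity_levels(messages, min_count=300):
    """Average occurrences of each phrase per message, for each hour of the week.

    `messages` is an iterable of (hour_of_week, text) pairs.
    Phrases seen fewer than `min_count` times overall are dropped.
    """
    messages_per_hour = Counter()
    counts = defaultdict(Counter)  # phrase -> hour -> occurrences
    totals = Counter()             # phrase -> occurrences in the whole corpus
    for hour, text in messages:
        messages_per_hour[hour] += 1
        for phrase in phrases(normalize(text)):
            counts[phrase][hour] += 1
            totals[phrase] += 1
    return {
        phrase: {hour: n / messages_per_hour[hour] for hour, n in by_hour.items()}
        for phrase, by_hour in counts.items()
        if totals[phrase] >= min_count
    }


# Tiny made-up corpus; the threshold is lowered so that something survives.
sample = [(8, "Eating breakfast!"), (8, "breakfast, then email"), (12, "lunch")]
levels = activity_levels(sample, min_count=1)
print(levels["breakfast"], levels["eating breakfast"])
```

Dividing by the number of messages in each hour is what keeps busy hours from looking popular for every phrase at once.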
The week consists of 168 hours (24 hours/day * 7 days/week) starting on Sunday morning at 12:00:00am and ending Saturday night at 11:59:59pm. Each message was converted from UTC (Coordinated Universal Time) to local time using the user-specified time zone.
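Here is a rough sketch of that conversion: take a UTC timestamp, move it into the user's time zone, and number the hour from 0 (Sunday, midnight to 1am) through 167 (Saturday, 11pm to midnight). The function name is made up, and the example uses a fixed UTC-5 offset as a stand-in for a user's time zone setting; how the real pipeline represented time zones is not something this sketch claims to know.

```python
from datetime import datetime, timedelta, timezone


def hour_of_week(utc_time, user_tz):
    """Return 0..167 for a timezone-aware UTC datetime, with the week starting Sunday 12am local time."""
    local = utc_time.astimezone(user_tz)
    # datetime.weekday() counts Monday as 0, so rotate to make Sunday day 0.
    day = (local.weekday() + 1) % 7
    return day * 24 + local.hour


eastern = timezone(timedelta(hours=-5))  # stand-in for a user's time zone

# 7 February 2010 was a Sunday.
late_saturday = datetime(2010, 2, 7, 3, 30, tzinfo=timezone.utc)  # Saturday 10:30pm at UTC-5
sunday_start = datetime(2010, 2, 7, 5, 0, tzinfo=timezone.utc)    # Sunday 12:00am at UTC-5

print(hour_of_week(late_saturday, eastern), hour_of_week(sunday_start, eastern))
```

Any `tzinfo` object works as `user_tz`, so a named zone from Python's `zoneinfo` module could be passed instead of a fixed offset when daylight saving time matters.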
timeu.se was created by Scott Golder, a graduate student in the Social Dynamics Lab in the Department of Sociology at Cornell University. On Twitter, he is @redlog.
→ Like the website? Read the paper! S. Golder, M. Macy. Diurnal and Seasonal Mood Vary with Work, Sleep and Daylength Across Diverse Cultures. Science. 9/30/2011.