Monday, September 12, 2011

TED Talk by Deb Roy: Analysing Mass Media and Social Networks

MIT researcher Deb Roy wanted to understand how his infant son learned language -- so he wired up his house with video cameras to catch every moment (with exceptions) of his son's life, then parsed 90,000 hours of home video to watch "gaaaa" slowly turn into "water." Astonishing, data-rich research with deep implications for how we learn.

From the transcript, starting around 11:40 into the talk:

In my lab, which we're peering into now, at MIT -- this is at the media lab. This has become my favorite way of videographing just about any space. Three of the key people in this project, Philip DeCamp, Rony Kubat and Brandon Roy are pictured here. Philip has been a close collaborator on all the visualizations you're seeing. And Michael Fleischman was another Ph.D. student in my lab who worked with me on this home video analysis, and he made the following observation: that "just the way that we're analyzing how language connects to events which provide common ground for language, that same idea we can take out of your home, Deb, and we can apply it to the world of public media." And so our effort took an unexpected turn.

Think of mass media as providing common ground and you have the recipe for taking this idea to a whole new place. We've started analyzing television content using the same principles -- analyzing event structure of a TV signal -- episodes of shows, commercials, all of the components that make up the event structure. And we're now, with satellite dishes, pulling and analyzing a good part of all the TV being watched in the United States. And you don't have to now go and instrument living rooms with microphones to get people's conversations, you just tune into publicly available social media feeds.

So we're pulling in about three billion comments a month. And then the magic happens. You have the event structure, the common ground that the words are about, coming out of the television feeds; you've got the conversations that are about those topics; and through semantic analysis -- and this is actually real data you're looking at from our data processing -- each yellow line is showing a link being made between a comment in the wild and a piece of event structure coming out of the television signal. And the same idea now can be built up. And we get this wordscape, except now words are not assembled in my living room. Instead, the context, the common ground activities, are the content on television that's driving the conversations. And what we're seeing here, these skyscrapers now, are commentary that are linked to content on television. Same concept, but looking at communication dynamics in a very different sphere.
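The linking step described above can be pictured with a small sketch. The talk does not say how the semantic analysis works, so this is only a stand-in: it links a comment to a TV event when the comment falls inside the event's time window and shares at least one keyword with it. The names `Event`, `Comment`, `link_comments`, and the `slack` parameter are all hypothetical, and the sample data is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A piece of event structure from a TV feed (hypothetical shape)."""
    event_id: str
    start: int            # seconds since some reference time
    end: int
    keywords: frozenset   # words that describe what is on screen


@dataclass(frozen=True)
class Comment:
    """A public social media comment (hypothetical shape)."""
    comment_id: str
    timestamp: int        # same clock as Event.start / Event.end
    text: str


def link_comments(events, comments, slack=300):
    """Return (comment_id, event_id) pairs.

    Assumption: keyword overlap inside a time window stands in for the
    semantic analysis mentioned in the talk, which is not specified there.
    """
    links = []
    for comment in comments:
        words = set(comment.text.lower().split())
        for event in events:
            in_window = event.start <= comment.timestamp <= event.end + slack
            if in_window and words & event.keywords:
                links.append((comment.comment_id, event.event_id))
    return links


# Invented sample data, for illustration only.
events = [
    Event("speech", 0, 3600, frozenset({"president", "speech"})),
    Event("game", 0, 7200, frozenset({"touchdown", "quarterback"})),
]
comments = [
    Comment("c1", 1200, "Watching the president speak right now"),
    Comment("c2", 1500, "What a touchdown"),
    Comment("c3", 9000, "Great speech earlier"),   # too late for the speech
    Comment("c4", 100, "Making dinner"),           # no shared keywords
]
links = link_comments(events, comments)
```

Each pair in `links` plays the role of one yellow line in the visualization: a connection between something someone said and a piece of content.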

And so fundamentally, rather than, for example, measuring content based on how many people are watching, this gives us the basic data for looking at engagement properties of content. And just like we can look at feedback cycles and dynamics in a family, we can now open up the same concepts and look at much larger groups of people. This is a subset of data from our database -- just 50,000 out of several million -- and the social graph that connects them through publicly available sources. And if you put them on one plane, a second plane is where the content lives. So we have the programs and the sporting events and the commercials, and all of the link structures that tie them together make a content graph. And then the important third dimension. Each of the links that you're seeing rendered here is an actual connection made between something someone said and a piece of content. And there are, again, now tens of millions of these links that give us the connective tissue of social graphs and how they relate to content. And we can now start to probe the structure in interesting ways.

So if we, for example, trace the path of one piece of content that drives someone to comment on it, and then we follow where that comment goes, and then look at the entire social graph that becomes activated and then trace back to see the relationship between that social graph and content, a very interesting structure becomes visible. We call this a co-viewing clique, a virtual living room if you will. And there are fascinating dynamics at play. It's not one way. A piece of content, an event, causes someone to talk. They talk to other people. That drives tune-in behavior back into mass media, and you have these cycles that drive the overall behavior.
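The tracing described above can be sketched as a short graph walk. This is one simplified reading of the idea, not the lab's actual method: start from one person who commented on a piece of content, follow the comment outward through the social graph for a few hops, then trace back and keep only the reached people who also commented on that same content. The function name `co_viewing_clique`, the `followers` mapping, and the `max_hops` parameter are hypothetical, and the data is invented.

```python
from collections import deque


def co_viewing_clique(seed_user, content_id, comment_links, followers, max_hops=2):
    """Approximate a 'virtual living room' around one commenter.

    comment_links: list of (user, content_id) pairs, one per comment-to-content link
    followers:     dict mapping a user to the set of users who see their comments
    Assumption: a clique here means people reachable from the seed within
    max_hops who are also linked to the same content.
    """
    # Step 1: follow where the comment goes (breadth-first through followers).
    activated = {seed_user}
    queue = deque([(seed_user, 0)])
    while queue:
        user, hops = queue.popleft()
        if hops == max_hops:
            continue
        for follower in followers.get(user, ()):
            if follower not in activated:
                activated.add(follower)
                queue.append((follower, hops + 1))

    # Step 2: trace back from the activated social graph to the content.
    commented = {user for user, content in comment_links if content == content_id}
    return activated & commented


# Invented sample data, for illustration only.
comment_links = [
    ("ana", "show"), ("ben", "show"), ("cy", "ad"),
    ("dee", "show"), ("eve", "show"),
]
followers = {"ana": {"ben", "cy"}, "ben": {"dee"}, "dee": {"fay"}}

clique = co_viewing_clique("ana", "show", comment_links, followers)
```

In this toy data, `eve` commented on the same show but is not reachable from `ana`, so she falls outside the clique, while `cy` is reachable but commented on something else.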

Another example -- very different -- another actual person in our database -- and we're finding at least hundreds, if not thousands, of these. We've given this person a name. This is a pro-amateur, or pro-am, media critic who has this high fan-out rate. So a lot of people are following this person -- very influential -- and they have a propensity to talk about what's on TV. So this person is a key link in connecting mass media and social media together.

One last example from this data: Sometimes it's actually a piece of content that is special. So if we go and look at this piece of content, President Obama's State of the Union address from just a few weeks ago, and look at what we find in this same data set, at the same scale, the engagement properties of this piece of content are truly remarkable. A nation exploding in conversation in real time in response to what's on the broadcast. And of course, through all of these lines are flowing unstructured language. We can X-ray and get a real-time pulse of a nation, real-time sense of the social reactions in the different circuits in the social graph being activated by content.

So, to summarize, the idea is this: As our world becomes increasingly instrumented and we have the capabilities to collect and connect the dots between what people are saying and the context they're saying it in, what's emerging is an ability to see new social structures and dynamics that have previously not been seen. It's like building a microscope or telescope and revealing new structures about our own behavior around communication. And I think the implications here are profound, whether it's for science, for commerce, for government, or perhaps most of all, for us as individuals.
