With the new year I'm introducing a new top-level category on my Web site for "economics"; some past postings that fit in there have been moved into the new category, which will probably break links. Anyway, today's new item is from an article in Science News on the place of highly influential individuals (called "influentials") in viral propagation of ideas. New research by Watts and Dodds (PDF preprint) suggests that such individuals aren't so important after all; if you want lots of people to adopt something, you're actually better off concentrating on the most easily influenced people in the population, so as to get as many of those as possible. Sheer numbers are the important factor, not having the adopters be the more influential members of the population.
Unfortunately, I think there's a critical problem with applying this work to the case I'm most interested in. I'm most interested in readership of things posted on the Web. I want to write and post articles that lots of people will read. I've had a few notable successes on that - the top ones are probably What Colour are your bits? and The terrible secret of Livejournal. The Breaking of Cyber Patrol (see FAQ) would probably count too, but it wasn't my doing - it was the choice of the plaintiffs (probably forced by "due diligence" concerns for the company they were selling the Cyber Patrol division to) to file the lawsuit, and it was the lawsuit that made the whole thing notable. These successes seem to be somewhat random. It can be argued that they're my best work, but I've done other work I've thought was better that didn't attract the same kind of attention. (For instance, Tranquility Bay: What to do?.) I've also done other work that might or might not be better but that I would like to receive a lot of attention whether it's good or not. Figuring out how to get lots of attention in a controllable way for ideas of my choice would be of some value.
The problem with applying this recent work to cases like those is that this recent work is on influence graphs quite different from the Web influence graph. First, Watts and Dodds seem to mostly be interested in graphs with a fairly small variance in degree - the most influential people aren't all that much more influential than average. The basic model has a Poisson distribution of vertex degrees. That's not typical of the Web environment at all. In the Web environment it seems to be more like a power-law distribution. In the power-law distribution there are a few highly influential participants (the "A-list" if you will; Watts and Dodds call them "hyper-influentials") who have far more influence than average.
The paper claims that even when some people like that are added to the simulation, it remains the case that influencing them isn't all that helpful. The main reason turns out to be the model they're using: someone "adopts" (in my case "adopting" might mean "choosing to talk about an idea") after a certain percentage of that person's neighbours in the graph have adopted, with different people having different threshold percentages. Those hyper-influentials have a lot of neighbours, so they don't adopt until a lot of people have done so, and so they can't be among the critical early adopters needed to create a big cascade of adoption. What we observe on the Web is different: to get a lot of attention you need to get mentioned by someone on the A-list, and if you do, you're quite likely to get a lot of attention, but the A-list people aren't a lot harder to influence than anyone else.
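To make the threshold rule concrete, here is a minimal Python sketch of it. This is not Watts and Dodds's simulation code; the function name `threshold_cascade`, the star-shaped example graph, and the uniform 0.25 threshold are all my own assumptions, chosen only to show why a well-connected vertex is slow to adopt under this kind of rule.

```python
def threshold_cascade(neighbours, thresholds, seeds):
    """Run a fractional-threshold cascade until nothing more changes.

    neighbours: dict mapping each vertex to the set of its neighbours
                (undirected, so the relation is symmetric).
    thresholds: dict mapping each vertex to the fraction of its neighbours
                that must have adopted before it adopts.
    seeds:      iterable of initial adopters.
    """
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, nbrs in neighbours.items():
            if v in adopted or not nbrs:
                continue
            fraction = len(nbrs & adopted) / len(nbrs)
            if fraction >= thresholds[v]:
                adopted.add(v)
                changed = True
    return adopted


# Hypothetical example: one hub (vertex 0) joined to ten leaves (1..10).
star = {0: set(range(1, 11))}
for leaf in range(1, 11):
    star[leaf] = {0}
thresholds = {v: 0.25 for v in star}  # assumed: same threshold for everyone

one_leaf = threshold_cascade(star, thresholds, [1])
three_leaves = threshold_cascade(star, thresholds, [1, 2, 3])
print(len(one_leaf), len(three_leaves))  # prints: 1 11
```

With one leaf seeded, the hub sees only 10% of its neighbours adopting and stays put; it takes three leaves to tip it. A leaf, by contrast, tips as soon as its single neighbour adopts. That asymmetry is built into the rule itself, not discovered by the simulation.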
Watts and Dodds anticipate that "people who influence many others are very difficult to influence" may not be realistic in all cases of interest, so they also examine a different model in which you're influenced by each of your neighbours with a fixed, independent probability. Then the more people you influence, the more people influence you. With the basic model for the shape of the graph (everyone having roughly the same amount of influence) but with people being easier to influence the more neighbours they have, the effect of influentials becomes a bit stronger than in the pure basic model (that is, influentials are more valuable), but not by a lot. If you also throw in the presence of hyper-influentials, then the thesis that influentials aren't too important becomes even weaker. This case is discussed on pages 30 to 32 of the paper. From my point of view it's the most interesting case, but it seems to be the least interesting one for the authors, and that's a shame. They don't go into as much detail on it as I would like and I'm not completely sure what they're saying the result for this case is - if readers here have thoughts about it, they'd be welcome.
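For comparison, here is a rough Python sketch of a per-neighbour probability rule, in the style of what is usually called an independent cascade. Whether it matches the paper's exact formulation is an assumption on my part, and the names `independent_cascade`, `p`, and the little example graph are hypothetical.

```python
import random


def independent_cascade(neighbours, p, seeds, rng):
    """Spread adoption where each new adopter gets one chance, with
    independent probability p, to convert each neighbour.

    neighbours: dict mapping each vertex to the set of its neighbours.
    p:          per-link probability of influence (assumed the same everywhere).
    seeds:      iterable of initial adopters.
    rng:        a random.Random instance, passed in so runs can be repeated.
    """
    adopted = set(seeds)
    frontier = list(adopted)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in neighbours[u]:
                if v not in adopted and rng.random() < p:
                    adopted.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return adopted


# Hypothetical example: a hub (0) with five leaves, one of which (5)
# also leads on to a short chain 5 - 6 - 7.
graph = {
    0: {1, 2, 3, 4, 5},
    1: {0}, 2: {0}, 3: {0}, 4: {0},
    5: {0, 6},
    6: {5, 7},
    7: {6},
}
result = independent_cascade(graph, 0.5, [7], random.Random(2008))
print(sorted(result))
```

Under this rule a vertex with many neighbours gets many independent chances to be converted, so the hub is now among the easiest vertices to influence instead of the hardest.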
Saying that "influentials aren't too important" is a tricky thing to say for another reason: they're using a special definition of importance. What they really mean is that influentials generally do not have influence on the overall success of the project by a greater factor than their immediate influence. Someone who has twice as many neighbours as the average person has less than twice as much value overall compared to the average person. That does seem like a reasonable way to define it from a theoretical perspective; the assumption Watts and Dodds are examining is the one that influential people have some kind of magic status beyond just bigger readerships of their own.
But the conclusion isn't really that influentials aren't important; it's that influentials are not disproportionately important. They still do have more overall value than the average person! Just not by as much as their bigger readership. In the presence of hyper-influentials (the A-list) I'm not sure how meaningful that statement is for practice. If an A-list Web logger has 1000 times the readership of a D-list Web logger, but only 100 times the value to the eventual overall readership of my article, then it still makes sense for me to spend 100 times as much effort to get mentioned by the A-list Web logger. And that's a much greater disproportion than Watts and Dodds actually claim - they're talking about ratios less than 2:1 instead of 10:1 - although they also haven't examined the case where there's such a big gap as 1000:1 in the direct influence levels of individuals.
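The arithmetic in that example can be spelled out in a few lines of Python. The 1000 and 100 come from the example above; the "effort" figures are arbitrary units I've made up for illustration.

```python
# Ratios from the example in the text, both relative to a D-list Web logger.
readership_ratio = 1000  # A-list readership
value_ratio = 100        # A-list value to my article's eventual readership

# How far overall value lags behind direct readership: the 10:1 above.
disproportion = readership_ratio / value_ratio


def return_per_effort(value, effort):
    """Value gained per unit of effort spent chasing a mention."""
    return value / effort


d_list = return_per_effort(1, 1)
a_list_at_50x = return_per_effort(value_ratio, 50)    # hypothetical effort
a_list_at_100x = return_per_effort(value_ratio, 100)  # break-even effort
print(disproportion, d_list, a_list_at_50x, a_list_at_100x)
```

Even with value lagging readership by a factor of ten, chasing the A-list mention beats chasing a D-list one at any effort below 100 times as much, and only breaks even at exactly 100 times.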
My biggest disappointment is that in all these models they're assuming an undirected influence graph. The people who influence me (with whatever model) have to be the same people I influence. For the Web, I think that's completely unrealistic and probably fatal. I don't read all the Web logs of people who read mine. People whose Web logs I read don't all read mine. (I wish!) It's really a directed graph, and a very asymmetric one. Some people have high in-degree and low out-degree - they read lots but don't have many readers of their own. Some people have low in-degree and high out-degree - they don't read many sources but have many readers of their own. Very many of the links only go in one direction.
The famous bow-tie paper is talking about links between Web pages rather than influence among people who post Web pages, but I'd strongly expect influence among people on the Web to work the same way because the bow-tie situation seems to be the natural consequence of having a directed graph with power-law degrees at all. Assuming a bow-tie situation, it seems like it's really important to get your idea adopted by a few of the hyper-influentials in the middle "knot" part of the tie, because they all influence each other and once you get a cascade of influence going in there, it'll propagate to the rest of the world. The easily-influenced people (of high in-degree) seem more likely to be in the fringes (the "OUT" component or right-hand loop of the bow) and influencing them cannot even in principle cause a global cascade that leads to large readership. But that's all an intuitive argument. I'd love to see some rigorous work like Watts and Dodds's on an Internet-like directed graph with bow-tie structure.
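The intuition can at least be illustrated on a toy graph. The following Python sketch splits a directed influence graph into the three main bow-tie parts relative to a vertex assumed to sit in the knot. Everything here is hypothetical: the function names, the five-person graph, and the convention that an edge from u to v means "u influences v" (that is, v reads u). It is an illustration of the structure, not the rigorous work I'm asking for.

```python
from collections import deque


def reachable(graph, start):
    """Return every vertex reachable from start by following edges forward."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def reverse(graph):
    """Return a copy of the graph with every edge flipped."""
    rev = {u: set() for u in graph}
    for u, targets in graph.items():
        for v in targets:
            rev.setdefault(v, set()).add(u)
    return rev


def bow_tie(graph, core_vertex):
    """Split the graph into IN, KNOT and OUT relative to core_vertex,
    which is assumed to lie in the central strongly connected part."""
    forward = reachable(graph, core_vertex)
    backward = reachable(reverse(graph), core_vertex)
    knot = forward & backward
    return {"IN": backward - knot, "KNOT": knot, "OUT": forward - knot}


# Hypothetical influence graph: an edge u -> v means u influences v.
influence = {
    "a": {"b"},
    "b": {"c"},
    "c": {"a", "reader"},  # a, b and c all influence each other: the knot
    "newcomer": {"a"},     # influences the knot but isn't influenced by it
    "reader": set(),       # reads the knot, has no readers of their own
}
parts = bow_tie(influence, "a")
print(parts)
print(reachable(influence, "reader"))  # a cascade seeded in OUT goes nowhere
```

In this toy graph, an idea adopted by anyone in the knot reaches "reader", while an idea adopted only by "reader" never leaves. That is the sense in which seeding the OUT component can't start a global cascade.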
In fairness to Watts and Dodds, it appears they chose to ignore directed graphs for a reasonable reason: they're interested in product marketing by word of mouth, which probably is a lot more like an undirected graph. The people I talk to around the water cooler are the same ones I listen to around the water cooler. As long as we're talking about that kind of model it seems okay to assume an undirected graph. They do mention "media-like" networks with directed links, but explain that those are beyond the scope of their work. My complaint is that this research doesn't happen to be on the question I'm most interested in, which is the question of large disorganized media-like networks, such as the network of Web logs. The research is still on a worthwhile question, just not on my question.