The social hilltop algorithm

Posted on 11. Nov, 2011 by in Digital Marketing

I feel like a post like this needs to begin with a disclaimer.

I suppose in an ideal world disclaimers should not be necessary, but there have been plenty of conversations recently about skill shortages in SEO and the tendency for too many account managers to consider themselves experts just because they keep up to date with the echo chamber of opinion, without testing anything for themselves.

I’ve some sympathy for those concerns and so here’s my disclaimer: this post is purely speculation. Sure, it’s based on fact and history but the extrapolation is purely a ‘what-if’ scenario. Hopefully it’s food for thought.

Social Hilltop background

This post is inspired by some of the live coverage from Pubcon. Pubcon is a drinking binge in Vegas in which people are able to make the business case to attend because SEO experts present, on stage, their thoughts and opinions. I’m not jealous that I’ve never been able to make said business case, you understand, it’s just that… well… I should get to go.

SERoundtable has some good live-blog coverage.

Brian Ussery: Twitter and Facebook are still essentially blocked from Googlebot…
Barry Schwartz: Lots of people talking about social in terms of SEO and some SEOs are getting ahead of themselves. Google can only see stuff on the open web, not stuff hidden to Googlebot.

I agree with much of that. Modern SEO is not just social SEO. Modern SEO is multi-signal SEO and social is one of the signal collections. In fact, in a very useful interview between Eric Enge and Duane Forrester of Bing, Bing’s Sr Product Manager confirmed that social signals are the second most important ranking factor for them, ahead of links, but behind behavioural factors.

The question, therefore, is this: is Googlebot really blocked from Twitter?

I find it hard to argue that it is. After all, Google has indexed over a trillion pages from Twitter.

Most of those pages, the important pages, are people’s “homepages” or profile pages.

It may be the case that Google does not always get its hands on the individual tweets as fast as it would like. Without the firehose, Google may not be able to tell directly from Twitter, in real time, which pages are popular tweet targets. When Google lost the firehose from Twitter, a few months ago, the world lost real-time search (RTS) from Google. Despite the growth of Google+ and the brand new launch of Google+ Pages we are yet to see RTS return.

So, can Google cope without direct access to the Twitter firehose? Yes. I’m going to suggest that Google could expand the Hilltop algorithm to the Social Hilltop algorithm and monitor the real-time web of influencers and determine the social authorities.

The Hilltop algorithm is an absolute must-know for any SEO account manager or freelancer. It has its own Wikipedia page. It dates from when Google first started talking about authority.

The Hilltop algorithm was created by Krishna Bharat and George Mihaila. It was bought by Google in 2003. That’s right: this was a search algorithm good enough that Google needed to buy it.

The concept is simple. Start with a carefully selected list of “expert sites”. Hand picked, if needs be. Expert sites are those without affiliation, which link to many sites and which have significance of their own.

Sites that many “expert sites” linked to are considered to be “authority sites” by the Hilltop algorithm.
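To make the expert/authority idea concrete, here is a minimal sketch in Python. It is not Google's implementation. The URLs are made up, and the affiliation test (comparing the last two labels of the hostname) is my own simplification, as is the min_experts threshold.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical expert pages and the pages they link to (illustrative data only).
EXPERT_LINKS = {
    "https://directory.example.org/seo": ["https://a.example.com/", "https://b.example.net/"],
    "https://guide.example.edu/search": ["https://a.example.com/", "https://c.example.io/"],
    # This "expert" shares a domain with its target, so its vote should not count.
    "https://www.a.example.com/friends": ["https://a.example.com/"],
}


def host_key(url):
    """Crude affiliation key: the last two labels of the hostname (an assumption)."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def authority_scores(expert_links, min_experts=2):
    """Count how many unaffiliated experts link to each target page."""
    votes = defaultdict(set)
    for expert, targets in expert_links.items():
        for target in targets:
            # Skip links between affiliated hosts; they are not independent votes.
            if host_key(expert) != host_key(target):
                # Storing the host key means affiliated experts count only once.
                votes[target].add(host_key(expert))
    return {t: len(e) for t, e in votes.items() if len(e) >= min_experts}


if __name__ == "__main__":
    print(authority_scores(EXPERT_LINKS))
```

Only the page that two independent experts link to survives the threshold; the self-affiliated link is ignored.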

In some ways this is the very first social media search algorithm. Hilltop uses influencers to work out which sites should be ranking for any given keyword.

Krishna Bharat went on to work for Google and was fundamental in producing Google News and LocalRank. Google News, to this day, cares about the influence of your news organisation.

Social Hilltop algorithm

Let’s imagine the Social Hilltop algorithm is used to let Google cope without direct Twitter firehose access. It first needs to find the social equivalent of expert sites.

In the social world, expert sites would be those that, without affiliation, surface pages which are popular among the social community. For this post I’m going to suggest Tweetmeme and Topsy. Both monitor popular URLs as they’re shared on Twitter. Both are without affiliation, neither blocks Google (in fact, both have Allow: / in their robots.txt) and both link to popular pages.
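For anyone who wants to test the "not blocked" claim for themselves, Python's standard library can parse a robots.txt file. The sketch below uses a made-up robots.txt body and a hypothetical aggregator URL rather than the live files from either site, so treat it as an illustration of the check and not as evidence about Tweetmeme or Topsy.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt body with a blanket Allow rule (not fetched from a real site).
ROBOTS_TXT = """\
User-agent: *
Allow: /
"""


def googlebot_allowed(robots_txt, url):
    """Return True if the given robots.txt text lets Googlebot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    # Hypothetical URL on a hypothetical aggregator.
    print(googlebot_allowed(ROBOTS_TXT, "https://aggregator.example.com/popular"))
```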

As a result we can find our social authority pages. Google’s crawlers will find it simple to identify which pages are considered popular by both Tweetmeme and Topsy. Should Google find this a helpful measurement of engagement, authority or influence, in the way it finds the traditional Hilltop algorithm helpful, then these discoveries become an easy inclusion in the legion of ranking factors.
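In keeping with the what-if spirit of this post, the "popular on both" step might look something like the sketch below. The lists of URLs are invented, the normalise helper is a simplification of my own, and nothing here reflects how Google, Tweetmeme or Topsy actually work.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit


def normalise(url):
    """Lower-case the scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def social_authorities(*popular_lists, min_sources=2):
    """Return URLs that appear in at least min_sources of the crawled lists."""
    counts = Counter()
    for urls in popular_lists:
        # A set ensures each social expert site votes at most once per URL.
        counts.update({normalise(u) for u in urls})
    return sorted(u for u, n in counts.items() if n >= min_sources)


if __name__ == "__main__":
    # Hypothetical "popular" lists as they might be crawled from two aggregators.
    first_source = ["https://example.com/post-1/", "https://example.org/news"]
    second_source = ["https://EXAMPLE.com/post-1#comments", "https://example.net/other"]
    print(social_authorities(first_source, second_source))
```

Raising min_sources is the social equivalent of demanding that more experts agree before a page counts as an authority.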

We may, of course, also conclude that Google still has a pretty good idea of what is popular on Twitter by being able to examine that social network directly. Either way, it seems very likely that Google is all too aware of what’s popular and what’s not among today’s blog posts or discoveries.

Photo credits: Dilip Muralidaran and Sghosh30

  • Chris Dugdale

    PageRank and Hilltop are just measures of raw popularity and topical authority respectively. People keep banging on about how there is some whizz-bang science developing in the wings that means we will end up with “Social Hilltop”, or some-such nonsense. At the end of the day, a twitter profile is just a web-page and the existing mechanisms happily handle such pages just as well as a page on a blog.

    The only real difference is the rate of change; glueing together PageRank, Hilltop and “speed” (delivered by Caffeine and the latest freshness update) is all you really need to map out social networks and their associated ever-fickle trending topics. There is nothing really new here: the only thing that Google was lacking was being able to crawl fast enough and shovel that fresh data into the index quickly enough for it to hold any relevance, which they now can.

    I think talking about social signals has always been somewhat disingenuous too; once upon a time, people created content in newsgroups, then on forums, then on blogs and associated comments; now we have Twitter and Facebook, but all of these “channels” are just web-pages and as long as those pages are open to Google, it has all the equipment it needs to process that information. Social signals are nothing new (before links became a commodity, _they_ were “social signals”: one webmaster endorsing another’s content); we are just seeing a mass-market adoption of online communication and the change in infrastructure required to keep pace.

    • Andrew Girdwood

      Google bought Usenet’s archives, after all.

  • Jeremy Chatfield

    [aussie twang]Nah, reckon yer wrong, mate![/aussie twang]
    There’s a vast hyper super duper difference between *Google* social and backlinks. It’s that Google’s social activity is as measurable as AdSense clicks. AdSense allows “invalid clicks” but dismisses the charges for users – IOW, Google does click fraud detection. It can because it has so much data. Data that it can’t get about third party backlinks that someone else put up, at some point (almost any data Googlebot collects can be spoofed – like Lastmod, for example; want to have your articles scattered through time? Append them to an uncrawled page, attach that page high in the site, and lie like a demon about the Lastmod on the page you want to be noticed – for example).

    However, try lying about how well you know someone – do they respond to you? Are your clicks strangely co-ordinated; 1,000 users with a similar time signature all clicking interest on a business? And then the same group doing the same for something else? The kinds of patterns that Google spots in AdSense.

    Google Social is much richer for Google to understand where the *quality* clicks and content are, than web pages. Much, much easier. It’s a game changer. See my speculation earlier this week -

    Twitter? An also ran. Look at the active user estimates for Twitter, and look at the growth rates for Plus. Ignore crawling Twitter. It’s the +1 and Plus clicks that count as Social. And when people work out that Plus and +1 affect reputation and ranking – Twitter’s toast – except for spammers and laggards.

    • Chris Dugdale

      You can wield your Aussie Twang all you like, but that is pretty naive thinking in my book. Do you really think it is impossible to spoof ‘social signals’? Conversational interaction is harder to spoof (although there are chat-bots that pass the Turing Test these days) but a distributed net of random browsers that occasionally mash +1 buttons with a subjective bias toward a topic or client is almost trivial.

      Also, do you really think that there is a difference between a link between pages and a link between people? To Google, people and pages are just data nodes; the links between them have attributes and imply meaning but at the end of the day they will be modelled the same way, using the same tools; there may be different nomenclature attached to nodes and connections, but they will be transposable at a very base level.

      This is why Google and Facebook will fail in their modelling of the ‘social graph’. People’s relationships and interactions are not easily codified into six categories; we are so much more complex than can be modelled (yet) and so we will be mapped using the same simple tools that are used to map pages and links.

      Also, (I really must stop starting sentences with that word) Google are forcing businesses into using G+ by saying that the +1s will stack up and be that influencing factor. This isn’t going to get people using it because they want to, but because they feel they have to. First and foremost, this is going to attract people looking to game the system and this is what will ruin it for all. Every successful social media network has grown to be popular before monetisation and business moved in. Turning this around and putting business first will not provide an environment that users will take to easily. I wouldn’t write twitter off yet.

      • Andrew Girdwood

        I think it’s harder to spoof an influential author than it is to “spoof” an influential page or network of pages, though. 

        It’s easy to create a sock puppet, sure, but a sock puppet with significant authority? That’s much harder.