April 21, 2008

Google was a links-driven search engine

by Brian Turner

Google was a links-driven search engine.

That’s a fundamental truth that needs underlining.

I have to underline it because a lot of people still don’t believe that, including a lot of “SEO experts”, who even occupy positions of authority on various internet communities – blogs and forums.

And while the shape of Google has certainly changed over the years, and is in the process of dramatically changing away from a links-driven model, at the very beginning, Google was a links-driven search engine.

End of argument.

But I say “was” and not “is” because that is rapidly changing.

PageRank

PageRank: an algorithm named after Google founder Larry Page

Pretty much every webmaster has heard of PageRank – but too many think PageRank = page rankings. It's a common myth among people new to SEO – and one they are often frustratingly insistent on arguing.

There is no correlation between the little green erection on the Google Toolbar and the keyword ranking of any individual page.

PageRank is an algorithm named after Google founder Larry Page – an algorithm that uses the web's link structure to weigh the importance of pages, and their relationship to the rest of the web.
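For anyone who has never looked under the hood, here's a minimal sketch of the PageRank idea as power iteration. The toy graph, damping factor and iteration count are my own illustrative assumptions – not Google's actual implementation:

```python
# A minimal sketch of the PageRank idea via power iteration. The toy graph and
# damping factor are illustrative assumptions, not Google's actual implementation.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping every page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue  # dangling pages are simply ignored in this sketch
            share = rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += damping * share  # each link passes on a share of rank
        rank = new_rank
    return rank

# Hypothetical three-page web: "home" is linked to by both other pages
graph = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home"],
}
print(pagerank(graph))  # "home" ends up with the highest score
```

The point to take away is that a page's score comes entirely from the pages linking to it – nothing on the page itself is consulted.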

As the paper describing the original Google prototype clearly states, Google originally looked not simply at how webpages linked to each other – but also at how the keywords in those links could be used to describe the pages they pointed to:

2.2 Anchor Text

The text of links is treated in a special way in our search engine. Most search engines associate the text of a link with the page that the link is on. In addition, we associate it with the page the link points to. This has several advantages. First, anchors often provide more accurate descriptions of web pages than the pages themselves. Second, anchors may exist for documents which cannot be indexed by a text-based search engine, such as images, programs, and databases.
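To make that mechanism concrete, here's a minimal sketch of indexing anchor text against both the page a link sits on and the page it points to. The URLs and index structure are purely illustrative, not Google's actual design:

```python
# Illustrative sketch only: index anchor text against both the page the link sits on
# and the page it points to, as section 2.2 describes. URLs and structure are invented.

from collections import defaultdict

inverted_index = defaultdict(set)  # term -> set of URLs that should match it

def index_link(source_url, target_url, anchor_text):
    for term in anchor_text.lower().split():
        inverted_index[term].add(source_url)  # most engines of the era stopped here
        inverted_index[term].add(target_url)  # Google also credits the linked-to page

index_link("http://example.com/reviews",
           "http://example.com/mouse-guide",
           "official disney homepage")

print(inverted_index["disney"])
# The target page now matches "disney" even if its own text never uses the word -
# which is also why anchors work for images and other un-indexable documents.
```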

So it’s clear that in the early days of Google, links were Google’s unique strength.

After all, before then, AltaVista and the others had decided that if a page described itself in some way, this should be taken at face value.

Which had created a haven for spammers, scammers, and a whole load of nasty practices. A porn site claiming to be the official Disney homepage could be accepted as exactly that – Disney's official website.

This is why Google's links-driven PageRank approach revitalised search engines, and brought something new to the internet.

Links were much harder to game, so they became a far more trustworthy signal to rely on.

Links: Google’s Achilles Heel

Brad Pitt as Achilles in the film Troy

Google had known for a long time that trusting so heavily in links was a serious Achilles Heel.

Luckily for Google, mainstream SEO didn't catch on to the fact that Google was a links-driven search engine until years afterwards.

Even as late as 2003, when I first entered SEO, John Scott was still evangelising links to mainstream SEOs and having to argue his point.

Mainstream SEO’s then nauseatingly hummed the mantra of “Content is King”, while the link builders claimed the rankings. The “content is king” argument was that if you published great content, links would naturally come. But without links in the first place, your content could not be found, so it could not generate links – a conundrum Mike Grehan brilliantly highlighted in his commentary Filthy Linking Rich.

Yet when SEO’s saw themselves outranked by links, they still convinced themselves that Google loved good content, that the link builders would soon get their just desserts one day, and their own “great content” would soon rank.

Google was already working on fixing the weakness – their trust in links – from an early stage.

Patent applications such as Hilltop (2001?), Topic Sensitive PageRank (2002) and LocalRank (2003) made it clear that Google was already applying damage-control measures against link manipulation – even before it became widescale.

2003: the year when it all changed

There were four major attempts by Google in 2003 to make their search defensible:

1. Google dance went quarterly

Google stopped their monthly “Google Dance”, which had been a roughly monthly update of Google’s index and of the PageRank values shown on the Google Toolbar.

Instead, the toolbar PageRank update occurred only quarterly, while the search results themselves began to update more frequently.

Buying links for PageRank purposes had been a fast growing problem for Google, and this at least helped reduce certainty over the PageRank value of paid links.

2. Human review of Google Search

It also became the year when it was first suggested that Google were employing humans as search quality reviewers for Google’s algorithmic results.

3. The Florida Update

Then on November 12th 2003 I was the first SEO to spot a new Google Update in progress – one that would live in infamy in SEO history: the Florida Update.

Florida was so destructive because it brought a number of different updates to the Google algorithm at once, of which stemming (treating word variants, such as singular and plural forms, as related) was the most obvious.

It also introduced “authority” for the first time – the idea that some websites were more equal than others.

It also introduced anti-spam techniques from the patents above, one of which was filtering links that came from within the same C class IP range (/24 IP block).
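As a rough illustration of what filtering by /24 block might look like – the IP addresses and the “one link per block” rule here are assumptions for the example, not Google’s actual logic:

```python
# Rough illustration of filtering links by C class (/24) block, as described above.
# The IP addresses and the "one link per block" rule are assumptions for the example.

import ipaddress

inbound_links = [
    ("203.0.113.10", "site-a.example"),
    ("203.0.113.57", "site-b.example"),   # same /24 as site-a, so its vote is dropped
    ("198.51.100.22", "site-c.example"),
]

seen_blocks = set()
counted = []
for ip, host in inbound_links:
    block = ipaddress.ip_network(f"{ip}/24", strict=False)  # e.g. 203.0.113.0/24
    if block not in seen_blocks:                             # only the first link per block counts
        seen_blocks.add(block)
        counted.append(host)

print(counted)  # ['site-a.example', 'site-c.example']
```

The effect is that a cluster of interlinked sites hosted on the same server can no longer stack votes for each other.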

4. Human-user data to supplement links

On December 31st 2003, Google engineer Matt Cutts and others filed a patent application titled Information Retrieval Based on Historical Data – a paper in which historical data and user behaviour could be factored into ranking algorithms.

Google had made it clear that it was going to defend its Achilles Heel as much as possible.

Google’s New Human User Driven Algorithm

Bored by the history lesson yet?

Okay, let’s get to the really important point:

Google are moving towards providing search results based on a human user-driven algorithm.

Don’t believe me? You’ve already seen it in action.

Let’s quickly recap:

Google was originally a links-driven search engine, but by 2003 was already applying various algorithmic techniques to defend its reliance on links against manipulation.

By the end of 2003, Google had started to employ human search quality reviewers, and additionally submitted a patent that used human-user data to improve Google Search results.

Google’s future direction was so blindingly obvious that in December 2005 I wrote a report “Links are dead – long live links” in which I explored how optimising for human users would have to become integral for mainstream SEO.

In 2006, Google launched it.

Originally referred to as “Web category links” or “Quick links”, they were confirmed as Sitelinks by Vanessa Fox – then working for Google.

Sitelinks were indented sub-results that, under certain conditions, accompanied the website listed top for a user’s search.

Michael Nguyen was one of the first commentators to make the observation that these were driven by traffic patterns.

Bill Slawski later dug into the patents behind sitelinks – and made it plain that they were somehow sourced from user-driven data.

Since then it’s become clear that sitelinks are usually triggered by keyword search frequency – i.e. not by traditional relevancy scoring, but according to how often a domain or brand is searched for in the first place.

The profundity of this has probably escaped the majority of people in mainstream SEO: sitelinks heralded the arrival of Google Search results that were driven not by link data, but by human user data.

Google to expand use of human user data?

The launch of Google Universal was generally seen as nothing more fundamental than Google integrating media – images, video, maps, etc – into Google’s search results.

However, the potential is clear for Google to return universal results based not on traditional links-based algorithms, but instead on traffic data used to decide which media options appear in Google Search results.

At present, it’s hard to see this conclusively in action. Various test searches bring up YouTube videos embedded in search results, which have impressive numbers of individual pageviews.

However, other videos with fewer views but more recent comments also rank well at times. Is this because those videos are seeing a traffic surge, and are therefore being treated as more relevant on the strength of traffic data?

Google’s apparent moves to apply optimisation of results using time stamping as a factor means that in combination with human user data, key user-driver components may be further obscured.

This is especially if traffic volume itself is mixed with freshness of traffic volume according to time stamp data.
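Purely as a speculative sketch of that idea – the half-life and visit counts below are invented – this is how blending traffic volume with freshness could let a smaller but fresher traffic surge outweigh a larger, staler one:

```python
# Speculative illustration only: blend traffic volume with freshness by discounting
# each visit exponentially by age. The half-life and visit data below are invented.

import time

def decayed_traffic_score(visit_timestamps, half_life_days=30, now=None):
    """Sum of visits, each weighted by 0.5 ** (age / half-life)."""
    now = now or time.time()
    half_life = half_life_days * 86400  # half-life in seconds
    return sum(0.5 ** (max(0.0, now - ts) / half_life) for ts in visit_timestamps)

now = time.time()
old_but_big = [now - 180 * 86400] * 1000   # 1,000 visits, all six months old
small_but_fresh = [now - 2 * 86400] * 100  # 100 visits, all two days old

print(decayed_traffic_score(old_but_big, now=now))      # ~15.6
print(decayed_traffic_score(small_but_fresh, now=now))  # ~95.5 - fresher traffic wins
```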

The algorithmic future of Google

Looking at the options, Google faces two divergent paths:

1. Deliver text search results based on a links-driven relevancy algorithm which Google knows is being manipulated
2. Deliver media-rich search results based on human traffic data, which is potentially more relevant – and also human endorsed

At present, it’s worth even speculating that Google may in fact have two major algorithms in play – one links driven (traditional search results) and one human-data driven.

Traditional links-driven search results could be called up as normal, with human-user media simply inserted into the search results template.

This would even explain a curious anomaly – why, despite sitelinks appearing for the top result in a keyword search, the same domain can also hold an additional number 2 position in the results.

A suggestion, perhaps, that both algorithms are not yet entirely synchronised, and that further work by Google to tie them together more smoothly will see the second, repeated result removed in time?

Whatever the truth of the matter, one thing remains very clear – Google has been and continues to collect vast amounts of human user data.

More importantly, though, Google have increasing opportunities to apply this data directly into search results.

The question is simply to what extent and over what time frame we will see this take place.

UPDATED SUMMARY:

I think the big pointer is that Google are already using human user data to drive sitelinks, and that Google Universal allows Google to use human user data directly for selecting how news, video, and other media is injected into search results.

The result – human driven data that dilutes the impact of link driven results.

Danny Sullivan actually ran a presentation at SES in 2004 about how Google could push down “traditional search” results with what became Google Universal – I’m not sure he suggested it as human user driven, but certainly the potential is there.

Add the search quality team applying an editorial process to the actual link-driven search results returned, and the links aspect of ranking becomes further diminished.


10 Responses to “Google was a links-driven search engine”

  1. Lee Croucher on April 22nd, 2008 11:32 am

    i have heard about google doing human research in major markets to check their first 20 listings are giving consumers the best sites. So gone are the days of spammers producing self gen sites as they won’t rank or last.

  2. Dave on April 22nd, 2008 2:58 pm

Nice round up and timeline Brian… I am also one that believes further user performance metrics are becoming more prevalent in the major engines. It is funny that everyone is now stuck in the link-centric thinking which now needs to be partially broken after having to convince them of the value once upon a time…. how long will it take? Dunno… but many SEOs are missing the boat IMO

  3. Brian Turner on April 22nd, 2008 6:48 pm

    Indeed – we’ve seen how different signals could be used.

    What I’ve been trying to do is to go back to basics and look at everything again, from a fresh perspective.

    I fired up Google, ran a search, saw sitelinks – and suddenly realised that I was looking at human data driven results. With universal, the potential is significant.

    The days of link building are very subdued.

  4. PPCblogger on April 23rd, 2008 7:47 am

    Brian – Simply a fantastic post.

  5. Matt Ridout on April 23rd, 2008 8:55 am

    An interesting take on Google – I liked the history of the algorithm

  6. Anon on April 23rd, 2008 11:51 am

Interesting post, I’ve noticed some crazy behaviour in the top 10 serps in the past month or so. One of our sites would rank extremely highly for an extremely competitive phrase then 2 days later would drop back to #30

This has happened around 5 times so far with 5 main keywords, it’s like they are testing our site to see if searchers will approve of our content.

    However given that links still seem to trump everything, and a domain with the keywords contained seems especially popular. Maybe because of the links, but maybe because people searching see the keywords in the domain and click it?

  7. Brian Turner on April 23rd, 2008 12:27 pm

    I’ve noticed this a lot as well – my interpretation is that Google are drawing results from different datacenters and comparing user preference on each using the tracking built into Google Search.

    I think at least sometimes this is when Google are applying minor tweaks to the main algo – but run a test sample for user reaction, they can determine if there are any significant problems for the user experience being introduced with the new tweak.

    Just my 2c, though. :)

  8. Paul on April 23rd, 2008 12:52 pm

    Great article Brian, simply stunning in its structure and message. Stumbled!

  9. Tin Pig on April 23rd, 2008 2:25 pm

    Good article. I especially like the history.

    I see this as a move in the right direction for google. The link-based algorithm is simply too easy to manipulate and does not represent a true measure of relevancy.

  10. Alex McArthur on April 24th, 2008 5:09 pm

    Great stuff. Moved your feed from my “when I find the time list” to “must reads” ;-) .
