Google was a links-driven search engine
by Brian Turner
Google was a links-driven search engine.
That’s a fundamental truth that needs underlining.
I have to underline it because a lot of people still don’t believe that, including a lot of “SEO experts”, who even occupy positions of authority on various internet communities – blogs and forums.
And while the shape of Google has certainly changed over the years, and is in the process of dramatically changing away from a links-driven model, at the very beginning, Google was a links-driven search engine.
End of argument.
But I say “was” and not “is” because that is rapidly changing.
Pretty much every webmaster has heard of PageRank – and too many think PageRank = page rankings. It’s a common myth among people new to SEO – and one they are often frustratingly insistent on arguing.
There is no correlation between the little green erection on the Google Toolbar and the keyword ranking of any individual page.
PageRank is an algorithm named after Google founder Larry Page – an algorithm that uses links to determine the importance of pages, and their relationship to the rest of the web.
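The core idea is simple enough to sketch. Here’s a minimal, illustrative power-iteration over a toy link graph – not Google’s actual implementation, and ignoring refinements like dangling-node handling:

```python
# Minimal PageRank sketch: each page splits its score evenly
# among the pages it links to, plus a small damping term.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# Toy web: A and C both link to B, so B accumulates the most rank.
graph = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
scores = pagerank(graph)
```

The point of the sketch: B ends up with the highest score purely because of who links to it – no page content is consulted at all.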
As the paper describing the original Google prototype clearly states, Google originally looked not simply at how webpages linked to each other – but also at how the keywords in those links could be used to describe the pages they pointed to:
2.2 Anchor Text
The text of links is treated in a special way in our search engine. Most search engines associate the text of a link with the page that the link is on. In addition, we associate it with the page the link points to. This has several advantages. First, anchors often provide more accurate descriptions of web pages than the pages themselves. Second, anchors may exist for documents which cannot be indexed by a text-based search engine, such as images, programs, and databases.
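The idea quoted above can be sketched with a toy inverted index: anchor words are credited to the page the link points to, so a page can be found for words that never appear on it. (Hypothetical data and structure, not Google’s implementation.)

```python
from collections import defaultdict

# Toy inverted index: map each word to the set of pages it describes.
index = defaultdict(set)

def index_page(url, body_text):
    # Conventional indexing: credit a page's own words to itself.
    for word in body_text.lower().split():
        index[word].add(url)

def index_link(source_url, target_url, anchor_text):
    # The twist from the paper: credit the anchor words to the
    # *target* of the link, not only the page the link sits on.
    for word in anchor_text.lower().split():
        index[word].add(target_url)

index_page("disney.com", "welcome to our homepage")
index_link("fansite.example", "disney.com", "official Disney website")

# disney.com is now findable for "disney" purely via anchor text,
# even though the word never appears on the page itself.
```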
So it’s clear that in the early days of Google, links were Google’s unique strength.
After all, before then, AltaVista and others had decided that if a page described itself in some way, that description should be taken at face value.
That had created a haven for spammers, scammers, and a whole load of nasty practices: a porn site claiming to be the official Disney homepage could be ranked as exactly that – Disney’s official website.
This is why Google’s links-driven PageRank approach revitalised search engines, and brought something new to the internet.
Links were much harder to game, so they became a much more trustworthy signal to rely on.
Links: Google’s Achilles Heel
Google had known for a long time that trusting so much in links was a serious Achilles Heel.
Luckily for Google, mainstream SEO didn’t catch on to the fact that Google was a links-driven search engine until years afterwards.
Even as late as 2003, when I first entered SEO, John Scott was evangelising links to mainstream SEOs and having to argue his point.
Mainstream SEOs then nauseatingly hummed the mantra of “Content is King”, while the link builders claimed the rankings. The “content is king” argument was that if you published great content, links would naturally come. But without links in the first place, your content could not be found, so it could not generate links – a conundrum Mike Grehan brilliantly highlighted in his commentary Filthy Linking Rich.
Yet when SEOs saw themselves outranked by links, they still convinced themselves that Google loved good content, that the link builders would get their just deserts one day, and that their own “great content” would soon rank.
Google was already working on fixing the weakness – their trust in links – from an early stage.
Patent applications such as Hilltop (2001?), Topic Sensitive PageRank (2002) and LocalRank (2003) made it clear that Google was already applying damage-control measures against link manipulation – even before it became widespread.
2003: the year when it all changed
There were four major attempts by Google in 2003 to make their search defensible:
1. Google dance went quarterly
Google stopped their monthly “Google Dance”, the regular update of Google’s index and of the PageRank values shown on the Google Toolbar.
Instead, the toolbar PageRank update occurred only quarterly, while search results themselves began to update more frequently.
Buying links for PageRank purposes had been a fast growing problem for Google, and this at least helped reduce certainty over the PageRank value of paid links.
2. Human review of Google Search
2003 also became the year when it was first suggested that Google were employing humans as search quality reviewers for Google’s algorithmic results.
3. The Florida Update
Then on November 12th 2003 I was the first SEO to spot a new Google Update in progress – one that would live in infamy in SEO history: the Florida Update.
Florida was so destructive because it brought a number of different updates to the Google algorithm at once, of which stemming (treating word variants such as singulars and plurals as related) was the most obvious.
It also introduced “authority” for the first time – the idea that some websites were more equal than others.
And it introduced anti-spam techniques described in the patents above, one of which was filtering links from within the same C class IP range (/24 IP block).
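The C-class filter itself is easy to illustrate: two IPv4 addresses fall in the same /24 block when their first three octets match, so links between such hosts can be discounted as likely belonging to one owner. A minimal check using Python’s standard ipaddress module – illustrative only, not Google’s actual code:

```python
import ipaddress

def same_c_class(ip_a, ip_b):
    # Two hosts share a C class (/24 block) when they sit in the
    # same /24 network, i.e. the first three octets are identical.
    net_a = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in net_a

# A link between 203.0.113.10 and 203.0.113.99 would be filtered;
# a link from 198.51.100.5 would not.
```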
4. Human-user data to supplement links
On December 31st 2003, Google engineer Matt Cutts and others filed a patent application titled Information retrieval based on historical data – in which historical data and user behaviour could be factored into ranking algorithms.
Google had made it clear that it was going to defend its Achilles Heel as much as possible.
Google’s New Human User Driven Algorithm
Bored by the history lesson yet?
Okay, let’s get to the really important point:
Google are moving towards providing search results based on a human user-driven algorithm.
Don’t believe me? You’ve already seen it in action.
Let’s quickly recap:
Google was originally a links-driven search engine, but by 2003 was already applying various algorithmic techniques to defend its links-based results against manipulation.
By the end of 2003, Google had started to employ human search quality reviewers, and additionally submitted a patent that used human-user data to improve Google Search results.
Google’s future direction was so blindingly obvious that in December 2005 I wrote a report “Links are dead – long live links” in which I explored how optimising for human users would have to become integral for mainstream SEO.
In 2006, Google launched it.
Originally referred to as “Web category links” or “Quick links”, these were confirmed as Sitelinks by Vanessa Fox – then working for Google.
Sitelinks were indented search results that, under certain conditions, accompanied the website listed top for a user’s search.
Michael Nguyen was one of the first commentators to make the observation that these were driven by traffic patterns.
Bill Slawski later dug into the patents behind sitelinks – and made it plain that these were somehow sourced from user-driven data.
Since then it’s become clear that sitelinks are usually triggered by keyword search frequency – i.e. not by traditional relevancy scoring, but according to how often a domain or brand is searched for in the first place.
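A hedged sketch of what such a trigger might look like: show sitelinks only when a domain’s own brand name dominates the queries that lead to it. The threshold and data here are entirely hypothetical – Google’s actual conditions are not public:

```python
def should_show_sitelinks(query_counts, brand_query, threshold=0.5):
    # query_counts: how often each query led users to this domain.
    # Show sitelinks only if navigational (brand) searches dominate.
    total = sum(query_counts.values())
    if total == 0:
        return False
    return query_counts.get(brand_query, 0) / total >= threshold

# A domain found mostly via its own name would qualify;
# one found mostly via generic keywords would not.
queries = {"acme": 800, "acme login": 120, "widgets": 80}
```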
The profundity of this has probably escaped most of mainstream SEO: sitelinks heralded the arrival of Google Search results driven not by link data, but by human user data.
Google to expand use of human user data?
The launch of Google Universal was generally seen as nothing more fundamental than Google integrating media – images, video, maps, etc – into Google’s search results.
However, the potential is clear for Google to return universal results based not on traditional links-based algorithms, but instead on traffic data used to select different media options in Google Search results.
At present, it’s hard to see this conclusively in action. Various test searches bring up YouTube videos embedded in search results, which have impressive numbers of individual pageviews.
However, other videos with fewer views but more recent comments also rank well at times. Is this because these videos are seeing a traffic surge, and are therefore being treated as more relevant due to traffic data?
Google’s apparent moves to optimise results using time stamping as a factor mean that, in combination with human user data, the key user-driven components may be further obscured – especially if traffic volume itself is mixed with freshness of traffic volume according to time stamp data.
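Mixing volume with freshness is typically done with a time decay: recent visits count for more than old ones. Here’s a hypothetical exponentially-decayed traffic score – an illustration of the general technique, not anything Google has published:

```python
import math

def decayed_traffic_score(visit_ages_days, half_life_days=30.0):
    # Each visit contributes less the older it is; a visit that is
    # half_life_days old counts half as much as one from today.
    decay = math.log(2) / half_life_days
    return sum(math.exp(-decay * age) for age in visit_ages_days)

# A small recent surge can outscore a larger but stale history.
fresh = decayed_traffic_score([0, 1, 2, 3])   # 4 recent visits
stale = decayed_traffic_score([300] * 10)     # 10 very old visits
```

Under a scheme like this, raw traffic volume alone tells you little – which is exactly why the user-driven components would be hard to reverse-engineer from outside.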
The algorithmic future of Google
Looking at the options for Google, the company faces two divergent paths:
1. Deliver text search results based on a links-driven relevancy algorithm which Google knows is being manipulated; or
2. Deliver media-rich search results based on human traffic data, which is potentially more relevant – and also human-endorsed.
At present, it’s even worth speculating that Google may in fact have two major algorithms in play – one links-driven (traditional search results) and one human-data driven.
Traditional links-driven search results could be called up as normal, with human-user media simply inserted into the search results template.
This would even explain a curious anomaly – why, despite sitelinks appearing for the top result in a keyword search, the same domain can also hold the number 2 position in the results.
A suggestion, perhaps, that the two algorithms are not yet entirely synchronised – and that further work by Google to tie them together more smoothly will see the second, repeated result removed in time?
Whatever the truth of the matter, one thing remains very clear – Google has been and continues to collect vast amounts of human user data.
More importantly, though, Google have increasing opportunities to apply this data directly into search results.
The question is simply to what extent and over what time frame we will see this take place.
I think the big pointer is that Google are already using human user data to drive sitelinks, and that Google Universal allows Google to use human user data directly for selecting how news, video, and other media are injected into search results.
The result – human driven data that dilutes the impact of link driven results.
Danny Sullivan actually ran a presentation at SES in 2004 about how Google could push down “traditional search” results with what became Google Universal – I’m not sure he suggested it as human user driven, but certainly the potential is there.
Add the search quality team applying an editorial process to the actual link-driven search results returned, and the links aspect of ranking becomes further diminished.