July 9, 2008

Different Google algos for different keywords

by Brian Turner

One of the more interesting points to come out of the Keyword Suggestion Tool is that the numbers are not exact figures – instead, keywords are assigned to discrete sets based on search frequency.

For example, for keywords reported with a search frequency between 100,000 and 600,000, every single keyword will show only one of the following values:

    550,000
    450,000
    368,000
    301,000
    246,000
    201,000
    165,000
    135,000
    110,000

All the designated search volumes fall into these fixed sets, arrayed in a clear mathematical pattern: each value is roughly 0.82 times the one above it, meaning the buckets are evenly spaced on a logarithmic scale – and this holds for every single keyword.

In other words, each keyword is effectively assigned to one of a small number of discrete search-frequency buckets, rather than being reported with an exact count.
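As a rough illustration of what that bucketing might look like, here is a minimal Python sketch. The bucket values are taken from the list above; the snapping logic – nearest bucket on a logarithmic scale – is my own assumption, not anything Google has confirmed.

    import math

    # Reported volumes from the list above; they form an approximately
    # geometric series (ratio ~0.82), i.e. evenly spaced on a log scale.
    BUCKETS = [550000, 450000, 368000, 301000, 246000,
               201000, 165000, 135000, 110000]

    def snap_to_bucket(raw_volume):
        # Assumption: a raw count is reported as the nearest bucket in log space.
        return min(BUCKETS, key=lambda b: abs(math.log(raw_volume) - math.log(b)))

    print(snap_to_bucket(123456))   # -> 135000
    print(snap_to_bucket(512000))   # -> 550000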

What this reveals is that when making tweaks to the algorithm, Google can potentially apply specific parts of the algo only to specific keyword sets.

This is striking, because it is precisely what I observed in the aftermath of the Florida update of November 2003.

That update appeared to hit only certain keyword groups, and the effect expanded through 2005.

In addition, I’ve noted in previous discussions about Google sandboxing (before it was ever widely accepted that there was one) that the effect was keyword dependent, and that some keywords – usually higher-traffic keywords – were more likely to be affected.

The result is that in discussions on sandboxing since then, I’ve referred to being unable to rank a reasonably new site for “competitive” keywords. You can still rank a site, just not for highly competitive keywords such as “mortgages”.

After doing a little background research today, I can see I need to go back and do a lot more homework on LSI – Latent Semantic Indexing – not least after rediscovering a post Aaron Wall wrote a couple of years back about the impact of LSI on individual keywords.
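For anyone who hasn’t dug into it before, the core idea of LSI is to factorise a term-document matrix – typically via singular value decomposition – and keep only the strongest latent “concepts”, so that related terms end up close together even when they never co-occur directly. A toy sketch, with invented counts and numpy used purely for illustration:

    import numpy as np

    # Toy term-document matrix: rows are terms, columns are documents.
    # The counts are invented purely for illustration.
    A = np.array([[3, 2, 0, 0],   # mortgage
                  [2, 3, 1, 0],   # loan
                  [1, 2, 0, 0],   # rate
                  [0, 0, 3, 2],   # recipe
                  [0, 0, 2, 3]],  # bake
                 dtype=float)

    # LSI: a truncated SVD keeps only the k strongest latent concepts.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 2
    term_vecs = U[:, :k] * s[:k]   # term vectors in concept space

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # "mortgage" and "loan" come out close; "mortgage" and "recipe" do not.
    print(cosine(term_vecs[0], term_vecs[1]))
    print(cosine(term_vecs[0], term_vecs[3]))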

Funnily enough, Aaron even references a thread I started in the Search Engine Watch (SEW) forums, where I’d listed various research references for further exploration of the topic – a thread in which I had to defend the importance of reading patents for potential forewarning of industry developments, rather than simply being strategically reactive.

The bottom line is that within Google’s algorithm there is clear potential for different keywords to be ranked by different criteria, not least according to search volume – and observation suggests that this has indeed been happening for a few years now.
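To make that concrete, here is a crude sketch of what volume-gated ranking could look like. The threshold, the function names and the gating itself are all hypothetical, intended only to show the shape of the idea, not Google’s actual design.

    VOLUME_GATE = 100000  # hypothetical threshold, not a known Google value

    def topical_score(page, keyword):
        # Stub: naive on-page relevance (keyword frequency per word).
        words = page.lower().split()
        return words.count(keyword.lower()) / float(max(len(words), 1))

    def semantic_score(page, keyword):
        # Stub standing in for a costly concept-space analysis (e.g. LSI).
        return 1.0

    def rank_score(page, keyword, monthly_volume):
        # Cheap signals run for every keyword; the expensive analysis
        # only runs for keywords in the high-volume sets.
        score = topical_score(page, keyword)
        if monthly_volume >= VOLUME_GATE:
            score *= semantic_score(page, keyword)
        return score

    print(rank_score("compare mortgage rates and mortgage deals", "mortgage", 246000))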

The irony is that the information has always been out there – it’s simply been a case of joining up the dots. What will be very interesting is how an LSI model joins them up.

Discuss this in the Internet Business forums

8 Responses to “Different Google algos for different keywords”

  1. Michael on July 10th, 2008 1:56 pm

    I don’t think even Aaron Wall believes anymore that Google use LSI.

    When he wrote the post you refer to he said “Some of those well in the know attribute this to latent semantic indexing, which Google has been using for a while, but recently increased its weighting”. (From the Internet Archive).

    This now reads “Even if they are not using LSI, Google has likely been using other word relationship technologies for a while, but recently increased its weighting”.

    This surreptitious retrospective editing was probably done after Dr Garcia (who knows a thing or two about LSI) told Aaron he did not know what he was talking about. http://irthoughts.wordpress.com/2007/06/02/when-seos-are-caught-in-lies/

    Google using LSI is now an ancient myth but there are lots of new myths you could do your homework on instead Brian :)

  2. Brian Turner on July 10th, 2008 5:33 pm

    Let’s face it, nobody outside of the Google Search team knows exactly what technologies Google is using, but let’s be clear – we know Google’s purchase of Applied Technologies brought with it a range of semantic tools, which were especially used to power AdSense.

    So for Aaron to simply modify his comments from “has” to “if” is perfectly reasonable, especially in the face of minority hostility.

    Personally, I think the argument of whether Google is using any particular programmatic method is completely missing the point – the most important aspect of search engine theory and patents for search marketers is knowledge of the concepts being considered for potential application.

    For example, the Hilltop patent was published around the year 2000, I think, and a number of concepts that appeared in it seem to have been applied since the Florida update of November 2003.

    That doesn’t mean Google was using the Hilltop patent in its original form (cf. my first sentence in this comment), but it certainly suggests that concepts covered in that patent were seen as valuable and worth implementing to some degree.

    Whether Google is definitely using LSI or some related form is beside the point – we know they have both the previous technology and the current resources to implement something in this area. Exactly what form is or may be in use is academic outside of Google.

    In fact, I seem to recall that when Dr Garcia first raised his complaints that Google couldn’t apply LSI, it was on the grounds that Google would not have the processing power to carry out the calculations to any significant degree*.

    Funnily enough, I also seem to recall countering that Google could apply such processing on a keyword-dependent basis, targeting higher volume keywords, based on observations of behaviour over 2004+.

    A point I seem to have returned to.

    * I also seem to recall that it was categorically stated by people in Dr. Garcia’s peer group that no search engine could ever be capable of applying clustering technologies to search processing – until Teoma came out and did it.

  3. Michael on July 11th, 2008 3:43 am

    “…we know Google’s purchase of Applied Technologies brought with it a range of semantic tools, which were especially used to power AdSense”.

    Brian, I think you are referring to Google’s acquisition of Applied Semantics in April 2003.

    Applied Semantics was purchased for its semantic text processing and online advertising expertise derived from its patented CIRCA technology. http://www.google.com/press/pressrel/applied.html

    CIRCA uses a proprietary ontology consisting of hundreds of thousands of concepts and their relationships to each other. The ontology is developed by merging industry-standard knowledge bases using automated tools, together with guidance and direction from a team of lexicographers and computational linguists. The technology is outlined in two Applied Semantics patents: Meaning-based advertising and document relevance determination, and Meaning-based information organization and retrieval.

    CIRCA has absolutely nothing to do with LSI.

    - Michael

  4. Brian Turner on July 11th, 2008 9:21 am

    “we know Google’s purchase of Applied Technologies brought with it a range of semantic tools, which were especially used to power AdSense”

    Indeed, that’s the statement I made – I think you’re trying to read too much into what I stated.

    Refer to the comments I made here:

    “I think the argument of whether Google is using any particular programmatic method is completely missing the point – the most important aspect of search engine theory and patents for search marketers is knowledge of the concepts being considered for potential application.”

    Arguing over whether Google applies LSI exactly is precisely what I was talking about in my reference to Hilltop above.

    We know that Google has the capacity to apply semantic relationships upon some model – so reading about semantic models that have already been described can only stand a search marketer in good stead when trying to understand processes that are likely already in play.

  5. egarcia on July 13th, 2008 8:14 pm

    Thanks for the quotes, guys.

    Mr. Turner, regarding this

    “I also seem to recall that it was categorically stated by people in Dr. Garcia’s peer group that no search engine could ever be capable of applying clustering technologies to search processing – until Teoma came out and did it.”

    I am not sure that anyone in my research or peer group (faculty colleagues/students) has ever made such a claim.

  6. Brian Turner on July 14th, 2008 10:19 am

    I think I would need to re-read Mike Grehan’s “Search Engine Book” to find the specific reference of which search engine and which search technology was supposedly beyond them. Unfortunately, I no longer have a copy on my hard drive so I can’t directly reference it easily.

  7. egarcia on July 21st, 2008 12:57 pm

    Thanks.

    Please, do.

    It might be possible that you are mistaking me for another “Garcia”. I am inclined to believe that Mike wrote his book before I ever met him.

  8. Brian Turner on July 23rd, 2008 12:30 pm

    Sorry, no time to look for the reference. However, it wasn’t intended as a claim that you yourself made a statement that was later disproven – rather, the point is that what researchers may or may not claim is or is not happening within any search engine has no bearing on what that engine may actually be doing.

Posted in: SEO