Posted by: Jim Gilbert, Tom on Oct 23, 2004 - 05:31 PM
Lately there's been much discussion and speculation about page themeing and link themeing, along with assumptions about Google's penchant for them. That got us wondering...

Has Google shifted toward favoring themed links?

Initially, we thought it was a simple question that could be definitively answered with a statistical study. However, as you will see, our original question led to other questions and answers over the course of our three-month study.
by Jim Gilbert, Tom Dahm, and DL Hawkins, PhD

Themeing explained

Let's start with a brief explanation of themeing. The theme of your site refers to your site's primary topic. For example, you might have a site for a business that runs whale watching tours. If you want to rank highly in Google for the keyphrase "whale watching", then your site should have a laser-beam focus on the topic of whale watching.

Effective themeing involves excluding as much content as possible that might distract from your site's primary theme. If you were to add a section on dolphin watching, you would be diluting your site's whale watching theme, and this could have a negative effect on your search engine positioning for the phrase "whale watching".

Granted, it's a rare website that deals with one topic to the exclusion of all others. Ideally, however, if your site offers multiple products, they should either be grouped together when they are closely related or separated as much as possible when they are not.

Using this technique, your site will more easily achieve an identifiable theme, and each page within your site will have its own very specific theme: whale watching pages are exclusively focused on whale watching; dolphin watching pages are exclusively focused on dolphin watching; and theme dilution is thereby minimized.

Link themes

Link themes are built on the same logic. The theme of a link is defined by the keywords you choose to use in the text of the link (aka the anchor text). So, if you link to one of your whale watching pages with the words "whale watching" in the anchor text, you are theoretically reinforcing that page's whale watching theme.
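To make "the theme of a link" concrete, here is a minimal sketch of how one might pull anchor text out of a page and check it for a keyphrase. It is an illustration only, not the tooling used in the study; the class name `AnchorTextCollector`, the function `themed_links`, and the sample HTML are all hypothetical, and the substring match is a deliberately simple assumption about what counts as "themed".

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._chunks = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._chunks = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            text = " ".join("".join(self._chunks).split())
            self.links.append((self._href, text))
            self._href = None


def themed_links(html, keyphrase):
    """Return the links whose anchor text contains the keyphrase (assumed rule)."""
    collector = AnchorTextCollector()
    collector.feed(html)
    phrase = keyphrase.lower()
    return [(href, text) for href, text in collector.links
            if phrase in text.lower()]


# Hypothetical sample page: one themed link, one generic "click here" link.
sample = (
    '<p><a href="/tours.html">Whale watching tours</a> and '
    '<a href="/about.html">click here</a></p>'
)
print(themed_links(sample, "whale watching"))
```

Only the first link counts as themed under this rule; the "click here" link carries no keyphrase in its anchor text.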
The common belief is that Google places a lot of weight on the keywords found in anchor text, and that themeing your links is one of the most effective ways to theme your pages. The question we sought to answer through our research was: is this actually true? And, if so, just how valuable does Google consider these themed links to be?

Bear in mind as you read that our study was based on solid research: no estimates, guesses, observations, or SWAGs, only results from sound, in-depth statistical analysis.

The Data:

In the spirit of brevity, we're intentionally keeping this section short. Many of our data sources and variables are not listed here. This short list is intended to provide you with a simple summary of the type of data that had to be gathered to perform this type of work. The vast majority of the data is related only to linking, since that was the focus of our analysis. On-page criteria were not part of this effort.

We selected many thousands of pages covering various topics for our project and analyzed every available characteristic of every inbound link to each of those pages. The data gathering itself was a major undertaking that relied on custom tools built in-house.

From the ranking pages analyzed, a few of the characteristics we focused on were: PageRank, inbound links, outbound links, links from the same class C block, page title, page URL, and more.

From the pages that linked to the ranking pages, a few of the characteristics we gathered were: PageRank, inbound links, outbound links, page title, URL, link text used, unique linking domains involved, and more.
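For readers who want to picture the kind of record described above, the sketch below shows one possible shape for a linking-page record, plus a check for "links from the same class C block". The field names are hypothetical, and the study does not say how the class C test was computed; the sketch assumes the common SEO reading, namely that two IPv4 addresses share a class C block when they fall in the same /24 network.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class LinkingPage:
    """One page that links to a ranking page (hypothetical field names)."""
    url: str
    ip: str                # IPv4 address of the host serving the page
    toolbar_pagerank: int  # the 0 to 10 value shown in the Google Toolbar
    inbound_links: int
    outbound_links: int
    title: str
    link_text: str


def same_class_c(ip_a, ip_b):
    """True when two IPv4 addresses fall in the same /24 block (assumed rule)."""
    block = ip_network(f"{ip_a}/24", strict=False)
    return ip_address(ip_b) in block


def count_same_block(target_ip, linking_pages):
    """Count the linking pages hosted in the same /24 block as the target."""
    return sum(same_class_c(target_ip, page.ip) for page in linking_pages)


# Made-up example using addresses reserved for documentation.
pages = [
    LinkingPage("http://a.example/links.html", "192.0.2.200", 4, 120, 35,
                "Links", "whale watching"),
    LinkingPage("http://b.example/", "198.51.100.4", 5, 900, 12,
                "Home", "click here"),
]
print(count_same_block("192.0.2.17", pages))
```

Here only the first linking page shares a /24 block with the target address.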
To test certain hypotheses about Google's potential use of topic or themed rankings, we also had to create a few valid variables associated with topics and themes. To avoid a complicated explanation, suffice it to say that we tested various themeing methods and settled on an approach that, judging from all of our statistical work, appeared appropriate and as accurate as possible.

All wannabe mathematicians and statisticians take note:

Google's ranking algorithm is VERY complicated and we do not have access to all the variables Google uses. So, using simple statistics, such as correlation, regression and averages, to find reliable predictors is a total waste of time!

To accomplish the type of analysis and hypothesis testing we performed, the logistic regression process and the Top-10 versus Non-Top-10 approach were much more reliable than trying to locate the exact set and worth of the variables capable of predicting any exact rankings.
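The Top-10 versus Non-Top-10 framing in the note above can be sketched as a binary classification problem. The example below is a toy only: it uses plain logistic regression fitted by gradient descent, not the stepwise selection procedure in SAS that the study used, and the data are synthetic points generated with a planted rule so the fit has something to recover. The two features (`feature_a`, `feature_b`) are hypothetical stand-ins for pre-scaled link variables and say nothing about Google.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_top10(weights, x):
    """Estimated probability of the "Top 10" class; weights[0] is the bias."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            err = predict_top10(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Synthetic data, invented for illustration: label 1 ("Top 10") is planted
# wherever feature_a exceeds feature_b. This is NOT the study's data.
random.seed(0)
rows = [(random.random(), random.random()) for _ in range(200)]
labels = [1 if feature_a > feature_b else 0 for feature_a, feature_b in rows]

weights = fit_logistic(rows, labels)
print("bias, weight_a, weight_b:", [round(w, 2) for w in weights])
print("P(top 10) for (0.9, 0.1):", round(predict_top10(weights, (0.9, 0.1)), 3))
```

The appeal of this framing is that the model only has to separate two groups of pages rather than predict an exact position, which is a far less demanding target when many of the real ranking variables are unobserved.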
The Statistics:

Having a mathematics and statistics degree (as well as access to some very good Ph.D. statisticians) certainly helped. We set up all of our data gathering and statistical selection to be robust, comprehensive, and as accurate as possible. We decided that simple correlation and regression analysis was inaccurate and unacceptable!

After considerable testing, we found that the most significant and reliable statistical process was stepwise logistic regression in SAS Institute's Statistical Analysis System. With this process we didn't need to attempt a specific prediction of ranking. Nor were we constrained by the simplistic "linear" nature inherent in more limited statistical applications such as spreadsheets. Logistic regression allowed us to analyze results in a Top 10 or Not Top 10 fashion.

Simply put, the search engine ranking algorithms are too complicated to reverse engineer with the hope of predicting any specific rankings. However, analyses turn out to be much more reliable and useful when just trying to predict whether or not a page can achieve a Top 10 ranking.

The Unimportance Of PageRank Within Our Findings:

Not surprisingly, most of our findings parallel or confirm common SEO (search engine optimization) beliefs. If you're an experienced SEO, you are likely to find confirmation that what you have been doing is on the right track. Remember, however, that this effort was originally begun in hopes of finding an answer to the specific question:

Has Google shifted toward favoring themed links?
So, it should be expected that within our preliminary findings we've confirmed some of the common SEO beliefs as well as learned some answers to additional questions that became apparent during the analysis.

It should be noted, however, that we do not reference the PageRank variable as anything of importance! PageRank is a result metric: it is the byproduct of linking structures and linking quantity. PageRank is the effect; links are the cause.

Furthermore, the so-called PageRank (PR) that we see listed in the Google Toolbar is not the same PR that Google uses in its algorithm. The Toolbar version is only a discrete (0 to 10) visual representation of a much more comprehensive ranking scale. The PR we see in the Toolbar is not important. What is important is the linking information that Google uses to build the "actual" PR.

Of course, this is not just our opinion. Our statistical work showed with a very high degree of certainty that PR is extremely collinear with the other linking variables and their characteristics, so it adds little that those variables do not already capture.

To clarify, and to avoid stirring argument, let's expand our statement regarding PR: Google's PR, the one we can't see, is hugely important to Google's ranking algorithm. However, the PR we can see is not important (especially in this analysis), because we have more specific and better visibility into the actual linking information that builds PR.

The Findings:

- The quantity of spiderable links to your site's pages IS important. We now know for a fact that the more incoming links you have, the better (no surprise).
- If an incoming link page (the page that links to your site) has too many outbound links, it works against you. Although the "too many" number is not fixed, which makes it hard to pin down, think of links pages. The more outbound links a page has, the less help a link from that page will give you in the rankings (again, no surprise).

The pages that link to your site's pages have certain characteristics, and some of those characteristics very much affect your rankings!

- Google unquestionably favors pages more whenever the incoming link pages themselves have many inbound links. In other words, when searching for link partners, you want links from pages that have many inbound links themselves.
If this sounds confusing, perhaps the diagram below will help clarify:

<IMG height=149 alt=" " src="http://www.searchengine-news.com/images/articles/google-links/link-diagram_350x149.gif" width=350 border=0>

It's the green page that benefits most, because its linking partner (the very popular yellow page) has many inbound links. The more incoming links the yellow page has, the better the ranking benefit to the green page. The page clusters in the diagram above show how one page (green) might have a PR=4 yet rank higher than a page (red) with a PR=5. The diagram above also shows how a PR=5 page can be a much more valuable linking page than a PR=6.

Several additional statistical analyses showed over and over again that link pages with many inbound links are much, much more important than just the PR of the linking page. The statistical measures of significance on this single characteristic were extremely high. No other characteristic showed significance even close to this level.
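The findings above are at least consistent with the originally published PageRank formulation, in which a page's score is built entirely from its inbound links and each page divides the score it passes on among its outbound links. The sketch below implements that textbook formula by power iteration on a made-up graph loosely modeled on the diagram. It is not the study's statistical model and not Google's live ranking algorithm; the damping factor of 0.85, the page names, and the graph itself are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Pages with no outbound links spread their score evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            # A page splits what it passes on equally among its outbound links.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "yellow" has many inbound links and a single outbound
# link to "green"; "blue" has one inbound link and splits its score four ways.
toy_web = {
    "fan1": ["yellow"], "fan2": ["yellow"], "fan3": ["yellow"], "fan4": ["yellow"],
    "yellow": ["green"],
    "fan5": ["blue"],
    "blue": ["red", "other1", "other2", "other3"],
}
ranks = pagerank(toy_web)
print("green:", round(ranks["green"], 4), "red:", round(ranks["red"], 4))
```

In this toy graph the green page ends up with a higher score than the red page purely because of who links to it: its one link comes from a page with many inbound links and few outbound ones.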
The Question Remains...

Has Google shifted toward favoring themed (on topic) links? Based on purely statistical analysis we can only say...

We don't know for sure!

However, there are a couple of reasonable themeing speculations that we feel safe inferring from our work. We believe that Google may favor themed links. And if they don't today, they are very likely to in the future. What the statistics did show is:

- If Google is classifying links as themed, they are not using the linking page