An experimental system for the comparative valuation of PageRank
By contributing Author: Matt J Aird.
At no time in history has so much information been available and accessible to the majority of mankind, but as the needles of information increase so does the size of the haystacks around them. The new need becomes a way to rapidly find the relevant information, a role filled by the search engines, with Google's PageRank based system currently ascendant.
With Google being the world's number one search tool, PageRank is naturally the subject of much debate and speculation by virtually anyone who wants to draw attention to any document or site on the web. In what I would consider a very sensible precaution, Google themselves have maintained an aura of secrecy about exactly how the system works and precisely why some pages are listed above other apparently equally relevant ones in their results pages.
Despite the secrecy, in my opinion it is possible to get a better view on the whole system by increasing the granularity of the grading from the simple zero to ten scale that Google display (but do not actually use).
I do this by use of what I call PagePoints, my own working title for the concept of assigning an estimated numeric value to the PageRank of documents on the web. This is an experimental system that maps the displayed PR of individual pages onto a logarithmic scale to give more accurate comparisons of values.
The resultant figures are not meant to be taken as numerically exact, firstly because the base figures used are the displayed estimates (whole numbers from zero to ten) and Google will not allow further precision at this time, and secondly because Google have not and will not release the scale on which the PR figures are based.
Having said that, using their displayed figure for the former and educated guesses for the latter, it is possible to work out a reasonable and logical system of relative values between pages at different PR scores, and also demonstrate more clearly how PageRank itself interacts between individual documents on the web.
The basic driving idea behind Google's PageRank formula is an attempt to understand what is important on the internet by using the structure of the web itself.
This idea is as simple as it has been successful: people will link their own sites and pages to other pages they consider to be important, so the more links that go to a page the more important that page is considered to be. Then the more important a page is considered (by Google), the more value any links from that page to another are given.
In Google's own words: 'PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important." '
This proved to be a revolutionary and popular system, because by using what are called 'off-page factors' to work out the importance of pages it generally gave more relevant results than the previous generation of search engines, which worked on simple keyword matching. With those engines it was far too easy for commercial and other sites to 'stuff' their pages with keywords to raise their rankings on search results pages.
This is the basic formula for PageRank. Google use a variation of this, in which a great many minor and variable factors are believed to influence the final result, but this is a good enough example to work with.
PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn))
'PR' is the page rank of an individual page, in this case page 'A'. 'd' is a damping factor applied to the rank passed on by incoming links (usually set to 0.85), 't1' to 'tn' are the pages linking to page 'A', and 'C' is the number of outgoing links on each of those linking pages.
At its simplest, this means
A page's PageRank = 0.15 + 0.85 x (for each page linking to it, that page's PageRank divided by the number of pages it links to, all added together)
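As a minimal sketch of how the basic formula behaves, the following applies it repeatedly to a tiny made-up web until the values settle. The four pages, the starting value of 1.0 and the function name `pagerank` are assumptions for illustration only; this is the textbook formula quoted above, not Google's actual implementation.

```python
def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1 - d) + d * sum(PR(t) / C(t)).

    `links` maps each page name to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pr = {page: 1.0 for page in pages}  # arbitrary starting guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(t) / C(t) over every page t that links to this page.
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical four-page web: D links to A, but nothing links to D.
example = {"A": ["B", "C"], "B": ["A"], "C": [], "D": ["A"]}
ranks = pagerank(example)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In this sketch page D, which no other page links to, settles at the bare (1 - d) value of 0.15, while A benefits from being the only link on both B and D.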
However, the actual difference between the displayed values is deceptive because, while the exact workings remain shrouded in secrecy, it is clear from usage that PageRank works on a logarithmic scale. The base of this scale has purposefully been left unclear, but it is generally accepted by professionals in the industry that it is around 5 or 6 (i.e., a PR 2 page is considered to have 5 or 6 times the importance of a PR 1 page).
Personally I believe the base is probably variable, somewhere between about 5.5 and 6.66 (please, no apocalyptic emails!). As attempting precision with numbers left purposefully unclear is futile, I use 6 as a rule of thumb for estimates to simplify calculations.
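To get a feel for how much the choice of base matters, here is a small sketch that compounds each candidate base over the nine steps between PR 1 and PR 10. It assumes, as this article does, that every displayed step multiplies importance by the same base; the function name `importance_ratio` is my own.

```python
def importance_ratio(base, steps):
    """How many times more important a page is after `steps` whole PR steps."""
    return base ** steps


# The three bases are the guesses mentioned in the text, not published figures.
for base in (5.5, 6, 6.66):
    ratio = importance_ratio(base, 9)
    print(f"base {base}: PR 10 is roughly {ratio:,.0f} times PR 1")
```

The gap between the guesses compounds quickly: across nine steps the estimate ranges from under five million to over twenty-five million, which is one more reason to treat any single figure as a rule of thumb.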
PageRank in Practice
A new page is born, and comes kicking and screaming into the world. From Google's point of view, it has no PageRank at all. It does not exist in its index because Google has never been led to find it, and therefore to all the many millions of users of Google's search engine it is a nonentity.
If, however, a page that Google has already indexed were to link to it, the link will be picked up the next time that linking page is spidered, and eventually followed. Shortly afterwards, the new page may appear as indexed by Google, with a PageRank of 0 out of 10.
Note: 0 out of 10 is very different from not appearing in the index. While it is labelled as 0 the base rank is actually 0.15, and thus it is possible for the page to appear in results of a search.
The page is then assigned its rank based on a portion of the PR of the page/s linking to it. This process is covered elsewhere, particularly in SEO community publications, so I will not dwell on it apart from pointing out several common misconceptions:
The Imprecision of PR
It should be clear that the value of a link from a page is based on a number of factors, primarily the PR of the linking page and the number of other links from that page. It follows that, as the value of a link is based on PR divided by outgoing links, the eventual PR of the page linked to will not be a whole number.
Thus it is evident that although PR is described using the simple 0 to 10 scale Google have trademarked, this is indicative but not accurate. Two pages could both have a displayed PR of 1, while the first is PR1.001 and the second is PR1.999.
Google quite sensibly doesn't disseminate any more precise information on the PR of individual pages.
'PagePoints' Estimation System
In order to calculate within some reasonable range of accuracy the relative value of incoming links, I use a simple 'PagePoints' system.
As a disclaimer, I should note that this system is for use as a rule of thumb only, as it is an attempt to fill in gaps that have been purposefully left by Google in a formula that they do not want understood. It is simply an attempt to allow a basic level of insight into the relative PageRank value of incoming links.
For this system, I assign PagePoints to a page based on a base 6 logarithmic scale applied to Google's PR. Their scale is secret and it is unlikely that it is an exact whole number *, but I have found this to be accurate enough for the link value estimation process.
PagePoints, calculated on base 6
PagePoints, calculated on base 6, including PR 0
By this scale, you can see that a PR 10 page is considered approximately ten million times as important as a PR1 page by Google. You can also see that the higher a page gets on their PR score, the harder it is to get to the next level. It is also clear however that it is possible for one page to have 46656 'pagepoints' and another to have five times as much with 233280, and both will be seen as a PR7.
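The figures quoted above (46656 PagePoints at the bottom of PR7, and a PR 10 page worth roughly ten million PR1 pages) are consistent with a scale on which PR n starts at 6 to the power n - 1. The sketch below regenerates such a scale under exactly that assumption, with PR 0 pinned at the 0.15 base rank; the function names `pagepoints_floor` and `displayed_pr` are my own labels, and none of this reflects Google's real banding.

```python
BASE = 6           # the rule-of-thumb base used in this article, not Google's real scale
PR0_POINTS = 0.15  # base rank of a page displayed as PR 0


def pagepoints_floor(pr):
    """Lowest PagePoints value assumed to display as PR `pr` (0 to 10)."""
    return PR0_POINTS if pr == 0 else BASE ** (pr - 1)


def displayed_pr(points):
    """Displayed PR band (0 to 10) assumed for a given PagePoints value."""
    pr = 0
    while pr < 10 and points >= pagepoints_floor(pr + 1):
        pr += 1
    return pr


for pr in range(0, 11):
    print(f"PR {pr:>2}: from {pagepoints_floor(pr):>10,} PagePoints")

# Two pages a factor of five apart can still share a displayed score.
print(displayed_pr(46656), displayed_pr(233280))
```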
The observant may have noticed a deliberate error in the above table; if a PageRank of zero is the base equivalent of 0.15 PagePoints, the scale must be based on 6.66 recurring rather than 6. This was my original base, but observation has made something closer to 6 seem more likely, and it is possible that the base page rank is a variable.
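The mismatch is quick to check. This sketch assumes the PR1 band starts at 1 PagePoint, as the base 6 scale implies, and takes 0.15 as the PR 0 value from the basic formula; the variable names are my own.

```python
BASE_RANK = 0.15  # value assigned to a displayed PR of 0 (1 - d with d = 0.85)
PR1_POINTS = 1    # assumed PagePoints at the bottom of the PR 1 band

# Base needed for a single step to carry 0.15 up to 1.
implied_base = PR1_POINTS / BASE_RANK
# PR 0 value that a strict base 6 scale would need instead.
implied_pr0 = PR1_POINTS / 6

print(f"base implied by the PR 0 to PR 1 step: {implied_base:.2f}")
print(f"PR 0 value implied by a strict base 6: {implied_pr0:.3f}")
```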
Estimating the Value of Links
Now let's look at a real life example I saw recently: a site offering to sell links (aka 'sponsored links') from one of its pages, stating that it was a PR6 site. (Note to anyone who wants to survive in online marketing for more than a week: never buy links from a site that actually says it is selling PageRank. That is absolute anathema to Google.)
They stated the value of a link from their PR6 site to be worth $250 per month. Ignore for a moment the impossibility of truly valuing a link, or even the value a business will derive from having more page rank on their own site, or the value of actual additional revenue this will provide, and let's assume that is indeed a fair valuation.
On closer examination, the links page on offer was deeper in the site, and had PR5. Always remember that PR is per page and not per site. It also had around 25 links to other pages on their own site, plus 10 existing sponsored links.
Let's hypothesize that I own a PR3 page and want to see it increased to PR4 for some reason and am willing to pay $250 to do so. This site seems an ideal candidate for a link purchase, even though I have realized that the charge will be for a link from a PR5 page instead of a PR6 one.
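A rough way to size up this purchase is sketched below. It rests on several assumptions of mine: the base 6 scale with PR n starting at 6 to the power n - 1, the 0.85 damping factor from the basic formula applied directly to PagePoints, and 36 outgoing links on the selling page (the 25 internal links, the 10 existing sponsors and my new link). Because only whole-number PR is displayed, both pages could sit anywhere in their bands, so the sketch works out the range.

```python
BASE = 6  # rule-of-thumb base, not Google's real scale
D = 0.85  # damping factor from the basic formula


def band(pr):
    """(low, high) PagePoints assumed for a displayed PR; high is exclusive."""
    return BASE ** (pr - 1), BASE ** pr


def link_value(page_points, outgoing_links):
    """PagePoints passed along one link, following d * PR(t) / C(t)."""
    return D * page_points / outgoing_links


outgoing = 25 + 10 + 1  # internal links + existing sponsors + the new link
low, high = band(5)
worst = link_value(low, outgoing)  # selling page has only just reached PR5
best = link_value(high, outgoing)  # selling page is on the verge of PR6

needed = band(4)[0] - band(3)[0]   # my page has only just reached PR3
print(f"the link passes between {worst:.1f} and {best:.1f} PagePoints")
print(f"a page at the bottom of PR3 needs {needed} more to reach PR4")
```

On these assumptions the link is worth somewhere between about 30 and about 184 PagePoints, while a page at the bottom of PR3 needs 180 to reach PR4, so whether $250 buys the jump depends entirely on where each page sits within its band.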
This highlights two important issues that anyone who either pays money for links or works hard to get 'organic' links should note:
You are unlikely to be able to encourage a site to link to you for free or get them to reduce their advertising link costs by attempting to convince them that their pages are able to assign less PR value. That is really not their concern. A far more useful application of this theory would be to make sure that when you can get a link or you do agree to a sponsorship, you get it from the best possible page.
Most importantly, and I am sorry to burst the bubbles of so many SEO people out there, that page may not be the one with the highest PR.
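To illustrate with made-up numbers, the sketch below compares a link from a crowded PR6 page with a link from a sparse PR5 page. Both candidate pages and their link counts are hypothetical, each page is assumed to sit at the bottom of its band on the base 6 scale, and the 0.85 damping factor is applied as in the basic formula.

```python
BASE = 6  # rule-of-thumb base, not Google's real scale
D = 0.85  # damping factor from the basic formula


def link_value(displayed_pr, outgoing_links):
    """Estimated PagePoints passed by one link from a page at the bottom of its PR band."""
    return D * BASE ** (displayed_pr - 1) / outgoing_links


# Hypothetical candidates: (displayed PR, number of outgoing links).
candidates = {
    "PR6 page, 150 outgoing links": link_value(6, 150),
    "PR5 page, 10 outgoing links": link_value(5, 10),
}
best = max(candidates, key=candidates.get)

for name, value in candidates.items():
    print(f"{name}: about {value:.1f} PagePoints")
print("better source:", best)
```

Here the lower-ranked page passes on more than twice as much, simply because its value is split between far fewer links.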
Using this experimental PagePoints system, I have attempted to illustrate the way in which pages of similar on-page relevance to search queries are ranked relative to one another on Google. The numbers are all based on estimates, but I believe this more granular approach sheds additional light on the way in which ranked pages interact with one another, the way they display on results, and the effects of linking and interlinking pages.
It also attempts to prove my own view that despite the fact that Google have been deliberately obfuscating the algorithm and working of the formula while instead providing a simple zero to ten scale, linking in order to increase an individual page's ranking and perceived relevance does not have to be approached in a haphazard way.
This is particularly important when it comes to purchasing links as a form of advertising in which PageRank is seen as a goal in addition to visitors actually clicking the link, but also when it comes to exchanging links and deciding on one's own site's internal link structure.
I have deliberately not taken a view in this document on the relative merits of how links are obtained, and whether there is some 'moral' difference between a link made organically by a third party, a reciprocal link as an exchange, or a link purchased as a form of advertising. For the purpose of this analysis, that is not important and will largely depend on whether someone is approaching the internet as an information resource, a new broadcast medium for ideas or a marketing tool.
What is important, in any form of advertising, any request for a free link, any purchase, or any exchange, is to know the Value.
About the author:
Matt J. Aird has 8 years' experience in developing corporate websites, major business directories and search facilities. In recent years, this has been with the world's leading meta-search company, Infospace.
Projects have included work on search engines such as the Dogpile and Webcrawler.de (Germany) meta-search engines, as well as business directory search tools such as Thomweb (United Kingdom), TradePage (South Africa) and WorldOnline (Belgium).
About PagePoints: PagePoints is my own working title for the concept of assigning an estimated numeric value to the PageRank of individual pages on the web. It should be noted that the PR values of pages are purposefully left vague by Google, meaning that at best calculations can be helpful indicators of potential values and should not be taken as definitive. The goal of this article is to educate and promote debate rather than advise on marketing spend.
PageRank is the name of an algorithm for ranking documents on the internet developed by Lawrence Page, and is a trademark of Google.
This essay is a work in progress rather than a completed document. Any input, suggestions or corrections are welcome.
Contact the author firstname.lastname@example.org