Over the past two weeks, I worked on installing Python 3.6 on my computer, but it is still not working. I also read about PageRank so that I can get the PageRank code running as soon as Python works. PageRank works like this:
PageRank of a page = Sum over its inbound links of (PageRank of the linking page / Number of outbound links on the linking page)
I read a good explanation of this on a site called eFactory, which explains it like this:
PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
- PR(A) is the PageRank of page A
- PR(Ti) is the PageRank of pages Ti which link to page A
- C(Ti) is the number of outbound links on page Ti
- d is a damping factor, which can be set between 0 and 1.
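The formula translates almost directly into Python. The sketch below is my own illustration, not code from eFactory. `pagerank_of` is a hypothetical helper, and it assumes the web is stored as a dict `links` that maps each page to the list of pages it links to, so C(Ti) is just `len(links[Ti])`. The usage lines borrow the three-page web from the example that follows.

```python
from typing import Dict, List


def pagerank_of(page: str, links: Dict[str, List[str]], pr: Dict[str, float], d: float = 0.85) -> float:
    """Evaluate PR(page) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))."""
    total = 0.0
    for source, targets in links.items():
        if page in targets:
            # source is one of the pages Ti that link to page; C(Ti) = len(targets)
            total += pr[source] / len(targets)
    return (1 - d) + d * total


# Hypothetical data: the three-page web, with every page currently at PageRank 1.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
current = {"A": 1.0, "B": 1.0, "C": 1.0}
print(pagerank_of("C", links, current, d=0.5))  # prints 1.25
```

This only evaluates one page's equation from the current estimates; it does not solve for the final values.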
Here is an example:
We have a small web of three pages A, B, and C. Page A links to pages B and C, page B links to page C, and page C links to page A, so C(A) = 2 and C(B) = C(C) = 1. According to Page and Brin, the damping factor d is usually set to 0.85, but for this example I will set it to 0.5 to keep the math simple. The exact value of d does affect the resulting PageRank values, but it does not change the fundamental principles of PageRank. Now we get the following equations for the PageRank calculation:
PR(A) = 0.5 + 0.5 PR(C)
PR(B) = 0.5 + 0.5 (PR(A) /2)
PR(C) = 0.5 + 0.5 (PR(A) / 2 + PR(B))
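Moving every PR term to the left-hand side turns these into an ordinary system of three linear equations in three unknowns. The coefficient 0.25 is d · 1/C(A) = 0.5 · 1/2:

```latex
\begin{aligned}
PR(A) - 0.5\,PR(C) &= 0.5\\
-0.25\,PR(A) + PR(B) &= 0.5\\
-0.25\,PR(A) - 0.5\,PR(B) + PR(C) &= 0.5
\end{aligned}
\quad\Longleftrightarrow\quad
\begin{pmatrix} 1 & 0 & -0.5\\ -0.25 & 1 & 0\\ -0.25 & -0.5 & 1 \end{pmatrix}
\begin{pmatrix} PR(A)\\ PR(B)\\ PR(C) \end{pmatrix}
=
\begin{pmatrix} 0.5\\ 0.5\\ 0.5 \end{pmatrix}
```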
These equations can easily be solved to get:
PR(A) = 14/13 = 1.07692308
PR(B) = 10/13 = 0.76923077
PR(C) = 15/13 = 1.15384615
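To double-check these values, the system can be solved exactly in Python. This is a minimal sketch of my own, not eFactory's method: `solve_exact` is a hypothetical helper, `links` uses the same assumed dict format (each page mapped to the pages it links to), and it applies Gauss-Jordan elimination over `fractions.Fraction` so the answers come out as exact fractions. Passing d as a `Fraction` keeps the arithmetic exact; a float d would still work but give rounded results.

```python
from fractions import Fraction


def solve_exact(links, d):
    """Solve PR(p) = (1 - d) + d * sum(PR(Ti) / C(Ti)) exactly for every page p."""
    pages = sorted(links)
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)
    # Each row holds [coefficients of PR(p_0) .. PR(p_{n-1}) | right-hand side 1 - d].
    rows = [[Fraction(0)] * n + [1 - d] for _ in range(n)]
    for i in range(n):
        rows[i][i] += 1
    for source, targets in links.items():
        for target in targets:
            # Move d * PR(source) / C(source) to the left-hand side of target's equation.
            rows[index[target]][index[source]] -= d / len(targets)
    # Gauss-Jordan elimination: reduce the coefficient part to the identity matrix.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [value / p for value in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    # After elimination, row i holds the value of PR(pages[i]) in its last column.
    return {page: rows[index[page]][n] for page in pages}


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
exact = solve_exact(links, Fraction(1, 2))
for page, value in exact.items():
    print(page, value, float(value))
```

Direct elimination like this is fine for three pages but does not scale to a web-sized link graph.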
The PageRank values of the three pages sum to 3, which equals the total number of web pages. Solving the equations directly is easy for this example, but it only contains three web pages. In practice, the web consists of billions of documents, and it is not possible to find a solution by inspection.
Because of the size of the web, the Google search engine uses an approximate, iterative computation of PageRank values. Each page is assigned an initial starting value, and the PageRank of all pages is then recalculated over several iterations using the equations of the PageRank algorithm. The iterative calculation can again be illustrated with our three-page example, where each page is assigned a starting PageRank value of 1.
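A minimal sketch of this iteration in Python, under my own assumptions rather than Google's actual implementation: `pagerank_iterative` is a hypothetical helper, `links` maps each page to the pages it links to, every page has at least one outbound link and appears as a key, and all pages are updated from the previous iteration's values (a synchronous, Jacobi-style sweep; other update orders are possible).

```python
def pagerank_iterative(links, d=0.85, iterations=20):
    """Approximate PageRank by applying PR(A) = (1 - d) + d * sum(PR(Ti) / C(Ti)) repeatedly."""
    pages = sorted(links)
    pr = {page: 1.0 for page in pages}  # every page starts with a PageRank of 1
    for _ in range(iterations):
        new_pr = {page: 1 - d for page in pages}
        for source, targets in links.items():
            share = pr[source] / len(targets)  # PR(Ti) / C(Ti)
            for target in targets:
                new_pr[target] += d * share
        # Synchronous update: each iteration only uses values from the previous one.
        pr = new_pr
    return pr


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for i in range(1, 6):
    approx = pagerank_iterative(links, d=0.5, iterations=i)
    print(i, {page: round(value, 5) for page, value in approx.items()})
```

Printing the first few iterations makes it easy to watch the values move toward 14/13, 10/13, and 15/13.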
In this example, a good approximation is reached after only a few iterations. For the whole web, about 100 iterations are needed to get a good approximation of the PageRank values.
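To see how many iterations the three-page example actually needs, the loop can run until the values stop changing. This is a sketch under my own assumptions: `iterations_to_converge` is a hypothetical helper, the stopping rule (largest change between two iterations below `tol = 1e-6`) is an arbitrary choice, and the loop gives up after `max_iterations`. Running it for d = 0.5 and for the usual d = 0.85 lets you compare how the damping factor affects the number of iterations.

```python
def iterations_to_converge(links, d, tol=1e-6, max_iterations=1000):
    """Iterate from a start value of 1 until no page changes by more than tol.

    Returns (ranks, number_of_iterations). links maps each page to the pages it links to.
    """
    pages = sorted(links)
    pr = {page: 1.0 for page in pages}
    for iteration in range(1, max_iterations + 1):
        new_pr = {page: 1 - d for page in pages}
        for source, targets in links.items():
            for target in targets:
                new_pr[target] += d * pr[source] / len(targets)
        # Largest change of any page's value in this iteration.
        change = max(abs(new_pr[page] - pr[page]) for page in pages)
        pr = new_pr
        if change < tol:
            return pr, iteration
    return pr, max_iterations


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
for d in (0.5, 0.85):
    ranks, n = iterations_to_converge(links, d)
    print(f"d={d}: {n} iterations", {page: round(v, 6) for page, v in ranks.items()})
```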