Group Abstract

Message Boards

WOLFRAM COMMUNITY

7.6K Views

20 Replies

8 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

Mathematica Wolfram Language Statistics and Probability

How to write Mathematica for this statistical model?

michelle yrw

Posted 12 years ago

One mathematical model of web surfing is given by starting at a random web site. Then, with 85% probability, the next page visited is selected randomly from pages linked to the current page. With 15% probability, the next page is selected randomly from all webpages. Consider, for example, the "mini" web of pages pictured below. If a surfer is at page 4, there's an 85% chance the surfer will follow a link to page 1, 3, 5, or 8 (each with equal probability), and a 15% chance the surfer will go to page 1, 2, 3, 4, 5, 6, 7, or 8 (again, each with equal probability)

POSTED BY: michelle yrw

20 Replies

Sort By:

michelle yrw

Posted 12 years ago

Great, I will test it. That is the graph for the website, now you got it. I have another challenge problem for you later. Nice. I will study on it now.

POSTED BY: michelle yrw

Marco Thiel

Marco Thiel, University of Aberdeen - Dept. of Physics/Mathematics

Posted 12 years ago

Regarding the probabilities do you mean this: probmatrix = Table[matrix[[k]] /. {0 -> 0.15/Sort[Tally[matrix[[k]]]][[1, 2]], 1 -> 0.85/Sort[Tally[matrix[[k]]]][[2, 2]]} , {k, 1, Length[matrix]}] // MatrixForm In that case the eigenvector for the largest eigenvalue (which is one!; probability matrix) is: Eigenvectors[probmatrix][[1]]/Total[Eigenvectors[probmatrix][[1]]] which gives {0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125, 0.125} Cheers, M.

POSTED BY: Marco Thiel

Marco Thiel

Marco Thiel, University of Aberdeen - Dept. of Physics/Mathematics

Posted 12 years ago

Dear Michelle, thank you for the matrix. Here are some comments. I first save the matrix matrix = {{0, 1, 0, 0, 0, 0, 1, 0}, {0, 0, 0, 0, 0, 1, 1, 0}, {0, 1, 0, 0, 1, 0, 0, 0}, {1, 0, 1, 0, 0, 0, 1, 1}, {0, 0, 0, 0, 0, 1, 0, 1}, {1, 0, 0, 0, 0, 0, 0, 1}, {0, 0, 0, 1, 1, 1, 0, 0}, {0, 1, 0, 0, 0, 0, 1, 0}} I then plot it. AdjacencyGraph[matrix, VertexLabels -> "Name", ImagePadding -> 10] Looks nice: Next I calculate the eigenvalues - strictly speaking this is not necessary for your problem. Eigenvalues[matrix] // N This gives {2.3728, -0.680664 + 1.64275 I, -0.680664 - 1.64275 I, -1.23221, 0.295924 + 0.70383 I, 0.295924 - 0.70383 I, -0.371109, 0.} The normalised eigenvector to the largest (first) eigenvalue is obtained by Normalize[Part[Eigenvectors[matrix] // N, 1]] Now that gives the desired result {0.31902, 0.304159, 0.232608, 0.557762, 0.247773, 0.268897, 0.452812, 0.31902} which are the relative times to spend at the different nodes. If we normalise the sum to 1, we obtain the percentage of the time that a random walker would spend at a given node. Part[Eigenvectors[matrix] // N, 1] / Total[Part[Eigenvectors[matrix] // N, 1]] This gives {0.118066, 0.112566, 0.0860857, 0.206422, 0.0916982, 0.0995159, 0.167581, 0.118066} So about 12% of the time the random walker is at node one etc. Just to check the result, let's use the power algorithm to determine the eigenvector. I define an arbitrary starting vector: vecstart = Table[1., {i, 1, Length[matrix]}] ({1., 1., 1., 1., 1., 1., 1., 1.}) I then apply the matrix to the vector and iterate...say 100 times: Nest[Normalize[matrix.#] & , vecstart, 100] The result is {0.31902, 0.304159, 0.232608, 0.557762, 0.247773, 0.268897, 0.452812, 0.31902} which is identical to the one above. Normalising this to 1 in the numerically most clumsy way Nest[Normalize[matrix.#] & , vecstart, 100]/Total[Nest[Normalize[matrix.#] & , vecstart, 100]] gives the relative times spent at the nodes {0.118066, 0.112566, 0.0860857, 0.206422, 0.0916982, 0.0995159, 0.167581, 0.118066} I hope that this helps. Note that it does not take the probabilities that you gave into consideration. M.

POSTED BY: Marco Thiel

Marco Thiel

Marco Thiel, University of Aberdeen - Dept. of Physics/Mathematics

Posted 12 years ago

Dear Michelle, I just ran exactly your code (copy and paste), but it came out a straight line with a circle. Is it strange? No, that is not surprising. After all, we are performing a random walk on the network. Everytime you run it it will give you a different sequence of websites. This question asks " Use Monte Carlo Simulation to determine the percentage of time a user spends at each of the eight sites in the mini-web " I am not sure what mini-web you are referring to; can you upload the figure? If you are given a simple (toy) network, you would proceed as Sean Clarke has suggested. You would use Markov. So probably you want to find the eigenvector to the dominant eigenvalue. The values of the components of the vector will give you the relative times spent on the websites. M.

POSTED BY: Marco Thiel

michelle yrw

Posted 12 years ago

Hi, all, I have the following matrix. {0, 1, 0, 0, 0, 0, 1, 0}, {0, 0, 0, 0, 0, 1, 1, 0}, {0, 1, 0, 0, 1, 0, 0, 0}, {1, 0, 1, 0, 0, 0, 1, 1}, {0, 0, 0, 0, 0, 1, 0, 1}, {1, 0, 0, 0, 0, 0, 0, 1}, {0, 0, 0, 1, 1, 1, 0, 0}, {0, 1, 0, 0, 0, 0, 1, 0} For each i I have the probability for each "1" in the matrix, I have the probability 0.85 divided the no. of 1 in the row. Fox example, for the first each "1" is 0.85/2 for the second row is 0.85/2, but for the forth row, the probability is 0.85/4. and for each 0, I have the 0.15 / (how many 0 in the row), for the first row, the Probability is .15/6. How I can use Array to insert the all probability correctly in the matrix? ASAP thanks

POSTED BY: michelle yrw

michelle yrw

Posted 12 years ago

When I increase m to 1000, I got a beautiful picture.

POSTED BY: michelle yrw

michelle yrw

Posted 12 years ago

Do you have your email address, so I can sent the real question to you. SInce I still confuse something and I don't know how to load the pdf picture in the quesion here.

POSTED BY: michelle yrw

michelle yrw

Posted 12 years ago

Hi, all, I am newer for the code. This question asks " Use Monte Carlo Simulation to determine the percentage of time a user spends at each of the eight sites in the mini-web " Where I can see the percentage?

POSTED BY: michelle yrw

michelle yrw

Posted 12 years ago

I just ran exactly your code (copy and paste), but it came out a straight line with a circle. Is it strange?

POSTED BY: michelle yrw

michelle yrw

Posted 12 years ago

Re: Your link http://reference.wolfram.com/mathematica/example/StructureOfTheWeb.html is really beautiful. Marvelous!

POSTED BY: michelle yrw

michelle yrw

Posted 12 years ago

Hi, Here is the code I have, what do you think about this one? The idea: 1. Create a function that takes in a number of iterations to run for. 2. Use the "Module to create a list of variables you'll need to use in the function. One of these must be a counter to keep track of how many times you've hit each website. rankwebsite[n_] := Module[{a, b, c, list, count = {0, 0, 0, 0, 0, 0, 0, 0}}, (3. Create a list that represents the network inside the function) \ list = {}; (4. Generate a website to start with using the RandomInteger \ function) a = RandomInteger[{1, 8}]; (5. Increase the counter for that website in your count by 1) count[] = count[] + 1; (6. Create a for loop to go through the iterations) For[i = 1, i <= n, i++, (7. Create another RandomInteger to store the probability) b = RandomInteger[{1, 100}]; (8. If the probability is from 85-100, you want to go to another random website, so you reset your initial variable,a, and increase its counter by 1 as before. Otherwise, go to a website linked to the website you're currently on) If[b > 85, a = RandomInteger[{1, 8}]; count[] count[] + 1;, c = RandomInteger[]; (Access a number for the linked website \ you will now go to) a = list[][]; count[] = count[] + 1; (Increase counter again) ]; ]; (Now all you have to do is print out the websites in such a way \ that you see the rankings clearly) ;]

POSTED BY: michelle yrw

Marco Thiel

Marco Thiel, University of Aberdeen - Dept. of Physics/Mathematics

Posted 12 years ago

Well, my name is not Vote 1, but let me try to amend the code. :-) I have somewhat improved the code, but it still runs frequently into problems, i.e. it uses links to sign up for accounts and then there "is no way out". Anyway, the following piece of code is somewhat more reliable than the first bit. The odds are that it does not "survive" a random walk of length 100, but you can easily use the intermediate results. I borrowed two functions from this beautiful demonstrations project: http://reference.wolfram.com/mathematica/example/StructureOfTheWeb.html Here it comes: ClearAll["Global`*"] M = 100; webcrawler[rooturl_, depth_] := Flatten[Rest[ NestList[ Union[Flatten[ Thread[# -> Import[#, "Hyperlinks"]] & /@ Last /@ #]] &, {"" -> rooturl}, depth]]]; style = {VertexStyle -> White, VertexShapeFunction -> "Point", EdgeStyle -> Directive[Opacity[.5], Hue[.15, .5, .8]], Background -> Black, EdgeShapeFunction -> (Line[#1] &), ImageSize -> 500}; walk = {"http://www.wolfram.com"}; Monitor[ For[i = 1, i < M, i++, AppendTo[walk, Quiet[RandomChoice[ Select[webcrawler[walk[[-1]], 1], StringFreeQ[ ToString[#], {___ ~~ "Failed" ~~ ___, ___ ~~ "pdf", ___ ~~ "PDF", "mailto" ~~ ___, ___ ~~ "jpg", ___ ~~ "JPG"} ] &]][[2]]]]; DeleteCases[walk, HoldPattern[RandomChoice[{}][[2]]]]], i] Graph[Union[ Table[UndirectedEdge[Sort[{walk[[i]], walk[[i + 1]]}][[1]], Sort[{walk[[i]], walk[[i + 1]]}][[2]]], {i, 1, Length[walk] - 1}]], style] You will see that I created a kind of exclusion list with the command StringFreeQ. If I wanted to make the program more reliable, I would have to exclude further formats, such as gif etc. This piece of code is far from fully functional and very slow, but it might still serve as a starting point. Also, this random walk is unsupervised; as the walks get longer, you might end up at improper websites. This might be quite a problem! Depending on the legislation in your country and the policies of your work place you might actually get into trouble, by accessing illegal content. There is a 15 year old Nature article http://www.nature.com/nature/journal/v401/n6749/full/401130a0.html that might be relevant. That estimates the "diameter of the web", i.e. the shortest distance between any two points in the network. It claims that randomly chosen webpages are on average 19 clicks or so away. Now the web is a bit larger, but the average path length between any two pages won't have changed a lot. Given the amount of dubious websites, this might suggest that performing a random walk, there is a relatively high probability that inadvertently you end up at very nasty pages. Cheers, M.

POSTED BY: Marco Thiel

michelle yrw

Posted 12 years ago

To: Vote 1, When I use your code, I got something like Graph::supp: Mixed graphs and multigraphs are not supported. >> I couldn't figure out where the something I have to fix. Thanks

POSTED BY: michelle yrw

michelle yrw

Posted 12 years ago

Thanks for all of you above. Very helpful. See you for other questions.

POSTED BY: michelle yrw

Marco Thiel

Marco Thiel, University of Aberdeen - Dept. of Physics/Mathematics

Posted 12 years ago

There is also a quite nice book on this: "A Course on the Web Graph" by Anthony Bonato published by the American Mathematical Society. Many of the ideas in that book can be efficiently implemented in Mathematica. Actually, there are page ranking methods which aare based on this kind of thing, i.e. HITS (hyperlink-induced topic search), the well known page rank, and SALSA (Stochastic Approach for Link Structure Analysis). PageRank is sort of the stationary distribution of a random walk on a digraph; it represents the probability that a random surfer visits the page. The algorithm is just as you describe, i.e. a random walk with some probability of jumping to a completely unrelated page. On page 107 of "A Course on the Web Graph" you find a theorem that shows how the dominant eigenvector s, i.e. the one with eigenvalue 1, of the (suitably defined) transition matrix, is crucial here. The PageRank Markov Chain converges to a stationary distribution equalling s. Once the transition matrix is known it is, in theory, straight forward to calculate the dominant eigenvector with Mathematica. Of ocurse there are the obvious limits for extremely large graphs. Mathematica estimates the number of websites == estimated number of indexed web pages to be roughly 10.82 billion as of May 2012. I suppose that in order to calculate the dominant eigenvector for the "entire network" there are substantial numerical difficulties on standard computer systems, even if we use sparse matrices. It appears that Google use the power iteration scheme to calculate the page rank; this takes a week or so of intensive calculations on quite powerful computer systems. There are algorithms based on Monte Carlo approaches which converge quite quickly and can be easily parallelised: http://www-sop.inria.fr/members/Konstantin.Avratchenkov/pubs/mc.pdf Cheers, M. NB: The google toolbar allows you to see the PageRank of any given website: https://support.google.com/toolbar/answer/79837?hl=en-GB

POSTED BY: Marco Thiel

Sean Clarke

Sean Clarke, Wolfram Research

Posted 12 years ago

You could write a Monte-Carlo algorithm to compute this. Doing this would probably first require learning more about the basics of programming Mathematica. In the case of a well Markov Process, there's no need to resort to using a Monte-Carlo approximation to get this information. You can use the MarkovProcessProperties function to calculate the Mean Holding Time of each state. MarkovProcessProperties has a property called "HoldingTimeMean" which might be helpful.

POSTED BY: Sean Clarke

michelle yrw

Posted 12 years ago

Many thanks, I will take a look. Cheers.

POSTED BY: michelle yrw

Marco Thiel

Marco Thiel, University of Aberdeen - Dept. of Physics/Mathematics

Posted 12 years ago

Dear Michelle, you might want to try this code: ClearAll["Global`"] (length of the random walk) M = 20; style = {VertexStyle -> White, VertexShapeFunction -> "Point", EdgeStyle -> Directive[Opacity[.5], Hue[.15, .5, .8]], Background -> Black, EdgeShapeFunction -> (Line[#1] &), ImageSize -> 500}; (We start at the BBC new website*) walk = {"http://www.bbc.co.uk/news/"}; For[i = 1, i <= M, i++, string = RandomChoice [Import[#, "Hyperlinks"]] & @ Last[walk]; If[UnsameQ[string, RandomChoice[{}]], AppendTo[walk, string]]] Table[walk[[i]] -> walk[[i + 1]], {i, 1, Length[walk] - 1}]; Graph[Table[walk[[i]] -> walk[[i + 1]], {i, 1, Length[walk] - 1}], style] A standard output would be: There are lots of obvious ways to improve this. Also it does not include the 15% probability to jump to a random website yet, but that is straight forward to program into it. You can for example use http://www.randomwebsite.com/cgi-bin/random.pl and jump to where that redirects to get a new fresh starting site for your random walk. Cheers, M.

POSTED BY: Marco Thiel

michelle yrw

Posted 12 years ago

Actually, can we use the Monte Carlo Simulation to determine the percentage of time a user spends at each of the eight sites ? Thanks.

POSTED BY: michelle yrw

Sean Clarke

Sean Clarke, Wolfram Research

Posted 12 years ago

You've described a Discrete Markov Process. Mathematica 9 has a function specifically for these: http://reference.wolfram.com/mathematica/ref/DiscreteMarkovProcess.html

POSTED BY: Sean Clarke

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback