
Message Boards

Probability distribution of two numbers randomly changing?

Posted 10 years ago

Hello,

I need help from this community, hoping that an experienced mathematician reads this post. I would like to calculate the probability distribution for the following problem: two numbers change randomly, each with probability m, taking values from a given set of size a. For instance, they can take values from 1 to 6 like dice; in this case a = 6. What is the probability that after v trials the numbers reach a given pair of target numbers?

This is simple if m = 1, that is, if both numbers change at every trial. In that case, the probability for each number to be the right one is p = 1/a. On an unsuccessful trial, either both numbers are wrong, or one is right and the other wrong. So before both numbers are correct (with probability p^2) on trial v, there were v - 1 unsuccessful trials. This gives

P[v_] := ((1 - p)^2 + 2 (1 - p) p)^(v - 1) p^2
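As a quick sanity check (sketched in Python rather than the Wolfram Language): since (1 - p)^2 + 2(1 - p)p = 1 - p^2, this P[v] is simply a geometric distribution with success probability p^2, so it should sum to 1 and have mean 1/p^2 = a^2.

```python
# Sanity check of P[v] for the m = 1 case, with a = 6 so p = 1/6.
# (1-p)^2 + 2(1-p)p = 1 - p^2, so P[v] is geometric in p^2.
p = 1 / 6
P = lambda v: ((1 - p)**2 + 2 * (1 - p) * p)**(v - 1) * p**2

total = sum(P(v) for v in range(1, 5000))       # should approach 1
mean = sum(v * P(v) for v in range(1, 5000))    # should approach 1/p^2 = 36

print(round(total, 6), round(mean, 6))  # -> 1.0 36.0
```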

Now if m < 1, one would expect p = m/a with the same distribution as above, but this is far from being the case, as numerical results show. I used the following code to get these numerical results:

z = 10^3; Do[e[i] = 0, {i, 0, 10^7}]; (* e[v] counts runs that first hit the target after v trials *)
a = 6; t = 2; m = 1/20; target = {2, 5};
Do[
 v = 0; parent = RandomInteger[{1, a}, t]; (* random initial pair *)
 While[parent != target,
  v = v + 1;
  Do[
   (* each position mutates with probability m to a random value from 1 to a *)
   If[RandomReal[] < m, parent[[i]] = RandomInteger[{1, a}]],
   {i, 1, t}]];
 e[v] = e[v] + 1,
 {z}]

For large z, the quantity e[v]/z tends to the probability P(v); the mutation probability here is m = 1/20. I found an exact analytical solution, but it is horribly complicated; I would need several pages to explain it here. This is why I wonder whether there is a simpler solution. I am sure that this calculation has already been made and is written down in some book. Does anyone have an idea?
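For readers without Mathematica, here is a rough Python port of the same simulation (a sketch under the same assumptions: a = 6, t = 2, m = 1/20, target {2, 5}; the Counter and the seed are my own choices, not part of the original code):

```python
import random
from collections import Counter

def simulate(z=2000, a=6, t=2, m=1/20, target=(2, 5), seed=1):
    # z runs; each run mutates a random length-t vector until it equals
    # the target, counting the number v of mutation rounds needed.
    rng = random.Random(seed)
    target = list(target)
    counts = Counter()
    for _ in range(z):
        parent = [rng.randint(1, a) for _ in range(t)]
        v = 0
        while parent != target:
            v += 1
            for i in range(t):
                if rng.random() < m:   # each position mutates with prob m
                    parent[i] = rng.randint(1, a)
        counts[v] += 1
    return counts

counts = simulate()
print(sum(counts.values()))  # -> 2000; counts[v]/z then estimates P(v)
```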

POSTED BY: Ulrich Utiger
11 Replies
Posted 8 years ago
POSTED BY: Ulrich Utiger

Interesting article, obviously a lot of work. My knowledge of this topic is almost zero, but the direction of the thinking seems clear. Perhaps the conclusion is obvious, but I felt it was not really spelled out. You ended your research with:

This is why the time is overdue to unmask this blind passenger, let him row back to his country of origin, the land of fantasy, and open the mind for the obvious origin of the immense and beautiful variety of all vegetal and animal species.

I just wondered: what is the obvious origin you mention? Just curious.

POSTED BY: l van Veen
Posted 8 years ago

In case someone is interested, I published an update on ResearchGate. Do you know a journal that would be interested in publishing this kind of stuff?

POSTED BY: Ulrich Utiger
Posted 9 years ago

Hello,

I finally solved all problems and finished my paper. Took a lot of time... I published it on academia.edu. If you are interested, you can see it here. Any comments and suggestions are highly appreciated. You can do this directly in the article.

POSTED BY: Ulrich Utiger

After a bit of reflection, I see what's still missing from my story: everything I wrote is correct, but only for a process that does not terminate. That's where the nuance lies. The problem is now to find the probability distribution for the first time the process achieves a given state (which I haven't calculated yet). At the moment I don't have time to go into depth about this (and I'd have to figure this out for myself as well), but I suggest you take a look at FirstPassageTimeDistribution in Mathematica.

For example, if you have a single letter drawn from A, T, C and G that mutates with probability m, then Mathematica gives you as the probability for achieving the state A for the first time in v steps:

PDF[
  FirstPassageTimeDistribution[
    DiscreteMarkovProcess[{1, 1, 1, 1}/4, transitionMatrix[m]],
    1
  ], v]

which evaluates, for v > 1, to

(9 (1 - m/3)^v m) / (4 (-3 + m)^2)

I suggest you play around with that.
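This closed form can be checked against the chain directly. A sketch in Python (note that transitionMatrix[m] is not shown in the post, so I assume the natural rule: with probability m the letter mutates to one of the other three letters, uniformly at random):

```python
import numpy as np

def closed_form(v, m):
    # The closed form above, valid for v > 1
    return 9 * (1 - m/3)**v * m / (4 * (-3 + m)**2)

def first_passage_pmf(v, m):
    # 4-state chain, state 0 = the target letter A.  Assumed rule: with
    # probability m the letter mutates to one of the other 3 letters.
    P = np.full((4, 4), m / 3)
    np.fill_diagonal(P, 1 - m)
    pi = np.full(4, 0.25)          # uniform initial distribution
    Q, r = P[1:, 1:], P[1:, 0]     # non-target block / one-step hits into A
    if v == 1:
        return pi[0] * P[0, 0] + pi[1:] @ r
    # From a non-target start s: P(T = v) = (Q^(v-1) r)[s].
    # From A: the first step must leave A, then A is reached for the
    # first time v - 1 steps later.
    from_nontarget = np.linalg.matrix_power(Q, v - 1) @ r
    from_A = P[0, 1:] @ np.linalg.matrix_power(Q, v - 2) @ r
    return pi[1:] @ from_nontarget + pi[0] * from_A

for v in range(2, 8):
    assert abs(first_passage_pmf(v, 0.05) - closed_form(v, 0.05)) < 1e-12
print("closed form matches the matrix computation")
```

Incidentally, since (1 - m/3)^2 = (3 - m)^2/9, the closed form simplifies to (m/4)(1 - m/3)^(v-2) for v > 1.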

POSTED BY: Sjoerd Smit
Posted 10 years ago

Maybe your method is much better than mine for this kind of problem, but it still lacks something, as it does not yield the same result as my solution, which I know is exact because it agrees with the numerical solution (see the plot below: blue points = numerical result; red line = analytical result).

What I got is the following for a=4, v=2 and v=3:

P[2] = -(3/256) m (-8 + 3 m);
P[3] = -((3 m (-128 + 80 m - 48 m^2 + 21 m^3))/4096);

Your solution does not seem to yield the same. Maybe this is because the letters are not independent of each other: a trial v is successful only if all letters are correct, that is, correspond to the target. Also, whether a letter is correct on trial v depends on the outcome of trial v - 1, that is, on the previous one. For instance, if it was correct previously and does not mutate, it stays correct. This probability is

P(correct and does not mutate) = (1/a)(1 - m)

On the other hand,

P(wrong and mutates to the correct number) = (1 - 1/a)(m/a)

These are not the same. So there are conditional probabilities that you possibly did not consider. But your method may be improved. At the moment I don't see clearly, since I am not familiar with Markov chains, but I will definitely take a closer look at this.
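For what it's worth, these P[2] and P[3] can be reproduced with a small Markov-chain computation. A Python sketch, assuming the model of the first post (t = 2 positions, alphabet size a = 4, each position mutating with probability m to a uniformly drawn symbol that may equal the old one) and the convention that v = 1 means the initial random parent already hits the target:

```python
import numpy as np

def first_passage(v, m, a=4):
    # Two positions; each mutates with probability m to a uniform
    # random symbol (possibly the same one).
    s = 1 - m + m / a          # a correct position stays correct
    u = m / a                  # a wrong position becomes correct
    # Chain on k = number of correct positions: states 0, 1, 2.
    P = np.array([
        [(1 - u)**2,          2 * u * (1 - u),           u**2 ],
        [(1 - s) * (1 - u),   s * (1 - u) + (1 - s) * u, s * u],
        [(1 - s)**2,          2 * s * (1 - s),           s**2 ],
    ])
    p = 1 / a
    pi = np.array([(1 - p)**2, 2 * p * (1 - p), p**2])  # random parent
    if v == 1:
        return pi[2]           # the initial draw is already on target
    Q, r = P[:2, :2], P[:2, 2]  # state 2 (all correct) is absorbing
    return pi[:2] @ np.linalg.matrix_power(Q, v - 2) @ r

def P2(m): return -(3 / 256) * m * (-8 + 3 * m)
def P3(m): return -(3 * m * (-128 + 80 * m - 48 * m**2 + 21 * m**3)) / 4096

m = 0.3
assert abs(first_passage(2, m) - P2(m)) < 1e-12
assert abs(first_passage(3, m) - P3(m)) < 1e-12
print("P[2] and P[3] match the Markov chain computation")
```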

POSTED BY: Ulrich Utiger
POSTED BY: Sjoerd Smit
Posted 10 years ago

I am looking for an analytical solution to the problem Richard Dawkins treated numerically in his book "The Blind Watchmaker". His goal was to show that functional pieces of DNA can be formed by chance within reasonable time limits. To do this, he used an algorithm similar to the one in my first post. His target was the sentence from Shakespeare "methinks it is like a weasel", so a = 27 here, that is, the length of the English alphabet with a space included.

This sentence can be reached within only a few steps if natural selection enters the process. To simulate this, he created a parent in the first iteration, that is, an arbitrarily chosen series of 28 letters. Then he made kids of this parent by copying it a number of times. These are not verbatim copies: each letter can mutate with probability m when it is copied. He then takes the kid that is closest to the target, makes it the new parent, and the algorithm restarts. He showed that in this manner the target is reached after some 50 iterations or so.

In my algorithm above, I did not include natural selection yet. Numerically this is easy to do, but it is more difficult to get an analytical solution, which is what I am working on right now. I simplified the algorithm with numbers instead of letters (it works faster) and a reduced alphabet. I found the probability distribution above with matrices in the case where the target is {2, 5}; it doesn't matter what the target is, it could also be {1, 1}.

So my goal is to find an analytical probability distribution for a target of arbitrary length t: what is the probability P(v) that the target is attained in v trials? From this could then be calculated the mean value of v, which is a function of a, m, t and the number n of kids produced. With such a function, it would be possible to estimate whether natural selection is a probable process. This is the final goal.
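The selection procedure described above can be sketched in a few lines (Python for brevity; the kid count n = 100, the mutation rate m = 0.05 and the seed are illustrative choices of mine, not Dawkins' exact parameters):

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "       # a = 27 symbols
TARGET = "METHINKS IT IS LIKE A WEASEL"        # 28 characters

def score(s):
    # Number of positions that already match the target
    return sum(c == t for c, t in zip(s, TARGET))

def weasel(n_kids=100, m=0.05, seed=0):
    rng = random.Random(seed)
    # Arbitrary initial parent of the same length as the target
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    generations = 0
    while parent != TARGET:
        generations += 1
        # n_kids copies of the parent; each letter mutates with prob m
        kids = ["".join(rng.choice(ALPHABET) if rng.random() < m else c
                        for c in parent)
                for _ in range(n_kids)]
        parent = max(kids, key=score)   # selection: keep the best kid
    return generations

gens = weasel()
print("target reached after", gens, "generations")
```

With selection, the target is typically reached in on the order of a hundred generations, as opposed to the astronomically long waiting time of the pure-chance process.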

POSTED BY: Ulrich Utiger

I get the impression that we don't really understand each other here. Could you repeat your basic problem again, but as clearly as possible? I'm not quite sure I understand what you're actually interested in and what you're doing.

POSTED BY: Sjoerd Smit
Posted 10 years ago
POSTED BY: Ulrich Utiger
POSTED BY: Sjoerd Smit