[WSG23] Daily Study Group: Introduction to Probability

A Wolfram U Daily Study Group on Introduction to Probability begins on February 27th, 2023.

Join me and a group of fellow learners to learn about the world of probability and statistics using the Wolfram Language. Our topics for the study group include the characterisation of randomness, random variable design and analysis, important random distributions and their applications, probability-based data science and advanced probability distributions.

The idea behind this study group is to rapidly develop an intuitive understanding of probability for a college student, professional or interested hobbyist. A basic working knowledge of the Wolfram Language is recommended but not necessary. We are happy to help beginners get up to speed with Wolfram Language using resources already available on Wolfram U.

Please feel free to use this thread to collaborate and share ideas, materials and links to other resources with fellow learners.

REGISTER HERE


POSTED BY: Marc Vicuna
201 Replies

My solution to Mock Exam Question 3 ("Choose a uniformly random point in the unit square with corners at (0,0) and (1,1). What is the point's expected distance from the origin?") is

Norm[Expectation[{x, y} - {0, 0}, {x, y} \[Distributed] 
   UniformDistribution[2]]]

I don't understand why this doesn't match the answer of approximately 0.76. What did I do wrong?

POSTED BY: Peter Burbery

The simple answer is: the expectation of the norm is not the same as the norm of the expectation. The norm is not a linear function, so E[f[X]] != f[E[X]] in general. This should work instead:

Expectation[Norm[{x,y}],{x,y}\[Distributed]UniformDistribution[2]]
POSTED BY: Dave Middleton

Hello Peter,

Indeed, as Dave pointed out, you're misapplying the properties of expectation. f(E(x)) equals E(f(x)) when f is a linear function; for non-linear functions the two generally differ, and Norm is not linear. Interestingly, for convex and concave functions you do get an inequality between the two; see Jensen's inequality.
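
To make the gap concrete, here is a minimal sketch that computes both quantities side by side. It assumes the unit square is written out explicitly as UniformDistribution[{{0, 1}, {0, 1}}]; the names dist, normOfMean and meanOfNorm are only illustrative.

```wolfram
(* unit square, written out explicitly *)
dist = UniformDistribution[{{0, 1}, {0, 1}}];

(* norm of the expectation: the mean is {1/2, 1/2}, so this is 1/Sqrt[2], about 0.707 *)
normOfMean = Norm[Mean[dist]]

(* expectation of the norm, evaluated numerically: about 0.765 *)
meanOfNorm = NExpectation[Norm[{x, y}], {x, y} \[Distributed] dist]
```

Since Norm is convex, Jensen's inequality says the second number cannot be smaller than the first, which is what the two values show.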

POSTED BY: Marc Vicuna

Where can I download the updated study materials? I found it at https://amoeba.wolfram.com/index.php/s/LgfWrztXo9xELo4. Are these the notebooks that have been updated with the corrections mentioned in this community post? I noticed that the most recent modification dates were set to 2 months ago.

POSTED BY: Peter Burbery

Peter, I think all updated course notebooks can be found in the course framework: https://www.wolframcloud.com/obj/online-courses/introduction-to-probability/what-is-probability.html

POSTED BY: Dave Middleton

Hi Peter,

Indeed, as Dave said, please refer to the resource found on Wolfram U for the latest version of any notebook.

POSTED BY: Marc Vicuna

Dear Marc, While catching up this month, I came across a number of suggested errata; see the text below. Thank you for putting together this course. It was a lot of fun as a fast track probability review. Cheers, Dave


MODERATOR NOTE: notebook Suggested Errata was moved to the attachment below and also can be viewed at https://www.wolframcloud.com/obj/5f715089-9e6c-4e90-9733-f3a69ddec8e9 Wolfram cloud notebook.

Attachments:
POSTED BY: Dave Middleton

Hello Dave,

Thank you for this exhaustive review. It is currently being addressed, and the changes will appear on the framework soon. Let me address these concerns one by one:

Lesson 2: correct, unique objects are used.

Exercise 7: correct, the function in the question will be modified.

Exercise 10: correct, the + are now added.

Exercise 15: correct, the coefficients will be added.

Exercise 18: 4: correct, the word will be changed. 5: incorrect, this is not a root, the only root taken is because of the transfer from variance to standard deviation.

Exercise 19: this will not be changed, since this is a question of interpretation of the dataset, a skill we want to develop in this course.

Exercise 21: slight changes were made to make it clearer. However, consider that the only requirement here is to try to interpret clusters. Moreover, the given answers only ever portray one possible way to answer. Giving steps would go against the many possible ways to answer the question.

Exercise 23: This was already corrected following another community post. Moreover, the -1 is not necessary, only that it is a negative number. There is no issue if your method is slightly different. Question 4: both are true. The questionnaire goes from 1 to 10, but the distribution is from 3 to 10. In other words, the probability of 1 to 2 is 0, as no one gave that answer. Question 5: this is found in the term "average" of distributions. A combination of distributions does not imply equal weights, but an "average" does.

Lesson 24: this seems to be a framework problem, this will be updated soon.

Exercise 24: No, there is no mathematical difference. You are adding 0.5; is 0.5 included or excluded? Not only does this make no difference in the calculation, it is also impossible to argue one way or the other semantically, as the approximation is independent of a single point on an interval. So in the continuous case, it is preferable to avoid equality symbols, since they complicate the formulation without changing anything, even if we're aiming for the most absolute mathematical correctness.

Quiz 6: this was corrected.

Practice Exam: correct, this will be corrected soon.

Thank you again for this huge help in the review of this course. Good luck on your probabilistic endeavors!

POSTED BY: Marc Vicuna

Hi Marc;

Do you have an updated Mock Exam that includes all the corrections and that I can download? Also, were any changes/corrections made to the downloaded questions and other notebooks?

Thanks,

Mitch Sandlin

POSTED BY: Mitchell Sandlin

Hello Mitchell,

For all errors or bugs, it takes us about 1 or 2 days to correct them and publish the fix on the course framework. So to access the updated version, always refer to the notebooks found on the course framework. The practice exam and multiple exercise notebooks have been corrected since the launch of this course, so please redownload those from the course framework if you were using the original notebooks.

POSTED BY: Marc Vicuna

Lesson 24 Exercise 1 : " Use the normal approximation. Note 300 is excluded and 330 is included." seems like it should read "Use the normal approximation. Note 100 is excluded and 300 is included."

POSTED BY: Joseph Smith

Hello Joseph,

Indeed, this seems to be a mistake, it will be corrected. Thank you for noticing.

POSTED BY: Marc Vicuna

For Lesson 23 Exercise 5, what is the point of averaging data from 3 distributions?

samples = Table[Sum[
    RandomVariate[d], {d, {NormalDistribution[0, RandomReal[]], 
      CauchyDistribution[0, RandomReal[]], 
      StudentTDistribution[0, RandomReal[], 2]}}], 1000];

Why are we setting the standard deviations to a RandomReal[] number?

POSTED BY: Joseph Smith

Hello Joseph,

As said in the question, there are many approaches to the problem. You may set your standard deviations in whatever way you want. Maybe the only restriction is that it doesn't surpass the standard deviation of exercise 4:

N@StandardDeviation@DiscreteUniformDistribution[{3, 10}]

which is about 2.29. The key here is to experimentally explore the difference in convergence between the weak and strong law of large numbers.

POSTED BY: Marc Vicuna

For Lesson 23 Exercise 1, we are asked to calculate the upper bound on the probability that a cell has a size between 13 and 37 µm, with mean = 25. By computing P(|X - mu| >= k sigma) <= 1/k^2, aren't we computing the upper limit on the probability that the cell size will be OUTSIDE the range of 13 - 37? Think about the answer: we are saying the upper limit on the probability that the value will be within 2 standard deviations of the mean is 1/4. Does that make sense?

POSTED BY: Joseph Smith

Hello Joseph,

Indeed, this doesn't make sense. This mistake was caught and corrected, please refer back to the framework exercise notebooks.
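
For reference, the direction of the bound can be written out as follows. This is a sketch of Chebyshev's inequality only; the value k = 2 (and hence sigma = 6) is inferred from Joseph's "within 2 standard deviations" remark and is an assumption, not a quote from the exercise.

```latex
% Chebyshev's inequality bounds the probability of being FAR from the mean:
P\bigl(|X-\mu| \ge k\sigma\bigr) \le \frac{1}{k^{2}}
% Taking the complement turns it into a LOWER bound for being close to the mean:
P\bigl(|X-\mu| < k\sigma\bigr) \ge 1 - \frac{1}{k^{2}}
% With \mu = 25 and k\sigma = 12 (assuming k = 2, \sigma = 6):
P(13 < X < 37) \ge 1 - \frac{1}{4} = \frac{3}{4}
```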

POSTED BY: Marc Vicuna

Hello,

Well, there is definitely a difference, and it's technical and significant.

Basically, in the lesson, the normal distribution is shown so that the student can see the division of the standard deviation by the square root of the number of samples. You are asked for the probability of a mean, thus you "created" the sample distribution and your probability only applies to means, but your initial data was not means.

In the exercise you mention, your data is already a set of means. Therefore, you can just estimate the data and you have a sample distribution. In that case you don't use the factor of square root of the number of samples and you get the right answer.

This all has to do with what your initial data is. Is it a mean, or raw data from a simple element, individual or such?

I hope I made this clear.

POSTED BY: Marc Vicuna
Posted 1 year ago

That makes a lot more sense now, thanks!

POSTED BY: Parker Robb

In slide 8 for Lesson 16 it says that "From Newtonian physics, the horizontal distance function is v^2Sin[alpha]Cos[alpha]/g" . There should be a factor of 2 in the numerator. The rest of the slide correctly includes the factor of 2 in the calculations.

POSTED BY: Joseph Smith

Hello Joseph,

Thank you for noticing, this will be corrected shortly.

POSTED BY: Marc Vicuna
Posted 1 year ago

Howdy Marc,

One of the questions on the version of the final exam appears to be missing the referenced "sample" code,

"What is the estimated variance of the savings ratio in the “Sample Data: Life Cycle Savings” dataset? The dataset is normally distributed. Use the following code to obtain the data."

Thanks, John

POSTED BY: John Davidson

Hi John,

This is now fixed, thank you for noticing.

POSTED BY: Marc Vicuna
Posted 1 year ago

I found what I think is a simpler solution to Lesson 24 Exercise 4:

Nest[TransformedDistribution[
   x + y, {x \[Distributed] #, y \[Distributed] #}] &, 
 BinomialDistribution[2, p], 9]
POSTED BY: Updating Name
Posted 1 year ago

Also for Exercise 5 in the same lesson:

Nest[TransformedDistribution[
   x + y, {x \[Distributed] #, y \[Distributed] #}] &, 
 NormalDistribution[2, Sqrt@31/2], 2]
POSTED BY: Parker Robb

Hi Parker,

You seem to be missing the point here. This is also a valid solution, but a much more obscure one. How would you explain that the standard deviation has to be divided by 2 for the addition of nested distributions? This result comes from the variance, but explaining it is difficult.

The solution given restricts all standard deviations to be integers, which is wildly unnecessary, but it facilitates the explanation and the calculation.

POSTED BY: Marc Vicuna
Posted 1 year ago

I see what you mean.

When I did the problem I proceeded without the assumption that σ of each distribution had to be an integer. Since σ in the starting distribution can be whatever, I back-calculated a single factor for σ, instead of factoring by several different integers as the exercise solution does. I see after reading the solution that that factor comes about through the "distance formula" rule:

In[] := Nest[Sqrt[#^2 + #^2] &, Sqrt@31/2, 2]
Out[] = Sqrt[31]
POSTED BY: Parker Robb

Hi,

Indeed, that solution is much more efficient. However, the average student may not be familiar with recursion. RecurrenceTable was used to show the recursive steps the function goes through, to allow solving with complete understanding. But yes, purely for computation, the Nest function is better here.
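
The factor of 2 discussed above can be derived from the additivity of variance for independent summands. The derivation below is a sketch assuming each Nest step adds two independent copies of the current distribution, as in Parker's code; the subscripted symbols are notation introduced here.

```latex
% One doubling step: X and Y independent, each with standard deviation \sigma_n
\operatorname{Var}(X+Y) = \sigma_n^{2} + \sigma_n^{2} = 2\sigma_n^{2}
\quad\Longrightarrow\quad \sigma_{n+1} = \sqrt{2}\,\sigma_n
% Two doubling steps:
\sigma_2 = (\sqrt{2})^{2}\,\sigma_0 = 2\,\sigma_0
% Requiring \sigma_2 = \sqrt{31} gives the starting value
\sigma_0 = \frac{\sqrt{31}}{2}
```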

POSTED BY: Marc Vicuna
Posted 1 year ago

StyleBox[Cell[Cell[RowBox[{RowBox[{"Select", "[", RowBox[{RowBox[{"ExampleData", "[", RowBox[{"{", RowBox[{"\"Statistics\"", ",", " ", "\"USEarthquakes\""}], "}"}], "]"}], ",", " ", RowBox[{RowBox[{RowBox[{"#", "[", RowBox[{"[", "1", "]"}], "]"}], " ", "<", " ", "1900"}], " ", "&"}]}], "]"}], "[", RowBox[{"[", RowBox[{"All", ",", " ", "7"}], "]"}], "]"}], "InlineCode", ExpressionUUID -> "a4b6a146-104b-452e-829d-1f5fde573d7b"], ExpressionUUID -> "304430c9-d846-45ec-9492-5c906e03147f"], "ProblemCaption", StripOnInput -> False]

Copy and paste doesn't seem to work in the final exam. I tried to copy a line that starts out "Select[ExampleData" etc. and got the above.

POSTED BY: Updating Name
Posted 1 year ago

On my exam, the following appeared:

LESSON11

Which distribution best describes the following data? Hint: use FindDistribution and EstimatedDistribution.

But there was no data.

POSTED BY: Updating Name

Hello,

We will correct this shortly, I'll post again to confirm it has been corrected.

POSTED BY: Marc Vicuna

The issue has been addressed and corrected.

POSTED BY: Marc Vicuna
Posted 1 year ago

In question 35 of the Practice exam, we consider the permutations of a group of letters to determine which are words. It seems that Mathematica finds an extra word if the letters are capitalized….

In[1]:= Tally[DictionaryWordQ/@StringJoin/@Permutations[{"r","e","s","e","t"}]]

Out[1]= {{True,6},{False,54}}

In[2]:= Tally[DictionaryWordQ/@StringJoin/@Permutations[{"R","E","S","E","T"}]]

Out[2]= {{True,7},{False,53}}

The word “STERE” is missing from the uncapitalized words...

POSTED BY: Byron Zollars

Hello Byron,

Indeed, the answer would then be 7/60. This will be corrected. Thank you for noticing!

POSTED BY: Marc Vicuna

On Exercise 1 of Exercises-23.nb, I believe the solution shows the upper bound for the probability that x<=13 or x>=37. Then 1 minus that probability gives a lower bound for P(13<=x<=37). The same thing happens in Problem 2 of Quiz 6.

Also, for Problem 5 in Quiz 6, one answer shows the probability of x>260 using a Normal to approximate Discrete Data, but no answer shows for x>260+.5.

On Exercises 1 and 2 from Exercises24.nb, the solutions use the variance instead of the standard deviation as the second parameter for NormalDistribution[].

On exercise 2 of Exercises25, the solution uses different attributes.

Hello Juan,

  1. This is new and will be corrected; indeed it should be a lower bound, 1 minus the probability.
  2. Why would this be +0.5? "At least" implies inclusion, so -0.5 is more appropriate.
  3. Right on! We will correct this! Personally this is one of my most recurring mistakes; I find it very counterintuitive to use standard deviation as a parameter instead of variance. Thank you for catching that.
  4. Definitely a small but significant mistake. We will correct this.

Thank you very much for all those corrections!

POSTED BY: Marc Vicuna

For exercises 10 problem 5,

I don't see how the condition x>1.5 in the calculation of variance represents the problem statement of "greater than 15 given that it is at least 10".


*MODERATOR NOTE: notebook Exercises10 Problem 5 was moved to the attachment below and also can be viewed at https://www.wolframcloud.com/obj/befe889f-2b79-4fd3-b80a-d22f6da3f9c8 Wolfram cloud notebook.*

Attachments:
POSTED BY: Joseph Smith
Posted 1 year ago

Hello Joseph,

There seem to be two problems mixed together here, so this will be corrected to:

The distribution of values of the retirement package offered by a company to new employees is modeled by the probability density function 1/5 e^(-(1/5)(x-5)) for x>5. Calculate the variance of the retirement package value, given that the value is at least 10.

The code would be:

Expectation[x^2 \[Conditioned] x > 10, 
  x \[Distributed] retirementDist] - 
 Expectation[x \[Conditioned] x > 10, 
   x \[Distributed] retirementDist]^2

However, this gives the same variance as the unconditional case: the density is a shifted exponential, which is memoryless, so the variance does not depend on the condition.
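
Here is a minimal sketch for checking that claim. The thread does not show how retirementDist is defined, so the ProbabilityDistribution definition below is an assumption built from the stated density; the names unconditional and conditional are illustrative.

```wolfram
(* assumed definition, built from the density 1/5 e^(-(1/5)(x-5)) for x > 5 *)
retirementDist = ProbabilityDistribution[1/5 Exp[-(x - 5)/5], {x, 5, Infinity}];

(* variance without any condition *)
unconditional = Variance[retirementDist];

(* variance given that the value is at least 10, as E[x^2 | c] - E[x | c]^2 *)
conditional = 
  Expectation[x^2 \[Conditioned] x > 10, x \[Distributed] retirementDist] - 
   Expectation[x \[Conditioned] x > 10, x \[Distributed] retirementDist]^2;

{unconditional, conditional}
```

Mathematically both entries equal 25, the square of the scale parameter 5, because conditioning a shifted exponential on exceeding a threshold only moves its starting point.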

Thank you for noticing.

POSTED BY: Updating Name

Hi;

  1. Is the following set logic correct? The notes were not very clear.

    P(A \[Union] B)=P(A) + P(B) - P(A \[Intersection] B)
    P(A \[Intersection] B) = P(A') + P(B') - P(A' \[Intersection] B')
    
  2. When creating a distribution using data from the repository, can the more recent data be weighted?

  3. I used the following to extract data; however, the FindDistribution function had a problem with the extracted data. Can you tell me what I am doing wrong?

    data = 
      QuantityMagnitude@
       Normal@ResourceData["Sample Data: Fisher's Irises"][
         All, {"SepalLength", "SepalWidth", "PetalLength", "PetalWidth"}];
    
    FindDistribution[data]
    

I finished the quizzes last Friday and the final yesterday. Some of the questions were quite challenging.

All in all, I really enjoyed your presentation and learned a lot about using Mathematica in calculating probability. However, I still have a few remaining questions.

Thanks again,

Mitch Sandlin

POSTED BY: Mitchell Sandlin

Hello Mitchell,

  1. The first is true, the second should be

    P(A' \[Union] B') = P(A') + P(B') - P(A' \[Intersection] B')
    

or

    1- P(A  \[Intersection] B) = P(A') + P(B') - P(A' \[Intersection] B')

Could you tell me where you found this so that I may correct it?

  2. Yes, you can attach any weighting you like to a dataset with WeightedData, for example:

    data = RandomReal[{-5, 5}, 10^4];
    weightedData = WeightedData[data, PDF[NormalDistribution[], #] &]
    EstimatedDistribution[weightedData, 
     NormalDistribution[\[Mu], \[Sigma]]]
    

Some functionalities, like FindDistribution, may not work with WeightedData, but most functionalities do.

  3. As mentioned when introducing FindDistribution, that function is only for univariate data. You may want to apply FindDistribution to each dimension (independence assumption) or use, for example, a general multinormal distribution with EstimatedDistribution (normal assumption, but dependence is allowed).
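
The two options in the last point can be sketched as follows. The data here is synthetic (a hypothetical stand-in for a four-column numeric dataset, not the iris data), and the joint fit uses a plug-in of sample moments rather than EstimatedDistribution, which is a simplification chosen for this sketch.

```wolfram
(* synthetic four-column data standing in for a real dataset *)
data = RandomVariate[
   MultinormalDistribution[{5, 3, 4, 1}, DiagonalMatrix[{1, 1, 2, 1}]], 200];

(* option 1: treat the columns as independent and fit each one separately *)
columnFits = FindDistribution /@ Transpose[data]

(* option 2: a joint normal model that allows dependence between columns, *)
(* here built by plugging in the sample mean vector and covariance matrix *)
jointFit = MultinormalDistribution[Mean[data], Covariance[data]]
```

The first option loses any correlation between columns; the second keeps it but commits to a normal shape.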

I'm glad you enjoyed the course, I wish you the best!

POSTED BY: Marc Vicuna

Hi Marc; Thanks so much for the reply. To answer your question, I am not sure where I copied the code from since I was using my notes.

Now I am still not sure as to how the LHS of the statement is switched from a Union to an Intersection, in the 2nd Equation. For example, when you switch (A union B) to their complements (A',B'), does that switch the LHS to (A' intersection B')? I am confused as to how we are getting from a union to an intersection - "or" from an or relationship to an "and".

P(A \[Union] B)=P(A) + P(B) - P(A \[Intersection] B)
P(A' \[Union] B') = P(A') + P(B') - P(A' \[Intersection] B')

 1- P(A  \[Intersection] B) = P(A') + P(B') - P(A' \[Intersection] B')

Thanks again,

Mitch Sandlin

POSTED BY: Mitchell Sandlin

Hello Mitch,

In two words: De Morgan's!

P(A' \[Union] B') = P((A'' \[Intersection] B'')') (De Morgan's Law of Set Theory)
P(A' \[Union] B') = P((A \[Intersection] B)') (Double Negation of Set Theory)
P(A' \[Union] B') = 1 - P(A \[Intersection] B) (Complement Law of Probability Theory)
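
De Morgan's law, that the union of the complements equals the complement of the intersection, can also be checked on a small finite example. The sets below are arbitrary illustrative choices.

```wolfram
universe = Range[10];
a = {1, 2, 3, 4};
b = {3, 4, 5, 6};

(* union of the complements of a and b *)
lhs = Union[Complement[universe, a], Complement[universe, b]];

(* complement of the intersection of a and b *)
rhs = Complement[universe, Intersection[a, b]];

lhs === rhs   (* True: both are {1, 2, 5, 6, 7, 8, 9, 10} *)
```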

Those laws come in handy every once in a while. I hope that helps.

POSTED BY: Marc Vicuna

Dear All, Exam Q11 may have missing data. G

POSTED BY: Gabor Tarkanyi

Hello,

If you are referring to the mock exam, I can't see anything missing. If you are referring to the actual Final Exam, the numbering is random, so I will need more information to be able to pinpoint the question.

POSTED BY: Marc Vicuna
Posted 1 year ago

Dear Marc,

Will the course materials page contain all corrections to the Study Group notebooks e.g. Exercises?

I returned from Spring Break. In this Community Page there are too many discussions to keep track of, so I prefer to download the latest notebooks to work from.

Cheers,

Dave

POSTED BY: Updating Name

The course materials have been updated with the corrections :)

POSTED BY: Dave Middleton

Hello,

The latest notebooks are on the course framework, or soon will be. If you find an error, search for it on this page with Ctrl+F, or suggest a change if you can't find it. Most mistakes have been corrected in the framework by now. Hopefully that helps.

POSTED BY: Marc Vicuna

How do we calculate the Kurtosis of a distribution coming from data, as problem 6 from quiz 3 asks?

Hello Juan,

As repeated throughout the course, it all comes down to your capacity to recognize where to apply each distribution. If you can recognize the situation is appropriate for a specific distribution, use EstimatedDistribution on that abstract distribution together with the data. Then extract your measures from that distribution. Now, if you only have data and no information otherwise, use FindDistribution and actually use the found distribution to take your measures.

POSTED BY: Marc Vicuna

Marc:

That is what I did on the problem and got PoissonDistribution[3.71429] which has a Kurtosis of 3.26923, which does not appear as a valid answer on Problem 6 from Quiz 3.

Hello Juan,

I'm not sure how you are getting that result. The output for:

EstimatedDistribution[{1,3,6,0,3,4,5,7,4,2,11,1},PoissonDistribution[\[Lambda]]]

is

PoissonDistribution[3.91667]
POSTED BY: Marc Vicuna

That's interesting. The output for:

FindDistribution[{1, 3, 6, 0, 3, 4, 5, 7, 4, 2, 11, 1}, 
 TargetFunctions -> "Discrete"]

is:

PoissonDistribution[3.71429]
POSTED BY: Michael Gierhake

Hello Michael,

Indeed! This may seem weird at first, but it comes down to the approach. If you know what shape you're facing for the PDF, then it becomes a problem of classical probability theory to find the parameters corresponding to that PDF. However, FindDistribution is much more complex and based on heuristics, which makes it much more uncertain. It tries to imagine the data that is not there, using heuristics. For example, with minimal data, it will prefer the Uniform and Normal distributions as they are common, even when the shape is different. This is why the course tends to emphasize EstimatedDistribution whenever you have just a little more information than the raw data.
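
For the Poisson case specifically, the estimate can be checked by hand: the default (maximum likelihood) estimate of the Poisson parameter is the sample mean, and the kurtosis of a Poisson distribution is 3 + 1/lambda. A short sketch using the data quoted above (the names data and fit are illustrative):

```wolfram
data = {1, 3, 6, 0, 3, 4, 5, 7, 4, 2, 11, 1};

(* sample mean: 47/12, about 3.91667 *)
N[Mean[data]]

(* the fitted parameter matches the sample mean *)
fit = EstimatedDistribution[data, PoissonDistribution[\[Lambda]]];

(* kurtosis is 3 + 12/47, about 3.25532 *)
Kurtosis[fit]
```

The value 3.26923 reported earlier is the same formula applied to the parameter 3.71429 that FindDistribution returned.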

POSTED BY: Marc Vicuna

"If you can recognize the situation is appropriate for a specific distribution, use EstimatedDistribution on that abstract distribution together with the data."

I would like to add that knowing the appropriate number of parameters makes a big difference. Take StudentTDistribution, for example: it takes either one or three parameters. You can easily get a bad fit using EstimatedDistribution on your data with StudentTDistribution[\[Nu]] while getting a much better one using StudentTDistribution[\[Mu], \[Sigma], \[Nu]].

POSTED BY: Ahmed Elbanna

Hello Ahmed,

Indeed, you also need to be careful about that. When a parameter is not used in the function signature, it's usually because that parameter is assumed to be some standard value, like 0 or 1. If you cannot make such an assumption in your situation, it is in your interest to use the most general form of the distribution.

POSTED BY: Marc Vicuna

How do we calculate the variance for a Normal Distribution when it is not indicated? Problems 1 and 2 from Exercises-18.nb do it differently.

Posted 1 year ago

Hello Juan,

The normal distribution is the distribution of approximations; it is used to approximate two major distributions: the binomial and the Poisson. In both approximations, just use the mean and variance of the exact distribution you're trying to approximate (remembering that NormalDistribution takes the standard deviation, the square root of the variance, as its second parameter). If it's the binomial, take the binomial mean and variance for the normal. If it's Poisson, take the Poisson mean and variance.
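
As a sketch of that recipe, the two helpers below build the approximating normal from the exact distribution's parameters. The function names normalForBinomial and normalForPoisson are hypothetical, introduced only for this illustration.

```wolfram
(* binomial: mean n p, variance n p (1 - p); pass the square root of the variance *)
normalForBinomial[n_, p_] := NormalDistribution[n p, Sqrt[n p (1 - p)]]

(* Poisson: mean and variance are both lambda *)
normalForPoisson[lambda_] := NormalDistribution[lambda, Sqrt[lambda]]

(* the approximation and the exact distribution share mean and variance *)
{Mean[normalForBinomial[500, 0.06]], Variance[normalForBinomial[500, 0.06]]}
{Mean[BinomialDistribution[500, 0.06]], Variance[BinomialDistribution[500, 0.06]]}
```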

POSTED BY: Updating Name

Thank you.

For exercises08 exercise 2 , given as "What is its expectation of the function Binomial[10,i]/2^10 for 0<=x<=10?" What is the "its" referred to in the problem statement ? The solution seems to be calculating the expectation of x for the distribution Binomial[10,i]/2^10 .

POSTED BY: Joseph Smith

Hello Joseph,

Indeed, it should be a "the", not an "its". Simple mistake, it will be corrected.

Thank you.

POSTED BY: Marc Vicuna
Posted 1 year ago

I also get a different output from Lesson 14 Exercise 4 with the same input as the exercise gives:

EstimatedDistribution[{3, 3, 10, 6, 6, 4, Sequence[
  5, 9, 3, 4, 7, 4, 7, 10, 8, 5, 6, 7, 11, 10, 5, 9, 7, 8, 6, 5, 6, 7,
    6, 8, 12, 9, 6, 3, 9, 5, 7, 5, 2, 9, 3, 5, 9, 9, 3, 5, 3, 8, 5, 6,
    5, 4, 7, 10, 6, 7, 8, 8, 11, 9, 8, 8, 9, 3, 11, 8, 7, 10, 5, 4, 5,
    10, 4, 8, 7, 7, 4, 3, 5, 10, 5, 4, 11, 5, 6, 10, 5, 7, 10, 11, 7, 
   5, 4, 7, 9, 5, 4, 5, 7, 5, 10, 11, 10, 5, 5, 7, 4, 7, 5, 4, 3, 4, 
   7, 10, 4, 8, 2, 7, 4, 4, 8, 4, 8, 8, 3, 9, 7, 7, 7, 7, 10, 5, 9, 8,
    11, 6, 8, 7, 7, 8, 3, 6, 7, 6, 7, 8, 8, 7, 2, 3, 4, 9, 7, 7, 6, 4,
    10, 6, 4, 8, 10, 7, 3, 10, 6, 6, 6, 5, 9, 7, 11, 6, 7, 1, 4, 8, 8,
    5, 5, 2, 8, 6, 7, 7, 5, 5, 6, 5, 6, 2, 12, 7, 6, 5, 7, 5, 9, 6, 4,
    8, 3, 8, 3, 7, 6, 3, 10, 6, 3, 6, 7, 8, 7, 3, 7, 4, 5, 4, 10, 8, 
   7, 10, 10, 7, 5, 9, 5, 4, 6, 4, 6, 11, 7, 9, 9, 6, 7, 4, 6, 7, 5, 
   5, 5, 5, 6, 4, 8, 4, 8, 7, 6, 4, 4, 5, 7, 8, 4, 2, 1, 5, 9, 2, 6, 
   11, 5, 4, 5, 12, 7, 7, 7, 0, 3, 7, 4, 6, 11, 5, 3, 5, 8, 4, 5, 2, 
   3, 8, 8, 6, 6, 1, 9, 4, 3, 8, 5, 4, 4, 5, 4, 5, 6, 6, 5, 7, 6, 1, 
   7, 3, 9, 8, 4, 8, 2, 9, 7, 13, 5, 5, 2, 8, 12, 8, 5, 5, 2, 3, 4, 9,
    11, 5, 6, 10, 5, 5, 5, 6, 5, 4, 3, 8, 8, 12, 7, 7, 8, 11, 2, 9, 
   10, 5, 4, 2, 8, 9, 6, 8, 7, 6, 1, 8, 7, 9, 10, 5, 10]}, 
 PoissonDistribution@\[Mu]]
Variance@%

PoissonDistribution[6.31507]
6.31507
POSTED BY: Parker Robb

Hello Parker,

Indeed, I'm not sure what gave such a weird output. It will be corrected.

Thank you for noticing.

POSTED BY: Marc Vicuna
Posted 1 year ago

The values used in Lesson 14 Exercise 2 do not match the values given in the question. For that question I get the following input and output:

Probability[x >= 25, 
  x \[Distributed] PoissonDistribution[0.06 500]] // N
Probability[x >= 25, x \[Distributed] BinomialDistribution[500, 0.06]]

0.842758
0.850619

Is this correct?

POSTED BY: Parker Robb

On exercise 5 of Exercises7 the solution gives P(17<x<19), which I believe is the probability of coming in between the 17 and 19 hour. I calculated the answer as P(x<19|x>=17).

On exercise 4 of Exercises 8 the solution gives P(A)^3, which I believe is the probability of being absent two days for three months in a row. I calculated the answer as P(x+y+z>=2), where each variable has the probability function 1/(e x!).

On exercises 13, I believe that more than one solution excludes equality when it should include it.

Am I wrong in these?

Posted 1 year ago

Hello Juan,

All correct! Thank you for noticing.

  1. Indeed, your answer makes more sense given the formulation. It will be corrected.
  2. Indeed, the formulation is too ambiguous, it will be corrected to make it more clear what is needed.
  3. Yes, for exercises 2 and 4, some equalities are missing from the final solution.

Thank you!

POSTED BY: Updating Name

Problem 5 is stated as: "Consider a zoo visitor who arrives less than three hours before closing. How likely is that person to be able to stay for more than an hour?" What we have calculated by

Probability[17 < x < 19, x \[Distributed] zooDist] 

is the probability that a visitor, among all the visitors of the day, will arrive between 17 and 19. Based on the way the problem is stated, wouldn't it be more correct to calculate the probability that a visitor arriving between 17 and 20 (less than three hours before closing) actually arrives between 17 and 19? Based on this reasoning, shouldn't the correct approach be to calculate

Probability[17 < x < 19, x \[Distributed] zooDist]/Probability[17 < x < 20, x \[Distributed] zooDist]

?

POSTED BY: Joseph Smith

Hello Joseph,

This is indeed also correct, and is an equivalent formulation of:

Probability[x < 19 \[Conditioned] x >= 17, x \[Distributed] dist]

As mentioned previously in this post.
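
The equivalence of the two formulations can be checked on a stand-in distribution. The definition of zooDist is not shown in this thread, so the uniform arrival time between 9 and 20 used below is purely a hypothetical placeholder.

```wolfram
(* hypothetical placeholder: arrivals uniform between 9:00 and closing at 20:00 *)
zooDist = UniformDistribution[{9, 20}];

(* ratio form *)
ratio = Probability[17 < x < 19, x \[Distributed] zooDist]/
   Probability[17 < x < 20, x \[Distributed] zooDist];

(* conditioned form *)
conditioned = 
  Probability[x < 19 \[Conditioned] x >= 17, x \[Distributed] zooDist];

{ratio, conditioned}   (* both are 2/3 for this placeholder *)
```

The two agree whenever the distribution has no mass after closing time, since "x >= 17" and "17 < x < 20" then describe the same event up to a set of probability zero.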

POSTED BY: Marc Vicuna

I agree with you on both, Robb.

Hello Parker,

Indeed, it seems the question and solution numbers don't correspond. This will be corrected.

Thank you for noticing.

POSTED BY: Marc Vicuna

Same question here. I obtained the same solution as shown in this post.

POSTED BY: Joseph Smith

Marc: I don't quite understand your statement "as soon as I have three balls the game is over". I did assume that once "you" draws a red ball the game is over. I reproduced the terms in your sum but if you carry the tree to the end you get some cases where there are 3 balls left and "you" have not yet drawn a red ball. I work this out in the attached notebook

POSTED BY: Joseph Smith

Hello Joseph,

It seems the solution for this problem is wrong and too complicated. To avoid this, let's use the power of the Wolfram Language.

Here is the new solution.

Let's compute all arrangements of balls with your friend, where red balls are negative and black balls are positive. We are only interested in the balls that are received by the player, so let's take the odd columns (odd rounds).

possibilities = Permutations[{-1, -2, 1, 2, 3, 4}][[All, {1, 3, 5}]];

Now that we have the balls received in all possibilities, just count the number of possibilities where we have at least one negative number against the total number of possibilities.

Length@Select[possibilities, AnyTrue[#, Negative] &]/
     Length@possibilities

And we get 4/5, as mentioned by William Weller.
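
As a cross-check by counting, assuming the setup used above (2 red and 4 black balls, with the player receiving 3 of the 6): the player ends up with no red ball only if all three of their balls are black.

```wolfram
(* ways to receive 3 black balls out of 4, over all ways to receive 3 balls out of 6 *)
pNoRed = Binomial[4, 3]/Binomial[6, 3];   (* 4/20 = 1/5 *)

(* probability of at least one red ball *)
1 - pNoRed   (* 4/5 *)
```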

POSTED BY: Marc Vicuna

Thanks for your response. I could not help thinking that there must be an easier way to solve this problem than by drawing a complex tree!

POSTED BY: Joseph Smith

WOW! I worked through your solution. It truly was an elegant tour de force!

POSTED BY: Joseph Smith

Hi Marc,

I am reviewing "Mock Exam" notebook and have found that the Question 31 solution is not correct. Given U, the transformed distribution with U-0.3 must be stated as:

Plot3D[{PDF[DirichletDistribution[{5, 3, 2}], {x, y}], 
  PDF[TransformedDistribution[{a - 0.3, 
     b - 0.3}, {a, b} \[Distributed] 
     DirichletDistribution[{5, 3, 2}]], {x, y}]}, {x, 0, 1}, {y, 0, 
  1}, Filling -> Bottom, PlotRange -> All, 
 PlotLegends -> {"U", "U-0.3"}]
POSTED BY: Hee-Young Shin

Hello Hee-Young,

Yes, it seems obvious that I forgot to adjust the question phrasing to U+0.3. Thank you for noticing, it will be corrected.

POSTED BY: Marc Vicuna

Hi Marc, It seems there are problems in both Quiz 6 #1 and #6, both of which are calculating the probability by normal approximation to binomial distribution. Please check these two questions in the "framework".

POSTED BY: Hee-Young Shin

Hello Hee-Young,

For the first question, you seem to be thinking Probability and NProbability are entirely equivalent, but they are not. NProbability is and will forever be an approximation, and sometimes a bad approximation. For this course, I suggest sticking to Probability, or even N@Probability or Probability[...] //N, as this is more likely to be a better approximation. Question 1 is one of those examples, with the binomial distribution.

As for question 6, I'm not sure what the issue is. You got the right answer with your normal approximation.

POSTED BY: Marc Vicuna

Thanks for this explanation, Marc. I have tried again using N@Probability. The 1st question still indicates that my response is wrong. As for the 6th question, it seems that there is no correct answer among the multiple choices. I suspect that there is either a technical glitch or wrong syntax in the "Framework". Please double-check. Best wishes,

POSTED BY: Hee-Young Shin

I found the issue! Thank you for noticing, it will be corrected.

POSTED BY: Marc Vicuna

Can you clarify why you chose x>=440-0.5 as the argument of the Probability statement in your analysis of Exercise 6? I certainly see why 440 is chosen but I don't get the -0.5.

Response

The question mentions to use if appropriate the normal approximation. Read or listen again to Lesson 24 to know how to apply that approximation. Here 440 is included, so we include it in the probability.

I will look this over.

POSTED BY: Joseph Smith

Hello Joseph,

The question mentions to use if appropriate the normal approximation. Read or listen again to Lesson 24 to know how to apply that approximation. Here 440 is included, so we include it in the probability.
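The -0.5 is the usual continuity correction: the discrete event X >= 440 is matched with the continuous event Y >= 439.5 so that the whole bar at 440 is included. Here is a minimal sketch of its effect. The values of n and p below are hypothetical, since the exercise's actual parameters are not shown in this thread.

```wolfram
(* Hypothetical parameters, chosen only for illustration *)
n = 1000; p = 0.42;
binomial = BinomialDistribution[n, p];
normal = NormalDistribution[n*p, Sqrt[n*p*(1 - p)]];

(* Exact binomial probability, then the normal approximation
   with and without the continuity correction *)
exact = N@Probability[x >= 440, x \[Distributed] binomial];
withCorrection = Probability[x >= 440 - 0.5, x \[Distributed] normal];
withoutCorrection = Probability[x >= 440, x \[Distributed] normal];

{exact, withCorrection, withoutCorrection}
```

Comparing the three numbers shows how much the half-unit shift changes the approximation.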

POSTED BY: Marc Vicuna

Hi Marc, Can you quickly check whether there is any technical error in your courseware ("framework")?

I am sorry. I figured out my mistake. The correct syntax is:

NProbability[x > 40, 
 x \[Distributed] BinomialDistribution[150, 0.2633]]
POSTED BY: Hee-Young Shin

The Daily Study Group "Introduction to Probability" Quizzes and Level 1 Certifications have deadlines within the next two weeks. Next week will be Spring Break; personally I have been and will be very time constrained for the next few weeks.

Alternatively, I suppose we can also complete the course on the Wolfram-U page and request the Level 1 exam upon completion?

POSTED BY: Dave Middleton

Thanks for making the point about spring break, @Dave Middleton. We will extend both deadlines by a week, so quizzes should be done by March 24, and the exam by March 31. Yes, you can earn both completion and Level 1 independently in the interactive course, but course completion will require that you watch the video lessons within the framework. We run custom data pulls in order to verify completion with the Study Group.

POSTED BY: Jamie Peterson

Thank you Jamie. I plan to use the Study Group materials if time permits.

POSTED BY: Dave Middleton

Hi;

When you obtain the probability, it is easy to understand its exact meaning. However, the Expectation and RandomVariate are not so easy to understand. When would you use these two and what exactly are they telling me?

POSTED BY: Mitchell Sandlin

Hello Mitchell,

Let's go back to basics.

A probability distribution is a set of values associated to a set of probabilities. For a die, the values are {1,2,3,4,5,6} and the probabilities are {1/6,1/6,1/6,1/6,1/6,1/6}.

What is your probability? It is the likelihood of getting a certain value. RandomVariate, on the other hand, is essentially a simulation. It gives you back a random value, drawn according to its probability under the distribution. You use it to simulate the distribution and build an intuition for what that distribution is all about, or to create artificial data if needed.

Expectation is the following formula: the sum or integral, over all values, of the probability times a mathematical expression of the value. It turns out this formula is really useful, as demonstrated by the lesson on expectation. It basically expresses the mean or average of a given mathematical expression, where the expression has the values of the distribution as its variable. The expectation of the value itself is the mean, or average, of the distribution. But as you can see in the lesson, the Expectation formula can be used for many other measures. The interpretation depends on the expression used. You use it to compute many measures of a distribution, and to find the expected, predictable value that will come, or at least something close to it.
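To make the three functions concrete, here is a small sketch using the die from above. Modeling the die as DiscreteUniformDistribution[{1, 6}] is my assumption; the name die is only for illustration.

```wolfram
(* A fair six-sided die *)
die = DiscreteUniformDistribution[{1, 6}];

(* Probability: how likely is one particular value? *)
Probability[x == 3, x \[Distributed] die]        (* 1/6 *)

(* RandomVariate: simulate ten rolls (output changes on every run) *)
RandomVariate[die, 10]

(* Expectation of the value itself: the mean *)
Expectation[x, x \[Distributed] die]             (* 7/2 *)

(* Expectation of another expression: here, the variance *)
Expectation[(x - 7/2)^2, x \[Distributed] die]   (* 35/12 *)
```

Probability answers a yes/no question about the outcome, RandomVariate produces outcomes, and Expectation averages an expression over all outcomes.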

POSTED BY: Marc Vicuna

I have been puzzled about conditional probability, because is the situation usually as clear as that? The probability of an event occurring, given that another event has already occurred. It seems to me that in real world situations there can be all kinds of complex correlations between the events and the external world, not captured by the basic conditional probability formula.

POSTED BY: Anders Lindman

Hello Anders,

Indeed, a lot can go into conditional probability. While the formula is correct, it is often difficult to pinpoint exactly the conditions of an event. Outside of quantum physics, most events can be predicted based on a variety of factors. In data science and statistics, we often talk about measured and unmeasured attributes to explain randomness.

Overall, it boils down to this. Measure everything that you think may have an impact. Based on this data, test the predictability of your next point. If it is predictable enough, you measured enough. If not, you might be missing too much information. Back to the drawing board.

POSTED BY: Marc Vicuna

Interesting, thanks.

POSTED BY: Anders Lindman

In yesterday's session about JointDistributions, the example of the Dirichlet distribution (about cutting a rope @ at about 4:30 in the framework video, https://www.wolframcloud.com/obj/online-courses/introduction-to-probability/joint-distributions.html) is completely different from the example in the notebook ("Lesson 22 - Join Distributions.nb", slide 8, as well as in the framework Lesson notebook) which is about mileage of a car.

Isn't the rope example more typical of the Dirichlet distribution than car mileages?

Hello Hakan,

Indeed, the version on the framework seems to be the wrong one! Thank you for telling us, this will be corrected shortly.

POSTED BY: Marc Vicuna

I would like to suggest another clarification to section 3.

POSTED BY: Joseph Smith

Hello Joseph,

First, since we aim to fit every phrase in a single line, we tend to shorten definitions like this one in interest of simplicity, while maintaining correctness.

Second, I think I actually disagree with your definition.

  1. Set theory is more basic than probability theory, that much should be clear. Thus, set operation definitions should not depend on the probability theory definitions of event and sample space. So the vocabulary is, dare I say, anachronistic.

  2. In set theory, your confusion is actually justified. Because of the trivial existence of a theoretical universal set, the complement of a given set can be used without referring to any exterior set. That universal set is just another variable, which may or may not contain every possible element. The generality of the statement doesn't have to be lost. So in that sense, the definition we gave is valid, and yours is too restrictive. However, as we discussed in the study group, computers are rarely so theoretical. There is no purely mathematical set in the Wolfram Language, only lists. In the same way, there is no purely theoretical complement implemented, only a computational approximation of that concept.
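As a small illustration of that last point, Complement in the Wolfram Language always needs an explicit list to subtract from. The lists universe and a below are hypothetical examples.

```wolfram
(* The "universal set" has to be spelled out as a list *)
universe = Range[10];
a = {2, 4, 6, 8, 10};

(* Complement of a relative to universe *)
Complement[universe, a]   (* {1, 3, 5, 7, 9} *)
```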

POSTED BY: Marc Vicuna

Thanks for your response!

POSTED BY: Joseph Smith

Suggestion for Clarifying Section 3 Slide 6

POSTED BY: Joseph Smith

Hello Joseph,

Your suggestion is actually what we initially intended, but here is the issue: for this demonstration, no interactivity is possible due to the clickable interactions and the framework of Wolfram U. This initially led to some confusion in the early stages. Our solution was to give a more print-friendly graph, keep the interactivity in the video and give the link to the demonstration for those that wanted to experiment with it (as you did?).

Seeing this also led you to more confusion, I'll consider rebuilding that demonstration myself with the Wolfram U framework in mind.

POSTED BY: Marc Vicuna

For Section 2, Exercise 3, how do all the paths to (4,2) add up to 7 segments? It looks like 6 to me.

POSTED BY: Joseph Smith

Hello Joseph,

Indeed, this is an error, as noted in the post by Juan Ortiz Navarro. Here's my answer from that post:

As for problem 3, this is also an error, but the answer should be Binomial[6,2] or Multinomial[4,2], yes. Thank you for informing us!

For the solution to make sense, just change the question to (0,0) to (4,3).
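A quick way to check these counts is to encode each path as a word of right steps "R" and up steps "U". The encoding is my own choice for illustration.

```wolfram
(* Paths from (0,0) to (4,2): 4 right steps and 2 up steps, 6 segments *)
Binomial[6, 2]                                          (* 15 *)
Multinomial[4, 2]                                       (* 15 *)

(* Brute force: distinct orderings of the 6 steps *)
Length[Permutations[{"R", "R", "R", "R", "U", "U"}]]    (* 15 *)

(* Paths from (0,0) to (4,3): 7 segments *)
Binomial[7, 3]                                          (* 35 *)
```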

POSTED BY: Marc Vicuna
Posted 1 year ago

Hi! In the combinatorics at the start of the course, we saw how to count [No Replacement + Order Not Relevant] --> Binomial. What would be the approach to count [Replacement + Order Not Relevant]? Maybe we saw that, but I can't find it. Thanks!

POSTED BY: J.Edi Gran

Hello J,

For combinatorics, it's easier to think of it as a tree of decisions rather than a matrix. Some combinations of cases may not correspond to any real problem. The binomial is [Order not relevant + 1 group], and we usually assume without replacement. Why? Because a set with multiple copies of the same element is still the same set! Consider this:

A set is a collection of non-repeated elements without order.

A multiset is a collection of elements without order.

Therefore, the approach to count (Replacement + Order Not Relevant) requires the definition of a multiset. The number of multisets of size k drawn from n distinct elements is Binomial[n+k-1, k]. See this for more explanations.
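Here is a small brute-force check of the multiset count. I take n as the number of distinct elements and k as the number of elements chosen, which is an assumption about the notation, and the values 4 and 3 are arbitrary.

```wolfram
n = 4; k = 3;

(* All ordered selections with replacement, then forget the order
   by sorting each selection and dropping duplicates *)
multisets = Union[Sort /@ Tuples[Range[n], k]];

Length[multisets]                              (* 20 *)
Length[multisets] == Binomial[n + k - 1, k]    (* True *)
```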

POSTED BY: Marc Vicuna
Posted 1 year ago

Thanks Marc. I do believe there may be some impossible (in the sense of not meaningful) cases. I am still trying to figure out what makes sense to ask and what doesn't. But OK, here I go with a clarification of my question. I still believe this is applicable in real life, but I may be wrong. I tried to depict the situation with an "entry level skills" notebook, so my question is easier to assess.

POSTED BY: J.Edi Gran

Hello J,

Here's one way to reformulate your situation. Out of a group of 5, you want to choose 1 to 3 elements. Thus,

Binomial[5, 3] + Binomial[5, 2] + Binomial[5, 1]

or equivalently:

Sum[Binomial[5, i], {i, 1, 3}]

This gives you the 25 you got. Choosing from a range of groups may be required in some situations indeed.

POSTED BY: Marc Vicuna
Posted 1 year ago

Thanks Marc, thanks a lot!

POSTED BY: J.Edi Gran

The solution for exercise 1 from section 2 seems disconnected. Q: Twenty-five runners compete in the 200m event. How many top 10 arrangements are possible? A: The arrangements are ordered, but you were only asked to consider 10 elements out of 25. Thus, this corresponds to a permutation: y1 = Integrate[(-s + 0), {s, 0, x}]

What does this statement have to do with the problem?

Shouldn't the answer be 25!/15! ?

POSTED BY: Joseph Smith

Hello Joseph,

Indeed, on problems 1 and 3, there seems to be text from another part of the course. This is an error, as mentioned by Juan Ortiz Navarro, posted 2 days ago: Regarding exercises-02.nb The solution to problem 1 (...)

So yes the answer is 25!/15!.

POSTED BY: Marc Vicuna

Thanks!

POSTED BY: Joseph Smith

On Exercise 4 of exercises-06.nb, "A smoker is twice as likely to have an ectopic pregnancy as a non-smoking pregnant woman." is denoted as P(E|S)=2P(E). I thought it should be denoted as P(E|S)=2P(E|S'), and P(S|E)=P(E|S)P(S)/(P(E|S)P(S)+P(E|S')P(S'))=0.461538

How can P(E|S)=2P(E|S') be said in words?

Hello Juan,

I agree. This is a mistake, it should say: A smoker is twice as likely to have an ectopic pregnancy as any pregnant woman.

That way the statement would make sense. Your calculation makes sense considering the formulation. Thank you for noticing, it will be corrected.

POSTED BY: Marc Vicuna

Thanks @Juan Ortiz Navarro, I had just calculated the same solution using Bayes theorem. As the solution of "Exercise 4" in the "exercises-06" notebook is different from mine, I consulted this community page.

enter image description here

P( S | E) =2 / (1+1/0.3) = 0.46

Personally, I find the original wording and this solution more interesting..
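For readers following along, here is a minimal sketch of that Bayes computation. It assumes P(S) = 0.3, as implied by the 1/0.3 term above, and the original wording P(E|S) = 2 P(E|S'). The variable names are only illustrative.

```wolfram
pS = 0.3;      (* P(S): probability of being a smoker, assumed from the formula above *)
ratio = 2;     (* P(E|S) / P(E|S') under the original wording *)

(* Bayes' theorem with the unknown P(E|S') cancelled out:
   P(S|E) = ratio*P(S) / (ratio*P(S) + P(S')) *)
pSgivenE = ratio*pS/(ratio*pS + (1 - pS))    (* about 0.4615 *)
```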

POSTED BY: Dave Middleton

Hello all,

For a bit of context for the exponential family of distributions, here are some resources:

A well made short video series introducing the subject, by Mutual Information.

An introduction article to get familiar with this, from Berkeley EECS.

A short textbook of Statistical Theory focused on the exponential family, from the University of Oxford.

A paper on the link with machine learning, from Princeton University.

The exponential family is usually covered in any course of Statistical Theory. Go satisfy your curiosity!

POSTED BY: Marc Vicuna
Posted 1 year ago

Mind blowing! (For me, at least. For sure there are more advanced users for whom this may already be natural.) But now I can appreciate the flexibility (and why not beauty) of the exponential when "connected" to other "devices" to cast a spectrum of other functions. I never expected that to be possible.

POSTED BY: J.Edi Gran

Lecture 1: How many different teams are possible given that it must include 3 Swiss (original group is 4) and 2 Ethiopians (original group is 5). You give the answer Binomial[4,2]*Binomial[5,2] is this the correct answer? How did you derive it? Is my answer Binomial[4,3]*Binomial[5,2] not correct?

POSTED BY: Alex Kuznetsov

Hello Alex,

As noted in the post by John Burke, there is a mistake and the answer is 40, with Binomial[4,3]*Binomial[5,2].

POSTED BY: Marc Vicuna

POSTED BY: Zbigniew Kabala

Hi Zbigniew,

The first is also our mistake, now that it is edited. Refer to the original post by John Burke.

The second is a mistake, and will be corrected. Thank you for telling us!

POSTED BY: Marc Vicuna

It's funny that commenting on typos I made a typo myself by inexplicably replacing a multiplication sign with a plus sign. This notwithstanding, the first is also a mistake. In your documentation, the calculation reads:

Binomial[4, 2] * Binomial[5, 3], which is equal to 60,

whereas it should read

Binomial[4, 3] * Binomial[5, 2], which is equal to 40

NOTE: I edited my original post and fixed my typo and its consequence.

POSTED BY: Zbigniew Kabala
Posted 1 year ago

Hello... While reading the second exercise, which states "While surfing the web, you encounter ad A 7 times and ads B and C 3 times each. How many arrangements are possible?", I interpreted it as having the following sequence: AAAAAAABBBCCC (or any other with seven As, three Bs and three Cs, as the precise sequence is not established). So I understood it as n=3 possible values (A, B, C), with reuse of values allowed, and k=13 trials, as the sequence provided is 13 positions long, no matter the order.

So my solution before reading the answer was n^k -> 3^13 possible arrangements. But when reading I found "multinomial". In the question what is the part that suggest that Multinomial approach is the right approach? Thanks

POSTED BY: J.Edi Gran

Hello J,

That's a pretty fun question actually. Let's go through it.

You encounter A 7 times, B 3 times and C 3 times, 13 in total. How could this have happened? You have your limited resource of 7 As, 3 Bs and 3 Cs, so the number of orders is going to be 13!, not 3^13, because the latter would include sequences where you did not encounter the given number of each ad.

But, can you distinguish the difference between A and A? No, the elements are not distinguishable. So you need to divide by 7! for all the orders of A, 3! for B, and 3! for C. You get 13!/(7!3!3!) which is the Multinomial[7,3,3].
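A short check of the numbers in this argument follows. The four-ad list in the brute-force step is a small hypothetical example, kept tiny so that enumeration stays cheap.

```wolfram
(* Arrangements of 7 As, 3 Bs and 3 Cs *)
Multinomial[7, 3, 3]                        (* 34320 *)
13!/(7! 3! 3!) == Multinomial[7, 3, 3]      (* True *)

(* For comparison: all sequences of 13 ads of 3 types, with no constraint on counts *)
3^13                                        (* 1594323 *)

(* Brute force on a smaller case: 2 As, 1 B, 1 C *)
Length[Permutations[{"A", "A", "B", "C"}]] == Multinomial[2, 1, 1]   (* True *)
```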

POSTED BY: Marc Vicuna
Posted 1 year ago

Oh that's interesting, as the assumption for the multinomial is that all possible ads are "exhausted" at the last observation. My intuition was that, while surfing the internet, I used a finite amount of time, and during that undetermined time window I saw only 13 ads, so if I kept browsing I might perfectly well see more ads. Let's say that if I doubled the browsing time, I might end up seeing 26 or 30 ads in total. So I guess I get your point if all possible ad instances are contained in the limited-length sample (of 13 total ads). Otherwise, if no limiting length is stated, I guess it may be fair to assume that As, Bs or Cs are never exhausted, and browsing more means watching more ads (like the "YouTube Premium" thing that seems to never end ;) ). Thanks Marc for the clarification.

POSTED BY: J.Edi Gran

Hello J,

To be clear, I think your situation could be also a valid situation.

When you read the situation, you are given a specific instance of what happened: 7, 3, 3. Given that specific instance, I ask about the unknown information: the ordering. But I could have asked the question given only 13 ads and 3 ad types, or given only 3 ad types (even more general). You need to be careful about what the specific situation is and not generalize too fast.

POSTED BY: Marc Vicuna
Posted 1 year ago

Thanks, Marc. I'll keep that advice in mind!

POSTED BY: J.Edi Gran

Regarding exercises-04.nb

On problem 4, I think the solution is not complete. I believe the answer displayed is for P(R' intersection B'), to which one needs to add 1-P(R) and 1-P(B).
Or P(R' union B')=1-P(R intersection B)=1-.1=.9.

What mistake am I making?

Hello Juan,

This was addressed earlier on this post, but the 0.1 is a mistake. Here is the solution:

P(A' ∩ B') = P((A ∪ B)') = 1 - P(A ∪ B) = 1 - 0.4 = 0.6

Basically using De Morgan's Law of sets.

Thus, the probability that neither A nor B occurs is 0.6.

1-P(R)=P(R') and 1-P(B)=P(B'). The probability that neither happens means both are false at the same time, thus P(A' ∩ B'), a conjunction.

Hopefully that answered your question, although I'm not entirely sure. Let me know if you're still confused.

POSTED BY: Marc Vicuna

Thanks Marc. I was confused on "neither red nor blue". I understand now that it is a conjunction.

Regarding exercises-02.nb The solution to problem 1 seems to take a turn I do not understand. Should it be factorial(25)/factorial(10)?

And on problem 3, going from (0,0) to (4,2), are paths of length 7 or 6?

So Binomial[6,2] or Multinomial (4,2)?

Hello Juan,

exercises-02, Problem 1. Honestly, this issue is a great surprise to me. It definitely took a turn I was not expecting either. Thank you for noticing. This will be corrected, but here is the answer:

25!/15!

or equivalently:

FactorialPower[25, 10]

As for problem 3, this is also an error, but the answer should be Binomial[6,2] or Multinomial[4,2], yes. Thank you for informing us!

POSTED BY: Marc Vicuna

Thanks. I meant to write 25!/15! indeed.

Posted 1 year ago

While trying to confirm concepts in the course with a third-party reference, I found the attached. It states that the "Sample Space" for the times to failure for a certain machine is a sequence (T1...Tn). In my opinion, a sample space must contain all possible times to failure, as opposed to one specific sample. So I'd say the "space" would be all the real numbers, or perhaps all the real numbers below the maximum allowable age for the equipment being analyzed. Am I right that the reference is confusing a "Sample space" with a specific sample?

I'd appreciate your comments. Jorge

enter image description here

POSTED BY: J.Edi Gran

Hi J,

This is a bit complicated, so let's discuss this one thing at a time. This textbook expresses a sample space where each data point is in itself a sequence of multiple numbers. In that sense, this is a multivariate random variable. Within a single sample, or data point, there are multiple times, which are the periods of time between breakdowns. Let's say there are n such periods in each sample. Thus, the domain of any single outcome is R(>0)^n, that is, the positive reals in n dimensions (periods of time are always positive).

In this context, the sample space is the set of all possible sequences of periods between breakdowns, possibly R(>0)^n itself.

So overall, I believe you are right about your sample space, but also that you are misinterpreting their explanation of the sample space. Hopefully that explanation helped.

POSTED BY: Marc Vicuna
Posted 1 year ago

Oh!!!! Aweeeeeesomee!!! Thanks...thanks a lot... I see my mistake!

POSTED BY: J.Edi Gran
Posted 1 year ago

Hi Marc, In Lesson 7, Slide 10 and 11, you showed two different examples of how to load and aggregate. I have difficulty in understanding how to aggregate :

height[h_?(60 <= # <= 76 &)] := aggdata[[2, h - 59]]/Length[data] and roll[n_Integer?(2 <= # <= 12 &)] := dice[[n - 1, 2]]/36

Could you elaborate a bit, or use simpler, more understandable code?

Thanks, Lewis

POSTED BY: Laising Yen

Hi Laising,

Lesson 7 is infamously trying to use data functionality that we don't have access to at that point in the course. That is, you could accomplish the same thing using EmpiricalDistribution, SmoothKernelDistribution or HistogramDistribution. But here's an explanation of what I'm doing to avoid those functions.

  1. Aggregate data by value and frequency using the Tally function.
  2. Normalize the frequencies by dividing the list of frequencies by the total number of occurrences. This now becomes your list of probabilities.
  3. Map the probabilities to the correct values. You now have your PDF. I also restrict the PDF's argument to the observed values to make the domain clear.

Again, this is not necessary for you, as this will be better addressed in other lessons. This was merely the first jab at the subject of data-driven distribution.
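The three steps above can be sketched as follows. This is not the lesson's code: the sample data, the names tallied, lookup and pdf, and the use of an Association are all my own choices for illustration.

```wolfram
(* Hypothetical sample; the lesson uses its own dataset *)
data = {1, 2, 2, 3, 3, 3, 4, 4, 5, 2};

(* 1. Aggregate by value and frequency *)
tallied = SortBy[Tally[data], First];
values = tallied[[All, 1]];

(* 2. Normalize frequencies into probabilities *)
probs = tallied[[All, 2]]/Length[data];

(* 3. Map values to probabilities; the pattern test restricts
      the function to values that actually occur in the data *)
lookup = AssociationThread[values -> probs];
pdf[v_?(KeyExistsQ[lookup, #] &)] := lookup[v]

pdf[3]              (* 3/10 *)
Total[probs] == 1   (* True *)

(* The built-in alternative mentioned above *)
PDF[EmpiricalDistribution[data], 3]
```

The pattern test plays the same role as the 60 <= # <= 76 test in the lesson: outside the domain, the function simply stays unevaluated.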

POSTED BY: Marc Vicuna

Marc,

You asked for mistakes in the documents. So, please have a look at “Lesson 2, Slide 9”: enter image description here Your equations are wrong, which means LHS is not equal to RHS. Right?

POSTED BY: Jürgen Kanz

Indeed, the LHS exponent should be 5, not 3. This will be corrected.

Thank you for noticing and informing us.

POSTED BY: Marc Vicuna
Posted 1 year ago

When is the new study group for Introduction to Probability starting? I missed last week and would rather start from the beginning without worry and rushing.

POSTED BY: Updating Name

Does anybody have the link to the course materials that they could post on the community thread?

I want to get started on the exercises but I missed the last meeting and the recording does not show the chat pane with the links.

Thanks

POSTED BY: Joseph Smith
Posted 1 year ago

thanks!

POSTED BY: Joseph Smith

Marc, I am enjoying the course very much. Thank you for all the effort you are putting into it. When you get time, would you please check Exercise 3 from exercises-04.nb. I did the problem in two different ways, and I still get an answer of 0.6 which does not agree with 0.1 which is your answer. Thanks

POSTED BY: William Weller

Indeed! You caught my second mistake!

P(A' ∩ B') = P((A ∪ B)') = 1 - P(A ∪ B) = 1 - 0.4 = 0.6

Thus, the probability that neither A nor B occurs is 0.6.

Thank you for noticing, it will be corrected.

POSTED BY: Marc Vicuna

Thanks Marc for your prompt consideration. I have another for you! In Exercise 1 of exercises-05.nb, the event you describe out of the 36 possible equally likely outcomes is E={{1, 1}, {2, 2}, {3, 3}, {4, 4}, {5, 5}, {6, 6}, {1, 6}, {6, 1}, {2, 6}, {6, 2}, {3, 6}, {6, 3}, {4, 6}, {6, 4}, {5, 6}, {6, 5}}. Hence, P(E) =16/36=4/9. Thus, the odds are 4 to 5. Would you please look into this also. Thanks.

POSTED BY: William Weller

Indeed, that is correct.

There are 6*6 combinations, 6 with doubles, 6 starting with the number 6 and 6 ending with the number 6. The three events have the occurrence (6,6) in common, so one occurrence is counted three times, thus:

(6 + 6 + 6 - 2)/(6*6)

is indeed 4/9, thus odds of 4 to 5.
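The count can also be confirmed by listing all 36 outcomes. The event below (a double, or at least one 6) is read off from the list E given above.

```wolfram
(* All ordered rolls of two dice *)
outcomes = Tuples[Range[6], 2];

(* Keep doubles and rolls containing a 6 *)
event = Select[outcomes, #[[1]] == #[[2]] || MemberQ[#, 6] &];

p = Length[event]/Length[outcomes]   (* 4/9 *)

(* Odds in favor: p to (1 - p) *)
p/(1 - p)                            (* 4/5, i.e. odds of 4 to 5 *)
```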

Thank you for informing us, it will be corrected.

POSTED BY: Marc Vicuna

Marc, I have another one for you. In Exercise 3 of exercises-05.nb, there should be an additional branch to your tree; namely b,b,b,b,r,r. This will give you an additional 1/15 for an answer of 12/15. I found that a simpler way for me to attack the problem was to consider the probability of failure to draw at least one red ball. This means you only have a sub-tree with 3 branches to keep track of. You will then get 3/15 as the probability of failure, and hence 12/15 is the probability of success. Thanks for staying on top of this.

POSTED BY: William Weller

William:
I agree with you but I see it as which parts of the tree will give at least one red, taking into account that as soon as I have three balls the game is over. So in the first round I have a 2/6 probability of getting a red one. If I do not succeed on that, then on the second round I have a 4/6*2/5 probability of getting a red one. And on the third round, I have 4/6*3/5*1/2 of getting a red one and the game is over since the other player took the other three balls:
2/6 + 4/6*2/5 + 4/6*3/5*2/4 = 4/5
or the complement of not getting red balls, which will be all black balls: 1 - 4/6*3/5*1/2.
Does that make sense?

EDIT: I seem to have answered that half asleep, I can't even read my answer. Please disregard it.

POSTED BY: Marc Vicuna
Posted 1 year ago

The question never states that the game is over as soon as a red ball is taken. Without that information, I interpreted the question as there being three outcomes at the end after all balls are taken, one set for "you" and another for the "friend":

{{r,r,b},{b,b,b}}
{{r,b,b},{r,b,b}}
{{b,b,b},{r,r,b}}

This gives a ⅔ probability of having a red ball in "your" set at the end.

Is that line of thinking correct given my assumptions?

POSTED BY: Parker Robb

Hello Parker,

This is incorrect due to the fact the probabilities change as this situation is ordered. But naming all the possibilities may not be a bad idea, this will become my new solution. Thank you for the idea.
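One way to see why the three listed outcomes are not equally likely is to count how many ways each can arise. This sketch assumes 2 red and 4 black balls with 3 going to "you", as in the exercise, and the name weights is only illustrative.

```wolfram
(* Ways for "your" 3 balls to contain r red balls: choose r of the 2 reds
   and 3 - r of the 4 blacks, for r = 0, 1, 2 *)
weights = Table[Binomial[2, r]*Binomial[4, 3 - r], {r, 0, 2}]   (* {4, 12, 4} *)

(* Together they account for every way of picking 3 balls out of 6 *)
Total[weights] == Binomial[6, 3]                               (* True *)

(* At least one red ball: r = 1 or r = 2 *)
(weights[[2]] + weights[[3]])/Total[weights]                   (* 4/5 *)
```

The outcome with one red ball is three times as likely as each of the other two, which is why simply counting 2 out of 3 outcomes gives the wrong answer.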

POSTED BY: Marc Vicuna

Marc: I don't quite understand your statement "as soon as I have three balls the game is over". I did assume that once "you" draws a red ball the game is over. I reproduced the terms in your sum but if you carry the tree to the end you get some cases where there are 3 balls left and "you" have not yet drawn a red ball. I work this out in the attached notebook,

POSTED BY: Joseph Smith

The solution to this problem is too complicated for its own good. Refer to your later post on this, where I give a better solution.

POSTED BY: Marc Vicuna

Marc, Would you check Your answer to Question 8 of the Mock Exam? I get 20*Binomial[19,7] =1,007,760. Thanks.

POSTED BY: William Weller

Hello William,

Indeed, the written answer is right but the calculation is wrong.

Thank you, it will be corrected.

POSTED BY: Marc Vicuna

Hello William,

It seems the solution for this problem is wrong and too complicated. To avoid this, let's use the power of the Wolfram Language.

Here is the new solution.

Let's compute all arrangements of balls with your friend, where red balls are negative and black balls are positive numbers. We are only interested in the balls that are received by the player, so let's take the odd columns (odd rounds).

possibilities = Permutations[{-1, -2, 1, 2, 3, 4}][[All, {1, 3, 5}]];

Now that we have the balls received in all possibilities, just measure the number of possibilities where we have at least one negative number against the total number of possibilities.

Length@Select[possibilities, AnyTrue[#, Negative] &]/
     Length@possibilities

And we get 4/5, as you mentioned with the three branches. Surely this way we're less likely to get lost.
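As a cross-check that avoids enumeration, the same number comes out of the complement rule and of a hypergeometric model. Treating the player's 3 balls as a draw without replacement from 2 red and 4 black is my assumption about the setup.

```wolfram
(* Complement: 1 - P(all three of your balls are black) *)
1 - Binomial[4, 3]/Binomial[6, 3]   (* 4/5 *)

(* Same thing with a built-in distribution:
   3 draws, 2 "successes" (red) in a population of 6 *)
Probability[x >= 1, x \[Distributed] HypergeometricDistribution[3, 2, 6]]   (* 4/5 *)
```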

POSTED BY: Marc Vicuna
Posted 1 year ago

Hi Marc, Very Nice approach to this problem!! Well done. I like it because, as you point out, it does demonstrate the elegance and power of the Wolfram Language.

There is no need to give up on your "Tree" approach, though. What your Wolfram solution shows is that in a problem of this type, counterintuitively, every COMPLETE branch is equally likely. Thus, you don't have to laboriously track ever-changing probabilities down each branch! Hence, our solution is: the number of favorable branches divided by the total number of branches, which is Binomial[6,4]. Of course, the numbers 6 and 4 could be replaced by even integers, n and k, with n>k>0.

POSTED BY: Updating Name

Hello all,

We had a more difficult question today, given as:

Given a vector with m elements (real numbers) I want to compute the PDF for the sum of the m elements considering that each element will have positive or negative sign with equal probability.

First, consider that each element must have its own probability distribution. Since it is real, let me for example assume any element is uniformly distributed between -10 and 10.

UniformDistribution[{-10, 10}]

If all elements are distributed in the same way, the PDF of the sum will be a result of m elements added. So we get:

PDF[TransformedDistribution[
  m*x, x \[Distributed] UniformDistribution[{-10, 10}]], x]

However, if you assume a normal distribution centered at 0 for your elements, then you could write:

PDF[TransformedDistribution[
  m*x, x \[Distributed] NormalDistribution[0, 1]], x]
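One caveat about the m*x expressions above: m*x scales a single draw by m, which is not the same as adding m independent draws. Here is a minimal sketch of the difference, assuming m = 4 and independent standard normal elements (both chosen only for illustration).

```wolfram
m = 4;

(* One draw multiplied by m *)
scaled = TransformedDistribution[m*x,
   x \[Distributed] NormalDistribution[0, 1]];

(* Sum of m independent draws *)
summed = TransformedDistribution[a + b + c + d,
   {a, b, c, d} \[Distributed]
    ProductDistribution[{NormalDistribution[0, 1], m}]];

(* The variances differ: m^2 for the scaled draw, m for the independent sum *)
{Variance[scaled], Variance[summed]}   (* {16, 4} *)
```

If the elements are meant to be independent, the ProductDistribution form is the one that matches "the sum of the m elements".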

Finally, your elements may be distributed in different ways, which could look something like this, assuming, let's say, m is 4:

PDF[TransformedDistribution[
  a + b + c + d, {a \[Distributed] NormalDistribution[0, 1], 
   b \[Distributed] NormalDistribution[0, 4], 
   c \[Distributed] NormalDistribution[0, 19], 
   d \[Distributed] UniformDistribution[{-10, 10}]}], x]

Another way to interpret your question is that each element is given a binary choice of sign, -1 or 1, while the magnitudes stay constant. In that case, you apply one Bernoulli distribution per element within a single sum. For example, assume this vector:

v = {1.77, 5.65, 10.14, 195.14};
PDF[TransformedDistribution[
  v[[1]]*(-1)^a + v[[2]]*(-1)^b + v[[3]]*(-1)^c + 
   v[[4]]*(-1)^d, {a, b, c, d} \[Distributed] 
   ProductDistribution[{BernoulliDistribution[0.5], Length[v]}]], x]

Hopefully that answers the question. Transformed distributions will be seen in Lesson 20.
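For the constant-magnitude interpretation, a brute-force cross-check is also possible, since there are only 2^m sign patterns and each is equally likely. This sketch reuses the example vector from above; the names signs, sums and dist are my own.

```wolfram
v = {1.77, 5.65, 10.14, 195.14};

(* Every assignment of -1 or +1 to the elements: 2^4 rows *)
signs = Tuples[{-1, 1}, Length[v]];

(* One signed sum per row *)
sums = signs . v;
Length[sums]    (* 16 *)

(* Each signed sum carries probability 1/16 *)
dist = EmpiricalDistribution[sums];
Mean[dist]      (* 0, up to floating-point rounding *)
```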

POSTED BY: Marc Vicuna

Hi;

A couple of questions regarding the Decision Tree to decide how to count specific outcomes:

Under the Count, Without Order, One Group, (n over i ) - what does (n over i) mean or what operation are we performing here?

Under Count, With Order, Remplacing - what is Remplacing?

Thanks,

Mitch Sandlin

POSTED BY: Mitchell Sandlin

Hi Mitch,

The (n i) (vertical) is the mathematical notation for Binomial[], this is a binomial coefficient. Same goes for the Multinomial coefficient.

For the replacing part, I'm sorry for the confusion, but this is meant to say replacing, not remplacing; this is just a typo. As for what replacing means, it means that when you "use" an element, you replace it by another element of the same value. For example, take a password of 7 digits. You may "use" the digit 5 as your first character, but it is still available for your next choice. That means you replaced the used digit 5 by another digit 5, so that the set maintains the same size.

This vocabulary comes from the historical example of taking balls out of an urn. You may put another equivalent ball after you've taken one out (with replacing) or you may not do that, therefore the size of the group is less after each step (without replacing).
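The difference between the two cases shows up directly in the counts. The small 3-element, 2-pick example is only an illustration that can be enumerated by hand.

```wolfram
(* With replacing: every pick has all n options, so n^k ordered results *)
Length[Tuples[Range[3], 2]] == 3^2                           (* True *)

(* Without replacing: one option fewer at each pick *)
Length[Permutations[Range[3], {2}]] == FactorialPower[3, 2]  (* True *)

(* The 7-digit password example: digits can repeat *)
10^7                                                         (* 10000000 *)
```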

POSTED BY: Marc Vicuna
Posted 1 year ago

Will this course be repeated? I am having struggles with it right now, and would like to work on Computational X-plorations or something else. Thanks.

POSTED BY: Updating Name

Hi,

It may be best for you to prioritize other engagements, as this course is made to be taken at any time.

The course will soon be fully released with all materials and lessons freely accessible. Moreover, the daily study group sessions are all recorded and also freely accessible. There is no planned second study group for this course for now, but it may happen if there is interest for it. Feel free to also continue asking questions in this post even after the study group has ended.

POSTED BY: Marc Vicuna
Posted 1 year ago

Would you please provide details of the proof for the inequality on Slide 12 of Lesson 10? Thanks very much.

POSTED BY: Bob Renninger

Hi Bob,

I found this proof, from the University of Arizona, which stays within the bounds of the course and needs little outside knowledge beyond multivariable calculus. It is not a particularly difficult proof, but it remains fairly theoretical. The actual proof is short and comes at the end, but the rest of the paper gives great context.

POSTED BY: Marc Vicuna
Posted 1 year ago

Thanks Marc, this is exactly what I was looking for. Sometimes a simple equation has an easy proof, and I am relieved to see that this is not such a case! Can you provide a reference to the book mentioned in the article? That seems to be very compatible with this course.

POSTED BY: Bob Renninger

Hi Bob,

The textbook the proof refers to is Probability: An Introduction by Geoffrey Grimmett and Dominic Welsh. It is concise but a bit short on examples. A great book for examples would be Introduction to Probability by Charles Grinstead and Laurie Snell: less mathematical, more examples, more discussion. It's also freely available.

Hopefully that helps.

POSTED BY: Marc Vicuna

In lesson 8, "Discrete Random Variables" we use the statements

children[n_] := 1/((n + 1)^3*Zeta[3]);
distChildren = 
 ProbabilityDistribution[children[n], {n, 0, Infinity, 1}]

Does the "1" in the ProbabilityDistribution function indicate that n is a discrete variable?

POSTED BY: Joseph Smith

Hello Joseph,

Basically yes. It indicates that n moves in jumps of 1, and the fact that you are jumping over values means you are taking discrete steps; thus, you are using a discrete variable.

Notice that you could, in theory, also jump by 2 or by 0.5, and this would still be a discrete distribution. The point is that you are jumping between values instead of using a continuous range.
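Here is a minimal sketch of the step size at work, reusing children[n] from the lesson. The step-2 distribution distEven and the constant norm are my own illustrative names; since the original weights no longer sum to 1 on the even values, I renormalize them explicitly.

```wolfram
children[n_] := 1/((n + 1)^3*Zeta[3]);

(* Step 1: the support is 0, 1, 2, ... as in the lesson *)
distChildren = ProbabilityDistribution[children[n], {n, 0, Infinity, 1}];
Probability[n == 0, n \[Distributed] distChildren]   (* should equal children[0] *)

(* Step 2: the support is 0, 2, 4, ...; total weight on those values *)
norm = Sum[children[2 k], {k, 0, Infinity}];

(* Still a discrete distribution, only with a jump of 2 *)
distEven = ProbabilityDistribution[children[n]/norm, {n, 0, Infinity, 2}];
N[Probability[n <= 2, n \[Distributed] distEven]]
```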

POSTED BY: Marc Vicuna

Thank you for the wonderful and edifying course. However, some of the questions you ask seem to be ambiguous. For example, let's take a look at one of the questions posed today:

[Image: screenshot of the poll question]

The second option was announced to be correct. But one could easily argue that the last one is also correct. Say, if you want to see how close the head frequency in tossing a coin 100 times would get to the expected value, you would use the RandomVariate function in MODELING the problem. Having potentially more than one correct answer and only one deemed correct by an automated exam is not a problem for quizzes, which you can take more than once with the same questions. However, it would be a problem for a certification exam, which you can take only once with the same questions.

In summary, the fuzziness/ambiguity of some of the questions presented in the course worries me in the context of the future certification exam.

POSTED BY: Zbigniew Kabala

Hello Zbigniew,

First, let's be clear: I am taking a few liberties with the poll questions, such as having general comprehension questions and multiple true answers. The same fuzziness will not be found in quizzes or certification exams, where exactly one answer will always be correct.

Second, the 4th choice remains a bad choice. You can use RandomVariate to model, but should you? "A mathematical model is an abstract description of a concrete system using mathematical concepts and language." (Wikipedia) RandomVariate will give you an approximation of the base model, but never exactly the base model. You are losing information when you model based only on the RandomVariate data.

"Say, if you want to see how close the head frequency in tossing a coin 100 times would get to the expected value, you would use the RandomVariate function in MODELING the problem."

In this case, you want the expectation of the difference between the theoretical expectation and the random variable. Check how the error of the sample mean shrinks, or even look at the CentralMoment. The problem with using RandomVariate is that you might be able to see how close the head frequency should be, or you might not. RandomVariate is inexact. You may get your measurement, but it is also possible that your sample deviates far more, or far less, than is typical. Moreover, seeing how close a head frequency is does not constitute a model. I fail to see how your problem fits the definition of modelling.
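To make the contrast concrete, here is a rough sketch for 100 tosses of a fair coin. The name heads, the 45 to 55 window and the seed are my own illustrative choices.

```wolfram
heads = BinomialDistribution[100, 1/2];

(* Exact: mean squared deviation of the head frequency from its expected value 1/2 *)
Expectation[(k/100 - 1/2)^2, k \[Distributed] heads]   (* 1/400 *)

(* Exact: standard deviation of the head frequency *)
StandardDeviation[heads]/100   (* 1/20 *)

(* Exact: probability that the frequency lies within 0.05 of 1/2 *)
N[Probability[45 <= k <= 55, k \[Distributed] heads]]

(* Simulated: a single run of 100 tosses; the value changes with the seed *)
SeedRandom[1];
N[RandomVariate[heads]/100]
```

The first three results describe the model itself; the last one is a single draw from it.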

POSTED BY: Marc Vicuna

Thanks, Marc, for clarifying. I'll get used to your precise definitions, including that of modeling. Again, thanks 10^6 for a so-far wonderful course.

POSTED BY: Zbigniew Kabala
Posted 1 year ago

Adding to this, I believe there's always a catch when trying to rely on a single approach to assess every possible analytical situation. Of course, having an equation beforehand as a model is the ideal situation, but that's not always doable. As an example, take forecasting the production profile of an industrial plant where hundreds or thousands of assets interact (with random climate and random times to failure, sometimes with clear patterns and sometimes not): forcing oneself to have an equation as a model may make the analysis unviable. In modeling, there's regularly an "elucidation" process where one plays around and may or may not end up with an equation that can be fitted. In those cases, if one wants to approach them cost-effectively, I'd say it is perfectly valid to play with random number generation as part of the modeling process, because we're just trying to figure out what's going on. That's why approaches like Monte Carlo (whether loved or hated) speed up the analysis process.

POSTED BY: J.Edi Gran
Posted 1 year ago

Hi everybody!! I'm having some issues while trying to plot a PDF. I see some unexpected changes at the start of the PDF plot. Am I doing something wrong in the PDF definition or in how I call the Plot function? Thanks in advance!


MODERATOR NOTE: the notebook "Some weird behavior when trying to use a PDF" was moved to the attachment below and can also be viewed as a Wolfram Cloud notebook at https://www.wolframcloud.com/obj/29289cee-c540-4d5e-9507-dbfcc2001739

Attachments:
POSTED BY: J.Edi Gran

Some weird behavior when trying to use a PDF

Attachments:
POSTED BY: Jürgen Kanz
Posted 1 year ago

Oh!! Awesome! Great advice Jürgen!!..Thanks a lot!

POSTED BY: J.Edi Gran

Actually, Mathematica does a good job, even though it confuses you by selecting a different PlotRange for each plot. But look carefully, and note that the value at which your distribution starts is 1/(1+3Pi/2) = 0.175058...

The best way to avoid this illusion is to specify the same PlotRange option for all your plots, say, PlotRange ->{0,1}, or PlotRange -> All, or PlotRange -> Full.
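A quick sketch of the effect, using a stand-in PDF (an exponential distribution chosen only for illustration, not the one from the notebook):

```wolfram
(* Hypothetical PDF standing in for the one in the notebook *)
pdf[x_] := PDF[ExponentialDistribution[1/5], x];

(* Automatic PlotRange: the vertical axis need not start at 0 *)
p1 = Plot[pdf[x], {x, 0, 2}];

(* Fixed PlotRange: the same vertical scale for every plot *)
p2 = Plot[pdf[x], {x, 0, 2}, PlotRange -> {0, 1}];

GraphicsRow[{p1, p2}]

(* Value at the start of the support, to compare with what the plot shows *)
pdf[0]   (* 1/5 *)
```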

POSTED BY: Zbigniew Kabala

Hi;

I noticed that a squared-off D (esc cond esc) was being used to indicate a condition. Mathematica also uses /; to indicate a condition. Are these two interchangeable?

Thanks,

Mitch Sandlin

POSTED BY: Mitchell Sandlin

Hello Mitchell,

No, these are not equivalent. They are meant for very different contexts.

Conditioned is only used in the context of probability, to symbolize the classical P(A|B), or probability of A given B; it basically replaces the word "given".

Condition (/;) is used in pattern matching, always together with a pattern: it attaches a test that an expression must pass for the pattern to match, usually while going through the elements of a list. It is a much more core mechanism of the language.
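A short side-by-side sketch (the fair die and the function name halfIfPositive are only illustrative):

```wolfram
die = DiscreteUniformDistribution[{1, 6}];

(* Conditioned: P(x > 3 | x > 1) for a fair die *)
Probability[x > 3 \[Conditioned] x > 1, x \[Distributed] die]   (* 3/5 *)

(* Condition (/;): the pattern x_ only matches when the test x > 3 is True *)
Cases[{1, 2, 3, 4, 5, 6}, x_ /; x > 3]   (* {4, 5, 6} *)

(* Condition in a definition: the rule applies only to positive arguments *)
halfIfPositive[x_ /; x > 0] := x/2
halfIfPositive[4]    (* 2 *)
halfIfPositive[-4]   (* stays unevaluated, since the test fails *)
```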

POSTED BY: Marc Vicuna
Posted 1 year ago

Hi,

In the material of Lesson 4, when referring to the probability of the event "E" and the need for the Kolmogorov axioms, this appears:

However, k/M must be constant to be relevant, which is problematic

Every time I "test" the 1/6 of a perfect die, the number of ones k over M trials is not constant, certainly because random events are not constant. So I don't believe I get the core idea of why k/M must be constant or why this is "problematic". From a statistical perspective, k/M is how we (or at least I) estimate (guess) probability. So, I'm confused.

POSTED BY: J.Edi Gran

Hello J,

A valid question for sure. Let me try to be as explicit as possible.

First, why must a probability be a constant measure? Well, a quick Google search tells us that a measure is "A reference standard or sample used for the quantitative comparison of properties." In other words, for something to be a measure, it needs to be reliable, a reference. An uncertain number that shows some convergence is not obviously reliable, and thus not a good measure. To define an entire branch of mathematics, we need more reliable things.

Second, why is the assumption of convergence of any frequency problematic? Simply because it is too strong a statement. If I start by assuming a big and complicated statement, then any possible exception to that statement may render worthless all the theorems I’ve built on it. It's really a question of what is considered conventionally true.

Look at it this way. We want the most basic statements for axioms. You can prove the convergence of frequency through Kolmogorov's axioms. However, you cannot prove Kolmogorov's axioms from the assumption of the convergence of frequency. This implies Kolmogorov's axioms are more "basic".
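To see how k/M behaves in practice, here is a small simulation sketch (the seed, the number of rolls m and the checkpoints are arbitrary choices of mine):

```wolfram
SeedRandom[2023];   (* arbitrary seed, only for reproducibility *)
m = 10000;
rolls = RandomInteger[{1, 6}, m];

(* Running relative frequency k/M of rolling a one after M = 1, 2, ..., m rolls *)
freq = Accumulate[Boole[# == 1] & /@ rolls]/Range[m];

(* Frequency at a few checkpoints: it wanders instead of being constant *)
N[freq[[{10, 100, 1000, 10000}]]]

(* The horizontal grid line marks 1/6 *)
ListLinePlot[N[freq], PlotRange -> {0, 0.4},
 GridLines -> {None, {1/6}}, AxesLabel -> {"M", "k/M"}]
```

The frequency drifts toward 1/6 but never becomes a fixed number, which is why it makes a poor starting definition even though its convergence can be proved from the axioms.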

POSTED BY: Marc Vicuna
Posted 1 year ago

Awesome.. Got it! Thanks!

POSTED BY: J.Edi Gran

The example in lesson 3 for the probability that the sum of two dice will be even comes up with a probability of 6/11. This seemed odd and a little bit of extra thought suggests that you should not delete duplicate outcomes as they are part of the sample space of outcomes. When you consider duplicate outcomes in both numerator and denominator, you get the more intuitively satisfying answer of a probability of 1/2.

Attachments:
POSTED BY: Joseph Smith
Posted 1 year ago

I wouldn't dare call this an "answer"; it is only my interpretation, as I had the same "feeling" when I reviewed that slide.

As the explicit statement in the slide is {2,4,6,8,10,12}, and those were examples on the topic of "Sample spaces and Events", I saw that list as the "Event Definition" that one would use to calculate the mentioned probability. That list is actually the Event Definition against which the EvenQ function tests the full range of outputs. And I'd humbly agree that the probability of an even sum is 1/2.

POSTED BY: J.Edi Gran

Hello Joseph,

Let's be careful with the terms here. In lesson 3, we defined the sample space, the set of all possible outcomes. For the sum of two dice, this is {2,3,4,5,6,7,8,9,10,11,12}, as a set does not repeat its elements. The probability of each of those outcomes is not given at that point.

Now, if we discuss probability, you can see that the sum 2 only happens for the dice {1,1}, but the sum 7 happens for {1,6}, {2,5}, {3,4}, {4,3}, {5,2}, {6,1}, in other words, for many combinations. From this you can conclude that the events {2,3,4,5,6,7,8,9,10,11,12} most definitely don't have the same probability. Thus, the equally likely assumption leading to 6/11 is wrong. You need to assign each event its correct probability.
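One way to assign those probabilities is to enumerate the 36 equally likely ordered pairs of dice. A minimal sketch (the variable names are mine):

```wolfram
(* All 36 equally likely ordered outcomes of two dice *)
outcomes = Tuples[Range[6], 2];
sums = Total /@ outcomes;

(* Sample space of the sum, without repeats *)
Union[sums]   (* {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} *)

(* Probability of each sum: count how many pairs produce it *)
probs = KeySort[Counts[sums]]/Length[outcomes]

(* Probability of an even sum, using the correct weights *)
Total[KeySelect[probs, EvenQ]]   (* 1/2 *)

(* Treating the 11 sums as equally likely gives the incorrect value *)
Count[Union[sums], _?EvenQ]/Length[Union[sums]]   (* 6/11 *)
```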

POSTED BY: Marc Vicuna

Marc

Thanks for your response. I agree that the set of all possible outcomes should not include repeats. But is the probability of an even sum 1/2?

I'm enjoying these study group sessions!

Joe

POSTED BY: Joseph Smith
POSTED BY: Marc Vicuna