
[WSG23] Daily Study Group: Introduction to Probability

A Wolfram U Daily Study Group on Introduction to Probability begins on February 27th 2023.

Join me and a group of fellow learners to explore the world of probability and statistics using the Wolfram Language. Our topics for the study group include the characterisation of randomness, random variable design and analysis, important random distributions and their applications, probability-based data science and advanced probability distributions.

The idea behind this study group is to rapidly develop an intuitive understanding of probability for a college student, professional or interested hobbyist. A basic working knowledge of the Wolfram Language is recommended but not necessary. We are happy to help beginners get up to speed with Wolfram Language using resources already available on Wolfram U.

Please feel free to use this thread to collaborate and share ideas, materials and links to other resources with fellow learners.

REGISTER HERE


POSTED BY: Marc Vicuna
201 Replies

In yesterday's session about joint distributions, the example of the Dirichlet distribution (cutting a rope, at about 4:30 in the framework video, https://www.wolframcloud.com/obj/online-courses/introduction-to-probability/joint-distributions.html) is completely different from the example in the notebook ("Lesson 22 - Join Distributions.nb", slide 8, as well as in the framework Lesson notebook), which is about the mileage of a car.

Isn't the rope example more typical of the Dirichlet distribution than car mileages?
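A small sketch of the rope picture, for comparison with the car-mileage example: assuming a rope of length 1 cut at two independent uniform points, the three pieces follow DirichletDistribution[{1, 1, 1}]. The helper name `ropePieces`, the seed and the sample size are illustrative choices, not taken from the course notebook.

```mathematica
(* Hypothetical helper: cut a unit rope at two independent uniform points *)
ropePieces[] := Differences[Join[{0}, Sort[RandomReal[1, 2]], {1}]]

SeedRandom[42];
samples = Table[ropePieces[], {10000}];

(* Each sample is a triple of piece lengths summing to 1; each mean is close to 1/3 *)
Mean[samples]

(* The exact model: DirichletDistribution[{1, 1, 1}] describes the first two pieces,
   the third being 1 minus their sum *)
Mean[DirichletDistribution[{1, 1, 1}]]   (* {1/3, 1/3} *)
```

In the Wolfram Language, a Dirichlet distribution with k parameters is a distribution over the first k - 1 components, which is why the exact mean has two entries rather than three.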

Hello Joseph,

First, since we aim to fit every phrase on a single line, we tend to shorten definitions like this one in the interest of simplicity, while maintaining correctness.

Second, I think I actually disagree with your definition.

  1. Set theory is more basic than probability theory; that much should be clear. Thus, the definitions of set operations should not depend on the probability-theory notions of event and sample space. So that vocabulary is, dare I say, anachronistic.

  2. In set theory, your confusion is actually justified. Due to the trivial existence of the theoretical universal set, the complement of a given set can be used without referring to any exterior set. It's just another variable, which may or may not contain every possible element. The generality of the statement doesn't have to be lost. So in that sense, the definition we gave is valid, and yours is too restrictive. However, as we discussed in the study group, computers are rarely so theoretical. There is no purely mathematical set in the Wolfram Language, only lists. In the same way, there is no purely theoretical complement implemented, only a computational approximation of that concept.
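To illustrate the last point: in the Wolfram Language, Complement needs the universe to be given explicitly as a list. A minimal sketch, assuming the faces of a die as the universe (the names `universe` and `evens` are just for illustration):

```mathematica
(* The "universal set" has to be spelled out as a concrete list *)
universe = Range[6];
evens = {2, 4, 6};

(* Complement[all, s] returns the elements of all that are not in s *)
Complement[universe, evens]   (* {1, 3, 5} *)
```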

POSTED BY: Marc Vicuna

Hello Joseph,

Your suggestion is actually what we initially intended, but here is the issue: for this demonstration, no interactivity is possible because of how its clickable interactions work within the Wolfram U framework. This initially led to some confusion in the early stages. Our solution was to give a more print-friendly graph, keep the interactivity in the video, and give the link to the demonstration for those who wanted to experiment with it (as you did?).

Seeing that this also led to more confusion for you, I'll consider rebuilding that demonstration myself with the Wolfram U framework in mind.

POSTED BY: Marc Vicuna

I would like to suggest another clarification to section 3.

POSTED BY: Joseph Smith

Suggestion for Clarifying Section 3 Slide 6

POSTED BY: Joseph Smith
Posted 1 year ago

Thanks Marc, thanks a lot!

POSTED BY: J.Edi Gran

Hello J,

Here's one way to reformulate your situation. Out of a group of 5, you want to choose 1 to 3 elements. Thus,

Binomial[5, 3] + Binomial[5, 2] + Binomial[5, 1]

or equivalently:

Sum[Binomial[5, i], {i, 1, 3}]

This gives you the 25 you got. Choosing across a range of group sizes may indeed be required in some situations.
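A brute-force check of the count, as a sketch: Subsets with a {min, max} size specification enumerates every selection of 1 to 3 elements out of 5.

```mathematica
(* Subsets[list, {min, max}] lists every subset whose size is between min and max *)
chosen = Subsets[Range[5], {1, 3}];

Length[chosen]                                     (* 25 *)
Length[chosen] == Sum[Binomial[5, i], {i, 1, 3}]   (* True *)
```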

POSTED BY: Marc Vicuna
Posted 1 year ago

Thanks Marc. I do believe there may be some impossible (in the sense of not meaningful) cases. I'm still trying to figure out what makes sense to ask and what doesn't. But OK, here is a clarification of my question. I still believe this is applicable in real life, but I may be wrong. I tried to depict the situation with an "entry level skills" notebook, so that my question is easier to assess.

POSTED BY: J.Edi Gran

Hello Joseph,

Indeed, this is an error, as noted in the post by Juan Ortiz Navarro. Here's my answer:

As for problem 3, this is also an error, but the answer should be Binomial[6,2] or Multinomial[4,2], yes. Thank you for informing us!

For the solution to make sense, just change the question to go from (0,0) to (4,3).
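A quick way to check both versions by brute force, assuming each path is written as a word of right (R) and up (U) unit steps: count the distinct arrangements of those steps. Permutations treats repeated letters as identical, so it lists each path once.

```mathematica
(* A path from (0,0) to (4,2) uses 4 "R" steps and 2 "U" steps: 6 segments *)
paths42 = Permutations[{"R", "R", "R", "R", "U", "U"}];
Length[paths42]                        (* 15 *)
{Binomial[6, 2], Multinomial[4, 2]}    (* {15, 15} *)

(* Changing the target to (4,3) gives paths of 7 segments *)
Length[Permutations[Join[ConstantArray["R", 4], ConstantArray["U", 3]]]]   (* 35 *)
```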

POSTED BY: Marc Vicuna
Posted 1 year ago

The question never states that the game is over as soon as a red ball is taken. Without that information, I interpreted the question as there being three outcomes at the end after all balls are taken, one set for "you" and another for the "friend":

{{r,r,b},{b,b,b}}
{{r,b,b},{r,b,b}}
{{b,b,b},{r,r,b}}

This gives a ⅔ probability of having a red ball in "your" set at the end.

Is that line of thinking correct given my assumptions?

POSTED BY: Parker Robb

Thanks!

POSTED BY: Joseph Smith

For Section 2, Exercise 3, how do all the paths to (4,2) add up to 7 segments? It looks like 6 to me.

Attachments:
POSTED BY: Joseph Smith

Hello J,

For combinatorics, it's easier to think of it as a tree of decisions rather than a matrix. Some cases may not correspond to any real problem. The binomial is [Order Not Relevant + 1 group], and we usually assume no replacement. Why? Because a set with multiple copies of the same element is still the same set! Consider this:

A set is a collection of non-repeated elements without order.

A multiset is a collection of elements without order.

Therefore, counting (Replacement + Order Not Relevant) requires the definition of a multiset. The number of multisets of size k drawn from n distinct elements is Binomial(n+k-1, k). See this for more explanations.
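A brute-force check, as a sketch: choosing k elements with replacement from n distinct elements, order not relevant, should give Binomial[n + k - 1, k]. The code enumerates ordered draws with replacement, sorts each one so that order no longer matters, and removes duplicates. The helper `countMultisets` is hypothetical, named for this example only, and only practical for small n and k.

```mathematica
(* Hypothetical helper: ordered draws with replacement, then forget the order *)
countMultisets[n_Integer, k_Integer] :=
  Length[DeleteDuplicates[Sort /@ Tuples[Range[n], k]]]

countMultisets[3, 2]      (* 6 *)
Binomial[3 + 2 - 1, 2]    (* 6 *)

countMultisets[4, 3]      (* 20 *)
Binomial[4 + 3 - 1, 3]    (* 20 *)
```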

POSTED BY: Marc Vicuna

Hello Joseph,

Indeed, on problems 1 and 3, there seems to be text from another part of the course. This is an error, as mentioned by Juan Ortiz Navarro, posted 2 days ago: "Regarding excercises-02.nb The solution to problem 1 (...)"

So yes the answer is 25!/15!.

POSTED BY: Marc Vicuna
Posted 1 year ago

Hi! In the combinatorics part at the start of the course, we saw how to count [No Replacement + Order Not Relevant] --> Binomial. What would be the approach to count [Replacement + Order Not Relevant]? Maybe we saw that, but I can't find it. Thanks!

POSTED BY: J.Edi Gran

The solution for exercise 1 from section 2 seems disconnected. Q: Twenty-five runners compete in the 200m event. How many top 10 arrangements are possible? A: The arrangements are ordered, but you were only asked to consider 10 elements out of 25. Thus, this corresponds to a permutation: y1 = Integrate[(-s + 0), {s, 0, x}]

What does this statement have to do with the problem?

Shouldn't the answer be 25!/15! ?

POSTED BY: Joseph Smith

Hello Juan,

I agree. This is a mistake, it should say: A smoker is twice as likely to have an ectopic pregnancy as any pregnant woman.

That way the statement would make sense. Your calculation makes sense considering the formulation. Thank you for noticing, it will be corrected.

POSTED BY: Marc Vicuna

On Exercise 4 of excercises-06.nb, "A smoker is twice as likely to have an ectopic pregnancy as a non-smoking pregnant woman." is denoted as P(E|S)=2P(E). I thought it would be denoted as P(E|S)=2P(E|S'), so that P(S|E)=P(E|S)P(S)/(P(E|S)P(S)+P(E|S')P(S'))=0.461538.

How can P(E|S)=2P(E|S') be said in words?
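For comparison, both readings can be pushed through Bayes' rule symbolically. In this sketch, p stands for P(S) and q for P(E|S'), which cancels out. The value of P(S) used in the exercise is not quoted in this thread, so p = 3/10 below is only a hypothetical value that happens to reproduce 0.461538 under the first reading.

```mathematica
(* Reading 1: P(E|S) = 2 P(E|S'); q = P(E|S') cancels, leaving 2 p/(1 + p) *)
readingNonSmoker[p_] = Simplify[(2 q p)/(2 q p + q (1 - p))];

(* Reading 2: P(E|S) = 2 P(E), so P(S|E) = P(E|S) P(S)/P(E) = 2 P(S) *)
readingAnyWoman[p_] = 2 p;

(* Hypothetical P(S) = 3/10, not taken from the exercise *)
N[readingNonSmoker[3/10]]   (* 0.461538 *)
readingAnyWoman[3/10]       (* 3/5 *)
```

The second reading only makes sense when P(S) is at most 1/2, since P(S|E) cannot exceed 1.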

Posted 1 year ago

Mind-blowing! (For me, at least; there are surely more advanced users for whom this is already natural.) But now I can appreciate the flexibility (and why not, the beauty) of the exponential when "connected" to other "devices" to cast a spectrum of other functions. I never expected that to be possible.

POSTED BY: J.Edi Gran

Hello all,

For a bit of context on the exponential family of distributions, here are some resources:

A well made short video series introducing the subject, by Mutual Information.

An introduction article to get familiar with this, from Berkeley EECS.

A short textbook of Statistical Theory focused on the exponential family, from the University of Oxford.

A paper on the link with machine learning, from Princeton University.

The exponential family is usually covered in any course of Statistical Theory. Go satisfy your curiosity!

POSTED BY: Marc Vicuna

It's funny that commenting on typos I made a typo myself by inexplicably replacing a multiplication sign with a plus sign. This notwithstanding, the first is also a mistake. In your documentation, the calculation reads:

Binomial[4, 2] * Binomial[5, 3], which is equal to 60,

whereas it should read

Binomial[4, 3] * Binomial[5, 2], which is equal to 40

NOTE: I edited my original post and fixed my typo and its consequence.

POSTED BY: Zbigniew Kabala
Posted 1 year ago

Thanks, Marc. I'll keep that advice in mind!

POSTED BY: J.Edi Gran

Hello J,

To be clear, I think your situation could also be valid.

When you read the situation, you are given a specific instance of what happened: 7, 3, 3. Given that specific instance, I ask about the unknown information: the ordering. But I could have asked given only 13 ads and 3 ad types, or given only 3 ad types (even more general). You need to be careful about what the specific situation is and not generalize too fast.

POSTED BY: Marc Vicuna

Hello Alex,

As noted in the post by John Burke, there is a mistake and the answer is 40, with Binomial[4,3]*Binomial[5,2].
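A brute-force check of the corrected count, as a sketch, assuming the players within each nationality are distinguishable and labelled 1 to 4 (Swiss) and 1 to 5 (Ethiopians):

```mathematica
(* Choose 3 of the 4 Swiss and 2 of the 5 Ethiopians; each pair of choices is one team *)
swissChoices = Subsets[Range[4], {3}];       (* 4 ways *)
ethiopianChoices = Subsets[Range[5], {2}];   (* 10 ways *)
teams = Tuples[{swissChoices, ethiopianChoices}];

Length[teams]                   (* 40 *)
Binomial[4, 3]*Binomial[5, 2]   (* 40 *)
```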

POSTED BY: Marc Vicuna

Hi Zbigniew,

The first is also our mistake, now that it is edited. Refer to the original post by John Burke.

The second is a mistake, and will be corrected. Thank you for telling us!

POSTED BY: Marc Vicuna

William:
I agree with you but I see it as which parts of the tree will give at least one red, taking into account that as soon as I have three balls the game is over. So in the first round I have a 2/6 probability of getting a red one. If I do not succeed on that, then on the second round I have a (4/6)*(2/5) probability of getting a red one. And on the third round, I have (4/6)*(3/5)*(1/2) of getting a red one, and the game is over since the other player took the other three balls:
2/6 + (4/6)*(2/5) + (4/6)*(3/5)*(2/4) = 4/5
or the complement of not getting red balls, which will be all black balls: 1 - (4/6)*(3/5)*(1/2) = 4/5.
Does that make sense?

EDIT: I seem to have answered that half asleep, I can't even read my answer. Please disregard it.

POSTED BY: Marc Vicuna

Lecture 1: How many different teams are possible given that each must include 3 Swiss (out of a group of 4) and 2 Ethiopians (out of a group of 5)? You give the answer Binomial[4,2]*Binomial[5,2]. Is this the correct answer? How did you derive it? Is my answer, Binomial[4,3]*Binomial[5,2], not correct?

POSTED BY: Alex Kuznetsov

POSTED BY: Zbigniew Kabala
Posted 1 year ago

Oh!!!! Aweeeeeesomee!!! Thanks...thanks a lot... I see my mistake!

POSTED BY: J.Edi Gran
Posted 1 year ago

Oh, that's interesting: the assumption for the multinomial is that all possible ads are "exhausted" at the last observation. My intuition was that, while surfing the internet, I used a finite amount of time, and during that undetermined time window I saw only 13 ads, but if I kept browsing I could perfectly well see more ads. Let's say I doubled the browsing time; I might end up seeing 26 or 30 ads in total. So I think I get your point if all possible ad instances are contained in the limited-length sample (of 13 total ads). Otherwise, if no limiting length is stated, I guess it may be fair to assume that As, Bs or Cs are never exhausted, and browsing more means watching more ads (like the "YouTube Premium" thing that seems to never end ;) ). Thanks Marc for the clarification.

POSTED BY: J.Edi Gran

Thanks Marc. I was confused on "neither red nor blue". I understand now that it is a conjunction.

Hello J,

That's a pretty fun question actually. Let's go through it.

You encounter A 7 times, B 3 times and C 3 times, 13 in total. How could this have happened? You have your limited resource of 7 As, 3 Bs and 3 Cs, so the number of orderings is 13!, not 3^13, because 3^13 would include sequences in which you did not encounter the given number of each ad.

But can you distinguish one A from another A? No, the elements are not distinguishable. So you need to divide by 7! for all the orders of A, 3! for B, and 3! for C. You get 13!/(7!3!3!), which is Multinomial[7,3,3].
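The same count can be checked by brute force, as a sketch: Permutations lists only the distinct orderings when letters repeat, so its length should match the multinomial coefficient. The last line shows what 3^13 counts instead.

```mathematica
ads = Join[ConstantArray["A", 7], ConstantArray["B", 3], ConstantArray["C", 3]];

(* Repeated letters are treated as identical, so only distinct orders are listed *)
Length[Permutations[ads]]   (* 34320 *)
Multinomial[7, 3, 3]        (* 34320 *)
13!/(7! 3! 3!)              (* 34320 *)

(* All sequences of 13 ads drawn from {A, B, C}, with any number of each *)
3^13                        (* 1594323 *)
```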

POSTED BY: Marc Vicuna

Hello Juan,

This was addressed earlier on this post, but the 0.1 is a mistake. Here is the solution:

P(A'[Intersection]B')=P((A[Union]B)')=1-P(A[Union]B)=1-0.4=0.6

Basically using De Morgan's Law of sets.

Thus, the probability that neither A nor B occurs is 0.6.
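A small concrete check of De Morgan's law with finite sets, as a sketch; the sample space (one die roll) and the two events are arbitrary choices for illustration, not from the exercise:

```mathematica
omega = Range[6];
a = {2, 4, 6};   (* even roll *)
b = {1, 2};      (* roll of at most 2 *)

lhs = Intersection[Complement[omega, a], Complement[omega, b]];   (* A' intersect B' *)
rhs = Complement[omega, Union[a, b]];                             (* (A union B)' *)

lhs == rhs   (* True: both are {3, 5} *)

(* With equally likely outcomes: P(A' intersect B') = 1 - P(A union B) *)
Length[lhs]/Length[omega] == 1 - Length[Union[a, b]]/Length[omega]   (* True *)
```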

1-P(R)=P(R') and 1-P(B)=P(B'). The probability that neither happens means both are false at the same time, thus P(A'[Intersection]B'), a conjunction.

Hopefully that answered your question, although I'm not entirely sure. Let me know if you're still confused.

POSTED BY: Marc Vicuna

Thanks. I meant to write 25!/15! indeed.

Hello Juan,

exercises-02, Problem 1. Honestly, this issue is a great surprise to me. It definitely took a turn I was not expecting either. Thank you for noticing. This will be corrected, but here is the answer:

25!/15!

or equivalently:

FactorialPower[25, 10]

As for problem 3, this is also an error, but the answer should be Binomial[6,2] or Multinomial[4,2], yes. Thank you for informing us!
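As a sanity check on problem 1, the expressions agree: the top 10 is filled with 25 choices for first place, 24 for second, and so on down to 16 for tenth. A sketch:

```mathematica
(* Ordered top-10 arrangements among 25 runners *)
25!/15!                   (* 11861676288000 *)
FactorialPower[25, 10]    (* same value *)
Product[k, {k, 16, 25}]   (* 16 * 17 * ... * 25, same value *)
```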

POSTED BY: Marc Vicuna

Hi J,

This is a bit complicated, so let's discuss this one thing at a time. This textbook expresses a sample space where each data point is in itself a sequence of multiple numbers. In that sense, this is a multivariate random variable. Within a single sample, or data point, there are multiple times, which are the periods of time between breakdowns. Let's say there are n such periods in each sample. Thus, the domain of any single outcome is R(>0)^n, that is, the positive reals in n dimensions (periods of time are always positive).

In this context, the sample space is the set of all possible sequences of periods between breakdowns, possibly R(>0)^n itself.

So overall, I believe you are right about your sample space, but that you are misinterpreting their explanation of the sample space. Hopefully that explanation helped.

POSTED BY: Marc Vicuna
Posted 1 year ago

Hello... While reading the second exercise, which states "While surfing the web, you encounter ad A 7 times and ads B and C 3 times each. How many arrangements are possible?", I interpreted it as the sequence AAAAAAABBBCCC (or any other with seven As, three Bs and three Cs, as the precise sequence is not established). So I understood it as n = 3 (A, B, C), with reuse of values allowed, and k (trials) = 13, since the sequence provided, no matter the order, is 13 positions long.

So my solution before reading the answer was n^k -> 3^13 possible arrangements. But when reading the answer I found "multinomial". Which part of the question suggests that the multinomial approach is the right one? Thanks

POSTED BY: J.Edi Gran

Regarding excercises-04.nb

On problem 4, I think the solution is not complete. I believe the answer displayed is for P(R' intersection B'), to which one needs to add 1-P(R) and 1-P(B).
Or P(R' union B')=1-P(R intersection B)=1-.1=.9.

What mistake am I making?

Regarding excercises-02.nb: the solution to problem 1 seems to take a turn I do not understand. Should it be factorial(25)/factorial(10)?

And on problem 3, going from (0,0) to (4,2), are the paths of length 7 or 6?

So Binomial[6,2] or Multinomial[4,2]?

Posted 1 year ago

While trying to confirm concepts from the course with a third-party reference, I found the attached. It states that the "Sample Space" for the times to failure of a certain machine is a sequence (T1...Tn). In my opinion, the sample space must contain all possible times to failure, as opposed to a specific sample. So I'd say the "space" would be all the real numbers, or perhaps all the real numbers below the maximum allowable age for the equipment being analyzed. Am I right that the reference is confusing a "Sample space" with a specific sample?

I'd appreciate your comments. Jorge


POSTED BY: J.Edi Gran

Hi Laising,

Lesson 7 is infamously trying to use data functionality that we don't have access to at that point in the course. That is, you could accomplish the same thing using EmpiricalDistribution, SmoothKernelDistribution or HistogramDistribution. But here's an explanation of what I'm doing to avoid those functions.

  1. Aggregate data by value and frequency using the Tally function.
  2. Normalize the frequencies by dividing the list of frequencies by the total number of occurrences. This becomes your list of probabilities.
  3. Map the probabilities to the correct values. You now have your PDF. I also restrict the PDF to the valid values, to make the domain of my values clear.
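The three steps above can be sketched on the dice example quoted in the question (`roll`). This rebuilds `dice` with Tally on the 36 sums of two dice; that this matches the slide's own definition of `dice` is an assumption. Sorting the tally puts the sum n at position n - 1 (sums start at 2), which is where the `n - 1` in `roll` comes from. By the same logic, `h - 59` in `height` presumably maps height 60 to position 1 of the aggregated frequencies.

```mathematica
(* Stand-in data: the 36 equally likely sums of two dice *)
sums = Flatten[Table[x + y, {x, 6}, {y, 6}]];

(* 1. Aggregate by value and frequency; sorting puts value n at position n - 1 *)
dice = SortBy[Tally[sums], First];   (* {{2, 1}, {3, 2}, ..., {12, 1}} *)

(* 2. Normalize the frequencies so that they sum to 1 *)
probabilities = dice[[All, 2]]/Total[dice[[All, 2]]];

(* 3. Look up the probability of a value; the pattern test rejects values outside 2..12 *)
roll[n_Integer?(2 <= # <= 12 &)] := dice[[n - 1, 2]]/36

roll[7]    (* 1/6 *)
roll[13]   (* stays unevaluated: 13 is outside the domain *)
```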

Again, this is not necessary for you, as this will be better addressed in other lessons. This was merely the first jab at the subject of data-driven distribution.

POSTED BY: Marc Vicuna
Posted 1 year ago

Hi Marc, in Lesson 7, Slides 10 and 11, you showed two different examples of how to load and aggregate data. I have difficulty understanding how to aggregate:

height[h_?(60 <= # <= 76 &)] := aggdata[[2, h - 59]]/Length[data]

and

roll[n_Integer?(2 <= # <= 12 &)] := dice[[n - 1, 2]]/36

Could you elaborate a bit, or use simpler and more understandable code?

Thanks, Lewis

POSTED BY: Laising Yen

Marc, I have another one for you. In Exercise 3 of exercises-05.nb, there should be an additional branch to your tree, namely b,b,b,b,r,r. This will give you an additional 1/15 for an answer of 12/15. I found that a simpler way for me to attack the problem was to consider the probability of failing to draw at least one red ball. This means you only have a sub-tree with 3 branches to keep track of. You will then get 3/15 as the probability of failure, and hence 12/15 is the probability of success. Thanks for staying on top of this.
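A sketch of William's count, assuming (as in Parker Robb's reading) that each player ends up holding 3 of the 6 balls, 2 red and 4 black, so that your hand is a uniformly random set of 3. Balls 1 and 2 play the role of the red ones; this labelling is only for the example.

```mathematica
(* Balls 1 and 2 are red, 3 to 6 are black; your hand is a random 3-subset *)
hands = Subsets[Range[6], {3}];   (* 20 equally likely hands *)
withRed = Select[hands, MemberQ[#, 1] || MemberQ[#, 2] &];
Length[withRed]/Length[hands]     (* 4/5, i.e. 12/15 *)

(* Complement route: all three of your balls are black *)
1 - Binomial[4, 3]/Binomial[6, 3]   (* 4/5 *)

(* Same number from the built-in distribution: 3 draws, 2 red, 6 balls in total *)
1 - PDF[HypergeometricDistribution[3, 2, 6], 0]   (* 4/5 *)

(* Number of red balls in your hand: the three cases are not equally likely *)
KeySort[Counts[Count[#, 1 | 2] & /@ hands]]   (* <|0 -> 4, 1 -> 12, 2 -> 4|> *)
```

Under this assumption, your hand holds 0, 1 or 2 red balls with probabilities 4/20, 12/20 and 4/20, so treating the three outcomes as equally likely gives ⅔ instead of 4/5.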

POSTED BY: William Weller

Indeed, that is correct.

There are 6*6 combinations: 6 with doubles, 6 starting with the number 6 and 6 ending with the number 6. The three events have the outcome (6,6) in common, so that one outcome is counted three times, thus:

(6 + 6 + 6 - 2)/(6*6)

is indeed 4/9, thus odds of 4 to 5.
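A brute-force check of the event, as a sketch: list the 36 ordered rolls and keep those that are doubles or contain a 6.

```mathematica
outcomes = Tuples[Range[6], 2];   (* 36 equally likely ordered rolls *)
event = Select[outcomes, #[[1]] == #[[2]] || MemberQ[#, 6] &];

Length[event]                        (* 16 *)
p = Length[event]/Length[outcomes]   (* 4/9 *)
p/(1 - p)                            (* 4/5, i.e. odds of 4 to 5 *)
```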

Thank you for informing us, it will be corrected.

POSTED BY: Marc Vicuna

Indeed, the LHS exponent should be 5, not 3. This will be corrected.

Thank you for noticing and informing us.

POSTED BY: Marc Vicuna

Marc,

You asked for mistakes in the documents. So, please have a look at “Lesson 2, Slide 9”. Your equations are wrong, which means LHS is not equal to RHS. Right?

POSTED BY: Jürgen Kanz

thanks!

POSTED BY: Joseph Smith

Thanks Marc for your prompt consideration. I have another for you! In Exercise 1 of exercises-05.nb, the event you describe out of the 36 possible equally likely outcomes is E={{1, 1}, {2, 2}, {3, 3}, {4, 4}, {5, 5}, {6, 6}, {1, 6}, {6, 1}, {2, 6}, {6, 2}, {3, 6}, {6, 3}, {4, 6}, {6, 4}, {5, 6}, {6, 5}}. Hence, P(E) =16/36=4/9. Thus, the odds are 4 to 5. Would you please look into this also. Thanks.

POSTED BY: William Weller
Posted 1 year ago

When is the new study group for Introduction to Probability starting? I missed last week and would rather start from the beginning without worry and rushing.

POSTED BY: Updating Name

Indeed! You caught my second mistake!

P(A'[Intersection]B')=P((A[Union]B)')=1-P(A[Union]B)=1-0.4=0.6

Thus, the probability that neither A nor B occurs is 0.6.

Thank you for noticing, it will be corrected.

POSTED BY: Marc Vicuna
Posted 1 year ago

Does anybody have the link to the course materials that they could post on the community thread?

I want to get started on the exercises but I missed the last meeting and the recording does not show the chat pane with the links.

Thanks

POSTED BY: Joseph Smith

Marc, I am enjoying the course very much. Thank you for all the effort you are putting into it. When you get time, would you please check Exercise 3 from exercises-04.nb. I did the problem in two different ways, and I still get an answer of 0.6 which does not agree with 0.1 which is your answer. Thanks

POSTED BY: William Weller

Hello all,

We had a more difficult question today, given as:

Given a vector with m elements (real numbers), I want to compute the PDF of the sum of the m elements, considering that each element will have a positive or negative sign with equal probability.

First, consider that each element must have its own probability distribution. Since it is real, let me for example assume each element is uniformly distributed between -10 and 10.

UniformDistribution[{-10, 10}]

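If all elements are independent and distributed in the same way, the sum is the distribution of m independent copies added together. Note that this is not m*x for a single x, which would only rescale one element. For example, with m = 4:

PDF[TransformedDistribution[
  a + b + c + d, {a, b, c, d} \[Distributed] 
   ProductDistribution[{UniformDistribution[{-10, 10}], 4}]], x]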

However, if you assume a standard normal distribution centered at 0 for your elements, the sum of m independent copies is again normal, with standard deviation Sqrt[m], so you could write:

PDF[NormalDistribution[0, Sqrt[m]], x]

Finally, your elements may be distributed in different ways, which could look something like this, assuming, let's say, m is 4:

PDF[TransformedDistribution[
  a + b + c + d, {a \[Distributed] NormalDistribution[0, 1], 
   b \[Distributed] NormalDistribution[0, 4], 
   c \[Distributed] NormalDistribution[0, 19], 
   d \[Distributed] UniformDistribution[{-10, 10}]}], x]

Another way to interpret your question is that the values are constant, but each element's sign is a binary choice between -1 and 1. In that case, you need one Bernoulli variable per element in a single sum. Assume for example this vector:

v = {1.77, 5.65, 10.14, 195.14}
PDF[TransformedDistribution[
  v[[1]]*(-1)^a + v[[2]]*(-1)^b + v[[3]]*(-1)^c + 
   v[[4]]*(-1)^d, {a, b, c, d} \[Distributed] 
   ProductDistribution[{BernoulliDistribution[0.5], Length[v]}]], x]

Hopefully that answers the question. Transformed distributions will be seen in Lesson 20.
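For the constant-values reading, the distribution can also be cross-checked by enumerating the sign patterns directly, as a sketch: with 4 elements there are 2^4 = 16 equally likely sign patterns, each contributing probability 1/16 to its signed sum.

```mathematica
v = {1.77, 5.65, 10.14, 195.14};

(* Every sign pattern is equally likely *)
signs = Tuples[{-1, 1}, Length[v]];
sums = signs . v;   (* one signed sum per pattern *)

(* Probability mass function as {value, probability} pairs *)
pmf = {First[#], Last[#]/Length[signs]} & /@ Tally[sums];

Length[pmf]            (* 16: all signed sums differ for this v *)
Total[pmf[[All, 2]]]   (* 1 *)
```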

POSTED BY: Marc Vicuna

Hi Mitch,

The (n i) (vertical) is the mathematical notation for Binomial[], this is a binomial coefficient. Same goes for the Multinomial coefficient.

For the replacing part, I'm sorry for the confusion: this is meant to say replacing, not remplacing; it's just a spelling error. As for what replacing means: when you "use" an element, you replace it with another element of the same value. For example, take a password of 7 digits. You may "use" the digit 5 as your first character, but it is still available for your next choice. That means you replaced the used digit 5 with another digit 5, so that the set keeps the same size.

This vocabulary comes from the historical example of taking balls out of an urn. You may put another equivalent ball back after you've taken one out (with replacement), or you may not do that, in which case the size of the group decreases after each step (without replacement).
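A small sketch of the difference, using 3-digit codes so that the lists stay short enough to enumerate; the 7-digit counts are computed by formula only.

```mathematica
digits = Range[0, 9];

(* With replacement: every position can reuse any digit *)
Length[Tuples[digits, 3]]           (* 1000 = 10^3 *)

(* Without replacement: a used digit is no longer available *)
Length[Permutations[digits, {3}]]   (* 720 = FactorialPower[10, 3] *)

(* 7-digit passwords, with and without replacement *)
{10^7, FactorialPower[10, 7]}       (* {10000000, 604800} *)
```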

POSTED BY: Marc Vicuna

Hi Bob,

The textbook referenced in the proof is Probability - An Introduction by Geoffrey Grimmett and Dominic Welsh. It is concise but a bit short on examples. A great book for examples would be Introduction to Probability by Charles Grinstead and Laurie Snell: less mathematical, with more examples and more discussion. It's also freely available.

Hopefully that helps.

POSTED BY: Marc Vicuna

Hi;

A couple of questions regarding the Decision Tree to decide how to count specific outcomes:

Under the Count, Without Order, One Group, (n over i ) - what does (n over i) mean or what operation are we performing here?

Under Count, With Order, Remplacing - what is Remplacing?

Thanks,

Mitch Sandlin

POSTED BY: Mitchell Sandlin

Hi,

It may be best for you to prioritize other engagements, as this course is made to be taken at any time.

The course will soon be fully released with all materials and lessons freely accessible. Moreover, the daily study group sessions are all recorded and also freely accessible. There is no planned second study group for this course for now, but it may happen if there is interest for it. Feel free to also continue asking questions in this post even after the study group has ended.

POSTED BY: Marc Vicuna
Posted 1 year ago

Thanks Marc, this is exactly what I was looking for. Sometimes a simple equation has an easy proof, and I am relieved to see that this is not such a case! Can you provide a reference to the book mentioned in the article? That seems to be very compatible with this course.

POSTED BY: Bob Renninger
Posted 1 year ago

Will this course be repeated? I am having struggles with it right now, and would like to work on Computational X-plorations or something else. Thanks.

POSTED BY: Updating Name

Hi Bob,

I found this proof, from the University of Arizona, which stays within the bounds of the course and doesn't require much external knowledge of other branches of mathematics besides multivariable calculus. This is not a very difficult proof, but it remains fairly theoretical. The actual proof is short and at the end, but the rest of the paper gives great context.

POSTED BY: Marc Vicuna
Posted 1 year ago

Would you please provide details of the proof for the inequality on Slide 12 of Lesson 10? Thanks very much.

POSTED BY: Bob Renninger
Posted 1 year ago

Adding to this, I believe there's always a catch when trying to rely on a single approach to assess all possible analytical situations. Of course, having an equation beforehand as a model is an ideal situation, but that's not always doable. As an example, when forecasting the production profile of an industrial plant where hundreds or thousands of assets interact (with random climate and random times to failure, sometimes with clear patterns and sometimes not), forcing oneself to have an equation as a model may make the analysis unviable. In modeling, there's regularly an "elucidation" process where one experiments and may or may not end up with an equation that can be fitted. I believe that in those cases, if one wants to approach them cost-effectively, it is perfectly valid to play with random number generation as part of the modeling process, because we're just trying to figure out what's going on. That's why approaches like Monte Carlo (loved or hated) bring speed to the analysis process.

POSTED BY: J.Edi Gran

Thanks, Marc, for clarifying. I'll get used to your precise definitions, including that of modeling. Again, thanks 10^6 for a so-far wonderful course.

POSTED BY: Zbigniew Kabala

Actually, Mathematica does a good job, even though it confuses you by selecting different PlotRange options. But look carefully, and note that the value at which your distribution starts is 1/(1+3Pi/2) = 0.175058...

The best way to avoid this illusion is to specify the same PlotRange option for all your plots, say, PlotRange ->{0,1}, or PlotRange -> All, or PlotRange -> Full.
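A sketch of that advice, with a hypothetical stand-in distribution (ExponentialDistribution[2]) since the original PDF is only in the attached notebook: fixing PlotRange on both panels keeps them on the same vertical scale, so a plot that starts part-way along the axis is not visually stretched.

```mathematica
(* Hypothetical stand-in; replace with the PDF from your own notebook *)
dist = ExponentialDistribution[2];

(* The same explicit PlotRange on both panels makes them directly comparable *)
full = Plot[PDF[dist, x], {x, 0, 3}, PlotRange -> {0, 2.1}];
tail = Plot[PDF[dist, x], {x, 1, 3}, PlotRange -> {0, 2.1}];

GraphicsRow[{full, tail}]
```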

POSTED BY: Zbigniew Kabala

Hello Joseph,

Basically yes. It indicates you're making jumps of 1, and the fact you're jumping over values indicates you are taking discrete steps; thus, you are using a discrete variable.

Note that you could theoretically also use jumps of 2, or 0.5, and this would still be a discrete distribution. The point is that you are jumping between values instead of using a continuous range.
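A sketch using the lesson's own definitions: with the step of 1, n runs over 0, 1, 2, ..., and the probabilities over those integer values sum to 1, because the sum of 1/(n+1)^3 for n from 0 is Zeta[3].

```mathematica
children[n_] := 1/((n + 1)^3*Zeta[3]);

(* The step 1 in {n, 0, Infinity, 1} makes n range over 0, 1, 2, ... *)
distChildren = ProbabilityDistribution[children[n], {n, 0, Infinity, 1}];

(* Total probability over the integer lattice *)
Sum[children[n], {n, 0, Infinity}]   (* 1 *)

(* Probability of having no children *)
PDF[distChildren, 0]                 (* 1/Zeta[3] *)
```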

POSTED BY: Marc Vicuna

Hello Zbigniew,

First, let's be clear: I am taking a few liberties with the poll questions, such as having general comprehension questions and multiple true answers. The same fuzziness will not be found in quizzes or certification exams, where exactly one answer will always be true.

Second, the 4th choice remains a bad choice. You can use RandomVariate to model, but should you? A mathematical model is an abstract description of a concrete system using mathematical concepts and language. (Wikipedia) RandomVariate will give you an approximation of the base model, but never exactly the base model. You are losing information when you model only based on the RandomVariate data.

"Say, if you want to see how close the head frequency in tossing a coin 100 times would get to the expected value, you would use the RandomVariate function in MODELING the problem."

In this case, you want to use the expectation of the difference between the theoretical expectation and the random variable. Check for the reduction of the sample mean error, or even the CentralMoment. The problem with using RandomVariate is that you might be able to see how close the head frequency should be, or you might not. RandomVariate is inexact. You may get your measurement, but it is also possible that your sample is heavily biased or not biased enough. Moreover, seeing how close a head frequency is does not constitute a model. I fail to see how your problem fits the definition of modelling.
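To make the distinction concrete, a sketch: the quantities of the model itself (mean, standard deviation, expected gap of the head frequency from 1/2) can be computed exactly from BinomialDistribution, while RandomVariate only produces one noisy draw from that model. The seed is arbitrary.

```mathematica
dist = BinomialDistribution[100, 1/2];   (* number of heads in 100 fair tosses *)

(* Exact model quantities: no sampling involved *)
Mean[dist]                (* 50 *)
StandardDeviation[dist]   (* 5 *)

(* Exact expected absolute gap between the head frequency and 1/2 *)
meanAbsDev = Sum[PDF[dist, h]*Abs[h/100 - 1/2], {h, 0, 100}];
N[meanAbsDev]             (* about 0.04 *)

(* A simulated run is only one noisy draw from the same model *)
SeedRandom[7];
RandomVariate[dist]/100
```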

POSTED BY: Marc Vicuna

In lesson 8, "Discrete Random Variables" we use the statements

children[n_] := 1/((n + 1)^3*Zeta[3]);
distChildren = 
 ProbabilityDistribution[children[n], {n, 0, Infinity, 1}]

Does the "1" in the ProbabilityDistribution function indicate that n is a discrete variable?

POSTED BY: Joseph Smith
Posted 1 year ago

Oh!! Awesome! Great advice Jürgen!!..Thanks a lot!

POSTED BY: J.Edi Gran

Some weird behavior when trying to use a PDF

Attachments:
POSTED BY: Jürgen Kanz

Thank you for the wonderful and edifying course. However, some of the questions you ask seem ambiguous. For example, let's take a look at one of the questions posed today:


The second option was announced to be correct. But one could easily argue that the last one is also correct. Say, if you want to see how close the head frequency in tossing a coin 100 times would get to the expected value, you would use the RandomVariate function in MODELING the problem. Having potentially more than one correct answer and only one deemed correct by an automated exam is not a problem for quizzes, which you can take more than once with the same questions. However, it would be a problem for a certification exam, which you can take only once with the same questions.

In summary, the fuzziness/ambiguity of some of the questions presented in the course worries me in the context of the future certification exam.

POSTED BY: Zbigniew Kabala
Posted 1 year ago

Hi everybody! I'm having some issues while trying to plot a PDF. I see some unexpected changes at the start of the PDF plot. Am I doing something wrong in the PDF definition or when invoking the Plot function? Thanks in advance!


MODERATOR NOTE: notebook Some wierd behavoir when trying to use a PDF was moved to the attachment below and also can be viewed at https://www.wolframcloud.com/obj/29289cee-c540-4d5e-9507-dbfcc2001739 Wolfram cloud notebook.

Attachments:
POSTED BY: J.Edi Gran

Yes! It is 1/2. Let's see how you could do it considering all the notions seen.

First, get the sample space and probabilities.

{sampleSpace, probabilities} = 
 Transpose@Tally@Flatten@Table[x + y, {x, 1, 6}, {y, 1, 6}]

Then, normalize the probabilities for a sum of 1:

probabilities = probabilities/Total[probabilities]

Finally, sum all even sums:

Sum[If[EvenQ@sampleSpace[[i]], probabilities[[i]], 0], {i, 
  Length[sampleSpace]}]

Not the simplest way, but definitely a visual way to see it:

ListPlot[Transpose@{sampleSpace, probabilities}, Filling -> Axis]

This plots the probability of each sum from 2 to 12, peaking at 7.
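A shorter cross-check, as a sketch: count the ordered rolls whose sum is even directly, out of the 36 equally likely outcomes.

```mathematica
(* Ordered rolls {x, y} whose sum is even, divided by the 36 outcomes *)
evenSumProbability = Count[Tuples[Range[6], 2], {x_, y_} /; EvenQ[x + y]]/36   (* 1/2 *)
```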

POSTED BY: Marc Vicuna

Hello Mitchell,

No, these are not equivalent. They are used in very different contexts.

Conditioned (the squared-off D, entered as esc cond esc) is only used in the context of probability, to symbolize the classical P(A|B), or probability of A given B; it basically replaces the word "given".

Condition (/;) is used in pattern matching, always attached to a pattern or rule, adding a test that must be satisfied for the match to succeed (often when selecting elements of a list). It is a much more core mechanic of the language.
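Here is a small sketch contrasting the two. It uses an illustrative fair die for Conditioned and an arbitrary list for Condition:

```mathematica
(* Conditioned: a probability statement, P(x > 3 | x > 1) for a fair die *)
dieProb = Probability[x > 3 \[Conditioned] x > 1,
  x \[Distributed] DiscreteUniformDistribution[{1, 6}]]
(* Condition (/;): a test attached to a pattern, keeping only even numbers *)
evens = Cases[Range[10], n_ /; EvenQ[n]]
```

Given x > 1, there are five equally likely faces, three of which exceed 3, so the first result should be 3/5.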

POSTED BY: Marc Vicuna

Hi;

I noticed that a squared-off D (esc cond esc) was being used to indicate a condition. Mathematica also uses /; to indicate a condition. Are these two interchangeable?

Thanks,

Mitch Sandlin

POSTED BY: Mitchell Sandlin

Marc

Thanks for your response. I agree that the set of all possible outcomes should not include repeats. But is the probability of an even sum 1/2?

I'm enjoying these study group sessions!

Joe

POSTED BY: Joseph Smith
Posted 1 year ago

Awesome, got it! Thanks!

POSTED BY: J.Edi Gran

Hello J,

A valid question for sure. Let me try to be as explicit as possible.

First, why must a probability be a constant measure? Well, a quick Google search gives us that a measure is "A reference standard or sample used for the quantitative comparison of properties." In other words, for something to be a measure, it needs to be reliable, a reference. An uncertain number that has some convergence is not evidently reliable, thus not a good measure. For us to define an entire branch of mathematics, we need more reliable things.

Second, why is the assumption that every frequency converges problematic? Simply because it is too strong a statement. If I start by assuming a big and complicated statement, then any possible exception to that statement may render worthless all the theorems I’ve built on it. It's really a question of what is conventionally accepted as true.

Look at it this way. We want the most basic statements as axioms. You can prove the convergence of frequencies (the law of large numbers) from Kolmogorov's axioms. However, you cannot prove Kolmogorov's axioms from the assumption that frequencies converge. This implies Kolmogorov's axioms are more "basic".
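To make this concrete, here is a quick simulation that assumes a fair die and an arbitrary seed. It shows k/M fluctuating for small M and settling near 1/6 only as M grows. That settling is what the law of large numbers guarantees as a consequence of the axioms, rather than something we have to assume:

```mathematica
SeedRandom[1234]; (* arbitrary seed, only for reproducibility *)
rolls = RandomVariate[DiscreteUniformDistribution[{1, 6}], 10000];
(* running frequency k/M of ones after each of the M rolls *)
freq = N[Accumulate[Boole[# == 1] & /@ rolls]/Range[Length[rolls]]];
(* look at k/M after 10, 100, 1000 and 10000 rolls *)
freq[[{10, 100, 1000, 10000}]]
```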

POSTED BY: Marc Vicuna

Hello Joseph,

Let's be careful with the terms here. In lesson 3, we defined the sample space, the set of all possible outcomes. For the sum of two dice, this is {2,3,4,5,6,7,8,9,10,11,12}, as a set does not need to repeat instances. The probability of each of those events is not given at that point.

Now, if we discuss probability, you can clearly see that the event of sum 2 only happens for the dice {1,1}, but the event of sum 7 happens for {1,6}, {2,5}, {3,4}, {4,3}, {5,2}, {6,1}, in other words, many combinations. From this, you can conclude that the events {2,3,4,5,6,7,8,9,10,11,12} definitely do not all have the same probability. Thus, your equally likely assumption leading to 6/11 is wrong. You need to assign each event its correct probability.
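You can check the difference between the two assumptions by enumerating the 36 equally likely ordered rolls of two fair dice. In this sketch the variable names are arbitrary:

```mathematica
(* all 36 equally likely ordered rolls of two dice, reduced to their sums *)
sums = Total /@ Tuples[Range[6], 2];
weights = Counts[sums] (* how many rolls give each sum *)
(* incorrect: treat the 11 distinct sums as equally likely *)
naive = Count[Keys[weights], _?EvenQ]/Length[weights]
(* correct: weight each sum by the number of rolls producing it *)
weighted = Total[KeySelect[weights, EvenQ]]/Length[sums]
```

The first approach reproduces 6/11. The second counts 18 of the 36 rolls and gives 1/2.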

POSTED BY: Marc Vicuna
Posted 1 year ago

This isn't meant to be an "answer", just my interpretation, as I had the same "feeling" when I reviewed that slide.

As the explicit statement in the slide is {2,4,6,8,10,12}, and those were examples on the topic of "Sample spaces and Events", I saw that list as the "Event Definition" one would use to calculate the mentioned probability. That list is actually the Event Definition against which the EvenQ function tests the full range of outputs. And I'd humbly agree that the probability of an even sum is 1/2.


POSTED BY: J.Edi Gran
Posted 1 year ago

Hi,

In the material of Lesson 4, where it discusses the probability of the event "E" and the need for Kolmogorov's axioms, the following statement appears:

"However, k/M must be constant to be relevant, which is problematic."

Every time I "test" the 1/6 of a fair die, the number of ones k out of M trials is not constant, certainly because random events are not constant. So I don't think I get the core idea of why k/M must be constant, or why this is "problematic". From a statistical perspective, k/M is how we (or at least I) estimate (guess) probability. So, I'm confused.

POSTED BY: J.Edi Gran

The example in lesson 3 for the probability that the sum of two dice will be even comes up with a probability of 6/11. This seemed odd and a little bit of extra thought suggests that you should not delete duplicate outcomes as they are part of the sample space of outcomes. When you consider duplicate outcomes in both numerator and denominator, you get the more intuitively satisfying answer of a probability of 1/2.

Attachments:
POSTED BY: Joseph Smith

Hi Juan,

To be fair, this is extra knowledge that is only there for your information. That said, let's get into it.

The Naive Bayes assumption is that every attribute is independent of the others, given the class. This means there are only n individual probabilities to estimate for any set of n attributes, rather than one joint probability over all their combinations. A Naive Bayes classifier, a machine learning algorithm, applies this assumption to every attribute of your dataset. With enough data points, you can estimate those probabilities by counting how often each attribute value appears within each class. You can then predict the class of a new point by multiplying the estimated probabilities of its independent attributes.

Feel free to check external resources to get a good understanding of the algorithm. However, this is beyond the scope of the course, of course. It is something you would see in an Introduction to AI.
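For illustration only, here is a tiny hand-rolled sketch of the counting-and-multiplying idea. It runs on a hypothetical toy dataset, and the data and the helper names prior, likelihood, score and predict are made up for this example. It uses no smoothing, so an attribute value never seen in a class zeroes out that class. In practice you would rather use the built-in Classify[data, Method -> "NaiveBayes"]:

```mathematica
(* hypothetical toy data: {outlook, windy} -> did we play outside? *)
data = {{"sunny", "no"} -> "yes", {"sunny", "yes"} -> "no",
   {"rainy", "yes"} -> "no", {"rainy", "no"} -> "yes", {"sunny", "no"} -> "yes"};
classes = DeleteDuplicates[Values[data]];
(* prior P(class), estimated by counting *)
prior[c_] := Count[Values[data], c]/Length[data];
(* P(attribute i = v | class c), estimated by counting within the class *)
likelihood[i_, v_, c_] := Module[{rows = Keys[Select[data, Last[#] === c &]]},
   Count[rows[[All, i]], v]/Length[rows]];
(* naive assumption: attributes independent given the class, so multiply *)
score[x_, c_] := prior[c]*Product[likelihood[i, x[[i]], c], {i, Length[x]}];
predict[x_] := First[MaximalBy[classes, score[x, #] &]]
predict[{"sunny", "no"}]
```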

POSTED BY: Marc Vicuna

Hello J,

  1. Yes! This is the order of operations, as noted throughout the lesson: first parentheses, then negation (complement), then intersection, then union. It is like the infamous PEMDAS, but for set operations.

  2. Yes, the generalization applies. Think of these as analogous operations acting on different kinds of objects.

The logical Or ||, addition +, and set union all play the role of the first, "additive" operation.

The logical And &&, multiplication *, and set intersection all play the role of the second, "multiplicative" operation, which distributes over the first.

This is a common theme in mathematics: many of the basic rules you see, often in linear algebra, apply in many other contexts!
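You can check these laws on small examples using arbitrary sets and the built-in Intersection, Union and TautologyQ. Note that the analogy is not perfect. For sets, union also distributes over intersection, and this has no counterpart for + and * in ordinary arithmetic:

```mathematica
a = {1, 2, 3}; b = {2, 4}; c = {3, 4, 5}; (* arbitrary example sets *)
(* intersection distributes over union, like * over + *)
law1 = Intersection[a, Union[b, c]] === Union[Intersection[a, b], Intersection[a, c]]
(* union also distributes over intersection; no arithmetic counterpart *)
law2 = Union[a, Intersection[b, c]] === Intersection[Union[a, b], Union[a, c]]
(* the same law for logical And/Or, checked for every truth assignment *)
law3 = TautologyQ[Equivalent[p && (q || r), (p && q) || (p && r)]]
```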

POSTED BY: Marc Vicuna
Posted 1 year ago

This is really great for learning the full possibilities of the Wolfram Language. Thanks for doing it so thoroughly! You could have just pasted a picture, but you didn't. Thanks a lot!

POSTED BY: J.Edi Gran
Posted 1 year ago



Hi, two quick questions: 1. In these last two laws (please see the image), and in general, should we assume that intersection is always processed before union? 2. Does it make sense to generalize that what applies to + and * in regular arithmetic (e.g. A + B = B + A) is also valid when + is replaced by union and * by intersection (e.g. A ∪ B = B ∪ A)? Likewise, since A(B + C) = AB + AC, A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) should hold.

Thanks!

POSTED BY: J.Edi Gran

Hi Marc:

I did not understand what you are calculating in the UCILetter example in the Bayes section.

Can you point me in the right direction?

Thanks.

Hi Tianyi,

Most code for visualizations is available just by downloading the notebook and expanding the cell of interest.

For lesson 1, most of the visualisations are taken from the Wolfram Demonstrations Project, feel free to explore it and use it for your own classes.

Here is the code for the Normal Convergence:

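Manipulate[
 Show[
  (* histogram of 200 sample means, each the mean of n uniform draws on [0, 100] *)
  Histogram[
   Table[Mean[RandomReal[{0, 100}, n]], {200}], {35, 65, 1}],
  (* normal approximation: mean 50, variance 100^2/(12 n),
     scaled by 200 samples times a bin width of 1 *)
  Plot[200*PDF[NormalDistribution[50, Sqrt[100^2/(12 n)]], x], {x, 35, 65}],
  ImageSize -> {400, 200}
  ], {{n, 20, "Sample size"}, 10, 200, 1, Appearance -> "Labeled"}]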

In the lesson, I replaced Manipulate with Animate to turn it into an animation.
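As a side check on the width of the normal curve, this sketch compares the theoretical standard deviation of the mean of n draws from RandomReal[{0, 100}] with a simulated one. The sample size of 20 and the seed are arbitrary choices:

```mathematica
(* variance of a single draw from RandomReal[{0, 100}], i.e. 100^2/12 *)
single = Variance[UniformDistribution[{0, 100}]]
sampleSize = 20;
(* the mean of sampleSize independent draws has variance single/sampleSize *)
theorySD = Sqrt[single/sampleSize];
SeedRandom[7]; (* arbitrary seed, only for reproducibility *)
simSD = StandardDeviation[Table[Mean[RandomReal[{0, 100}, sampleSize]], {20000}]];
N[{theorySD, simSD}]
```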

POSTED BY: Marc Vicuna
Posted 1 year ago

Hi Marc,

I am a high school math teacher. The lesson materials are wonderful for use in our classroom. There are some simulations in the day 1 notebook. Is there any place where I can download their source code, like the "normal convergence" code you shared?

POSTED BY: Tianyi Hu

Hi John,

You spotted the first error! Indeed, this should be Binomial[4,3]*Binomial[5,2]! You are totally right! The answer is 40.
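Here is a brute-force check of that count, using made-up labels for the athletes:

```mathematica
swiss = {"S1", "S2", "S3", "S4"}; (* hypothetical athlete labels *)
ethiopians = {"E1", "E2", "E3", "E4", "E5"};
(* every team made of exactly 3 Swiss and 2 Ethiopian athletes *)
teams = Join @@@ Tuples[{Subsets[swiss, {3}], Subsets[ethiopians, {2}]}];
{Length[teams], Binomial[4, 3]*Binomial[5, 2]}
```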

Thank you for noticing!

Sorry I can't give you a physical medal, but this will have to do. The binomial distribution thanks you for noticing binomial mistakes!

POSTED BY: Marc Vicuna
Posted 1 year ago

Same question here, lol. Should the answer be 40?

POSTED BY: Mr. Khushu
Posted 1 year ago

Marc,

I downloaded your class notebooks from the site today.

Slide 7 of lesson 2 "Consider that the 4 Swiss and 5 Ethiopian athletes want to form a team of 5 to represent them in competition. How many different teams are possible given that it must include 3 Swiss and 2 Ethiopians?" Shouldn't the answer be Binomial[4,3]*Binomial[5,2]?

John

POSTED BY: John Burke

Thank you, Peter, for your question!

The probabilistic statements of quantum mechanics are a useful application of probability theory. They are not covered in this course, not because of their difficulty, but because using them requires background knowledge of quantum physics. However, the probability of radioactive decay is mentioned in Lesson 17 and its exercises, as it is a classic example of the exponential distribution.
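As a small illustration of the decay example, this sketch assumes an illustrative half-life of 5730 years (roughly that of carbon-14). It models the decay time of a single atom as exponential with rate Log[2]/halfLife:

```mathematica
halfLife = 5730; (* illustrative half-life in years *)
decay = ExponentialDistribution[Log[2]/halfLife]; (* rate = ln 2 / half-life *)
(* probability that a given atom decays within one half-life (mathematically 1/2) *)
pOne = CDF[decay, halfLife]
(* probability that it survives two half-lives (mathematically 1/4) *)
pSurvive = SurvivalFunction[decay, 2 halfLife]
```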

Yes, as said at the beginning, this covers the material of typical probability courses. However, statistics is not covered; you will need your other courses for that.

As for the probabilities of throwing two dice, there is far more knowledge needed to formulate this than meets the eye. As said in the introduction, the probabilities of dice are explored throughout the lessons. I believe if you seek the exact probabilistic formulation of this problem, you can look at Lesson 22.

Indeed! FactorialPower can be used, you're welcome to do that. However, the goal of Lesson 2 was to get a good understanding of combinatorics, so it seemed more intuitive to just do it explicitly.

POSTED BY: Marc Vicuna

Emails have just now been sent to everyone who is registered for the Daily Study Group. Recordings can be accessed from the webinar series landing page.

POSTED BY: Jamie Peterson

I was told by one of the Q&A moderators that the session recording will be posted. How can we access it to review it?

POSTED BY: Amin Cheikhi

I am wondering if this course will touch on any aspects of probabilistic statements in quantum mechanics, such as "the probability that the electron is here is 50%". Another application would be the probability that a radioactive isotope decays within a given time, which is determined by its half-life.
There are some examples of using Mathematica's probability functions in quantum physics at https://resources.wolframcloud.com/PacletRepository/resources/Wolfram/QuantumFramework/tutorial/ExploringFundamentalsOfQuantumTheory.html.

This course will fit well with my university courses Applied Probability and Statistics (STA 345), Probability and Statistics I (STA 445) and Probability and Statistics II (446).

I think applying probability theory to Catan is interesting. For example, Catan uses two six-sided dice, and the most common roll is 7. Six of the 36 equally likely outcomes sum to 7, and 7 is also the expected sum, since the expectation of a single fair die is 3.5 and 3.5 + 3.5 = 7. A roll of 7 activates the robber. If you hold more than the discard limit, you must then discard half of your resource and commodity cards, rounded down.
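The claim about 7 can be checked symbolically with two independent fair dice, plus a brute-force enumeration. In this sketch the variable names are arbitrary:

```mathematica
(* two independent fair six-sided dice *)
dice = {x \[Distributed] DiscreteUniformDistribution[{1, 6}],
   y \[Distributed] DiscreteUniformDistribution[{1, 6}]};
p7 = Probability[x + y == 7, dice]
(* expected sum, 3.5 + 3.5 *)
mean = Expectation[x + y, dice]
(* brute-force check: the most common sum over all 36 ordered rolls *)
mode = Commonest[Total /@ Tuples[Range[6], 2]]
```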

I would like to mention that you can count the outcomes in the sample space with FactorialPower. For example, take the example from the second presentation: "Imagine that 4 Swiss and 5 Ethiopian athletes compete for the 200m sprint. How many different top-three winner rankings are possible?" FactorialPower[9, 3] returns 504, which is the same as 9!/(9 - 3)!.
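Here is a quick cross-check of three ways of counting the rankings, including brute-force enumeration with Permutations:

```mathematica
(* ordered top-three rankings from 9 sprinters, counted three ways *)
counts = {FactorialPower[9, 3], 9!/(9 - 3)!, Length[Permutations[Range[9], {3}]]}
```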

POSTED BY: Peter Burbery

Reminder that our upcoming Daily Study Group provides a preview of the new interactive course, Introduction to Probability. The Study Group meets daily over two weeks, Monday through Friday, for an hour online each day, starting Monday. Take advantage of this opportunity to prepare for probability and statistics related coursework and research in natural science, engineering, finance, medicine, data science and other fields! You can sign up here.

POSTED BY: Jamie Peterson

The study group starts next week, on February 27th.

If you want to take full advantage of this course's material and get a practical and deep understanding of probability, don't forget to click on the REGISTER HERE link to get registered in this course!

I'm looking forward to your participation and feedback!

POSTED BY: Marc Vicuna

This study group will be based on the upcoming Introduction to Probability course on Wolfram U.

Marc Vicuna is the instructor for the study group as well as the Wolfram U course and is an outstanding young teacher and data scientist.

I strongly recommend that you join the study group and immerse yourself in probabilistic thinking for two weeks!

POSTED BY: Devendra Kapadia