[WSG24] Daily Study Group: Introduction to Machine Learning

A Wolfram U Daily Study Group on Machine Learning begins on June 10, 2024.

We will study and review the first six chapters of the book "Introduction to Machine Learning" by Etienne Bernard. A Wolfram U instructor will guide each session by summarizing the chapter, walking through code examples, polling the group to review key concepts, working on selected exercises and answering questions. Participants are encouraged to read the book chapters before coming to each session.

Whether you are new to machine learning or are looking to further your understanding, this book is a great place to start. It weaves reproducible coding examples into explanatory text to show what machine learning is, how it can be applied and how it works. The book itself begins with a brief introduction to Wolfram Language, the programming language used for the examples throughout the book. From there, students are introduced to key concepts before exploring common methods and paradigms such as classification, regression and clustering. More advanced concepts from later chapters, such as deep learning methods, will be left for a future Study Group. This book and the corresponding Study Group are sure to benefit anyone curious about the fascinating field of machine learning.

Please feel free to use this thread to collaborate and share ideas, materials and links to other resources with fellow learners.

June 10th-14th, 11am-12pm CT (4-5pm GMT)

REGISTER HERE

29 Replies

Hi, regarding the Clustering chapter, I got most of it but have been puzzling over the use of the NotablePerson model to extract probabilities to feed into (I think) the KL divergence parameters used in the FindClusters command.

It just seems to come out of nowhere, and is not explained much in the text.

Is this something we can draw some reusable lessons from?

Thanks

POSTED BY: Brendan McMahon

Additional material discussed during the session on clustering today: https://www.wolframcloud.com/obj/online-courses/multiparadigm-data-science/cluster-analysis.html

To save the "unsaveable" notebook with your edits, select Format > Option Inspector from the menu. Then, with "Selected Notebook" chosen in the drop-down at the top left, search for "Saveable" and set it to True.
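The same setting can probably also be changed programmatically. A minimal sketch, assuming the Option Inspector entry corresponds to the notebook-level `Saveable` option and that the cell is evaluated inside the notebook you want to save:

```wolfram
(* Assumption: the Option Inspector entry maps to the notebook option Saveable *)
(* Evaluate this inside the notebook that refuses to save *)
SetOptions[EvaluationNotebook[], Saveable -> True]
```

After evaluating it, File > Save As should let you keep your own copy with the edits.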

Hi Abrita, what tools are useful for balancing data? My data is split 10% - 90% between the 2 categories of the predictor variable.

POSTED BY: Lina M. Ruiz G.

You can try:

  • Resampling your data - oversample the rare class, undersample the common class
  • Use stratified k-fold cross-validation, so that each fold keeps the same class proportions
  • Use a different performance metric like Recall or Precision rather than accuracy
  • Cluster the common class and then use a representative sample from each cluster in the final classification to reduce the number of samples from the common class.
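The first and third suggestions can be sketched in a few lines. This is only an illustration on synthetic data: the helper `makeData`, the class names "common" and "rare", and the 90/10 split are made up for the example. The rare class is oversampled in the training set only, and recall is then measured per class on untouched test data.

```wolfram
SeedRandom[1234];

(* Hypothetical helper: synthetic two-feature data with an imbalanced class split *)
makeData[nCommon_, nRare_] := Join[
   Table[RandomReal[{0, 1}, 2] -> "common", nCommon],
   Table[RandomReal[{0.5, 1.5}, 2] -> "rare", nRare]];

train = makeData[90, 10];
test = makeData[45, 5];

(* Group training examples by class label *)
byClass = GroupBy[train, Last];
target = Max[Length /@ Values[byClass]];

(* Oversample: pad each smaller class with random repeats up to the largest class size *)
balanced = RandomSample[Join @@ Map[
     Join[#, RandomChoice[#, target - Length[#]]] &,
     Values[byClass]]];

Counts[Last /@ balanced]

(* Train on the balanced set, then look at per-class recall instead of accuracy *)
c = Classify[balanced];
ClassifierMeasurements[c, test, "Recall"]
```

Only the training set is resampled here; the test set keeps its original class proportions, so the measurements reflect the data the model would actually see.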

Hi Abrita, I have a quick question: what model could I use to identify how much a variable contributes to the classification? My predictor variable is binary and I have 11 explanatory numerical variables. Thanks

POSTED BY: Lina M. Ruiz G.

You can use "SHAPValues" to figure out the contribution of a feature. It is available as a property of both PredictorFunction and ClassifierFunction.

 c = 
 Classify[{<|"Age" -> 32, "Height" -> 160|> -> 
    "Female", <|"Height" -> 183, "Age" -> 41|> -> 
    "Male", <|"Height" -> 123|> -> 
    "Female", <|"Height" -> 175, "Age" -> 21|> -> 
    "Male", <|"Age" -> 11|> -> 
    "Male", <|"Age" -> 52, "Height" -> 164|> -> "Female"}]

In[14]:= c[<|"Age" -> 12, "Height" -> 120|>, "SHAPValues"]
Out[14]= <|"Female" -> <|"Age" -> 0.915746, "Height" -> 26.9166|>, 
 "Male" -> <|"Age" -> 1.09201, "Height" -> 0.0371519|>|>

The way SHAP values are computed is basically to see how much results change if different features in the data are dropped.
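For reference, SHAP builds on the Shapley value from game theory, which makes the "drop features and see what changes" idea precise. The notation below is introduced here for illustration: $F$ is the set of all features, $S$ a subset that excludes feature $i$, and $f_S(x)$ the model output when only the features in $S$ are known. How Wolfram Language estimates these quantities internally is not stated in this thread.

```latex
$$
\phi_i(x) \;=\; \sum_{S \subseteq F \setminus \{i\}}
\frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
\,\Bigl[ f_{S \cup \{i\}}(x) - f_{S}(x) \Bigr]
$$
```

Each term compares the output with and without feature $i$, and the weights average that difference over all orders in which the features could be added.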

Posted 1 year ago

Nice! What if I want some global values? For example, when I used "LogitModelFit" I got a "ParameterTable" with the estimates for each feature. Is it possible to get something like that for a classifier?

POSTED BY: Updating Name

You can use Information with "MethodParameters" option to get the parameter values for the model.
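Another way to get one global number per feature is to average the absolute SHAP values over a set of examples. The sketch below is an illustration, not a built-in "ParameterTable" equivalent: the toy training set and the names `training`, `shap` and `globalImportance` are made up for the example, and the resulting values are not regression coefficients. The `Information[c, "Properties"]` call lists the properties that are actually available for your classifier, which is a quick way to check for the one mentioned above.

```wolfram
(* Toy training set with complete feature values (made up for illustration) *)
training = {
   <|"Age" -> 32, "Height" -> 160|> -> "Female",
   <|"Age" -> 41, "Height" -> 183|> -> "Male",
   <|"Age" -> 27, "Height" -> 158|> -> "Female",
   <|"Age" -> 21, "Height" -> 175|> -> "Male",
   <|"Age" -> 35, "Height" -> 180|> -> "Male",
   <|"Age" -> 52, "Height" -> 164|> -> "Female"};

c = Classify[training];

(* See which properties this classifier exposes *)
Information[c, "Properties"]

(* Per-example SHAP values for one class, using the property shown earlier in the thread *)
shap = c[#, "SHAPValues"]["Female"] & /@ Keys[training];

(* Global importance: mean absolute SHAP value per feature *)
globalImportance = Merge[shap, Mean[Abs[#]] &]
```

A larger mean absolute value suggests the feature moves the prediction more on average across these examples.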

Sorry for mistakenly referring to "Clustering" as Chapter 5 from the book. It is actually Chapter 6. Tomorrow, Thursday June 13th, we will be covering Chapter 6: Clustering.

Posted 1 year ago
POSTED BY: Zachary Wertz

Wolfram Language also has built-in support for time series analysis that may be of interest for this dataset: https://reference.wolfram.com/language/guide/TimeSeries.html

When I try to run the code in the downloaded version of the book, there is a dependency on an external tool (see screenshot). How do I resolve this?

POSTED BY: Fredrik Doberl

Sorry about that. For these issues, I think it would be better to run the code from the code-only notebooks (https://www.wolfram.com/language/introduction-machine-learning/inc/Bernard-MachineLearning-Code-NotebookEdition.zip). The notebooks containing the entire lesson (text and code) seem to have the issue you highlighted in your screenshot.

Thanks for a quick answer. I followed the link, downloaded the .zip file. It has the same code in it.

POSTED BY: Fredrik Doberl
Posted 1 year ago

I see that now. Here is the cleaned up code that should work:

Show[Plot[
  QuantityMagnitude@PDF[dist, Quantity[x, "Feet"]], {x, 58, 88}, 
  Filling -> Bottom, PlotRange -> {{0, 140}, {0, Automatic}}, 
  PlotStyle -> Gray, Frame -> {True, True, False, False}, 
  PlotLabel -> "Predictive distribution for 23 mi/h", 
  PlotLegends -> SwatchLegend[{"68% probability interval"}], 
  FrameLabel -> {"distance (ft)", "probability density"}, 
  ImageSize -> 360], 
 Plot[QuantityMagnitude@PDF[dist, Quantity[x, "Feet"]], {x, 0, 140}, 
  PlotLegends -> {"probability density"}, Filling -> Bottom, 
  PlotRange -> {0, Automatic}]]
POSTED BY: Updating Name

Hi

It looks like we run into similar issues with tomorrow's session on Clustering.

For example, the News Aggregator example returns errors; please see below.

Russell.

topics = {"Tennis", "SpaceX", "COVID-19 pandemic"};
pages = WikipediaData /@ topics;

articles = Partition[TextSentences[#], 10] & /@ articles;
articles = Map[StringRiffle, articles, {2}];
articles = Flatten[Thread /@ Thread[articles -> topics]];
articletopics = Values[articles];
articles = Keys[articles];
Length[articles]
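One thing that stands out in the snippet as posted: `articles` is used on the right-hand side of its first assignment before it has been defined, while `pages` is never used. A possible fix, assuming `pages` was the intended input (whether this is the error shown in the screenshot cannot be confirmed from the thread):

```wolfram
topics = {"Tennis", "SpaceX", "COVID-19 pandemic"};
pages = WikipediaData /@ topics;  (* requires an internet connection *)

(* Split each page into chunks of 10 sentences; assumes pages was the intended input *)
articles = Partition[TextSentences[#], 10] & /@ pages;

(* Join the sentences of each chunk back into a single string *)
articles = Map[StringRiffle, articles, {2}];

(* Label every chunk with its topic and flatten into one list of rules *)
articles = Flatten[Thread /@ Thread[articles -> topics]];

articletopics = Values[articles];
articles = Keys[articles];
Length[articles]
```

If errors remain after this change, running the lines one at a time should show which step fails.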


POSTED BY: Carl Hahn
POSTED BY: Byron Alexander

Thanks for your response. Is each sample not associated with a one-hot vector class? Would there then not be the same number of samples and classes? Yes, there are only two inputs in this example. Could you please advise on how Predict could be used in this case? Would it output a list of probabilities associated with the labels (one-hot vectors)?

POSTED BY: Byron Alexander
Posted 1 year ago
POSTED BY: Zachary Wertz

I think statistics is a mathematical language for describing and modeling things. What you use those descriptions and models for, that's engineering and science. Sometimes rhetoric and propaganda. It is sometimes said there are "liars", "damned liars", and "statisticians". It lets you be wrong with confidence. And yes, ChatGPT does tell lies. On the other hand, if statistics weren't so effective at making truthful descriptions, you would not be able to read these words on paper nor take effective medications.

POSTED BY: Carl Hahn
Posted 1 year ago

Carl,

I like your reply! That's a fantastic point you make. The math or model can be wrong but it also can be useful and accurate.

POSTED BY: Zachary Wertz
POSTED BY: Rohit Namjoshi
Posted 1 year ago

I found it amusing that one of the poll answers proposed ML as an alternative to traditional programming, which was not the expected answer. In that sense, I do agree the correct answer was the most agreeable one.

That said, some of the next poll questions I thought demonstrated the point of the first, which is to say there are many, many ways to use either to solve a problem. In some of those cases, such as writing a procedure, there's a clearly better choice and a clearly more cumbersome one.

I'd put it more like this. Fundamentally, traditional programming languages "decide" what to do based on some collection of rules. They express a primitive aspect of a traditional computer, the logic gate: something is true or false. Complexity in decision-making builds up from there.

Machine learning is intended as a replacement for traditional programming, I argue, but in a particular case. To wit: when the amount of data available is Big, and the values of its features show some propensity for changing (or evolving) over time, machine learning is the more likely choice.

Traditional programming makes decisions through a pathway of rules. Machine learning operates on assigning meaning to data (e.g., labelling, feature reduction), or associating data (e.g., classification), provided there is enough data to determine statistical significance, and then accuracy of its inferences.

I argue you can replace traditional programming with machine learning for the same reasons that, in hindsight, we've been using traditional programming in lieu of machine learning. The only reason we didn't do so any sooner than we did was a lack of resources. Now that we have enough to do both, we can freely waste our time committing to the wrong approach, as we often do, or select the more plausible approach.

If you have any experience watching customers choose ML for a project because ML was the hot new thing, I imagine you can see my point of view.

POSTED BY: Updating Name

As with any tool, there are classes of problems that benefit from machine learning and other AI tools. For example automated image recognition of manufacturing defects, or Star Trek's universal translator. (That's got to be right around the corner: Put an earbud in, and hear a person speak in your own language while cancelling the sound of their native language.)

But for classes of problems that already have robust solutions, maybe not so much. I did just a quick review of some literature on error-correcting codes, and it's a pretty advanced field. Folks using AI-generated error-correcting codecs were, at least 4 or 5 years ago, getting similar results with both techniques.

Comparing results between methods would be good at providing insight into how deep learning is actually solving problems, and may even provide insight into how organic neuro-nets actually work. Someday we may be (bio)engineering those as well.

POSTED BY: Carl Hahn