[WSG24] Daily Study Group: Introduction to Machine Learning

A Wolfram U Daily Study Group on Machine Learning begins on June 10, 2024.

We will study and review the first six chapters of the book "Introduction to Machine Learning" by Etienne Bernard. A Wolfram U instructor will guide each session by summarizing the chapter, walking through code examples, polling the group to review key concepts, working on selected exercises and answering questions. Participants are encouraged to read the book chapters before coming to each session.

Whether you are new to machine learning or are looking to further your understanding, this book is a great place to start. It weaves reproducible coding examples into explanatory text to show what machine learning is, how it can be applied and how it works. The book itself begins with a brief introduction to Wolfram Language, the programming language used for the examples throughout the book. From there, students are introduced to key concepts before exploring common methods and paradigms such as classification, regression and clustering. More advanced concepts from later chapters, such as deep learning methods, will be left for a future Study Group. This book and the corresponding Study Group are sure to benefit anyone curious about the fascinating field of machine learning.

Please feel free to use this thread to collaborate and share ideas, materials and links to other resources with fellow learners.

June 10th-14th, 11am-12pm CT (4-5pm GMT)

REGISTER HERE

29 Replies
Posted 5 months ago

I found it amusing that one of the poll answers proposed ML as an alternative to traditional programming, which was not the expected answer. In that sense, I do agree the correct answer was the most agreeable one.

That said, some of the next poll questions I thought demonstrated the point of the first, which is to say there are many, many ways to use either to solve a problem. In some of those cases, such as writing a procedure, there's a clearly better choice and a clearly more cumbersome one.

I'd put it more like this. Fundamentally, traditional programming languages "decide" what to do based on some collection of rules. They express a primitive aspect of a traditional computer, the logic gate: something is true or false. Complexity in decision-making builds up from there.

Machine learning is intended as a replacement for traditional programming, I argue, but in a particular case. To wit: when the amount of data available is Big, and the values of its features show some propensity for changing (or evolving) over time, machine learning is the more likely choice.

Traditional programming makes decisions through a pathway of rules. Machine learning operates on assigning meaning to data (e.g., labelling, feature reduction), or associating data (e.g., classification), provided there is enough data to determine statistical significance, and then accuracy of its inferences.

I argue you can replace traditional programming with machine learning for the same reasons that, in hindsight, we've been using traditional programming in lieu of machine learning. The only reason we didn't do so any sooner than we did was a lack of resources. Now that we have enough to do both, we can freely waste our time committing to the wrong approach, as we often do, or select the more plausible approach.

If you have any experience watching customers choose ML for a project because ML was the hot new thing, I imagine you can see my point of view.

POSTED BY: Updating Name
Posted 5 months ago

This is awesome!

Often I have opened a programming book and right off I'm in the unknown. Now it seems I can start with understanding, then work towards the parts. There are a lot of parts within ML and statistics, so the learning does not stop here. A broad overview seems to be a good start.

Machine learning vs. statistics: what is the difference between the two? That's something I learned right away. It's a way to start understanding without doing any math work myself, and that is how I am trying to change my paradigm of thought. With Mathematica you do not need to do the math yourself, so it is a different way of learning compared to a step-by-step ladder. I have to be careful, as Mathematica shows levels in many places. I can go to a level of understanding or a level of math work. Am I understanding the same thing at each level? Maybe one level can look the same as another. I think of a concept or understanding as being zoomed in or zoomed out: zoom in enough and the picture does not look like anything you see at a different level.

ML vs. statistics: what is a level above both? I propose you could call that level "data science": the process of getting data, wrangling it and turning it into something meaningful. Is the goal to make predictions? ML claims that as its sole purpose. Is it to make sense of the data? Statistics aims to make sense of the data in a logical order or programmatically. With those definitions you can say that "data science" is a level above both, just as AI is a level above everything, with the models being part of the AI or a level below somewhere. Understanding where you are and what you want to do can help you work towards a goal, whether you work with ML, statistics and so on. I have never finished a formal class in statistics myself, so this definition helps me understand what statistics or ML is.

Here's a question for people experienced in statistics: would you agree or disagree that statistics is not for making predictions?

POSTED BY: Zachary Wertz
Posted 5 months ago

Firstly, thanks for a very interesting series of lectures and discussions. I have a specific neural network example that I would like to discuss, and I need some assistance in finding the best way to approach it. If it is not appropriate to discuss in depth here, please advise whether we can discuss it off the forum. I am looking at a supervised learning classification example of a rotated qubit. Basically, an interval [0, pi] of rotation is discretized, and measurements of spin-up or spin-down are associated with each chosen rotation angle in this interval. The rotation angle labels are represented as one-hot vectors (that is, of the form {1,0,0,0...}). Hence this is a classification problem. I have working code (attached), but I think it needs refinement (or a different approach). I am new to using Mathematica for machine learning applications.
My training and validation data sets are of the form:

{{975, 25} -> {1, 0, 0, 0, 0, 0, 0, 0, 0, 0},
 {919, 81} -> {0, 1, 0, 0, 0, 0, 0, 0, 0, 0},
 {801, 199} -> {0, 0, 1, 0, 0, 0, 0, 0, 0, 0},
 {652, 348} -> {0, 0, 0, 1, 0, 0, 0, 0, 0, 0},
 {505, 495} -> {0, 0, 0, 0, 1, 0, 0, 0, 0, 0},
 {357, 643} -> {0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
 {190, 810} -> {0, 0, 0, 0, 0, 0, 1, 0, 0, 0},
 {108, 892} -> {0, 0, 0, 0, 0, 0, 0, 1, 0, 0},
 {15, 985} -> {0, 0, 0, 0, 0, 0, 0, 0, 1, 0},
 {0, 1000} -> {0, 0, 0, 0, 0, 0, 0, 0, 0, 1}}

The tuples on the left are the counts of spin-up and spin-down outcomes recorded for each angle. I simulate 1000 measurements for each rotation angle (10 rotation angles in the above training set, but 40 in the attached code) and associate the outcomes of spin-up or spin-down with the rotation angle label (a one-hot vector). I then test the code on {1,0} and {0,1}, that is, one spin-up and one spin-down, and plot the associated angle probabilities. If you have a chance, please have a look at my code and at how I trained on the data in the attached code. The output I am interested in is an accurate probability distribution over the bins (theta angles) given {1,0} and {0,1} after training. Any insights and tips for improving the accuracy of the neural network for this classification example would be appreciated.

Attachments:
POSTED BY: Byron Alexander

I think statistics is a mathematical language for describing and modeling things. What you use those descriptions and models for, that's engineering and science. Sometimes rhetoric and propaganda. It is sometimes said there are "liars", "damned liars", and "statisticians". It lets you be wrong with confidence. And yes, ChatGPT does tell lies. On the other hand, if statistics weren't so effective at making truthful descriptions, you would not be able to read these words on paper nor take effective medications.

POSTED BY: Carl Hahn

When I try to run the code in the downloaded version of the book, there is a dependency on an external tool (see screenshot). How do I resolve this?

POSTED BY: Fredrik Doberl

Sorry about that. For these issues I think it would be better to run the code in the notebooks that contain just the code, available here: https://www.wolfram.com/language/introduction-machine-learning/inc/Bernard-MachineLearning-Code-NotebookEdition.zip The notebooks containing the entire lesson (text and code) seem to have the issue you highlighted in your screenshot!

Thanks for a quick answer. I followed the link, downloaded the .zip file. It has the same code in it.

POSTED BY: Fredrik Doberl

Hi

It looks like we run into similar issues with tomorrow's session on Clustering.

For example, the News Aggregator example returns errors; please see below.

Russell.

topics = {"Tennis", "SpaceX", "COVID-19 pandemic"};
pages = WikipediaData /@ topics;

articles = Partition[TextSentences[#], 10] & /@ articles;
articles = Map[StringRiffle, articles, {2}];
articles = Flatten[Thread /@ Thread[articles -> topics]];
articletopics = Values[articles];
articles = Keys[articles];
Length[articles]


Posted 5 months ago

Carl,

I like your reply! That's a fantastic point you make. The math or model can be wrong but it also can be useful and accurate.

POSTED BY: Zachary Wertz
Posted 5 months ago

https://www.eia.gov/petroleum/weekly/

There's some data here that I could use, though I have not attempted to use it yet. As a challenge goal for myself, before I spend time wrangling the data: what ML predictions would I want to look for?

Should we use a Classify or Regression model of projected fuel prices for the next 5 years? (That is more relevant to us, because knowing this will help us prepare for prices to either increase or fluctuate.) Do we want to look further out, to 10 years? 20 years?

Today's lecture would suggest regression, since we are looking at a numerical model that gives us a price of fuel.

What ML methodologies would we use to build a model that tries to predict future gas prices? Some of this data goes back to the 1980s.

Another question: are fuel prices historically in line with inflation? Would we want to look for comparison data and make another ML model, or make a simple assumption about the percentage of inflation?

POSTED BY: Zachary Wertz

Sorry for mistakenly referring to "Clustering" as Chapter 5 from the book. It is actually Chapter 6. Tomorrow, Thursday June 13th, we will be covering Chapter 6: Clustering.

As with any tool, there are classes of problems that benefit from machine learning and other AI tools. For example, automated image recognition of manufacturing defects, or Star Trek's universal translator. (That's got to be right around the corner: put an earbud in, and hear a person speak in your own language while cancelling the sound of their native language.)

But for classes of problems that already have robust solutions, maybe not so much. I did just a quick review of some literature on error-correcting codes, which is a pretty advanced field. Folks using AI-generated error-correcting CODECs were, at least 4 or 5 years ago, getting similar results with both techniques.

Comparing results between methods would be good at providing insight into how deep learning is actually solving problems, and may even provide insight into how organic neuro-nets actually work. Someday we may be (bio)engineering those as well.

POSTED BY: Carl Hahn

There does seem to be a bug in the "News Aggregator" example code. Try the following:

articles = Partition[TextSentences[#], 10] & /@ pages;

instead of

articles = Partition[TextSentences[#], 10] & /@ articles;

I was able to retrieve 162 articles.
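
For convenience, here is the whole News Aggregator snippet with that one-line fix applied, put together from the code quoted earlier in this thread (not checked against the book's own notebook). It needs network access for WikipediaData, and the article count can differ from 162 as the Wikipedia pages change.

```wolfram
(* Fetch the plain text of three Wikipedia pages (requires network access) *)
topics = {"Tennis", "SpaceX", "COVID-19 pandemic"};
pages = WikipediaData /@ topics;

(* Split each page into sentences and group them into blocks of 10.
   The fix: map over pages, not over the still-undefined articles *)
articles = Partition[TextSentences[#], 10] & /@ pages;

(* Join each block of 10 sentences into a single string *)
articles = Map[StringRiffle, articles, {2}];

(* Pair every article with the topic of the page it came from *)
articles = Flatten[Thread /@ Thread[articles -> topics]];
articletopics = Values[articles];
articles = Keys[articles];
Length[articles]
```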

Posted 5 months ago

I see that now. Here is the cleaned up code that should work:

Show[Plot[
  QuantityMagnitude@PDF[dist, Quantity[x, "Feet"]], {x, 58, 88}, 
  Filling -> Bottom, PlotRange -> {{0, 140}, {0, Automatic}}, 
  PlotStyle -> Gray, Frame -> {True, True, False, False}, 
  PlotLabel -> "Predictive distribution for 23 mi/h", 
  PlotLegends -> SwatchLegend[{"68% probability interval"}], 
  FrameLabel -> {"distance (ft)", "probability density"}, 
  ImageSize -> 360], 
 Plot[QuantityMagnitude@PDF[dist, Quantity[x, "Feet"]], {x, 0, 140}, 
  PlotLegends -> {"probability density"}, Filling -> Bottom, 
  PlotRange -> {0, Automatic}]]
POSTED BY: Updating Name

The DNA Hierarchical Clustering example isn't working right either.

I got a message (after downloading for a long time):

EntityValue::nodat: Unable to download data. Some or all results may be missing.

Some of the functions later in the example worked but Clustering Tree and Dendrogram did not :-(

POSTED BY: Carl Hahn

Hi Abrita, I have a quick question: what model could I use to identify how much each variable contributes to the classification? My predictor variable is binary and I have 11 explanatory numerical variables. Thanks

POSTED BY: Lina M. Ruiz G.

Hi Abrita, what tools are useful to balance data? My data is split 10% - 90% between the 2 categories of the predictor variable.

POSTED BY: Lina M. Ruiz G.

You can use SHAPValues to figure out the contribution of a feature. It is available as a property for both PredictorFunction and ClassifierFunction.

c = Classify[{
    <|"Age" -> 32, "Height" -> 160|> -> "Female",
    <|"Height" -> 183, "Age" -> 41|> -> "Male",
    <|"Height" -> 123|> -> "Female",
    <|"Height" -> 175, "Age" -> 21|> -> "Male",
    <|"Age" -> 11|> -> "Male",
    <|"Age" -> 52, "Height" -> 164|> -> "Female"}]

In[14]:= c[<|"Age" -> 12, "Height" -> 120|>, "SHAPValues"]
Out[14]= <|"Female" -> <|"Age" -> 0.915746, "Height" -> 26.9166|>, 
 "Male" -> <|"Age" -> 1.09201, "Height" -> 0.0371519|>|>

The way SHAP values are computed is basically to see how much results change if different features in the data are dropped.
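
For reference, "dropping features and seeing how much the result changes" is usually written as the textbook Shapley value below. The symbols are assumptions introduced here for illustration: F is the set of all features, S is a subset that leaves out feature i, and f(S) is the model output when only the features in S are used. How Wolfram Language estimates or scales its "SHAPValues" is not covered in this thread, so treat this as the general definition rather than a description of that implementation.

```latex
\phi_i \;=\; \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}\,
  \Bigl[ f\bigl(S \cup \{i\}\bigr) - f(S) \Bigr]
```

Each term is the change in output from adding feature i to the subset S, and the factorial weight averages that change over all orders in which the features could be added.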

To balance your data, you can try:

  • Resampling your data: oversample the rare class or undersample the common class
  • Using k-fold cross-validation, making sure each fold has balanced data
  • Using a different performance metric, such as recall or precision, rather than accuracy
  • Clustering the common class and then using a representative sample from each cluster in the final classification, to reduce the number of samples from the common class
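
As a rough illustration of the first and third suggestions, here is a minimal sketch of oversampling followed by a recall/precision check. The data is randomly generated and purely hypothetical (90 "common" and 10 "rare" examples), and the names train, test, groups, target and balanced are made up for this example.

```wolfram
SeedRandom[1];

(* Hypothetical imbalanced training data: 90 "common" and 10 "rare" examples *)
train = Join[
   Table[RandomReal[{0, 1}, 2] -> "common", 90],
   Table[RandomReal[{0.5, 1.5}, 2] -> "rare", 10]];

(* Hypothetical held-out test data, generated separately so that
   no duplicated training example can leak into it *)
test = Join[
   Table[RandomReal[{0, 1}, 2] -> "common", 45],
   Table[RandomReal[{0.5, 1.5}, 2] -> "rare", 5]];

(* Oversample: pad each class up to the size of the largest class
   by drawing extra examples with replacement *)
groups = Values[GroupBy[train, Last]];
target = Max[Length /@ groups];
balanced = Join @@ Map[
    If[Length[#] < target,
      Join[#, RandomChoice[#, target - Length[#]]],
      #] &,
    groups];
Counts[Last /@ balanced]

(* Train on the balanced set, then look at per-class recall and precision *)
c = Classify[balanced];
cm = ClassifierMeasurements[c, test];
cm["Recall"]
cm["Precision"]
```

Classify also has a ClassPriors option, which may be worth trying as an alternative to resampling; that is a suggestion to experiment with, not something covered in the session.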

To save the "unsaveable" notebook with your edits, select Format > Option Inspector from the menu. Then, with "Selected Notebook" selected in the drop-down at top left, search for "Saveable" and set it to True.

Posted 5 months ago

Nice! What if I want some global values? For example, when I used "LogitModelFit" I got a "ParameterTable" with the estimates for each feature. Is it possible to have something like that for a classifier?

POSTED BY: Updating Name

Additional material discussed during the session on clustering today: https://www.wolframcloud.com/obj/online-courses/multiparadigm-data-science/cluster-analysis.html

Posted 5 months ago

"All models are wrong, but some are useful". - George Box.

POSTED BY: Rohit Namjoshi

That was what my professor said in my very first class on machine learning :)

You can use Information with the "MethodParameters" property to get the parameter values for the model.
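
A minimal sketch of that call, using a hypothetical one-feature toy dataset just to have a classifier to inspect. Note the assumption here: the result describes the settings of the chosen method, and whether it includes per-feature estimates comparable to a "ParameterTable" is not established in this thread.

```wolfram
(* Hypothetical toy data, only to have a classifier to inspect *)
c = Classify[
   {1.0 -> "A", 1.3 -> "A", 1.7 -> "A", 3.1 -> "B", 3.4 -> "B", 3.9 -> "B"},
   Method -> "LogisticRegression"];

(* Which method was used, and with which parameter values *)
Information[c, "Method"]
Information[c, "MethodParameters"]
```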

Hi, regarding the Clustering chapter: I got most of it, but I have been puzzling over the use of the NotablePerson model to extract probabilities to feed into (I think) the KL divergence parameters used in the FindClusters command.

It just seems to come out of nowhere, and is not much-explained in the text.

Is this something we can draw some reusable lessons from?

Thanks

POSTED BY: Brendan McMahon

Wolfram Language also has built-in support for time series analysis that may be of interest for this dataset: https://reference.wolfram.com/language/guide/TimeSeries.html
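
As a starting point, here is a minimal sketch of fitting and forecasting with those time series functions. The series is a randomly generated stand-in (a hypothetical random walk of 120 monthly values), not the EIA data; you would replace prices with the wrangled fuel price series.

```wolfram
SeedRandom[1];

(* Hypothetical stand-in for a monthly price series:
   a random walk starting near 2.50 *)
prices = TimeSeries[
   2.5 + Accumulate[RandomVariate[NormalDistribution[0.01, 0.05], 120]],
   {1}];

(* Let TimeSeriesModelFit choose a model family and fit it *)
model = TimeSeriesModelFit[prices];
model["BestFit"]

(* Forecast the next 12 steps and plot them after the observed values *)
forecast = TimeSeriesForecast[model, {12}];
ListLinePlot[{prices, forecast}]
```

Forecasts 5 to 20 years out from a model like this would carry very wide uncertainty, so it is best treated as a baseline to compare other approaches against.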

How many possible classes do you expect to have? If each one-hot encoded vector is a possible class then it seems like many more classes than samples. Would it be possible to aggregate the classes into fewer "super-classes"?

Also, your dataset has only two input variables, right? You are definitely welcome to use a neural net model, but perhaps Predict would be easier to use with its automated approach to feature extraction, and access to multiple other methods.
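
A minimal sketch of the automated route, under two assumptions that are not from the original posts: it uses Classify rather than Predict, because the labels here are classes and not numbers, and it replaces each one-hot vector with the index of its angle bin. The counts are the ten training examples posted earlier in this thread. With a single example per class, Classify may issue warnings and the probabilities should not be expected to be accurate.

```wolfram
(* {spin-up count, spin-down count} -> index of the rotation-angle bin *)
training = {
   {975, 25} -> 1, {919, 81} -> 2, {801, 199} -> 3, {652, 348} -> 4,
   {505, 495} -> 5, {357, 643} -> 6, {190, 810} -> 7, {108, 892} -> 8,
   {15, 985} -> 9, {0, 1000} -> 10};

c = Classify[training];

(* Class probabilities for a single spin-up and a single spin-down outcome *)
c[{1, 0}, "Probabilities"]
c[{0, 1}, "Probabilities"]
```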

Posted 5 months ago

Thanks for your response. Is each sample not associated with a one-hot vector class? Would it then not be the same number of samples and classes? Yes, there are only two inputs in this example. Could you please advise on how Predict could be used in this case? Would it output a list of probabilities associated with the labels (one-hot vectors)?

POSTED BY: Byron Alexander