Message Boards

[DATASCIFRI] Data Science Friday Webinars to Dive into Machine Learning

Our Data Science Friday webinars launched today, providing a good starting point for further exploration of machine learning topics. Today's topic was "Getting Your Data Ready for Automated Machine Learning," and you can access the session recording by signing up for the webinar series. Next Friday we'll cover "Machine Learning and Statistics: Better Together" with @Jon McLoone, and we'll dive deeper into machine learning functionality at upcoming sessions. Thanks to @Abrita Chakravarty for today's presentation. Post any follow-up questions or comments!

REGISTER HERE: https://www.bigmarker.com/series/data-science-friday-webinars/
About This Webinar Series: Join us each Friday to learn about leveraging the power of the Wolfram Language for data science. In the first collection of this multipart series, we will focus on the wide range of state-of-the-art integrated machine learning capabilities available in the Wolfram Language. We'll start with the first steps of importing data from many different sources and getting it ready for machine learning. Then we will look at highly automated functions like Predict, Classify, AnomalyDetection and FeatureExtraction, as well as the powerful symbolic Wolfram Neural Net Framework. The series will also include pointers to freely available repositories of neural net models and curated computable datasets, so you can jump right in and start your data science explorations. Each webinar is led by a Wolfram certified instructor and content expert who will cover weekly topics, poll the audience about the addition of new topics for the series and answer your questions.


POSTED BY: Jamie Peterson
42 Replies
Posted 2 years ago

Thanks a lot :-) Also, the links posted in the chat during the webinar are unavailable for those who cannot attend the live webinar but, like me, have to watch later. Please also provide us with the link to the video class "Building Apps with the Neural Net Repository" with Megan Sheer. Thanks.

POSTED BY: B. Cornas
Posted 2 years ago

I always have to watch the recording of the webinar, as I can unfortunately never attend live. But the last webinar, on May 6th, is not playable. I can reach the page, but all I get is "The webinar ended" and no way to play the recording, even with my VPN switched off.

I would appreciate a valid link.

Thanks.

POSTED BY: B. Cornas

Sorry for the delay. The recording from the May 6 webinar has now been published. Use your unique link from your webinar email notifications, or click here: Data Science Friday webinar (May 6)

POSTED BY: Jamie Peterson

Hello,

Thank you for organizing this series!!

I was wondering whether there is a neural net model trained to scan text from a PDF and perform character recognition to obtain a text document from it. More specifically, what I would like to do is scan old typewritten mathematical text and produce a LaTeX document from it. But a first step is being able to do character recognition on a PDF of the text. Or is there a function in Mathematica that can do this? For the kind of mathematical text I am trying to recognize and put into LaTeX, I am uploading a little piece of it (source: AMS).
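For plain text I know the language has built-in OCR, along the lines of the sketch below (file name hypothetical), but mathematical notation seems to need more than this:

    (* rasterize the first page of a scanned PDF and run built-in OCR on it *)
    page = First[Import["typewritten-scan.pdf", "Pages"]];
    TextRecognize[Rasterize[page, "Image"]]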

Attachment

Attachments:

@Enrique Garcia Moreno E.

@Bob Sandheinrich reached out to the Machine Learning team and found these three models you could look at (Note: They are not in our Neural Net Repository):

  1. https://github.com/luopeixiang/im2latex
  2. https://github.com/lukas-blecher/LaTeX-OCR
  3. https://github.com/da03/Attention-OCR

Thank you Abrita!!

Posted 2 years ago

I am referring back to the "Machine Learning and Statistics: Better Together" talk by @Jon McLoone. For me, the best talk in the series.

How can I increase the image size in the different classifier layers? In the examples Jon gave, the image has to be small to start with and gets downsized, or kept the same size, by the different classifier layers. My goal is to do the same thing, but with larger images (which will of course take more time, but that is fine with me). See it as a way to do a sort of image processing in the different layers, and then pick one result that I like to work on (visual art).

So, how can I have bigger images at the input and keep these dimensions as much as possible?

POSTED BY: B. Cornas

@B. Cornas are you referring to the example using NetModel["Wolfram ImageIdentify Net V1"] on slide 22 (the one that classifies the image of a tiger and looks at the output from the intermediate layers of the neural network)?

If yes, then it is possible to perform similar image classification tasks with out-of-core images. The Neural Net Framework is able to work with large image collections that may need to be stored out-of-core.

An example can be found here: https://www.wolfram.com/language/11/neural-networks/out-of-core-image-classification.html where you will notice that the training data is an association of File expressions (symbolic representations of locations in the local file system) and labels, instead of the actual images themselves.

The trained network is also used to classify images directly from file paths.
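Schematically, the setup from that example looks like the sketch below (file paths hypothetical, and assuming net is a classification network whose input is an "Image" NetEncoder):

    (* training data as File expressions -> labels; the images stay on disk *)
    trainingData = {
       File["/data/cats/img001.jpg"] -> "cat",
       File["/data/dogs/img001.jpg"] -> "dog"
       (* ... many more files ... *)
       };
    trained = NetTrain[net, trainingData];
    trained[File["/data/test/img042.jpg"]]  (* classify straight from a file path *)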

Posted 2 years ago

Thanks, Abrita, for your answer. I do indeed refer to the slide you mention, but I meant something completely different with my question. I'll try to be clearer.

First of all, forget what everybody thinks of doing with NeuralNets.

Second, I am totally not interested in the result this Net gives.

I want to use it completely differently: I want to use the net for creating abstractions of the original image.

I want to look at the different intermediate layers and see what kind of 'abstractions' they have made from an image that I input. Then I want to select one or more of these abstractions (results in one of the layers) and artistically work with that abstraction from there.

But I need high resolution. I cannot blow up an abstraction (image) of 224x224 pixels to 6000x6000 pixels without ugly artifacts. So I want to adapt the layers to accept and work with images of as large a size as possible: adapting the layers so that they do not reduce the size of the image, and thus so that the abstractions in the different layers have a large pixel size.

I do not need to train or retrain the net.

OK, I realise that I am probably asking too much here, as the number of connections will explode very quickly. Even a doubling in size might be hard.

So I was thinking of the following: I run the net on a small version of my selected image, peek into layer 'X' and select an abstraction that I like from that layer.

Then I extract the workings (e.g. the convolution) and parameters (weights) from that layer and the previous layers, and apply these to the big version of my image. I should get more or less the same result in the big image as in the small abstraction from the layer I have selected.

So my question would be: how do I extract the workings (layer algorithm) and weights from the layers involved? It may be that I'll have to scale up certain parameters when applying them to the big image (like maybe the convolution kernel). That I can try out myself, of course.
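Something like the sketch below is roughly what I have in mind (my assumption is that NetTake, NetExtract and NetReplacePart can be combined this way, as long as the kept layers are purely convolution/pooling and therefore size-agnostic):

    net = NetModel["Wolfram ImageIdentify Net V1"];
    partial = NetTake[net, 5];                  (* keep everything up to layer 5 *)
    weights = NetExtract[net, {2, "Weights"}];  (* learned kernels, if layer 2 is a convolution layer *)
    (* swap in an encoder for a larger input size and rerun on the big image *)
    big = NetReplacePart[partial, "Input" -> NetEncoder[{"Image", {1024, 1024}}]];
    big[myLargeImage]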

Maybe it helps to realise that I am a visual artist who uses programming, among other techniques, for his work.

Thanks for your help.

POSTED BY: B. Cornas

Here is the blog post (sorry not community post) I was referring to in today's session: https://blog.wolfram.com/2021/01/07/deploy-a-neural-network-to-your-ios-device-using-the-wolfram-language/

Is there a Repository (Neural Net) to identify trees and plants?

Thanks,

Mitch Sandlin

POSTED BY: Mitchell Sandlin

@Mitchell Sandlin Unfortunately we do not seem to have such a model in our Neural Net Repository, but I found a couple of academic papers that might be of interest (both seem to use convolutional neural nets):
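Separately, one way to check what is currently available from within the language (a sketch; NetModel[] lists the names of all models in the repository):

    names = NetModel[];  (* all model names in the Wolfram Neural Net Repository *)
    Select[names, StringContainsQ[#, "Plant" | "Tree", IgnoreCase -> True] &]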

Looking forward to today's session on the Wolfram Data Repository and Wolfram Neural Net Repository with @Bob Sandheinrich.

Join us at https://www.bigmarker.com/wolfram-u/data-science-friday-ready-to-use-datasets-and-neural-net-models

Hi Patrick, the "lenet-CIFAR-10.wlnet" file has been included in the zip file that you downloaded. Could you please confirm that after you unzipped the download, the notebook "ExploringNNFramework-BuildingToTraining.nb" and the file "lenet-CIFAR-10.wlnet" appear in the same location?

If yes, then

Import[FileNameJoin[{NotebookDirectory[], "lenet-CIFAR-10.wlnet"}]]

should be able to load the simple network modeled after Lenet into your session and you can work further with it.

Yes, the model is slightly different from the network shown on slide 4. Here is the code to create it yourself:

(* classes is the list of class labels, e.g. from the CIFAR-10 training data *)
lenet = NetChain[{
    ConvolutionLayer[20, 5],
    Ramp,
    PoolingLayer[2, 2],
    ConvolutionLayer[50, 5],
    Ramp,
    PoolingLayer[2, 2],
    FlattenLayer[],
    500,
    Ramp,
    10,
    SoftmaxLayer[]},
  "Output" -> NetDecoder[{"Class", classes}],
  "Input" -> NetEncoder[{"Image", {32, 32}}]]

You can also find it in the tutorial at https://reference.wolfram.com/language/tutorial/NeuralNetworksComputerVision.html#280210622 within the example demonstrating "CIFAR-10 Object Classification".
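For completeness, a sketch of how such a network is then trained and used, assuming trainingData holds the labeled CIFAR-10 examples from that tutorial:

    trained = NetTrain[lenet, trainingData, MaxTrainingRounds -> 20];
    trained[img]                     (* predicted class for an image *)
    trained[img, "Probabilities"]    (* class probabilities via the NetDecoder *)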

Posted 2 years ago

Thank you - this is very helpful!
Pat

POSTED BY: Updating Name
Posted 2 years ago

First, thank you for the presentation earlier today.

lenet is used throughout the presentation notebook. However, when I run the notebook from the beginning, I get errors.

At the end of the notebook there is an Initialization cell:

trained=lenet=Import[FileNameJoin[{NotebookDirectory[],"lenet-CIFAR-10.wlnet"}]];

I have some questions.

Is the lenet-CIFAR-10.wlnet an empty network that is assigned to trained and lenet?

Is the file essentially the NetChain statement on slide 4?

POSTED BY: Patrick Brooks

Looking forward to tomorrow's presentation by @Giulio Alessandrini:

Exploring the Neural Network Framework from Building to Training

Friday April 29 at 1:00 PM CT.

Join us for a beginner-friendly introduction to the Neural Net Framework in the Wolfram Language and a step-by-step walk-through of a simple image classification network using the CIFAR-10 dataset.

Posted 2 years ago

Hi! Why am I getting this error when evaluating the code in the "Human in the Loop: Interpretable Machine Learning" notebook? [Image: error message screenshot]

POSTED BY: Damir Borkovic

Hi Damir,

Two questions:

  1. What version of the Wolfram Language are you using?

  2. Can you confirm that the following lines of code were successfully run prior to this evaluation?

    wine = RandomSample[ResourceData["Sample Data: Wine Quality"]];
    {winetest, winetrain} = TakeDrop[wine, 100];
    predictor = Predict[winetrain -> "WineQuality", Method -> "LinearRegression"]
    

If you continue to face issues, we may have to just reformat the data into one of the other formats acceptable as input to Predict - like this:

wine = Rest[#] -> First[#] & /@ 
  Normal[Values@
    RandomSample[ResourceData["Sample Data: Wine Quality"]]]
{winetest, winetrain} = TakeDrop[wine, 100];
predictor = Predict[winetrain, Method -> "LinearRegression"]
wineMeasurements = PredictorMeasurements[predictor, winetest]
Posted 2 years ago

Hi!

  1. I'm using version 12.0.0.
  2. Yes, as you can see in the attached picture. [Image: evaluated code]
  3. The formatting you are suggesting works.
  4. Can I ask one more question: what is the difference between LearnDistribution and Predict? Can I get the same probability distribution contour plot (for 2D data) with Predict as with LearnDistribution?
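For reference, this is the kind of plot I mean (a sketch along the lines of the LearnDistribution documentation, with synthetic 2D data):

    (* learn a distribution from 2D samples and plot its density *)
    ld = LearnDistribution[RandomVariate[NormalDistribution[], {200, 2}]];
    ContourPlot[PDF[ld, {x, y}], {x, -3, 3}, {y, -3, 3}]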

POSTED BY: Damir Borkovic
Posted 2 years ago

A methodological question on using Predict versus Classify. I am working on a model to forecast attendance in a non-contractual setting, which effectively means that all the data I have is the historic attendance.

The attendance per participant can be recorded in "binary" format: for example, at the first event the attendance was 1 and at the second 0, so we can create a list of attendance per participant such as {{1,0},{2,0},{3,1}..}.

As far as I can see, I can use Predict to forecast each individual’s attendance series, or I can use Classify. If I use the latter, I can treat each event as an "attended" (value 1) versus "unattended" (value 0) class.

Are both methods equally applicable for this case or is there reason why I should use one and not the other?

POSTED BY: Dave Middleton

It is helpful to think of the problem as attempting to answer a question of a specific type, and then to choose the function that helps answer it. Classify is good for answering questions like "Is this A or B (or C or D or E)?" while Predict is good for answering "How much?" or "How many?"

So if you want to forecast the attendance of a person as "attended" or "unattended" (Is this A or B?), Classify would be the way to go. While "attended" is represented by 1 and "unattended" by 0, these are still categorical or nominal features, NOT numeric features.

If you want to forecast "how many" of the events the person is likely to attend, Predict would be the way to go.
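As a minimal sketch of the two framings (hypothetical toy attendance data; the feature encoding is only illustrative):

    (* Classify: per-event features -> class label, answering "Is this A or B?" *)
    history = {{1, 0} -> "unattended", {2, 0} -> "unattended",
       {3, 1} -> "attended", {4, 1} -> "attended"};
    c = Classify[history];
    c[{5, 1}]

    (* Predict: participant features -> a number, answering "How many?" *)
    counts = {{25, "weekly"} -> 12, {40, "monthly"} -> 3, {31, "weekly"} -> 9};
    p = Predict[counts];
    p[{30, "weekly"}]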

How can we visualize decision trees in Mathematica? Do you have an example of how to see decision trees?

There is no way to visualize the decision tree from the model produced by Predict or Classify yet. However, you can get some detailed information about the decision tree model as follows:

    In[1]:= titanic = ResourceData["Sample Data: Titanic Survival"];
    c = Classify[titanic -> "SurvivalStatus", Method -> "DecisionTree"]
    Out[2]= ClassifierFunction[Input type: {Nominal, Numerical, Nominal}, Classes: died, survived]

    In[3]:= Information[c, "MethodOption"]
    Out[3]= Method -> {DecisionTree, DistributionSmoothing -> 1, FeatureFraction -> 1}

    In[4]:= c[[1]]["Model"]
    Out[4]= <|Tree -> MachineLearning`DecisionTree[Number of nodes: 225, Number of leaves: 113],
      Processor -> EmbedNominalVector → MergeVectors → Values,
      Calibrator -> MachineLearning`CalibratorFunction[TemperatureScaling, <|Theta -> 0.462231|>],
      Method -> DecisionTree, PostProcessor -> Identity,
      Options -> <|DistributionSmoothing -> <|Value -> 1, Options -> <||>|>,
        FeatureFraction -> <|Value -> 1, Options -> <||>|>|>|>

    In[5]:= c[[1]]["Model"]["Tree"] // FullForm
    Out[5]//FullForm= MachineLearning`DecisionTree[Association[
      Rule["FeatureIndices", NumericArray[List[1, 1, ..., 6], "Integer16"]],
      Rule["NumericalThresholds", List[-1.1093288462606128`, ..., -0.00391020447817348`]],
      Rule["NominalSplits", List[]],
      Rule["Children", NumericArray[List[List[-7, -8], ..., List[52, 44]], "Integer16"]],
      Rule["LeafValues", NumericArray[List[List[11, 1], ..., List[4, 1]], "UnsignedInteger16"]],
      Rule["RootIndex", 92],
      Rule["NominalDimension", 0]]]
    (* the long numeric array contents are abbreviated here for readability *)

Hello @Tsai Ming-Chou,

No, you do not need to check the data distribution beforehand, and there is no requirement for the data to come from a normal distribution when you use functions like Classify or Predict. The automated machine learning functions are capable of handling any type of data from any distribution. You do have the option of specifying a FeatureExtractor with Classify (or Predict) if you have some specific thoughts about preprocessing the input features. Otherwise, Classify (or Predict) will automatically choose a suitable feature extractor based on the data.

LinearModelFit is specific in that it can only fit a linear regression model and can only work with numeric data. Classify and Predict can work on any type of data (numbers, text, images, etc.) and have more options available in terms of the type of algorithm you want to use (DecisionTree, NaiveBayes, etc.).
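For example, a minimal sketch (toy data) contrasting the two:

    (* Predict handles mixed numeric and nominal features directly *)
    data = {{1.2, "red"} -> 3.4, {2.3, "blue"} -> 5.1, {3.1, "red"} -> 6.8, {0.5, "blue"} -> 2.2};
    p = Predict[data, Method -> "LinearRegression"];
    p[{2.0, "red"}]

    (* LinearModelFit requires purely numeric data *)
    lm = LinearModelFit[{{1.2, 3.4}, {2.3, 5.1}, {3.1, 6.8}}, x, x];
    lm[2.0]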

Looking forward to the next session in the Data Science Friday series: "Supervised and Unsupervised Learning" on Friday, April 8 at 1:00 PM Central Time (US & Canada) (GMT -5:00)

See you there.

Just want to let everyone know that we've posted an improved and complete recording of @Jon McLoone's presentation from last week. To access recordings, go to the series page, scroll down to find the title you want to see, and click on "Start this section." The video is embedded on each webinar page. Abrita also plans to revisit this final section of Jon's presentation at the end of this week's webinar. Hope to see you there!

POSTED BY: Jamie Peterson

May I ask: when using machine learning in Mathematica, do we need to check the data distribution beforehand (or even convert it to a normal distribution), as in traditional statistics? Is it the same when using functions like LinearModelFit?

POSTED BY: Tsai Ming-Chou

Jamie - thanks for the update.

POSTED BY: Charles Glover

Sorry for the confusion about where you can download presentation notebooks. We will include the link in all upcoming reminders, so it's easy to find. A reminder will be sent today, so please watch for that in your inbox. All the notebooks for the full series will be placed in a single location, so you can download the latest each week.

We want to be sure everyone who is interested is signed up for the series, so that you receive all the details you need. You will receive reminder emails, recording notifications and further resources. Sign up for Data Science Friday Webinars.

POSTED BY: Jamie Peterson
Posted 2 years ago

Hi! I did get an event reminder, but no download link.

POSTED BY: Damir Borkovic

@Charles Glover and @John Snyder we will send out an event reminder email with the link to the download folder containing all the notebooks from the entire webinar series.

Posted 2 years ago

I signed up for the Friday sessions, but I am busy during the day and have been watching the published videos as they become available. I would like to have access to the Mathematica notebooks, but I can't find any link to them online. Can you please send me a link to the notebooks for the first two sessions? Thank you very much.

POSTED BY: John Snyder

@Abrita Chakravarty, I missed the link to the notebook for this presentation. Would you please post it?

Thanks

POSTED BY: Charles Glover

We did run into some issues with the presenter's internet connection towards the end of the session. We are working on preparing a recording of the presentation (without the audio dropout and screenshare issues) to share with our attendees.

Posted 2 years ago

The "Machine Learning and Statistics: Better Together Talk" by @Jon McLoone on Friday April 1 at 1:00 PM CT had many technical errors during the presentation. It would be nice to get a cleaned up version via email to watch later. In other words, please fix the audio and video by re-recording the webinar if you don't mind. I don't mind you extending the webinar next week by 30 minutes, but in the release of the webinar for later viewing, please take the time to fix the presentation by elimination of the technical problems from the video and replace these sections with the material that the presenter meant to present. Thanks!

POSTED BY: Nancy Wilkens

Looking forward to the "Machine Learning and Statistics: Better Together" talk by @Jon McLoone on Friday, April 1 at 1:00 PM CT.

If you have any questions from last Friday's talk "Getting your Data Ready for Automated Machine Learning", please post them here.

I missed the download link of the notebook for the "Machine Learning and Statistics: Better Together" talk.
Can you help? Thanks!

POSTED BY: Tsai Ming-Chou

Let us know if you have not received the links to the notebook downloads by email. Best, Abrita

Hello, @Luis Antonio Gonzalez, I see that you signed up yesterday and that you missed the notebook link. I will send that to you by email. I also encourage you to register for the full series, so that you receive reminders and emails with links to all the upcoming topics. Sign up for the Data Science Friday webinar series here.

POSTED BY: Jamie Peterson

Hi @AbritaChakravarty, I registered for the seminar. I have already watched the first video, "Getting Your Data Ready for Automated Machine Learning."

Could you share the Mathematica notebook?

