
How to import a dataset?

Posted 10 years ago

How can I import a dataset into Mathematica for machine learning applications? I started thinking about this question after seeing the built-in function ExampleData[]:

In[1]:= ExampleData["MachineLearning"]

Out[1]= {{"MachineLearning", "BostonHomes"}, {"MachineLearning", 
  "FisherIris"}, {"MachineLearning", "MNIST"}, {"MachineLearning", 
  "MovieReview"}, {"MachineLearning", "Mushroom"}, {"MachineLearning",
   "Satellite"}, {"MachineLearning", "Titanic"}, {"MachineLearning", 
  "UCILetter"}, {"MachineLearning", "WineQuality"}}

All of these are datasets that have already been imported.
Thank you.

POSTED BY: Megri Youcef
4 Replies
Posted 10 years ago

Hi Marco,
Thank you for your good ideas.
As you said, there are many exciting developments to look forward to in the near future. Please take a look at this link: http://datadrop.wolframcloud.com/
It's a surprisingly useful dynamic data project:

"The Wolfram Data Drop is an open service that makes it easy to accumulate data of any kind, from anywhere, setting it up for immediate computation, visualization, analysis, querying or other operations."
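For anyone who wants to try the Data Drop from the Wolfram Language, a minimal sketch might look like the following. It assumes you are already connected to the Wolfram Cloud, and the keys "temperature" and "humidity" are made-up examples, not part of any real databin:

```mathematica
(* Requires a Wolfram Cloud connection; the keys below are hypothetical examples *)
bin = CreateDatabin[];

(* Each DatabinAdd call appends one entry to the bin *)
DatabinAdd[bin, <|"temperature" -> 21.5, "humidity" -> 40|>];
DatabinAdd[bin, <|"temperature" -> 22.1, "humidity" -> 38|>];

(* Retrieve the accumulated entries as a Dataset for further computation *)
Dataset[bin]
```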

It uses the powerful SemanticImport function through the WDF (Wolfram Data Framework), as you mentioned:

"From personal experience I would also think that the SemanticImport function is really useful to work with external data."

About your last question: I tried to import an image dataset, the small NORB dataset, from the same site from which Mathematica took the MNIST dataset: http://www.cs.nyu.edu/~ylclab/data/norb-v1.0-small/
I want to use it the same way Mathematica uses the MNIST dataset, with the machine learning functions newly implemented in Mathematica.
Unfortunately I didn't succeed. I don't think I know the Mathematica language well enough to do this, so I asked for help.

That's it :)

POSTED BY: Megri Youcef

Dear Megri,

I guess that the import will depend a bit on the type of database. As you have seen, MNIST does not come in a standard format. At the bottom of the page you mention, they describe the file format and what you need to program to extract it.
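As a rough illustration, the format described on that page (a 32-bit big-endian header starting with a "magic number", followed by one unsigned byte per label or pixel) can be read with BinaryReadList. This is only a sketch, not how ExampleData itself was built: it assumes the .gz files have already been downloaded and decompressed (for example with gunzip), and the names readIDXLabels and readIDXImages are made up for this example:

```mathematica
(* Read an IDX1 label file such as train-labels-idx1-ubyte (already decompressed).
   Header: magic number 2049, then the number of items, both 32-bit big-endian. *)
readIDXLabels[file_String] := Module[{str, magic, n, labels},
  str = OpenRead[file, BinaryFormat -> True];
  {magic, n} = BinaryReadList[str, "UnsignedInteger32", 2, ByteOrdering -> 1];
  (* one unsigned byte (a digit 0-9) per label; $Failed if the magic number is wrong *)
  labels = If[magic == 2049, BinaryReadList[str, "UnsignedInteger8", n], $Failed];
  Close[str];
  labels]

(* Read an IDX3 image file such as train-images-idx3-ubyte (already decompressed).
   Header: magic number 2051, number of images, rows, columns, all 32-bit big-endian.
   The pixels follow row by row, one unsigned byte each (0 = background). *)
readIDXImages[file_String] := Module[{str, magic, n, rows, cols, pixels},
  str = OpenRead[file, BinaryFormat -> True];
  {magic, n, rows, cols} = BinaryReadList[str, "UnsignedInteger32", 4, ByteOrdering -> 1];
  pixels = If[magic == 2051, BinaryReadList[str, "UnsignedInteger8", n rows cols], $Failed];
  Close[str];
  If[pixels === $Failed, $Failed,
   (* cut the flat byte list into n matrices of size rows x cols and make an Image
      of each; ColorNegate gives dark digits on a light background *)
   ColorNegate[Image[#, "Byte"]] & /@ ArrayReshape[pixels, {n, rows, cols}]]]
```

With the training files decompressed in the working directory, `Thread[readIDXImages["train-images-idx3-ubyte"] -> readIDXLabels["train-labels-idx1-ubyte"]]` should give a list of image -> digit rules, which is the form Classify accepts. Reading all 60000 training images at once needs a few hundred megabytes of memory.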

The format will differ from one database to another. Sometimes you might want to read data from a website; you can then use Import. See for example this article: http://community.wolfram.com/groups/-/m/t/435736
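For instance, Import can return the links on an HTML page, which is one way to locate downloadable data files. A minimal sketch, using the MNIST page discussed in this thread and assuming it is reachable from your machine:

```mathematica
(* "Hyperlinks" is one of the elements Import offers for HTML pages *)
links = Import["http://yann.lecun.com/exdb/mnist/", "Hyperlinks"];

(* Keep only links that end in .gz, such as the compressed MNIST files *)
Select[links, StringMatchQ[#, __ ~~ ".gz"] &]
```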

Sometimes you might take data from an API; see for example here: http://community.wolfram.com/groups/-/m/t/344241
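Many web APIs return JSON, which Import can parse directly. In the sketch below the endpoint URL is hypothetical, and the code assumes the API answers with a JSON array of flat objects:

```mathematica
(* Hypothetical endpoint: replace it with the API you actually want to query *)
url = "https://api.example.com/v1/records?format=json";

(* The "JSON" format returns each JSON object as a list of rules *)
response = Import[url, "JSON"];

(* Turn each record into an Association so the whole result can become a Dataset *)
records = Dataset[Association /@ response]
```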

Some databases are easier to access than others. Data from the Office for National Statistics in the UK, for example, is quite easy to get: you go to the data section, download a file in a standard format (usually Excel), and then a simple Import is enough. This command does this for some crime data, importing the file directly from its URL:

data = Import["http://www.ons.gov.uk/ons/rel/crime-stats/crime-statistics/period-ending-march-2014/rft-table-5.xls"];

You will then have to figure out what the numbers mean, i.e. read the metadata or the explanations within the dataset, and then you are good to go.
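To find that metadata, it can help to look at the sheet names and the first rows of the spreadsheet before anything else. A minimal sketch, assuming a local copy of the ONS file saved as rft-table-5.xls (the file name comes from the URL above; the layout of its sheets is not known here):

```mathematica
(* Assumes a local copy of the ONS spreadsheet *)
file = "rft-table-5.xls";

(* Names of the worksheets in the file *)
Import[file, "Sheets"]

(* Cell data of the first worksheet, as a list of rows *)
sheet = Import[file, {"Data", 1}];

(* The first rows usually hold titles, notes and column headers *)
TableForm[Take[sheet, Min[10, Length[sheet]]]]
```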

Accessing different databases usually requires some manual work. I don't think there is a general procedure available right now that does all of that automatically. In fact, Conrad Wolfram has some nice ideas about how making data more accessible would transform society and democracy. One of the problems is that there is no common standard yet for making all these datasets immediately computable, so some work is required. There is a really interesting workshop coming up that shows how to use the Wolfram Language with external data. Also, from what I hear, the new Data Science Platform might be really useful.

From personal experience I would also think that the SemanticImport function is really useful to work with external data. It worked very nicely for some projects.
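As a sketch of that workflow, SemanticImport turns a CSV file into a Dataset, and the rows can then be rewritten as features -> class rules for Classify. The file name mydata.csv and its "label" column are hypothetical. SemanticImport may also interpret some columns as quantities or entities, so it is worth looking at the Dataset before training:

```mathematica
(* "mydata.csv" is a hypothetical file with a header row and a "label" column *)
ds = SemanticImport["mydata.csv"];

(* Normal turns the Dataset into a list of Associations, one per row *)
rows = Normal[ds];

(* Every column except "label" becomes a feature; "label" is the class to predict *)
training = Map[Values[KeyDrop[#, "label"]] -> #["label"] &, rows];

classifier = Classify[training];
```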

I have analysed large datasets from NASA and gigabytes of data from police forces. Once you have imported and cleaned the data, all of this is possible even for large datasets.
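Cleaning often starts with dropping incomplete rows. A minimal sketch, again with a hypothetical mydata.csv, assuming missing fields appear as Missing[...] values, which is how SemanticImport usually represents them:

```mathematica
(* "mydata.csv" is a hypothetical file *)
ds = SemanticImport["mydata.csv"];

(* Keep only the rows that contain no Missing[...] value *)
clean = ds[Select[FreeQ[#, _Missing] &]];

(* Number of rows that were dropped *)
Length[ds] - Length[clean]
```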

Is there any specific data set you want to access?

Best wishes,

Marco

POSTED BY: Marco Thiel
Posted 10 years ago

Thank you Marco,
in fact I want to know how to import a new database into Mathematica, exactly as the Mathematica team did for the example you showed me, the MNIST database.

ExampleData[{"MachineLearning", "MNIST"}]

You can find the source of this database, for example, with:

In[2]:= ExampleData[{"MachineLearning", "MNIST"}, "Source"]

Out[2]= "   'The MNIST Database of handwritten digits', Y. LeCun, C. Cortes and C.J.C. Burges, http://yann.lecun.com/exdb/mnist/"

You can visit the given link:
http://yann.lecun.com/exdb/mnist/

Four files are available on this site:

train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)

Marco, I want to know an efficient, concise way to import the data contained in these files into Mathematica.
Once I know the right method, I will try to import other databases, especially ones containing huge amounts of data, into Mathematica for its machine learning applications.
In other words, I want to know how the Mathematica team managed to import the kind of huge databases shown in ExampleData.

I don't know if the idea is clear.
Thanks again.

POSTED BY: Megri Youcef

Hi there,

I think this might help. If you want to know what you can extract from that dataset, you can use:

ExampleData[{"MachineLearning", "MNIST"}, "Properties"]

The output is:

[Image: the list of properties returned by this command]

You could extract the test data, for example:

data = ExampleData[{"MachineLearning", "MNIST"}, "TestData"];

There are 10000 examples in the test set (the training set contains another 60000). If you want to display one element you can use:

data[[1]]

The result is:

[Image: the first example in the test data]

If you only want the image of the handwritten digit, you can use:

data[[1,1]]
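Since each element pairs an image with its digit, the data can go straight into Classify. A rough sketch; the sample size of 5000 is an arbitrary choice to keep training time short, and the accuracy you get will depend on your version and on the method Classify picks:

```mathematica
trainingData = ExampleData[{"MachineLearning", "MNIST"}, "TrainingData"];
testData = ExampleData[{"MachineLearning", "MNIST"}, "TestData"];

(* Compare with the 60000/10000 split described on the MNIST page *)
{Length[trainingData], Length[testData]}

(* Train on a random subset of the training data to keep the run short *)
classifier = Classify[RandomSample[trainingData, 5000]];

(* Fraction of test images that are classified correctly *)
ClassifierMeasurements[classifier, testData, "Accuracy"]
```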

I hope that this is what you want.

Cheers,

Marco

POSTED BY: Marco Thiel