GROUPS:

Biological Sciences Data Science Image Processing Wolfram Language Machine Learning

Posted 3 years ago

Hi, I have a few samples of orchids that I would like to compare in terms of color and shape. I've been experimenting with two options: ClusteringTree and ImageDistance (with EMD). The results are as follows:

For ClusteringTree: [image not preserved]

For ImageDistance (EMD): [image not preserved]

Note: For some reason, the lonely orchid at the bottom left could not be fully displayed.

The two methods give different outputs, and I would like help explaining them. From my understanding, ClusteringTree uses a FeatureExtractor that is enabled automatically and probably takes the background colours into account, whereas ImageDistance with EMD provides a similarity index (of some sort) that can compensate for the background colours. I've actually tested this by removing the background using Photoshop, and the output remains the same. However, clustering seems to give a nice tree representation, whereas I assume that the lonely orchid in the EMD results means that it is completely different from the other orchids in the group. Are my assumptions correct? I am a field biologist, and this is all quite new to me.

POSTED BY: Nik Fadzly N Rosely

4 Replies

Posted 3 years ago

Clustering relies on a single measure of "distance" between objects, so it's just a matter of devising a reasonable way of combining several distances (one for image data, one for physical measurements, one for pollinators, ...) into one. It is subjective, and that is both an advantage and a disadvantage. It's an advantage because you can emphasize the contributions that you feel are most important for your analysis, and it's a disadvantage because there is no one right way to do it.

I mentioned the Fisher iris dataset because there are 50 examples for each of the three species, and there is a fair amount of variability in the four measurements within each species.

An analysis that might be useful with pollinator data is formal concept analysis (FCA, see https://link.springer.com/book/10.1007/978-3-642-59830-2). In this case the objects are the different species of orchid and the attributes are the pollinator species that attend each orchid species. You could also include habitat (soil type for terrestrial and host species for epiphytic) among the attributes, and there is even a way of scaling quantitative measures. It would take some creativity to make use of image data in FCA, but I don't think it's impossible.
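One possible way to combine several distances into one is a weighted average of rescaled distance matrices. The sketch below is only an illustration: the three-orchid data are made up, and the names imageDist, pollinators, normalize, and combine are hypothetical helpers, not anything from the thread. JaccardDissimilarity is just one plausible choice for presence/absence pollinator data.

```wolfram
(* Hypothetical data for three orchids; every value is made up for illustration *)
imageDist = {{0, 2., 4.}, {2., 0, 1.}, {4., 1., 0}};  (* e.g. obtained from ImageDistance *)
pollinators = {{1, 0, 1}, {1, 1, 0}, {1, 1, 0}};      (* rows: orchids; columns: pollinator present (1) or absent (0) *)

(* Jaccard dissimilarity between every pair of pollinator profiles *)
pollDist = Outer[JaccardDissimilarity, pollinators, pollinators, 1];

(* Rescale a matrix so its largest entry is 1, so that no source dominates
   merely because of its units; assumes the matrix is not all zeros *)
normalize[m_] := m/Max[m];

(* Weighted average of the rescaled matrices; the weights are the subjective part *)
combine[ms_List, ws_List] := Total[ws*(normalize /@ ms)]/Total[ws];

(* Here the image distance counts twice as much as the pollinator distance *)
combined = combine[{imageDist, pollDist}, {2, 1}];
MatrixForm[N[combined]]
```

Changing the weights {2, 1} is where the emphasis described above comes in: there is no single correct choice, so it is worth checking how stable the resulting clusters are under different weightings.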

POSTED BY: Robert Nachbar

Posted 3 years ago

OK, I think I have an idea of how to proceed actually. Thank you very much for recommending the FCA. And yes, it is possible to use the image data in FCA, and it does involve Mathematica. I'll work on a solution and will post it back here when it is done.

POSTED BY: Nik Fadzly N Rosely

Posted 3 years ago

You should be able to get useful results from both methods by including more examples of each species or cultivar, especially with varying backgrounds. You should experiment with different image feature extractors, different distance functions, and different cluster dissimilarity functions to find the combination that gives you results that you find useful. You might also try training a classifier with Classify (again experimenting with these same options), and then test it on images of the same species that were not in the training images.

Are you familiar with Fisher's iris data set?

    In[4]:= ExampleData[{"MachineLearning", "FisherIris"}, "Properties"]

    Out[4]= {"Data", "Description", "Data", "Dimensions", "LearningTask", "LongDescription", "MissingData", "Name", "Source", "TestData", "TrainingData", "VariableDescriptions", "VariableTypes"}

While the data are not images, they are often used to demonstrate the characteristics of different clustering methods.

Please post your code, especially for the second example (it looks like you tried using one of the examples from the Applications section of ImageDistance). You could use ImageDistance to form a distance matrix from your collection of images, and then use that as input to ClusteringTree.

What is "EMD"? I couldn't find it anywhere on the ImageDistance reference page.
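A rough, untested sketch of that last suggestion follows. It builds the full distance matrix for inspection and hands the same pairwise distance to ClusteringTree through its DistanceFunction option rather than passing the matrix itself. The random stand-in images and the helper name emd are assumptions made for illustration, and the sketch assumes that ClusteringTree applies a user-supplied DistanceFunction directly to the images without automatic feature extraction; the "EarthMoverDistance" setting is taken from the code posted in this thread.

```wolfram
(* Stand-in data: six small random RGB images; replace with your imported photos *)
imgs = Table[Image[RandomReal[1, {24, 24, 3}]], {6}];
n = Length[imgs];

(* Hypothetical helper: pairwise earth mover's distance between two images *)
emd[a_, b_] := ImageDistance[a, b, DistanceFunction -> "EarthMoverDistance"];

(* Full symmetric distance matrix, handy for inspection *)
distmatrix = Table[If[i == j, 0., emd[imgs[[i]], imgs[[j]]]], {i, n}, {j, n}];
MatrixPlot[distmatrix]

(* Hierarchical clustering driven by the same distance *)
ClusteringTree[imgs, DistanceFunction -> emd]
```

If the tree built this way still disagrees with the default ClusteringTree[imgs], the difference comes from the distance measure rather than from the way the result is displayed.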

POSTED BY: Robert Nachbar

Posted 3 years ago

Hi Robert, thanks for the input. The code I use is as follows:

    SetDirectory[NotebookDirectory[]];
    fns = FileNames["*.jpg"];
    imgs = Import /@ fns;
    ClusteringTree[imgs]

EMD refers to the Earth Mover's Distance, also known as the Wasserstein distance, and yes, I did lift the code directly from the example page. The EMD concept is easier for me to explain in general terms.

    (* pairwise distances for the upper triangle, i < j *)
    distances = Table[
       ImageDistance[imgs[[i]], imgs[[j]], DistanceFunction -> "EarthMoverDistance"],
       {i, Length[imgs]}, {j, i + 1, Length[imgs]}];

    (* pad each row and symmetrize to obtain the full distance matrix *)
    With[{mtemp = PadLeft[#, Length[imgs]] & /@ distances},
      distmatrix = mtemp + Transpose[mtemp]];
    NumberForm[distmatrix, 3]

    (* connect two images when their distance is below the lower-third quantile *)
    adjmatrix = 1 - Unitize[Threshold[distmatrix, Quantile[Flatten[distances], 1/3]]];

    GraphPlot[adjmatrix,
      VertexShapeFunction -> (Inset[imgs[[#2]], #, Center, .5] &),
      SelfLoopStyle -> None, Method -> "SpringEmbedding", ImageSize -> 500]

The variables in the iris dataset are either nominal, discrete, or continuous. Can we mix images with different types of data for clustering? My main objective is to evaluate whether there is a pattern distinguishing terrestrial from epiphytic orchids (based on their appearance). I also have data on their pollinator insects. It would be great if I could combine all of them for cluster analysis.

POSTED BY: Nik Fadzly N Rosely
