You should be able to get useful results from both methods by including more examples of each species or cultivar, especially with varied backgrounds.
You should experiment with different image feature extractors, distance functions, and cluster dissimilarity functions to find the combination that works best for your images. You might also try training a classifier with Classify (again experimenting with the same options) and then testing it on images of the same species that were not in the training set.
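To make that experimenting concrete, here is a minimal sketch. The placeholder images (noisy vs. blurred random images standing in for two made-up "species"), the option values in the grid, and the 75/25 split are my assumptions, not anything from your post. I'm also assuming your version's ClusteringTree accepts the FeatureExtractor option the same way FindClusters does.

```mathematica
(* Placeholder data so the sketch runs on its own: two made-up "species",
   noisy and blurred random images. Replace with your own {image -> "species", ...}. *)
SeedRandom[1];
labeledImages = Join[
   Table[RandomImage[1, {64, 64}, ColorSpace -> "RGB"] -> "noisy", {12}],
   Table[Blur[RandomImage[1, {64, 64}, ColorSpace -> "RGB"], 8] -> "smooth", {12}]];
images = Keys[labeledImages];

(* Build one clustering tree per combination of distance function and
   cluster dissimilarity function, so the results can be compared side by side *)
trees = Table[
   ClusteringTree[images,
    FeatureExtractor -> "ImageFeatures",
    DistanceFunction -> df,
    ClusterDissimilarityFunction -> cdf],
   {df, {EuclideanDistance, CosineDistance}},
   {cdf, {"Single", "Average", "Complete"}}];

(* Hold out 25% of the labeled images, train on the rest, and measure accuracy on the held-out part *)
{train, test} = TakeDrop[RandomSample[labeledImages], Round[0.75 Length[labeledImages]]];
classifier = Classify[train, FeatureExtractor -> "ImageFeatures"];
accuracy = ClassifierMeasurements[classifier, test, "Accuracy"]
```

Because RandomSample reshuffles the split, the accuracy will vary between runs; repeating the split a few times gives a better sense of how well a given set of options generalizes.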
Are you familiar with Fisher's iris data set?
In[4]:= ExampleData[{"MachineLearning", "FisherIris"}, "Properties"]
Out[4]= {"Data", "Description", "Dimensions", "LearningTask", \
"LongDescription", "MissingData", "Name", "Source", "TestData", \
"TrainingData", "VariableDescriptions", "VariableTypes"}
While the data are not images, they are often used to demonstrate the characteristics of different clustering methods.
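As a quick illustration, the sketch below clusters the iris measurements into three groups with two FindClusters methods and tallies which species ended up in each cluster. I'm assuming, as I recall, that each element of "Data" is a rule of the form {sepal length, sepal width, petal length, petal width} -> species; check the "VariableDescriptions" property if your version differs.

```mathematica
(* Each element is assumed to be a rule: feature vector -> species name *)
iris = ExampleData[{"MachineLearning", "FisherIris"}, "Data"];
features = Keys[iris];
species = Values[iris];

(* Cluster the feature vectors into 3 groups, but return the species labels,
   then tally the labels in each cluster to see how well they separate *)
clusterSpecies[method_] := Tally /@ FindClusters[features -> species, 3, Method -> method];

clusterSpecies /@ {"KMeans", "Agglomerate"}
```

Comparing the tallies for the two methods shows the kind of difference you can expect when switching clustering methods on your own image features.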
Please post your code, especially for the second example (it looks like you tried using one of the examples from the Applications section of ImageDistance). You could use ImageDistance to form a distance matrix from your collection of images, and then use that as input to ClusteringTree.
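Here is a minimal sketch of the distance-matrix step. The placeholder images and the "imgN" labels are hypothetical. Since I'm not certain ClusteringTree takes a precomputed matrix directly, this sketch swaps in DirectAgglomerate from the HierarchicalClustering` package, which is documented to accept a dissimilarity matrix; treat that choice as my assumption rather than the route the ImageDistance examples use.

```mathematica
Needs["HierarchicalClustering`"]

(* Placeholder images so the sketch runs on its own; replace with your own list *)
SeedRandom[2];
images = Join[
   Table[RandomImage[1, {64, 64}, ColorSpace -> "RGB"], {4}],
   Table[Blur[RandomImage[1, {64, 64}, ColorSpace -> "RGB"], 8], {4}]];
labels = Table["img" <> ToString[i], {i, Length[images]}];

(* Pairwise ImageDistance values; level 1 makes Outer pair whole images *)
dm = Outer[ImageDistance, images, images, 1];

(* Agglomerative clustering straight from the dissimilarity matrix, average linkage *)
cluster = DirectAgglomerate[dm, labels, Linkage -> "Average"];
DendrogramPlot[cluster]
```

Passing a different DistanceFunction to ImageDistance inside Outer is an easy way to compare image distance measures while keeping the clustering step fixed.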
What is "EMD"? I couldn't find it anywhere on the ImageDistance reference page.