Classifier for Galaxy Types

Posted 8 years ago

Introduction


After attending the Mathematica Summer Camp in 2014, I participated in the Wolfram Mentorships program in October 2015, which let me apply my Mathematica programming skills to a project. One of the projects offered was classifying galaxy images by galaxy type: for example, an image of a spiral galaxy should be classified as spiral.

To build the classifier, I needed images for each of the three galaxy types (spiral, elliptical, and irregular) and a way to assign each image to the correct type.

Retrieving the Images

Using the GalaxyData[] function in Mathematica, I found that the Wolfram Data Set contained over 8000 spiral images, 1994 elliptical images, and 150 irregular images; other galaxy types such as "barred spiral" or "dwarf elliptical" would be counted as part of one of the three sets. Since only 150 irregular images were available, too few to train the classifier, I excluded this class and used only the spiral and elliptical galaxy types as classes.
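Grouping subtypes into main classes, and dropping a class with too few examples, can be sketched on a small made-up list of type names. This is a minimal sketch, not the code used in this project: the helper mainClass, the list toyTypes, and the threshold minCount are hypothetical, and it assumes each subtype name ends with the name of its main class (as "barred spiral" and "dwarf elliptical" do).

```mathematica
(* Hypothetical helper: map a subtype such as "barred spiral" or
   "dwarf elliptical" to its main class by taking its last word *)
mainClass[type_String] := Last[StringSplit[type]];

(* Hypothetical toy list of galaxy-type strings, for illustration only *)
toyTypes = {"spiral", "barred spiral", "elliptical", "dwarf elliptical",
   "irregular", "spiral"};

(* Tally how many galaxies fall into each main class *)
classCounts = Counts[mainClass /@ toyTypes]
(* -> <|"spiral" -> 3, "elliptical" -> 2, "irregular" -> 1|> *)

(* Keep only classes with at least minCount examples (hypothetical threshold) *)
minCount = 2;
usableClasses = Keys[Select[classCounts, # >= minCount &]]
(* -> {"spiral", "elliptical"} *)
```

On the real data, the threshold would be whatever class size is judged large enough to train on; here it is set low only so the toy example shows a class being dropped.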

I used the following code to retrieve the images, sort them into the two classes, and then draw a random sample from each class:

(* Retrieve every galaxy along with its name, type, and image *)
galaxyData = GalaxyData[];
galaxyImages = GalaxyData[galaxyData, {"Name", "GalaxyType", "Image"}];
(* Keep only entries that actually have an image; barred spirals count as spiral *)
spiralImages = 
  Select[galaxyImages, 
   ImageQ[#[[3]]] && MemberQ[{"spiral", "barred spiral"}, #[[2]]] &];
ellipticalImages = 
  Select[galaxyImages, 
   ImageQ[#[[3]]] && #[[2]] === "elliptical" &];
(* Randomly sample the images so both classes have about the same size *)
spiralRandom = RandomSample[spiralImages[[All, 3]], 1995];
ellipticalRandom = RandomSample[ellipticalImages[[All, 3]], 1994];

Additional Images

Finding more images to add to these sets was also a challenge. However, I found a research paper online that linked to about 230 additional images for each of the spiral and elliptical classes. Since these images were in color and did not have the same dimensions as the images in the Wolfram Data Set, I converted them to grayscale and resized them to 150 by 150 pixels.

The images can be found at http://vfacstaff.ltu.edu/lshamir/: click "software", then "Ganalyzer", and look for the line "A set of galaxy images...can be downloaded here", where "here" is the download link. Unzip the "GalaxyImages.zip" file, then run the code below, making sure the directory in the code points to the location of "GalaxyImages/".

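(* Adjust this path to wherever GalaxyImages.zip was unzipped *)
galaxyDir = FileNameJoin[{$HomeDirectory, "Downloads", "GalaxyImages"}];
(* Convert each TIFF image to grayscale and resize it to 150x150 pixels *)
prepareImage[file_] := 
  ImageResize[ColorConvert[Import[file], "Grayscale"], {150, 150}];
spiralNew = 
  prepareImage /@ FileNames["*.tiff", FileNameJoin[{galaxyDir, "spiral"}]];
ellipticalNew = 
  prepareImage /@ FileNames["*.tiff", FileNameJoin[{galaxyDir, "elip"}]];
(* Shuffle the new images before splitting them into training and test sets *)
spiralNewRandom = RandomSample[spiralNew, 223];
ellipticalNewRandom = RandomSample[ellipticalNew, 224];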
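To check that the conversion above gives single-channel (grayscale) images of a fixed size, the same two steps can be applied to a synthetic image. In this sketch a random RGB noise image stands in for a downloaded galaxy TIFF, so no files are needed; colorImage and processed are illustrative names only.

```mathematica
(* Synthetic stand-in for a downloaded color image: 320x240 RGB noise *)
colorImage = RandomImage[1, {320, 240}, ColorSpace -> "RGB"];

(* Same preprocessing as above: convert to grayscale, then resize to 150x150 *)
processed = ImageResize[ColorConvert[colorImage, "Grayscale"], {150, 150}];

(* Channels before and after, plus the new dimensions *)
{ImageChannels[colorImage], ImageChannels[processed], ImageDimensions[processed]}
(* -> {3, 1, {150, 150}} *)
```

Because ImageResize is given both a width and a height, the output is exactly 150 by 150 pixels; an image with a different aspect ratio is stretched rather than cropped.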

Training and Test Sets


Once I had collected all of the images, I split them into a training set and a test set. For the training set, I joined about half of the spiral images from the Wolfram Data Set with about half of the additional spiral images, and did the same for the elliptical images. The remaining halves make up the test set, so each class contributes 1109 images to each set.

Training

(* Training: the first part of each shuffled sample, 1109 images per class *)
spiralTraining = 
  Join[Take[spiralRandom, 998], Take[spiralNewRandom, 111]];
ellipticalTraining = 
  Join[Take[ellipticalRandom, 997], Take[ellipticalNewRandom, 112]];

Test

(* Test: the remaining images, also 1109 per class *)
spiralTest = Join[Drop[spiralRandom, 998], Drop[spiralNewRandom, 111]];
ellipticalTest = 
  Join[Drop[ellipticalRandom, 997], Drop[ellipticalNewRandom, 112]];
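One way to confirm that this split is balanced is to run the same Take/Drop calls on placeholder lists with the same lengths as the real samples (1995 and 223 spiral images, 1994 and 224 elliptical images). The placeholder names below are hypothetical; only the list lengths matter.

```mathematica
(* Placeholder lists with the same lengths as the real image samples *)
spiralWolfram = Range[1995]; spiralExtra = Range[223];
ellipticalWolfram = Range[1994]; ellipticalExtra = Range[224];

counts = Length /@ {
   Join[Take[spiralWolfram, 998], Take[spiralExtra, 111]],          (* spiral training *)
   Join[Take[ellipticalWolfram, 997], Take[ellipticalExtra, 112]],  (* elliptical training *)
   Join[Drop[spiralWolfram, 998], Drop[spiralExtra, 111]],          (* spiral test *)
   Join[Drop[ellipticalWolfram, 997], Drop[ellipticalExtra, 112]]}  (* elliptical test *)
(* -> {1109, 1109, 1109, 1109} *)
```

Both classes are therefore equally represented in the training set and in the test set, so a classifier cannot score well simply by favoring one class.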

Galaxy Image Processing Function


To improve classification accuracy, I applied an image processing step to every galaxy image. The goal was to make the features that distinguish spiral from elliptical galaxies clearer to the classifier.

To evaluate a candidate processing function, I trained a temporary classifier on the processed training set and used the code below to display the processed test images in a grid, arranged by actual class (rows) and predicted class (columns). This let me see what the function was doing to the images and which images were being misclassified. This code was suggested to me by Todd Rowland.

How I Chose the Image Processing Technique

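(* Requires testset and galaxyClassifier, which are defined below.
   Each cell shows the test images of the row's actual class that the
   classifier assigned to the column's predicted class. *)
Grid[Prepend[
    MapThread[
     Prepend, {Outer[
       Function[{actual, predicted}, 
        ImageCollage[
         Select[testset[[actual]], 
          galaxyClassifier[#] === predicted &], 
         Method -> "Rows"]], #, #], #}, 1], 
    Prepend[#, ""]] &[{"Spiral", "Elliptical"}], Frame -> All]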
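The grid above places each test image in the cell given by its actual class and its predicted class. The same bookkeeping can be sketched without images using GroupBy on hypothetical lists of actual and predicted labels; toyActual, toyPredicted, and cells are illustrative names, and the labels are made up.

```mathematica
(* Hypothetical toy results: actual and predicted labels for six test items *)
toyActual = {"Spiral", "Spiral", "Spiral", "Elliptical", "Elliptical", "Elliptical"};
toyPredicted = {"Spiral", "Elliptical", "Spiral", "Elliptical", "Elliptical", "Spiral"};

(* Group item indices by their {actual, predicted} cell *)
cells = GroupBy[Range[Length[toyActual]], {toyActual[[#]], toyPredicted[[#]]} &]
(* -> <|{"Spiral", "Spiral"} -> {1, 3}, {"Spiral", "Elliptical"} -> {2},
        {"Elliptical", "Elliptical"} -> {4, 5}, {"Elliptical", "Spiral"} -> {6}|> *)
```

The off-diagonal cells, such as {"Spiral", "Elliptical"}, hold the misclassified items; in the image grid these are the collages worth inspecting when judging a processing function.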

I eventually settled on a combination of Sharpen[], which sharpens the image, and GeodesicOpening[], which removes small white spots from it.

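(* Sharpen each image (radius 10), then remove small bright spots
   with a geodesic opening of range 4 *)
galaxyProcessing[random_] := 
 GeodesicOpening[Sharpen[#, 10], 4] & /@ random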
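To see what the GeodesicOpening step does on its own, it can be applied to a synthetic black image containing one isolated bright pixel (standing in for a small white spot) and a 20 by 20 bright block (standing in for a larger structure). This is a sketch on made-up data, not a galaxy image, and the Sharpen step is left out. A geodesic opening erodes the image and then reconstructs it by dilation under the original, so features that the range-4 erosion wipes out entirely do not come back, while larger features are restored.

```mathematica
(* 60x60 black image, stored as real-valued pixel data *)
data = ConstantArray[0., {60, 60}];
data[[10, 10]] = 1.;              (* one isolated bright pixel: a "white spot" *)
data[[25 ;; 44, 25 ;; 44]] = 1.;  (* a 20x20 bright block: a larger structure *)
img = Image[data];

(* Same opening as in galaxyProcessing, with range 4 *)
opened = ImageData[GeodesicOpening[img, 4]];

(* Pixel value at the spot and at the block's center after the opening:
   the spot is removed (0) and the block is kept (1) *)
{opened[[10, 10]], opened[[35, 35]]}
```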

I then applied galaxyProcessing to the training and test sets, which are used to train and evaluate the classifier.

trainingset = <|"Spiral" -> galaxyProcessing[spiralTraining], 
   "Elliptical" -> galaxyProcessing[ellipticalTraining]|>;
testset = <|"Spiral" -> galaxyProcessing[spiralTest], 
   "Elliptical" -> galaxyProcessing[ellipticalTest]|>;

Classifier


The final step was to try different classification methods and compare their accuracies. I tried LogisticRegression, RandomForest, NearestNeighbors, NeuralNetwork, and the default (Method -> Automatic), which lets Classify choose a method itself. RandomForest consistently gave the highest accuracy, and LogisticRegression the lowest.

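(* Train a random forest on the processed training set *)
galaxyClassifier = 
  Classify[trainingset, PerformanceGoal -> "Quality", 
   Method -> "RandomForest"];
(* Measure accuracy and plot the confusion matrix on the held-out test set *)
ClassifierMeasurements[galaxyClassifier, testset, {"Accuracy", 
  "ConfusionMatrixPlot"}]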
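The comparison of methods described above can be scripted as a loop over Method values. The sketch below runs on hypothetical, well-separated one-dimensional toy data (toyTrain and toyTest) so that it evaluates quickly; for the real comparison, trainingset and testset would take their place. The accuracies it reports say nothing about the galaxy classifier.

```mathematica
(* Hypothetical toy data: two well-separated classes of real numbers *)
SeedRandom[1];
toyTrain = <|"Low" -> RandomReal[{0, 1}, 30], "High" -> RandomReal[{2, 3}, 30]|>;
toyTest = <|"Low" -> RandomReal[{0, 1}, 10], "High" -> RandomReal[{2, 3}, 10]|>;

(* The methods compared in this post; Automatic lets Classify pick one *)
methods = {Automatic, "LogisticRegression", "RandomForest",
   "NearestNeighbors", "NeuralNetwork"};

(* Train one classifier per method and record its test-set accuracy *)
accuracies = AssociationMap[
  ClassifierMeasurements[Classify[toyTrain, Method -> #], toyTest, "Accuracy"] &,
  methods]
```

Because the training and test samples are random, rerunning the comparison with different samples (as was done with the galaxy images) shows how much each method's accuracy varies.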

Conclusion


Using the RandomForest method, the classifier reached an accuracy of 0.7768 on the test set, meaning it labeled about 78% of the test images with the correct galaxy type. The accuracy varies with each random sample of training and test sets, but with the image processing step and the RandomForest method it stayed above 0.75.
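Accuracy is the fraction of test images whose predicted class matches their actual class: the sum of the diagonal of the confusion matrix divided by the sum of all its entries. With 2218 test images (1109 per class), an accuracy of 0.7768 corresponds to about 1723 correct labels. The sketch below computes accuracy for a hypothetical 2 by 2 confusion matrix; accuracyFromMatrix and the counts in toyMatrix are illustrative only.

```mathematica
(* Hypothetical helper: accuracy = correct predictions / all predictions *)
accuracyFromMatrix[m_?MatrixQ] := Tr[m]/Total[m, 2];

(* Made-up 2x2 confusion matrix: rows = actual class, columns = predicted class *)
toyMatrix = {{90, 10}, {30, 70}};
accuracyFromMatrix[toyMatrix]
(* -> 4/5 *)
```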

Challenges


My biggest challenge in this project was finding image processing functions that would simplify each image enough to show whether it was spiral or elliptical. Trying out many functions, such as ImageConvolve[], Binarize[], ImageCorners[], and GeodesicClosing[], was hard enough on its own; I also had to combine these functions and tune their pixel radii or sizes.

In addition, sorting the galaxies into their classes by galaxy type was easy; the hard part was finding enough images for each type. This was especially true for the irregular type, for which I could not find any additional images online. Only 150 images were available for this class, compared with over 2200 each for the spiral and elliptical classes, so including the irregular class would have been inappropriate, even though it would have been beneficial.

Final Thoughts


Although I felt that the project could have been done more efficiently and that it seemed like a simple task, I believe it was an essential part of improving my skills, and it increased my interest in the Mathematica language.

I would like to thank Alison Kimball and Todd Rowland for their mentoring and help in this project.

POSTED BY: Aayush Dubey