After the Mathematica 10.0 release, I came across the interesting new functions Association, Dataset, etc. I did not pay much attention to them until now, since I had always worked on theoretical things. Now I am working through a huge amount of astronomical data, and I need to correlate some parameters for each star of a given spectral type. I need to split the data into many catalogues and, for each star, store its galactic coordinates (latitude and longitude), parallax + error + quality, and spectral type, like a descending tree:
CATALOGUES -> ...
  -> catalogue A -> star ...
       -> galactic ..., -> parallax ..., -> spectral type ..., -> identifiers ... (many designations; they refer to the same star in different catalogues)
  -> catalogue B -> star ...
       -> galactic, etc.
  ...
I tried this:
First, I pick out the star names from the .txt file. Here is a link to a sample: The sample txt of a certain catalogue
$1 = OpenRead[
"C:\\Users\\decicco\\SkyDrive\\Documentos\\ProjetoFinal\\Simbad\\simbadART_\
Teste_dataMining.txt"];
$2 = ReadList[$1, Record, RecordSeparators -> {{"Object "}, {" ---"}}]
Close[$1];
Out[3]= {"HR 5027 ", "HR 5036 "}
$1 = OpenRead[
"C:\\Users\\decicco\\SkyDrive\\Documentos\\ProjetoFinal\\Simbad\\Teste_\
dataMining.txt"];
$2 = ReadList[$1, String];
Close[$1];
Galactic coordinates
Flatten@StringCases[$2,
"Coordinates(Gal,ep=J2000,eq=2000): " ~~ (x :
NumberString ...) ~~ (y : ___ ~~ NumberString ...) ->
ToExpression@{x, y}]
Out[8]= {"307.0804", " +06.8343 ", "307.7283", " 10.4014 "}
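The output above still contains strings, which suggests that with `->` the right-hand side `ToExpression@{x, y}` is evaluated before any match is made. A minimal sketch using `:>` (RuleDelayed) with a regular expression follows. The input line is reconstructed from the output above rather than copied from the file, and the names `galLine` and `galCoords` are hypothetical.

```mathematica
(* Hypothetical input line, reconstructed from the output shown above *)
galLine = "Coordinates(Gal,ep=J2000,eq=2000): 307.0804  +06.8343";

(* :> delays the right-hand side until "$1" and "$2" hold the captured text *)
galCoords = First@StringCases[galLine,
   RegularExpression[
     "Coordinates\\(Gal,ep=J2000,eq=2000\\):\\s*([-+]?[0-9.]+)\\s+([-+]?[0-9.]+)"] :>
    {ToExpression["$1"], ToExpression["$2"]}]
```

Applied to a list of lines, the same `StringCases` call returns an empty list for every line that does not match, so `Flatten[..., 1]` or `Select` would be needed before taking `First`.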
Parallaxes, error, and quality
Flatten@StringCases[$2,
"Parallax: " ~~ (x : ___ ~~ NumberString ...) ~~
"[" ~~ (y : ___ ~~ NumberString ...) ~~ "]" ~~ z : LetterCharacter ->
ToExpression@{x, y, z}] (* here I need to get the parallax, its error, and the parallax quality, which is usually A but could also be B or C *)
Out[41]= {}
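One possible cause of the empty result is that the pattern expects the quality letter immediately after "]", with no whitespace in between. A minimal sketch that allows whitespace around each field is given below. The layout and the numeric values of the sample line are assumptions, not taken from the file, and `plxLine` and `plx` are hypothetical names.

```mathematica
(* Hypothetical Parallax line; layout and values are assumptions *)
plxLine = "Parallax: 0.46 [0.38] A";

(* Captures: value, error inside [ ], and a single-letter quality flag *)
plx = First@StringCases[plxLine,
   RegularExpression[
     "Parallax:\\s*([-+]?[0-9.]+)\\s*\\[\\s*([0-9.]+)\\s*\\]\\s*([A-Z])"] :>
    {ToExpression["$1"], ToExpression["$2"], "$3"}]
```

The quality flag is kept as a string rather than passed through `ToExpression`, so it cannot be confused with a symbol named `A`, `B`, or `C`.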
Spectral type
In[15]:= Flatten@StringCases[$2, "Spectral type: " ~~ x : ___ ~~ Except["~"] -> x]
Out[15]= {"B0.5Ia C ~ ", "B2.5Ib C ~ "} (* I don't want the "~" ... *)
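A minimal sketch for taking only the spectral type itself, assuming it is the first whitespace-delimited token after the label and contains no spaces. The input line is reconstructed from the output above, and `sptLine` and `spt` are hypothetical names.

```mathematica
(* Line reconstructed from the output shown above *)
sptLine = "Spectral type: B0.5Ia C ~ ";

(* \S+ takes the first run of non-whitespace characters after the label *)
spt = First@StringCases[sptLine,
   RegularExpression["Spectral type:\\s*(\\S+)"] :> "$1"]
```

If the quality letter that follows the type is also wanted, a second group such as `\\s+([A-Z])` could be added to the same expression.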
Catalogue identifiers
In[44]:= Flatten@StringCases[$2,
RegularExpression["(?m)^Identifiers "] ~~ "(" ~~ DigitCharacter ~~ ") :" ~~
x__ ~~ RegularExpression["(?m)^Notes "] -> x] (* I need to get only the designations, separated by commas *)
Out[44]= {}
After getting all of these parameters, I want to build a dataset for each catalogue I create, and these per-catalogue sub-datasets would be nested inside a single dataset.
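The nested structure described above could be sketched with nested Associations wrapped in a single Dataset. The key names, the parallax numbers, and the identifier lists below are placeholders chosen for illustration; only the star names, galactic coordinates, and spectral types come from the outputs shown earlier.

```mathematica
(* Keys, parallax values, and identifier lists are hypothetical placeholders *)
catalogues = <|
   "catalogue A" -> <|
     "HR 5027" -> <|
       "galactic" -> {307.0804, 6.8343},
       "parallax" -> <|"value" -> 0.46, "error" -> 0.38, "quality" -> "A"|>,
       "spectralType" -> "B0.5Ia",
       "identifiers" -> {"HR 5027"}|>,
     "HR 5036" -> <|
       "galactic" -> {307.7283, 10.4014},
       "parallax" -> <|"value" -> 0.51, "error" -> 0.40, "quality" -> "A"|>,
       "spectralType" -> "B2.5Ib",
       "identifiers" -> {"HR 5036"}|>
     |>
   |>;

ds = Dataset[catalogues];

(* One field of one star *)
sptA = ds["catalogue A", "HR 5027", "spectralType"];

(* Galactic coordinates of every star in one catalogue *)
galA = Normal@ds["catalogue A", All, "galactic"];
```

Each further catalogue would be another top-level key in `catalogues`, so a query such as `ds[All, All, "spectralType"]` would run over all catalogues at once.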
Thanks!