# Creating dataset from a file.txt

Posted 8 years ago
After the Mathematica 10.0 release I noticed the interesting new functions Association, Dataset, etc. I did not pay much attention to them until now, because I was always working on theoretical things. Now I am working through a huge amount of astronomical data, and I need to correlate some parameters for each star of a given spectral type.

I need to separate many catalogues and, for each star, store its galactic coordinates (latitude and longitude), its parallax with error and quality, and its spectral type, like a descending tree:

    CATALOGUES
      -> catalogue A
           -> star ...
                -> galactic ...
                -> parallax ...
                -> spectral type ...
                -> identifiers ... (many nomenclatures; they refer to the same star in different catalogues)
      -> catalogue B
           -> star ...
                -> galactic, etc. ...

This is what I tried. First I pick out the stars from the .txt file. Here is a link to a sample: The sample txt of a certain catalogue

    $1 = OpenRead["C:\\Users\\decicco\\SkyDrive\\Documentos\\ProjetoFinal\\Simbad\\simbadART_Teste_dataMining.txt"];
    $2 = ReadList[$1, Record, RecordSeparators -> {{"Object "}, {" ---"}}]
    Close[$1];

    Out[3]= {"HR 5027 ", "HR 5036 "}

Then I read the file line by line:

    $1 = OpenRead["C:\\Users\\decicco\\SkyDrive\\Documentos\\ProjetoFinal\\Simbad\\Teste_dataMining.txt"];
    $2 = ReadList[$1, String];
    Close[$1];

Galactic coordinates

    Flatten@StringCases[$2, "Coordinates(Gal,ep=J2000,eq=2000): " ~~ (x : NumberString ...) ~~ (y : ___ ~~ NumberString ...) -> ToExpression@{x, y}]

    Out[8]= {"307.0804", " +06.8343 ", "307.7283", " 10.4014 "}

Parallax, error and quality

    Flatten@StringCases[$2, "Parallax: " ~~ (x : ___ ~~ NumberString ...) ~~ "[" ~~ (y : ___ ~~ NumberString ...) ~~ "]" ~~ z : LetterCharacter -> ToExpression@{x, y, z}]
    (* here I have to get the parallax, its error, and the parallax quality, which is usually A but could also be B or C *)

    Out[41]= {}

Spectral type

    In[15]:= Flatten@StringCases[$2, "Spectral type: " ~~ x : ___ ~~ Except["~"] -> x]

    Out[15]= {"B0.5Ia C ~ ", "B2.5Ib C ~ "}
    (* I don't want the "~" ... *)

Catalogue identifiers

    In[44]:= Flatten@StringCases[$2, RegularExpression["(?m)^Identifiers "] ~~ "(" ~~ DigitCharacter ~~ ") :" ~~ x__ ~~ RegularExpression["(?m)^Notes "] -> x]
    (* I need to get only the nomenclatures, separated by commas *)

    Out[44]= {}

After getting all of these parameters, I want to build a dataset for each catalogue I create. These catalogue sub-datasets would then sit inside one overall dataset.

Thanks!
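A minimal sketch of the nested layout described above (catalogue, then star, then properties), assuming the extracted values have already been trimmed and converted to numbers. The key names (`"CatalogueA"`, `"GalacticLongitude"`, and so on) are hypothetical. Pairing the two star names with the two coordinate pairs and spectral types in the order they appear in the outputs is also an assumption. Parallax and identifiers are left out because the post shows no extracted values for them.

```wolfram
(* Values taken from the outputs shown in the post; their pairing by position is assumed *)
stars  = {"HR 5027", "HR 5036"};
coords = {{307.0804, 6.8343}, {307.7283, 10.4014}};
types  = {"B0.5Ia", "B2.5Ib"};

(* One association per star; the key names are illustrative only *)
starData = AssociationThread[stars,
   MapThread[
    <|"GalacticLongitude" -> #1[[1]],
      "GalacticLatitude" -> #1[[2]],
      "SpectralType" -> #2|> &,
    {coords, types}]];

(* Outer level: one entry per catalogue ("CatalogueA" is a placeholder name) *)
ds = Dataset[<|"CatalogueA" -> starData|>];

(* All properties of one star *)
ds["CatalogueA", "HR 5027"]

(* One property for every star in a catalogue *)
ds["CatalogueA", All, "SpectralType"]
```

Under this layout a further catalogue would be one more key in the outer association, and a further property (parallax, identifiers) one more key in each star's association.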