Normalize an external dataset

Jonathan Ansell

Posted 1 year ago

I am trying to take the attached .txt data set of tropical storm records (sample below), from https://www.nhc.noaa.gov/data/hurdat/hurdat2-1851-2022-040723.txt, and normalize it to work within Wolfram Language. Example of the original:

AL011852, UNNAMED, 45, (* storm name, summary *)

(* then followed by n lines of storm details re: storm track, wind speed, etc. during discrete times *)

18520819, 0000, , TS, 20.5N, 67.1W, 60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999

18520819, 0600, , TS, 20.7N, 68.0W, 60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999

18520819, 1200, , TS, 20.9N, 68.8W, 60, -9

I am a WL newbie and I'm sure there must be an easy way to do this but I just can't figure it out. Can someone help me with the code to do this? Thank you!

POSTED BY: Jonathan Ansell

4 Replies

Jonathan Ansell

Posted 1 year ago

Thanks, Henrik. Actually, what I am looking for is a way to delete the first line of every group of storms (i.e. the shorter list that starts with AL...) and prepend it to each longer list below it (the lists that start with an 8-digit integer). I don't need to transform or define the qualities of the elements. Example below. Thanks!

Original:

{AL011852, UNNAMED, 45} (* storm name, summary *)
{18520819, 0000, , TS, 20.5N, 67.1W, 60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999}
{18520819, 0600, , TS, 20.7N, 68.0W, 60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999}
etc.

Change to:

{AL011852, UNNAMED, 45, 18520819, 0000, , TS, 20.5N, 67.1W, 60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999}
{AL011852, UNNAMED, 45, 18520819, 0600, , TS, 20.7N, 68.0W, 60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999}
etc.

I hope that's clearer. Thanks again.

Jon

POSTED BY: Jonathan Ansell

Henrik Schachner

Henrik Schachner, Radiation Therapy Center, Weilheim, Germany

Posted 1 year ago

OK, I see. Here is one way to do it. I first split the whole list into appropriate pieces using `SequenceCases[]`; because the list has about 56000 elements, this step takes quite some time. After that, the pieces are joined together in the desired way. The code should explain itself:

data0 = Import["hurdat2-1851-2022-040723.txt", "Data"];
singleEntities = SequenceCases[data0, {{_String, _String, __}, {_Integer, __} ..}];

hJoin[hv_] := Module[{header, vals}, header = First[hv];
   vals = Rest[hv];
   Join[header, #] & /@ vals];

data = Flatten[hJoin /@ singleEntities, 1];

Regards -- Henrik
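To see what the `SequenceCases[]` pattern picks out, here is a minimal sketch on a toy list. The variable names `toy`, `groups` and `joined` are made up for this illustration, and the rows are shortened, partly invented stand-ins for the imported data (the real rows carry many more columns), so treat it as an assumption-laden example rather than the exact shape of the file.

```wolfram
(* Toy stand-in for the imported rows; values are shortened or invented for illustration *)
toy = {
   {"AL011852", "UNNAMED", 2},
   {18520819, 0, "", "TS", "20.5N", "67.1W", 60},
   {18520819, 600, "", "TS", "20.7N", "68.0W", 60},
   {"AL021852", "UNNAMED", 1},
   {18520905, 0, "", "HU", "17.0N", "64.1W", 70}
   };

(* Each match is one header row (two leading strings) followed by one or more data rows *)
groups = SequenceCases[toy, {{_String, _String, __}, {_Integer, __} ..}];

(* Prepend the header of each group to every data row in that group *)
joined = Flatten[Function[g, Join[First[g], #] & /@ Rest[g]] /@ groups, 1]
```

With this toy input, `groups` holds two sublists (one per storm) and `joined` holds three combined rows, each starting with the storm identifier and name.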

POSTED BY: Henrik Schachner

Henrik Schachner

Henrik Schachner, Radiation Therapy Center, Weilheim, Germany

Posted 1 year ago

Here is a very much faster version:

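(* Import the file as a list of rows *)
data0 = Import["hurdat2-1851-2022-040723.txt", "Data"];
(* Header rows start with two strings; pair each header position with the last row before the next header *)
indx = {#1, #2 - 1} & @@@ 
   Partition[Flatten@Join[Position[data0, {_String, _String, __}], {Length[data0] + 1}], 2, 1];
(* One sublist per storm: the header row followed by its data rows *)
singleEntities = Take[data0, #] & /@ indx;

(* Prepend the header row to each data row of one storm *)
hJoin[hv_] := Module[{header, vals}, header = First[hv];
   vals = Rest[hv];
   Join[header, #] & /@ vals];

(* Flatten one level to get a single list of combined rows *)
data = Flatten[hJoin /@ singleEntities, 1];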
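The index computation in this faster version may be easier to follow when the intermediate values are inspected on a small input. The sketch below uses a toy list with invented, shortened rows; the names `toy`, `starts` and `ranges` are hypothetical and only serve the illustration.

```wolfram
(* Toy stand-in for the imported rows; values are shortened or invented for illustration *)
toy = {
   {"AL011852", "UNNAMED", 2},
   {18520819, 0, "", "TS", "20.5N", "67.1W", 60},
   {18520819, 600, "", "TS", "20.7N", "68.0W", 60},
   {"AL021852", "UNNAMED", 1},
   {18520905, 0, "", "HU", "17.0N", "64.1W", 70}
   };

(* Positions of the header rows *)
starts = Flatten@Position[toy, {_String, _String, __}]
(* {1, 4} *)

(* Append a sentinel one past the end, then turn neighbouring starts into {first, last} ranges *)
ranges = {#1, #2 - 1} & @@@ Partition[Append[starts, Length[toy] + 1], 2, 1]
(* {{1, 3}, {4, 5}} *)

(* Each range selects one storm: header row plus its data rows *)
Take[toy, #] & /@ ranges
```

Because the header positions are found in a single pass and the groups are cut out with `Take`, no sequence pattern has to be matched against the whole list, which is presumably where the speed-up over `SequenceCases[]` comes from.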

POSTED BY: Henrik Schachner

Henrik Schachner

Henrik Schachner, Radiation Therapy Center, Weilheim, Germany

Posted 1 year ago

Jonathan,

"... and normalize it to work within Wolfram Language."

I can only guess that by "it" you mean the respective last integers of your data. Maybe this does what you have in mind (I am using a simple replacement rule here, because your data do not come in a regular form):

data0 = Import["hurdat2-1851-2022-040723.txt", "Data"];
data = data0 /. {i1_Integer, i2_Integer, s1_String, s2_String, s3_String, s4_String, i3_Integer, ivals__Integer} :> {i1, i2, s1, s2, s3, s4, i3, Splice@Normalize[{ivals}]};

Does that help?

Regards -- Henrik
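For readers unsure what this rule produces, here is a small sketch applying it to a single made-up row. The trailing values 3 and 4 are invented so that the result is easy to check by hand; the name `row` is hypothetical.

```wolfram
(* One invented data row: seven leading fields followed by two integers to be normalized *)
row = {18520819, 0, "", "TS", "20.5N", "67.1W", 60, 3, 4};

(* Normalize rescales the trailing integers, taken as a vector, to unit length;
   Splice inserts the resulting list back into the row without extra nesting *)
row /. {i1_Integer, i2_Integer, s1_String, s2_String, s3_String, s4_String,
    i3_Integer, ivals__Integer} :> {i1, i2, s1, s2, s3, s4, i3,
    Splice@Normalize[{ivals}]}
(* {18520819, 0, "", "TS", "20.5N", "67.1W", 60, 3/5, 4/5} *)
```

Note that `Normalize` here is the vector operation (division by the norm), so any -999 entries in the real data would be treated as ordinary numbers rather than as placeholders.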

POSTED BY: Henrik Schachner
