Normalize an external dataset

Posted 1 year ago

I am trying to take the attached .txt data set of tropical storms (sample below), downloaded from https://www.nhc.noaa.gov/data/hurdat/hurdat2-1851-2022-040723.txt, and normalize it to work within Wolfram Language. Example of the original:

AL011852, UNNAMED, 45, (* storm name, summary *)

(* then followed by n lines of storm details re: storm track, wind speed, etc. during discrete times *)

18520819, 0000, , TS, 20.5N, 67.1W, 60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999

18520819, 0600, , TS, 20.7N, 68.0W, 60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999

18520819, 1200, , TS, 20.9N, 68.8W, 60, -9*

I am a WL newbie and I'm sure there must be an easy way to do this but I just can't figure it out. Can someone help me with the code to do this? Thank you!

POSTED BY: Jonathan Ansell
4 Replies

Thanks, Henrik. Actually, what I am looking for is a way to delete the first line of every storm group (e.g., the shorter list that starts with AL...) and prepend it to each of the longer lists below it, the ones that start with an 8-digit integer. I don't need to transform or define the qualities of the elements. Example below. Thanks!

Original:

{AL011852, UNNAMED, 45} (* storm name, summary *)
{18520819, 0000, , TS, 20.5N, 67.1W, 60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999}
{18520819, 0600, , TS, 20.7N, 68.0W, 60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999}
etc.

Change to:

{AL011852, UNNAMED, 45, 18520819, 0000, , TS, 20.5N, 67.1W, 60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999}
{AL011852, UNNAMED, 45, 18520819, 0600, , TS, 20.7N, 68.0W, 60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999}
etc.

I hope that's clearer. Thanks again. Jon

POSTED BY: Jonathan Ansell

OK, I see. Here is one way to do it. I first split the whole list into one piece per storm (a header row plus its data rows) using SequenceCases[]; because the list has about 56000 elements, this step takes quite some time. After that, each header is joined to every data row of its storm. The code should explain itself:

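(* import the file as a list of rows *)
data0 = Import["hurdat2-1851-2022-040723.txt", "Data"];
(* group each header row with the data rows that follow it *)
singleEntities = SequenceCases[data0, {{_String, _String, __}, {_Integer, __} ..}];

(* prepend the header (first row of a group) to each of its data rows *)
hJoin[hv_] := Module[{header, vals},
   header = First[hv];
   vals = Rest[hv];
   Join[header, #] & /@ vals];

(* process every storm and flatten the result into one list of rows *)
data = Flatten[hJoin /@ singleEntities, 1];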
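The two steps can be tried on a tiny hand-made list before running them on the full file. This is only a sketch: `toy` is a made-up stand-in with shortened rows (the second storm, `ZZ990000`, is invented for illustration), and it assumes that the imported header rows start with two strings and the data rows start with an integer.

```wolfram
(* toy: hypothetical miniature of the imported data, not actual file contents *)
toy = {
   {"AL011852", "UNNAMED", 2},
   {18520819, 0, "", "TS", "20.5N", "67.1W", 60},
   {18520819, 600, "", "TS", "20.7N", "68.0W", 60},
   {"ZZ990000", "TOYSTORM", 1},
   {19000101, 0, "", "TS", "10.0N", "50.0W", 30}};

(* one sublist per storm: a header row followed by its data rows *)
groups = SequenceCases[toy, {{_String, _String, __}, {_Integer, __} ..}];

(* prepend each storm's header to every one of its data rows *)
toyJoined = Flatten[
   Function[g, Join[First[g], #] & /@ Rest[g]] /@ groups, 1];

Length /@ groups   (* {3, 2} *)
Length[toyJoined]  (* 3 *)
First[toyJoined]
(* {"AL011852", "UNNAMED", 2, 18520819, 0, "", "TS", "20.5N", "67.1W", 60} *)
```

The header rows themselves no longer appear in the result; only the three data rows remain, each carrying its storm's header in front.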

Regards -- Henrik

POSTED BY: Henrik Schachner

Here is a much faster version:

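data0 = Import["hurdat2-1851-2022-040723.txt", "Data"];
(* find the position of every header row, then turn consecutive header
   positions into {start, end} index pairs, one pair per storm *)
indx = {#1, #2 - 1} & @@@ 
   Partition[Flatten@Join[Position[data0, {_String, _String, __}], {Length[data0] + 1}], 2, 1];
(* cut the data into one block per storm *)
singleEntities = Take[data0, #] & /@ indx;

(* prepend the header (first row of a group) to each of its data rows *)
hJoin[hv_] := Module[{header, vals},
   header = First[hv];
   vals = Rest[hv];
   Join[header, #] & /@ vals];

data = Flatten[hJoin /@ singleEntities, 1];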
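To make the index bookkeeping easier to follow, here is the same computation on a small hand-made list, with the intermediate values shown. Again a sketch only: `toy` and the names `starts` and `toyIndx` are hypothetical, and the second storm is invented for illustration.

```wolfram
(* toy: hypothetical miniature of the imported data, not actual file contents *)
toy = {
   {"AL011852", "UNNAMED", 2},
   {18520819, 0, "", "TS", "20.5N", "67.1W", 60},
   {18520819, 600, "", "TS", "20.7N", "68.0W", 60},
   {"ZZ990000", "TOYSTORM", 1},
   {19000101, 0, "", "TS", "10.0N", "50.0W", 30}};

(* positions of the header rows *)
starts = Flatten[Position[toy, {_String, _String, __}]]
(* {1, 4} *)

(* append a sentinel one past the end, pair up neighbours,
   and step each end index back by one *)
toyIndx = {#1, #2 - 1} & @@@ Partition[Join[starts, {Length[toy] + 1}], 2, 1]
(* {{1, 3}, {4, 5}} *)

(* each {start, end} pair selects one storm block *)
Take[toy, #] & /@ toyIndx
```

The sentinel `Length[toy] + 1` is what lets the last storm get an end index; without it the final block would be dropped.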
POSTED BY: Henrik Schachner

Jonathan,

... and normalize it to work within Wolfram Language.

I can only guess that by "it" you mean the trailing integers in each row of your data. Maybe this does what you have in mind (I am using a simple replacement rule here because your data are not in a regular form):

data0 = Import["hurdat2-1851-2022-040723.txt", "Data"];
data = data0 /. {i1_Integer, i2_Integer, s1_String, s2_String, s3_String, s4_String, 
     i3_Integer, ivals__Integer} :> 
    {i1, i2, s1, s2, s3, s4, i3, Splice@Normalize[{ivals}]};
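For reference, a small sketch of what the two built-ins in that rule do. `Normalize` rescales a vector to unit Euclidean length, and `Splice` (available from version 12.1) inserts the elements of a list into the enclosing list. The row used here is hypothetical, shortened to two trailing integers so the result is easy to read.

```wolfram
Normalize[{3, 4}]        (* {3/5, 4/5} *)
Norm[Normalize[{3, 4}]]  (* 1 *)
{1, Splice[{2, 3}], 4}   (* {1, 2, 3, 4} *)

(* row: hypothetical shortened data row, not actual file contents *)
row = {18520819, 0, "", "TS", "20.5N", "67.1W", 60, 3, 4};

row /. {i1_Integer, i2_Integer, s1_String, s2_String, s3_String, s4_String, 
    i3_Integer, ivals__Integer} :> 
   {i1, i2, s1, s2, s3, s4, i3, Splice@Normalize[{ivals}]}
(* {18520819, 0, "", "TS", "20.5N", "67.1W", 60, 3/5, 4/5} *)
```

Only the integers after the seventh field are rescaled; the first seven fields pass through unchanged.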

Does that help? Regards -- Henrik

POSTED BY: Henrik Schachner