I uploaded the files but have not bothered to open them in EXCELL. Rather, I import them directly into Mathematica and start 'blind'. This loads all the files into a list of files.
ClearAll[rawFiles]
rawFiles[S1] = Import["c:\\" <> "your path" <> "\\*.xls"]
When loading and wrangling unfamiliar data I always use the name[Sn] form where 'S' is step so rawFiles[S1] means rawFiles[step1]
rawFiles[S1] // Dimensions
{3, 1}
The dimensions above tell me there is an extra bracket ({ }) - typical of .xls imports. I'll come back to this later.
I want to see the column headers (if any) along with the first couple rows of data.
rawFiles[S1][[1, 1, 1 ;; 3]] // Grid
rawFiles[S1][[2, 1, 1 ;; 3]] // Grid
rawFiles[S1][[3, 1, 1 ;; 3]] // Grid
output not shown
It looks like the column headers are the same for all the files but I want to make sure.
rawFiles[S1][[1, 1, 1]] == rawFiles[S1][[2, 1, 1]] ==
rawFiles[S1][[3, 1, 1]]
True
Now I know it's safe to take the column headers of the first file for all subsequent files. So I can take all rows for the first file and all rows accept the first row of the remaining files.
rawFiles[S2] =
Join[{rawFiles[S1][[1, 1, 1 ;;]], rawFiles[S1][[2, 1, 2 ;;]],
rawFiles[S1][[3, 1, 2 ;;]]}];
Now I get ride of the extra { }'s and generate a single set of column x row date that seems to be flat and nicely dimensioned.
Flatten[rawFiles[S2], 1] // Dimensions
rawFiles[S3] = Flatten[rawFiles[S2], 1];
{19729, 27}
At this point I want to create a Dataset[ ] object to leverage the capabilities and flexibility that's available. I start by taking the first row (the headers) and thread if over the remaining rows.
colHeaders = First[rawFiles[S3]]
{"Date", "Position", "Measuring", "01", "02", "03", "04", "05", "06", \
"07", "08", "09", "10", "11", "12", "13", "14", "15", "16", "17", \
"18", "19", "20", "21", "22", "23", "24"}
dataDS[S1] =
Dataset[AssociationThread[colHeaders -> #] & /@ Rest[rawFiles[S3]]]
Now we can perform rigorous data wrangling in a straightforward - rule based - way. I'm going to apply some of my 'best practices' to what I see - but I'm not going to do an exhaustive inspection and data wrangle. I'll leave that for you ;)
First, I want to change all dates to DateObject[ ]'s.
dataDS[S2] = dataDS[S1][All, Query[Thread[{"Date"} -> DateObject ] ]];
Next, I always change string entries like "NR" and symbolic entries like "Null" into Missing[ ] when the type of data is known. That is, if a column should always be a number but sometimes data is missing, then make sure all data in a column has head Number or Missing. This way, in computation you know the data is directly computable or missing. There are many downstream benefits that accrue when doing analysis and computation. Look up Missing[ ] in help. Here's how I go about cleaning up all the "NR"'s in one step. For the same reason I don't want "" (blank spaces) in columns that otherwise would be numerical. Also, (assumption based on reading this post), data of this form, '..d#' should be '..d' and be understood to be a number. I add three replacement rules.
dataDS[S3] = dataDS[S2][All, All, Replace[{
"NR" :> Missing["imported as NR"],
"" :> Missing["imported as ''"]
}]];
And, finally I deal with the n# cases.
dataDS[S4] = dataDS[S3][All, All, Replace[{
v_DateObject :> v,
v_Number :> v,
v_String :>
If[ StringTake[v, -1] == "#", ToExpression[StringDrop[v, -1]], v]
}]];
dataDS[S4] not shown
dataDS[S4][Transpose, Drop[colHeaders, 3]][All,
Select[! MissingQ[#] &]][All, (Histogram[#] &)]
histograms not shown
There's loads more data inspection that can be done given your goals, but after the inspection your ready for computation, which I will leave to you...
Attachments: