Message Boards Message Boards

0
|
15856 Views
|
4 Replies
|
1 Total Likes
View groups...
Share
Share this post:

Missing values and importing Excel file

Posted 11 years ago
Hi, I'm trying to import an Excel file in which some data have missing values (represented as a period (.)). I import the file with no problem but when I try to get descriptive statistics, the procedure doesn't ignore the missing values. I've tried adding "." -> Sequence[] but this adds zeros and the descriptive statistics include these in the calculations. In searching, I've yet to find a way or an option that skips these missing values when calculating descriptive statistics.

Any suggestions would be most appreciated. Pat
POSTED BY: Pat McCarthy
4 Replies
Dear Pat,

this might work:
list = {1, 2, 3, 4, 5, ".", ".", "."}
DeleteCases[list, "."]

It should give:
{1, 2, 3, 4, 5}

Best wishes,
Marco
POSTED BY: Marco Thiel
Posted 11 years ago
Dear Marco, thanks for your reply and I now have a related problem. Below is code that successfully reads 22 columns of data from data matrix trydta. The command Mean correctly calculates the mean for data in column 7, which has both missing and non-missing data values.

colmean = Mean[Select[trydta[[All, 7]], NumberQ]] // N

But I would like to get a table of means for all 22 variables. Do you know how I need to modify the command to tell mathematica that I would like means not only for column 7 but for columns 1 - 22?

Thanks again for your help.

Pat
POSTED BY: Pat McCarthy
Dear Pat,

sure. You already answered the question yourself. You want a table of the results for all 22 columns, so try:
colmean =
Table[Mean[Select[trydta[[All, m]], NumberQ]] // N, {m, 1, 22}]

Cheers,
M.
POSTED BY: Marco Thiel
Posted 10 years ago

As an alternative, you could do something like this:

in:=sampleData = 
  SemanticImport["ExampleData/elements.xls", Automatic, 
   "NamedColumns"];
in:=Select[#, NumberQ] & /@ Values[sampleData];
in:=Mean[#] & /@ %

out:={5., Mean[{}], Mean[{}], 10.3094}

This and the previous answer may be equally convenient if your goal is to get a single result for each consecutive column. But the resulting output will be formatted as a list, rather than a table.

On the other hand, if you want to operate on specific columns in an arbitrary order, you could also do this:

in:=cols = Keys[sampleData];
in:=KeyTake[sampleData, {cols[[1]], cols[[4]]}];
in:=Select[#, NumberQ] & /@ Values[%];
in:=Mean[#] & /@ %

out:={5., 10.3094}
POSTED BY: Caitlin Ramsey
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract