Message Boards Message Boards

1
|
393 Views
|
6 Replies
|
1 Total Likes
View groups...
Share
Share this post:

How to skip lines and columns with Import

Posted 27 days ago

I have the following xlsx dataset: enter image description here

  • I want to import only the "Age" and "State" columns into my notebook. Can I use "SkipLines" to achieve it?

What I do now is

Import["example-dataset.xlsx",
 {"Dataset",(*Import as a dataset*)
  1,(*Import only the 1st sheet*)
  All,(*Import all rows*)
  {1, 2, 4}},(*Import only the 1st, 2nd and 4th columns*)
 "HeaderLines" -> {1, 1} (*1st row and 1st column are headers*)
 ]

I tried Import["example-dataset.xlsx",{"Dataset", 1}, "HeaderLines" -> {1, 1}, "SkipLines" -> {0, {2, 4}}] but it doesn't work.

  • I prepared this spreadsheet in Google sheets. For some reason, the sheet is 1000 rows long, so there are a lot of empty rows below the last meaningful row: enter image description here

How can I tell Wolfram not to import those ~1000 empty rows into the dataset? I tried "EmptyField" -> None and "EmptyField" -> Null but it doesn't help. Maybe I should use SkipLines->17;; to skip all lines but it doesn't work either.

Attachments:
POSTED BY: Ehud Behar
6 Replies
Posted 26 days ago

Found an alternative that is more compact than my earlier suggestion:

Import["example-dataset.xlsx",{"Dataset",1,1;;16,{2,4}},"HeaderLines"->1]
POSTED BY: Hans Milton
Posted 26 days ago

Thanks very much! But do you think that SkipLines can be used to simply ignore the rest of the sheet? As I said in the question, SkipLines->17;;?

POSTED BY: Ehud Behar
Posted 25 days ago

It seems that it is not possible to use Span in the "SkipLines" option

POSTED BY: Hans Milton
Posted 25 days ago

Before trying to specify ranges to the SkipLines option, I think I first need to understand its usage.

Suppose I want to skip lines 5, 6 and 7.

Then why neither "SkipLines" -> {{5, 6, 7}} nor "SkipLines" -> {5, 6, 7} work?

By the way, see Rohit Namjoshi comment for a similar question I asked.

POSTED BY: Ehud Behar
Posted 25 days ago

The documentation for "SkipLines" refers only to ranges that start from the first line enter image description here

POSTED BY: Hans Milton
Posted 27 days ago

Try this to remove the unwanted lines:

ds=Import["example-dataset.xlsx",{"Dataset",1,All,{1,2,4}},"HeaderLines"->{1,1}][[1;;15]]

Getting rid of the first column:

Normal[ds] // Values // Dataset
POSTED BY: Hans Milton
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract