How do you specify element "Rules" and "Options" for Import[]

Posted 4 years ago

The documentation for Import mentions "Rules" and "Options" under "Elements supported by all formats".

The documentation says that

  • "Rules" provides rules for the values of all elements
  • "Options" provides rules for options, settings, metainformation, etc

But I cannot find any more information on how to specify these rules and options.

I'm specifically interested in how they would apply to CSV and Excel files. But the lack of documentation and examples has me curious as to how they apply to all file formats.

I have spent several hours this week searching the web. But, searching posts on Wolfram and Mathematica for keywords like "rules" and "options" is like drinking from a firehose. It is definitely not recommended.

Any insights that anyone can provide will be very much appreciated.

thanks

POSTED BY: Mike Besso
5 Replies
Posted 4 years ago

Rules, Elements, and Options are documented in ref/Import and you can find all of the details I summed up in tutorial/ImportingAndExporting (including the esoteric "Options" examples). There is a ton of information in our documentation and we do our best to both have good coverage in documentation and to assist users in finding the specific info they're looking for quickly.

POSTED BY: Sean Cheren
Posted 4 years ago

Thanks Sean. This should help me understand things a great deal.

But, your answer brings up another question. Why are these details not in the documentation? It is great that Wolfram actively answers questions both here and on stackexchange. But, it would be even better if Wolfram also added those details into the main documentation.

I do not mind Googling for help, but some concepts are a bit difficult to Google since those words are used in so many places.

Thanks for the help.

POSTED BY: Mike Besso
Posted 4 years ago

Hi Mike, hopefully I can help clear up a few things. I go into detail about these "Framework" elements (i.e., elements that are known by every single format) in this post: https://mathematica.stackexchange.com/a/212034/38159. I will reiterate the most important points here:

  • The Import framework has a few elements that it knows how to produce for every format: All, "Rules", "Options", and "Elements".
  • "Elements" gives the list of elements that are supported for a format.
  • "Rules" is a list of rules of the form elementName -> data, one for every element.
  • All is a list of the data without the element names.
  • "Options" lets a format specify, at registration, a list of elements to return when you ask for "Options"; importing "Options" then imports that subset of elements. This is a little confusing and not often practical. See the MSE post I linked for a small example of how this works if you are still curious.
  • There is currently no way to get a list of the options that a format accepts; maybe in future versions.
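
As a rough illustration of the framework elements listed above, the following sketch queries them on a small in-memory CSV string. The sample data and the variable name csv are made up for the example, and the exact results (including whether "Rules" is still listed) may differ between Wolfram Language versions.

```wolfram
(* Made-up sample data, held in a string so no file is needed *)
csv = "id,name\n1,Ann\n2,Bob";

(* The element names that the CSV importer supports *)
ImportString[csv, {"CSV", "Elements"}]

(* A list of rules, elementName -> data, one per element *)
ImportString[csv, {"CSV", "Rules"}]

(* A single named element, here the parsed rows *)
ImportString[csv, {"CSV", "Data"}]
```

The same element names can be passed to Import with a file path in place of the string, for example Import[file, "Elements"].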

As for your valid suggestions for CSV, you are correct: there is currently no way to give columns special handling at the time of import. These are great ideas that I've been thinking about for quite some time. Hopefully, in future versions, CSV can deal with typed columns more efficiently and with some better direction. Please feel free to keep posting your questions on Community as you continue to learn Wolfram Language!

POSTED BY: Sean Cheren
Posted 4 years ago

Thank you Rohit. I have seen those. And I should have said that and included links to them as well.

What I do not see mentioned on those pages is any reference to the terms "Rules" and "Options" that are mentioned on the Import page under "Elements supported by all formats".

It is possible, perhaps even likely, that the "Rules" and "Options" elements are covered, but just not called "Rules" and "Options" elsewhere. If that is the case, then it would be great if the documentation were more consistent in what it calls things.

To answer your question about what specific problem I am trying to solve: I have some large datasets with columns whose treatment I would like to specify on import. For example:

  • Some columns are identifiers: strings of digits that are too long to be treated as numbers and should remain strings. Of course, there are other fields that are strings of digits that should be treated as numbers.
  • Some files have columns that are dates and other columns that are date-times. I would like to bring them both in appropriately, so specifying a DateStringFormat for the entire file does not work (unless I'm missing something).
  • Some text columns need to be trimmed. I know I can do this after the fact, but it would be great to do it all at once, if possible.
  • etc.

I know I can consider using SemanticImport, but I'd rather avoid the performance penalty.
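
For reference, the "after the fact" route for the column handling described above might look roughly like the sketch below. It assumes the CSV importer's "Numeric" -> False option to keep every field as a string, then applies one converter per column. The sample data and the names csv, rows, header, body, converters, and typed are all hypothetical, and a date column would need its own converter (for example, one built on DateObject).

```wolfram
(* Made-up sample data: a long digit-string id, a number, and an untrimmed note *)
csv = "id,amount,note\n00012345678901234567890,12.5, hello \n00098765432109876543210,7, world ";

(* "Numeric" -> False asks the CSV importer to keep every field as a string *)
rows = ImportString[csv, "CSV", "Numeric" -> False];
header = First[rows];
body = Rest[rows];

(* One converter per column, in column order (hypothetical choices):
   keep the id as a string, parse the amount, trim the note *)
converters = {Identity, Interpreter["Number"], StringTrim};

(* Apply the i-th converter to the i-th field of each row *)
typed = Function[row, MapThread[#1[#2] &, {converters, row}]] /@ body
```

This makes two passes over the data rather than one, so it does not address the wish to do everything at import time; it only shows that the per-column rules can live in one place.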

Regardless of what my immediate need is, I am more concerned with learning the language. So, even if there is another way to meet this particular need, I still want to understand what "Rules" and "Options" the documentation is referring to here.

Thanks

POSTED BY: Mike Besso
Posted 4 years ago

Hi Mike,

For CSV the options are documented here, for Excel XLS they are documented here and for XLSX, here.

Is there a specific problem with Import that you are trying to solve?

POSTED BY: Rohit Namjoshi