Methodology for semantic input of dirty data

The attached files import typical (but fictional) commercial data with SemanticImport[], applying very basic data cleansing to improve the fidelity of the resulting Dataset[] contents. Where possible, information is presented as Entity[]. Problems of this sort are often encountered in commercial data processing. To run the example, store the files in the same folder and execute semanticImportTest.nb.

Justification: Applied mathematics is often inapplicable without a large volume of data. Much commercial data is either in Excel files or exported by databases into .csv format. Further, most commercial data contains many values in an unreadable format, such as a date field that contains "NA" or is simply left blank. If such files cannot be read, applied mathematics cannot be applied to their data.
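
For readers without the attachments, here is a minimal sketch of the kind of cleansing described above. It is not the code in semanticImportTest.nb. The sample data, the column names, and the helper parseDate are all hypothetical, and for simplicity it uses ImportString on an inline string rather than SemanticImport[] on a file.

```wolfram
(* Hypothetical sample data: column names and values are illustrative only *)
csv = "customer,orderDate,amount
Acme,2015-03-01,120.50
Initech,NA,75
Globex,,NA";

raw = ImportString[csv, "CSV"];
header = First[raw];
rows = Rest[raw];

(* Treat "NA" and blank fields as missing values instead of strings *)
clean = Replace[rows, "NA" | "" -> Missing["NotAvailable"], {2}];

(* parseDate is a hypothetical helper: it converts date strings and
   passes anything else (including Missing[...]) through unchanged *)
parseDate[s_String] := DateObject[s];
parseDate[other_] := other;

(* Build a Dataset with named columns, then convert the date column *)
ds = Dataset[AssociationThread[header -> #] & /@ clean];
ds = ds[All, {"orderDate" -> parseDate}]
```

With the unreadable entries marked as Missing[...] rather than left as strings, later queries can count or skip them explicitly, for example ds[Count[_Missing], "orderDate"].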

Attachments:
POSTED BY: Bill Lewis