
Methodology for semantic input of dirty data


The attached files import typical (but fictional) commercial data with SemanticImport[], using very basic data-cleansing methods to improve the fidelity of the resulting Dataset[]. Where possible, information is presented as Entity[] objects. Problems of this sort are often encountered in commercial data processing. To run the example, store the files in the same folder and execute semanticImportTest.nb.
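For readers without the attachments, the following is a minimal sketch of the kind of cleansing step described above. It is not the code in semanticImportTest.nb. The sample data, the column names (customer, orderDate, amount) and the variable names are all hypothetical, and it uses plain ImportString[] with a replacement rule rather than SemanticImport[] so that it runs without any external file.

```wolfram
(* Hypothetical sample standing in for a dirty commercial export; not the attached files *)
csv = "customer,orderDate,amount\nAcme,2015-01-03,10.5\nBolt,NA,7\nCrate,,3";

(* Parse the text as CSV; the first row holds the column names *)
raw = ImportString[csv, "CSV"];

(* Basic cleansing: turn "NA" and blank cells into an explicit Missing value *)
clean = raw /. ("NA" | "") -> Missing["NotAvailable"];

(* Build a Dataset whose rows are associations keyed by column name *)
ds = Dataset[AssociationThread[First[clean], #] & /@ Rest[clean]];

(* Keep only the rows that have a usable order date *)
ds[Select[! MissingQ[#orderDate] &]]
```

Marking unusable cells as Missing[] keeps every row in the Dataset[], so the decision to drop or repair a row can be made later, per column.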

Justification: Applied mathematics often cannot be applied without a large volume of data. Much commercial data resides in Excel files or is exported from databases in .csv format. Further, most commercial data contains many values in unusable formats, such as a date recorded as "NA" or left blank. If such files cannot be read, applied mathematics cannot be applied to their data.

POSTED BY: Bill Lewis
10 months ago
