WOLFRAM COMMUNITY

GROUPS:

Data Science Wolfram Language

Data cleaning, wrangling, munging with Mathematica

David Proffer

Posted 10 years ago

I browsed through Mathematica StackExchange and the Data Science group at the Wolfram Community and was not able to find any comprehensive discussion of this important topic. Clearly all the tools are available in Mathematica, and once clean data is in, exploratory data analysis can be done much better than with environments like R and Python. However, this critical first step does not seem to have been addressed in a comprehensive way, as it has been in books such as:

Data Wrangling with R by Boehmke
Data Wrangling with Python by Kazil & Jarmul

A simple guide to the corresponding Mathematica operations, along the lines of the Data Wrangling with dplyr and tidyr Cheat Sheet by RStudio, would be a good start.

I have not seen any news or motion from Wolfram on their Data Science Platform. Is there any information on how it might make learning and using data cleaning procedures less 'exploratory'?

3 Replies

Anton Antonov

Anton Antonov, Accendo Data LLC

Posted 10 years ago

Sander Huisman

Sander Huisman, University of Twente

Posted 10 years ago

Without an explicit example it is hard to give you much insight into data cleaning. However, there are common functions that are used to select/convert/transform data:

Part (* to select parts of something based on indices *)
Select (* to select something based on a True/False criterion *)
Cases/FirstCase (* to 'select' something based on its structure *)
ToExpression (* to convert a string to an expression *)
UnitConvert/Quantity/QuantityMagnitude (* to add/remove/convert quantities *)
StringSplit (* to split string data into parts *)
StringTake/StringDrop (* to take parts of strings *)
Map/Apply (* used in conjunction with ToExpression, to convert a whole list of items to expressions *)
Delete (* to delete based on indices *)
DeleteCases (* to delete based on a pattern match *)
DeleteDuplicates/DeleteDuplicatesBy (* to delete duplicates *)
ArrayReshape/Flatten/Partition/Transpose/Reverse (* flipping/flattening/changing dimensions et cetera *)
Replace/ReplaceAll/StringReplace (* to replace items based on replacement rules *)
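
As a rough illustration of how several of these functions combine, here is a minimal sketch. The raw records, their "name;height in cm;city" layout, and the variable names are all made up for the example; real data will need its own rules.

```wolfram
(* Hypothetical raw records in the form "name;height in cm;city" *)
raw = {"alice;170;Enschede", "bob;n/a;Utrecht",
       "alice;170;Enschede", "carol;165;Delft"};

(* StringSplit with Map: break each record into its fields *)
rows = StringSplit[#, ";"] & /@ raw;

(* DeleteDuplicates: drop the repeated "alice" record *)
rows = DeleteDuplicates[rows];

(* Select: keep only rows whose second field consists of digits *)
rows = Select[rows, StringMatchQ[#[[2]], DigitCharacter ..] &];

(* Part and Map with ToExpression: convert the height strings to numbers *)
heights = ToExpression /@ rows[[All, 2]];

(* Quantity, UnitConvert, QuantityMagnitude: attach a unit, convert, strip it again *)
inMeters = QuantityMagnitude[
     UnitConvert[Quantity[#, "Centimeters"], "Meters"]] & /@ heights;

(* Cases: pick out names by the structure of the row *)
Cases[rows, {name_, _, "Delft"} :> name]   (* {"carol"} *)
```

Note that ToExpression evaluates whatever the string contains, so for data from an untrusted source a stricter conversion may be preferable.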

I think with those, you can get quite far. Of course then you can 'group' the data using:

Gather/GatherBy/GroupBy
Split/SplitBy
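
A small sketch of how the grouping functions differ, using made-up {category, value} pairs:

```wolfram
(* Hypothetical {category, value} pairs *)
data = {{"a", 1}, {"b", 2}, {"a", 3}, {"b", 4}, {"a", 5}};

(* GatherBy: a list of groups, in order of first appearance *)
GatherBy[data, First]
(* {{{"a", 1}, {"a", 3}, {"a", 5}}, {{"b", 2}, {"b", 4}}} *)

(* GroupBy: an association keyed by category; the third argument reduces each group *)
GroupBy[data, First -> Last, Total]
(* <|"a" -> 9, "b" -> 6|> *)

(* SplitBy: only groups runs of adjacent elements that share a key *)
SplitBy[{1, 1, 2, 2, 1}, Identity]
(* {{1, 1}, {2, 2}, {1}} *)
```

GatherBy and GroupBy collect matching elements wherever they occur, whereas Split and SplitBy only look at neighbours, so sorting first may be needed if the data is not already ordered.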

Or count items:

Tally/Count/Counts/CountsBy
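
For instance, on a short made-up list the counting functions behave as follows:

```wolfram
(* Hypothetical list of labels *)
items = {"x", "y", "x", "z", "x", "y"};

Tally[items]        (* {{"x", 3}, {"y", 2}, {"z", 1}} *)
Counts[items]       (* <|"x" -> 3, "y" -> 2, "z" -> 1|> *)
Count[items, "x"]   (* 3 *)

(* CountsBy: count after applying a function to each element *)
CountsBy[{1, 2, 3, 4, 5}, EvenQ]   (* <|False -> 3, True -> 2|> *)
```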

Sort items:

Sort/SortBy
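
A brief sketch of the difference, again with made-up data:

```wolfram
(* Hypothetical {label, value} pairs *)
pairs = {{"b", 2}, {"a", 3}, {"c", 1}};

(* Sort: canonical order, which compares the first elements here *)
Sort[pairs]                 (* {{"a", 3}, {"b", 2}, {"c", 1}} *)

(* SortBy: order by the result of a function, here the second element *)
SortBy[pairs, Last]         (* {{"c", 1}, {"b", 2}, {"a", 3}} *)

(* Negating the key gives descending order *)
SortBy[pairs, -Last[#] &]   (* {{"a", 3}, {"b", 2}, {"c", 1}} *)
```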

Then you can do some statistics on it to reduce the data:

Min
Max
Mean/TrimmedMean/Total
Median
StandardDeviation/Variance/RootMeanSquare
Skewness
Kurtosis
Length
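
As a quick sketch, these reductions applied to a small made-up list of exact numbers:

```wolfram
(* Hypothetical sample *)
values = {2, 4, 4, 4, 5, 5, 7, 9};

{Length[values], Min[values], Max[values], Total[values]}   (* {8, 2, 9, 40} *)
Mean[values]       (* 5 *)
Median[values]     (* 9/2 *)
Variance[values]   (* 32/7, the sample (n - 1) variance *)

(* Approximate numerical values for the remaining measures *)
N[{StandardDeviation[values], RootMeanSquare[values],
   Skewness[values], Kurtosis[values]}]
```

With exact input the results stay exact (9/2, 32/7), so N is useful when a decimal value is wanted.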

Anton Antonov

Anton Antonov, Accendo Data LLC

Posted 10 years ago
