WOLFRAM COMMUNITY

GROUPS: Data Science, Wolfram Language

Select elements of a list based on a condition and order position?

Andy Hollister

Posted 8 years ago

I regularly run into a problem in data science that is hard to solve in some languages and easy in others, and I'm wondering how the Wolfram Language handles it. In an ordered list there is an implicit "before" and "after" relationship, and I often need to select or operate on the element that comes after one matching a condition. For instance, from the English word list, return the letters that follow the letter "i" (and perhaps make a histogram of them) to prove or disprove "i before e, except after c". Another example: given a list of dates with an associated measure for each, calculate the difference of the measures between any two dates. The most needed calculation is "now" minus "last". Trigger events are another variation of this problem, i.e. capture all elements that meet "this" criterion after "that" condition has been set. I've used this extensively in digital signal processing and other pattern recognition programs. Thanks.
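For the word-list example, a minimal sketch of one possible approach uses `StringCases` with a string pattern. The short list `words` is a made-up stand-in for the full English word list (something like `WordList[]` could be substituted), and the names `followers` and `tally` are purely illustrative.

```wolfram
(* Assumption: a small hand-picked list stands in for the full dictionary *)
words = {"believe", "receive", "science", "field", "ceiling", "their"};

(* Every character that immediately follows an "i"; Overlaps -> True
   also catches an "i" that directly follows another "i" *)
followers = Flatten[StringCases[words, "i" ~~ c_ :> c, Overlaps -> True]];

(* Tally of the following letters, in order of first appearance *)
tally = Counts[followers]
(* <|"e" -> 3, "v" -> 1, "l" -> 1, "n" -> 1, "r" -> 1|> *)

(* Words containing "cei" versus "cie" *)
{Count[StringContainsQ[words, "cei"], True],
 Count[StringContainsQ[words, "cie"], True]}
(* {2, 1} *)

(* A simple bar chart of the tally *)
BarChart[Values[tally], ChartLabels -> Keys[tally]]
```

With only six sample words the counts say nothing about the spelling rule itself; the sketch only shows the mechanics of pulling out "the element after a match".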

2 Replies

Andy Hollister

Posted 8 years ago

The examples were broad and, by extension, vague. SQL and languages like it store tables and lists in an unordered format. SAS and other data science languages don't. That ordering makes functions such as "first", "last", "lead", and "lag" possible, relative to the sorted order of the key values.

The second example is an actual work product from this week. Reports are posted, typically monthly, listing a cumulative value since a launch date (which is itself stored in a different table, but easily joined). The user wants the measure as a difference between consecutive report dates, regardless of the count of intervening days. For instance, June 1st, 1200; July 1st, 1400 should yield 200. In SQL this is difficult, requiring a self-join to get both values into the same tuple/row. In SAS it is trivial: sort by date, then `difference = value - lag(value);`.

This has been a common pattern I've needed to work with over the years, often with deep nesting of the trigger events and overlapping selections that need to be filtered. The data mining aspect means I do not know the population beforehand. A common request: find all patients who have condition A and exhibited symptoms B and C for a window of three weeks after taking medication D, while invalidating the selection if they took additional, different medication (that could cause other effects) or reported another symptom in conjunction with the original report of B and C. I apologize for continuing to be vague. The data can be complex and is not easy to describe briefly.

Back to language processing. Again from a data mining perspective, once I find an arbitrary key word, what function is used to find the previous word (or null, if it is the first word) and, likewise, the next word in the sentence (null if it is the last word)? In my mind, these are all the same problem. Thanks!
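As a rough sketch of how the lag-style difference and the previous/next-word lookup might look in the Wolfram Language: the `reports` data extends the June/July figures with one made-up row, and `lagDiff` and `neighbors` are hypothetical names chosen for illustration, not built-in functions.

```wolfram
(* Hypothetical report data: {date, cumulative value}, deliberately unsorted *)
reports = {{{2017, 7, 1}, 1400}, {{2017, 6, 1}, 1200}, {{2017, 8, 1}, 1450}};

(* Sort by date, as in the SAS sort step *)
sorted = SortBy[reports, AbsoluteTime[First[#]] &];
values = sorted[[All, 2]];

(* value - lag(value); the first row has no predecessor *)
lagDiff = Prepend[Differences[values], Missing["NoPrevious"]];

Transpose[{sorted[[All, 1]], values, lagDiff}]
(* {{{2017, 6, 1}, 1200, Missing["NoPrevious"]},
    {{2017, 7, 1}, 1400, 200},
    {{2017, 8, 1}, 1450, 50}} *)

(* Previous and next word around every occurrence of key;
   Null marks a missing neighbor at either end of the sentence *)
neighbors[tokens_List, key_] :=
  Cases[Partition[Join[{Null}, tokens, {Null}], 3, 1],
   {prev_, key, next_} :> {prev, next}];

sentence = StringSplit["the quick brown fox jumps"];
neighbors[sentence, "the"]    (* {{Null, "quick"}} *)
neighbors[sentence, "brown"]  (* {{"quick", "fox"}} *)
neighbors[sentence, "jumps"]  (* {{"fox", Null}} *)
```

Both pieces rest on the same idea: line each element up with its neighbors (`Differences` for numeric pairs, `Partition` with offset 1 for sliding windows) and then select on the window rather than on the single element.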

Daniel Lichtblau, Wolfram Research

Posted 8 years ago

The first type of problem can be handled by `Select` with a string pattern. The second could be done with `Outer` or `Table`, depending on how important it is to avoid comparing with previous elements. The third might be best done programmatically using `Table`, though in some cases careful use of `Select` should work (I am not sure offhand that `Select` is actually guaranteed to process elements sequentially, but in practice it does). It would be much easier to respond with actual code if you provided concrete examples (input code, that is) and concrete criteria.
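Absent concrete input, the following is only a sketch under invented assumptions: `values` and `data` are made-up samples, the trigger condition ("first sample above 5") and the capture criterion (`EvenQ`) are placeholders, and the trigger part locates the position explicitly so that it does not depend on the evaluation order of `Select`.

```wolfram
(* Hypothetical measures, already sorted by date *)
values = {1200, 1400, 1450};

(* All pairwise differences: entry {i, j} is values[[i]] - values[[j]] *)
Outer[Subtract, values, values]
(* {{0, -200, -250}, {200, 0, -50}, {250, 50, 0}} *)

(* Only later-minus-earlier pairs, skipping comparisons with previous elements *)
Table[values[[j]] - values[[i]],
 {i, Length[values] - 1}, {j, i + 1, Length[values]}]
(* {{200, 250}, {50}} *)

(* Trigger sketch: once a sample exceeds 5, capture the even samples after it *)
data = {1, 4, 2, 8, 6, 3, 10, 7, 12};
trigger = FirstPosition[data, x_Integer /; x > 5];
captured = If[MissingQ[trigger], {},
  Select[Drop[data, First[trigger]], EvenQ]]
(* {6, 10, 12} *)
```

Here `Outer` computes every pair in both directions, while the nested `Table` visits each pair once, which is the trade-off mentioned for the second problem.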
