Select elements of a list based on a condition and order position?

Posted 9 months ago
2 Replies

I keep running into a common data science problem that is difficult to solve in some languages and easier in others, and I'm wondering how the Wolfram Language handles it.

In ordered lists, there is an implicit "before" and "after" relationship. Often I need to select or operate on the element that comes after one matching a condition. For instance, in the English word list, return the letters (and perhaps make a histogram) that follow the letter "i", to prove or disprove "i before e, except after c".
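As a language-neutral illustration of this first example (written in Python, not Wolfram Language), here is a minimal sketch that pairs each letter with its successor and tallies what follows "i". The `words` list is made-up sample data standing in for a real English word list, and `letters_after` is a hypothetical helper name.

```python
from collections import Counter

def letters_after(words, target):
    """Count the letters that immediately follow `target` in each word."""
    counts = Counter()
    for word in words:
        # Pair each character with its successor; zip stops before the last one.
        for current, following in zip(word, word[1:]):
            if current == target:
                counts[following] += 1
    return counts

# Made-up sample data, not a real dictionary.
words = ["believe", "receive", "science", "field", "weird"]
follower_counts = letters_after(words, "i")
```

The resulting counts could be fed straight into a histogram. Testing the "except after c" part would also need the letter before "i", which the same pairing idea extends to with a three-way zip.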

Another example: given a list of dates with an associated measure for each, calculate the difference of the measures between any two dates. The most commonly needed calculation is the current value minus the previous one ("now" from "last").

Trigger events are another variation of this problem, i.e. capture all elements that meet "this" criterion after "that" condition has been set. I've used this extensively in digital signal processing and other pattern recognition programs.
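To make the trigger pattern concrete, here is a rough Python sketch (again an illustration, not Wolfram Language). It assumes the simplest reading: a single trigger that, once set, stays set for the rest of the list. The function name `select_after_trigger`, the sample values, and the thresholds are all hypothetical.

```python
def select_after_trigger(items, is_trigger, matches):
    """Return items satisfying `matches` that occur strictly after the first trigger."""
    armed = False
    selected = []
    for item in items:
        if armed and matches(item):
            selected.append(item)
        elif not armed and is_trigger(item):
            # The trigger item itself is not captured; only later items are.
            armed = True
    return selected

samples = [0.2, 0.9, 0.1, 1.5, 0.4, 2.0, 0.3]
# Hypothetical rule: once a sample exceeds 1.0, capture every later sample below 0.5.
captured = select_after_trigger(samples, lambda s: s > 1.0, lambda s: s < 0.5)
```

Nested or resettable triggers would need more state than the single `armed` flag used here.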


2 Replies

The first type of problem can be handled by Select with a string pattern. The second could be done with Outer or Table, depending on how important it is to avoid comparing with previous elements. The third might be best done programmatically using Table, though in some cases careful use of Select should work. (I am not sure offhand that Select is actually guaranteed to test elements sequentially, but in practice it does.)

It is much easier to respond with actual code if you provide concrete examples (input code, that is) and concrete criteria.

Posted 9 months ago

The examples were broad and, by extension, vague. SQL and languages like it store tables and lists in an unordered format. SAS and other data science languages keep them ordered. That ordering affords functions such as "first", "last", "lead", and "lag" relative to the sorted order of the key values.

The second example is an actual work product from this week. Reports are posted, typically monthly, listing a cumulative value from a launch date (which is itself stored in a different table, but easily joined). The user wants the measure as a difference between two consecutive report dates, regardless of the number of intervening days. For instance, June 1st, 1200; July 1st, 1400 should yield 200. In SQL this is difficult, requiring a self-join to get both values into the same tuple/row. In SAS it is trivial: sort by date, then difference = value - lag(value);
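For comparison, the same sort-then-lag idea can be sketched in a few lines of Python (an illustration only, not Wolfram Language). The helper name `lag_differences` is hypothetical and the year in the dates is arbitrary; only the June 1st and July 1st values come from the example above.

```python
from datetime import date

def lag_differences(reports):
    """Sort (date, cumulative_value) pairs by date and difference consecutive values."""
    ordered = sorted(reports)  # tuples sort by date first
    return [
        (curr_date, curr_value - prev_value)
        for (_, prev_value), (curr_date, curr_value) in zip(ordered, ordered[1:])
    ]

# Deliberately out of order to show that the sort matters; the year is arbitrary.
reports = [
    (date(2024, 7, 1), 1400),
    (date(2024, 6, 1), 1200),
]
deltas = lag_differences(reports)
```

Unlike the SAS version, which yields a missing value for the first row, this sketch simply produces no entry for the earliest report.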

This has been a common pattern I've needed to work with over the years, often with deep nesting of the trigger events and overlapping selections that need to be filtered. The data mining aspect means I do not know the population beforehand. A common request: find all patients who have condition A and exhibited symptoms B and C within a window of three weeks after taking medication D, while invalidating the selection if they took additional or different medication (which could cause other effects) or reported another symptom in conjunction with the original report of B and C. I apologize for continuing to be vague. The data can be complex and not easy to describe briefly.

Back to language processing. Again from a data mining perspective, once I find an arbitrary keyword, what function is used to find the previous word (or null, if it is the first word) and the next word in the sentence (or null, if it is the last word)?
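To pin down what I mean, here is a small Python sketch of the behavior I am after (an illustration, not Wolfram Language). It assumes naive whitespace tokenization with no punctuation handling, uses `None` for the "null" cases, and the name `neighbors` is hypothetical.

```python
def neighbors(sentence, keyword):
    """Return (previous, next) word pairs for each occurrence of `keyword`.

    None marks a missing neighbor at the start or end of the sentence.
    """
    words = sentence.split()
    result = []
    for i, word in enumerate(words):
        if word == keyword:
            previous = words[i - 1] if i > 0 else None
            following = words[i + 1] if i + 1 < len(words) else None
            result.append((previous, following))
    return result

middle = neighbors("the quick brown fox jumps", "fox")
first = neighbors("the quick brown fox jumps", "the")
```

Here `middle` holds the pair around "fox", while `first` shows the null case because "the" opens the sentence.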

In my mind, these are all the same problem.

