Filtering data within an array (Sorting data within an array Part II)

Posted 10 years ago


After filtering my data by rating (my log is a list of the form {{user1, item1, rate}, {user2, item2, rate}, ...}, imported from the MovieLens dataset), I would now like to keep only the users, and their logs, who have given at least 10 ratings (I will compute precision and recall and need enough reference data).

I can get the list of such users:

In[196]:= TableUserIDAuMoins10 = Select[Counts[CorpusReferenceu1test[[All, 1]]], # > 9 &]

Out[196]= <|1 -> 79, 2 -> 12, 5 -> 25, 6 -> 64, 7 -> 135, 8 -> 22, ...|>

In[197]:= Keys[TableUserIDAuMoins10]

Out[197]= {1, 2, 5, 6, 7, 8, 10, 11, 12, 13, ...}

But now I don't know how to write something like "select logs from CorpusReferenceu1test where userid is within Keys[TableUserIDAuMoins10]". I haven't found a keyword like "in" or "within", and my last attempt failed:

In[14]:= Select[CorpusReferenceu1test, #[[1]] == Keys[TableUserIDAuMoins10] &;]

Out[14]= {}

Does anyone have an idea?



Finally, I found a solution:

CorpusReferenceu1testFiltered = Select[CorpusReferenceu1test, Length[Intersection[{#[[1]]}, MaListe]] == 1 &]

The expression Length[Intersection[{#[[1]]}, MaListe]] == 1 stands for "user is in MaListe", where MaListe = Keys[TableUserIDAuMoins10] (see above). It may be more complicated than necessary just for filtering a list of triplets (userid, itemid, rate) by a list of users (those who have at least ten ratings).
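
A possibly simpler way to express "user in MaListe" is the built-in MemberQ, or KeyExistsQ applied directly to the association of counts. The sketch below uses made-up toy data in the same {userid, itemid, rate} layout and a threshold of 2 instead of 10. The names logs, counts and users are placeholders for illustration, not names from the session above. It also hints at why the earlier attempt returned {}: comparing a single user ID to a whole list with == never gives True, so Select keeps nothing.

```wolfram
(* Toy data in {userid, itemid, rate} form; the values are invented for illustration *)
logs = {{1, 10, 4}, {1, 11, 5}, {2, 10, 3}, {3, 12, 2}, {1, 13, 1}};

(* Users with at least 2 ratings (the post uses a threshold of 10) *)
counts = Select[Counts[logs[[All, 1]]], # >= 2 &];
users = Keys[counts];

(* "user in list": MemberQ tests whether the user ID occurs in users *)
Select[logs, MemberQ[users, #[[1]]] &]

(* Same filter, looking the user ID up among the association's keys *)
Select[logs, KeyExistsQ[counts, #[[1]]] &]
```

With this toy data, both Select calls should keep only the three logs of user 1. On a large log, the KeyExistsQ form may be preferable, since it looks the key up in the association rather than scanning a list for every row.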

