Hi,
After filtering my data by rating (my log file is a list of triplets like {{user1, item1, rate}, {user2, item2, rate}, ...} imported from the MovieLens dataset), I would now like to keep only the logs of users who have given at least 10 ratings (I will compute precision and recall and need enough reference data).
I can get the list of such users:
In[196]:= TableUserIDAuMoins10 = Select[Counts[CorpusReferenceu1test[[All, 1]]], # > 9 &]
Out[196]= <|1 -> 79, 2 -> 12, 5 -> 25, 6 -> 64, 7 -> 135, 8 -> 22, ...|>
In[197]:= Keys[TableUserIDAuMoins10]
Out[197]= {1, 2, 5, 6, 7, 8, 10, 11, 12, 13, ...}
But now I don't know how to write something like "select logs from CorpusReferenceu1test where userid is within Keys[TableUserIDAuMoins10]". I haven't found a keyword like "in" or "within", and my last attempt failed:
In[14]:= Select[CorpusReferenceu1test, #[[1]] == Keys[TableUserIDAuMoins10] &;]
Out[14]= {}
Does anyone have an idea?
14/11/2014
Hi,
I finally found a solution:
CorpusReferenceu1testFiltered = Select[CorpusReferenceu1test, Length[Intersection[{#[[1]]}, MaListe]] == 1 &]
The part "Length[Intersection[{#[[1]]}, MaListe]] == 1" stands for "user in MaListe", where MaListe = Keys[TableUserIDAuMoins10] (see above). It may be more complicated than necessary for simply filtering a list of (userid, itemid, rate) triplets by a list of users (those who have at least ten ratings).
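A shorter way to express the same "user in MaListe" test is probably MemberQ. My first attempt returned {} because #[[1]] == Keys[...] compares a single user ID with the whole list of keys, which never yields True. Below is a minimal sketch on a small made-up log (the names logs and keep and the threshold of 2 are only for illustration, they are not from my real data):

(* hypothetical toy log of {userid, itemid, rate} triplets *)
logs = {{1, 10, 4}, {1, 11, 5}, {2, 10, 3}, {3, 12, 2}, {1, 12, 1}};
(* users with at least 2 ratings in the toy log *)
keep = Keys[Select[Counts[logs[[All, 1]]], # >= 2 &]];
(* keep only the triplets whose userid is a member of keep *)
Select[logs, MemberQ[keep, #[[1]]] &]
(* gives {{1, 10, 4}, {1, 11, 5}, {1, 12, 1}} *)

With my real data this would presumably be Select[CorpusReferenceu1test, MemberQ[MaListe, #[[1]]] &]. Since TableUserIDAuMoins10 is already an association, testing KeyExistsQ[TableUserIDAuMoins10, #[[1]]] & should work as well and avoids scanning the list of keys for every log.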
BR.