I have a test data set consisting of a list of people and their preferred subjects:
{{"john" -> "physics"}, {"john" -> "chemistry"},
 {"jane" -> "physics"}, {"jane" -> "biology"},
 {"peter" -> "biology"}, {"peter" -> "chemistry"}, {"peter" -> "mathematics"},
 {"david" -> "mathematics"},
 {"paul" -> "chemistry"},
 {"liz" -> "chemistry"}, {"liz" -> "mathematics"}, {"liz" -> "physics"}}
I would like to divide the people into a limited number of groups, so that each group contains the participants with the 'best' overlap of interests.
There seem to be a number of ways that one might do this, and I have been experimenting with Graph functions as well as Cluster functions. However, none of them produces quite what I need. For instance, FindClusters[testSet] produces:
{{"physics", "chemistry"}, {"physics", "biology", "mathematics",
"chemistry", "chemistry", "mathematics", "physics"}, {"biology",
"chemistry", "mathematics"}}
I need it to a) show the people rather than the subjects, and b) not repeat them, i.e. John can only be in one group.
I am more than happy to reorganize the data if that would make it easier, e.g. to have "john" -> {"physics", "chemistry"}. I am just a bit stuck on how to tackle this problem.
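For reference, a minimal sketch of that reorganization, assuming Mathematica 10 or later (for GroupBy and associations). The names pairs, byPerson, people, subjects and vectors are only illustrative. The second step is one possible way to make FindClusters return people rather than subjects. The group count of 3 and the use of Jaccard dissimilarity are assumptions and are not meant as a definition of the 'best' grouping.

```mathematica
(* The data as listed above: one single-rule list per (person, subject) pair *)
pairs = {{"john" -> "physics"}, {"john" -> "chemistry"},
   {"jane" -> "physics"}, {"jane" -> "biology"},
   {"peter" -> "biology"}, {"peter" -> "chemistry"}, {"peter" -> "mathematics"},
   {"david" -> "mathematics"}, {"paul" -> "chemistry"},
   {"liz" -> "chemistry"}, {"liz" -> "mathematics"}, {"liz" -> "physics"}};

(* Flatten to a plain list of rules, then collect the subjects for each person *)
byPerson = GroupBy[Flatten[pairs], First -> Last];

people = Keys[byPerson];
subjects = Union @@ Values[byPerson];

(* One True/False indicator vector per person: does this person like subject s? *)
vectors = Table[MemberQ[byPerson[p], s], {p, people}, {s, subjects}];

(* Cluster on the indicator vectors, but return the corresponding names.
   Assumption: 3 groups, with Jaccard dissimilarity as the distance measure. *)
groups = FindClusters[vectors -> people, 3,
   DistanceFunction -> JaccardDissimilarity]
```

Here byPerson is an association in which, for example, "john" maps to {"physics", "chemistry"}. Because FindClusters assigns each input element to exactly one cluster, each name appears in only one group. The actual grouping depends on the clustering method and on the Mathematica version.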
All suggestions gratefully received :-)