Group Abstract Group Abstract

Message Boards Message Boards

1
|
2.7K Views
|
1 Reply
|
3 Total Likes
View groups...
Share
Share this post:

Divide into groups to maximize a metric

Posted 3 years ago

I have a test data set comprising a list of people and their preferred subjects.

{{"john" -> "physics"}, {"john" -> "chemistry"}, {"jane" -> 
   "physics"}, {"jane" -> "biology"}, {"peter" -> 
   "biology"}, {"peter" -> "chemistry"}, {"peter" -> 
   "mathematics"}, {"david" -> "mathematics"}, {"paul" -> 
   "chemistry"}, {"liz" -> "chemistry"}, {"liz" -> 
   "mathematics"}, {"liz" -> "physics"}}

I would like to process the data such that I can produce a limited number of groups, which contain participants with the 'best' collection of overlapping interests.

There seem to b number of ways that one might do this, and I have been experimenting with Graph functions, as well as Cluster functions. However, none of them produce quite what I need. For instance FindClusters[testSet] produces

{{"physics", "chemistry"}, {"physics", "biology", "mathematics", 
  "chemistry", "chemistry", "mathematics", "physics"}, {"biology", 
  "chemistry", "mathematics"}}

I need it to a) show the people rather than the subjects, and b) not repeat them, i.e. John can only be in one group.

I am more than happy to reorganize the data if that would make it easier, i.e. have John ->{physics, chemistry}. I am just a bit stuck on how to tackle this problem.

POSTED BY: Kevin Mosby
Posted 3 years ago

As it stands, this question is pretty vague. You should at least try to define your own shared-interest function. But since you alluded to Graph functions, we could use the built-in FindGraphPartition function. Applying it to a Graph weighted by a shared-interest function might give you what you want. Let's start by putting together the basic data:

prefData = 
 Sort@Flatten@{{"john" -> "physics"}, {"john" -> 
      "chemistry"}, {"jane" -> "physics"}, {"jane" -> 
      "biology"}, {"peter" -> "biology"}, {"peter" -> 
      "chemistry"}, {"peter" -> "mathematics"}, {"david" -> 
      "mathematics"}, {"paul" -> "chemistry"}, {"liz" -> 
      "chemistry"}, {"liz" -> "mathematics"}, {"liz" -> "physics"}};
prefsByPerson = Merge[prefData, Identity];

Now let's define an interest function to use to weight the Graph edges (I just used the size of the overlap):

InterestOverlapWeight[map_][a_, b_] := 
 Length[Intersection @@ Lookup[map, {a, b}, {}]]

Testing this out:

InterestOverlapWeight[prefsByPerson]["david", "liz"]
(* gives 1 *)

Let's create pairs of persons (to eventually become edges):

personPairs = 
 DeleteCases[Tuples[Keys@prefsByPerson, 2], {name_, name_}]

From this, let's create the edges and weights:

rawEdges = UndirectedEdge @@@ personPairs;
rawWeights = InterestOverlapWeight[prefsByPerson] @@@ personPairs;
sharedInterestEdges = Pick[rawEdges, rawWeights, _?(GreaterThan[0])];
sharedInterestWeights = DeleteCases[rawWeights, 0];

And now we can create a Graph:

graphWeightedBySharedInterests = 
 Graph[sharedInterestEdges, EdgeWeight -> sharedInterestWeights]

With this, we can find groups that should give higher interest weights:

FindGraphPartition[graphWeightedBySharedInterests]

Many tweaks could be made to this basic strategy to get something that you're hopefully satisfied with.

POSTED BY: Eric Rimbey
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard