Group Abstract

Message Boards

WOLFRAM COMMUNITY

2.8K Views

1 Reply

3 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

Data Science Wolfram Language Machine Learning

Divide into groups to maximize a metric

Kevin Mosby

Posted 3 years ago

I have a test data set comprising a list of people and their preferred subjects. {{"john" -> "physics"}, {"john" -> "chemistry"}, {"jane" -> "physics"}, {"jane" -> "biology"}, {"peter" -> "biology"}, {"peter" -> "chemistry"}, {"peter" -> "mathematics"}, {"david" -> "mathematics"}, {"paul" -> "chemistry"}, {"liz" -> "chemistry"}, {"liz" -> "mathematics"}, {"liz" -> "physics"}} I would like to process the data such that I can produce a limited number of groups, which contain participants with the 'best' collection of overlapping interests. There seem to b number of ways that one might do this, and I have been experimenting with Graph functions, as well as Cluster functions. However, none of them produce quite what I need. For instance FindClusters[testSet] produces {{"physics", "chemistry"}, {"physics", "biology", "mathematics", "chemistry", "chemistry", "mathematics", "physics"}, {"biology", "chemistry", "mathematics"}} I need it to a) show the people rather than the subjects, and b) not repeat them, i.e. John can only be in one group. I am more than happy to reorganize the data if that would make it easier, i.e. have John ->{physics, chemistry}. I am just a bit stuck on how to tackle this problem.

POSTED BY: Kevin Mosby

1 Reply

Sort By:

Eric Rimbey

Posted 3 years ago

As it stands, this question is pretty vague. You should at least try to define your own shared-interest function. But since you alluded to Graph functions, we could use the built-in FindGraphPartition function. Applying it to a Graph weighted by a shared-interest function might give you what you want. Let's start by putting together the basic data: prefData = Sort@Flatten@{{"john" -> "physics"}, {"john" -> "chemistry"}, {"jane" -> "physics"}, {"jane" -> "biology"}, {"peter" -> "biology"}, {"peter" -> "chemistry"}, {"peter" -> "mathematics"}, {"david" -> "mathematics"}, {"paul" -> "chemistry"}, {"liz" -> "chemistry"}, {"liz" -> "mathematics"}, {"liz" -> "physics"}}; prefsByPerson = Merge[prefData, Identity]; Now let's define an interest function to use to weight the Graph edges (I just used the size of the overlap): InterestOverlapWeight[map_][a_, b_] := Length[Intersection @@ Lookup[map, {a, b}, {}]] Testing this out: InterestOverlapWeight[prefsByPerson]["david", "liz"] (* gives 1 *) Let's create pairs of persons (to eventually become edges): personPairs = DeleteCases[Tuples[Keys@prefsByPerson, 2], {name_, name_}] From this, let's create the edges and weights: rawEdges = UndirectedEdge @@@ personPairs; rawWeights = InterestOverlapWeight[prefsByPerson] @@@ personPairs; sharedInterestEdges = Pick[rawEdges, rawWeights, _?(GreaterThan[0])]; sharedInterestWeights = DeleteCases[rawWeights, 0]; And now we can create a Graph: graphWeightedBySharedInterests = Graph[sharedInterestEdges, EdgeWeight -> sharedInterestWeights] With this, we can find groups that should give higher interest weights: FindGraphPartition[graphWeightedBySharedInterests] Many tweaks could be made to this basic strategy to get something that you're hopefully satisfied with.

As it stands, this question is pretty vague. You should at least try to define your own shared-interest function. But since you alluded to Graph functions, we could use the built-in FindGraphPartition function. Applying it to a Graph weighted by a shared-interest function might give you what you want. Let's start by putting together the basic data:

prefData = 
 Sort@Flatten@{{"john" -> "physics"}, {"john" -> 
      "chemistry"}, {"jane" -> "physics"}, {"jane" -> 
      "biology"}, {"peter" -> "biology"}, {"peter" -> 
      "chemistry"}, {"peter" -> "mathematics"}, {"david" -> 
      "mathematics"}, {"paul" -> "chemistry"}, {"liz" -> 
      "chemistry"}, {"liz" -> "mathematics"}, {"liz" -> "physics"}};
prefsByPerson = Merge[prefData, Identity];

Now let's define an interest function to use to weight the Graph edges (I just used the size of the overlap):

InterestOverlapWeight[map_][a_, b_] := 
 Length[Intersection @@ Lookup[map, {a, b}, {}]]

Testing this out:

InterestOverlapWeight[prefsByPerson]["david", "liz"]
(* gives 1 *)

Let's create pairs of persons (to eventually become edges):

personPairs = 
 DeleteCases[Tuples[Keys@prefsByPerson, 2], {name_, name_}]

From this, let's create the edges and weights:

rawEdges = UndirectedEdge @@@ personPairs;
rawWeights = InterestOverlapWeight[prefsByPerson] @@@ personPairs;
sharedInterestEdges = Pick[rawEdges, rawWeights, _?(GreaterThan[0])];
sharedInterestWeights = DeleteCases[rawWeights, 0];

And now we can create a Graph:

graphWeightedBySharedInterests = 
 Graph[sharedInterestEdges, EdgeWeight -> sharedInterestWeights]

With this, we can find groups that should give higher interest weights:

FindGraphPartition[graphWeightedBySharedInterests]

Many tweaks could be made to this basic strategy to get something that you're hopefully satisfied with.

POSTED BY: Eric Rimbey

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback