Message Boards Message Boards

Calculate mean graph distance in a citation network?

GROUPS:

Hi, I'm a PhD student. For my research, I'm trying to find the shortest path length of a citation network. As I understand, I can use the MeanGraphDistance function for this purpose in mathematica. However, for a citation graph similar to the following, the mean graph distance shows as infinity. Moreover, the number of connected components in this graph show up as 10 though it is a connected graph.

g = Graph[{1 -> 2, 1 -> 3, 1 -> 4, 2 -> 5, 2 -> 6, 2 -> 3, 3 -> 7, 
    3 -> 8, 4 -> 9, 4 -> 7, 4 -> 8, 4 -> 10}];
ConnectedComponents[g]
MeanGraphDistance[g]

This is the output I get
{{7}, {8}, {3}, {5}, {6}, {2}, {9}, {10}, {4}, {1}}

\[Infinity]

I'm not able to understand why. Any help would be greatly appreciated.

POSTED BY: Praveena Chandra
Answer
1 month ago

This is a directed graph, and as such it is not connected. Not every node is reachable from any other along a directed path.

In[7]:= ConnectedGraphQ[g]
Out[7]= False

But it is weakly connected:

In[8]:= WeaklyConnectedGraphQ[g]
Out[8]= True

Which is equivalent to

In[17]:= ConnectedGraphQ@UndirectedGraph[g]    
Out[17]= True

You can compute the undirected average shortest path length:

In[20]:= MeanGraphDistance@UndirectedGraph[g]
Out[20]= 32/15

igraph has shortest path computation functionality that is useful in some cases and is not available in Mathematica. You can access it from Mathematica through my IGraph/M package.

The IGAveragePathLength function differs from MeanGraphDistance in that it computes the average of all-pair shortest paths between connected nodes only. Node-pairs which are not connected are excluded.

<< IGraphM`

In[22]:= IGAveragePathLength[g]    
Out[22]= 1.4

For this directed graph, this is equivalent to

N@Mean@DeleteCases[
   Flatten@GraphDistanceMatrix[g],
   0 | Infinity
   ]

but more efficient.

Of course, we can do the undirected one too, and it gives the same answer as N@MeanGraphDistance[g]

In[28]:= IGAveragePathLength@UndirectedGraph[g]
Out[28]= 2.13333

Other useful functions are IGDistanceCounts for counting how many paths are there of different unweighted lengths. IGDistanceHistogram is the equivalent for weighted graphs. IGDistanceMatrix is an alternative for GraphDistanceMatrix, but it is able to calculate the distance matrix only between a subset of vertices, which is more efficient and takes less memory than takes a part of the full distance matrix. In some cases these particular IG functions take less memory and are more efficient than the built-in ones, but this depends on the specific graph and use case in question.

I included these functions in IGraph/M because I needed to compute shortest path statistics for very large graphs for which Mathematica's builtins would run out of memory.

Finally, if you do directed/undirected conversions, check out IGUndirectedGraph which has some extra functionality currently unavailable in UndirectedGraph


Any feedback about IGraph/M will be appreciated. I am also looking for help with writing documentation for this package.

POSTED BY: Szabolcs Horvát
Answer
1 month ago

Thank you so much!

POSTED BY: Praveena Chandra
Answer
1 month ago

Group Abstract Group Abstract