This is a follow-up on the discussion about the future of the Graph functionality in Mathematica.
@Charles Pooh was kind enough to respond in that thread and indicate that new developments are in the pipeline. However, the elephant in the room was not addressed: namely, will practical user needs finally be taken into account?
Charles showed statistics indicating that most bugs are reported internally, and that every version includes multiple bug fixes. The problem is that very serious bugs, which surfaced when users tried to get actual work done, have gone unaddressed through several versions. Problems with usability and workflow have not been addressed either. Bugs that hinder everyday work don't seem to be prioritized.
A fundamental necessity for doing any non-trivial network analysis is to be able to associate attributes with edges and vertices of graphs, and then modify the graph (e.g. take a subgraph) while preserving attributes. This is near-unusable with Mathematica, and I am not exaggerating at all. If anyone believes otherwise, I challenge you to prove me wrong.
Let's create a random graph of moderate size and associate attributes with its vertices and edges. To make the task not too challenging, I will use the built-in EdgeWeight property, which is easier to use than custom properties.
rg = RandomGraph[{10000, 50000}, EdgeWeight -> RandomReal[1, 50000]];
Let's add a custom vertex property:
rg = SetProperty[rg, Properties -> Thread[VertexList[rg] -> List /@ Thread["foo" -> RandomInteger[100, VertexCount[rg]]]]];
This looks pretty complicated. Why does it have to be so? With igraph's R interface, one would simply do V(rg)$foo <- runif(vcount(rg)). Forget that some function names are shorter, as that's mostly irrelevant. Look at how much simpler and easier the syntax is!
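To make the comparison concrete, here is a minimal sketch of a wrapper that hides the nested Thread calls. The name setVertexProperty is hypothetical (it is not a built-in); it only repackages the SetProperty call shown above and assumes one value per vertex, in VertexList order.

```mathematica
(* Hypothetical helper, not part of Mathematica: attaches one value per
   vertex under the custom property `name`, in VertexList order.
   It builds the same rule structure as the SetProperty call above. *)
setVertexProperty[g_?GraphQ, name_, values_List] /;
   Length[values] == VertexCount[g] :=
  SetProperty[g,
   Properties -> Thread[VertexList[g] -> ({name -> #} & /@ values)]]

(* Usage, equivalent to the longer expression above *)
rg = setVertexProperty[rg, "foo", RandomInteger[100, VertexCount[rg]]];
```

This only shortens the call site. It does nothing about performance or about how other functions treat the property afterwards.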
Now let's add some vertex coordinates. Since VertexCoordinates is a built-in property, this is a bit simpler, but still not as simple as it should be.
In[80]:= rg = SetProperty[rg, VertexCoordinates -> Thread[VertexList[rg] -> RandomReal[1, {VertexCount[rg], 2}]]]; // AbsoluteTiming
Out[80]= {2.49243, Null}
Also, it takes 2.5 seconds, which is unacceptably long considering that we are only processing 10000 pairs of machine reals!
Now try something trivial: take a subgraph. This is probably the single most common operation one does on such datasets. Well, Subgraph doesn't support properties at all, so what do we do? Luckily we have VertexDelete, which should support them, but has been slow and buggy for as long as it has existed.
sg = VertexDelete[rg, RandomSample[VertexList[rg], 5000]]; // AbsoluteTiming
This took 6 seconds (!!!) on my computer. In igraph/R it takes no perceptible time, as it should. Imagine e.g. a very typical task where one would try to do some statistics on a large set of subgraphs. This is basically impossible in Mathematica (at least if we do use Graph).
Furthermore, it turns out that sg is now corrupted and can't be used anymore. To make things more confusing, it is not immediately apparent that this is so, or that it was VertexDelete that broke it. GraphQ[sg] still returns True. But now try
In[82]:= {Length@PropertyValue[sg, EdgeWeight], EdgeCount[sg]}
Out[82]= {50000, 12429}
The edge weight vector is longer than the number of edges, so functions like EdgeBetweennessCentrality will simply refuse to operate on the graph (without reporting any meaningful error, BTW).
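Since GraphQ does not catch this, the check above can be wrapped into a small sanity test to run after every graph modification. This is only a sketch: edgeWeightsConsistentQ is a hypothetical name, and it covers nothing beyond the built-in EdgeWeight property.

```mathematica
(* Hypothetical helper, not a built-in: True when the stored edge weight
   list has exactly one entry per edge. For a graph without edge weights
   PropertyValue does not return a list of that length, so the test
   yields False there as well. *)
edgeWeightsConsistentQ[g_?GraphQ] :=
  Length[PropertyValue[g, EdgeWeight]] === EdgeCount[g]

(* With the lengths reported above (50000 weights, 12429 edges),
   this evaluates to False. *)
edgeWeightsConsistentQ[sg]
```

It at least surfaces the corruption right where it happens, rather than much later in the workflow.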
At least I was able to check the length of the edge weight vector. With custom properties this isn't even possible (but they're also mishandled by VertexDelete). VertexDelete also frequently messes up vertex properties (not just edge properties), and again in a way that the symptoms will appear only much later in the workflow.
In other words, we can't perform even the most trivial operations on graphs with attributes.
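For reference, this is roughly what one ends up writing by hand to get a subgraph that keeps its edge weights. It is a sketch under stated assumptions, not a fix: weightedSubgraph is a hypothetical name, PropertyValue[g, EdgeWeight] is assumed to return the weights in EdgeList order, and vertex properties, coordinates and custom edge properties are all dropped.

```mathematica
(* Hypothetical helper, not a built-in. Keeps the vertices in `keep`,
   the edges joining two kept vertices, and the matching edge weights. *)
weightedSubgraph[g_?GraphQ, keep_List] :=
 Module[{edges, weights, keepQ, pos},
  edges = EdgeList[g];
  (* assumption: weights are returned in the same order as EdgeList *)
  weights = PropertyValue[g, EdgeWeight];
  (* association used as a fast membership test for kept vertices *)
  keepQ = AssociationThread[keep -> True];
  (* positions of edges whose two endpoints are both kept *)
  pos = Flatten@Position[
     Map[Function[e, KeyExistsQ[keepQ, First[e]] && KeyExistsQ[keepQ, Last[e]]],
      edges], True];
  (* rebuild the graph from scratch with the filtered weights *)
  Graph[keep, edges[[pos]], EdgeWeight -> weights[[pos]]]
 ]

(* Usage: keep a random half of the vertices *)
sg2 = weightedSubgraph[rg, RandomSample[VertexList[rg], 5000]];
```

Having to rebuild the graph from its raw parts for such a basic operation is exactly the problem.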
Finally, multigraphs simply don't work with properties, but there is absolutely no mention in the documentation that this is not supported. Things simply don't work, or break, or the Graph objects get corrupted.
So, to sum up:
It's great that Graph will get more attention. But will actual use cases be considered? Will the most fundamental problems, the problems that currently make it unusable for such work, be fixed? Is Mathematica trying to be suitable for this type of work at all, or, to ask once again, should we just give up and go to igraph and networkx, to R and Python, like nearly everyone else does? (I certainly wouldn't like to do that as I already invested a lot in Mathematica.)
Bug(-fix) counts alone are not a good measure of progress. Effort should be concentrated where it actually matters.