Puzzled by FindClusters on a set of dates with a custom DistanceFunction

Posted 9 years ago

Here is a set of dates:

events = {{2014, 12, 14, 15, 26, 20.`}, {2014, 12, 14, 15, 38, 31.`}, {2014, 12, 14, 15, 41, 14.`}, {2014, 12, 19, 11, 55, 11.`}, {2014, 12, 19, 11, 55, 47.`}}

Now I would like to find clusters of dates using a custom DistanceFunction based on DateDifference. Here is the code, with some Print statements added to show which dates are passed to the DistanceFunction and what each call returns.

FindClusters[events, 
 DistanceFunction -> (With[{num = 
       Abs@QuantityMagnitude[
         DateDifference[Floor@#1, Floor@#2]]}, {Print[Floor@#1, " ", 
       Floor@#2, " ", num]}; N@num] &)]

Note that the Floors are there to make sure that the dates have, for example, integer years, months and days, as it seems that FindClusters numericalizes the data before sending it to the DistanceFunction (which is itself slightly annoying and perhaps a bug, in my opinion...). Executing this generates the following errors:

DateDifference::twoarg: Argument {2014,13,19,19,45,44} is not a time unit or a list of time units, nor can it be interpreted as a date. >>
DateDifference::date: Expression {Gregorian,{2013.,11.,17.,6.,36.,28.}} cannot be interpreted as a date specification. >>
FindClusters::xnum: A non-numeric, negative, or complex dissimilarity value was computed; dissimilarities must be non-negative and real valued. >>

And the Print statements give the following. Note the very peculiar dates that are being used:

{2014,12,14,15,26,20} {2014,12,14,15,38,31} 0.00846065
{2014,12,14,15,26,20} {2014,12,14,15,41,14} 0.0103472
{2014,12,14,15,26,20} {2014,12,19,11,55,11} 4.85337
{2014,12,14,15,26,20} {2014,12,19,11,55,47} 4.85378
{2014,12,14,15,38,31} {2014,12,14,15,41,14} 0.00188657
{2014,12,14,15,38,31} {2014,12,19,11,55,11} 4.84491
{2014,12,14,15,38,31} {2014,12,19,11,55,47} 4.84532
{2014,12,14,15,41,14} {2014,12,19,11,55,11} 4.84302
{2014,12,14,15,41,14} {2014,12,19,11,55,47} 4.84344
{2014,12,19,11,55,11} {2014,12,19,11,55,47} 0.000416667
{2014,12,16,5,49,35} {2013,10,14,1,36,29} 428.176
{2014,12,16,5,49,35} {2014,12,15,3,60,24} 1.07582
{2014,12,16,5,49,35} {2013,12,19,25,40,39} 361.173
{2014,12,16,5,49,35} {2013,12,16,12,39,30} 364.715
{2013,10,14,1,36,29} {2014,12,15,3,60,24} 427.1
{2013,10,14,1,36,29} {2013,12,19,25,40,39} 67.0029
{2013,10,14,1,36,29} {2013,12,16,12,39,30} 63.4604
{2014,12,15,3,60,24} {2013,12,19,25,40,39} 360.097
{2014,12,15,3,60,24} {2013,12,16,12,39,30} 363.64
{2013,12,19,25,40,39} {2013,12,16,12,39,30} 3.54247
{2014,10,12,4,59,31} {2013,11,14,1,39,30} 332.139
{2014,10,12,4,59,31} {2014,12,17,8,49,33} 66.1597
{2014,10,12,4,59,31} {2014,10,15,14,48,47} 3.40921
{2014,10,12,4,59,31} {2014,12,18,10,52,14} 67.2449
{2013,11,14,1,39,30} {2014,12,17,8,49,33} 398.299
{2013,11,14,1,39,30} {2014,10,15,14,48,47} 335.548
{2013,11,14,1,39,30} {2014,12,18,10,52,14} 399.384
{2014,12,17,8,49,33} {2014,10,15,14,48,47} 62.7505
{2014,12,17,8,49,33} {2014,12,18,10,52,14} 1.0852
{2014,10,15,14,48,47} {2014,12,18,10,52,14} 63.8357
{2013,11,17,6,36,28} {2014,13,19,19,45,44} 
Abs[QuantityMagnitude[DateDifference[{2013,11,17,6,36,28},{2014,13,19,19,45,44}]]]

Does anyone have any insight into this behavior/failure?

POSTED BY: David Reiss
2 Replies

If you are dealing with dates, then it's better to work with DateObjects:

events = DateObject /@ {{2014, 12, 14, 15, 26, 20.`}, {2014, 12, 14, 
     15, 38, 31.`}, {2014, 12, 14, 15, 41, 14.`}, {2014, 12, 19, 11, 
     55, 11.`}, {2014, 12, 19, 11, 55, 47.`}};

Then, your function doesn't appear to need the Floor operators:

FindClusters[events, 
 DistanceFunction -> (With[{num = 
       Abs@QuantityMagnitude[DateDifference[#1, #2]]}, {Print[#1, 
       " ", #2, " ", num]}; N@num] &)]

This evaluates cleanly for me. The older notation of using lists for dates is ambiguous since it could just be a vector/list of numbers. Does this do what you want?
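
Another way to sidestep the list ambiguity, offered only as a sketch and not tested against the original poster's setup: convert each date to a single number with AbsoluteTime and cluster on that. The variable name seconds is illustrative, and this relies on the default distance for real numbers, not on DateDifference.

```wolfram
(* Original list-form dates from the question *)
events = {{2014, 12, 14, 15, 26, 20.}, {2014, 12, 14, 15, 38, 31.},
   {2014, 12, 14, 15, 41, 14.}, {2014, 12, 19, 11, 55, 11.},
   {2014, 12, 19, 11, 55, 47.}};

(* Each date becomes one real number: seconds since the start of 1900 *)
seconds = AbsoluteTime /@ events;

(* Cluster on the scalar times, but return the corresponding date lists *)
FindClusters[seconds -> events]
```

Because each element is now a scalar, there are no separate year, month and day components for FindClusters to treat as vector coordinates.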

POSTED BY: Jeffrey Bryant

Yes, that may do the trick. The behavior of my original example is still a puzzle to me. I assume that FindClusters numericalizes things for some reason, but I wonder where those peculiar dates came from. I assume that FindClusters is making some sort of assumption about what the data is, which supports your solution, since yours makes it unambiguous that the data are dates.
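
A quick check that illustrates the ambiguity (it says nothing about what FindClusters does internally, which remains an assumption): the list form is indistinguishable from an ordinary numeric matrix, while the DateObject form is not. The name dateEvents is illustrative.

```wolfram
events = {{2014, 12, 14, 15, 26, 20.}, {2014, 12, 14, 15, 38, 31.},
   {2014, 12, 14, 15, 41, 14.}, {2014, 12, 19, 11, 55, 11.},
   {2014, 12, 19, 11, 55, 47.}};

(* The list form is a plain 5 x 6 numeric matrix *)
MatrixQ[events, NumericQ]      (* True *)

(* Wrapped as DateObjects, the data no longer looks like a numeric matrix *)
dateEvents = DateObject /@ events;
MatrixQ[dateEvents, NumericQ]  (* False *)
```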

POSTED BY: David Reiss