Manipulate genealogy GEDCOM files using Wolfram Language?

Posted 7 years ago

Robert Nachbar has an interesting Wolfram video, http://www.wolfram.com/broadcast/video.php?c=400&p=2&v=1497, on using Mathematica to manipulate genealogy GEDCOM files. As far as I can see, the video does not provide a link to a source for his package. Does anyone know where I might find it?

POSTED BY: Ron Gove
45 Replies
Posted 1 year ago

The cloud package has been updated.

If it's convenient, could you please post the snippet for the individual in your GEDCOM file that exposed the bug? I'd like to add it to my test suite. Thanks!

POSTED BY: Updating Name

Hi, Bob. Thanks for pointing out the problem. I must not have had any tests that exercised that bit of code.

I was able to reproduce the problem outside of the package with:

In[34]:= assoc = <|"GIVN" -> "Robert", "SURN" -> "Nachbar", 
  "NSFX" -> "Jr"|>

Out[34]= <|"GIVN" -> "Robert", "SURN" -> "Nachbar", "NSFX" -> "Jr"|>

In[35]:= Scan[assoc[#] = . &, {"GIVN", "SURN", "NSFX"}]

During evaluation of In[35]:= Syntax::sntxf

The fix is just using parentheses to coerce the FrontEnd's parsing:

In[35]:= Scan[(assoc[#] = .) &, {"GIVN", "SURN", "NSFX"}]

In[36]:= assoc

Out[36]= <||>
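An alternative that avoids the Unset parsing question altogether is KeyDrop, which returns a copy of the Association without the listed keys (keys that are not present are simply ignored). This is a minimal sketch, not necessarily how the package itself was fixed:

```mathematica
assoc = <|"GIVN" -> "Robert", "SURN" -> "Nachbar", "NSFX" -> "Jr"|>;

(* KeyDrop returns a new Association without the given keys; absent keys are ignored *)
assoc = KeyDrop[assoc, {"GIVN", "SURN", "NSFX"}]
(* <||> *)
```

Because KeyDrop returns a new value instead of mutating in place, it needs no pure function and no parentheses to get the parse right.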

I'll update the package and push a new version to the Cloud.

Thanks, Bob

POSTED BY: Robert Nachbar
Posted 1 year ago

Hi Bob, In the "cob" file there is a "gedcomLines" function definition specific to the "NAME" tag as follows:

gedcomLines[level_, tag:"NAME", assocIn_Association] /; 
       MemberQ[Keys[assocIn], "GIVN" | "SURN" | "NSFX"] := 
    Module[{assoc=assocIn, giv, sur, suf, value}, 
(*Echo["gedcomLines"[level, tag, Association], "NAME"]; *)
       giv = assoc["GIVN"] /. _Missing -> ""; 
       sur = assoc["SURN"] /. _Missing -> ""; 
       suf = assoc["NSFX"] /. _Missing -> ""; 
       If[StringFreeQ[giv, "/"] && StringFreeQ[sur, "/"] && StringFreeQ[suf, "/"], 
         (* // elided when surname and suffix are absent *)
         value = StringTrim[giv <> " /" <> sur <> "/ " <> suf] // 
          StringReplace[#, (" //"|"//")~~EndOfString -> ""]&; 
         Scan[assoc[#] = . &, {"GIVN", "SURN", "NSFX"}]; 
         assoc["_VALU"] = value
         ]; 
       gedcomLines[level, tag, assoc]
       ]

The line with the "Scan" statement appears to have a syntax error and generates the error message:

Syntax::sntxf: "assoc[#]=" cannot be followed by ".&".
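The name-string logic in that definition can be tried on its own, outside the package. This sketch copies the StringTrim/StringReplace step into a standalone function; the name buildNameValue is hypothetical and not part of Genealogy.wl:

```mathematica
(* Hypothetical helper reproducing the NAME value construction in gedcomLines *)
buildNameValue[giv_String, sur_String, suf_String] :=
  StringReplace[
    StringTrim[giv <> " /" <> sur <> "/ " <> suf],
    (* drop the empty "//" left behind when surname and suffix are absent *)
    (" //" | "//") ~~ EndOfString -> ""]

buildNameValue["Robert", "Nachbar", "Jr"]  (* "Robert /Nachbar/ Jr" *)
buildNameValue["Robert", "", ""]           (* "Robert" *)
buildNameValue["", "Nachbar", ""]          (* "/Nachbar/" *)
```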

Thanks for your help in looking at this.

Regards, Bob

POSTED BY: Bob Renninger
Posted 2 years ago

Bob,

Yes, your suggestion works. Thanks for the quick reply!

Regards,

Bob

POSTED BY: Bob Renninger

The problem is with the first few instances of FamilyTree embedded in the notebook. Another Community member messaged me privately about that also. Newly computed FamilyTree objects are fine, so, for example, copying Out[5] and pasting it into the corresponding part of In[6] and In[7] should work.

I'll get a corrected version tested and uploaded soon.

Bob

POSTED BY: Robert Nachbar
Posted 2 years ago

Hi Robert,

I really appreciate your work in developing this package. However, I am running into a problem. The figure below (Error Messages for FamilyTree) shows some error messages involving the FamilyTree function, and I have attached the notebook used for the figure. There seems to be a conflict between global and local definitions of FamilyTree. I am using the Mathematica Home Edition version 13.0.1.0. For now my workaround is to avoid the Box notation when using FamilyTree. Is this something that can be fixed in the source package?

Thanks very much for your help.

Regards, Bob Renninger

Attachments:
POSTED BY: Bob Renninger

Great!

Btw, I'm teaching at the Wolfram Summer School, so responses will be delayed for the next few weeks.

POSTED BY: Robert Nachbar
Posted 2 years ago

Thanks. Now I see I do have something.

POSTED BY: Ron Gove
Posted 2 years ago

I am running version 12.1.1.0.

POSTED BY: Ron Gove

Hi Ron,
This line is not meant to give any output; it just loads the package. After evaluating it, try evaluating this line too:

Names["Genealogy`*"]

It worked fine for me and gave the list of names, the same as Robert showed.

POSTED BY: Ahmed Elbanna
Posted 2 years ago

still nothing

Attachment

Attachments:
POSTED BY: Ron Gove

This is what I get with version 12.3:

In[52]:= Get@
 CloudObject[
  "https://www.wolframcloud.com/obj/r.b.nachbar/Published/Genealogy.wl"]

In[54]:= Names["Genealogy`*"]

Out[54]= {"Family", "FamilyDelete", "FamilySummary", "FamilyTree",
 "FamilyTreeDelete", "FamilyTreeExistsQ", "FamilyTreeGraph",
 "FamilyTreeRename", "FamilyTreeSplit", "FamilyTreeSummary",
 "FamilyTreeTake", "FamilyTreeValidQ", "FindFamily", "FindIndividual",
 "GenealogicalProperty", "Individual", "IndividualDelete",
 "IndividualEquivalentQ", "IndividualSummary", "KindredGraph",
 "KindredRelationship", "KindredTreePlot", "ListFamilyTrees",
 "OrderBy", "ReadGEDCOM", "SetCurrentFamilyTree", "WriteGEDCOM",
 "$GenealogyReleaseNumber", "$GenealogyVersion",
 "$GenealogyVersionNumber"}

The file has Public permissions, so you should be able to load it.

POSTED BY: Robert Nachbar
Posted 2 years ago

Maybe this is a dumb question, but I ran

Get@CloudObject["https://www.wolframcloud.com/obj/r.b.nachbar/Published/Genealogy.wl"] 

and it did not get anything.

POSTED BY: Ron Gove

Glad you found that Throw and Catch were just what you needed. I've seen similar "failure" of Return and now use Throw and Catch all the time.

Sounds like we're of similar vintage. My first computing was done on the college's IBM 1130 fifty years ago with 8K bytes and real cores. I have one box of punch cards left from that era! I learned FORTRAN G from the IBM reference manual. I eventually moved on to C in the 90s. I started using Mathematica in 1989, but didn't get seduced until the mid 90s when I started doing genetic programming (John Koza's extension of genetic algorithms).

POSTED BY: Robert Nachbar
Posted 3 years ago

I thought about using FindPath but never did. On the other hand, I thought it would be a good exercise to write my own path-finding code. Originally I had a mess of Mathematica code with lots of Goto and Label. I have since been able to write a recursive program (LOD, in the notebook). It was quite a challenge. For some reason, Return stopped returning the answer when the recursions got too deep. Mathematica Stack Exchange was of no help here other than saying I should use Throw and Catch. I had no idea what those constructs would be good for when I first saw them. They turned out to be exactly what I needed to get the code to function. FYI, I learned programming (machine language and Fortran) almost 60 years ago as an undergraduate mathematics major. Our university computer was an IBM 1620 with 64,000 BITS of memory! So I am not well versed in modern coding techniques. But I love Mathematica and learn more about its capabilities every day.
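A minimal sketch of the Throw/Catch early-exit pattern described above, on made-up data. The names exampleParents, searchLine, and firstLine are hypothetical; this is not Ron's LOD code. A Return inside the Scan would only end the Scan in the current call, whereas Throw unwinds all the pending recursive calls back to the matching Catch:

```mathematica
(* Hypothetical pedigree: each person maps to the list of his or her parents *)
exampleParents = <|"A" -> {"B", "C"}, "B" -> {"D"}, "C" -> {"D", "E"},
   "D" -> {}, "E" -> {}|>;

(* Depth-first search that Throws the first path reaching the target ancestor *)
searchLine[person_, target_, path_List] := Module[{p = Append[path, person]},
  If[person === target, Throw[p, "lineFound"]];
  Scan[searchLine[#, target, p] &, Lookup[exampleParents, person, {}]]]

(* Catch the thrown path; if nothing was thrown, report that no line exists *)
firstLine[person_, target_] :=
  Catch[searchLine[person, target, {}]; Missing["NotFound"], "lineFound"]

firstLine["A", "D"]  (* {"A", "B", "D"} *)
```

Tagging the Throw ("lineFound") keeps it from being intercepted by any unrelated Catch inside the search.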

POSTED BY: Ron Gove

Thanks for posting your code, Ron!

If all one is interested in is familial relationships, then simpler is better. I'll admit, it takes me a while to get my head around the file-parsing code of my package, because it captures everything in the file and attempts to handle a GED file regardless of its source.

I have some cousin marriages in my family tree, also, and a pair of brothers marrying a pair of sisters. Viewing the tree in Ancestry does not make that obvious, whereas making a Graph in Mathematica certainly does.

For lines of descent, have you tried making a Graph and then using FindPath?
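A minimal sketch of that suggestion, on a made-up pedigree in which one ancestor is reached along two lines. Edges point from child to parent, and FindPath with Infinity (no length limit) and All (every path) returns each simple path from the descendant to the ancestor:

```mathematica
(* Hypothetical pedigree; directed edges point from child to parent *)
pedigreeGraph = Graph[{
    "Me" -> "Father", "Me" -> "Mother",
    "Father" -> "PaternalGrandfather", "Mother" -> "MaternalGrandmother",
    "PaternalGrandfather" -> "CommonAncestor",
    "MaternalGrandmother" -> "CommonAncestor"}];

(* Infinity: any path length; All: return every path, not just one *)
linesOfDescent = FindPath[pedigreeGraph, "Me", "CommonAncestor", Infinity, All]
```

This should return both lines, one through each parent. With a graph built from a real GEDCOM file, the vertices would be individual IDs rather than these labels.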

Let me know if you have any problems using my package, I'd be more than happy to help you out.

POSTED BY: Robert Nachbar
Posted 3 years ago

I have finally settled in our new home and got my computer started up again after 5 homeless months. I now see that Robert Nachbar has posted his genealogy package. It looks very impressive. Makes what I've been doing look kind of sad. I plan on taking a detailed look as time permits. However, I often find it difficult to understand programs written by myself, let alone others. I've attached a notebook with some of the programs I've been using to parse GED files exported from my genealogy in Gramps. Since Gramps (and other programs like it) display all sorts of trees and relationships, I am focusing my work on lines of descent. In my family there are a lot of cousin marriages among the Mayflower descendants (of which I am one), and I needed Mathematica to search these out. For example, I am descended from Stephen Hopkins in 8 ways. Locating these required increasing the recursion limit to 4000. Also, my Mathematica skills are rather limited, and I'm sure others could write much better code.

POSTED BY: Ron Gove
Posted 3 years ago

Amazing Rob! Thank you very much for doing this. I will experiment with my family tree right away...

POSTED BY: Sorin Suciu

My package Genealogy.wl is live, and the accompanying Community post is here.

POSTED BY: Robert Nachbar

I'm working on the Community post now; it probably won't be ready until next week (along with the package and some example GED files).

POSTED BY: Robert Nachbar

If you take a look at the video that @Ron Gove linked at the very beginning of this thread you can get a pretty good idea of the computability that can be achieved. I'm sure there is much that can be done.

POSTED BY: Robert Nachbar

Thanks for your interest! I've been working on the package for the past few weekends: improving tree splitting and merging, making the function names consistent, debugging, and making useful minor improvements (like clicking on a person or family in a FamilyTreeGraph and having it copied to the clipboard). Most of this work was driven by the need for a "public" tree with enough detail to make it interesting for use in a Community post. I've settled on Teddy Roosevelt, the 26th US President, as the central figure. There is a publicly available family tree of the US Presidents in GEDCOM format, up to and including the 42nd, Bill Clinton. It's quite extensive (over 2000 individuals) and goes back to the 11th century. The tree surrounding Teddy Roosevelt that I extracted from the US Presidents tree has 57 individuals. I'm going to import it into Ancestry, link US Census data to the individuals, and then export it for use in the Community post. At that time the package will be made available from the Wolfram Cloud, and the Community post will act as a tutorial.

POSTED BY: Robert Nachbar
Posted 3 years ago

Hello Rob, I'm very interested in your package as well. My GEDCOM is almost a thousand people. I'm very curious about what Mathematica can do for a GEDCOM file in terms of graphs and data computability. Thank you very much! Edson

POSTED BY: Edson Ferreira
Posted 3 years ago

I'm very interested as well!

POSTED BY: Edson Ferreira
Posted 3 years ago

Hi Rob, any chance to get a beta version once you think it is ready for testing? I have a GEDCOM file with about 320 entries... Thank you, Sorin

POSTED BY: Sorin Suciu
Posted 3 years ago

Thank you, Ron, I will remind you in a few months. Looking forward to seeing your work in action. Best wishes, Sorin

POSTED BY: Sorin Suciu

OK, I guess it's time to get my package ready for beta testing. The only documentation at this point is in the form of usage messages for all the public symbols. I'll include a notebook with example workflows.

Some of the improvements I've made since this thread was last active include:

  • using DataStructure for queues and stacks
  • real summary boxes for Individual, Family, and FamilyTree objects
  • much better handling of dates (a lot of defensive coding to deal with typical user input), including intervals
  • better handling of name parsing based on the source of the GEDCOM data
  • better handling of the vagaries of real user data, such as multiple birth, marriage, and death records
  • KindredTree and KindredTreePlot
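As an illustration of the first item, here is a minimal sketch of a breadth-first walk through ancestors using a "Queue" from CreateDataStructure (available since version 12.1). The pedigree data and the function name ancestorsBreadthFirst are hypothetical; this is not the package's code:

```mathematica
(* Hypothetical pedigree: person -> list of parents *)
pedigree = <|"Me" -> {"Father", "Mother"}, "Father" -> {"Grandfather"},
   "Mother" -> {}, "Grandfather" -> {}|>;

(* Breadth-first walk through ancestors, nearest generation first *)
ancestorsBreadthFirst[start_] := Module[
  {queue = CreateDataStructure["Queue"], seen = {start}, visited = {}, current},
  queue["Push", start];
  While[! queue["EmptyQ"],
    current = queue["Pop"];
    AppendTo[visited, current];
    (* enqueue each parent only once, even if reached along several lines *)
    Do[If[! MemberQ[seen, parent], AppendTo[seen, parent]; queue["Push", parent]],
      {parent, Lookup[pedigree, current, {}]}]];
  Rest[visited]]  (* drop the starting person *)

ancestorsBreadthFirst["Me"]  (* {"Father", "Mother", "Grandfather"} *)
```

Swapping the "Queue" for a "Stack" would turn the same loop into a depth-first walk.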

Instead of using accessor functions to get properties of objects, the objects act like entities and can be queried directly with the idiom

object["property"]
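One common way to get this idiom is with SubValues. The sketch below assumes a hypothetical head person wrapping an Association; the package's own Individual, Family, and FamilyTree objects may well be implemented differently:

```mathematica
(* Hypothetical head: person[assoc] stands in for an object such as Individual *)
ClearAll[person];
(* define the specific "Properties" query first, then the general property lookup *)
person[data_Association]["Properties"] := Keys[data];
person[data_Association][prop_String] :=
  Lookup[data, prop, Missing["NotAvailable", prop]];

teddy = person[<|"Name" -> "Theodore Roosevelt", "BirthYear" -> 1858|>];
teddy["Name"]        (* "Theodore Roosevelt" *)
teddy["Properties"]  (* {"Name", "BirthYear"} *)
```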

Most of the functionality of the original package has been ported to the new package.

A lot of attention was paid to making as much of the data as possible computable and not just flat text.

POSTED BY: Robert Nachbar
Posted 3 years ago

I never did get the referenced file and ended up writing my own Mathematica routines to analyze the GEDCOM file. I would be pleased to share what I have done, but I don't have access to my stuff at this time. I am in a condo waiting for our new home to be built, and my iMac is in storage. Remind me in April or May, when I should be up and running again. GEDCOM is pretty easy to parse, and it wasn't too difficult to write useful modules. I use a free program, Gramps, to do the trees, and I export a GEDCOM file from Gramps into Mathematica. I have some nice code that finds all lines of descent from person A to person B. It is a good exercise in "backtracking". You should try it.
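A minimal backtracking sketch of the "all lines of descent" search, on a made-up pedigree where one grandparent is reached through both parents. The names parentsOf and allLines are hypothetical, not taken from Ron's code:

```mathematica
(* Hypothetical pedigree with a shared grandparent *)
parentsOf = <|"Me" -> {"Father", "Mother"}, "Father" -> {"Grandparent"},
   "Mother" -> {"Grandparent"}, "Grandparent" -> {}|>;

(* Backtracking: extend the current path one parent at a time and record
   every path that ends at the target *)
allLines[person_, target_] := Module[{found = {}, extend},
  extend[path_List] := With[{current = Last[path]},
    If[current === target,
      AppendTo[found, path],
      Scan[extend[Append[path, #]] &, Lookup[parentsOf, current, {}]]]];
  extend[{person}];
  found]

allLines["Me", "Grandparent"]
(* {{"Me", "Father", "Grandparent"}, {"Me", "Mother", "Grandparent"}} *)
```

Each generation adds nested calls, so a deep pedigree may need a larger $RecursionLimit, as noted elsewhere in this thread.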

POSTED BY: Ron Gove
Posted 3 years ago

Hi Ron, I would love to be able to use my GEDCOM file in Mathematica (12.2) - any chance you could point to where the package is? Thank you, Sorin

POSTED BY: Sorin Suciu
Posted 7 years ago

Thank you. Robert did contact me by the way.

POSTED BY: Ron Gove

Apologies for not being clear, but simply typing @Robert N... etc. will bring up a list of people in the interface. When you select the name you need and post, an automated email will be sent to notify the member referenced.


POSTED BY: Kapio Letto
Posted 7 years ago

That would be a very interesting statistic. Maybe a good Masters Thesis project in History or Sociology. By the way, I am 74 years old and a retired Mathematician.

POSTED BY: Ron Gove

Well, let's not jump to conclusions. Marriage between first cousins was common well into the 19th century throughout the US and UK.

I wonder how easy it would be to get the statistics on that.

POSTED BY: Robert Nachbar
Posted 7 years ago

OBYW, I left out the other Mayflower family, William Brewster.
There are scads of cousin marriages among the early settlers of Massachusetts, which probably explains a lot about Yankee personality.

POSTED BY: Ron Gove
Posted 7 years ago

Yes, I'd love the chance to test it

Ron

POSTED BY: Ron Gove

Cool!

Glad I could help.

Would you be willing to beta-test my package when it gets to that point?

POSTED BY: Robert Nachbar
Posted 7 years ago

I just tried Import["file.ged", "Lines"] and it worked perfectly. The data at the end are as you said, like this:

{"1 TITL Mayflower Births and Deaths, Vol. 1 and 2", "1 AUTH Ancestry.com", "1 PUBL Ancestry.com Operations, Inc.", "1 _APID 1,3718::0", "0 @S113467862@ SOUR", "1 TITL http://mayflowerhistory.com/brewster-william/", "1 NOTE ", "0 @S113591755@ SOUR", "1 REPO @R100735530@", "1 TITL Canada, Find A Grave Index, 1600s-Current", "1 AUTH Ancestry.com", "1 PUBL Ancestry.com Operations, Inc.", "1 _APID 1,60527::0", "0 @S113613092@ SOUR", "1 REPO @R100735530@", "1 TITL Great Migration Begins: Immigrants to New England, 1620-33", "1 AUTH Robert Charles Anderson", "1 PUBL Ancestry.com Operations Inc", "1 _APID 1,4714::0", "0 TRLR"}

I find genealogy quite fascinating. I have discovered that I am a Mayflower descendant from 2 families, Stephen Hopkins and his 2 children, Giles and Constance. Plus I am a distant cousin of Sarah Palin (she is also a Hopkins descendant) and a distant cousin of Mark Twain.

Thanks again.

POSTED BY: Ron Gove

Use Import["file.ged", "Lines"] to get the data as a list of strings, one for each line of the file. I hope Ancestry.com has not started using a new GEDCOM spec. My code is based on Release 5.5. I recall reading about an XML version in development, but I don't have any details.
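Each of those strings can then be split into its parts. A GEDCOM 5.5 line is a level number, an optional cross-reference ID such as @I1@, a tag, and an optional value. This is a minimal sketch that assumes single-space delimiters; parseGEDCOMLine is a hypothetical name, not a function from the package:

```mathematica
(* Split one GEDCOM line into level, optional cross-reference ID, tag, and value *)
parseGEDCOMLine[line_String] := Module[
  {parts = StringSplit[StringTrim[line], " ", 2], rest, xref = None, tag, value},
  rest = If[Length[parts] > 1, parts[[2]], ""];
  (* a token like @I1@ right after the level is a cross-reference ID *)
  If[StringStartsQ[rest, "@"],
    {xref, rest} = PadRight[StringSplit[rest, " ", 2], 2, ""]];
  {tag, value} = PadRight[StringSplit[rest, " ", 2], 2, ""];
  <|"Level" -> ToExpression[parts[[1]]], "XRef" -> xref, "Tag" -> tag, "Value" -> value|>]

parseGEDCOMLine["0 @I1@ INDI"]
(* <|"Level" -> 0, "XRef" -> "@I1@", "Tag" -> "INDI", "Value" -> ""|> *)
parseGEDCOMLine["1 NAME Robert /Nachbar/"]
(* <|"Level" -> 1, "XRef" -> None, "Tag" -> "NAME", "Value" -> "Robert /Nachbar/"|> *)
```

Applied to a whole file, this would be parseGEDCOMLine /@ Import["file.ged", "Lines"].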

I'd have to see your specific GEDCOM file to know what are the data at the end, but I suspect they are records for information about the resources in Ancestry.com that your individuals and families reference. Look for SOUR, REPO, NOTE at level 0. I currently ignore those data in my package.
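To see what those level-0 records are, the lines can be grouped so that each record runs from one level-0 line up to the next, and then tallied by tag. This is a minimal sketch on made-up sample lines; gedcomRecords and recordTag are hypothetical names:

```mathematica
(* Hypothetical sample lines; in practice use Import["file.ged", "Lines"] *)
lines = {"0 HEAD", "1 CHAR UTF-8", "0 @I1@ INDI", "1 NAME Robert /Nachbar/",
   "0 @S1@ SOUR", "1 TITL Example source", "0 @R1@ REPO",
   "1 NAME Example repository", "0 TRLR"};

(* Start a new record at every level-0 line *)
gedcomRecords[ls_List] := Split[ls, ! StringStartsQ[#2, "0 "] &]

(* The record type is the tag on the level-0 line, after any @...@ ID *)
recordTag[record_List] := With[{tokens = StringSplit[First[record]]},
  If[StringStartsQ[tokens[[2]], "@"], tokens[[3]], tokens[[2]]]]

records = gedcomRecords[lines];
Counts[recordTag /@ records]
(* <|"HEAD" -> 1, "INDI" -> 1, "SOUR" -> 1, "REPO" -> 1, "TRLR" -> 1|> *)

(* Source, repository, and note records that do not describe individuals *)
Select[records, MemberQ[{"SOUR", "REPO", "NOTE"}, recordTag[#]] &]
```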

Family trees are not your "normal" n-ary tree. One really needs the family vertex between two spouse vertices, from which the child vertices are pendant. The graph is bipartite, with individuals in one class and families in the other. This structure becomes even more important when there are second, third, etc. marriages. GEDCOM does support adoption records, but I'm not sure about the best way to make the connections between the child and its biological and adoptive parents. The tree rendering in Ancestry.com is less than optimal, although they do a good job of collapsing peripheral portions.
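A minimal sketch of that bipartite structure, using made-up data in which Mary has two marriages. Spouses point to a family vertex and the family points to its children; the families association and its keys are hypothetical, not the package's data model:

```mathematica
(* Hypothetical data: each family lists its spouses and its children *)
families = <|
   "F1" -> <|"Spouses" -> {"John", "Mary"}, "Children" -> {"Ann", "Tom"}|>,
   "F2" -> <|"Spouses" -> {"Peter", "Mary"}, "Children" -> {"Sue"}|>|>;

familyEdges = Flatten@KeyValueMap[
    Function[{fam, data}, Join[
      DirectedEdge[#, fam] & /@ data["Spouses"],     (* spouse -> family *)
      DirectedEdge[fam, #] & /@ data["Children"]]],  (* family -> child *)
    families];

familyGraph = Graph[familyEdges, VertexLabels -> Automatic];

(* individuals and families form the two classes *)
BipartiteGraphQ[UndirectedGraph[familyGraph]]  (* True *)
```

Mary's second marriage simply adds a second family vertex, so no individual-to-individual edges are ever needed.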

POSTED BY: Robert Nachbar
Posted 7 years ago

Thanks for responding. I've ended up attempting to do my own coding. Not very elegant, but I am getting there. I am using associations and datasets. I found that for some reason my Mathematica (ver 11.01) stopped reading the .ged file. It was working fine for a few days, and then when I updated my Ancestry tree and downloaded a new GEDCOM, Import sent a message that it didn't know how to open the file. By changing the extension to .txt, it opens as a long string of text, from which I can then pull out what I need. For simplicity I am just using INDI, NAME, FAMC and FAMS. The GEDCOM file also has an awful lot of junk at the end that isn't associated with any individual; I have no idea what it does. I can build a tree with Graph, but I think there are still some errors in the construction of edges. I don't particularly like using a family as a separate vertex (spouse -> family <- spouse, and family -> child1, etc.), but I can't think of any other way to do it using Graph[vertices, edges].

Ron Gove

POSTED BY: Ron Gove

Hi, sorry I didn't see this thread sooner, I was out of town.

I intentionally omitted the package from the download site because it was not ready for general use. I have since started a major overhaul, and it is going very well, although slowly as this work is really a hobby and I get back to it only sporadically.

The new version makes use of Association and objects (e.g., individuals, families) are overloaded similarly to entities so that one can easily extract their properties without having to use special functions for that purpose.

POSTED BY: Robert Nachbar
Posted 7 years ago

Thanks, but how do I "ping him as @Robert Nachbar"? Clicking on @Robert Nachbar tells me all about him but gives no contact info. This may all be moot, as I ended up writing my own code.

POSTED BY: Ron Gove

Robert is a member of this community, so you can ping him as @Robert Nachbar. Perhaps this will reach him and he will give some more info.

POSTED BY: Kapio Letto
Posted 7 years ago

Yes, I listened to the presentation and examined the notebook. There is no information on how to get his package.

POSTED BY: Ron Gove

Did you check the notebook? I don't have a laptop with me, so I can't check.

http://library.wolfram.com/infocenter/Conferences/9268/

POSTED BY: l van Veen