

Manipulate genealogy GEDCOM files using Wolfram Language?

Posted 4 years ago
8204 Views | 39 Replies | 6 Total Likes

Robert Nachbar has an interesting Wolfram video http://www.wolfram.com/broadcast/video.php?c=400&p=2&v=1497 on using Mathematica to manipulate genealogy GEDCOM files. As far as I can see, the video does not provide a link to a source for his package. Does anyone know where I might find it?

Posted 4 years ago

Did you check the notebook? Don't have a laptop with me so can't check

http://library.wolfram.com/infocenter/Conferences/9268/

Posted 4 years ago

Yes, I listened to the presentation and examined the notebook. There is no information on how to get his package.

Robert is a member of this community, so you can ping him as @Robert Nachbar. Perhaps this will reach him and he will give some more info.

Posted 4 years ago

Thanks, but how do I "ping him as @Robert Nachbar"? Clicking on @Robert Nachbar tells me all about him but gives no contact info. This may all be moot, as I ended up writing my own code.

Apologies for not being clear. Simply typing @Robert N... etc. will bring up a list of people in the interface. When you select the name you need and post, an automated email will be sent to notify the member referenced.


Posted 4 years ago

Thank you. Robert did contact me by the way.

Hi, sorry I didn't see this thread sooner, I was out of town.

I intentionally omitted the package from the download site because it was not ready for general use. I have since started a major overhaul, and it is going very well, although slowly as this work is really a hobby and I get back to it only sporadically.

The new version makes use of Association and objects (e.g., individuals, families) are overloaded similarly to entities so that one can easily extract their properties without having to use special functions for that purpose.

Posted 4 years ago

Thanks for responding. I've ended up attempting to do my own coding. It's not very elegant, but I am getting there. I am using associations and datasets. I found that for some reason my Mathematica (ver 11.01) stopped reading the .ged file. It was working fine for a few days, and then when I updated my Ancestry tree and downloaded a new GEDCOM, Import sent a message that it didn't know how to open the file. By changing the extension to .txt, it opens as a long string of text, from which I can then pull out what I need. For simplicity I am just using INDI, NAME, FAMC and FAMS. The GEDCOM file also has an awful lot of junk at the end that isn't associated with any individual. I have no idea what it does. I can build a tree with Graph, but I think there are still some errors in the construction of edges. I don't particularly like using a family as a separate vertex (spouse -> family <- spouse, and family -> Child1, etc.), but I can't think of any other way to do it using Graph[vertices, edges].

Ron Gove
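The approach described in this post (reading only INDI, NAME, FAMC and FAMS) could look roughly like the sketch below. It is not Ron's code: the sample lines and the names `records`, `indiRecords`, `valuesOf` and `individuals` are made up for illustration, and with a real file `lines` would hold the file's text split into lines.

```wolfram
(* Hypothetical sample data standing in for the lines of a real GEDCOM file *)
lines = {
   "0 HEAD",
   "0 @I1@ INDI", "1 NAME John /Smith/", "1 FAMS @F1@",
   "0 @I2@ INDI", "1 NAME Mary /Jones/", "1 FAMS @F1@",
   "0 @I3@ INDI", "1 NAME Ann /Smith/", "1 FAMC @F1@",
   "0 @F1@ FAM", "1 HUSB @I1@", "1 WIFE @I2@", "1 CHIL @I3@",
   "0 TRLR"};

(* Start a new record at every level-0 line *)
records = Split[lines, ! StringStartsQ[#2, "0 "] &];

(* Keep only the individual records *)
indiRecords = Select[records, StringEndsQ[First[#], " INDI"] &];

(* All values of a given level-1 tag within one record *)
valuesOf[record_List, tag_String] :=
  Flatten[StringCases[record,
    StartOfString ~~ "1 " ~~ tag ~~ " " ~~ value__ ~~ EndOfString :> value]];

(* Association keyed by individual ID *)
individuals = Association[
   Function[record,
     First[StringCases[First[record], "@" ~~ id__ ~~ "@" :> id]] ->
      <|"Name" -> First[valuesOf[record, "NAME"], Missing["NotAvailable"]],
        "FAMC" -> valuesOf[record, "FAMC"],
        "FAMS" -> valuesOf[record, "FAMS"]|>] /@ indiRecords];

individuals["I3"]
```

Keeping FAMC and FAMS as lists leaves room for individuals who appear as a spouse in more than one family.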

Use Import["file.ged", "Lines"] to get the data as a list of strings, one for each line of the file. I hope Ancestry.com has not started using a new GEDCOM spec. My code is based on Release 5.5. I recall reading about an XML version in development, but I don't have any details.

I'd have to see your specific GEDCOM file to know what are the data at the end, but I suspect they are records for information about the resources in Ancestry.com that your individuals and families reference. Look for SOUR, REPO, NOTE at level 0. I currently ignore those data in my package.
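To see which kinds of records a file contains at level 0, each line can be split into its level, optional cross-reference ID, tag, and value. The following is only a sketch under the assumption that lines follow the simple "level [@id@] tag [value]" layout; `parseLine` and the sample data are hypothetical and are not taken from my package.

```wolfram
(* Split one GEDCOM line into level, optional cross-reference ID, tag, and value *)
parseLine[line_String] := Module[{level, rest, xref = None, tag, value},
   {level, rest} = PadRight[StringSplit[StringTrim[line], " ", 2], 2, ""];
   (* Record headers such as "0 @S1@ SOUR" carry an ID before the tag *)
   If[StringStartsQ[rest, "@"],
    {xref, rest} = PadRight[StringSplit[rest, " ", 2], 2, ""]];
   {tag, value} = PadRight[StringSplit[rest, " ", 2], 2, ""];
   <|"Level" -> FromDigits[level], "XRef" -> xref, "Tag" -> tag, "Value" -> value|>];

(* Hypothetical sample; a real list would come from Import["file.ged", "Lines"] *)
sample = {
   "0 @I1@ INDI", "1 NAME John /Smith/",
   "0 @S1@ SOUR", "1 TITL Example Source", "1 REPO @R1@",
   "0 @R1@ REPO", "1 NAME Example Repository",
   "0 TRLR"};

parsed = parseLine /@ sample;

(* Tally the record types that appear at level 0 *)
Counts[#Tag & /@ Select[parsed, #Level == 0 &]]
```

Any tag in that tally other than INDI and FAM points to records that a relationship-only analysis can skip.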

Family trees are not your "normal" n-ary tree. One really needs the family vertex between two spouse vertices, from which the child vertices are pendant. The graph is bipartite, with individuals in one class and families in the other. This structure becomes even more important when there are second, third, etc. marriages. GEDCOM does support adoption records, but I'm not sure about the best way to make the connections between the child and its biological and adoptive parents. The tree rendering in Ancestry.com is less than optimal, although they do a good job of collapsing peripheral portions.
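A small illustration of the individual/family structure described above, using a made-up example in which one person (I1) has two marriages. The data layout and the names `families`, `edges` and `g` are assumptions for this sketch only.

```wolfram
(* Hypothetical families: I1 appears as a spouse in both F1 and F2 *)
families = {
   <|"Family" -> "F1", "Spouses" -> {"I1", "I2"}, "Children" -> {"I3"}|>,
   <|"Family" -> "F2", "Spouses" -> {"I1", "I4"}, "Children" -> {"I5"}|>};

(* spouse -> family and family -> child edges *)
edges = Flatten[
   Function[fam,
     Join[
      DirectedEdge[#, fam["Family"]] & /@ fam["Spouses"],
      DirectedEdge[fam["Family"], #] & /@ fam["Children"]]] /@ families];

g = Graph[edges, VertexLabels -> "Name"];

(* Every edge joins an individual to a family, so this should give True *)
BipartiteGraphQ[g]

(* Number of families in which I1 is a spouse *)
VertexOutDegree[g, "I1"]
```

With the family vertex in place, a second marriage is just one more outgoing edge from the same individual, and half-siblings hang off different family vertices.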

Posted 4 years ago

I just tried Import["file.ged", "Lines"] and it worked perfectly. The end stuff is, as you said, like this:

{"1 TITL Mayflower Births and Deaths, Vol. 1 and 2", "1 AUTH Ancestry.com", "1 PUBL Ancestry.com Operations, Inc.", "1 _APID 1,3718::0", "0 @S113467862@ SOUR", "1 TITL http://mayflowerhistory.com/brewster-william/", "1 NOTE ", "0 @S113591755@ SOUR", "1 REPO @R100735530@", "1 TITL Canada, Find A Grave Index, 1600s-Current", "1 AUTH Ancestry.com", "1 PUBL Ancestry.com Operations, Inc.", "1 _APID 1,60527::0", "0 @S113613092@ SOUR", "1 REPO @R100735530@", "1 TITL Great Migration Begins: Immigrants to New England, 1620-33", "1 AUTH Robert Charles Anderson", "1 PUBL Ancestry.com Operations Inc", "1 _APID 1,4714::0", "0 TRLR"}

I find genealogy quite fascinating. I have discovered that I am a Mayflower descendant from 2 families, Stephen Hopkins and his 2 children Giles and Constance. Plus I am a distant cousin of Sarah Palin (she is also a Hopkins descendant) and a distant cousin of Mark Twain.

Thanks again.

Cool!

Glad I could help.

Would you be willing to beta-test my package when it gets to that point?

Posted 4 years ago

Yes, I'd love the chance to test it

Ron

Posted 4 years ago

Oh, by the way, I left out the other Mayflower family, William Brewster.
There are scads of cousin marriages among the early settlers of Massachusetts, which probably explains a lot about Yankee personality.

Well, let's not jump to conclusions. Marriage between first cousins was common well into the 19th century throughout the US and UK.

I wonder how easy it would be to get the statistics on that.

Posted 4 years ago

That would be a very interesting statistic. Maybe a good master's thesis project in History or Sociology. By the way, I am 74 years old and a retired mathematician.

Posted 9 months ago

Hi Ron, I would love to be able to use my GEDCOM file in Mathematica (12.2) - any chance you could point to where the package is? Thank you, Sorin

Posted 9 months ago

I never did get the referenced file and ended up writing my own Mathematica routines to analyze the GEDCOM file. I would be pleased to share what I have done, but I don’t have access to my stuff at this time. I am in a condo waiting for our new home to be built and my iMac is in storage. Remind me in April or May, when I should be up and running again. GEDCOM is pretty easy to parse and it wasn’t too difficult to write useful modules. I use a free program, Gramps, to do the trees, and I export the Gramps GEDCOM to Mathematica. I have some nice code that finds all lines of descent from person A to person B. It is a good exercise in “backtracking”. You should try it.

Posted 9 months ago

Thank you Ron, I will remind you in a few months. Looking forward to seeing your work in action. Best wishes, Sorin

Posted 8 months ago

I'm very interested as well!

OK, I guess it's time to get my package ready for beta testing. The only documentation at this point is in the form of usage messages for all the public symbols. I'll include a notebook with example workflows.

Some of the improvements I've made since this thread was last active include:

  • using DataStructure for queues and stacks
  • real summary boxes for Individual, Family, and FamilyTree objects
  • much better handling of dates (a lot of defensive coding to deal with typical user input), including intervals
  • better handling of name parsing based on the source of the GEDCOM data
  • better handling of the vagaries of real user data, such as multiple birth, marriage, and death records
  • KindredTree and KindredTreePlot

Instead of using accessor functions to get properties of objects, the objects act like entities and can be queried directly with the idiom

object["property"]

Most of the functionality of the original package has been ported to the new package.

A lot of attention was paid to making as much of the data as possible computable and not just flat text.
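For readers unfamiliar with the object["property"] idiom mentioned above, here is a generic sketch of how such a query can be wired up in Wolfram Language. The head `person` and its properties are hypothetical; this is not the implementation of the Individual, Family, or FamilyTree objects in the package.

```wolfram
(* Hypothetical object head wrapping an Association of properties *)
person[data_Association]["Properties"] := Keys[data];
person[data_Association][property_String] :=
  Lookup[data, property, Missing["NotAvailable"]];

(* Made-up example data *)
p = person[<|"Name" -> "Example Person", "BirthYear" -> 1900|>];

p["Name"]        (* looks up a stored property *)
p["Properties"]  (* lists the available property names *)
p["DeathYear"]   (* unknown properties give Missing["NotAvailable"] *)
```

The definitions attach to expressions of the form person[...][...], so no separate accessor function is needed for each property.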

Posted 9 months ago

Hi Rob, any chance of getting a beta version once you think it is ready for testing? I have a GEDCOM file with about 320 entries... Thank you, Sorin

Posted 8 months ago

Hello Rob, I'm very interested in your package as well. My GEDCOM is almost a thousand people. I'm very curious about what Mathematica can do for a GEDCOM file in terms of graphs and data computability. Thank you very much! Edson

If you take a look at the video that @Ron Gove linked at the very beginning of this thread you can get a pretty good idea of the computability that can be achieved. I'm sure there is much that can be done.

Thanks for your interest! I've been working on the package for the past few weekends: improving tree splitting and merging, making the function names consistent, debugging, and useful minor improvements (like clicking on a person or family in a FamilyTreeGraph and having it copied to the clipboard). Most of this work was driven by the need for a "public" tree with enough detail to make it interesting for use in a Community post. I've settled on Teddy Roosevelt, the 26th US President, as the central figure. There is a publicly available family tree of the US Presidents in GEDCOM format, up to and including 42, Bill Clinton. It's quite extensive (over 2000 individuals), and goes back to the 11th century. The tree surrounding Teddy Roosevelt that I extracted from the US Presidents tree has 57 individuals. I'm going to import it into Ancestry and link US Census data to the individuals, and then export it for use in the Community post. At that time the package will be made available from the Wolfram Cloud, and the Community post will act as a tutorial.

I'm working on the Community post now. It probably won't be ready until next week (along with the package and some example GED files).

My package Genealogy.wl is live, and the accompanying Community post is here.

Posted 6 months ago

Amazing Rob! Thank you very much for doing this. I will experiment with my family tree right away...

Posted 4 months ago

I have finally settled in our new home and got my computer started up again after 5 homeless months. I now see that Robert Nachbar has posted his genealogy package. It looks very impressive. It makes what I've been doing look kind of sad. I plan on taking a detailed look as time permits. However, I often find it difficult to understand programs written by myself, let alone others. I've attached a notebook with some of the programs I've been using to parse GED files from my genealogy in Gramps. Since Gramps (and other programs like it) displays all sorts of trees and relationships, I am focusing my work on looking at lines of descent. In my family, there are a lot of cousin marriages amongst the Mayflower descendants (of whom I am one), and I needed Mathematica to search these out. For example, I am descended from Stephen Hopkins in 8 ways. Locating these required increasing the recursion limit to 4000. Also, my Mathematica skills are rather limited and I'm sure others could write much better code.

Thanks for posting your code, Ron!

If all one is interested in is familial relationships, then simpler is better. I'll admit, it takes me a while to get my head around the file parsing code of my package, because it captures everything in the file and attempts to handle a GED file regardless of its source.

I have some cousin marriages in my family tree, also, and a pair of brothers marrying a pair of sisters. Viewing the tree in Ancestry does not make that obvious, whereas making a Graph in Mathematica certainly does.

For lines of descent, have you tried making a Graph and then using FindPath?
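As a rough sketch of that suggestion, the made-up graph below contains one cousin marriage, so there are two lines of descent from A to G. The vertex names and the graph itself are hypothetical; only Graph and FindPath are built-in.

```wolfram
(* Hypothetical tree: A's children B and C; their children D and E marry (F4) *)
descent = Graph[{
    "A" -> "F1", "F1" -> "B", "F1" -> "C",
    "B" -> "F2", "F2" -> "D",
    "C" -> "F3", "F3" -> "E",
    "D" -> "F4", "E" -> "F4", "F4" -> "G"},
   VertexLabels -> "Name"];

(* All directed paths from A to G, with no limit on path length *)
paths = FindPath[descent, "A", "G", Infinity, All];

(* Drop the family vertices to leave only the people on each line *)
linesOfDescent = DeleteCases[#, _?(StringStartsQ[#, "F"] &)] & /@ paths
```

Because the edges are directed from parents to families to children, every path found this way runs strictly from ancestor to descendant.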

Let me know if you have any problems using my package, I'd be more than happy to help you out.

Posted 4 months ago

I thought about using FindPath but never did. On the other hand, I thought it would be a good exercise to write my own path-finding code. Originally I had a mess of Mathematica code with lots of Goto and Label. I have since been able to write a recursive program (LOD, in the notebook). It was quite a challenge. For some reason, Return stopped returning the answer when the recursions got too deep. Mathematica Stack Exchange was of no help here other than saying I should use Throw and Catch. I had no idea what those constructs would be good for when I first saw them. They turned out to be exactly what I needed to get the code to function. FYI, I learned programming (machine language and Fortran) almost 60 years ago as an undergrad mathematics major. Our university computer was an IBM 1620 with 64,000 BITS of memory! So I am not well versed in modern coding techniques. But I love Mathematica and learn more about its capabilities every day.
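For anyone following along, the Throw and Catch pattern in a recursive search can be sketched as below. This is not the LOD program from the notebook: the `children` data and the functions `search` and `firstLine` are hypothetical, and the search stops at the first line of descent it finds instead of collecting all of them.

```wolfram
(* Hypothetical parent -> children table *)
children = <|"A" -> {"B", "C"}, "B" -> {"D"}, "C" -> {"E"},
   "D" -> {"G"}, "E" -> {"G"}, "G" -> {}|>;

(* Depth-first search; Throw jumps out of all nested calls at once *)
search[target_, path_List] :=
  If[Last[path] === target,
   Throw[path, "found"],
   Scan[search[target, Append[path, #]] &, Lookup[children, Last[path], {}]]];

(* Catch receives the thrown path; None means no line of descent exists *)
firstLine[from_, to_] := Catch[search[to, {from}]; None, "found"];

(* Raise the recursion limit locally for deep trees, as mentioned earlier *)
Block[{$RecursionLimit = 4000}, firstLine["A", "G"]]
```

Tagging the Throw and the Catch with the same string keeps this exit from being intercepted by any other Catch that happens to be active.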

Glad you found that Throw and Catch were just what you needed. I've seen similar "failure" of Return and now use Throw and Catch all the time.

Sounds like we're of similar vintage. My first computing was done on the college's IBM 1130 fifty years ago, with 8K bytes and real cores. I have one box of punch cards left from that era! I learned FORTRAN G from the IBM reference manual. I eventually moved on to C in the 90s. I started using Mathematica in 1989, but didn't get seduced until the mid 90s, when I started doing genetic programming (John Koza's extension of genetic algorithms).

Posted 4 months ago

Maybe this is a dumb question, but I ran

Get@CloudObject["https://www.wolframcloud.com/obj/r.b.nachbar/Published/Genealogy.wl"] 

and it did not get anything.

This is what I get with version 12.3:

In[52]:= Get@
 CloudObject[
  "https://www.wolframcloud.com/obj/r.b.nachbar/Published/Genealogy.wl"]

In[54]:= Names["Genealogy`*"]

Out[54]= {"Family", "FamilyDelete", "FamilySummary", "FamilyTree", \
"FamilyTreeDelete", "FamilyTreeExistsQ", "FamilyTreeGraph", \
"FamilyTreeRename", "FamilyTreeSplit", "FamilyTreeSummary", \
"FamilyTreeTake", "FamilyTreeValidQ", "FindFamily", "FindIndividual", \
"GenealogicalProperty", "Individual", "IndividualDelete", \
"IndividualEquivalentQ", "IndividualSummary", "KindredGraph", \
"KindredRelationship", "KindredTreePlot", "ListFamilyTrees", \
"OrderBy", "ReadGEDCOM", "SetCurrentFamilyTree", "WriteGEDCOM", \
"$GenealogyReleaseNumber", "$GenealogyVersion", \
"$GenealogyVersionNumber"}

The file has Public permissions, so you should be able to load it.

Posted 4 months ago

still nothing


Hi Ron,
This line is not meant to give any output. It just loads the package. After evaluating it, try evaluating this line too:

Names["Genealogy`*"]

It worked fine for me and gave the list of names, the same as Robert showed.

Posted 4 months ago

I am running version 12.1.1.0.

Posted 4 months ago

Thanks. Now I see I do have something.

Great!

Btw, I'm teaching at the Wolfram Summer School, so responses will be delayed for the next few weeks.
