Manipulate genealogy GEDCOM files using Wolfram Language?

Posted 4 years ago
6040 Views | 28 Replies | 6 Total Likes

Robert Nachbar has an interesting Wolfram video http://www.wolfram.com/broadcast/video.php?c=400&p=2&v=1497 on using Mathematica to manipulate genealogy GEDCOM files. As far as I can see, the video does not provide a link to a source for his package. Does anyone know where I might find it?

28 Replies
Posted 4 years ago

Did you check the notebook? I don't have a laptop with me, so I can't check.

http://library.wolfram.com/infocenter/Conferences/9268/

Posted 4 years ago

Yes, I listened to the presentation and examined the notebook. There is no information on how to get his package.

Robert is a member of this community, so you can ping him as @Robert Nachbar. Perhaps this will reach him and he will give some more info.

Posted 4 years ago

Thanks, but how do I "ping him as @Robert Nachbar"? Clicking on @Robert Nachbar tells me all about him but gives no contact info. This may all be moot, as I ended up writing my own code.

Apologies for not being clear. Simply typing @Robert N... etc. in a post will bring up a list of people in the interface. When you select the name you need and then post, an automated email will be sent to notify the member referenced.


Posted 3 years ago

Thank you. Robert did contact me by the way.

Hi, sorry I didn't see this thread sooner, I was out of town.

I intentionally omitted the package from the download site because it was not ready for general use. I have since started a major overhaul, and it is going very well, although slowly as this work is really a hobby and I get back to it only sporadically.

The new version makes use of Association, and objects (e.g., individuals, families) are overloaded similarly to entities, so that one can easily extract their properties without having to use special functions for that purpose.

Posted 4 years ago

Thanks for responding. I've ended up attempting to do my own coding. Not very elegant, but I am getting there. I am using associations and datasets. I found that for some reason my Mathematica (ver 11.01) stopped reading the .ged file. It was working fine for a few days, and then when I updated my Ancestry tree and downloaded a new GEDCOM, Import sent a message that it didn't know how to open the file. By changing the extension to .txt, it opens as a long string of text, from which I can then pull out what I need. For simplicity I am just using INDI, NAME, FAMC and FAMS. The GEDCOM file also has an awful lot of junk at the end that isn't associated with any individual. I have no idea what it does. I can build a tree with Graph, but I think there are still some errors in the construction of edges. I don't particularly like using a family as a separate vertex (spouse -> family <- spouse and family -> Child1, etc.), but I can't think of any other way to do it using Graph[vertices, edges].

Ron Gove

Use Import["file.ged", "Lines"] to get the data as a list of strings, one for each line of the file. I hope Ancestry.com has not started using a new GEDCOM spec. My code is based on Release 5.5. I recall reading about an XML version in development, but I don't have any details.
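As a minimal sketch of what can be done with that list of strings: each GEDCOM line can be split into its level number, an optional cross-reference ID such as @I1@, a tag, and a value. The helper name parseLine and the sample records below are invented for illustration and are not taken from the package discussed in this thread. The value is rebuilt from whitespace-separated tokens, so runs of spaces are collapsed.

```wolfram
(* Stand-in for Import["file.ged", "Lines"]; these records are invented *)
lines = {
   "0 @I1@ INDI",
   "1 NAME John /Smith/",
   "1 FAMS @F1@",
   "0 @F1@ FAM",
   "1 HUSB @I1@",
   "0 TRLR"};

(* parseLine is a hypothetical helper: level, optional @xref@, tag, value *)
parseLine[line_String] := Module[{tokens = StringSplit[line], hasXRef},
  If[Length[tokens] < 2,
   Missing["NotParsed", line],
   hasXRef = Length[tokens] >= 3 && StringMatchQ[tokens[[2]], "@" ~~ __ ~~ "@"];
   <|"Level" -> FromDigits[First[tokens]],
    "XRef" -> If[hasXRef, tokens[[2]], None],
    "Tag" -> If[hasXRef, tokens[[3]], tokens[[2]]],
    "Value" -> StringRiffle[Drop[tokens, If[hasXRef, 3, 2]], " "]|>]]

parsed = parseLine /@ lines;

(* Start a new record whenever the next line is at level 0 *)
records = Split[parsed, #2["Level"] > 0 &];

(* Tag of the first line of each record *)
#[[1, "Tag"]] & /@ records
(* {"INDI", "FAM", "TRLR"} *)
```

Grouping by level-0 lines gives one list of parsed lines per individual, family, or other top-level record, which is a convenient starting point for building associations.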

I'd have to see your specific GEDCOM file to know what are the data at the end, but I suspect they are records for information about the resources in Ancestry.com that your individuals and families reference. Look for SOUR, REPO, NOTE at level 0. I currently ignore those data in my package.
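One way to check what the trailing records are, and to set them aside, is to tally the record type of every level-0 line. This is only a sketch under assumptions: recordType is a hypothetical helper, and the sample lines are invented rather than taken from a real Ancestry.com export.

```wolfram
(* Invented sample; in practice use lines = Import["file.ged", "Lines"] *)
lines = {"0 HEAD", "1 CHAR UTF-8", "0 @I1@ INDI", "1 NAME John /Smith/",
   "0 @S1@ SOUR", "1 TITL Example Source", "1 REPO @R1@",
   "0 @R1@ REPO", "1 NAME Example Repository", "0 TRLR"};

(* Group lines into records; a record starts at each level-0 line *)
records = Split[lines, ! StringStartsQ[#2, "0 "] &];

(* recordType: tag of a level-0 line, skipping any @xref@ before it *)
recordType[line_String] := With[{t = StringSplit[line]},
  If[Length[t] >= 3 && StringMatchQ[t[[2]], "@" ~~ __ ~~ "@"], t[[3]], t[[2]]]]

(* How many records of each type the file contains *)
Counts[recordType[First[#]] & /@ records]

(* Keep only individuals and families, dropping SOUR, REPO, NOTE, etc. *)
kept = Select[records, MemberQ[{"INDI", "FAM"}, recordType[First[#]]] &]
```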

Family trees are not your "normal" n-ary tree. One really needs the family vertex between two spouse vertices from which the child vertices are pendant. The graph is bipartite, with individuals in one class and families in the other. This structure becomes even more important when there are second, third, etc. marriages. GEDCOM does support adoption records, but I'm not sure about the best way to make the connections between the child and its biological and adoptive parents. The tree rendering in Ancestry.com is less than optimal, although they do a good job of collapsing peripheral portions.
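The bipartite structure can be sketched with Graph as follows. The data layout (an association of families with "Spouses" and "Children" keys) and the helper familyEdges are assumptions made for this example, not the representation used in any particular package. Person I2 appears as a spouse in two families to show how a second marriage fits in.

```wolfram
(* Invented example data: each family lists its spouses and its children *)
families = <|
   "F1" -> <|"Spouses" -> {"I1", "I2"}, "Children" -> {"I3", "I4"}|>,
   "F2" -> <|"Spouses" -> {"I2", "I5"}, "Children" -> {"I6"}|>|>;

(* Edges run spouse -> family and family -> child *)
familyEdges[fam_String, members_Association] := Join[
  DirectedEdge[#, fam] & /@ members["Spouses"],
  DirectedEdge[fam, #] & /@ members["Children"]]

edges = Flatten[KeyValueMap[familyEdges, families]];
g = Graph[edges, VertexLabels -> "Name"];

(* Individuals and families form the two classes of a bipartite graph *)
BipartiteGraphQ[UndirectedGraph[g]]

(* I2 is a spouse in two families, so it has two outgoing edges *)
VertexOutDegree[g, "I2"]
```

Because every edge joins an individual to a family, remarriages simply add another family vertex without any special casing.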

Posted 4 years ago

I just tried Import["file.ged", "Lines"] and it worked perfectly. The end stuff is, as you said, like this:

{"1 TITL Mayflower Births and Deaths, Vol. 1 and 2", "1 AUTH Ancestry.com", "1 PUBL Ancestry.com Operations, Inc.", "1 _APID 1,3718::0", "0 @S113467862@ SOUR", "1 TITL http://mayflowerhistory.com/brewster-william/", "1 NOTE ", "0 @S113591755@ SOUR", "1 REPO @R100735530@", "1 TITL Canada, Find A Grave Index, 1600s-Current", "1 AUTH Ancestry.com", "1 PUBL Ancestry.com Operations, Inc.", "1 _APID 1,60527::0", "0 @S113613092@ SOUR", "1 REPO @R100735530@", "1 TITL Great Migration Begins: Immigrants to New England, 1620-33", "1 AUTH Robert Charles Anderson", "1 PUBL Ancestry.com Operations Inc", "1 _APID 1,4714::0", "0 TRLR"}

I find genealogy quite fascinating. I have discovered that I am a Mayflower descendant from 2 families, Stephen Hopkins and his 2 children Giles and Constance. Plus I am a distant cousin of Sarah Palin (she is also a Hopkins descendant) and a distant cousin of Mark Twain.

Thanks again.

Cool!

Glad I could help.

Would you be willing to beta-test my package when it gets to that point?

Posted 4 years ago

Yes, I'd love the chance to test it

Ron

Posted 4 years ago

OBTW, I left out the other Mayflower family, William Brewster.
There are scads of cousin marriages among the early settlers of Massachusetts, which probably explains a lot about Yankee personality.

Well, let's not jump to conclusions. Marriage between first cousins was common well into the 19th century throughout the US and UK.

I wonder how easy it would be to get the statistics on that.

Posted 4 years ago

That would be a very interesting statistic. Maybe a good master's thesis project in History or Sociology. By the way, I am 74 years old and a retired mathematician.

Posted 3 months ago

Hi Ron, I would love to be able to use my GEDCOM file in Mathematica (12.2) - any chance you could point to where the package is? Thank you, Sorin

Posted 3 months ago

I never did get the referenced file and ended up writing my own Mathematica routines to analyze the GEDCOM file. I would be pleased to share what I have done, but I don’t have access to my stuff at this time. I am in a condo waiting for our new home to be built and my iMac is in storage. Remind me in April or May, when I should be up and running again. GEDCOM is pretty easy to parse and it wasn’t too difficult to write useful modules. I use a free program, Gramps, to do the trees, and I export Gramps GEDCOM to Mathematica. I have a nice piece of code that finds all lines of descent from person A to person B. It is a good exercise in “backtracking”. You should try it.
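For comparison with a hand-written backtracking search, the built-in FindPath can enumerate every directed path between two vertices. The sketch below is not the code described above. It assumes a directed graph with edges spouse -> family and family -> child, and the people and families are invented. A marriage between cousins E and G gives H two lines of descent from A.

```wolfram
(* Invented example: people are single letters, families are F1..F4 *)
families = {"F1", "F2", "F3", "F4"};
edges = {
   "A" -> "F1", "B" -> "F1", "F1" -> "C", "F1" -> "D",
   "C" -> "F2", "X" -> "F2", "F2" -> "E",
   "D" -> "F3", "Y" -> "F3", "F3" -> "G",
   "E" -> "F4", "G" -> "F4", "F4" -> "H"};
g = Graph[edges];

(* All directed paths from A to H, of any length *)
paths = FindPath[g, "A", "H", Infinity, All];

(* Drop the family vertices so that each path lists only people *)
linesOfDescent = DeleteCases[#, Alternatives @@ families] & /@ paths;
Sort[linesOfDescent]
(* {{"A", "C", "E", "H"}, {"A", "D", "G", "H"}} *)
```

The order in which FindPath returns paths is not something to rely on, hence the Sort before comparing results.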

Posted 3 months ago

Thank you Ron, I will remind you in a few months. Looking forward to seeing your work in action. Best wishes, Sorin

Posted 2 months ago

I'm very interested as well!

OK, I guess it's time to get my package ready for beta testing. The only documentation at this point is in the form of usage messages for all the public symbols. I'll include a notebook with example workflows.

Some of the improvements I've made since this thread was last active include:

  • using DataStructure for queues and stacks
  • real summary boxes for Individual, Family, and FamilyTree objects
  • much better handling of dates (a lot of defensive coding to deal with typical user input), including intervals
  • better handling of name parsing based on the source of the GEDCOM data
  • better handling of the vagaries of real user data, such as multiple birth, marriage, and death records
  • KindredTree and KindredTreePlot
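To illustrate the first item, here is a minimal sketch of a breadth-first traversal that uses the built-in "Queue" data structure (CreateDataStructure, available from version 12.1). The children association and the function descendantsBFS are invented for this example and are not symbols from the package.

```wolfram
(* Invented example data: each person mapped to their children *)
children = <|"A" -> {"B", "C"}, "B" -> {"D"}, "C" -> {}, "D" -> {}|>;

(* Visit root and all descendants, one generation at a time *)
descendantsBFS[root_String] := Module[{queue, visited = {}, person},
  queue = CreateDataStructure["Queue"];
  queue["Push", root];
  While[! queue["EmptyQ"],
   person = queue["Pop"];
   If[! MemberQ[visited, person],
    AppendTo[visited, person];
    Scan[queue["Push", #] &, Lookup[children, person, {}]]]];
  visited]

descendantsBFS["A"]
(* {"A", "B", "C", "D"} *)
```

Swapping "Queue" for "Stack" in the same loop would turn the traversal into a depth-first one.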

Instead of using accessor functions to get properties of objects, the objects act like entities and can be queried directly with the idiom

object["property"]
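A minimal sketch of how such an idiom can be set up with subvalue definitions on a wrapper around an Association. The head individual and the property names here are hypothetical and are not the package's actual symbols.

```wolfram
(* individual is a hypothetical wrapper, not a symbol from Genealogy.wl *)
individual[data_Association]["Properties"] := Keys[data]
individual[data_Association][prop_String] :=
  Lookup[data, prop, Missing["NotAvailable", prop]]

person = individual[<|"Name" -> "John Smith", "BirthYear" -> 1850|>];

person["Name"]        (* "John Smith" *)
person["Properties"]  (* {"Name", "BirthYear"} *)
person["DeathYear"]   (* Missing["NotAvailable", "DeathYear"] *)
```

Returning Missing for an unknown property mirrors how entities behave and keeps downstream code from failing on incomplete records.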

Most of the functionality of the original package has been ported to the new package.

A lot of attention was paid to making as much of the data as possible computable and not just flat text.

Posted 3 months ago

Hi Rob, any chance of getting a beta version once you think it is ready for testing? I have a GEDCOM file with about 320 entries... Thank you, Sorin

Posted 2 months ago

Hello Rob, I'm very interested in your package as well. My GEDCOM is almost a thousand people. I'm very curious about what Mathematica can do for a GEDCOM file in terms of graphs and data computability. Thank you very much! Edson

If you take a look at the video that @Ron Gove linked at the very beginning of this thread you can get a pretty good idea of the computability that can be achieved. I'm sure there is much that can be done.

Thanks for your interest! I've been working on the package for the past few weekends: improving tree splitting and merging, making the function names consistent, debugging, and useful minor improvements (like clicking on a person or family in a FamilyTreeGraph and having it copied to the clipboard). Most of this work was driven by the need for a "public" tree with enough detail to make it interesting for use in a Community post. I've settled on Teddy Roosevelt, the 26th US President, as the central figure. There is a publicly available family tree of the US Presidents in GEDCOM format, up to and including 42, Bill Clinton. It's quite extensive (over 2000 individuals), and goes back to the 11th century. The tree surrounding Teddy Roosevelt that I extracted from the US Presidents tree has 57 individuals. I'm going to import it into Ancestry and link US Census data to the individuals, and then export it for use in the Community post. At that time the package will be made available from the Wolfram Cloud, and the Community post will act as a tutorial.

I'm working on the Community post now. It probably won't be ready until next week (along with the package and some example GED files).

My package Genealogy.wl is live, and the accompanying Community post is here.

Posted 25 days ago

Amazing Rob! Thank you very much for doing this. I will experiment with my family tree right away...
