Heuristic Genealogy

Good genealogists want documentation for everything they learn. This site is about mathematical methods in genealogy, but it recognizes that these are at best heuristic. Mathematical methods can suggest likely facts, but they cannot prove your results. Nevertheless, they can be very good. Moreover, using these methods can point out likely errors in the documents or their transcription.


A former front page, which was of much general interest, is still available. This page may seem much too specific, but there is a good reason for that. What follows is not particularly technical.

It is important to understand the difference between implication and inference.   To imply is to tell or suggest something.  To infer is to understand or accept what is implied.

A document implies something.  We infer that it is right.   A document may be unclear, as when bad handwriting makes words unreadable, but it still suggests something — it implies some apparent fact.   To the extent that we accept what the document seems to be saying, we are making an inference.  We often infer that what the document says is true.  That may be a mistake, as other documents will show.

If we completely accept what several documents say, we may combine them and tell another researcher what they imply.   That statement may be taken at face value, or the other researcher may infer that it is probably correct.

When it comes right down to it, almost all genealogy is inference.    There are only degrees of proof.

To make sure royal genealogies were correct, it was common for brides to be inspected for virginity, and the act of consummation itself was often observed. Even the common people often kept bloody bedsheets as proof of virginity. But we now know how easily even this evidence can be faked.

In Judaic genealogy, the contribution of the man was always suspect.  A person was taken as Jewish only if born to a Jewish woman.

So all we have is probability and inference.  Now we have DNA evidence, but it is hard if not impossible to obtain from long-dead ancestors.  Nor is it completely reliable.

I claim that remarkable new methods based on mathematics can make these inferences much more reliable.

One class of methods is called deep learning. The best known of these is the training of an artificial neural network. A canonical example was training a neural network to recognize bridges. Images were captured as pixel arrays, as is done today in any digital camera. Some of these pictures were of bridges, and some were of other objects. The only additional data supplied to the neural network was the desired outcome: was it a bridge or not?

After the neural network had been trained on thousands of images, it was used to classify others that it had never seen before. Without being told explicitly how to recognize a bridge, it was able to do so almost perfectly. A greater number and diversity of images supplied during training produced even better results. More detailed images helped. Variations in the design of the neural network helped. Now it is a simple matter to get excellent results, and this method is among those used in facial recognition software.
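As a rough illustration of training by example, the sketch below teaches a single artificial neuron (a perceptron, far simpler than a deep network) to sort toy 3x3 "pictures" into bridges and non-bridges. Everything in it is an assumption made for illustration: the pictures, the rule that a bridge is a bright middle row, and the function names `image`, `predict`, and `train` are invented here and are not taken from the bridge experiment.

```python
def image(rows):
    """Flatten a 3x3 picture written as three strings of 0/1 pixels."""
    return [int(ch) for row in rows for ch in row]


def predict(weights, bias, pixels):
    """Return 1 ("bridge") if the weighted pixel sum is positive, else 0."""
    total = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if total > 0 else 0


def train(examples, epochs=20):
    """Perceptron rule: adjust the weights only when a prediction is wrong."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for pixels, label in examples:
            error = label - predict(weights, bias, pixels)
            if error != 0:
                mistakes += 1
                weights = [w + error * p for w, p in zip(weights, pixels)]
                bias += error
        if mistakes == 0:  # every training picture is classified correctly
            break
    return weights, bias


# Hypothetical training data: label 1 = bridge (bright middle row), 0 = not.
training = [
    (image(["000", "111", "000"]), 1),
    (image(["000", "000", "000"]), 0),
    (image(["101", "111", "101"]), 1),
    (image(["101", "000", "101"]), 0),
    (image(["000", "111", "101"]), 1),
    (image(["111", "000", "000"]), 0),
    (image(["010", "111", "000"]), 1),
    (image(["010", "000", "111"]), 0),
]

# Pictures the neuron has never seen, with the answers we hope for.
unseen = [
    (image(["100", "111", "001"]), 1),
    (image(["001", "000", "110"]), 0),
    (image(["111", "000", "111"]), 0),
    (image(["011", "111", "010"]), 1),
]

weights, bias = train(training)
predictions = [predict(weights, bias, pixels) for pixels, _ in unseen]
print(predictions)  # [1, 0, 0, 1]
```

The program is never told that the middle row matters. It works that out from the labels alone, and the weights it settles on are nonzero only for the three middle pixels.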

In mathematics there are many problems which are easy to solve in one direction but hard in the other. Multiplying several prime numbers is easy. The inverse, breaking a given number down into its prime factors, is much more difficult. This difficulty is the basis for RSA, one of the most widely used forms of modern cryptography.
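The asymmetry can be seen even at small scale. The sketch below, with function names of my own choosing, multiplies three primes in one call and then recovers them by trial division, which is the most naive factoring method and is used here only because it is easy to read.

```python
from math import prod


def multiply(primes):
    """Easy direction: multiply the primes together."""
    return prod(primes)


def factorize(n):
    """Hard direction: recover the prime factors by trial division."""
    factors = []
    divisor = 2
    while divisor * divisor <= n:
        # Divide out this divisor as many times as it goes in evenly.
        while n % divisor == 0:
            factors.append(divisor)
            n //= divisor
        divisor += 1
    if n > 1:  # whatever remains is itself prime
        factors.append(n)
    return factors


primes = [101, 103, 107]
product = multiply(primes)
print(product)             # 1113121
print(factorize(product))  # [101, 103, 107]
```

Multiplying stays cheap however large the primes are, while the work of undoing it by trial division grows rapidly with their size.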

The same asymmetry between an easy direction and a hard one appears in genealogy.

Genealogical information is of tremendous importance in identifying and describing a person. Given only limited information, such as a person’s name and overall physical characteristics, it is usually difficult to create an adequate ancestral tree.

A new technique called Recursive Exhaustion makes it possible to acquire vast amounts of social data without a person’s knowledge or permission.

Note: recursive exhaustion has the potential for good or evil.   The new algorithm will be implemented and used, but not too soon, I hope.  The key task lying ahead is making it available for research while keeping it out of the hands of malicious individuals.

In applying the recursive exhaustion algorithm to collect information about individuals, genealogical information can be very helpful.  In some posts on this site I will describe how this works.

It is also possible to work the problem backwards and uncover genealogical information from the other data about a person. It will be harder to recover that information than to use it, but probably not all that hard. It will almost certainly be done by some people. In other posts on this site I will describe methods for doing so.

The closest I can come to a solution to the problems posed by the very large scale collection of social data is for some trusted people to use the information to locate the malicious ones and stop them from using what they have.

Genealogists who carefully study original sources and communicate them among themselves could be an enormous help in this.
