Using Recursive Exhaustion

This is a renamed version of a recent front page.  It will soon be edited to make more sense as a correct but misleading post.

The former front page which was of much general interest is still available.  This page may seem much too specific, but there is a good reason for that.  What follows is not particularly technical and will be vitally important to future genealogists.


This website promotes mathematical genealogy. Many people have no idea there could be such a thing. More specifically, it promotes the use of one specific technique which can be used for mathematical genealogy: Recursive Exhaustion.

Note: recursive exhaustion has the potential for good or evil. The new algorithm will be implemented and used, but not too soon, I hope. The key task lying ahead is making it available for research while keeping it out of the hands of malicious individuals.

One role of this website is to point out the advantages of the new algorithm for genealogy. It also exists to help explain recursive exhaustion to those wanting to know more about how it works.

The first step of this mathematical approach is to encode various things we know about a person into a set of coordinates in a multidimensional vector space. That’s easier than it sounds.

It  is obvious that names can be expressed by numbers.  For example, my first name, Douglas, can be assigned the number 45 because it is the 45th most common male first name on some old US census.  That is a terrible way to do it, as I shall explain later, but it does express one property of my first name — frequency.  You could call it the frequency or popularity dimension.

Similarly, my middle name, Pardoe, could be encoded as the number 80085, since it is the 80085th most common surname on the same old US census. It is not found on any other name list from that census. My last name, Wilson, could be encoded by the number 8, since it was the 8th most common surname on that old census.

As far as I know, the combination of these three names is unique. I am the one person in the world whose name could be represented as 45, 80085, 8. It is a unique identifier. On the other hand, my brother’s names do not describe a unique individual. There is another person with exactly his three names in the same city in which we were raised.

To make a unique identifier for my brother it is necessary to add three numbers representing his birthdate: year, month and day.  To represent us both in the same vector space, my own birthdate would have to be added.

As I said, encoding names using lists from some old census is a terrible way to do it, but it illustrates a basic method.  A few basic facts about a person such as their name and birthdate can easily be translated into a sequence  of numbers, which we call a vector.  Of vital importance is the ability to invert this.  Given a sequence of numbers, it must be easy to decode them, reproducing the original information.

So the first step in mathematical genealogy is encoding a few basic facts which a person can easily read into a sequence of numbers.  The last step is the inverse, recovering that description from its numerical representation.  That could be just table-lookup, but there are better ways, discussed elsewhere.
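The table-lookup version of these two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `encode` and `decode` and the tiny rank tables are hypothetical, and the tables hold only the ranks quoted in this post (131 for Frederick comes from the grandfather example further down). As already noted, census ranks are a poor encoding. The point is only that the mapping can be inverted.

```python
from datetime import date

# Hypothetical miniature rank tables. A real table would cover a whole name list.
FIRST_NAME_RANK = {"Douglas": 45, "Frederick": 131}
SURNAME_RANK = {"Pardoe": 80085, "Wilson": 8}

# Inverse tables, built once, so that decoding is also a plain lookup.
RANK_TO_FIRST_NAME = {rank: name for name, rank in FIRST_NAME_RANK.items()}
RANK_TO_SURNAME = {rank: name for name, rank in SURNAME_RANK.items()}


def encode(first, middle, last, born=None):
    """Turn three names, and optionally a birthdate, into a tuple of numbers."""
    # The middle name is looked up in the surname list, as in the text.
    vector = (FIRST_NAME_RANK[first], SURNAME_RANK[middle], SURNAME_RANK[last])
    if born is not None:
        vector += (born.year, born.month, born.day)
    return vector


def decode(vector):
    """Invert encode: recover the names and, if present, the birthdate."""
    first, middle, last = vector[:3]
    names = (RANK_TO_FIRST_NAME[first], RANK_TO_SURNAME[middle], RANK_TO_SURNAME[last])
    born = date(*vector[3:6]) if len(vector) >= 6 else None
    return names + (born,)


me = encode("Douglas", "Pardoe", "Wilson")
grandfather = encode("Frederick", "Pardoe", "Wilson", date(1880, 10, 16))

# Decoding followed by encoding gives back the same vector.
assert encode(*decode(me)) == me
assert encode(*decode(grandfather)) == grandfather
```

A lookup for a name missing from the tables raises `KeyError` in this sketch. A real encoding would need some rule for names that appear on no list.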

What happens between these steps is the key to mathematical genealogy. Encoding and decoding are inverses, more easily done with linear algebra, but it may be necessary to use category theory and think of them as adjoint functors.

In general, sandwiching an important transformation between an operation and its inverse is the most powerful mathematical method I’ve ever encountered.

The meat in the sandwich discussed here is recursive exhaustion, the most powerful data collection and correction method I know.

In my website on recursive exhaustion I use the term exhaustion as it is used in the context of computer science.   But another meaning of the term is that of exhausting a space of possibilities.  That basically means doing it for everybody.  For mathematical genealogy using recursive exhaustion the important thing is to create a mathematical model like a sequence of numbers for every single person we can identify, past, present and even future — though extrapolating beyond even the best date is always risky:  she may have a miscarriage.

That meaning of exhaustion is entirely consistent with a major goal of genealogy in general. We do not want to simply create a mathematical model for existing people, we want to create one for people long dead. For example, the sequence of numbers 131, 80085, 8, 1880, 10, 16 could represent my grandfather, Frederick Pardoe Wilson, born on October 16, 1880. That is a unique descriptor for the man. One goal of genealogy is to produce mathematical descriptions not only for my grandfather but for all of my other relatives. I know some identifying and other information about a very few going back to before the Norman Conquest. Obviously I’ve had too much time on my hands.

One reason for collecting this information in mathematical form is that it will be easier to merge with that of other people.  The fact that I share ancestors six generations back with a fourth cousin made it much easier to establish the basic facts.   (Thanks, Ann.)   Collaboration will be much much easier for everyone when genealogical data is in mathematical form.
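Here is a rough sketch of why merging gets easier once people are represented as vectors. It assumes, purely for illustration, that each researcher keeps a dictionary keyed by the person's vector. The function name `merge_trees`, the notes, and every vector except my grandfather's are invented for the example.

```python
def merge_trees(tree_a, tree_b):
    """Combine two collections keyed by person vector, pooling the notes on each person."""
    merged = {}
    for tree in (tree_a, tree_b):
        for vector, notes in tree.items():
            merged.setdefault(vector, set()).update(notes)
    return merged


# Hypothetical data: two researchers who both hold a record for one ancestor.
my_tree = {
    (131, 80085, 8, 1880, 10, 16): {"grandfather"},
    (12, 400, 8, 1790, 3, 2): {"found in parish register"},
}
cousin_tree = {
    (12, 400, 8, 1790, 3, 2): {"named in a will"},
    (77, 400, 8, 1815, 7, 9): {"emigrated"},
}

# People both researchers already know about are the keys the two trees share.
shared = my_tree.keys() & cousin_tree.keys()
merged = merge_trees(my_tree, cousin_tree)

print(sorted(shared))
print(len(merged))
```

Matching on exact vectors only works if both researchers use the same encoding, which is one more reason the choice of encoding matters.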

Though rewarding for many reasons, encouraging collaboration is not the main reason for this approach. Methods such as recursive exhaustion can be used to extract an enormous amount of additional information about every other person who has ever existed. Much of this information would be of great value to those interested in their family history. I will explain this in detail in various posts to be added to this site. Meanwhile look at the website actually called Very Large Scale Social Data Collect, with the domain name.
