As part of my dissertation on privacy and technology, I’m looking into sterilization in the early twentieth century. Many of the patients (men and women) were vulnerable, many were institutionalized, and the extent of their consent is often debatable. Others sought sterilization and appreciated the procedure. Many had families who seem to have consented on their behalf.

The E. S. Gosney Papers and Records of the Human Betterment Foundation contain a number of archival records with information about these patients, especially those who were institutionalized. The Human Betterment Foundation (HBF) was particularly concerned with eugenics, a now-discredited scientific view that those with problematic genetics (as identified by the science of the time, and including mental disorders, limited intellectual capacity, epilepsy, and more) should be discouraged or prevented from reproducing.

Sorting through the records and identifying trends is time-consuming. One approach I’ve been experimenting with is taking the digital images I have of many of the records and looking for interrelationships among them. This kind of textual analysis first requires either manual retyping or optical character recognition (OCR). OCR is much easier and handles large volumes of material far more efficiently, so I’m trying out ABBYY’s Cloud OCR SDK to see how well it batch processes these materials.
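To make "looking for interrelationships" a little more concrete, here is a minimal sketch of one possible approach once the text has come out of OCR: scoring every pair of documents by how much vocabulary they share (Jaccard similarity). This is only an illustration under stated assumptions, not the method used in this project and not part of ABBYY's SDK. The function names (`tokenize`, `jaccard`, `rank_pairs`) are hypothetical, and the sample strings are invented placeholders rather than text from the actual records.

```python
import re
from itertools import combinations


def tokenize(text):
    """Return the set of lowercase alphabetic words of four or more letters."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def jaccard(a, b):
    """Shared words divided by total distinct words; 0.0 if both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_pairs(texts):
    """Score every pair of documents and return (score, name1, name2) tuples,
    most similar pair first.

    `texts` maps a document name to its OCR output (plain text).
    """
    tokens = {name: tokenize(body) for name, body in texts.items()}
    pairs = [
        (jaccard(tokens[a], tokens[b]), a, b)
        for a, b in combinations(sorted(tokens), 2)
    ]
    # Highest score first; ties broken by name so the order is stable.
    return sorted(pairs, key=lambda p: (-p[0], p[1], p[2]))


if __name__ == "__main__":
    # Invented placeholder strings standing in for OCR output.
    sample = {
        "doc_a": "annual report from the foundation",
        "doc_b": "foundation annual report draft",
        "doc_c": "hospital admission ledger",
    }
    for score, first, second in rank_pairs(sample):
        print(f"{first} / {second}: {score:.2f}")
```

A word-overlap score like this is crude, and OCR errors will lower it, but it gives a quick first pass at which records might be worth reading side by side.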