U's trove of census data stands alone in the world
- Article by: MARY JANE SMETANKA
- Star Tribune
- April 8, 2012 - 9:00 PM
When it comes to bragging rights for loads of information about lots of people, Facebook has nothing on the University of Minnesota.
The U's Minnesota Population Center is the world's largest repository of census data, with data on a staggering 859 million people. Social media giant Facebook has information on a mere 800 million folks.
The U's center is a paradise for data addicts, with 765 record sets from 65 countries that date back to the 1787 census of Denmark.
Those statistics were recently enriched by the addition of the 1940 U.S. census, a treasure trove of detail that makes statisticians drool.
"It's like fine wine -- older data is better than new data, because it's rarer," said Director Steven Ruggles, his eyes lighting up as he speculated on the research that could result.
The Population Center is working with the genealogy website Ancestry.com to make the 1940 census accessible to the public by digitizing records so they are searchable, but researchers will have to access family records on commercial sites rather than through the center.
To protect privacy, the center strips names from its research files, and records are converted into a code that requires training to use.
The center has 68,000 users, and research papers citing its data are churned out at the rate of three a day, Ruggles said. Some literally rewrite history.
One recent project that drew attention was by a Binghamton University professor who used records to recalculate the number of people who died in the Civil War. His work raised the death toll by about 20 percent, from 620,000 to 750,000.
Research based on the center's data has tracked the effects of hookworm eradication in the South on school attendance, educational level and income in later life. Other research showed how the introduction of the birth control pill resulted in women marrying later and staying in school.
Finding crumbling data
Ruggles, who is a Regents Professor of history at the U, has used the data to track changes in families.
In 1850, he said, 72 percent of Americans 65 and older lived with family members. By 1990, that figure had shrunk to 14 percent. Since 1880, divorces and separations in this country have increased tenfold.
The numbers show that the perception of Americans as an increasingly restless society is false -- American mobility peaked between 1850 and 1860 and recently hit an all-time low, Ruggles said.
The data also indicate that while in the 19th century Americans were three times more likely than Britons to move up the economic ladder, in the last 20 years England has been the less hierarchical society.
"What we are really focused on is giving people the tools for understanding the big structural changes in societies over the last 150 years," Ruggles said.
One Population Center employee spends about half the year going around the world cajoling officials in other countries into sharing sometimes crumbling census and survey data.
China, India, Canada and most of Europe and South America have shared information, while Russia and Australia are notable holdouts. Agreement with Nigeria was near when the government changed and negotiations stalled. Norway, Sweden and Finland have declined to share records for confidentiality reasons, Ruggles said.
In Sudan, the dust-covered and sometimes exposed computer tapes of the 1973 census were found stacked in crooked piles in a building with holes in the walls. The tapes were shipped to a specialist in New York, and about half the data was recovered.
The only computer tapes of the 1981 population survey of Bangladesh were spotted with mold. Special machinery was sent to Dhaka, the tapes were cleaned up and the data saved.
In 2007, Ruggles did some detective work himself, traveling to a refrigerated cave in Kansas used by the federal government for document storage to recover 1960 census data covering Chicago. That data had been missing since 1973, when it was overlooked as the government converted old records to new media formats.
Ruggles thawed the microfilm and scanned the data so it is again available to researchers.
Ruggles is passionate about saving whatever population data he can, saying it is integral to understanding how the world has changed.
"We're the only people who care about old data," he said. "The world has just been totally transformed, and you can't ... get a handle on it by just reading a few old diaries. You've got to have the numbers."
The center has a budget of about $11 million a year. Last fall, it received a five-year, $8 million National Science Foundation grant for a new project that will use climate records, environmental information and population records to study the interaction between climate and people.
The changing census
To Ruggles' dismay, some European countries are phasing out their censuses. He mourns the loss of detail in the U.S. census, calling the shortened main form "a postcard."
There's no such problem with the 1940 U.S. census. People were asked 34 questions, with 5 percent of respondents asked 15 more. That count, covering 132 million Americans, was the first to ask about educational attainment, migration, work status, wages and salary, hours worked and veteran status.
When it is fully digitized, the 1940 census will be the largest database of detailed information about people and their households ever available to researchers. Once it is linked to economic and health surveys and death records, the possibilities for study are nearly endless.
"It's the best census ever," Ruggles said with satisfaction.
Mary Jane Smetanka • 612-673-7380 Twitter: @smetan