A team of computer scientists has derived accurate, neighborhood-level estimates of the racial, economic and political characteristics of 200 U.S. cities using an unlikely data source — Google Street View images of people’s cars.

Published this week in the Proceedings of the National Academy of Sciences, the report details how the scientists extracted 50 million photographs of street scenes captured by Google’s Street View cars in 2013 and 2014. They then trained a computer algorithm to identify the make, model and year of the 22 million automobiles that appeared in those images, whether parked outside homes or driving down the street.

The vehicles seen in Street View images are often small or blurry, making precise identification a challenge. So the researchers had human experts identify a small subsample of the vehicles and compared those identifications with the results churned out by their algorithm. They found that the algorithm correctly identified whether a vehicle was U.S.- or foreign-made roughly 88 percent of the time, got the manufacturer right 66 percent of the time and nailed the exact model 52 percent of the time.
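To make those three accuracy figures concrete, here is a minimal sketch of how agreement between the algorithm's labels and the experts' labels could be scored at each level of detail. The `Vehicle` type, the `agreement_rates` function and the sample labels are hypothetical, invented for illustration; they are not the researchers' code or data. The sketch also assumes that an exact-model match counts only when the manufacturer matches too.

```python
from typing import NamedTuple


class Vehicle(NamedTuple):
    """A hypothetical car label: country of origin, manufacturer and exact model."""
    origin: str  # "US" or "foreign"
    make: str
    model: str


def agreement_rates(pairs: list[tuple[Vehicle, Vehicle]]) -> dict[str, float]:
    """Fraction of (algorithm, expert) label pairs that agree at each level of detail."""
    n = len(pairs)
    return {
        "origin": sum(a.origin == e.origin for a, e in pairs) / n,
        "make": sum(a.make == e.make for a, e in pairs) / n,
        # Assumption: an exact-model match also requires the make to match.
        "model": sum((a.make, a.model) == (e.make, e.model) for a, e in pairs) / n,
    }


# Hypothetical (algorithm, expert) labels, made up for illustration.
pairs = [
    (Vehicle("US", "Ford", "F-150"), Vehicle("US", "Ford", "F-150")),
    (Vehicle("foreign", "Honda", "Civic"), Vehicle("foreign", "Honda", "Accord")),
    (Vehicle("foreign", "Toyota", "Camry"), Vehicle("foreign", "Honda", "Accord")),
    (Vehicle("US", "Chevrolet", "Malibu"), Vehicle("foreign", "Toyota", "Camry")),
]
print(agreement_rates(pairs))  # {'origin': 0.75, 'make': 0.5, 'model': 0.25}
```

As in the study, the coarser the question, the higher the agreement: a wrong model can still have the right manufacturer, and a wrong manufacturer can still have the right country of origin.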

Those accuracy rates are far from perfect, but the sheer size of the vehicle database means they are still useful for real-world statistical applications, like drawing connections between vehicle preferences and demographic data. The 22 million vehicles in the database comprise roughly 8 percent of all vehicles in the United States. By comparison, the U.S. Census Bureau’s massive American Community Survey reaches only about 1.6 percent of American households each year, while the typical 1,000-person opinion poll includes just 0.0004 percent of American adults.
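The poll figure is easy to check with back-of-the-envelope arithmetic. This sketch assumes a round figure of about 250 million U.S. adults (an assumption, not a number from the study) and also backs out the total fleet size implied by calling 22 million vehicles roughly 8 percent of all U.S. vehicles.

```python
# Assumed round figure (not from the article): roughly 250 million U.S. adults.
US_ADULTS = 250_000_000
POLL_SIZE = 1_000


def percent_of(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


poll_pct = percent_of(POLL_SIZE, US_ADULTS)
print(f"{poll_pct:.4f} percent")  # 0.0004 percent

# Fleet size implied by "22 million vehicles is roughly 8 percent of the total".
implied_fleet = 22_000_000 / 0.08
print(f"{implied_fleet / 1e6:.0f} million vehicles")  # 275 million vehicles
```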

To test what this data set could do, the researchers first paired the ZIP code-level vehicle data with numbers on race, income and education from the American Community Survey. They did this for a random 15 percent of the ZIP codes in their data set to create a “training set.” They then built a second algorithm that combed through the training set to learn how vehicle characteristics correlated with neighborhood characteristics: What kinds of vehicles are disproportionately likely to appear in white neighborhoods, or black ones? Low-income vs. high-income? Highly educated areas vs. less-educated ones?

That yielded a number of reliable correlations. The five vehicle types most closely associated with white neighborhoods, for instance, were SUVs, cars made by Jeep and Subaru, expensive cars and cars classified as “wagons.” In black neighborhoods, on the other hand, Cadillacs, Buicks, Mercurys, Chryslers and sedan-type vehicles were more prevalent.

People with graduate degrees were more likely to drive Audi hatchbacks with high city MPG. Those with less than a high school education, on the other hand, were more likely to drive cars made by U.S. manufacturers in the 1990s.
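One simple way to surface associations like these is to compute, across the training ZIP codes, the correlation between each vehicle category's share of observed cars and a demographic share, then rank the categories. The sketch below does that with a hand-rolled Pearson correlation. The category names and every number in it are invented toy data, and ranking by simple correlation is a stand-in, not necessarily the method the researchers used.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented toy data for five hypothetical training ZIP codes: the share of
# white residents, and the share of observed vehicles in each category.
white_share = [0.9, 0.7, 0.5, 0.3, 0.1]
vehicle_shares = {
    "wagon": [0.08, 0.06, 0.05, 0.03, 0.02],
    "suv": [0.30, 0.28, 0.25, 0.22, 0.20],
    "sedan": [0.35, 0.40, 0.45, 0.50, 0.55],
    "pickup": [0.10, 0.12, 0.09, 0.11, 0.10],
}

# Rank categories from most positively to most negatively associated.
ranked = sorted(
    vehicle_shares,
    key=lambda cat: pearson(vehicle_shares[cat], white_share),
    reverse=True,
)
print(ranked)  # ['suv', 'wagon', 'pickup', 'sedan']
```

A high positive coefficient here says only that a category is more common where the demographic share is higher, which is exactly the kind of correlation, not causation, the next paragraph cautions about.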

One important thing to note is that these are just correlations. Saying that white people are more likely to drive Subaru wagons isn’t the same as saying all white people drive Subaru wagons, or that all Subaru wagons are driven by white people. But the data set showed that white people were more likely than black or Asian people to drive those cars.

Armed with all these correlations, the researchers put the algorithm to its true test: Could it accurately infer the demographics of the remaining 85 percent of ZIP codes, given only the car data?

Short answer: Yep. “We found a strong correlation between our results and ACS values for every demographic statistic we examined,” the researchers wrote.
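The overall test can be sketched end to end: fit a model on a random 15 percent of ZIP codes, predict the remaining 85 percent from vehicle data alone, and measure how strongly the predictions correlate with the true values, as the researchers did against ACS figures. In this minimal sketch the data are synthetic, there is a single made-up vehicle feature, and a one-variable least-squares line stands in for the researchers' actual model; only the 15/85 split and the correlation check come from the article.

```python
import math
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope a and intercept b for y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


rng = random.Random(42)
# Synthetic data: 200 hypothetical ZIP codes, each with one made-up vehicle
# feature and a noisy demographic value that depends on it linearly.
zips = []
for _ in range(200):
    feature = rng.uniform(0.0, 1.0)
    demographic = 0.6 * feature + 0.2 + rng.gauss(0.0, 0.05)
    zips.append((feature, demographic))

rng.shuffle(zips)
cut = round(0.15 * len(zips))          # 15 percent for training
train, test = zips[:cut], zips[cut:]   # remaining 85 percent held out

a, b = fit_line([f for f, _ in train], [d for _, d in train])
predicted = [a * f + b for f, _ in test]   # uses only the vehicle feature
actual = [d for _, d in test]
r = pearson(predicted, actual)
print(f"held-out correlation r = {r:.2f}")
```

The key design point is that the held-out ZIP codes never influence the fitted line, so a high correlation on them reflects genuine predictive power rather than memorization of the training set.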