COVID-19 arrived in Cambodia a year ago, on Jan. 23, when a Chinese national fell sick. A PCR test to detect the genetic material of SARS-CoV-2, the coronavirus that causes COVID-19, came back positive. With that news, the disease had officially pierced the borders of another nation.

Dr. Jessica Manning, 40, a researcher with the National Institute of Allergy and Infectious Diseases working in Cambodia, saw an opportunity: helping the country join the global effort to watch for new diseases.

Manning ran samples from the patient through a genetic sequencer, a device that reads the letters that make up an organism's genome; the sequencer was a recent addition to her lab at the Cambodian government's parasitology department in Phnom Penh. "I couldn't wait for the sequences to come off the sequencer," she recalled.

The sequencer uploaded the raw data to an online software package called IDseq, which could piece together the genomes in the sample and compare them with those of known organisms. The software confirmed that the sample held a virus whose genome was virtually identical to that of the new coronavirus identified in Wuhan, China.

To identify unknown pathogens, Manning's project — funded by a grant from the Bill and Melinda Gates Foundation — employs an approach called metagenomic sequencing. More traditional techniques of genomic diagnosis, like PCR tests, look for the genetic sequence of a single pathogen. Those tests are accurate, fast and relatively cheap — but they can find only a pathogen you know you are looking for.

Metagenomic sequencing reads all of the genomic material in a sample and identifies all of the organisms present: bacteria, common pathogens, even microbes that have never been spotted before. "Metagenomics can show what we don't know we don't know," Manning said, paraphrasing former Secretary of Defense Donald Rumsfeld.

But identifying unknown unknowns is complicated. Common sequencing machines chop up DNA and RNA molecules into segments, each with dozens to hundreds of genetic building blocks, and read the sequences of blocks in each one. This produces billions of short sequences with no information about how they originally were arranged.
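The loss of ordering described above can be illustrated with a toy simulation. This is only a sketch: the `genome` string is invented, the function name `shotgun_reads` and its parameters are hypothetical, and real sequencers produce far more reads, of varying length and with errors.

```python
import random

def shotgun_reads(genome, read_len, n_reads, seed=0):
    """Sample fixed-length reads from random positions in `genome`.

    The reads are shuffled before being returned, so the output carries
    no information about where each read sat in the original sequence.
    """
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    starts = [rng.randrange(last_start + 1) for _ in range(n_reads)]
    reads = [genome[s:s + read_len] for s in starts]
    rng.shuffle(reads)
    return reads

# Made-up 36-letter "genome", for illustration only.
genome = "ATGCGTACGTTAGCCTAGGCTAACGGATCCGTTAGC"
reads = shotgun_reads(genome, read_len=8, n_reads=12)

for r in reads:
    print(r)
```

Each printed line is a genuine fragment of the toy genome, but nothing in the output says which fragment came first.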

To make sense of all that data, Manning's lab uses IDseq, a free, open-source online software package that reverse-engineers how the segments might fit together to form any number of genomes and compares these with known genomes.

"It's like a giant jigsaw puzzle," said Joseph DeRisi, a biochemist and the lead developer of IDseq. "Where the edges of the pieces match, you can snap them together and assemble a picture of the genome."
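The jigsaw analogy can be sketched in a few lines of code. What follows is a simple greedy overlap merge, a textbook simplification and not a description of how IDseq works internally; the function names and the four toy reads are invented for illustration. It repeatedly joins the two fragments whose ends match over the longest stretch.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise, always taking the largest overlap first."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining pieces fit together
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Four made-up reads, given out of order.
pieces = ["GTTAGC", "ATGCGT", "TACGTT", "GCGTAC"]
contigs = greedy_assemble(pieces)
print(contigs)  # ['ATGCGTACGTTAGC']
```

Real genomes contain repeated stretches that can mislead a greedy approach like this one, which is one reason production assemblers rely on more elaborate methods.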

Watching for novel pathogens in Southeast Asia has become an important part of the global effort to understand the pandemic and stop the next one before it happens.

"The Cambodia-based project has really shown the value of metagenomic sequencing," said Dr. Farhad Imam, a genomics expert and a program officer at the Gates Foundation. "You can in effect set up an early detection network for the next outbreak. The faster we find out what it is, the faster we can build the tools to defeat it."