Chuck Bednar for redOrbit.com – Your Universe Online
A new computer system developed at the University of Wisconsin-Madison performed as well as or better than humans at extracting data from scientific journals and entering it into a database.
The new mechanical reading and indexing system, known as PaleoDeepDive, was designed by former UW professor of computer sciences Christopher Ré and his colleagues. Its development is detailed in the latest edition of the journal PLOS ONE.
“We demonstrated that the system was no worse than people on all the things we measured, and it was better in some categories,” Ré, who is now at Stanford University, said in a statement. First author Shanan Peters, a professor of geoscience at UW-Madison, added that the computer’s progress “marks a milestone in the quest to rapidly and precisely summarize, collate and index the vast output of scientists around the globe.”
The researchers built on the DeepDive machine reading system at Stanford and the HTCondor open-source distributed job management framework to create PaleoDeepDive. They then arranged a competition between it and human scientists who had manually entered data into the Paleobiology Database, a repository that contains research data from paleontology studies funded by the National Science Foundation (NSF) and international agencies.
According to the university, PaleoDeepDive “mimics the human activities” required to assemble the Paleobiology Database. Peters said that he and his associates “extracted the same data from the same documents and put it into the exact same structure as the human researchers, allowing us to rigorously evaluate the quality of our system, and the humans.”
Much of the knowledge produced by paleontologists is scattered across hundreds of thousands of publications, yet Peters said that many research questions require a “synthetic approach: For example, how many species were on the planet at any given time?” Rather than trying to find the so-called correct meaning, the researchers decided instead to “look at the entire problem of extraction as a probabilistic problem,” according to Ré.
Ré noted that computers often have difficulty deciphering even the simplest-sounding statements. To illustrate his point, he cited a study containing the terms “Tyrannosaurus rex” and “Alberta, Canada.” In a case like this, does Alberta refer to the location where the fossil was found, or where it was stored? The odds are roughly equal that either is true, and treating the question as one of probability gives PaleoDeepDive a major advantage over people.
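The idea of keeping both readings and attaching a probability to each can be shown with a small sketch. This is an illustration only, not PaleoDeepDive's actual method: the `Candidate` class, the relation names `found_in` and `stored_in`, and the evidence scores are all hypothetical and were invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible reading of a sentence (hypothetical structure)."""
    taxon: str
    place: str
    relation: str  # assumed labels: "found_in" or "stored_in"
    score: float   # non-negative evidence weight, made up for illustration


def to_probabilities(candidates):
    """Normalize scores so the competing readings sum to 1."""
    total = sum(c.score for c in candidates)
    if total <= 0:
        raise ValueError("at least one candidate needs a positive score")
    return {c.relation: c.score / total for c in candidates}


# With no further evidence, both readings are equally likely.
readings = [
    Candidate("Tyrannosaurus rex", "Alberta, Canada", "found_in", 1.0),
    Candidate("Tyrannosaurus rex", "Alberta, Canada", "stored_in", 1.0),
]
probs = to_probabilities(readings)
print(probs)  # {'found_in': 0.5, 'stored_in': 0.5}

# Assumed new evidence favors "found_in"; the estimate shifts without
# anyone having to re-read the original document.
updated = to_probabilities([
    Candidate("Tyrannosaurus rex", "Alberta, Canada", "found_in", 3.0),
    Candidate("Tyrannosaurus rex", "Alberta, Canada", "stored_in", 1.0),
])
print(updated)  # {'found_in': 0.75, 'stored_in': 0.25}
```

Neither reading is discarded in this sketch. Both stay in play with a stated degree of confidence, and that confidence can be revised as more information arrives.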
“Information that was manually entered into the Paleobiology Database by humans cannot be assessed or enhanced without going back to the library and re-examining original documents. Our machine system, on the other hand, can extend and improve results essentially on the fly as new information is added,” Peters said.
He added that the system's advantages could grow as the underlying computer tools improve. “As we get more feedback and data, it will do a better job across the board,” Peters explained.
The machine-reading trial required access to tens of thousands of articles. Despite the risk that the volume of downloads would create a logjam in document delivery, academic publishing company Elsevier gave the UW-Madison researchers access to 10,000 downloads per week. Thus far, the Paleobiology Database has generated hundreds of studies about the history of life, according to Peters.
“Ultimately, we hope to have the ability to create a computer system that can do almost immediately what many geologists and paleontologists try to do on a smaller scale over a lifetime: read a bunch of papers, arrange a bunch of facts, and relate them to one another in order to address big questions,” Peters added.
Man Vs Machine: Computerized Scientific Indexing System Outperforms Humans