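Summary: Brian Enquist and his colleagues were delighted with their newly compiled database on the distribution and traits of plants in the Americas. The data set contains 22.5 million records, but their delight turned to panic when they discovered that it included 611,728 names, nearly twice the number of plant species thought to exist on Earth.
Clearly the data set was full of errors. It contained many bogus names, which made it impossible to count the species in a particular area or their relative abundance.
Enquist is a plant ecologist at the University of Arizona in Tucson. This month his team will unveil a solution that could help botanists and ecologists around the world. The tool, called the Taxonomic Names Resolution Service (TNRS), finds and corrects incorrect plant names, sparing scientists this headache.
iPlant, a plant-science cyberinfrastructure project funded by the US National Science Foundation, provided financial and technical support for building the TNRS.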
Original English article (recommended by 生物探索):
Species spellchecker fixes plant glitches
Brian Enquist and his collaborators were delighted with their freshly compiled data set of 22.5 million records on the distribution and traits of plants in the Americas. But their delight turned to horror when they realized that the data set contained 611,728 names: getting on for twice as many as there are thought to be plant species on Earth.
Would it smell as sweet by any other name?
Completed in December 2010, the records were intended to help Enquist and his colleagues to discern trends in how forest trees in a wide variety of environments respond to climate change. But the data were clearly full of bogus names, making it impossible to count the species in a particular area, or their relative abundance. "I started to question our ability even to compare something as basic as species diversity at two sites," says Enquist, a plant ecologist at the University of Arizona in Tucson.
This month, Enquist's team will unveil a solution that could help botanists and ecologists worldwide. The Taxonomic Names Resolution Service (TNRS) aims to find and fix the incorrect plant names that plague scientists' records.
"It looks really good," says Gabriela Lopez-Gonzalez, a plant ecologist at the University of Leeds, UK, who curates a database of forest plots. Fixing species lists by hand is arduous, she says. "This should save us a lot of time."
She and others agree that the problem is widespread in botanical databases. "Digitization has made the problem worse," says TNRS co-leader, botanist Brad Boyle, also at the University of Arizona. Boyle explains that as more data are added to digital records, the chance of introducing errors also increases. Even in herbarium specimens, which ought to be the gold standard for plant identification, about 15% of the names are misspelt, he says.
Many of the errors seem to arise because biologists are not as careful as they should be when entering data into digital records. The TNRS team estimates that about one-third of the names entered into online repositories — such as GenBank, the US National Institutes of Health collection of DNA-sequence data, or the Ecological Society of America's VegBank database of plant-plot data — are incorrect.
The other problem is that names change. Old names can be abolished when experts reclassify plants as ideas about evolutionary relationships change, or when they realize the species already had a name — an occurrence almost as old as taxonomy itself. The result is that the same plant can have many names, and not everyone knows which one to use. Such synonyms are a particular problem in the study of medicinal plants, says Alan Paton, a plant taxonomist and bioinformatician at Kew Gardens in London.
The TNRS was built with financial and technical support from iPlant, a project run by the US National Science Foundation to fund cyberinfrastructure for plant science. It corrects names by comparing lists that users feed into it with the 1.2 million names in the Missouri Botanical Garden's Tropicos database, one of the most authoritative botanical databases. If the TNRS cannot find a name in Tropicos, it uses a fuzzy-matching algorithm, similar to a word-processor's spellchecker, to find and correct misspellings. It also hunts through Tropicos's lists of alternative names and supplies the one that is most up to date. When Enquist ran the 611,728 names through the system, just 202,252 came back, showing that two-thirds of them were invalid.
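The resolution steps described above can be sketched in a few lines of Python: an exact lookup against a reference list, a spellchecker-style fuzzy fallback, and then replacing an outdated synonym with the currently accepted name. This is a minimal illustration under stated assumptions, not the TNRS's actual code. The names in `ACCEPTED` and `SYNONYMS` are invented placeholders standing in for Tropicos, the function `resolve` is hypothetical, and Python's standard-library `difflib.get_close_matches` is swapped in for whatever fuzzy-matching algorithm the TNRS really uses. The 0.85 similarity cutoff is an arbitrary choice.

```python
import difflib

# Hypothetical reference data standing in for an authoritative list such as
# Tropicos. These plant names are invented for illustration only.
ACCEPTED = {"Exemplia alba", "Exemplia rubra", "Sampleus viridis"}
# Hypothetical synonym table: outdated name -> currently accepted name.
SYNONYMS = {"Oldgenus alba": "Exemplia alba"}
KNOWN = ACCEPTED | set(SYNONYMS)


def resolve(name, cutoff=0.85):
    """Return the accepted name for `name`, or None if nothing matches."""
    cleaned = " ".join(name.split())  # collapse stray whitespace
    if cleaned not in KNOWN:
        # Fuzzy step: take the single closest known name, like a spellchecker.
        matches = difflib.get_close_matches(cleaned, sorted(KNOWN), n=1, cutoff=cutoff)
        if not matches:
            return None
        cleaned = matches[0]
    # Synonym step: swap an outdated name for the accepted one.
    return SYNONYMS.get(cleaned, cleaned)


# Usage: a misspelling and a synonym both collapse onto one accepted name.
raw = ["Exemplia alba", "Exemplia albba", "Oldgenus alba", "Exemplia rubra"]
resolved = {resolve(n) for n in raw} - {None}
print(len(raw), len(resolved))  # prints "4 2"
```

The usage lines show how misspellings and synonyms shrink a raw list into fewer accepted names, the same effect seen in Enquist's data. The reported figures are consistent with the article's summary: 202,252 / 611,728 ≈ 0.33, so roughly 67% of the submitted names did not survive resolution.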
Because Tropicos is less comprehensive for plants outside the Americas, the team hopes to link the TNRS with The Plant List (www.theplantlist.org), a collaborative compilation of databases from Kew and other sources. Launched online in December 2010, it aims to become a global record of plants. The scientists are also working on a tool to correct geographical data — one that knows, for example, that Brazil, Brasil and Brésil are the same place, and can recognize when someone has muddled up longitude and latitude.
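The two geographic checks mentioned above, treating spelling variants of a country as one place and spotting swapped coordinates, might look roughly like this. It is a sketch under assumptions, not the team's tool: `COUNTRY_ALIASES`, `normalise_country` and `fix_coordinates` are hypothetical names, and the alias table covers only the Brazil example from the article. The swap check relies solely on valid ranges (latitude within ±90°, longitude within ±180°), so it catches a swap only when the value entered as latitude exceeds 90° in magnitude. A swap with both values inside ±90° would need extra information, such as checking the point against the recorded country.

```python
import unicodedata

# Hypothetical alias table; a real tool would need a full gazetteer.
# Keys are lower-case with accents removed.
COUNTRY_ALIASES = {"brazil": "Brazil", "brasil": "Brazil", "bresil": "Brazil"}


def normalise_country(raw):
    """Map a spelling variant of a country name to one canonical form, or None."""
    # Decompose accented characters (e.g. "é" -> "e" + combining accent),
    # then drop the combining marks and ignore case.
    decomposed = unicodedata.normalize("NFKD", raw.strip())
    key = "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
    return COUNTRY_ALIASES.get(key)


def fix_coordinates(lat, lon):
    """Return (lat, lon), swapping them when only the 'latitude' is out of range."""
    if abs(lat) > 90 and abs(lon) <= 90:
        lat, lon = lon, lat  # probably entered the wrong way round
    if abs(lat) > 90 or abs(lon) > 180:
        raise ValueError(f"coordinates out of range: {lat}, {lon}")
    return lat, lon


# Usage: Mexico City (about 19.4 N, 99.1 W) entered with the values reversed.
print(normalise_country("Brésil"), fix_coordinates(-99.1, 19.4))
```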