October 27, 2017

What is taxize?

taxize allows users to search over many taxonomic data sources for species names (scientific and common) and download taxonomic hierarchical information.

In brief, taxize is an R package developed to clean and manage taxonomic information.

Install taxize with

install.packages("taxize")  # or, for the development version:
devtools::install_github("ropensci/taxize")

A tutorial is available on the rOpenSci website.

Why use taxize in a meta-analysis context?

If your meta-analyses involve working on organisms with several collaborators:

Two situations could easily happen…

Situation 1 - Names misspelled by collaborators

plants <- c("Poa annua", "Poa anua", " Poa annua")
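
To see why this matters, it helps to count distinct names before and after basic cleaning. The sketch below uses only base R (no taxonomic service). The edit-distance threshold of 1 is an assumption chosen for this illustration, not a recommended setting.

```r
plants <- c("Poa annua", "Poa anua", " Poa annua")

# Naive count: three "different" species, although only one is intended
n_raw <- length(unique(plants))

# Trimming whitespace removes one spurious variant
cleaned <- trimws(plants)
n_trimmed <- length(unique(cleaned))

# Generalized edit distance between the remaining unique spellings
spellings <- unique(cleaned)
d <- adist(spellings)
dimnames(d) <- list(spellings, spellings)

# Flag pairs of spellings that differ by a single edit (assumed threshold)
suspects <- which(d == 1 & upper.tri(d), arr.ind = TRUE)
```

Whitespace trimming alone still leaves two spellings. The remaining pair is only flagged as suspicious, which is why a reference taxonomy is needed to decide which spelling is correct.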

Situation 2 - Inconsistent taxonomy across papers: species taxonomy changes over time…

Monterey cypress has successively been called Callitropsis macrocarpa, Cupressus macrocarpa, Cupressus hartwegii, Neocupressus macrocarpa, and then Callitropsis macrocarpa again.

Why use taxize?

Taking advantage of the existing taxonomic backbones…

  • Encyclopedia of Life
  • Taxonomic Name Resolution Service
  • Integrated Taxonomic Information System
  • Global Names Resolver
  • IUCN Red List
  • CANADENSYS Vascan name search API
  • etc.

Total: 21 sources implemented

Case study 1: Misspelled names

For this case, we use the Global Names Resolver service:

temp <- taxize::gnr_resolve(names = c("Poa annua", "Poa anua", "  Poa annua"),
                            data_source_ids=c(3))
head(temp)
R>>    user_supplied_name submitted_name matched_name data_source_title score
R>>  1          Poa annua      Poa annua Poa annua L.              ITIS 0.988
R>>  2           Poa anua       Poa anua       Poa L.              ITIS 0.750
R>>  3          Poa annua      poa annua Poa annua L.              ITIS 0.988
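
The score column can be used to decide which names need a manual check. The sketch below rebuilds the table printed above as a plain data frame, so it runs without a network call, and flags low-scoring rows. The 0.9 cut-off is an assumption made for this example and is not a value recommended by taxize.

```r
# Stand-in for the gnr_resolve() result, with values copied from the output above
temp <- data.frame(
  user_supplied_name = c("Poa annua", "Poa anua", "Poa annua"),
  submitted_name     = c("Poa annua", "Poa anua", "poa annua"),
  matched_name       = c("Poa annua L.", "Poa L.", "Poa annua L."),
  data_source_title  = "ITIS",
  score              = c(0.988, 0.750, 0.988),
  stringsAsFactors   = FALSE
)

# Assumed cut-off, chosen only for this illustration
threshold <- 0.9

# Rows below the cut-off are candidates for manual review
temp$needs_review <- temp$score < threshold
to_check <- temp$user_supplied_name[temp$needs_review]
```

Here only "Poa anua" is flagged: it was matched at the genus level (Poa L.) and not to a species, which is reflected in its lower score.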

Fuzzy matching can be done against several taxonomy providers. List them with:

taxize::gnr_datasources()
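
Once the provider table is available, the id to pass to data_source_ids can be looked up by title. The helper below, find_source_id(), is a hypothetical name introduced for this sketch. It assumes the table has id and title columns, and it is demonstrated on a small hand-typed stand-in so that it runs offline.

```r
# Hand-typed stand-in for the table returned by taxize::gnr_datasources(),
# assumed to have at least `id` and `title` columns. Only the ITIS row (id 3)
# is confirmed by the gnr_resolve() call shown earlier; check the other ids
# against the live table before relying on them.
sources <- data.frame(
  id    = c(1, 3, 4),
  title = c("Catalogue of Life", "ITIS", "NCBI"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the id(s) of providers whose title matches a pattern
find_source_id <- function(sources, pattern) {
  hits <- grepl(pattern, sources$title, ignore.case = TRUE)
  sources$id[hits]
}

itis_id <- find_source_id(sources, "itis")
```

With a live connection, the same helper could be called as find_source_id(taxize::gnr_datasources(), "itis"), provided the returned table has the assumed columns.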

Case study 2: Inconsistent taxonomic information

Let us go back to our Monterey cypress example, with the following scenario:

  • We find one publication from 1987 that uses Cupressus macrocarpa
  • And another from 1990 that uses Callitropsis macrocarpa

I want to harmonize them under the currently accepted species name for Monterey cypress.

I'll use ITIS to perform this task.

  • First step: Get the taxonomic serial number (TSN) for each species name.
  • Second step: Check whether these TSNs are valid and, if not, get the valid TSN along with the accepted name.

  • First step: Get the taxonomic serial number (TSN) for each species name.
mysps <- c("Callitropsis macrocarpa", "Cupressus macrocarpa")
(tsn <- taxize::get_tsn(mysps, accepted = FALSE, verbose=FALSE))
R>>  [1] "822598" "183480"
R>>  attr(,"match")
R>>  [1] "found" "found"
R>>  attr(,"multiple_matches")
R>>  [1] FALSE FALSE
R>>  attr(,"pattern_match")
R>>  [1] FALSE FALSE
R>>  attr(,"uri")
R>>  [1] "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=822598"
R>>  [2] "http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=183480"
R>>  attr(,"class")
R>>  [1] "tsn"
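
The match attribute is a convenient way to spot names that ITIS could not resolve. The sketch below rebuilds the object printed above by hand, so it runs without taxize or a network connection, and pairs each name with its TSN and match status. The lookup table layout is an assumption made for illustration.

```r
mysps <- c("Callitropsis macrocarpa", "Cupressus macrocarpa")

# Stand-in for the get_tsn() result, with values copied from the output above
tsn <- structure(c("822598", "183480"),
                 match = c("found", "found"),
                 class = "tsn")

# Pair each submitted name with its TSN and whether a match was found
lookup <- data.frame(
  name  = mysps,
  tsn   = as.character(unclass(tsn)),
  found = attr(tsn, "match") == "found",
  stringsAsFactors = FALSE
)

# Names that would need a manual search (none in this example)
not_found <- lookup$name[!lookup$found]
```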

  • Second step: Check whether these TSNs are valid and, if not, get the valid TSN along with the accepted name.
library(taxize)
(accept_tsn <- lapply(tsn, itis_acceptname))
R>>  [[1]]
R>>    submittedtsn acceptedname acceptedtsn author
R>>  1       822598           NA      822598     NA
R>>  
R>>  [[2]]
R>>    submittedtsn            acceptedname acceptedtsn               author
R>>  1       183480 Callitropsis macrocarpa      822598 (Hartw.) D.P. Little
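
The two results can then be combined into one table with a single harmonized name per record. The sketch below works on a hand-built copy of the list printed above. It assumes, as that output suggests, that an NA in acceptedname means the submitted name is already the accepted one. The submittedname and harmonized columns are names introduced for this example.

```r
mysps <- c("Callitropsis macrocarpa", "Cupressus macrocarpa")

# Stand-in for lapply(tsn, itis_acceptname), with values copied from the output above
accept_tsn <- list(
  data.frame(submittedtsn = "822598", acceptedname = NA_character_,
             acceptedtsn = "822598", author = NA_character_,
             stringsAsFactors = FALSE),
  data.frame(submittedtsn = "183480", acceptedname = "Callitropsis macrocarpa",
             acceptedtsn = "822598", author = "(Hartw.) D.P. Little",
             stringsAsFactors = FALSE)
)

# Stack the per-species results and attach the names that were submitted
accepted <- do.call(rbind, accept_tsn)
accepted$submittedname <- mysps

# Assumption: NA in acceptedname means the submitted name is already accepted
accepted$harmonized <- ifelse(is.na(accepted$acceptedname),
                              accepted$submittedname,
                              accepted$acceptedname)
```

Both publications now point to the same name, Callitropsis macrocarpa, and the same accepted TSN.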

Common name

Retrieve common names from a scientific name

sci2comm('Cupressus macrocarpa', db = 'itis')
R>>  
R>>  Retrieving data for taxon 'Cupressus macrocarpa'
R>>  $`Cupressus macrocarpa`
R>>  [1] "ciprés Monterrey" "Monterey cypress"

Classification

Retrieve higher taxonomic information

classification(822598, db = 'itis')
R>>  $`822598`
R>>                        name          rank     id
R>>  1                  Plantae       kingdom 202422
R>>  2            Viridiplantae    subkingdom 954898
R>>  3             Streptophyta  infrakingdom 846494
R>>  4              Embryophyta superdivision 954900
R>>  5             Tracheophyta      division 846496
R>>  6          Spermatophytina   subdivision 846504
R>>  7                Pinopsida         class 500009
R>>  8                  Pinidae      subclass 954916
R>>  9                  Pinales         order 500028
R>>  10            Cupressaceae        family  18042
R>>  11            Callitropsis         genus 822533
R>>  12 Callitropsis macrocarpa       species 822598
R>>  
R>>  attr(,"class")
R>>  [1] "classification"
R>>  attr(,"db")
R>>  [1] "itis"
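
For a meta-analysis, a single rank such as the family is often all that is needed from this table. The sketch below copies the hierarchy printed above into a data frame and extracts one rank from it. get_rank() is a hypothetical helper written for this example and is not part of taxize.

```r
# Stand-in for classification(822598, db = 'itis')[[1]], copied from the output above
cls <- data.frame(
  name = c("Plantae", "Viridiplantae", "Streptophyta", "Embryophyta",
           "Tracheophyta", "Spermatophytina", "Pinopsida", "Pinidae",
           "Pinales", "Cupressaceae", "Callitropsis", "Callitropsis macrocarpa"),
  rank = c("kingdom", "subkingdom", "infrakingdom", "superdivision",
           "division", "subdivision", "class", "subclass",
           "order", "family", "genus", "species"),
  id   = c("202422", "954898", "846494", "954900", "846496", "846504",
           "500009", "954916", "500028", "18042", "822533", "822598"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: name at a given rank, or NA if the rank is absent
get_rank <- function(cls, rank) {
  hit <- cls$name[cls$rank == rank]
  if (length(hit) == 0) NA_character_ else hit[1]
}

family <- get_rank(cls, "family")
genus  <- get_rank(cls, "genus")
```

Returning NA for a missing rank keeps the helper safe to use with sapply() over a list of classifications in which some taxa lack intermediate ranks.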

Limitations

  1. Some species might be absent from databases.
  2. Some taxonomic groups are under-represented (e.g. nonvascular plants, some insect families, etc.).

Documentation

Exercise

sp <- c("your_favorite_sp1","your_favorite_sp2","..")
  1. Using the taxize package, clean up this vector using the gnr_resolve() function and ITIS as the taxonomic provider.
  2. Find the TSN of your favorite species.
  3. Change the provider (data_source_ids argument) to Tropicos and see what happens.
  4. Try getting higher taxonomic information for your own species list.