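First column: the top-level category of the taxonomy. Each letter stands for a different category (Archaea, Bacteria, Eukaryota, Viruses and Viroids, Unclassified, Other):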
A = Archaea
B = Bacteria
E = Eukaryota
V = Viruses and Viroids
U = Unclassified
O = Other
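A minimal Python sketch of decoding these category letters. It assumes a tab-separated file (such as categories.dmp from the taxcat archive) whose three columns are the category letter, a species-level taxid and the node's own taxid. Only the first column is described above, so the meaning of the other two columns is an assumption, and parse_categories is a hypothetical helper name.

```python
# Top-level category codes, as listed above.
CATEGORY_NAMES = {
    "A": "Archaea",
    "B": "Bacteria",
    "E": "Eukaryota",
    "V": "Viruses and Viroids",
    "U": "Unclassified",
    "O": "Other",
}


def parse_categories(lines):
    """Yield (category_name, species_taxid, taxid) tuples.

    Assumption: each line is tab-separated with three columns
    (category letter, species-level taxid, the node's own taxid).
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        code, species_taxid, taxid = line.split("\t")
        yield CATEGORY_NAMES[code], int(species_taxid), int(taxid)
```

Pass it an open file object (or any iterable of lines); an unknown category letter raises KeyError, which makes unexpected input easy to spot.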
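The taxonomy dump is likewise provided as compressed archives in several formats. After extracting it with gunzip -c taxdump.tar.gz | tar xf -, you get 7 files: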
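The same extraction can be done from Python with the standard tarfile module. This is a minimal sketch: extract_taxdump is a hypothetical helper, and the data_filter check only opts into the safer "data" extraction filter on Python versions that provide it.

```python
import tarfile
from pathlib import Path


def extract_taxdump(archive="taxdump.tar.gz", dest="."):
    """Extract a gzip-compressed tar archive into dest; return the member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, links outside dest, etc.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names
```

Calling extract_taxdump("taxdump.tar.gz", "taxdump") would unpack the archive into a taxdump/ directory and return the list of extracted file names.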
citations.dmp: citation (literature) information associated with taxa (identified by taxid):
cit_id: the unique id of citation
cit_key: citation key
medline_id: unique id in MedLine database (0 if not in MedLine)
pubmed_id: unique id in PubMed database (0 if not in PubMed)
url: URL associated with citation
text: any text (usually article name and authors). The following characters are escaped in this text by a backslash:
  newline (appears as "\n"),
  tab character ("\t"),
  double quotes ('\"'),
  backslash character ("\\").
taxid_list: list of node ids separated by a single space
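A sketch of reading one citations.dmp row in Python. It assumes the usual .dmp layout, in which fields are separated by "\t|\t" and each row ends in "\t|", and it assumes the columns appear in the order listed above. unescape_text reverses the four backslash escapes. All helper names are hypothetical.

```python
# Assumed column order, following the field list above.
CITATION_FIELDS = ["cit_id", "cit_key", "medline_id", "pubmed_id",
                   "url", "text", "taxid_list"]

# Escape letter -> character it stands for.
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unescape_text(s):
    """Undo the backslash escaping used in the citation text field."""
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "\\" and i + 1 < len(s) and s[i + 1] in _ESCAPES:
            out.append(_ESCAPES[s[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def split_dmp_line(line):
    """Split one .dmp row (fields joined by '\\t|\\t', ending in '\\t|')."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def parse_citation(line):
    """Turn one citations.dmp row into a dict with typed values."""
    rec = dict(zip(CITATION_FIELDS, split_dmp_line(line)))
    for key in ("cit_id", "medline_id", "pubmed_id"):
        rec[key] = int(rec[key])  # 0 means "not in that database"
    rec["text"] = unescape_text(rec["text"])
    rec["taxid_list"] = [int(t) for t in rec["taxid_list"].split()]
    return rec
```

Because literal tabs and newlines inside text are escaped, splitting on the field separator before unescaping is safe.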
names.dmp: stores the names associated with each taxid
tax_id: the id of node associated with this name
name_txt: name itself
unique name: the unique variant of this name if name not unique
name class: the type of name (synonym, common name, …)
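A minimal sketch of building a taxid-to-name lookup from names.dmp. It assumes the same "\t|\t" field separator as the other .dmp files and that the preferred name of each node carries the name class "scientific name"; load_names and split_dmp_line are hypothetical helpers.

```python
from collections import defaultdict


def split_dmp_line(line):
    """Split one .dmp row (fields joined by '\\t|\\t', ending in '\\t|')."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_names(lines):
    """Return ({taxid: scientific name}, {taxid: [(name_txt, name_class), ...]})."""
    scientific = {}
    all_names = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        tax_id, name_txt, _unique_name, name_class = split_dmp_line(line)[:4]
        tax_id = int(tax_id)
        all_names[tax_id].append((name_txt, name_class))
        if name_class == "scientific name":
            scientific[tax_id] = name_txt
    return scientific, dict(all_names)
```

Keeping every (name, class) pair alongside the scientific-name map lets you look up synonyms or common names later without re-reading the file.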
nodes.dmp: stores the hierarchical (parent/child) node information for each taxid
tax_id: node id in GenBank taxonomy database
parent tax_id: parent node id in GenBank taxonomy database
rank: rank of this node (superkingdom, kingdom, …)
embl code: locus-name prefix; not unique
division id: see division.dmp file
inherited div flag (1 or 0): 1 if node inherits division from parent
genetic code id: see gencode.dmp file
inherited GC flag (1 or 0): 1 if node inherits genetic code from parent
mitochondrial genetic code id: see gencode.dmp file
inherited MGC flag (1 or 0): 1 if node inherits mitochondrial gencode from parent
GenBank hidden flag (1 or 0): 1 if name is suppressed in GenBank entry
hidden subtree root flag (1 or 0): 1 if this subtree has no sequence data yet
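Since every node stores its parent tax_id, a full lineage can be recovered by following parent pointers up to the root. The sketch below assumes the "\t|\t" field layout, reads only the first three columns (tax_id, parent tax_id, rank), and stops when a node is its own parent, as the root node is in the NCBI dump. load_nodes, lineage and the toy taxids in the usage note are hypothetical.

```python
def split_dmp_line(line):
    """Split one .dmp row (fields joined by '\\t|\\t', ending in '\\t|')."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def load_nodes(lines):
    """Return {tax_id: (parent_tax_id, rank)} from nodes.dmp rows."""
    nodes = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = split_dmp_line(line)
        tax_id, parent_id, rank = int(fields[0]), int(fields[1]), fields[2]
        nodes[tax_id] = (parent_id, rank)
    return nodes


def lineage(tax_id, nodes):
    """Walk parent pointers from tax_id to the root; return [(tax_id, rank), ...]."""
    path = []
    seen = set()  # guards against cycles in malformed input
    while tax_id not in seen:
        seen.add(tax_id)
        parent_id, rank = nodes[tax_id]
        path.append((tax_id, rank))
        if parent_id == tax_id:  # the root node is its own parent
            break
        tax_id = parent_id
    return path
```

With toy rows for nodes 1 (root), 10 (superkingdom, parent 1) and 20 (phylum, parent 10), lineage(20, nodes) walks 20, 10, 1. Combined with a taxid-to-name map built from names.dmp, each step can be printed as a readable name.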