首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >自定义BibTeX样式文件以在ACS样式中实现专利

自定义BibTeX样式文件以在ACS样式中实现专利
EN

Code Review用户
提问于 2020-06-08 20:09:23
回答 2查看 190关注 0票数 6

Intro

我一直在努力实现achemso.bst文件的修改版本,该文件来自美国化学学会的achemso套餐,该文件正确地实现了“美国化学学会风格指南”定义的专利。

这是我第一次以一种有意义的方式处理BibTeX风格的文件,尽管我目前对结果很满意--我真的很想让人看看最后的代码,检查一下是否有明显的错误或愚蠢的错误。

主控制功能

一般来说,我希望检讨的更改最好由以下定义显示。

代码语言:javascript
复制
FUNCTION { patent } {
  begin.bibitem
  format.authors
  format.assignees
  format.title
  format.type.number.patent  
  format.date
  format.CAS  
  format.note
  end.bibitem
}

也就是说,BST文件添加了处理专利的定义。为此,我添加了函数format.assignees、format.date、format.type.number.patent和format.CAS。所有其他格式。函数如原始achemso.bst文件中定义的那样。

此外,如果存在url,则修改begin.bibitemend.bibitem以将整个条目包装在\href{}{}包装中。

添加/修改函数

添加/修改函数的All,按照实现中的不确定性顺序排列。

format.CAS函数

该函数在bibtex条目的末尾构造化学文摘引用线。它检查CAS_CANCAS_AN是否都存在。然后,通过从左到右添加适当的元素来构造输出字符串。

代码语言:javascript
复制
FUNCTION { format.CAS } {
  CAS_CAN duplicate$ empty$ not
    { 
      CAS_AN  duplicate$ empty$ not
        {
          "Chem. Abstr. " emph swap$ 
          "AN " "" find.replace 
            #1 #4 substring$
            add.comma bold
            *
          swap$
          "CAN " "" find.replace 
            ":" ",} " find.replace 
            "\emph{" swap$ *
            *
          output
        }
        { pop$ }
      if$
    }
    { pop$ }
  if$
}

修改了begin.bibitemend.bibitem函数

本节修改begin.bibitemend.bibitem定义,以包含\href{}{}包装器。

代码语言:javascript
复制
INTEGERS { href.present.bool }

FUNCTION { begin.bibitem } {
  newline$
  "\bibitem" write$
  label
  calculate.full.names
  duplicate$
  short.names =
    { pop$ }
    { * }
  if$ 
  "[" swap$ * "]" * write$
  "{" cite$ * "}" * write$
  newline$
  url empty$ not
  'href.present.bool := 
  href.present.bool
    {"\href{" url * "}{" * write$}
    { }
  if$
  ""
  next.punct.comma 'next.punct.int :=
}

FUNCTION { end.bibitem } {
  would.add.period
    {
      href.present.bool
        { "}" * }
        { }
      if$
      "\relax" * write$
      newline$
      "\mciteBstWouldAddEndPuncttrue" write$
      newline$
      "\mciteSetBstMidEndSepPunct{\mcitedefaultmidpunct}" write$
      newline$
      "{\mcitedefaultendpunct}{\mcitedefaultseppunct}\relax"
    }
    {
      href.present.bool
        { "}" * }
        { }
      if$
      "\relax" * write$
      newline$
      "\mciteBstWouldAddEndPunctfalse" write$
      newline$
      "\mciteSetBstMidEndSepPunct{\mcitedefaultmidpunct}" write$
      newline$
      "{}{\mcitedefaultseppunct}\relax"
    }
  if$
  write$
  newline$
  "\EndOfBibitem" write$
}

format.assignees函数

该函数将受让人值推送到堆栈,检查是否为空,如果不是,则格式化它并输出。格式设置包括将AND更改为;,并确保该值被包装在父文件中。

代码语言:javascript
复制
FUNCTION { format.assignees } {
  assignee duplicate$ empty$
    { pop$ }
    { " AND" ";" find.replace.upper
      duplicate$
      duplicate$
      #-1 #1 substring$
      ")" =
      #1 #1 substring$
      "(" =
      and
        { paren }
        {  }
      if$
      output
      next.punct.period 'next.punct.int :=    
    }
  if$   
}

format.type.number.patent函数

此函数及其助手函数格式化专利类型和数字,并输出它们。

代码语言:javascript
复制
FUNCTION { format.type.patent } {
  type empty$
    { "Patent" }{ type }
  if$ output
}

FUNCTION { format.type.number.patent } {
   number empty$
    { }
    {
      format.type.patent " " number * * output
      next.punct.comma 'next.punct.int :=
    }
  if$
}

帮助函数

这些是帮助函数,是上述一些函数正确工作所必需的。我认为它们对这个用例来说是足够的,但是任何改进都是值得赞赏的。

代码语言:javascript
复制
% From           : https://tex.stackexchange.com/a/28104
% Originally from: http://ctan.org/pkg/tamethebeast
INTEGERS{ l }
FUNCTION{ string.length }
{ #1 'l :=
    { duplicate$ duplicate$ #1 l substring$ = not }
    { l #1 + 'l := }
  while$
  pop$ l
}

STRINGS{replace find text}
INTEGERS{find_length}
FUNCTION{find.replace}
{ 'replace :=
  'find :=
  'text :=
  find string.length 'find_length :=
  ""
    { text empty$ not }
    { text #1 find_length substring$ find =
        { replace *
          text #1 find_length + global.max$ substring$ 'text :=
        }
        { text #1 #1 substring$ *
          text #2 global.max$ substring$ 'text :=
        }
      if$
    }
  while$
}

% new code
FUNCTION{find.replace.upper}
{ 'replace :=
  'find :=
  'text :=
  find string.length 'find_length :=
  ""
    { text empty$ not }
    { text #1 find_length substring$ "u" change.case$ find =
        { replace *
          text #1 find_length + global.max$ substring$ 'text :=
        }
        { text #1 #1 substring$ *
          text #2 global.max$ substring$ 'text :=
        }
      if$
    }
  while$
}

FUNCTION{find.replace.ignorecase}
{ swap$
  "u" change.case$
  swap$
  find.replace.upper
}

全BibTeX样式定义

以上所有代码都保存在一个BST文件中,如

代码语言:javascript
复制
% Notes: this is an editied version of the ``achemso.bst'' file, used to develop the patent entry type based off of the specification of the ACS Style Guide (3rd ed. pp 310-311)

ENTRY
{
    abstract
    address
    assignee
    author
    booktitle
    CODEN
    CAS_AN
    CAS_CAN
    chapter
    ctrl-article-title
    ctrl-chapter-title
    ctrl-doi
    ctrl-etal-firstonly
    ctrl-etal-number
    ctrl-use-title
    date
    day
    doi
    edition
    editor
    howpublished
    institution
    journal
    key
    maintitle
    month
    note
    number
    organization
    pages
    publisher
    school
    series
    title
    type
    url
    version
    volume
    year
}
{ }
{
    label
    short.names
}

% Generic logic functions, from the core BibTeX styles

FUNCTION { and } {
    { }
    {
    pop$
    #0
    }
if$ 
}

FUNCTION { not } {
    { #0 }
    { #1 }
if$ 
}

FUNCTION { or } {
    {
    pop$
    #1
    }
    { }
if$ 
}

FUNCTION { xor } {
    { not }
    { }
if$ 
}

% Generic functions for basic manipulations: many of these
% come from the core BibTeX styles or 'Tame the BeaST'

FUNCTION { chr.to.value.error } {
#48 +
int.to.chr$
"'" swap$ *
"'" *
" is not a number: treated as zero." *
warning$
#0
}

FUNCTION { chr.to.value } {
chr.to.int$ #48 -
duplicate$
#0 <
    { chr.to.value.error }
    {
    duplicate$
    #9 >
        { chr.to.value.error }
        { }
    if$ 
    }
if$ 
}

STRINGS {
extract.input.str
extract.output.str
}

FUNCTION { is.a.digit } {
duplicate$
"" =
    {
    pop$
    #0
    }
    {
    chr.to.int$
    #48 -
    duplicate$
    #0 < swap$ #9 > or not
}
if$
}

FUNCTION{ is.a.number } {
{
    duplicate$
    #1 #1 substring$
    is.a.digit
}
    { #2 global.max$ substring$ }
while$
"" =
}

FUNCTION { extract.number } {
duplicate$
'extract.input.str :=
"" 'extract.output.str :=
{ extract.input.str empty$ not }
    {
    extract.input.str #1 #1 substring$
    extract.input.str #2 global.max$ substring$ 'extract.input.str :=
    duplicate$
    is.a.number
        { extract.output.str swap$ * 'extract.output.str := }
        {
        pop$
        "" 'extract.input.str :=
        }
    if$
    }
while$
extract.output.str empty$
    { }
    {
    pop$
    extract.output.str
    }
if$
}

FUNCTION { field.or.null } {
duplicate$
empty$
    {
    pop$
    ""
    }
    { }
if$
}

INTEGERS {
multiply.a.int
multiply.b.int
}

FUNCTION { multiply } {
'multiply.a.int :=
'multiply.b.int :=
multiply.b.int #0 <
    {
    #-1
    #0 multiply.b.int - 'multiply.b.int :=
    }
    { #1 }
if$
#0
{ multiply.b.int #0 > } 
    {
    multiply.a.int +
    multiply.b.int #1 - 'multiply.b.int :=
    }
while$
swap$
    { }
    { #0 swap$ - }
if$ 
}

INTEGERS { str.conversion.int }

FUNCTION { str.to.int.aux.ii } {
{
    duplicate$
    empty$ not
}
    {
    swap$
    #10 multiply 'str.conversion.int :=
    duplicate$
    #1 #1 substring$
    chr.to.value
    str.conversion.int +
    swap$
    #2 global.max$ substring$
    }
while$
pop$
}

FUNCTION { str.to.int.aux.i } {
duplicate$
#1 #1 substring$
"-" =
{
    #1 swap$
    #2 global.max$ substring$
    }
    {
    #0 swap$
    }
if$ 
#0
swap$
str.to.int.aux.ii
swap$
    { #0 swap$ - }
    { }
if$
}

FUNCTION { str.to.int } {
duplicate$
empty$
    {
    pop$
    #0
    }
    { str.to.int.aux.i }
if$ 
}

FUNCTION { tie.or.space.connect } {
duplicate$
text.length$ #3 <
    { "~" }
    { " " }
if$
swap$ * *
}

FUNCTION { yes.no.to.bool } {
duplicate$
empty$
    {
    pop$
    #0
    }
    {
    "l" change.case$
    "yes" =
        { #1 }
        { #0 }
    if$ 
    }
if$ 
}

% Functions of formatting

FUNCTION { bold } {
duplicate$
empty$
    {
    pop$
    ""
    }
    { "\textbf{" swap$ * "}" * }
if$
}

FUNCTION { emph } {
duplicate$
empty$
    {
    pop$
    ""
    }
    { "\emph{" swap$ * "}" * }
if$
}


FUNCTION { paren } {
duplicate$
empty$
    {
    pop$
    ""
    }
    { "(" swap$ * ")" * }
if$
}

% Functions for punctuation

FUNCTION { add.comma } { ", " * }
FUNCTION { add.colon } {  ": " * }
FUNCTION { add.period } { add.period$ " " * }
FUNCTION { add.semicolon } { "; " * }
FUNCTION { add.space } { " " * }

% Bibliography strings: fixed values collected into functions

FUNCTION { bbl.and }     { "and" }
FUNCTION { bbl.chapter } { "Chapter" }
FUNCTION { bbl.doi }     { "DOI:" }
FUNCTION { bbl.editor }  { "Ed." }
FUNCTION { bbl.editors } { "Eds." }
FUNCTION { bbl.edition } { "ed." }
FUNCTION { bbl.etal }    { "\latin{et~al.}" }
FUNCTION { bbl.in }      { "In" }
FUNCTION { bbl.inpress } { "in press" }
FUNCTION { bbl.msc }     { "M.Sc.\ thesis" }
FUNCTION { bbl.page }    { "p" }
FUNCTION { bbl.pages }   { "pp" }
FUNCTION { bbl.phd }     { "Ph.D.\ thesis" }
FUNCTION { bbl.version } { "version" }
FUNCTION { bbl.volume }  { "Vol." }

% Functions for number formatting

STRINGS { pages.str }

FUNCTION { hyphen.to.dash } {
'pages.str :=
""
{ pages.str empty$ not }
    {
    pages.str #1 #1 substring$
    "-" =
        {
        "--" *
        {
            pages.str #1 #1 substring$
            "-" =
        }
            { pages.str #2 global.max$ substring$ 'pages.str := }
        while$
        }
        {
        pages.str #1 #1 substring$
        *
        pages.str #2 global.max$ substring$ 'pages.str :=
        }
    if$
    }
while$
}

INTEGERS { multiresult.bool }

FUNCTION { multi.page.check } {
'pages.str :=
#0 'multiresult.bool :=
{
    multiresult.bool not
    pages.str empty$ not
    and
}
    {
    pages.str #1 #1 substring$
    duplicate$
    "-" = swap$ duplicate$
    "," = swap$
    "+" =
    or or
        { #1 'multiresult.bool := }
        { pages.str #2 global.max$ substring$ 'pages.str := }
    if$
    }
while$
multiresult.bool
}

% Functions for calculating the label data needed by natbib


INTEGERS {
current.name.int
names.separate.comma
names.separate.semicolon
names.separate.comma.bool
remaining.names.int
total.names.int
}

STRINGS {
current.name.str
names.str
}

FUNCTION { full.format.names } {
'names.str :=
#1 'current.name.int :=
names.str num.names$ 'remaining.names.int :=
{ remaining.names.int #0 > }
    {
    names.str current.name.int "{vv~}{ll}" format.name$
    current.name.int #1 >
        {
        swap$ add.comma swap$
        remaining.names.int #1 >
            { }
            {
            duplicate$
            "others" =
                { bbl.etal }
                { bbl.and }
            if$
            add.space swap$ *
            }
        if$
        *
        }
        { }
    if$   
    remaining.names.int #1 - 'remaining.names.int :=
    current.name.int #1 + 'current.name.int :=
    }
while$
}

FUNCTION { full.author } {
author empty$
    { "" }
    { author full.format.names }
if$ 
}

FUNCTION { full.author.editor } {
author empty$
    {
    editor empty$
        { "" }
        { editor full.format.names }
    if$
    }
    { author full.format.names }
if$ 
}

FUNCTION { full.editor } {
editor empty$
    { "" }
    { editor full.format.names }
if$ 
}

FUNCTION { short.format.names } {
'names.str :=
names.str #1 "{vv~}{ll}" format.name$
names.str num.names$
duplicate$
#2 >
    {
    pop$
    add.space bbl.etal *
    }
    {
    #2 <
        { }
        {
        names.str #2 "{ff }{vv }{ll}{ jj}" format.name$
        "others" =
            { add.space bbl.etal * }
            {
            add.space
            bbl.and add.space
            *
            names.str #2 "{vv~}{ll}" format.name$
            *
            }
        if$
        }
    if$ 
    }
if$ 
}

FUNCTION { short.author.key } {
author empty$
    {
    key empty$
        { cite$ #1 #3 substring$ }
        { key }
    if$ 
    }
    { author short.format.names }
if$ 
}

FUNCTION { short.author.editor.key } {
author empty$
    {
    editor empty$
        {
        key empty$
            { cite$ #1 #3 substring$ }
            { key }
        if$ 
        }
        { editor short.format.names }
    if$
    }
    { author short.format.names }
if$
}

FUNCTION { short.author.key.organization } {
author empty$
    {
    key empty$
        {
        organization empty$
            { cite$ #1 #3 substring$ }
            {
            organization #1 #4 substring$
            "The " =
                { organization }
                { organization #5 global.max$ substring$ }
            if$
            #3 text.prefix$
            }
        if$ 
        }
        { key }
    if$ 
    }
    { author short.format.names }
if$ 
}

FUNCTION { short.editor.key.organization } {
editor empty$
    {
    key empty$
        {
        organization empty$
            { cite$ #1 #3 substring$ }
            {
            organization #1 #4 substring$
            "The " =
                { organization }
                { organization #5 global.max$ substring$ }
            if$
            #3 text.prefix$
            }
        if$ 
        }
        { key }
    if$ 
    }
    { editor short.format.names }
if$ 
}

FUNCTION { calculate.full.names } {
type$ "book" =
type$ "inbook" =
or
    { full.author.editor }
    {
    type$ "proceedings" =
        { full.editor }
        { full.author }
    if$ 
    }
if$
}

FUNCTION { calculate.short.names } {
type$ "book" =
type$ "inbook" =
or
    { short.author.editor.key }
    {
    type$ "proceedings" =
        { short.editor.key.organization }
        {
        type$ "manual" =
            { short.author.key.organization }
            { short.author.key }
        if$ 
        }
    if$ 
    }
if$
'short.names :=
}

FUNCTION { calculate.names } {
calculate.short.names
short.names
year empty$
    { "()" }
    { "(" year * ")" * }
if$
*
'label :=
}

% Counting up the number of entries

INTEGERS { entries.int }

FUNCTION { initialize.count.entries } {
#0 'entries.int  :=
}

FUNCTION { count.entries } {
entries.int #1 + 'entries.int  :=
}

% Start and end of bibliography functions

FUNCTION { begin.bib } {
"achemso 2019-02-14 v3.12a" top$
preamble$ empty$
    { }
    {
    preamble$ write$
    newline$
    }
if$
"\providecommand{\latin}[1]{#1}" write$
newline$
"\makeatletter" write$
newline$
"\providecommand{\doi}" write$
newline$
"  {\begingroup\let\do\@makeother\dospecials" write$
newline$
"  \catcode`\{=1 \catcode`\}=2 \doi@aux}" write$
newline$
"\providecommand{\doi@aux}[1]{\endgroup\texttt{#1}}" write$
newline$
"\makeatother" write$
newline$
"\providecommand*\mcitethebibliography{\thebibliography}" write$
newline$
"\csname @ifundefined\endcsname{endmcitethebibliography}" write$
"  {\let\endmcitethebibliography\endthebibliography}{}" write$
newline$
"\begin{mcitethebibliography}{"
    entries.int int.to.str$  * "}" * write$
newline$
"\providecommand*\natexlab[1]{#1}" write$
newline$
"\providecommand*\mciteSetBstSublistMode[1]{}" write$
newline$
"\providecommand*\mciteSetBstMaxWidthForm[2]{}" write$
newline$
"\providecommand*\mciteBstWouldAddEndPuncttrue" write$
newline$
"  {\def\EndOfBibitem{\unskip.}}" write$
newline$
"\providecommand*\mciteBstWouldAddEndPunctfalse" write$
newline$
"  {\let\EndOfBibitem\relax}" write$
newline$
"\providecommand*\mciteSetBstMidEndSepPunct[3]{}" write$
newline$
"\providecommand*\mciteSetBstSublistLabelBeginEnd[3]{}" write$
newline$
"\providecommand*\EndOfBibitem{}" write$
newline$
"\mciteSetBstSublistMode{f}" write$
newline$
"\mciteSetBstMaxWidthForm{subitem}{(\alph{mcitesubitemcount})}" write$
newline$
"\mciteSetBstSublistLabelBeginEnd" write$
newline$
"  {\mcitemaxwidthsubitemform\space}" write$
newline$
"  {\relax}" write$
newline$
"  {\relax}" write$
newline$    
}

FUNCTION { end.bib } {
newline$
"\end{mcitethebibliography}" write$
newline$
}

% Functions used for the special "control" entry type, to pass data
% from LaTeX to BibTeX

INTEGERS {
ctrl.article.title.bool
ctrl.chapter.title.bool
ctrl.doi.bool
ctrl.etal.firstonly.bool
ctrl.etal.number.int
}

FUNCTION { initialize.control.values } {
#1 'ctrl.article.title.bool :=
#0 'ctrl.chapter.title.bool :=
#0 'ctrl.doi.bool :=
#1 'ctrl.etal.firstonly.bool :=
#15 'ctrl.etal.number.int :=
}

FUNCTION { control } {
ctrl-article-title  yes.no.to.bool 'ctrl.article.title.bool  :=
ctrl-chapter-title  yes.no.to.bool 'ctrl.chapter.title.bool  :=
ctrl-doi            yes.no.to.bool 'ctrl.doi.bool            :=
ctrl-etal-firstonly yes.no.to.bool 'ctrl.etal.firstonly.bool :=
ctrl-etal-number    str.to.int     'ctrl.etal.number.int     :=
ctrl-use-title empty$
    'skip$
    { ctrl-use-title yes.no.to.bool 'ctrl.article.title.bool  := }
if$
}

% Functions of each bibitem: tracking punctuation and transferring
% items to the .bbl file

INTEGERS {
next.punct.comma
next.punct.period
next.punct.semicolon
next.punct.space
}

FUNCTION { initialize.tracker } {
#0 'next.punct.comma     :=
#1 'next.punct.period    :=
#2 'next.punct.semicolon :=
#3 'next.punct.space     :=
}

INTEGERS { next.punct.int }

FUNCTION { output } {
swap$
duplicate$ empty$
    { pop$ }
    {
    next.punct.int next.punct.space =
        { add.space }
        {
        next.punct.int next.punct.comma =
            { add.comma }
            {
            next.punct.int next.punct.semicolon =
                { add.semicolon }
                { add.period }
            if$ 
            }
        if$ 
        }
    if$
    write$
    }
if$  
next.punct.comma 'next.punct.int :=
}


%
INTEGERS { href.present.bool }

% Functions for each bibliography entry: start and finish
FUNCTION { begin.bibitem } {
    newline$
    "\bibitem" write$
    label
    calculate.full.names
    duplicate$
    short.names =
    { pop$ }
    { * }
    if$ 
    "[" swap$ * "]" * write$
    "{" cite$ * "}" * write$
    newline$
    url empty$ not
    'href.present.bool := 
    href.present.bool
    {"\href{" url * "}{" * write$}
    { }
    if$
    ""
    next.punct.comma 'next.punct.int :=
}

INTEGERS { add.period.length.int }

FUNCTION { would.add.period } {
duplicate$
add.period$
text.length$ 'add.period.length.int :=
duplicate$
text.length$
add.period.length.int =
    { #0 }
    { #1 }
if$
}

FUNCTION { end.bibitem } {
would.add.period
    {
    href.present.bool
        { "}" * }
        { }
    if$
    "\relax" * write$
    newline$
    "\mciteBstWouldAddEndPuncttrue" write$
    newline$
    "\mciteSetBstMidEndSepPunct{\mcitedefaultmidpunct}" write$
    newline$
    "{\mcitedefaultendpunct}{\mcitedefaultseppunct}\relax"
    }
    {
    href.present.bool
    { "}" * }
    { }
    if$
    "\relax" * write$
    newline$
    "\mciteBstWouldAddEndPunctfalse" write$
    newline$
    "\mciteSetBstMidEndSepPunct{\mcitedefaultmidpunct}" write$
    newline$
    "{}{\mcitedefaultseppunct}\relax"
    }
if$
write$
newline$
"\EndOfBibitem" write$
}
%

% Formatting names: authors and editors are not quite the same,
% and there is the question of how to handle 'et al.'

FUNCTION { initialize.name.separator } {
#1 'names.separate.comma     :=
#0 'names.separate.semicolon :=
}

FUNCTION { format.names.loop } {
{ remaining.names.int #0 > }
    {
    names.str current.name.int "{vv~}{ll,}{~f.}{,~jj}" format.name$
    duplicate$
    'current.name.str :=
    current.name.int #1 >
        {
        duplicate$
        "others," =
            {
            pop$
            *
            bbl.etal
            add.space
            remaining.names.int #1 - 'remaining.names.int :=
            }
            { 
            swap$
            names.separate.comma.bool
                { add.comma }
                { add.semicolon } 
            if$
%<*bio>
%              remaining.names.int #1 >
%                { }
%                { bbl.and add.space * }
%              if$ 
%s
            swap$
            *
            }
        if$  
        }
        { }
    if$   
    remaining.names.int #1 - 'remaining.names.int :=
    current.name.int #1 + 'current.name.int :=
    }
while$
}

FUNCTION { format.names.all } {
total.names.int 'remaining.names.int :=
format.names.loop
}

FUNCTION { format.names.etal } {
ctrl.etal.firstonly.bool
    { #1 'remaining.names.int := }
    { ctrl.etal.number.int 'remaining.names.int := }
if$ 
format.names.loop
current.name.str "others," =
    { }
    {
    add.space
    bbl.etal
    add.space
    *
    }
if$ 
}

FUNCTION { format.names } {
'names.separate.comma.bool :=
'names.str :=
#1 'current.name.int :=
names.str num.names$ 'total.names.int :=
total.names.int ctrl.etal.number.int >
    {
    ctrl.etal.number.int #0 =
        { format.names.all }
        { format.names.etal }
    if$ 
    }
    { format.names.all }
if$ 
}

%

% From           : https://tex.stackexchange.com/a/28104
% Originally from: http://ctan.org/pkg/tamethebeast
INTEGERS{ l }
FUNCTION{ string.length }
{
#1 'l :=
{duplicate$ duplicate$ #1 l substring$ = not}
    {l #1 + 'l :=}
while$
pop$ l
}

STRINGS{replace find text}
INTEGERS{find_length}
FUNCTION{find.replace}
{ 'replace :=
'find :=
'text :=
find string.length 'find_length :=
""
    { text empty$ not }
    { text #1 find_length substring$ find =
        {
        replace *
        text #1 find_length + global.max$ substring$ 'text :=
        }
        { text #1 #1 substring$ *
        text #2 global.max$ substring$ 'text :=
        }
    if$
    }
while$
}
% new code
FUNCTION{find.replace.upper}
{ 'replace :=
'find :=
'text :=
find string.length 'find_length :=
""
    { text empty$ not }
    { text #1 find_length substring$ "u" change.case$ find =
        {
        replace *
        text #1 find_length + global.max$ substring$ 'text :=
        }
        { text #1 #1 substring$ *
        text #2 global.max$ substring$ 'text :=
        }
    if$
    }
while$
}
FUNCTION{find.replace.ignorecase}
{ swap$
"u" change.case$
swap$
find.replace.upper
}
% if assignee is not empty then check replace and w/ `;`. then check if it is parened. if not paren
FUNCTION { format.assignees } {
    assignee duplicate$ empty$
    { pop$ }
    { " AND" ";" find.replace.upper
    duplicate$
    duplicate$
    #-1 #1 substring$
    ")" =
    #1 #1 substring$
    "(" =
    and
        { paren }
        {  }
    if$
    output
    next.punct.period 'next.punct.int :=    
    }
if$   
}

FUNCTION { format.CAS } {
CAS_CAN duplicate$ empty$ not
    { 
    CAS_AN  duplicate$ empty$ not
        {
            "Chem. Abstr. " emph swap$ 
            "AN " "" find.replace 
                #1 #4 substring$
                add.comma bold
                *
            swap$
            "CAN " "" find.replace 
                ":" ",} " find.replace 
                "\emph{" swap$ *
                *
            output
        }
        { pop$ }
    if$
    }
    { pop$ }
if$
}

%

% Converting editions into their fixed representations

FUNCTION { bbl.first }  { "1st" }
FUNCTION { bbl.second } { "2nd" }
FUNCTION { bbl.third }  { "3rd" }
FUNCTION { bbl.fourth } { "4th" }
FUNCTION { bbl.fifth }  { "5th" }
FUNCTION { bbl.st }     { "st" }
FUNCTION { bbl.nd }     { "nd" }
FUNCTION { bbl.rd }     { "rd" }
FUNCTION { bbl.th }     { "th" }

STRINGS {
ord.input.str
ord.output.str
}

FUNCTION { make.ordinal } {
duplicate$
"1" swap$
*
#-2 #1 substring$
"1" =
    {
    bbl.th *
    }
    {
    duplicate$
    #-1 #1 substring$
    duplicate$
    "1" =
        {
        pop$
        bbl.st *
        }
        {
        duplicate$
        "2" =
            {
            pop$
            bbl.nd *
            }
            {
            "3" =
                { bbl.rd * }
                { bbl.th * }
            if$ 
            }
        if$
        }
    if$
    }
if$ 
}

FUNCTION { convert.to.ordinal } {
extract.number
"l" change.case$ 'ord.input.str :=
ord.input.str "first" = ord.input.str "1" = or
    { bbl.first 'ord.output.str := }
    {
    ord.input.str "second" = ord.input.str "2" = or
        { bbl.second 'ord.output.str := }
        {
        ord.input.str "third" = ord.input.str "3" = or
            { bbl.third 'ord.output.str := }
            {
            ord.input.str "fourth" = ord.input.str "4" = or
                { bbl.fourth 'ord.output.str := }
                {
                ord.input.str "fifth" = ord.input.str "5" = or
                    { bbl.fifth 'ord.output.str := }
                    {
                    ord.input.str #1 #1 substring$
                    is.a.number
                        { ord.input.str make.ordinal }
                        { ord.input.str }
                    if$
                    'ord.output.str :=
                    }
                if$
                }
            if$ 
            }
        if$ 
        }
    if$ 
    }
if$
ord.output.str
}

% Functions for each type of entry

FUNCTION { format.address } {
address empty$
    { }
    {
    address
    output
    }
if$
}

FUNCTION { format.authors } {
author empty$
    { }
    {
%<*bio>
%      author names.separate.comma format.names
%
%<*!bio>
    author names.separate.semicolon format.names
%
    output
    next.punct.space 'next.punct.int :=
    }
if$   
}

FUNCTION { format.editors } {
editor empty$
    { }
    {
    editor names.separate.comma format.names
    add.comma
    editor num.names$ #1 >
        { bbl.editors }
        { bbl.editor }
    if$ 
    *
    output
    next.punct.semicolon 'next.punct.int :=
    } 
if$ 
}

FUNCTION { format.authors.or.editors } {
author empty$
    { format.editors }
    { format.authors }
if$ 
next.punct.space 'next.punct.int :=
}

FUNCTION { format.chapter } {
chapter empty$
    { }
    {
    bbl.chapter add.space
    chapter
    *
    output
    }
if$
}

FUNCTION { format.doi } {
doi empty$
    'skip$
    {
    bbl.doi add.space
    "\doi{" * doi * "}" *
    output
    }
if$
}

FUNCTION { format.edition } {
edition empty$
    { }
    {
    edition convert.to.ordinal
    add.space bbl.edition *
    output
    }
if$ 
next.punct.semicolon 'next.punct.int :=
}

FUNCTION { format.group.address } {
duplicate$
empty$
    { pop$ }
    {
    address empty$
        { }
        {
        add.colon
        address
        *
        }
    if$ 
    output
    }
if$ 
}

FUNCTION { format.howpublished } {
howpublished empty$
    { }
    {
    howpublished
    output
    }
if$
}

FUNCTION { format.journal } {
journal emph
output
next.punct.space 'next.punct.int :=
}

FUNCTION { format.journal.unpub } { journal emph output }
FUNCTION { format.note } { note empty$ { }{ note output } if$ }

FUNCTION { format.number.series } {
series empty$
    { }
    {
    series
    number empty$
        { }
        {
        add.space
        number *
        }
    if$ 
    output
    next.punct.semicolon 'next.punct.int :=
    }
if$ 
}

FUNCTION { format.organization } {
organization empty$
    { }
    {
    organization paren
    output
    next.punct.period 'next.punct.int :=
    }
if$ 
}

FUNCTION { format.organization.address } { organization format.group.address }

FUNCTION { format.pages } {
pages empty$
    { }
    {
    pages multi.page.check
        {
        bbl.pages
        pages hyphen.to.dash
        }
        { bbl.page pages }
    if$
    tie.or.space.connect
    output
    }
if$
ctrl.doi.bool
    { format.doi }
    'skip$
if$
}

FUNCTION { format.pages.article } {
pages empty$
    { }
    {
    pages hyphen.to.dash
    output
    }
if$
ctrl.doi.bool
    { format.doi }
    'skip$
if$
}

FUNCTION { format.publisher.address } {
publisher format.group.address
}

FUNCTION { format.school.address } { 
school
duplicate$
empty$
    { pop$ }
    {
    address empty$
        { }
        {
        add.comma
        address
        *
        }
    if$ 
    output
    }
if$
}

FUNCTION { format.title } {
title empty$
    { }
    {
    title
    output
    next.punct.period 'next.punct.int :=
    }
if$
}

FUNCTION { format.title.article } {
ctrl.article.title.bool
    {
    title empty$
        { }
        {
        title
        output
        next.punct.period 'next.punct.int :=
        }
    if$ 
    }
    { }
if$
}

FUNCTION { format.title.techreport } {
title empty$
    { }
    {
    title emph
    output
    next.punct.semicolon 'next.punct.int :=
    }
if$
}

FUNCTION { format.title.booktitle } {
title empty$
    { }
    {
    title
    output
    next.punct.period 'next.punct.int :=
    }
if$ 
booktitle empty$
    { }
    {
    booktitle
    output
    next.punct.period 'next.punct.int :=
    }
if$   
}

STRINGS {
book.title
chapter.title
}

FUNCTION { format.title.booktitle.book } {
"" 'chapter.title :=
booktitle empty$
    {
    "" 'chapter.title :=
    title 'book.title :=
    }
    {
    ctrl.chapter.title.bool
        {
        title empty$
            'skip$
            { title 'chapter.title := }
        if$
        }
        'skip$
    if$
    maintitle empty$
        { booktitle 'book.title := }
        { maintitle add.period booktitle * 'book.title := }
    if$ 
    }
if$
chapter.title empty$
    { }
    {
    chapter.title
    output
    next.punct.period 'next.punct.int :=
    }
if$ 
book.title emph
chapter.title empty$
    {
    author empty$
        { }
        {
        editor empty$
            { }
            { bbl.in add.space swap$ * }
        if$  
        }
    if$ 
    }
    { bbl.in add.space swap$ * }
if$ 
output
}

FUNCTION { format.type } {
type empty$
    { }
    {
    pop$
    type
    }
if$
output
}

FUNCTION { format.type.number } {
type empty$
    { }
    {
    number empty$
        { }
        { 
        type 
        number tie.or.space.connect 
        *
        output        
        }
    if$ 
    }
if$
}
FUNCTION { format.type.patent } {
type empty$
    { "Patent" }{ type }
if$ output
}

FUNCTION { format.type.number.patent } {
number empty$
    { }{
    format.type.patent " " number * * output
    next.punct.comma 'next.punct.int :=
    }
if$
}

FUNCTION { format.url } {
url empty$
    { }
    {
    "\url{" url * "}" *
    output
    }
if$ 
}

FUNCTION { format.year } {
year empty$
    { }
    {
    year
    output
    next.punct.semicolon 'next.punct.int :=
    }
if$ 
}

STRINGS  { sYear sMonth sDay d}
INTEGERS { iYear iMonth iDay } 

FUNCTION { format.date } {

"" 'sYear  := 
"" 'sMonth := 
"" 'sDay   := 

date empty$ not 
    { % if the date is note empty, then 
    
    % split the date into year month and day
    date #1 #4 substring$ 'sYear  :=
    date #6 #2 substring$ 'sMonth :=
    date #9 #2 substring$ 'sDay   := 

    % remove leading zero from sMonth and sDay iff leading zeros exist
    sMonth #1 #1 substring$ "0" = { sMonth #2 #1 substring$ 'sMonth := }{ } if$
    sDay   #1 #1 substring$ "0" = { sDay   #2 #1 substring$ 'sDay   := }{ } if$

    }{ % else, gather values from appropriate fields

    year  empty$ not {year }{""}if$ 'sYear  := 
    month empty$ not {month}{""}if$ 'sMonth := 
    day   empty$ not {day  }{""}if$ 'sDay   := 
    
    } if$    

    % Convert sMonth from number string to abbreviated month
    sMonth "1" = sMonth "01" = or { "Jan"   'sMonth := }{ } if$
    sMonth "2" = sMonth "02" = or { "Feb"   'sMonth := }{ } if$
    sMonth "3" = sMonth "03" = or { "March" 'sMonth := }{ } if$
    sMonth "4" = sMonth "04" = or { "April" 'sMonth := }{ } if$
    sMonth "5" = sMonth "05" = or { "May"   'sMonth := }{ } if$
    sMonth "6" = sMonth "06" = or { "June"  'sMonth := }{ } if$
    sMonth "7" = sMonth "07" = or { "July"  'sMonth := }{ } if$
    sMonth "8" = sMonth "08" = or { "Aug"   'sMonth := }{ } if$
    sMonth "9" = sMonth "09" = or { "Sept"  'sMonth := }{ } if$
    sMonth "10" =                 { "Oct"   'sMonth := }{ } if$
    sMonth "11" =                 { "Nov"   'sMonth := }{ } if$
    sMonth "12" =                 { "Dec"   'sMonth := }{ } if$


    sYear empty$ not {
    sMonth empty$ not {
        sMonth sDay empty$ not {" " sDay *}{ "" }if$ ", " * *
        }{ "" }if$
    sYear * output
    }{}if$ 

    next.punct.semicolon 'next.punct.int :=
}

FUNCTION { format.year.article } {
year empty$
    { }
    {
%<*bio>
%      year paren
%      output
%      next.punct.space 'next.punct.int :=
%
%<*!bio>
    year bold
    output
%
    }
if$
}

FUNCTION { format.version } {
version empty$
    { }
    {
    bbl.version add.space
    version
    *
    output
    } 
if$ 
}

FUNCTION { format.volume.article } {
volume emph
output
}

FUNCTION { format.volume } {
volume empty$
    { }
    {
    bbl.volume
    volume
    tie.or.space.connect
    output
    next.punct.semicolon 'next.punct.int :=
    }
if$   
}

% The functions to deal with each entry type

FUNCTION { article } {
begin.bibitem
format.authors
%<*bio>
%  format.year.article
%
format.title.article
format.journal
%<*!bio>
format.year.article
%
format.volume.article
format.pages.article
format.note
end.bibitem
}

FUNCTION { book } {
begin.bibitem
format.authors.or.editors
format.title.booktitle.book
format.edition
author empty$
    { }
    { format.editors }
if$
format.number.series
format.publisher.address
format.year
format.volume
format.chapter
format.pages
format.note
end.bibitem
}

FUNCTION { inbook } { book }

FUNCTION { booklet } {
begin.bibitem
format.authors
format.title
format.howpublished
format.address
format.year
format.note
end.bibitem
}

FUNCTION { collection } { book }

FUNCTION { incollection } { book }

FUNCTION { inpress } {
begin.bibitem
format.authors
%<*bio>
%  format.year.article
%
format.journal.unpub
doi empty$
    {
    bbl.inpress
    output
    }
    {
%<*!bio>
    format.year.article
%
    next.punct.comma 'next.punct.int :=
    format.doi
    }
if$ 
format.note
end.bibitem
}

FUNCTION { inproceedings } {
begin.bibitem
format.authors
format.title.booktitle
format.address
format.year
format.pages
format.note
end.bibitem
}

FUNCTION { manual } {
begin.bibitem
format.authors
format.title
format.version
format.organization.address
format.year
format.note
end.bibitem
}

FUNCTION { mastersthesis } {
begin.bibitem
format.authors
format.title
bbl.msc format.type
format.school.address
format.year
format.note
end.bibitem
}

FUNCTION { misc } {
begin.bibitem
format.authors
format.title
format.howpublished
format.year
format.url
format.note
end.bibitem
}

FUNCTION { patent } {
begin.bibitem
format.authors
format.assignees
format.title
format.type.number.patent  
format.date
format.CAS  
format.note
end.bibitem
}

FUNCTION { phdthesis } {
begin.bibitem
format.authors
format.title
bbl.phd format.type
format.school.address
format.year
format.note
end.bibitem
}

FUNCTION { proceeding } {
begin.bibitem
format.title
format.address
format.year
format.pages
format.note
end.bibitem
}

FUNCTION { techreport } {
begin.bibitem
format.authors.or.editors
format.title.techreport
format.type.number
format.organization.address
format.year
format.volume
format.pages
format.note
end.bibitem
}

FUNCTION { unpublished } {
begin.bibitem
format.authors
format.journal.unpub
doi empty$
    { format.howpublished }
    {
    format.year
    next.punct.comma 'next.punct.int :=
    format.doi
    }
if$ 
format.note
end.bibitem
}

FUNCTION { default.type } { misc }

% Macros containing pre-defined short cuts

% patent macros per the biblatex documentaion
% See page 283 of http://mirrors.ibiblio.org/CTAN/macros/latex/contrib/biblatex/doc/biblatex.pdf
MACRO { patent   } { "Patent" }
MACRO { patentau } { "Au. Patent" }
MACRO { patentuk } { "Br. Patent" }
MACRO { patenteu } { "Eur. Patent" }
MACRO { patentfr } { "Fr. Patent" }
MACRO { patentde } { "Ger. Offen." }    % seen in acs reference - maybe short for "offensichtlich"; a possible translation of patent
MACRO { patentus } { "U.S. Patent" }

% patent requests
MACRO { patapp   } { "Patent Appl." }
MACRO { patappau } { "Au. Pat. Appl." }
MACRO { patappuk } { "Br. Pat. Appl." }
MACRO { patappeu } { "Eur. Pat. Appl." }
MACRO { patappfr } { "Fr. Pat. Appl." }
MACRO { patappde } { "Ger. Pat. Appl." }
MACRO { patappjp } { "kōkai tokkyo kōhō" }
MACRO { patappus } { "U.S. Pat. Appl." }

% alias using biblatex style abbreviations
MACRO { patreq   } { "Patent Appl." }
MACRO { patreqde } { "Ger. Pat. Appl." }
MACRO { patreqeu } { "Eur. Pat. Appl." }
MACRO { patreqfr } { "Fr. Pat. Appl." }
MACRO { patrequk } { "Br. Pat. Appl." }
MACRO { patrequs } { "U.S. Pat. Appl." }

% Corrected in accordance with ACS Style guide, 3rd ed, pp. 161
MACRO { jan } { "Jan" }
MACRO { feb } { "Feb" }
MACRO { mar } { "March" }
MACRO { apr } { "April" }
MACRO { may } { "May" }
MACRO { jun } { "June" }
MACRO { jul } { "July" }
MACRO { aug } { "Aug" }
MACRO { sep } { "Sept" }
MACRO { oct } { "Oct" }
MACRO { nov } { "Nov" }
MACRO { dec } { "Dec" }

MACRO { acbcct } { "ACS Chem.\ Biol." }
MACRO { achre4 } { "Acc.\ Chem.\ Res." }
MACRO { acncdm } { "ACS Chem.\ Neurosci." }
MACRO { ancac3 } { "ACS Nano" }
MACRO { ancham } { "Anal.\ Chem." }
MACRO { bichaw } { "Biochemistry" }
MACRO { bcches } { "Bioconjugate Chem." }
MACRO { bomaf6 } { "Biomacromolecules" }
MACRO { bipret } { "Biotechnol.\ Prog." }
MACRO { crtoec } { "Chem.\ Res.\ Toxicol." }
MACRO { chreay } { "Chem.\ Rev." }
MACRO { cmatex } { "Chem.\ Mater." }
MACRO { cgdefu } { "Cryst.\ Growth Des." }
MACRO { enfuem } { "Energy Fuels" }
MACRO { esthag } { "Environ.\ Sci.\ Technol." }
MACRO { iechad } { "Ind.\ Eng.\ Chem.\ Res." }
MACRO { inoraj } { "Inorg.\ Chem." }
MACRO { jafcau } { "J.~Agric.\ Food Chem." }
MACRO { jceaax } { "J.~Chem.\ Eng.\ Data" }
MACRO { jceda8 } { "J.~Chem.\ Ed." }
MACRO { jcisd8 } { "J.~Chem.\ Inf.\ Model." }
MACRO { jctcce } { "J.~Chem.\ Theory Comput." }
MACRO { jcchff } { "J. Comb. Chem." }
MACRO { jmcmar } { "J. Med. Chem." }
MACRO { jnprdf } { "J. Nat. Prod." }
MACRO { joceah } { "J.~Org.\ Chem." }
MACRO { jpcafh } { "J.~Phys.\ Chem.~A" }
MACRO { jpcbfk } { "J.~Phys.\ Chem.~B" }
MACRO { jpccck } { "J.~Phys.\ Chem.~C" }
MACRO { jpclcd } { "J.~Phys.\ Chem.\ Lett." }
MACRO { jprobs } { "J.~Proteome Res." }
MACRO { jacsat } { "J.~Am.\ Chem.\ Soc." }
MACRO { langd5 } { "Langmuir" }
MACRO { mamobx } { "Macromolecules" }
MACRO { mpohbp } { "Mol.\ Pharm." }
MACRO { nalefd } { "Nano Lett." }
MACRO { orlef7 } { "Org.\ Lett." }
MACRO { oprdfk } { "Org.\ Proc.\ Res.\ Dev." }
MACRO { orgnd7 } { "Organometallics" }

% Construction of bibliography: reading of data, construction of
% labels, output of formatted bibliography

READ

EXECUTE { initialize.control.values }
EXECUTE { initialize.count.entries }
EXECUTE { initialize.name.separator }
EXECUTE { initialize.tracker }

ITERATE { calculate.names }
ITERATE { count.entries }

EXECUTE { begin.bib }

ITERATE { call.type$ }

EXECUTE { end.bib }
%

Python转换工具

这个简短的程序接受一个包含一个或多个Tagged Format (.txt)格式的D42引用的文件,并将其转换为一个与上面的BibTeX样式相匹配的bibtex条目。

代码语言:javascript
复制
r"""Simple sctipt to convert Scifinder refernces for patents into bibtex entries for my customized acspat.bst bibtex style. 

    note: acspat.bst complies with the ACS format as outlined by The ACS Style Guide, 3rd Edition

    usage 
        python "c:\...\cas2bibtex.py" "c:\...\ref.txt" > "c:\...\ref.bib"
"""

import sys
import re
import inspect
import httplib2
import pycountry 
from datetime import datetime
from tkinter import Tk
from tkinter.filedialog import askopenfilename
from pylatexenc import latexencode

# method to convert patents to bibtex
def convert_Patent(fields):
    """Convert the passed dictionary from SciFinder `Tagged Format (*.txt)` to Custom @Patent bibtex entry

    Args:
        fields (dict): dictionary verions of Scifinder CAS reference sheet
    """

    assert isinstance(fields, dict)
    assert fields["Document Type"]=="Patent"

    # ACS spec uses the application date, not publishing date
    # pub_date = datetime.strptime(fields["Publication Date"],"%Y%m%d")
    app_date = datetime.strptime(fields["Patent Application Date"],"%Y%m%d")
    print()
    # print(pycountry.countries.get(name=str(fields["Patent Assignee"]).replace(r"\(.*\)","").title()))

    # construct a google patent url from the text file data
    google_pat_url =("https://patents.google.com/patent/" + 
            fields["Patent Country"].strip() + 
            fields["Patent Number"].strip() + 
            fields["Patent Kind Code"].strip())

    # check if the url is valid, if not, ditch it. store result as url
    url = google_pat_url if int(httplib2.Http().request(google_pat_url, "HEAD")[0]["status"])<400 else ''

    

    bib_str = inspect.cleandoc(
            # Key, abstract and keywords - stuf to make finding the right doc easier when writing
        r"""@Patent{{{citation_key},
            abstract    = {{{abstract}}},
            keywords    = {{{keywords}}},
            """
            # Basic Publication Info  
        r"""author      = {{{author}}},
            assignee    = {{{assignee}}},
            title       = {{{title}}},
            date        = {{{date}}},
            year        = {{{year}}},
            month       = {{{month}}},
            day         = {{{day}}},
            """
            # Document info
        r"""pages       = {{{pages}}},
            language    = {{{language}}},
            """
            # patent-specific info
        r"""type        = {{{patent_type}}},
            number      = {{{patent_number}}}, 
            """
            # Search-related info
        r"""CODEN       = {{{CODEN}}},
            CAS_AN      = {{{accession_number}}},
            CAS_CAN     = {{{chemical_abstracts_number}}}, 
            url         = {{{url}}}
        }}""".format(
            citation_key                = fields["Inventor Name"].split(maxsplit=1)[0].replace(',','') +
                                          app_date.strftime("%Y"), 
            abstract                    = latexencode.unicode_to_latex(fields["Abstract"].strip()),
            # abstract                    = fields["Abstract"],
            keywords                    = fields["Index Terms"], 

            author                      = fields["Inventor Name"].replace('; '," and "),
            assignee                    = fields["Patent Assignee"] if 
                                            pycountry.countries.get(name=fields["Patent Assignee"].strip('()'))==None
                                            else '',
            title                       = latexencode.unicode_to_latex(fields["Title"]).title(),
            date                        = app_date.strftime("%Y-%m-%d"),
            year                        = app_date.strftime("%Y"),
            month                       = app_date.strftime("%m"),
            day                         = app_date.strftime("%d"), 
            
            pages                       = fields["Page"],
            language                    = fields["Language"],
 
            patent_type                 = fields["Journal Title"]+'.',  # this holds the abbreviation for patent type
                                                                        # period added as this an abbreviation; this is an exception
            patent_number               = fields["Patent Number"],
            
            CODEN                       = fields["CODEN"],
            accession_number            = fields["Accession Number"],
            chemical_abstracts_number   = fields["Chemical Abstracts Number(CAN)"],

            url                         = fields["URL"] if fields["URL"] != '' else url # use the cas specified url, if it exists, else use the url selected above
            ))


    return(bib_str)

# Main method - runs at execution-time
def main():
    """ Check if any command-line arguments for what files to look at are passed, 
        if not, prompt user for what files to use

        Assert that the file is of the appropriate scifinder format

        extract records, and one by one, convert those records to the bibtex entries 
        using `convert_Patent()`
    """

    if len(sys.argv)<2:
        tk = Tk()
        tk.withdraw()
        tk.call('wm', 'attributes', '.', '-topmost', True)
        files = askopenfilename(title="Choose File(s) to Convert",multiple=True)
    else:
        files = sys.argv[1:]

    # Define patterns to seek out the records
    rec_pattern = r"""START_RECORD\n    # Begins with start record
                    (?P.+?)     # Capture as few lines as possible between 
                    END_RECORD"""       # Ends with End Record
    fld_pattern = r"""FIELD\s           # Begins with 
                    (?P.+?)\:      # Capture the key (everything before the `:` - newlines excluded)
                    (?P.+?)?\.?    # Capture the definition (everything after the `:` - trailing periods excluded)
                    \n"""               # Ends with a non-optional newline

    # Compile the patterns into regex objects
    rec_regex = re.compile(rec_pattern, re.VERBOSE | re.MULTILINE | re.DOTALL)
    fld_regex = re.compile(fld_pattern, re.VERBOSE | re.MULTILINE)

    # iter over all passed files
    for filePath in files:

        # open and read the file into memory
        file = open(filePath)
        fileTxt = file.read()
        file.close()
        
        # find records using regexp and iter over them 
        for record in rec_regex.findall(fileTxt):
            # convert the records into dicts
            fields = dict(fld_regex.findall(record))
            
            # decision tree for converting based off of doc type
            # print result with intention that this can be used 
            # at the shell and piped into a file
            if fields["Document Type"]=="Patent":
                print(convert_Patent(fields))
            else:
                print("Attempted to covert file: {}\nHowever, document type <{}> is yet not supported".format(
                    filePath, fields["Document Type"]))

# Force auto-run of main
if __name__ == "__main__": main()

样本输入

样本输出

输出

Expected输出

从ACS风格指南第三版,310-311,专利预计将作为

Recommended格式:

专利所有人1;专利所有人2;等等。专利的所有权。专利号日期。

示例:

  1. 低成本光纤压力传感器。美国专利9738,537,2004年5月18日。
  2. 兰森,K. C.;Jantscheff,P.;Kiedrowski,G.;用丝氨酸骨架聚合U.阳离子脂质,用于转移生物Molecules.Eu。帕特。Appl.1457483,2004年。
  3. L. Langhals,H.;Wetzel,F.;具有金属效应的聚甲苯色素。杰。奥芬。10357978.8,2013年12月11日;化学。Abstr.2005, 143,134834。
  4. 岛津,Y.;Kajiyama,H. (日本Kanebo有限公司;Kanebo合成纤维有限公司)。Jpn2004176197 A2 20040624,2004年。

Actual输出

使用上述乳胶中的代码,以及使用Python将来自Scifinder的数据转换为BibTeX,相同的专利(不包括在SciFinder数据库中不存在的第3项专利)将产生以下引用

看看在背页上产生这个的MWE.

讨论

与预期值相比,从SciFinder数据库获得的数据似乎存在相当数量的不一致。

最令人关切的是,根据风格指南引用的日期是不一致的。对于指南中的例子25,引用了发布日期,对于3.4.则使用了应用日期。示例1似乎使用了一个完全无关的日期。

因此,我决定在我的所有引文中使用出版日期,但如果有人能向我提供关于日期为何变化以及我如何以编程方式处理这一问题的指导,我将非常感激。

其他关切

在BibTeX样式文件中,format.assigneesformat.CAS的定义都使用find.replace函数或其变体。这个函数最后在每个引文生成上增加了相当多的步骤,我担心这会对我的大量书目的产生产生很大的负面影响。如何才能改善这种情况?

在下面的Python中,latexencode.unicode_to_latex(...)函数有时不正确地转换输入--例如字符γ(\gamma)没有正确地处理到预期的\gamma --是否有更好的选择来将unicode转换为D72

您是否建议使用BST文件(来自驯服兽)的资源?关于这种格式的文件非常稀少。

你还有什么其他的改进建议吗?

EN

回答 2

Code Review用户

发布于 2020-06-12 15:08:01

好的第一步总是查看BibTeX输出,并尝试解决出现的所有错误。在你的情况下,我得到

这是BibTeX,0.99d版(TeX Live 2020)顶级辅助文件: main.aux样式文件: acspat.bst数据库文件#1: ref.bib achemso 2019-02-14 v3.12a 1是整数字面值,而不是字符串,用于在执行文件acspat.bst的第1932行时输入Sheem1997,在执行文件acspat.bst 1的第1932行时不能弹出输入Sheem1997的空文字堆栈,对于在执行文件acspat.bst的第1932行时,您不能在执行时弹出输入Lenssen2003的空文字堆栈-文件acspat.bst的第1932行-执行时不能弹出输入Langhals2003的空文字堆栈-- acspat.bst 1的第1932行是整数字面,而不是字符串,用于执行- 1932行文件acspat.bst -在执行- 1932行文件acspat.bst时,不能弹出输入Shimizu2002的空文字堆栈(有7条错误消息)

这是大量的重复,但基本上BibTeX为每个条目报告了两个问题:

代码语言:javascript
复制
1 is an integer literal, not a string, for entry Sheem1997
while executing---line 1932 of file acspat.bst
You can't pop an empty literal stack for entry Sheem1997
while executing---line 1932 of file acspat.bst

这些是从哪里来的?1 is an integer literal是由

代码语言:javascript
复制
FUNCTION { format.assignees } {
  assignee duplicate$ empty$
    { pop$ }
    { " AND" ";" find.replace.upper
      duplicate$
      duplicate$
      #-1 #1 substring$
      ")" = % <--------- After this comparison, a `1` (which represents "true" is at the top of the stack
            % not the string you duplicated earlier
      #1 #1 substring$
      "(" =
      and
        { paren }
        {  }
      if$
      output
      next.punct.period 'next.punct.int :=    
    }
  if$   
}

要解决这个问题,在")" =#1 #1 substring$之间添加一个D6,以使字符串再次高于数字(也称为布尔值)。

这就产生了“您不能弹出一个空的文字堆栈”。看

代码语言:javascript
复制
FUNCTION { format.type.patent } {
  type empty$
    { "Patent" }{ type }
  if$ output
}

FUNCTION { format.type.number.patent } {
   number empty$
    { }
    {
      format.type.patent " " number * * output
      next.punct.comma 'next.punct.int :=
    }
  if$
}

尤其是这条线

代码语言:javascript
复制
format.type.patent " " number * * output

这将尝试将“”和number连接到format.type.patent遗留在堆栈上的字符串,但是由于format.type.patentoutput结尾,所以字符串不再在堆栈上,而是已经打印出来了。

若要修复此问题,请将format.type.patent更改为

代码语言:javascript
复制
FUNCTION { format.type.patent } {
  type empty$
    { "Patent" }{ type }
  if$
}

以避免过早输出。(您可能想重命名函数名,因为它的行为不再像其他format.函数那样)

票数 5
EN

Code Review用户

发布于 2020-06-12 17:18:02

适用于乳胶编码的K21修正

谢谢戴维·卡莱尔帮我搞清楚这个问题。

编码\gamma和类似unicode字符的问题不是来自latexencode.unicode_to_latex(...)函数,而是来自错误地读取文件。与其使用open(filePath),不如使用open(filePath,encoding="utf-8")来确保正确读取unicode字符。

此外,这种unicode转换应该应用于keywords字段,因为这也可能包含unicode字符,如Langhals示例所示。

实现时,这将呈现表单的python文件cas2bibtex.py

代码语言:javascript
复制
r"""Simple sctipt to convert Scifinder refernces for patents into bibtex entries for my customized acspat.bst bibtex style. 

    note: acspat.bst complies with the ACS format as outlined by The ACS Style Guide, 3rd Edition

    usage 
        python "c:\...\cas2bibtex.py" "c:\...\ref.txt" > "c:\...\ref.bib"
"""

import sys
import re
import inspect
import httplib2
import pycountry 
from datetime import datetime
from tkinter import Tk
from tkinter.filedialog import askopenfilename
from pylatexenc import latexencode

# method to convert patents to bibtex
def convert_Patent(fields):
    """Convert the passed dictionary from SciFinder `Tagged Format (*.txt)` to Custom @Patent bibtex entry

    Args:
        fields (dict): dictionary verions of Scifinder CAS reference sheet
    """

    assert isinstance(fields, dict)
    assert fields["Document Type"]=="Patent"

    # ACS spec uses application and publication date interchangably. I am not sure which to use, so the code for both is provided below.
    pub_date = datetime.strptime(fields["Publication Date"],"%Y%m%d")
    app_date = datetime.strptime(fields["Patent Application Date"],"%Y%m%d")

    # construct a google patent url from the text file data
    google_pat_url =("https://patents.google.com/patent/" + 
            fields["Patent Country"].strip() + 
            fields["Patent Number"].strip() + 
            fields["Patent Kind Code"].strip())

    # check if the url is valid, if not, ditch it. store result as url
    url = google_pat_url if int(httplib2.Http().request(google_pat_url, "HEAD")[0]["status"])<400 else ''

    bib_str = inspect.cleandoc(
            # Key, abstract and keywords - stuf to make finding the right doc easier when writing
        r"""@Patent{{{citation_key},
            abstract    = {{{abstract}}},
            keywords    = {{{keywords}}},
            """
            # Basic Publication Info  
        r"""author      = {{{author}}},
            assignee    = {{{assignee}}},
            title       = {{{title}}},
            date        = {{{date}}},
            year        = {{{year}}},
            month       = {{{month}}},
            day         = {{{day}}},
            """
            # Document info
        r"""pages       = {{{pages}}},
            language    = {{{language}}},
            """
            # patent-specific info
        r"""type        = {{{patent_type}}},
            number      = {{{patent_number}}}, 
            """
            # Search-related info
        r"""CODEN       = {{{CODEN}}},
            CAS_AN      = {{{accession_number}}},
            CAS_CAN     = {{{chemical_abstracts_number}}}, 
            url         = {{{url}}}
        }}""".format(
            citation_key                = fields["Inventor Name"].split(maxsplit=1)[0].replace(',','') +
                                        app_date.strftime("%Y"), 
            abstract                    = latexencode.unicode_to_latex(fields["Abstract"].strip()),
            keywords                    = latexencode.unicode_to_latex(fields["Index Terms"]),
            author                      = fields["Inventor Name"].replace('; '," and "),
            assignee                    = fields["Patent Assignee"] if 
                                            pycountry.countries.get(name=fields["Patent Assignee"].strip('()'))==None
                                            else '',
            title                       = latexencode.unicode_to_latex(fields["Title"]).title(),

            date                        = app_date.strftime("%Y-%m-%d"),
            year                        = app_date.strftime("%Y"),
            month                       = app_date.strftime("%m"),
            day                         = app_date.strftime("%d"), 
            
            pages                       = fields["Page"],
            language                    = fields["Language"],

            patent_type                 = fields["Journal Title"]   # this holds the abbreviation for patent type
                                            + '.' if str(fields["Journal Title"]).find('.') > 0 else '',
                                                                    # period added if the entry is a period - subject to change
            patent_number               = fields["Patent Number"],
            
            CODEN                       = fields["CODEN"],
            accession_number            = fields["Accession Number"],
            chemical_abstracts_number   = fields["Chemical Abstracts Number(CAN)"],

            url                         = fields["URL"] if fields["URL"] != '' else url # use the cas specified url, if it exists, else use the url selected above
            ))

    return(bib_str)

# Main method - runs at execution-time
def main():
    """ Check if any command-line arguments for what files to look at are passed, 
        if not, prompt user for what files to use

        Assert that the file is of the appropriate scifinder format

        extract records, and one by one, convert those records to the bibtex entries 
        using `convert_Patent()`
    """

    if len(sys.argv)<2:
        tk = Tk()
        tk.withdraw()
        tk.call('wm', 'attributes', '.', '-topmost', True)
        files = askopenfilename(title="Choose File(s) to Convert",multiple=True)
    else:
        files = sys.argv[1:]

    # Define patterns to seek out the records
    rec_pattern = r"""START_RECORD\n    # Begins with start record
                    (?P.+?)     # Capture as few lines as possible between 
                    END_RECORD"""       # Ends with End Record
    fld_pattern = r"""FIELD\s           # Begins with 
                    (?P.+?)\:      # Capture the key (everything before the `:` - newlines excluded)
                    (?P.+?)?\.?    # Capture the definition (everything after the `:` - trailing periods excluded)
                    \n"""               # Ends with a non-optional newline

    # Compile the patterns into regex objects
    rec_regex = re.compile(rec_pattern, re.VERBOSE | re.MULTILINE | re.DOTALL)
    fld_regex = re.compile(fld_pattern, re.VERBOSE | re.MULTILINE)

    # iter over all passed files
    for filePath in files:

        # open and read the file into memory
        file = open(filePath,encoding="utf-8")
        fileTxt = file.read()
        file.close()
        
        # find records using regexp and iter over them 
        for record in rec_regex.findall(fileTxt):
            # convert the records into dicts
            fields = dict(fld_regex.findall(record))
            
            # decision tree for converting based off of doc type
            # print result with intention that this can be used 
            # at the shell and piped into a file
            if fields["Document Type"]=="Patent":
                print(convert_Patent(fields))
            else:
                print("Attempted to covert file: {}\nHowever, document type <{}> is yet not supported".format(
                    filePath, fields["Document Type"]))

# Force auto-run of main
if __name__ == "__main__": main()

更新样本输出

票数 0
EN
页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://codereview.stackexchange.com/questions/243576

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档