This would come in handy when converting from Markdown to HTML, for example when comments must be kept out of the final HTML source.
Example input my.md:
# Contract Cancellation
Dear Contractor X, due to delays in our imports, we would like to ...
<!--
... due to a general shortage in the Y market
TODO make sure to verify this before we include it here
-->
best,
me <!-- ... or should i be more formal here? -->
Example output my-filtered.md:
# Contract Cancellation
Dear Contractor X, due to delays in our imports, we would like to ...
best,
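me
On Linux, I would do something like this:
cat my.md | remove_html_comments > my-filtered.md
I was also able to write an AWK script that handles some common cases, but as far as I understand, neither AWK nor any other common tool for simple text manipulation (such as sed) is really up to the job. An HTML parser is needed.
How do I write a proper remove_html_comments script, and with what tools?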
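Posted on 2017-11-01 20:06:24
I see from your comments that you mostly use Pandoc.
Pandoc version 2.0, released on October 29, 2017, adds a new option, --strip-comments. The related issue provides some context for this change.
Upgrading to the latest version and adding --strip-comments to your command should remove HTML comments as part of the conversion.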
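If the conversion is driven from a script, the advice above might be wrapped roughly as follows. This is only a sketch: the function names build_pandoc_command and convert are made up for illustration, the file names are placeholders, and it assumes a pandoc 2.0 or newer binary is on the PATH. Only --strip-comments comes from the answer itself; -f, -t and -o are the usual pandoc options for input format, output format and output file.

```python
import subprocess


def build_pandoc_command(src, dst):
    """Return the argument list for a Markdown-to-HTML run that drops HTML comments.

    Hypothetical helper, not part of pandoc itself.
    """
    return [
        "pandoc",
        "--strip-comments",  # option added in Pandoc 2.0
        "-f", "markdown",    # input format
        "-t", "html",        # output format
        "-o", dst,           # output file
        src,
    ]


def convert(src, dst):
    """Run pandoc; raises CalledProcessError if pandoc exits with an error."""
    subprocess.run(build_pandoc_command(src, dst), check=True)
```

Calling convert("my.md", "my.html") would then write the HTML without the comments, provided the installed pandoc is recent enough to know the option.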
Posted on 2017-10-26 20:28:03
It might be a bit counter-intuitive, but I would use an HTML parser.
Example with Python and BeautifulSoup:
import sys

from bs4 import BeautifulSoup, Comment

# Requires the beautifulsoup4 and html5lib packages.
md_input = sys.stdin.read()
soup = BeautifulSoup(md_input, "html5lib")

# Find every comment node and detach it from the tree.
for element in soup(string=lambda text: isinstance(text, Comment)):
    element.extract()

# bs4 wraps the text in <html><head></head><body>…</body></html>,
# so we need to extract it:
output = "".join(map(str, soup.find("body").contents))
print(output)
Output:
$ cat my.md | python md.py
# Contract Cancellation
Dear Contractor X, due to delays in our imports, we would like to ...
best,
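me
It should not break any other HTML you may have in your .md files (it may change the code formatting slightly, but not its meaning).
Of course, be sure to test it if you decide to use it.
Edit: try it online here: https://repl.it/NQgG (input is read from input.md rather than from standard input)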
Posted on 2017-10-26 21:27:30
This awk should work:
$ awk -v FS="" '{ for(i=1; i<=NF; i++){if($i$(i+1)$(i+2)$(i+3)=="<!--"){i+=3; p=1} else if(!p){printf "%s", $i} else if($i$(i+1)$(i+2)=="-->") {i+=2; p=0} } printf RS}' file
Dear Contractor X, due to delays in our imports, we would like to ...
best,
me
For better readability and explanation:
# FS="" makes each character its own field (gawk/mawk behaviour), which also preserves the original formatting
awk -v FS="" '{
    for (i = 1; i <= NF; i++)                      # Iterate through each character
    {
        if ($i $(i+1) $(i+2) $(i+3) == "<!--")     # If 4 chars form a comment start tag,
        {                                          # raise flag p and skip the rest of the tag
            i += 3; p = 1
        }
        else if (!p)                               # Outside a comment: print the character
            printf "%s", $i
        else if ($i $(i+1) $(i+2) == "-->")        # If 3 chars form a comment close tag,
        {                                          # reset the flag and skip the rest of the tag
            i += 2; p = 0
        }
    }
    printf RS                                      # End the line with the record separator
}' file
https://stackoverflow.com/questions/46952210
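For readers more comfortable with Python, the character-by-character scan used in the awk answer can be mirrored roughly as below. This is a sketch rather than a port: the function name remove_html_comments is borrowed from the question for illustration, and unlike the awk version it scans the whole input at once, so newlines inside a multi-line comment are removed along with the comment. Like the awk version, it knows nothing about Markdown code blocks, so a literal <!-- inside a code sample would be treated as a comment too.

```python
def remove_html_comments(text):
    """Drop everything between '<!--' and '-->' using a simple scan with a flag."""
    out = []
    i = 0
    in_comment = False  # plays the role of the flag p in the awk script
    while i < len(text):
        if not in_comment and text.startswith("<!--", i):
            in_comment = True
            i += 4  # skip the opening delimiter
        elif in_comment and text.startswith("-->", i):
            in_comment = False
            i += 3  # skip the closing delimiter
        else:
            if not in_comment:
                out.append(text[i])  # keep characters outside comments
            i += 1
    return "".join(out)
```

To use it as a filter in the style of the question, one could read sys.stdin.read(), pass it through remove_html_comments, and write the result to sys.stdout.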