Q: Markdown static blogger program

Code Review user

Asked 2011-12-16 16:39:17

Answers: 3, Views: 765, Followers: 0, Votes: 11

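I was hoping some of the folks from Stack Overflow could take a look at my Python static blog application. I have used it for several years. Recently I decided to clean it up and put it on GitHub. I hope some smarter Python programmers can offer advice and wisdom to help me improve, optimize, and simplify the code.

The program is here: https://github.com/mshea/Pueblo

Some philosophy:

I don't want more features. I want it as simple as possible.
I always prefer built-in modules. My ISP does not let me install new modules, so the Markdown module is the only one I use beyond the standard library.
I want to keep it in a single script unless splitting it up makes things simpler or easier.
I am especially interested in any potential security problems. Right now I don't see any.
I don't care much for flexibility. I would rather do things one way well than many ways poorly. People who want a flexible blogging platform can use WordPress.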

#!/usr/local/bin/python
#
# Pueblo: Python Markdown Static Blogger    
#
# 17 December 2011
#
# A single Python script to build a simple blog from a directory full of markdown files.
#
# This script requires the Markdown python implementation available at:
# http://pypi.python.org/pypi/Markdown/2.1.0
#
# This script requires markdown files using the following multimarkdown metadata as the first three lines
# of the processed .txt markdown files as follows:
#
# Title: the Title of your Document
# Author: Joe Blow
# Date: 15 December 2011
#
# The program will generate an index.html homepage file, an archive.html archive file, 
# and an index.xml RSS file.
#
# Header and footer data can be edited in the variables throughout the program. 
#
# This script expects the following additional files:
# style.css: The main site's stylesheet.
# iphone.css: The mobile version of the site's stylesheet.
# sidebar.html: A secondary set of data usually displayed as a sidebar.
#
# Instructions
# Install the Markdown python module.
# Configure this script by changing the configuration variables below.
# Put your static markdown .txt files in the configured directory
# Run the script either manually, with a regular cronjob, or as a CGI script.
# View the output at index.html

config = {
    "directory": ".", # No trailing slash.
    "site_url": "http://yoursite.net/", # Must have a trailing slash.
    "site_title": "Your Website",
    "site_description": "Your blog tagline.",
    "google_analytics_tag": "UA-111111-1",
    "author_name": "Your Name",
    "author_bio_link": "about.html",
    "amazon_tag": "mikesheanet-20",
    "twitter_tag": "twitterid",
    "author_email": "your@emailaddress.com",
    "header_image_url": "",
    "header_image_width": "",
    "header_image_height": "",
    "sidebar_on_article_pages": False,
    "minify_html": False,
}

nonentryfiles = []

# Main Program
import glob, re, rfc822, time, cgi, datetime, markdown
from time import gmtime, strftime, localtime, strptime
def rebuildsite ():
    textfiles = glob.glob(config["directory"]+"//*.txt")
    for nonfile in nonentryfiles:
        textfiles.remove(config["directory"]+"/"+nonfile)
    indexdata = []

    # Rip through the stack of .txt markdown files and build HTML pages from it.
    for eachfile in textfiles:
        eachfile = eachfile.replace(config["directory"]+"\\", "")
        content = open(eachfile).read()
        lines = re.split("\n", content)
        title = re.sub("(Title: )|(  )", "", lines[0])
        title = cgi.escape(title)
        urltitle = title.replace("&", "%26")
        author = lines[1].replace("Author: ","")
        date = re.sub("(  )|(\n)|(Date: )","",lines[2])
        numdate = strftime("%Y-%m-%d", strptime(date, "%d %B %Y"))
        content = markdown.markdown(re.sub("(Title:.*\n)|(Author:.*\n)|(Date:.*\n\n)|    ", "", content))
        summary = re.sub("<[^<]+?>","", content)
        summary = summary.replace("\n", " ")[0:200]
        htmlfilenamefull = htmlfilename = eachfile.replace(".txt", ".html")
        htmlfilename = htmlfilename.replace(config["directory"]+"/", "")
        postname = htmlfilename.replace(".html", "")
        # Build the HTML file, add a bit of footer text.
        htmlcontent = [buildhtmlheader("article", title, date)]
        htmlcontent.append(content)
        htmlcontent.append(buildhtmlfooter("article", urltitle))
        htmlfile = open(htmlfilenamefull, "w")
        htmlfile.write(minify("".join(htmlcontent)))
        htmlfile.close()
        if numdate <= datetime.datetime.now().strftime("%Y-%m-%d"):
            indexdata.append([[numdate],[title],[summary],[htmlfilename],[content]])

    # The following section builds index.html, archive.html and index.xml.  
    indexdata.sort()
    indexdata.reverse()
    indexbody=archivebody=rssbody=""
    count=0

    for indexrow in indexdata:
        dateobject = strptime(indexrow[0][0], "%Y-%m-%d")
        rssdate = strftime("%a, %d %b %Y 06:%M:%S +0000", dateobject)
        nicedate = strftime("%d %B %Y", dateobject)
        articleitem = '''
<h2><a href="%(article_link)s">%(article_title)s</a></h2>
<p>%(date)s - %(summary)s...</p>
'''     % {
        'article_link': indexrow[3][0],
        'article_title': indexrow[1][0],
        'date': nicedate,
        'summary': indexrow[2][0],
        }

        rssitem = '''
<item>
<title>%(title)s</title>
<link>%(link)s</link>
<guid>%(link)s</guid>
<pubDate>%(pubdate)s</pubDate>
<description>%(description)s</description>
<content:encoded>
<![CDATA[%(cdata)s]]>
</content:encoded>
</item>
'''     % {
        'title': indexrow[1][0],
        'link': config["site_url"]+indexrow[3][0],
        'pubdate': rssdate,
        'description': indexrow[2][0],
        'cdata': indexrow[4][0],
        }

        count = count + 1
        if count < 15:
            rssbody = rssbody + rssitem
        if count < 30:
            indexbody = indexbody+articleitem
        archivebody = archivebody + articleitem
    sidebardata = open(config["directory"]+"/sidebar.html").read()
    rssdatenow = rfc822.formatdate()

    indexdata = [buildhtmlheader("index", config["site_title"], "none")]
    indexdata.append(indexbody)
    indexdata.append("<h2><a href=\"archive.html\">View All %(article_count)s Articles</a></h2>\n</div>\n" 
        % { 'article_count': str(count) })
    indexdata.append(buildhtmlfooter("index", ""))
    indexfile = open(config["directory"]+"/index.html", "w").write(minify("".join(indexdata)))

    archivedata = [buildhtmlheader("archive", config["site_title"]+" Article Archive", "none")]
    archivedata.append(archivebody)
    archivedata.append("\n</div>\n")
    archivedata.append(buildhtmlfooter("archive", ""))
    archivefile = open (config["directory"]+"/archive.html", "w").write(minify("".join(archivedata)))

    rsscontent = '''<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/\"
>

<channel>
<title>%(site_title)s</title>
<link>%(site_url)s</link>
<description>%(site_description)s</description>
<pubDate>%(rssdatenow)s</pubDate>
<language>en</language>
<atom:link href="%(site_url)sindex.xml" rel="self" type="application/rss+xml" />
%(rssbody)s
</channel>
</rss>
''' % {
    'site_url': config["site_url"],
    'site_title': config["site_title"],
    'site_description': config["site_description"],
    'rssdatenow': rssdatenow,
    'rssbody': rssbody,
    }

    rssfile = open(config["directory"]+"/index.xml", "w").write(minify(rsscontent))

# Subroutine to build out the page's HTML header
def buildhtmlheader(type, title, date):
    if config["header_image_url"] != "":
        headerimage = '''
<img class="headerimg" src="%(header_image_url)s" alt="%(site_title)s: %(site_description)s" height="%(header_image_height)s" width="%(header_image_width)s" />
'''     % {
        'header_image_url': config["header_image_url"],
        'site_title': config["site_title"],
        'site_description': config["site_description"],
        'header_image_height': config["header_image_height"],
        'header_image_width': config["header_image_width"],
        }

    htmlheader = ['''
<!DOCTYPE html>
<html>
<head>
<title>%(title)s</title>
<link rel="stylesheet" type="text/css" media="screen and (min-width: 481px)" href="style.css">
<link rel="stylesheet" type="text/css" media="only screen and (max-width: 480px)" href="iphone.css">
<link rel="alternate" type="application/rss+xml" title="%(title)s" href="index.xml">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="viewport" content="user-scalable=no, width=device-width" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black" />
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', '%(google_analytics_tag)s']);
_gaq.push(['_trackPageview']);
(function() {  var ga = document.createElement('script');
 ga.type = 'text/javascript';
 ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body>
'''     % { 
        'title': title, 
        'google_analytics_tag': config["google_analytics_tag"], 
        } ]

    # Tons of conditional checks lay ahead. Does it use a header image 
    # and do you want the sidebar on article pages?
    if config["sidebar_on_article_pages"] != True and type == "article":
        htmlheader.append("\n<div class=\"article_container\">\n")
    else:
        htmlheader.append("\n<div class=\"container\">\n")
    if config["header_image_url"] != "" and type == "index":
        htmlheader.append(headerimage)
    elif config["header_image_url"] != "" and type != "index":
        htmlheader.append("<a href=\"/\">\n" + headerimage + "</a>\n")
    elif config["header_image_url"] == "" and type == "index":

        htmlheader.append('''
<div class="header">
<h1>%(site_title)s</h1>
<p>%(site_description)s</p>
</div>
'''     % {
        'site_title': config["site_title"],
        'site_description': config["site_description"],
        } )

    elif config["header_image_url"] == "" and type != "index":
        htmlheader.append('''
<p class="return_link">
<a href="index.html">%(site_title)s</a>
</p>
'''     % {
        'site_title': config["site_title"]
        } )
    if type == "index":
        htmlheader.append("\n<div class=\"article_list\">\n")
    elif type == "archive":
        htmlheader.append("\n<div class=\"article_list\">\n<h1>Article Archive</h1>\n")
    elif type == "article":
        htmlheader.append('''
<div class="article">
<h1>%(title)s</h1>
<p>by <a href="%(author_bio_link)s">%(author_name)s</a> on %(date)s</p>
'''     % {
        'author_bio_link': config["author_bio_link"],
        'title': title,
        'author_name': config["author_name"],
        'date': date,
        } )
    return "".join(htmlheader)

# Subroutine to remove all line breaks to make for some packed fast HTML
def minify(content):
    if config["minify_html"]:
        content = re.sub("\n","",content)
    return content

# Subroutine to build out the footer.
def buildhtmlfooter (type, urltitle):
    footer_parts = []
    sidebardata = open(config["directory"]+"/sidebar.html").read()
    if type == "index" or type == "archive" or config["sidebar_on_article_pages"]:
        footer_parts.append(sidebardata)
    if type == "article":
        footer_parts.append(
'''
<p>Send feedback to <a href="mailto:%(email)s">%(email)s</a> or <a href="http://twitter.com/share?via=%(twitter_tag)s&text=%(urltitle)s">share on twitter</a>.</p>
'''     % {
        'email': config['author_email'], 
        'twitter_tag': config['twitter_tag'], 
        'urltitle': urltitle,
        })
    footer_parts.append("\n</div>\n</body>\n</html>")
    return "".join(footer_parts)

# This program is designed to run as a CGI script so you can rebuild your site by hitting a URL.
print "Content-type: text/html\n\n"
rebuildsite()
print "<html><head><title>Site Rebuilt</title></head><body><h1>Site Rebuilt</h1></body></html>"

security

markdown

python

3 Answers

Code Review user

Answered 2011-12-17 02:09:25

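Here are some suggestions:

String comparisons should use == (and !=), not is (and is not). is may happen to work, but it compares identity (usually a memory address), whereas == compares values. For details, see https://stackoverflow.com/a/2988117/331473.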
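As a rough illustration of that difference, here is a small sketch written for Python 3 (the script under review targets Python 2). The names `a` and `b` are made up for the example and do not come from Pueblo.

```python
# Illustrative only: `a` and `b` are hypothetical names, not taken from the script.
a = "header"
b = "".join(["head", "er"])  # same characters, but built at runtime

same_value = (a == b)    # compares the contents of the two strings
same_object = (a is b)   # compares object identity, not contents

print(same_value)
print(same_object)
```

The value comparison is always true here. The identity comparison depends on how the interpreter happens to store the two strings, which is why it should not be used to test whether a setting is empty or equal to some text.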

Your boolean config values (for example, minify_html) should be real booleans, True/False, rather than 1/0. Also, when you check them, drop the comparison. Example:

if minify_html == 1:     # or minify_html == True 
   ...                   #    (if you've converted these to booleans)

can be written as:

if minify_html:
    ...

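Using module-level variables for configuration is usually fine for a handful of settings, but once you have a whole pile of them they become hard to keep track of. While reading your code, I asked myself several times, "where did that variable come from?"

If you want to fix this, you can put the settings in a dictionary so that they are "namespaced", so to speak. Example: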

config = {
    "site_url": "...",
    "site_name": "..."
}

Then, in your code, the configuration bits are easier to spot:

if config['minify_html']:
    ...
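A minimal sketch of how the two suggestions combine, assuming Python 3 and a trimmed-down settings dictionary. The helper `absolute_url` and the optional `config` parameter are illustrative additions, not part of the original script.

```python
# A trimmed, hypothetical subset of the script's settings.
config = {
    "site_url": "http://yoursite.net/",  # must end with a slash
    "minify_html": True,                 # a real boolean, not 1/0
}


def minify(content, config=config):
    """Strip newlines when the minify_html flag is truthy."""
    if config["minify_html"]:  # no "== True" or "== 1" needed
        content = content.replace("\n", "")
    return content


def absolute_url(filename, config=config):
    """Build a full URL; every setting is visibly read from `config`."""
    return config["site_url"] + filename


print(minify("<p>\nhello\n</p>"))
print(absolute_url("index.html"))
```

Passing the dictionary as a parameter is one possible design choice: it keeps the default behaviour while letting a test supply different settings.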

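There is more cleanup that could be done, but these are the first few things that came to mind. One more thing: I don't have time to work through it right now, but the long chain of if/elses in buildhtmlheader could probably be refactored to make things a little less redundant.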
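One possible shape for that refactoring, sketched under assumptions: the four header branches depend on only two questions (is there a header image, and is this the index page), so they can be answered separately. The function name `build_site_header` is hypothetical, and the image tag is simplified (it omits the height, width, and description attributes used in the script).

```python
def build_site_header(page_type, config):
    """Return the site header fragment for one page.

    Hypothetical helper: replaces the four-way if/elif chain with two
    independent decisions (image or text, index page or not).
    """
    is_index = (page_type == "index")

    if config["header_image_url"]:
        image = '<img class="headerimg" src="%s" alt="%s" />\n' % (
            config["header_image_url"], config["site_title"])
        # Only non-index pages wrap the image in a link back to the home page.
        return image if is_index else '<a href="/">\n' + image + '</a>\n'

    if is_index:
        return '<div class="header">\n<h1>%s</h1>\n<p>%s</p>\n</div>\n' % (
            config["site_title"], config["site_description"])

    return '<p class="return_link">\n<a href="index.html">%s</a>\n</p>\n' % (
        config["site_title"])


example_config = {
    "header_image_url": "",
    "site_title": "Your Website",
    "site_description": "Your blog tagline.",
}
print(build_site_header("index", example_config))
print(build_site_header("article", example_config))
```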

Votes: 8

Code Review user

Answered 2011-12-18 09:34:29

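I prefer to write constants in uppercase.
Imports should usually be on separate lines, for example:
    Yes:
        import os
        import sys
    No:
        import sys, os
Use spaces instead of tabs.
Avoid naming variables after built-in functions or objects. I mean file. If you need the built-in later, it will take time to work out why it no longer works.
The with keyword closes the file automatically, even if an exception occurs. It is the preferred way to open files:
    with open(filename, 'r') as f:
        content = f.read()
If you ever need to import your module or part of it, your code will run rebuildsite every time. To prevent this, use:
    if __name__ == '__main__':
        print "Content-type: text/html\n\n"
        rebuildsite()
        print "Site Rebuilt"
It is better to join paths with os.path.join:
    os.path.join(DIRECTORY, 'sidebar.html')
A more efficient way to concatenate strings in Python is to join a list, using triple quotes for multi-line text and % for formatting:
    parts = ['''%(title)s''' % {'title': title}]
    if not SIDEBAR_ON_ARTICLE_PAGES and type == "article":
        parts.append('')
    return ''.join(parts)

Follow the other PEP 8 recommendations.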
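Several of these points can be seen together in one short sketch. It is written for Python 3 rather than the Python 2 of the original script, and the names `DIRECTORY`, `read_sidebar`, and `build_page` are hypothetical stand-ins, not functions from Pueblo.

```python
import os
import tempfile

DIRECTORY = "."  # hypothetical uppercase constant for the content directory


def read_sidebar(directory=DIRECTORY):
    """Read sidebar.html from `directory`; the file is closed even on error."""
    path = os.path.join(directory, "sidebar.html")
    with open(path, "r") as f:
        return f.read()


def build_page(title, body, sidebar):
    """Collect the fragments in a list and join them once at the end."""
    parts = ["<h1>%(title)s</h1>\n" % {"title": title}]
    parts.append(body)
    parts.append(sidebar)
    return "".join(parts)


if __name__ == "__main__":
    # Runs only when executed as a script, not when the module is imported.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "sidebar.html"), "w") as f:
            f.write("<ul></ul>")
        print(build_page("Hello", "<p>Post</p>\n", read_sidebar(tmp)))
```

Because the demo code sits under the `__main__` guard, importing this module from a test or another script only defines the functions and does not touch the file system.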

Votes: 7

Code Review user

Answered 2011-12-18 23:10:24

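I like the other two answers so far, and I see you have already incorporated some of their suggestions.

I have two small additions:

This may be shorter, but it is harder to read:
    for nonfile in nonentryfiles: textfiles.remove(config["directory"]+"/"+nonfile) # everything except the non-entry files
I'm guessing you come from a language that encourages this, but in Python you would rather write something that is easier to follow:
    for nonfile in nonentryfiles:
        textfiles.remove(config["directory"] + "/" + nonfile)
New lines and whitespace make it easier to read. (I also added spaces around the string concatenation, which helps readability too.) Another place for whitespace is the if/else chains: it is hard to see where one ends and the next begins. Try:
    if test:
        ...
    else:
        ...

    if test:
        ...
    elif something_else:
        ...
This makes each if statement easier to find.
Commenting every line, as in # Open up the file, does not really help. It only makes the useful comments harder to find among the noise. Instead, try to keep comments that summarize a section of code and remove the rest. If you have to do something unusual, leave those comments in as well. Ask yourself: "If I didn't know what this code does, which lines would I need explained?"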

Votes: 5

Original page content provided by Code Review. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://codereview.stackexchange.com/questions/6923
