I wrote a Flask-based web application that takes text from the user and returns the probability that it belongs to a given class (full script below). The app loads several trained models before any request is made so that it is ready to predict. I am currently trying to deploy it to Heroku and am running into some problems.
I can run it locally with python ml_app.py. However, when I use the Heroku command heroku local web to test it locally before deploying, I get the following error:
AttributeError: module '__main__' has no attribute 'tokenize'
The error is associated with loading the TFIDF text vectorizer on the line
tfidf_model = joblib.load('models/tfidf_vectorizer_train.pkl')
I import the required function at the top of the script to make sure it loads correctly (from utils import tokenize). That works, since I can run the app with python ml_app.py. But for reasons I don't understand, it does not load when I use heroku local web. It also fails when I try to run it locally with flask run. Any idea why?
I admit I don't have a good understanding of what is happening under the hood here (on the web/deployment side of the code), so any explanation helps.
from flask import Flask, request, render_template
from sklearn.externals import joblib
from utils import tokenize # custom tokenizer required for tfidf model loaded in load_tfidf_model()
app = Flask(__name__)
models_directory = 'models'
@app.before_first_request
def nbsvm_models():
    global tfidf_model
    global logistic_identity_hate_model
    global logistic_insult_model
    global logistic_obscene_model
    global logistic_severe_toxic_model
    global logistic_threat_model
    global logistic_toxic_model
    tfidf_model = joblib.load('models/tfidf_vectorizer_train.pkl')
    logistic_identity_hate_model = joblib.load('models/logistic_identity_hate.pkl')
    logistic_insult_model = joblib.load('models/logistic_insult.pkl')
    logistic_obscene_model = joblib.load('models/logistic_obscene.pkl')
    logistic_severe_toxic_model = joblib.load('models/logistic_severe_toxic.pkl')
    logistic_threat_model = joblib.load('models/logistic_threat.pkl')
    logistic_toxic_model = joblib.load('models/logistic_toxic.pkl')
@app.route('/')
def my_form():
    return render_template('main.html')
@app.route('/', methods=['POST'])
def my_form_post():
    """
    Takes the comment submitted by the user, applies the trained TFIDF
    vectorizer to it, and predicts using the trained models.
    """
    text = request.form['text']
    comment_term_doc = tfidf_model.transform([text])
    dict_preds = {}
    dict_preds['pred_identity_hate'] = logistic_identity_hate_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_insult'] = logistic_insult_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_obscene'] = logistic_obscene_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_severe_toxic'] = logistic_severe_toxic_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_threat'] = logistic_threat_model.predict_proba(comment_term_doc)[:, 1][0]
    dict_preds['pred_toxic'] = logistic_toxic_model.predict_proba(comment_term_doc)[:, 1][0]
    for k in dict_preds:
        perc = dict_preds[k] * 100
        dict_preds[k] = "{0:.2f}%".format(perc)
    return render_template('main.html', text=text,
                           pred_identity_hate=dict_preds['pred_identity_hate'],
                           pred_insult=dict_preds['pred_insult'],
                           pred_obscene=dict_preds['pred_obscene'],
                           pred_severe_toxic=dict_preds['pred_severe_toxic'],
                           pred_threat=dict_preds['pred_threat'],
                           pred_toxic=dict_preds['pred_toxic'])
if __name__ == '__main__':
    app.run(debug=True)

Posted on 2018-03-26 08:14:24
Fixed it. The problem was that I had pickled the class instance stored in tfidf_vectorizer_train.pkl. The model was created in an IPython notebook, and one of its attributes depended on a tokenizer function I had defined interactively in that notebook. I quickly learned that pickle does not save the exact instance of a class, which means tfidf_vectorizer_train.pkl did not contain the function I defined in my notebook.
To solve this, I moved the tokenizer function into a separate utility Python file and imported the function both in the file where I train and subsequently pickle the model, and in the file where I unpickle it.
In the code, I did
from utils import tokenize
...
tfidfvectorizer = TfidfVectorizer(ngram_range=(1, 2), tokenizer=tokenize,
                                  min_df=3, max_df=0.9, strip_accents='unicode',
                                  use_idf=1, smooth_idf=True, sublinear_tf=1)
train_term_doc = tfidfvectorizer.fit_transform(train[COMMENT])
joblib.dump(tfidfvectorizer, 'models/tfidf_vectorizer_train.pkl')
...in the file where I train the model, and
from utils import tokenize
...
@app.before_first_request
def load_models():
    # from utils import tokenize
    global tfidf_model
    tfidf_model = joblib.load('{}/tfidf_vectorizer_train.pkl'.format(models_directory))
...in the file that contains the web app code.
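As for why python ml_app.py worked while heroku local web did not: heroku local web does not run python ml_app.py; it runs the command declared under the web entry of the Procfile, typically a WSGI server such as gunicorn. The server imports the app as a module, so the script's __name__ is ml_app rather than __main__, and a pickle that references __main__.tokenize cannot be resolved. A typical Procfile for an app like this might look as follows (the question does not show the actual Procfile, so the gunicorn entry is an assumption):

```
web: gunicorn ml_app:app
```

The same reasoning applies to flask run, which also imports the script as a module instead of executing it as __main__.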
https://stackoverflow.com/questions/49483732