I'm new to MongoDB and text processing. I have a database containing tokenized tweets. Example:
{
"_id" : ObjectId("59b24aa1a0c99b0b85732406"),
"idt" : "906060929829183489",
"tweet" : [
"RT",
"@moocowpong1",
":",
"@whitequark",
"isn't",
"the",
"cloud",
"just",
"your",
"data",
"relocating",
"to",
"san",
"francisco"
],
"createdDate" : ISODate("2017-09-08T07:45:34Z"),
"userName" : "Fiora Aeterna",
"userLocation" : "San Jose, CA",
"geo" : null,
"geoCoord" : null,
"Lang" : "en",
"retweet_count" : 0,
"sentimiento" : "",
"score_tag" : ""
}
I have tokenized the words of each tweet. My next step is to remove the stop words.
My code:
for doc in tweets.find({},{'tweet': 1}).limit(1):
    print (doc)
    for term in (doc['tweet']):
        if set(stop).intersection(term.split()):
            print ("Found One")
            tweets.update( { 'idt': doc['_id'] }, { '$pull': { 'tweet': { '$eq': term } } } )
stop is an array of stop words. I want to remove the matching items from the tweet array, but my code fails:
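raise WriteError(error.get("errmsg"), error.get("code"), error)
pymongo.errors.WriteError: unknown top level operator: $eq
I'm not sure whether my update is correct. Can you help me?
My goal is to end up with a document like this: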
{
"_id" : ObjectId("59b24aa1a0c99b0b85732406"),
"idt" : "906060929829183489",
"tweet" : [
"@moocowpong1",
"@whitequark",
"cloud",
"just",
"data",
"relocating",
"san",
"francisco"
],
"createdDate" : ISODate("2017-09-08T07:45:34Z"),
"userName" : "Fiora Aeterna",
"userLocation" : "San Jose, CA",
"geo" : null,
"geoCoord" : null,
"Lang" : "en",
"retweet_count" : 0,
"sentimiento" : "",
"score_tag" : ""
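}
Posted on 2017-09-08 12:52:46
You should use the $in operator instead of $eq. That way you don't need to check each stop word inside a for loop. You can pass all the stop words at once and pull them all in a single query, like this: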
db.collection.update({}, { $pull: { "tweet": { $in: ["stopWord1", "stopWord2"] } } } )
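For the PyMongo code in the question, the same idea could look roughly like the sketch below. It assumes PyMongo 3.0 or later (where update_many is available) and that stop is a list of plain strings; the helper names build_pull_update and remove_stop_words are made up for illustration. Note that the shell command above, as written, only touches the first matching document unless multi: true is passed, whereas update_many applies the $pull to every document.

```python
def build_pull_update(stop_words, field="tweet"):
    """Build a $pull update that removes every listed stop word from an array field."""
    # $in takes a list; de-duplicate and sort so the update document is deterministic.
    return {"$pull": {field: {"$in": sorted(set(stop_words))}}}


def remove_stop_words(collection, stop_words):
    """Strip stop words from all documents in one round trip.

    `collection` is assumed to be a pymongo Collection (PyMongo 3.0 or later).
    """
    # An empty filter matches every document; update_many applies the $pull to all of them.
    result = collection.update_many({}, build_pull_update(stop_words))
    return result.modified_count


# Hypothetical usage; the connection string and database/collection names are assumptions:
# from pymongo import MongoClient
# tweets = MongoClient("mongodb://localhost:27017")["mydb"]["tweets"]
# remove_stop_words(tweets, stop)
```

If you would rather keep the per-document loop, two things in the original update need to change: the filter should match on { '_id': doc['_id'] } (idt holds the tweet id string, not the ObjectId), and $pull accepts the value directly, as in { '$pull': { 'tweet': term } }. Matching with $in is exact and case-sensitive, so "RT" and "rt" are treated as different tokens.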
https://stackoverflow.com/questions/46113147