I have a string:
"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.
Using a regular expression, I want to select this text:
Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket
It comes after "text": and before "timestamp_ms".
Is it possible to extract this text?
Posted on 2017-10-27 04:47:08
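Judging by the string, the whole thing could probably be parsed directly, since it looks like JSON. But since you are asking for a regex-based solution, I hope the following works for you.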
import re

# Non-greedy group, and the closing quote kept outside it, so the match
# stops at the first '", "timestamp_ms"' and excludes the trailing quote.
pattern = r'"text": "(.*?)", "timestamp_ms"'
# Named `raw` rather than `str` to avoid shadowing the built-in type.
raw = """
"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.
"""
print(re.findall(pattern, raw)[0])
Output:
Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket
Posted on 2017-10-27 04:34:22
Is it possible? Yes.
def text_scrap(text, start, end):
    """Return the part of text between the first start and the next end."""
    _, _, rest = text.partition(start)
    result, _, _ = rest.partition(end)
    return result

# The sample is wrapped in single quotes because it contains double quotes.
my_text = '"contributors_enabled": false, "geo_enabled": false, "created_at": "Fri Nov 11 15:38:06 +0000 2016"}, "text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD #forex #binaryoptions #cryptocurrency #stockmarket", "timestamp_ms": "1509073455803",.'
# The end marker includes the closing quote and comma so they are not returned.
data_scrapped = text_scrap(my_text, start=' "text": "', end='", "timestamp_ms"')  # use our new shiny function
print(data_scrapped)
A good idea? Probably not.
Your data is a dict, so it is much easier to just access the dict's "text" key. It is worth reading up on dicts.
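As a rough illustration of the dict approach, here is a minimal sketch using the standard json module. It assumes the complete record is valid JSON. The fragment in the question is cut off at both ends, so the sample below wraps it in braces and places the visible keys under a "user" key. That key name, the sample string and the helper extract_text are assumptions made for the example, not something taken from the question.

```python
import json

# Hypothetical complete record: the fragment from the question, closed off
# with braces. The "user" key is an assumption; the original is truncated.
raw = (
    '{"user": {"contributors_enabled": false, "geo_enabled": false, '
    '"created_at": "Fri Nov 11 15:38:06 +0000 2016"}, '
    '"text": "Facts On Managed Forex Trading htps:////t.co////E4cxCvvjD '
    '#forex #binaryoptions #cryptocurrency #stockmarket", '
    '"timestamp_ms": "1509073455803"}'
)


def extract_text(record_json):
    """Parse one JSON record and return its "text" value, or None if absent."""
    record = json.loads(record_json)
    return record.get("text")


tweet_text = extract_text(raw)
print(tweet_text)
```

Unlike the regex and partition versions, this keeps working if the text itself contains an escaped quote or if the order of the keys changes.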
https://stackoverflow.com/questions/46967512