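I'm having trouble converting twint results into a DataFrame: I can't get the tweet results stored in a DataFrame. Every time I set c.Pandas=True, I get an error. Is there a way around this?
I know I could always store the results as JSON/CSV and then re-import them, but I'd like to avoid that.
The code I'm using: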
import twint
from datetime import datetime, timedelta
import nest_asyncio
import pandas as pd
nest_asyncio.apply()
c = twint.Config()
c.Limit=10
c.Username='ProtonMail'
c.Store_object=True
c.Pandas=True
twint.run.Search(c)
Error log:
Traceback (most recent call last):
File "<ipython-input-39-e0414b83fe16>", line 17, in <module>
twint.run.Search(c)
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\site-packages\twint\run.py", line 292, in Search
run(config, callback)
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\site-packages\twint\run.py", line 213, in run
get_event_loop().run_until_complete(Twint(config).main(callback))
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\site-packages\nest_asyncio.py", line 61, in run_until_complete
return f.result()
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\asyncio\futures.py", line 178, in result
raise self._exception
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\asyncio\tasks.py", line 251, in __step
result = coro.throw(exc)
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\site-packages\twint\run.py", line 154, in main
await task
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\asyncio\futures.py", line 260, in __await__
yield self # This tells Task to wait for completion.
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\asyncio\tasks.py", line 318, in __wakeup
future.result()
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\asyncio\futures.py", line 178, in result
raise self._exception
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\asyncio\tasks.py", line 249, in __step
result = coro.send(None)
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\site-packages\twint\run.py", line 198, in run
await self.tweets()
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\site-packages\twint\run.py", line 145, in tweets
await output.Tweets(tweet, self.config, self.conn)
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\site-packages\twint\output.py", line 142, in Tweets
await checkData(tweets, config, conn)
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\site-packages\twint\output.py", line 116, in checkData
panda.update(tweet, config)
File "c:\users\xx\appdata\local\programs\python\python37-32\lib\site-packages\twint\storage\panda.py", line 67, in update
day = weekdays[strftime("%A", localtime(Tweet.datetime))]
OSError: [Errno 22] Invalid argument
Posted on 2020-01-07 21:08:56
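I had the same problem. Removing "Store_object" and "Pandas = True" and replacing them with the code below (c.Store_csv, c.Custom_csv) worked for me. You should also write out the full path of the output file.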
import twint
import nest_asyncio
nest_asyncio.apply()
# Configure
c = twint.Config()
c.Search = "data science"
c.Store_csv = True
c.Custom_csv = ["id", "user_id", "username", "tweet"]
c.Output = r"C:\Users\name\Downloads\tweet.csv"  # raw string so the backslashes are not read as escape sequences
# Run
twint.run.Search(c)
Posted on 2021-07-27 00:15:32
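To use twint.run.Search(c), you first need to set c.Search = "" to the text you want to search for. However, if you're interested in pulling tweets from ProtonMail's profile, you should run twint.run.Profile(c) instead. There are different functions you can run depending on the type of data you need (see this reference on GitHub for more information).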
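A minimal sketch of the choice described above, assuming twint is installed: it calls twint.run.Search when a text query is given and twint.run.Profile otherwise. The helper names `pick_runner` and `scrape` are hypothetical, not part of twint.

```python
def pick_runner(query):
    """Return the name of the twint.run function to call (hypothetical helper).

    A text query needs twint.run.Search with c.Search set; pulling an
    account's own timeline uses twint.run.Profile.
    """
    return "Search" if query else "Profile"


def scrape(username=None, query=None, limit=10):
    """Configure and run twint (hypothetical wrapper; requires twint installed)."""
    import twint  # imported lazily so pick_runner works without twint installed

    c = twint.Config()
    if username:
        c.Username = username
    if query:
        c.Search = query
    c.Limit = limit
    # Look up twint.run.Search or twint.run.Profile by name and run it
    getattr(twint.run, pick_runner(query))(c)
```

For example, scrape(username="ProtonMail") would run Profile, while scrape(query="data science") would run Search.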
Posted on 2021-11-25 14:13:28
You're on the right track. You just need to retrieve the saved search from twint.storage.panda.Tweets_df and store it in a variable.
import twint
import pandas
c = twint.Config()
c.Pandas = True
c.Lang = 'en'
c.Username='ProtonMail'
c.Limit=10
twint.run.Search(c)
test_df = twint.storage.panda.Tweets_df
For more information, see https://github.com/twintproject/twint/issues/173
In case it helps: I'm using twint version 2.1.21 on Python 3.7, installed from the Anaconda prompt with the command pip install git+https://github.com/twintproject/twint.git@origin/master#egg=twint
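A minimal sketch that wraps this answer's approach in a function, assuming twint and nest_asyncio are installed and that, as in the question, the code runs in Jupyter/IPython (hence nest_asyncio.apply()). The name `tweets_to_df` is hypothetical; twint.storage.panda.Tweets_df is the attribute used in the answer above.

```python
def tweets_to_df(username, limit=10):
    """Run a twint search for `username` and return the results as a DataFrame.

    Hypothetical helper; requires twint and nest_asyncio to be installed.
    """
    import nest_asyncio
    import twint

    # Let twint's asyncio loop run inside an already-running loop (Jupyter/IPython)
    nest_asyncio.apply()

    c = twint.Config()
    c.Username = username
    c.Limit = limit
    c.Pandas = True  # ask twint to collect results into twint.storage.panda.Tweets_df
    twint.run.Search(c)

    # Copy so the caller owns its DataFrame instead of sharing twint's module-level object
    return twint.storage.panda.Tweets_df.copy()
```

Usage would be df = tweets_to_df("ProtonMail"). Because this still sets c.Pandas = True, it may hit the same OSError from the question on an affected twint version; the author of this answer reports it working with 2.1.21 installed from the master branch.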
https://stackoverflow.com/questions/57445244