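I'm trying to create a nested dictionary from two sqlite3 columns in a database I built of the anime I've watched (it's a few hundred entries long). The two columns are "DateWatched", the dates over which I watched a particular anime (e.g. Jun 6-Jun 8), and "Year", the year in which I watched it.
Here is a small sample of the data in those two columns: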
```
DateWatched                   | Year
------------------------------+---------------
Dec 18-Dec 23                 | 2013
Dec 25-Jan 10                 | 2013 and 2014
Feb 2014 and Jan 1-Jan 3 2016 | 2014 and 2016
Mar 10th                      | 2014
Mar 13th                      | 2014
```

Some anime get another season years later, so any date after an "and" belongs to another season.

That is the basic structure of my two columns. I want to store this data in a dictionary or list and record how many anime I watched in each month (January to December) of each year.
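Based on my example, I think I'd like it to look like this:

```python
Final = {'2013': {'Dec': 2},
         '2014': {'Jan': 1, 'Feb': 1, 'Mar': 2},
         '2016': {'Jan': 1}}
```

I figured out how to create a list for each column separately: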
```python
import re
from collections import OrderedDict

MonthColumn = [i[0] for i in c.execute("SELECT DateWatched FROM Anime").fetchall()]  # c is an sqlite3 cursor; 'Anime' is the table name
x = [item.replace('-', ' ') for item in [y for x in MonthColumn for y in re.split(' and ', x)]]  # Splits any row containing ' and ' into two strings, then replaces '-' with a space
v = [' '.join(OrderedDict((w, w) for w in item.split()).keys()) for item in x]  # Removes duplicate words ("Dec 18 Dec 23" becomes "Dec 18 23")
j = [y for j in v for y in j.split()]  # Splits into separate strings ("Dec 18 23" becomes "Dec", "18", "23")
Month = [item for item in j if item.isalpha()]  # Final list: keeps only purely alphabetic strings ("Dec", "18", "23" becomes "Dec")
YearColumn = [i[0] for i in c.execute("SELECT Year FROM Anime").fetchall()]
Year = [item for Year in YearColumn for item in re.split(' and ', Year)]  # Final list: splits on ' and ' ("2013 and 2014" becomes "2013", "2014")

# So for the example columns above, my final lists become:
Month = ['Dec', 'Dec', 'Jan', 'Feb', 'Jan', 'Mar', 'Mar']
Year = ['2013', '2013', '2014', '2014', '2016', '2014', '2014']
```

The biggest problem, and where I need the most help, is figuring out how to turn these two lists into a nested dictionary (or something similar) and then use it in Matplotlib to create a bar chart with the years on the x-axis (12 bars per year, one per month) and, on the y-axis, the number of anime I watched in that month of that year.
Thanks for your help, and sorry if I missed or left out anything (this is my first post).
Posted on 2018-06-10 09:10:22
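I suggest a slightly different parsing approach for the month-and-day ranges. Those ranges have to be accounted for to build the dictionary the visualization needs, and that dictionary can then be used to create a cleaner plot: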
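```python
import re, sqlite3
import itertools, collections

# 'db_tablename.db' and 'tablename' stand in for your database file and table name
data = list(sqlite3.connect('db_tablename.db').cursor().execute("SELECT DateWatched, Year FROM tablename"))
# For each row: the month-like words of DateWatched (without 'and') and the years found in Year
new_parsed = [[list(filter(lambda x: x != 'and', re.findall('[a-zA-Z]+', a))), re.findall(r'\d+', b)] for a, b in data]
# Flatten into (month, year) tuples; zip stops at the shorter list, pairing words and years by position
new_results = [i for b in [list(zip(*i)) for i in new_parsed] for i in b]
# Group the tuples by year and count the months within each year
groups = {a: collections.Counter([c for c, _ in b]) for a, b in itertools.groupby(sorted(new_results, key=lambda x: x[-1]), key=lambda x: x[-1])}
```

This gives the result {'2013': Counter({'Dec': 2}), '2014': Counter({'Mar': 2, 'Jan': 1, 'Feb': 1}), '2016': Counter({'Jan': 1})}.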
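If you already have the two parallel lists from the question, zipping them and counting is enough to get the nested dictionary. This is a minimal sketch, not part of the original answer; `nest_counts` is a hypothetical helper name, and it assumes `Month[k]` and `Year[k]` describe the same viewing. That holds for the sample lists, but a single-year range crossing two months (e.g. a hypothetical "Jan 28-Feb 3") would add an extra month without an extra year, so the function at least checks that the lengths match.

```python
from collections import Counter, defaultdict

# The parallel lists from the question (sample data only)
Month = ['Dec', 'Dec', 'Jan', 'Feb', 'Jan', 'Mar', 'Mar']
Year = ['2013', '2013', '2014', '2014', '2016', '2014', '2014']


def nest_counts(months, years):
    """Count viewings per (year, month), assuming months[k] and years[k] belong together."""
    if len(months) != len(years):
        raise ValueError(f"{len(months)} months but {len(years)} years; the lists are misaligned")
    final = defaultdict(Counter)
    for month, year in zip(months, years):
        final[year][month] += 1
    # Plain dicts ordered by numeric year, matching the desired `Final` layout
    return {year: dict(counts) for year, counts in sorted(final.items(), key=lambda kv: int(kv[0]))}


Final = nest_counts(Month, Year)
print(Final)
# {'2013': {'Dec': 2}, '2014': {'Jan': 1, 'Feb': 1, 'Mar': 2}, '2016': {'Jan': 1}}
```

`defaultdict(Counter)` creates the inner counter the first time a year is seen, so no key checks are needed inside the loop.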
To draw the chart (one stacked bar per year):
```python
import matplotlib.pyplot as plt

months = ['Dec', 'Jan', 'Feb', 'Mar']  # only the months present in the sample data
years = sorted(groups, key=int)        # x-axis order: '2013', '2014', '2016'
new_months = {a: [[i, b.get(i, 0)] for i in months] for a, b in groups.items()}
# Draw the months from last to first, stacking each one on the months already drawn
for i in reversed(range(len(months))):
    _current = [new_months[y][i][-1] for y in years]
    # Bottom of this segment = total of the months after index i, which sit below it
    _previous = [sum(c[-1] for c in new_months[y][i + 1:]) for y in years]
    plt.bar(range(len(years)), _current, label=months[i], bottom=_previous)
plt.xticks(range(len(years)), years)
plt.legend(loc='upper left')
plt.show()
```
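The question asked for 12 bars per year (one per month) rather than a stack. A sketch of a grouped version follows; it assumes `groups` maps year strings to month-abbreviation counts like the result above (a `Counter` or a plain dict both work, since only `.get` is used). `MONTHS`, `month_matrix` and `plot_grouped` are hypothetical names, and matplotlib is imported inside `plot_grouped` so the counting part can be used on its own.

```python
# Month abbreviations in calendar order, matching the keys used in `groups`
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def month_matrix(groups):
    """Return (years, months, rows) where rows[m][k] is the count for MONTHS[m] in years[k]."""
    years = sorted(groups, key=int)
    rows = [[groups[y].get(m, 0) for y in years] for m in MONTHS]
    return years, list(MONTHS), rows


def plot_grouped(groups):
    """Draw one cluster of 12 bars (Jan..Dec) per year."""
    import matplotlib.pyplot as plt  # only needed when actually drawing

    years, months, rows = month_matrix(groups)
    width = 1 / (len(months) + 1)  # leaves a one-bar gap between year clusters
    for m, (month, counts) in enumerate(zip(months, rows)):
        # Shift month m's bars right by m bar-widths within each year's cluster
        xs = [k + m * width for k in range(len(years))]
        plt.bar(xs, counts, width=width, label=month)
    # Put each year label under the middle of its cluster
    center = (len(months) - 1) * width / 2
    plt.xticks([k + center for k in range(len(years))], years)
    plt.ylabel('Anime watched')
    plt.legend(ncol=2, fontsize='small')
    plt.show()
```

Calling `plot_grouped(groups)` with the dictionary built above draws one cluster per year; months with no entries appear as zero-height bars, so every cluster keeps the same Jan..Dec layout.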

https://stackoverflow.com/questions/50779508