In Python, I'm trying to run value_counts on a pandas column. I can get that to work, but I can't figure out how to bring the other related columns along.
Code:
import pandas as pd
myframe = pd.DataFrame({"Server":["Server_1","Server_1","Server_1","Server_1","Server_1","Server_2","Server_2","Server_2","Server_2","Server_3","Server_3","Server_3","Server_3","Server_3"],
"CVE_ID":["CVE-2017-1111","CVE-2017-1112","CVE-2017-1113","CVE-2017-1114","CVE-2017-1115","CVE-2017-1111","CVE-2017-1112","CVE-2017-1113","CVE-2017-1114","CVE-2017-1113","CVE-2017-1114","CVE-2017-1115","CVE-2017-1116","CVE-2017-1117"],
"VulnName":["Java Update 1","Java Update 2","Java Update 3","Adobe 1","Chrome 1","Java Update 1","Java Update 2","Java Update 3","Adobe 1","Java Update 3","Adobe 1","Chrome 1","Chrome 2","Chrome 3"],
"ServerOwner":["Alice","Alice","Alice","Alice","Alice","Bob","Bob","Bob","Bob","Carol","Carol","Carol","Carol","Carol"]})
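print("The dataframe: \n", myframe)
print("Top 10 offending CVEs, Vulnerability and Count: \n")
print(myframe['CVE_ID'].value_counts())
The last line prints 2 columns: one with the CVE and one with how many times it occurred. However, I'd like to print something like the following, which keeps the link between the CVE and the vulnerability name (see the middle column):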
Top 10 offending CVEs, Vulnerability and Count:
CVE-2017-1113 Java Update 1 3
CVE-2017-1114 Java Update 2 3
...etc...
How can I do this? Everything I try results in an error.
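Posted 2017-04-11 19:34:03
Edit: changed so that the output has column-name access, using as_index=False (added in step 1; see sources 5 and 6).
1. First, groupby on the CVE_ID column (together with VulnName and ServerOwner) and use size: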
counts = myframe.groupby(['CVE_ID','VulnName','ServerOwner'], as_index=False).size().unstack(fill_value=0).reset_index()
ServerOwner         CVE_ID       VulnName  Alice  Bob  Carol
0            CVE-2017-1111  Java Update 1      1    1      0
1            CVE-2017-1112  Java Update 2      1    1      0
2            CVE-2017-1113  Java Update 3      1    1      1
3            CVE-2017-1114        Adobe 1      1    1      1
4            CVE-2017-1115       Chrome 1      1    0      1
5            CVE-2017-1116       Chrome 2      0    0      1
6            CVE-2017-1117       Chrome 3      0    0      1
2. Then sum the Alice, Bob and Carol columns to get the count:
counts['Count'] = counts[['Alice','Bob','Carol']].sum(axis=1)
ServerOwner         CVE_ID       VulnName  Alice  Bob  Carol  Count
0            CVE-2017-1111  Java Update 1      1    1      0      2
1            CVE-2017-1112  Java Update 2      1    1      0      2
2            CVE-2017-1113  Java Update 3      1    1      1      3
3            CVE-2017-1114        Adobe 1      1    1      1      3
4            CVE-2017-1115       Chrome 1      1    0      1      2
5            CVE-2017-1116       Chrome 2      0    0      1      1
6            CVE-2017-1117       Chrome 3      0    0      1      1
3. Then use df.drop to remove the owner-name columns:
counts.drop(['Carol','Bob','Alice'],inplace=True,axis=1)
ServerOwner         CVE_ID       VulnName  Count
0            CVE-2017-1111  Java Update 1      2
1            CVE-2017-1112  Java Update 2      2
2            CVE-2017-1113  Java Update 3      3
3            CVE-2017-1114        Adobe 1      3
4            CVE-2017-1115       Chrome 1      2
5            CVE-2017-1116       Chrome 2      1
6            CVE-2017-1117       Chrome 3      1
4. Then use sort_values on the Count (sum) column:
counts.sort_values(by='Count', ascending=False, inplace=True)
ServerOwner         CVE_ID       VulnName  Count
2            CVE-2017-1113  Java Update 3      3
3            CVE-2017-1114        Adobe 1      3
0            CVE-2017-1111  Java Update 1      2
1            CVE-2017-1112  Java Update 2      2
4            CVE-2017-1115       Chrome 1      2
5            CVE-2017-1116       Chrome 2      1
6            CVE-2017-1117       Chrome 3      1
Putting it all together:
counts = myframe.groupby(['CVE_ID','VulnName','ServerOwner'], as_index=False).size().unstack(fill_value=0).reset_index()
counts['Count'] = counts[['Alice','Bob','Carol']].sum(axis=1)
counts.drop(['Carol','Bob','Alice'],inplace=True,axis=1)
counts.sort_values(by='Count', ascending=False, inplace=True)
print("The dataframe: \n", myframe)
print("Top 10 offending CVEs, Vulnerability and Count: \n")
print(counts)
Top 10 offending CVEs, Vulnerability and Count:
ServerOwner         CVE_ID       VulnName  Count
2            CVE-2017-1113  Java Update 3      3
3            CVE-2017-1114        Adobe 1      3
0            CVE-2017-1111  Java Update 1      2
1            CVE-2017-1112  Java Update 2      2
4            CVE-2017-1115       Chrome 1      2
5            CVE-2017-1116       Chrome 2      1
6            CVE-2017-1117       Chrome 3      1
If needed, you can use reset_index() at this point to reset the index.
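The four steps above can also be collapsed into a single groupby. The following is a minimal sketch, not part of the original answer, assuming each CVE_ID always maps to the same VulnName; the name `cve_counts` and the shortened sample frame are made up for illustration. It may also help on newer pandas releases, where, as far as I know, size() combined with as_index=False returns a DataFrame instead of a Series, so the unstack step may not behave as shown above.

```python
import pandas as pd

# Small stand-in for the question's data (same columns, fewer rows).
myframe = pd.DataFrame({
    "Server": ["Server_1", "Server_1", "Server_2", "Server_3", "Server_3"],
    "CVE_ID": ["CVE-2017-1111", "CVE-2017-1113", "CVE-2017-1113",
               "CVE-2017-1113", "CVE-2017-1116"],
    "VulnName": ["Java Update 1", "Java Update 3", "Java Update 3",
                 "Java Update 3", "Chrome 2"],
    "ServerOwner": ["Alice", "Alice", "Bob", "Carol", "Carol"],
})

# Group on the columns to keep, count the rows in each group, and turn
# the resulting Series back into a DataFrame with a named count column.
cve_counts = (
    myframe.groupby(["CVE_ID", "VulnName"])
    .size()
    .reset_index(name="Count")
    .sort_values(by="Count", ascending=False)
    .reset_index(drop=True)
)

# Show at most the ten most frequent CVEs.
print(cve_counts.head(10))
```

Grouping on both CVE_ID and VulnName keeps the name next to the count without the intermediate per-owner columns, at the cost of losing the per-owner breakdown that the unstack version produces along the way.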
Edit: in response to the comment about the ServerOwner index, you can reset the index, drop the old index, and rename the new one:
counts.reset_index(drop=True, inplace = True)
counts.index.names = ['index']
which gives:
ServerOwner         CVE_ID       VulnName  Count
index
0            CVE-2017-1113  Java Update 3      3
1            CVE-2017-1114        Adobe 1      3
2            CVE-2017-1111  Java Update 1      2
3            CVE-2017-1112  Java Update 2      2
4            CVE-2017-1115       Chrome 1      2
5            CVE-2017-1116       Chrome 2      1
6            CVE-2017-1117       Chrome 3      1
(The ServerOwner name is left over from the original groupby command and indicates which column was used for the unstacked columns.)
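If the leftover ServerOwner label is unwanted, the name on the columns axis can be cleared. This is a minimal sketch, not from the original answer: the small `counts` frame is rebuilt by hand to mimic the one above, and `cleaned` is a made-up name.

```python
import pandas as pd

# Hand-built stand-in for the sorted `counts` frame from the answer.
counts = pd.DataFrame(
    {
        "CVE_ID": ["CVE-2017-1113", "CVE-2017-1111"],
        "VulnName": ["Java Update 3", "Java Update 1"],
        "Count": [3, 2],
    },
    index=[2, 0],
)
counts.columns.name = "ServerOwner"  # mimic the label left behind by unstack()

# Renumber the rows from 0 and drop the name on the columns axis.
cleaned = counts.reset_index(drop=True).rename_axis(None, axis=1)

print(cleaned)
```

Only the axis label is removed here; the column names and the data are untouched.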
Sources for this answer:
Posted 2017-04-11 23:15:11
Use join to append the value_counts:
myframe.join(myframe['CVE_ID'].value_counts().rename('Count'), on='CVE_ID')
           CVE_ID    Server  ServerOwner       VulnName  Count
0   CVE-2017-1111  Server_1        Alice  Java Update 1      2
1   CVE-2017-1112  Server_1        Alice  Java Update 2      2
2   CVE-2017-1113  Server_1        Alice  Java Update 3      3
3   CVE-2017-1114  Server_1        Alice        Adobe 1      3
4   CVE-2017-1115  Server_1        Alice       Chrome 1      2
5   CVE-2017-1111  Server_2          Bob  Java Update 1      2
6   CVE-2017-1112  Server_2          Bob  Java Update 2      2
7   CVE-2017-1113  Server_2          Bob  Java Update 3      3
8   CVE-2017-1114  Server_2          Bob        Adobe 1      3
9   CVE-2017-1113  Server_3        Carol  Java Update 3      3
10  CVE-2017-1114  Server_3        Carol        Adobe 1      3
11  CVE-2017-1115  Server_3        Carol       Chrome 1      2
12  CVE-2017-1116  Server_3        Carol       Chrome 2      1
13  CVE-2017-1117  Server_3        Carol       Chrome 3      1
If you want to limit it to the top n (my example shows 2), use head and how='inner':
myframe.join(
    myframe['CVE_ID'].value_counts().head(2).rename('Count'),
    on='CVE_ID', how='inner')
           CVE_ID    Server  ServerOwner       VulnName  Count
2   CVE-2017-1113  Server_1        Alice  Java Update 3      3
7   CVE-2017-1113  Server_2          Bob  Java Update 3      3
9   CVE-2017-1113  Server_3        Carol  Java Update 3      3
3   CVE-2017-1114  Server_1        Alice        Adobe 1      3
8   CVE-2017-1114  Server_2          Bob        Adobe 1      3
10  CVE-2017-1114  Server_3        Carol        Adobe 1      3
https://stackoverflow.com/questions/43354454
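The join output repeats each CVE once per server. To get the one-row-per-CVE listing the question asks for, the joined frame can be de-duplicated. This is a minimal sketch rather than part of the original answer; the names `top_n`, `top_counts`, `joined` and `summary`, and the shortened sample frame, are assumptions for illustration.

```python
import pandas as pd

# Small stand-in for the question's data (same columns, fewer rows).
myframe = pd.DataFrame({
    "Server": ["Server_1", "Server_1", "Server_1",
               "Server_2", "Server_2", "Server_3"],
    "CVE_ID": ["CVE-2017-1111", "CVE-2017-1113", "CVE-2017-1114",
               "CVE-2017-1113", "CVE-2017-1114", "CVE-2017-1113"],
    "VulnName": ["Java Update 1", "Java Update 3", "Adobe 1",
                 "Java Update 3", "Adobe 1", "Java Update 3"],
    "ServerOwner": ["Alice", "Alice", "Alice", "Bob", "Bob", "Carol"],
})

top_n = 2  # hypothetical cutoff; the question asks for the top 10

# Count occurrences per CVE and keep only the n most frequent ones.
top_counts = myframe["CVE_ID"].value_counts().head(top_n).rename("Count")

# The inner join drops every row whose CVE is not among the top n.
joined = myframe.join(top_counts, on="CVE_ID", how="inner")

# Keep one row per CVE/VulnName pair, most frequent first.
summary = (
    joined[["CVE_ID", "VulnName", "Count"]]
    .drop_duplicates()
    .sort_values(by="Count", ascending=False)
    .reset_index(drop=True)
)

print(summary)
```

Selecting the three columns before drop_duplicates matters: with Server and ServerOwner still present, every row is distinct and nothing would be removed.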