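I'm trying to write this simple code in Python: if the second element of a line in a CSV file is one of the families listed in "malware_list", the main program should print "true". However, the program always prints "FALSE".
Each line in the file looks like this: "NAME,FAMILY"
Here is the code: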
malware_list = ["FakeInstaller", "DroidKungFu", "Plankton",
                "Opfake", "GingerMaster", "BaseBridge",
                "Iconosys", "Kmin", "FakeDoc", "Geinimi",
                "Adrd", "DroidDream", "LinuxLotoor", "GoldDream",  # without this comma, Python joins "GoldDream" "MobileTx" into "GoldDreamMobileTx"
                "MobileTx", "FakeRun", "SendPay", "Gappusin",
                "Imlog", "SMSreg"]

def is_malware(line):
    line_splitted = line.split(",")
    family = line_splitted[1]
    if family in malware_list:
        return True
    return False

def main():
    with open("datset_small.csv", "r") as f:
        for i in range(1, 100):
            line = f.readline()
            print(is_malware(line))

if __name__ == "__main__":
    main()

Posted on 2018-10-30 02:08:18
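line = f.readline()

readline does not strip the trailing newline from its result, so line here probably looks like "STEVE,FakeDoc\n". family then becomes "FakeDoc\n", which is not a member of malware_list, so your function returns False.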
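To see the effect, here is a minimal sketch that feeds the example line "STEVE,FakeDoc\n" through a version of is_malware, once raw and once with the family field stripped. The shortened malware_list and the helper name families_in are illustrative assumptions, and io.StringIO stands in for the real file. Iterating over the file object, instead of calling readline() a fixed 99 times, also avoids an IndexError when the file has fewer lines: at end of file readline() returns "", and "".split(",") has no element [1].

```python
import io

# Shortened list for illustration; use the full malware_list from the question.
malware_list = ["FakeInstaller", "DroidKungFu", "Plankton", "FakeDoc"]


def is_malware(line):
    # Strip the family field so a trailing "\n" (or stray spaces) does not
    # break the membership test.
    family = line.split(",")[1].strip()
    return family in malware_list


def families_in(f):
    # Hypothetical helper: iterate over the file object line by line;
    # the loop ends cleanly at end of file.
    results = []
    for line in f:
        if line.strip():  # skip blank lines
            results.append(is_malware(line))
    return results


raw = "STEVE,FakeDoc\n"
print(raw.split(",")[1] in malware_list)  # False: the family is "FakeDoc\n"
print(is_malware(raw))                    # True once the field is stripped

# io.StringIO stands in for open("datset_small.csv", "r") here.
sample = io.StringIO("STEVE,FakeDoc\nANNA,Unknown\n\nBOB,Plankton\n")
print(families_in(sample))  # [True, False, True]
```

With the real file, the same helper would be called as families_in(f) inside the question's with open(...) as f: block.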
Try stripping whitespace after reading:

line = f.readline().strip()

Posted on 2018-10-30 02:09:51
Python has a package called pandas, which can read a CSV file into a DataFrame:

import pandas as pd
df = pd.read_csv("datset_small.csv")

Please post the contents of your CSV file so that I can help you solve the problem.
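Following on from the read_csv line above, a minimal sketch of finishing the check with pandas, assuming the file has no header row (each line is just NAME,FAMILY, as the question describes). In that case header=None together with names=[...] supplies the column labels; if the file does start with a NAME,FAMILY header line, drop those two arguments. The CSV text is inlined with io.StringIO, the list is shortened, and the IS_MALWARE column name is an illustrative choice, so the example runs on its own.

```python
import io

import pandas as pd

# Shortened list for illustration; use the full malware_list from the question.
malware_list = ["DroidKungFu", "Plankton", "FakeDoc"]

# io.StringIO stands in for "datset_small.csv"; this data has no header row.
csv_text = "STEVE,FakeDoc\nANNA,Unknown\nBOB,Plankton\n"

# header=None: the first line is data; names= supplies the column labels.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["NAME", "FAMILY"])

# read_csv handles the line endings itself, so FAMILY values carry no "\n".
df["IS_MALWARE"] = df["FAMILY"].isin(malware_list)
print(df)
```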
Posted on 2018-10-30 02:29:31
This is easy to do with a DataFrame. Sample code:
import pandas as pd

malware_list = ["FakeInstaller", "DroidKungFu", "Plankton",
                "Opfake", "GingerMaster", "BaseBridge",
                "Iconosys", "Kmin", "FakeDoc", "Geinimi",
                "Adrd", "DroidDream", "LinuxLotoor", "GoldDream",
                "MobileTx", "FakeRun", "SendPay", "Gappusin",
                "Imlog", "SMSreg"]

# read csv into dataframe
df = pd.read_csv('datset_small.csv')
print(df['FAMILY'].isin(malware_list))

Output:
0 True
1 True
2 True

The sample CSV used:
NAME,FAMILY
090b5be26bcc4df6186124c2b47831eb96761fcf61282d63e13fa235a20c7539,Plankton
bedf51a5732d94c173bcd8ed918333954f5a78307c2a2f064b97b43278330f54,DroidKungFu
149bde78b32be3c4c25379dd6c3310ce08eaf58804067a9870cfe7b4f51e62fe,Plankton

https://stackoverflow.com/questions/53051312