My code is:
import re
import pandas as pd

df=pd.read_csv("file")
l1=[]
l2=[]
for i in range(0,len(df['unions']),len(df['district'])):
    l1.append(' '.join((df['unions'][i], df['district'][i])))
    l2.append(({"entities": [[(ele.start(), ele.end() - 1) for ele in re.finditer(r'\S+', df['unions'][i])] ,df['subdistrict'][i]],}))
TRAIN_DATA=list(zip(l1,l2))
print(TRAIN_DATA)
Result: [('Dhansagar Bagerhat', {'entities': [[(0, 8)], 'Sarankhola']})]
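My expected output: [('Dhansagar Bagerhat', {'entities': [[(0, 8)], 'Sarankhola'],[[(10, 17)], 'AnyLabel']})]
How can I get the output for all rows? I only get the result for one row, so it looks like my loop isn't working. Can anyone point out my mistake?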
My CSV file looks like this. "AnyLabel" is another column. I have about 500 rows:
unions         subdistrict  district
Dhansagar      Sarankhola   Bagerhat
Daibagnyahati  Morrelganj   Bagerhat
Ramchandrapur  Morrelganj   Bagerhat
Kodalia        Mollahat     Bagerhat
Posted on 2021-09-05 07:46:47
Try using str.join:
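import re
import pandas as pd

df=pd.read_csv("file")
l1=[]
l2=[]
# iterrows() yields one (index, row) pair per row, so every row is visited
for idx, row in df.iterrows():
    l1.append(' '.join((row['unions'], row['district'])))
    # note: these offsets are computed over "unions subdistrict", not over the l1 text "unions district"
    l2.append(({"entities": [[[ele.start(), ele.end() - 1], ele.group(0)] for ele in re.finditer(r'\S+', ' '.join([row['unions'] ,row['subdistrict']]))]}))
TRAIN_DATA=list(zip(l1,l2))
print(TRAIN_DATA)
Output: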
[('Dhansagar Bagerhat', {'entities': [[[0, 8], 'Dhansagar'], [[10, 19], 'Sarankhola']]}), ('Daibagnyahati Bagerhat', {'entities': [[[0, 12], 'Daibagnyahati'], [[14, 23], 'Morrelganj']]}), ('Ramchandrapur Bagerhat', {'entities': [[[0, 12], 'Ramchandrapur'], [[14, 23], 'Morrelganj']]}), ('Kodalia Bagerhat', {'entities': [[[0, 6], 'Kodalia'], [[8, 15], 'Mollahat']]})]
Posted on 2021-09-05 07:43:12
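The spans in the output above are offsets into the string "unions subdistrict" (e.g. 'Sarankhola' at 10 to 19). The expected output in the question appears to use offsets into the training text itself, "unions district", where 'Bagerhat' sits at (10, 17).

Below is a minimal sketch that computes both spans against the joined text. It rests on three assumptions:

- The four sample rows stand in for the CSV.
- The question's inclusive-end convention (`end() - 1`) is kept.
- The literal string "AnyLabel" is a hypothetical placeholder label for the district span, because the real column's name and values are not shown.

```python
import re

import pandas as pd

# Hypothetical stand-in for pd.read_csv("file"): the sample rows from the question
df = pd.DataFrame({
    "unions": ["Dhansagar", "Daibagnyahati", "Ramchandrapur", "Kodalia"],
    "subdistrict": ["Sarankhola", "Morrelganj", "Morrelganj", "Mollahat"],
    "district": ["Bagerhat", "Bagerhat", "Bagerhat", "Bagerhat"],
})

TRAIN_DATA = []
for unions, subdistrict, district in zip(df["unions"], df["subdistrict"], df["district"]):
    text = f"{unions} {district}"
    # (start, inclusive end) spans, matching ele.end() - 1 in the question
    union_spans = [(m.start(), m.end() - 1) for m in re.finditer(r"\S+", unions)]
    # The district begins after the union text plus the single space separator
    offset = len(unions) + 1
    district_spans = [(offset + m.start(), offset + m.end() - 1)
                      for m in re.finditer(r"\S+", district)]
    # "AnyLabel" is a placeholder; substitute the value from the real label column
    entities = [[union_spans, subdistrict], [district_spans, "AnyLabel"]]
    TRAIN_DATA.append((text, {"entities": entities}))

print(TRAIN_DATA[0])
# ('Dhansagar Bagerhat', {'entities': [[[(0, 8)], 'Sarankhola'], [[(10, 17)], 'AnyLabel']]})
```

Iterating with `zip` over the columns visits every row, just as `iterrows()` does, without building a Series for each row.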
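You are using range incorrectly. You are telling it to iterate over the numbers from 0 to len(df['unions']), but in steps of len(df['district']), which is the same length. So you are effectively telling it to visit only the first row. You can see this by printing the row index: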
for i in range(0,len(df['unions']),len(df['district'])):
    print(i)
Also, you shouldn't iterate over rows like that in the first place; use df.iterrows() instead:
import re
import pandas as pd

df=pd.read_csv("file")
l1=[]
l2=[]
for i, row in df.iterrows():
    l1.append(' '.join((row['unions'], row['district'])))
    l2.append(({"entities": [[(ele.start(), ele.end() - 1) for ele in re.finditer(r'\S+', ' '.join([row['unions'] ,row['subdistrict']]))]]}))
https://stackoverflow.com/questions/69061409
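The stepping problem described in the second answer can be checked in isolation. This sketch builds the sample rows inline instead of reading "file", an assumption made only so that it runs standalone. It then compares the question's `range(...)` call with a step of 1 and with `df.iterrows()`:

```python
import pandas as pd

# Hypothetical stand-in for the CSV: the four sample rows from the question
df = pd.DataFrame({
    "unions": ["Dhansagar", "Daibagnyahati", "Ramchandrapur", "Kodalia"],
    "subdistrict": ["Sarankhola", "Morrelganj", "Morrelganj", "Mollahat"],
    "district": ["Bagerhat", "Bagerhat", "Bagerhat", "Bagerhat"],
})

# range(start, stop, step) with step == stop yields only `start`,
# because every column of a DataFrame has the same length
buggy_indices = list(range(0, len(df["unions"]), len(df["district"])))

# Stepping by 1 visits every row position
fixed_indices = list(range(len(df)))

# iterrows() yields (index_label, row) pairs; with the default RangeIndex
# that read_csv produces, the labels equal the positions
iterrows_indices = [idx for idx, _ in df.iterrows()]

print(buggy_indices)     # [0]
print(fixed_indices)     # [0, 1, 2, 3]
print(iterrows_indices)  # [0, 1, 2, 3]
```

With the real file of about 500 rows, `range(0, 500, 500)` still yields only `[0]`. That is why `TRAIN_DATA` ends up with a single tuple.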