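My data is generated in a not very useful format: first a few spaces, then the index number (1-12 in this example), then the actual value associated with that index. I want to split the strings into two lists: one containing the indices and the other containing the values. I have written the code below, which does what I want. However, it seems cumbersome, and for data sets of a few thousand lines it takes several seconds. Is there a way to speed this up for large data sets?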
data = [' 11.814772E3',
' 2-1.06152E3',
' 33.876477E1',
' 4-2.65704E3',
' 51.141537E4',
' 61.378482E4',
' 71.401565E4',
' 86.782599E3',
' 9-1.22921E3',
' 103.400054E3',
' 111.558086E3',
' 121.017818E4']
import math
import numpy as np

values_total = []  # characters of each line, without spaces
location = []      # position where the id ends and the value begins
ids = []           # store ids
values = []        # store values
step_array = np.linspace(1, 1E3, int(1E3))  # 1..1000, needed to calculate index lengths (num must be an int)
for i in range(len(data)):
    # Check how many leading characters belong to the index
    location.append([])
    location[i].append(int(math.log10(step_array[i])) + 1)
    # Store the characters that are not spaces
    for j in range(len(data[i])):
        values_total.append([])
        if data[i][j] != ' ':
            values_total[i].append(data[i][j])
    # Split list based on calculated lengths
    ids.append(values_total[i][:location[i][0]])
    values.append(values_total[i][location[i][0]:])

Posted on 2020-11-30 21:30:51
You can try the following code:
indices = []
vals = []
for i, d in enumerate(data, 1):  # enumerate starting from 1, so we know current index
    tmp = d.strip()  # remove whitespace
    split_idx = len(str(i))  # figure out the length of the current index
    indices.append(i)  # current index
    vals.append(float(tmp[split_idx:]))  # everything after current index length

https://stackoverflow.com/questions/65074379
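The same idea can be wrapped in a function and given a sanity check. This is only a sketch: the name `split_indexed`, its `start` parameter, and the `sample` list are made up for illustration. It relies on the same assumption as the loop in the answer, namely that the indices are consecutive integers, so the width of each index can be inferred from its position in the list. The `startswith` check is an addition that fails loudly if that assumption does not hold for some line.

```python
def split_indexed(lines, start=1):
    """Split strings like ' 103.400054E3' into an index list and a value list.

    Assumes the indices are consecutive integers beginning at `start`.
    """
    indices = []
    values = []
    for i, line in enumerate(lines, start):
        text = line.strip()   # drop the leading spaces
        prefix = str(i)       # digits the line is expected to begin with
        if not text.startswith(prefix):
            raise ValueError(f"line {i} does not start with its index: {line!r}")
        indices.append(i)
        values.append(float(text[len(prefix):]))  # the rest is the value
    return indices, values


# Three lines taken from the question's data, starting at index 9
sample = [' 9-1.22921E3', ' 103.400054E3', ' 111.558086E3']
indices, values = split_indexed(sample, start=9)
print(indices)  # [9, 10, 11]
print(values)   # [-1229.21, 3400.054, 1558.086]
```

Each line is handled with one `strip`, one slice, and one `float` call, so there is no character-by-character loop and no need for NumPy or `math.log10`.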