I have this string:
Model: ARIMA BIC: 417.2273
Dependent Variable: D.Sales of shampoo over a three year period Log-Likelihood: -196.17
Date: 2018-09-24 13:20 Scale: 1.0000
No. Observations: 35 Method: css-mle
Df Model: 6 Sample: 02-01-1901
Df Residuals: 29 12-01-1903
Converged: 1.0000 S.D. of innovations: 64.241
No. Iterations: 19.0000 HQIC: 410.098
AIC: 406.3399

I want to put it into a dictionary. I have used split("\n") and I get:
Model: ARIMA BIC: 417.2273
Dependent Variable: D.Sales of shampoo over a three year period Log-Likelihood: -196.17
Date: 2018-09-24 13:20 Scale: 1.0000
No. Observations: 35 Method: css-mle
Df Model: 6 Sample: 02-01-1901
Df Residuals: 29 12-01-1903
Converged: 1.0000 S.D. of innovations: 64.241
No. Iterations: 19.0000 HQIC: 410.098
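AIC: 406.3399

But I don't see a good way to split this and put it into a dictionary. Maybe I'm missing something obvious?
Also, note the format of the dates next to "Sample:".
I want something like this: {"Model":"ARIMA","BIC":417.2273,...}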
Posted on 2018-09-25 17:34:40
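The main problem is that there are several columns side by side. Since both keys and values contain spaces, you cannot simply split on whitespace. Instead, you have to separate the columns first and then parse the data.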
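To see why naive splitting breaks down, here is a small sketch on two lines from the question. The spacing between the columns is an assumption, since the exact layout of the original string is not shown.

```python
# Two sample lines; the gap between the columns is assumed, not taken from the source.
date_line = 'Date: 2018-09-24 13:20   Scale: 1.0000'
obs_line = 'No. Observations: 35   Method: css-mle'

# Splitting on ':' cuts the time '13:20' in half and glues '20' to the next key.
colon_pieces = date_line.split(':')
print(colon_pieces)

# Splitting on whitespace tears the key 'No. Observations' apart.
space_pieces = obs_line.split()
print(space_pieces)
```

Neither result maps cleanly onto key/value pairs, which is why the columns need to be separated by position first.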
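If the column lengths are unknown
Use the first line to identify the column lengths. Once the columns are separated, it is easy to split keys from values at the colon.
If the layout is stable, you can exploit the fact that neither the keys nor the values on the first line contain spaces.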
lines = input_string.splitlines()
key_values = lines[0].split() # split first line into keys and values
column_keys = key_values[::2] # extract the keys by taking every second element
column_starts = [lines[0].find(key) for key in column_keys]  # start index of each key

Once you have the column starts, proceed as if the column lengths were known.
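As a quick check of this idea, the following sketch runs the detection on a small fixed-width sample. The helper `make_row` and the cell widths (18, 12, 12) are hypothetical and only serve to build aligned input; the real string may use different widths.

```python
def make_row(key1, value1, key2, value2):
    # Hypothetical helper: pads each cell to an assumed fixed width.
    return f'{key1:<18}{value1:<12}{key2:<12}{value2}'

lines = [
    make_row('Model:', 'ARIMA', 'BIC:', '417.2273'),
    make_row('No. Observations:', '35', 'Method:', 'css-mle'),
    make_row('Df Model:', '6', 'Sample:', '02-01-1901'),
    make_row('Df Residuals:', '29', '', '12-01-1903'),
]

key_values = lines[0].split()   # ['Model:', 'ARIMA', 'BIC:', '417.2273']
column_keys = key_values[::2]   # ['Model:', 'BIC:']
column_starts = [lines[0].find(key) for key in column_keys]
print(column_starts)            # [0, 30]

# Slicing every line at the second start isolates the right-hand column.
second_column = [line[column_starts[1]:] for line in lines]
print(second_column[-1])        # starts with spaces: a value without a key
```

Note that `str.find` returns the first match, so this only works if a key's text does not also appear earlier on the first line.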
If the column lengths are known
Separate the columns at the start indices.
column_ends = column_starts[1:] + [None]
# separate all "key: value" lines
key_values = [
    line[start:end]
    # ordering is important - need to parse column-first for the next step
    for start, end in zip(column_starts, column_ends)
    for line in lines
]

Because Sample uses a multi-line value, we cannot cleanly split keys from values on the colon. Instead, we have to keep track of the previously seen key so that we can attach key-less values to it.
data = {}
for line in key_values:
    # skip cells that are empty or contain only padding
    if not line.strip():
        continue
    # check if there is a key at the start of the line
    if line[0] != ' ':
        # insert key/value pairs
        key, value = line.split(':', 1)
        data[key.strip()] = value.strip()
    else:
        # append dangling values to the most recently seen key
        value = line
        data[key.strip()] += '\n' + value.strip()

This gives you a dictionary of key: string values:
{'Model': 'ARIMA',
'Dependent Variable': 'D.Sales of shampoo over a three year period',
'Date': '2018-09-24 13:20',
'No. Observations': '35',
'Df Model': '6',
'Df Residuals': '29',
'Converged': '1.0000',
'No. Iterations': '19.0000',
'AIC': '406.3399',
'BIC': '417.2273',
'Log-Likelihood': '-196.17',
'Scale': '1.0000',
'Method': 'css-mle',
'Sample': '02-01-1901\n12-01-1903',
'S.D. of innovations': '64.241',
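'HQIC': '410.098'}

If you need to convert the values to non-string types, I suggest converting each field explicitly. You can use a dispatch table that defines the conversion for each key.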
import time
converters = {
    'Model': str, 'Dependent Variable': str,
    'Date': lambda field: time.strptime(field, '%Y-%m-%d %H:%M'),
    'No. Observations': int, 'Df Model': int, 'Df Residuals': int,
    'Converged': float, 'No. Iterations': float, 'AIC': float,
    'BIC': float, 'Log-Likelihood': float, 'Scale': float,
    'Method': str, 'Sample': str, 'S.D. of innovations': float,
    'HQIC': float
}
converted_data = {key: converters[key](data[key]) for key in data}

https://stackoverflow.com/questions/52487275
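The dispatch table above leaves 'Sample' as a two-line string. If the start and end dates are needed as date objects, a converter along the following lines could be swapped in. This is only a sketch: `parse_sample` is a hypothetical name, and the month-day-year format is an assumption (it fits 35 monthly observations running from February 1901 to December 1903).

```python
from datetime import date, datetime

def parse_sample(field):
    """Turn 'MM-DD-YYYY\\nMM-DD-YYYY' into a (start, end) tuple of dates.

    Assumes exactly two lines in month-day-year order.
    """
    start, end = (
        datetime.strptime(part.strip(), '%m-%d-%Y').date()
        for part in field.splitlines()
    )
    return start, end

sample_range = parse_sample('02-01-1901\n12-01-1903')
print(sample_range)
```

To use it, replace `'Sample': str` with `'Sample': parse_sample` in the converters dictionary.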