
Python script for extracting values, splitting data, and reformatting

Stack Overflow user
Asked on 2016-01-23 16:16:47
1 answer · 63 views · 0 followers · 0 votes

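This question is mostly about logic, and to some extent about syntax.

I am writing a short Python script to extract a few small pieces of information from several hundred records. I am very close, but the code still needs a change that I can't work out how to make.

I have the following table:
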
368 1   "Overall evaluation: 1
Invite to interview: 1
Strength or novelty of the idea (1): 2
Strength or novelty of the idea (2): 3
Strength or novelty of the idea (3): 2
Use or provision of open data (1): 2
Use or provision of open data (2): 2
""Open by default"" (1): 3
""Open by default"" (2): 2
Value proposition and potential scale (1): 2
Value proposition and potential scale (2): 2
Market opportunity and timing (1): 2
Market opportunity and timing (2): 1
Triple bottom line impact (1): 2
Triple bottom line impact (2): 2
Triple bottom line impact (3): 2
Knowledge and skills of the team (1): 3
Knowledge and skills of the team (2): 3
Capacity to realise the idea (1): 2
Capacity to realise the idea (2): 1
Capacity to realise the idea (3): 1
Appropriateness of the budget to realise the idea: 1"
368 2   "Overall evaluation: 2
Invite to interview: 3
Strength or novelty of the idea (1): 3
Strength or novelty of the idea (2): 4
Strength or novelty of the idea (3): 4
Use or provision of open data (1): 4
Use or provision of open data (2): 2
""Open by default"" (1): 3
""Open by default"" (2): 3
Value proposition and potential scale (1): 2
Value proposition and potential scale (2): 3
Market opportunity and timing (1): 3
Market opportunity and timing (2): 3
Triple bottom line impact (1): 3
Triple bottom line impact (2): 2
Triple bottom line impact (3): 1
Knowledge and skills of the team (1): 2
Knowledge and skills of the team (2): 2
Capacity to realise the idea (1): 3
Capacity to realise the idea (2): 2
Capacity to realise the idea (3): 2
Appropriateness of the budget to realise the idea: 3"

I need to get these values, but I also need to tie them to the number at the start of each record. For the first record, for example, I need this:

368

=2+3+3+3+4+3+2+3+2+3+2+3+2+3+2+3+2+4+3+2+3+2

=2+3+3+3+4+3+2+3+2+3+2+3+2+3+2+3+2+4+3+2+3+2

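And so on for the remaining records.

So the output needs to show the instance identifier, 368 in this case, together with the values from the two review records associated with it.

I know how to extract the review values, for example:
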
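array = []  # collected scores, kept as strings so they can be joined
with open('data.txt', 'r') as f:
    for line in f:
        if ':' not in line:
            continue  # skip blank lines
        # the score follows the last colon; a record's final line ends with a quote
        array.append(line.rsplit(':', 1)[1].strip().strip('"'))
print('+'.join(array))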

However, I don't know how to present the values together with the record identifier, as I tried to show in the example above.
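As an illustration only, the layout described above (identifier on one line, then one `=`-prefixed sum per review) could be sketched as follows. The names `HEADER`, `group_scores` and `sample` are hypothetical and not part of the original question, and the sketch assumes every header line has the shape `368 1   "Overall evaluation: 1`.

```python
import re
from collections import OrderedDict

# Assumed header shape: identifier, review number, then an opening quote.
HEADER = re.compile(r'^(\d+)\s+\d+\s+"')

def group_scores(lines):
    """Map each identifier to a list of score lists, one list per review."""
    groups = OrderedDict()
    current = None  # score list of the review currently being read
    for line in lines:
        if ':' not in line:
            continue  # blank lines carry no score
        # The score follows the last colon; strip the closing quote if present.
        score = line.rsplit(':', 1)[1].strip().strip('"')
        match = HEADER.match(line)
        if match:
            # A header line opens a new review under its identifier.
            current = [score]
            groups.setdefault(match.group(1), []).append(current)
        elif current is not None:
            current.append(score)
    return groups

# Shortened, made-up sample in the same shape as the data.
sample = [
    '368 1   "Overall evaluation: 1',
    'Invite to interview: 1',
    'Appropriateness of the budget to realise the idea: 1"',
    '368 2   "Overall evaluation: 2',
    'Invite to interview: 3',
    'Appropriateness of the budget to realise the idea: 3"',
]

for record_id, reviews in group_scores(sample).items():
    print(record_id)
    for scores in reviews:
        print('=' + '+'.join(scores))
```

Keeping the scores as strings avoids converting back and forth just to join them with `+`.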

Edit

The data looks like this:

299 1   "Overall evaluation: 3
Invite to interview: 3
Strength or novelty of the idea (1): 4
Strength or novelty of the idea (2): 3
Strength or novelty of the idea (3): 3
Use or provision of open data (1): 4
Use or provision of open data (2): 3
""Open by default"" (1): 2
""Open by default"" (2): 3
Value proposition and potential scale (1): 4
Value proposition and potential scale (2): 2
Market opportunity and timing (1): 4
Market opportunity and timing (2): 4
Triple bottom line impact (1): 4
Triple bottom line impact (2): 2
Triple bottom line impact (3): 2
Knowledge and skills of the team (1): 3
Knowledge and skills of the team (2): 4
Capacity to realise the idea (1): 4
Capacity to realise the idea (2): 3
Capacity to realise the idea (3): 4
Appropriateness of the budget to realise the idea: 3"
299 2   "Overall evaluation: 3
Invite to interview: 3
Strength or novelty of the idea (1): 3
Strength or novelty of the idea (2): 2
Strength or novelty of the idea (3): 4
Use or provision of open data (1): 4
Use or provision of open data (2): 3
""Open by default"" (1): 3
""Open by default"" (2): 2
Value proposition and potential scale (1): 4
Value proposition and potential scale (2): 3
Market opportunity and timing (1): 4
Market opportunity and timing (2): 3
Triple bottom line impact (1): 3
Triple bottom line impact (2): 2
Triple bottom line impact (3): 1
Knowledge and skills of the team (1): 4
Knowledge and skills of the team (2): 4
Capacity to realise the idea (1): 4
Capacity to realise the idea (2): 4
Capacity to realise the idea (3): 4
Appropriateness of the budget to realise the idea: 2"

364 1   "Overall evaluation: 3
Invite to interview: 3
Strength or novelty of the idea (1): 4
Strength or novelty of the idea (2): 1
Strength or novelty of the idea (3): 3
Use or provision of open data (1): 3
Use or provision of open data (2): 3
""Open by default"" (1): 3
""Open by default"" (2): 3
Value proposition and potential scale (1): 4
Value proposition and potential scale (2): 4
Market opportunity and timing (1): 4
Market opportunity and timing (2): 4
Triple bottom line impact (1): 4
Triple bottom line impact (2): 4
Triple bottom line impact (3): 3
Knowledge and skills of the team (1): 3
Knowledge and skills of the team (2): 3
Capacity to realise the idea (1): 4
Capacity to realise the idea (2): 3
Capacity to realise the idea (3): 3
Appropriateness of the budget to realise the idea: 3"
364 2   "Overall evaluation: 3
Invite to interview: 3
Strength or novelty of the idea (1): 4
Strength or novelty of the idea (2): 3
Strength or novelty of the idea (3): 3
Use or provision of open data (1): 4
Use or provision of open data (2): 4
""Open by default"" (1): 4
""Open by default"" (2): 3
Value proposition and potential scale (1): 4
Value proposition and potential scale (2): 3
Market opportunity and timing (1): 2
Market opportunity and timing (2): 3
Triple bottom line impact (1): 4
Triple bottom line impact (2): 4
Triple bottom line impact (3): 1
Knowledge and skills of the team (1): 3
Knowledge and skills of the team (2): 3
Capacity to realise the idea (1): 2
Capacity to realise the idea (2): 4
Capacity to realise the idea (3): 4
Appropriateness of the budget to realise the idea: 2"

1 Answer

Stack Overflow user

Accepted answer

Posted on 2016-01-23 17:18:19

This is how I would do it. It is not perfect, but it gets the job done.

Also, 1.txt contains the same text as your data.

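#!/usr/bin/python

# Read the whole file as a list of lines without trailing newlines.
f=open("1.txt",'r').read().splitlines()
head='0'   # label of the record being collected; '0' is a placeholder
body=[]    # scores collected for the current record
for x in f:
    # Skip blank lines between records.
    if x=="\n" or x.strip()=='':
        continue
    try:
        # A line starting with a digit opens a new record.
        int(x[0])
        # Print the previous record (the first time, this prints "0:").
        print(head +':'+'+'.join(body))
        tmp=x.split()
        head=tmp[0]+'-'+tmp[1]
        body=[tmp[4]]
    except ValueError as e:
        # Any other line contributes the score after the colon.
        body.append(x.split(':')[1].strip().strip('\"'))
# Print the last record.
print(head +':'+'+'.join(body))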

The output will be:

0:
299-1:3+3+4+3+3+4+3+2+3+4+2+4+4+4+2+2+3+4+4+3+4+3
299-2:3+3+3+2+4+4+3+3+2+4+3+4+3+3+2+1+4+4+4+4+4+2
364-1:3+3+4+1+3+3+3+3+3+4+4+4+4+4+4+3+3+3+4+3+3+3
364-2:3+3+4+3+3+4+4+4+3+4+3+2+3+4+4+1+3+3+2+4+4+2

The first print can be skipped by adding a check on the list length, so that the 0: line is not printed.
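A minimal sketch of that length check, assuming the same input shape as above. The function name `format_reviews` and the `sample` lines are hypothetical, and the loop is a rearrangement of the answer's code rather than the author's own version: it tests for a leading digit instead of catching `ValueError`, and takes the last token of the header line instead of `tmp[4]`.

```python
def format_reviews(lines):
    """Return one 'id-review:score+score+...' string per review."""
    out = []
    head, body = None, []
    for x in lines:
        if x.strip() == '':
            continue  # skip blank lines between records
        if x[0].isdigit():
            # Length check: before the first header there is nothing to emit.
            if body:
                out.append(head + ':' + '+'.join(body))
            tmp = x.split()
            head = tmp[0] + '-' + tmp[1]
            body = [tmp[-1]]  # first score is the last token of the header line
        else:
            body.append(x.split(':')[1].strip().strip('"'))
    if body:
        out.append(head + ':' + '+'.join(body))  # flush the final review
    return out

# Shortened, made-up sample in the same shape as the data.
sample = [
    '299 1   "Overall evaluation: 3',
    'Invite to interview: 3',
    'Appropriateness of the budget to realise the idea: 3"',
    '',
    '364 1   "Overall evaluation: 2',
    '""Open by default"" (1): 4"',
]

print('\n'.join(format_reviews(sample)))
```

Because `body` is empty until the first header line has been read, the same `if body:` guard also covers an empty input file.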

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/34965758
