
Remove all EOFs (extra blank lines) at the end of a JSONL file

Stack Overflow user
Asked 2020-06-07 13:53:56
1 answer · 301 views · 0 followers · 2 votes

I am working with JSONL files that look like this in the VSCode editor:

first.jsonl

```
1.{"ConnectionTime": 730669.644775033,"objectId": "eHFvTUNqTR","CustomName": "Relay Controller","FirmwareRevision": "FW V1.96","DeviceID": "F1E4746E-DCEC-495B-AC75-1DFD66527561","PeripheralType": 9,"updatedAt": "2016-12-13T15:50:41.626Z","Model": "DF Bluno","HardwareRevision": "HW V1.7","Serial": "0123456789","createdAt": "2016-12-13T15:50:41.626Z","Manufacturer": "DFRobot"}
2.{"ConnectionTime": 702937.7616419792, "objectId": "uYuT3zgyez", "CustomName": "Relay Controller", "FirmwareRevision": "FW V1.96", "DeviceID": "F1E4746E-DCEC-495B-AC75-1DFD66527561", "PeripheralType": 9, "updatedAt": "2016-12-13T08:08:29.829Z", "Model": "DF Bluno", "HardwareRevision": "HW V1.7", "Serial": "0123456789", "createdAt": "2016-12-13T08:08:29.829Z", "Manufacturer": "DFRobot"}
3.
4.
5.
6.
```

second.jsonl

```
1.{"ConnectionTime": 730669.644775033,"objectId": "eHFvTUNqTR","CustomName": "Relay Controller","FirmwareRevision": "FW V1.96","DeviceID": "F1E4746E-DCEC-495B-AC75-1DFD66527561","PeripheralType": 9,"updatedAt": "2016-12-13T15:50:41.626Z","Model": "DF Bluno","HardwareRevision": "HW V1.7","Serial": "0123456789","createdAt": "2016-12-13T15:50:41.626Z","Manufacturer": "DFRobot"}
2.{"ConnectionTime": 702937.7616419792, "objectId": "uYuT3zgyez", "CustomName": "Relay Controller", "FirmwareRevision": "FW V1.96", "DeviceID": "F1E4746E-DCEC-495B-AC75-1DFD66527561", "PeripheralType": 9, "updatedAt": "2016-12-13T08:08:29.829Z", "Model": "DF Bluno", "HardwareRevision": "HW V1.7", "Serial": "0123456789", "createdAt": "2016-12-13T08:08:29.829Z", "Manufacturer": "DFRobot"}
3.
4.
```

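And so on: each file ends with a random number of trailing newlines / EOF markers. I want every file to end with a single newline (or with no blank line at all). With the following code I keep getting this error:

raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)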

```python
import glob
import io

filenames = glob.glob("folder_with_all_jsonl/*.jsonl")

# Read file by file, write file by file. Simple.

for f in filenames:
    # Path to the JSONL file(s)
    data_json = io.open(f, mode='r', encoding='utf-8-sig')  # opens the JSONL file
    data_python = extract_json(data_json)  # the asker's own helper; its body is not shown
    # ... code omitted
    for line in data_python:  # it fails here because of an empty line
        print(line.get('objectId'))
        # and so on
```
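The error message is itself a clue: calling `json.loads` on a line that holds nothing but a newline raises exactly `Expecting value: line 2 column 1 (char 1)`. A minimal sketch, assuming `extract_json` (not shown in the question) ends up calling `json.loads` on every raw line; `iter_jsonl` is a hypothetical helper that skips blank lines instead of parsing them:

```python
import io
import json


def iter_jsonl(fileobj):
    """Yield one parsed object per non-blank line of a JSONL file object."""
    for line in fileobj:
        if not line.strip():
            continue  # skip blank lines, e.g. the extra newlines at EOF
        yield json.loads(line)


# A lone newline reproduces the error from the question.
try:
    json.loads("\n")
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 2 column 1 (char 1)

# In-memory stand-in for a JSONL file with trailing blank lines.
sample = io.StringIO('{"objectId": "eHFvTUNqTR"}\n{"objectId": "uYuT3zgyez"}\n\n\n\n')
for record in iter_jsonl(sample):
    print(record.get("objectId"))
```

Because `iter_jsonl` is a generator, it also reads one line at a time rather than loading the whole file.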

I removed some of the extra lines by hand and was then able to process two of my JSONL files.

I have looked at these threads:

1>Removing a new line feed in json file using Python.

2>Replace multiple newlines with single newlines during reading file

Any hints or help would be much appreciated!

I want every file to be in this format:

first.jsonl

```
1.{"ConnectionTime": 730669.644775033,"objectId": "eHFvTUNqTR","CustomName": "Relay Controller","FirmwareRevision": "FW V1.96","DeviceID": "F1E4746E-DCEC-495B-AC75-1DFD66527561","PeripheralType": 9,"updatedAt": "2016-12-13T15:50:41.626Z","Model": "DF Bluno","HardwareRevision": "HW V1.7","Serial": "0123456789","createdAt": "2016-12-13T15:50:41.626Z","Manufacturer": "DFRobot"}
2.{"ConnectionTime": 702937.7616419792, "objectId": "uYuT3zgyez", "CustomName": "Relay Controller", "FirmwareRevision": "FW V1.96", "DeviceID": "F1E4746E-DCEC-495B-AC75-1DFD66527561", "PeripheralType": 9, "updatedAt": "2016-12-13T08:08:29.829Z", "Model": "DF Bluno", "HardwareRevision": "HW V1.7", "Serial": "0123456789", "createdAt": "2016-12-13T08:08:29.829Z", "Manufacturer": "DFRobot"}
```
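A minimal sketch, not from the thread, of getting a file into exactly this shape without reading a multi-gigabyte file into memory: walk backwards from the end of the file, find the last byte that is not whitespace, truncate right after it and append one newline. `strip_trailing_blank_lines` and its `chunk_size` parameter are hypothetical names. Only the tail is touched, so blank lines in the middle of a file are left alone, and the file always ends with `\n` even if it otherwise uses `\r\n`.

```python
import os


def strip_trailing_blank_lines(path, chunk_size=1 << 16):
    """Truncate the file so it ends with exactly one newline after its last non-blank byte."""
    with open(path, "r+b") as fh:
        fh.seek(0, os.SEEK_END)
        pos = fh.tell()
        # Scan backwards chunk by chunk until a non-whitespace byte is found.
        while pos > 0:
            read_size = min(chunk_size, pos)
            pos -= read_size
            fh.seek(pos)
            chunk = fh.read(read_size)
            stripped = chunk.rstrip(b" \t\r\n")
            if stripped:
                end = pos + len(stripped)  # offset just past the last real byte
                fh.seek(end)
                fh.truncate()  # drop everything after `end`
                fh.write(b"\n")  # leave exactly one trailing newline
                return
        # The file is empty or contains only whitespace: make it empty.
        fh.seek(0)
        fh.truncate()
```

Applied to every file, this would be `for f in glob.glob("folder_with_all_jsonl/*.jsonl"): strip_trailing_blank_lines(f)`.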

Edit: I used Zhengyang Song's answer and chepner's suggestion. I actually have two 4 GB files, and I did this:

```python
import glob
import json

for f in glob.glob("folder_with_all_jsonl/*.jsonl"):
    results = []  # reset per file, otherwise records from earlier files are written into later ones
    with open(f, 'r', encoding='utf-8-sig') as infile:
        for line in infile:
            try:
                results.append(json.loads(line))  # parse each line of the file
            except ValueError:
                print(f)  # blank or malformed line
    with open(f, 'w', encoding='utf-8-sig') as outfile:
        for result in results:
            outfile.write(json.dumps(result) + "\n")
```
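The loop above keeps every parsed record of a file in `results` before writing anything back, so a 4 GB file is held in memory in full as Python objects; that may be related to the failure on the personal machine, although the traceback alone does not prove it. A minimal streaming sketch, not taken from the thread: each line is validated with `json.loads`, blank or malformed lines are dropped, the original text of good lines goes to a temporary file in the same folder, and `os.replace` swaps it in at the end. `rewrite_jsonl_streaming` is a hypothetical name; output is plain UTF-8, so a leading BOM is not preserved.

```python
import glob
import json
import os
import tempfile


def rewrite_jsonl_streaming(path):
    """Rewrite the JSONL file at `path` in place, keeping only lines that parse as JSON.

    Lines are handled one at a time, so memory use does not grow with file size.
    Returns a (kept, skipped) tuple.
    """
    kept = skipped = 0
    # Temporary file next to the original so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as outfile, \
                open(path, "r", encoding="utf-8-sig") as infile:
            for line in infile:
                if not line.strip():
                    skipped += 1  # blank line, e.g. an extra newline at EOF
                    continue
                try:
                    json.loads(line)  # validate only; the original text is written back
                except ValueError:
                    skipped += 1  # malformed line: drop it
                    continue
                outfile.write(line.rstrip("\r\n") + "\n")
                kept += 1
        os.replace(tmp_path, path)  # swap the cleaned file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # do not leave a half-written temp file behind
        raise
    return kept, skipped


if __name__ == "__main__":
    for f in glob.glob("folder_with_all_jsonl/*.jsonl"):
        print(f, rewrite_jsonl_streaming(f))
```

Writing to a separate file and replacing at the end means an interrupted run leaves the original file untouched.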

This led to the following error (I'm on my personal Windows machine):

line 852, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

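Edit 2: I moved to my work machine and was able to get it working there. Any input on how to keep this from happening on a personal machine? Something like parallel processing?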


1 answer

Stack Overflow user

Accepted answer

Answered 2020-06-07 14:10:07

Just responding to your last code snippet.

You can replace the line

```python
json.dump(result, outfile, indent=None)
```

with something like this:

```python
for one_item in result:
    outfile.write(json.dumps(one_item) + "\n")
```
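To see why the change matters, here is a small illustration with two records borrowed from the question and `io.StringIO` buffers standing in for `outfile` (both are stand-ins, not part of the answer): `json.dump` of a list writes a single JSON array, which is not JSONL, while the loop writes one object per line, each terminated by exactly one newline.

```python
import io
import json

records = [{"objectId": "eHFvTUNqTR"}, {"objectId": "uYuT3zgyez"}]

# json.dump on the whole list: one JSON array, no line per record.
as_array = io.StringIO()
json.dump(records, as_array, indent=None)
print(repr(as_array.getvalue()))
# '[{"objectId": "eHFvTUNqTR"}, {"objectId": "uYuT3zgyez"}]'

# One json.dumps per record plus "\n": valid JSONL.
as_jsonl = io.StringIO()
for one_item in records:
    as_jsonl.write(json.dumps(one_item) + "\n")
print(repr(as_jsonl.getvalue()))
# '{"objectId": "eHFvTUNqTR"}\n{"objectId": "uYuT3zgyez"}\n'
```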
Votes: 1
Original page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/62246370
