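Splitting a text file into two different parts

Stack Overflow user
Asked on 2016-06-02 04:09:39
Answers: 3, Views: 43, Followers: 0, Votes: 0

I wrote a simple script that collects a list of titles from a JSON file and generates a text file containing that list.

The result looks like this: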

Animal geography
Autobiogeography
Chorography
Economic geography
Footloose industry
Geomorphometry
Health geography
Human geography
Military geography
Philosophy of geography
Physical geography
Political geography
Regional geography
Satirical cartography
Settlement geography
Transport geography
Vernacular geography
Visual geography
Category:Cartography
Category:Economic geography
Category:Geodemography
Category:Human geography
Category:Military geography
Category:Physical geography
Category:Political geography
Category:Regional geography
Category:Settlement geography
Category:Topography
Category:Toponymy
Category:Transportation geography
Category:Vernacular geography
Category:Geography by place  

Problem:

The problem I am facing now is how to split the text file into two parts:

The first part is a text file containing:

Animal geography
Autobiogeography
Chorography
Economic geography
Footloose industry
Geomorphometry
Health geography
Human geography
Military geography
Philosophy of geography
Physical geography
Political geography
Regional geography
Satirical cartography
Settlement geography
Transport geography
Vernacular geography
Visual geography

The second text file contains the lines that start with the word "Category":

Category:Cartography
Category:Economic geography
Category:Geodemography
Category:Human geography
Category:Military geography
Category:Physical geography
Category:Political geography
Category:Regional geography
Category:Settlement geography
Category:Topography
Category:Toponymy
Category:Transportation geography
Category:Vernacular geography
Category:Geography by place  

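I have no idea how to do this. Please advise.

Sorry for the confusing title. I did not know how to explain my problem.

Thanks.

Edit

For example, I extracted all the titles from this API (https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3ABranches%20of%20geography&cmlimit=100):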

{  
   "batchcomplete":"",
   "query":{  
      "categorymembers":[  
         {  
            "pageid":5259784,
            "ns":0,
            "title":"Animal geography"
         },
         {  
            "pageid":8670379,
            "ns":0,
            "title":"Autobiogeography"
         },
         {  
            "pageid":4254743,
            "ns":0,
            "title":"Chorography"
         },
         {  
            "pageid":177512,
            "ns":0,
            "title":"Economic geography"
         },
         {  
            "pageid":7907104,
            "ns":0,
            "title":"Footloose industry"
         },
         {  
            "pageid":5155886,
            "ns":0,
            "title":"Geomorphometry"
         },
         {  
            "pageid":2596739,
            "ns":0,
            "title":"Health geography"
         },
         {  
            "pageid":13372,
            "ns":0,
            "title":"Human geography"
         },
         {  
            "pageid":1794929,
            "ns":0,
            "title":"Military geography"
         },
         {  
            "pageid":5886597,
            "ns":0,
            "title":"Philosophy of geography"
         },
         {  
            "pageid":23263,
            "ns":0,
            "title":"Physical geography"
         },
         {  
            "pageid":1845092,
            "ns":0,
            "title":"Political geography"
         },
         {  
            "pageid":711230,
            "ns":0,
            "title":"Regional geography"
         },
         {  
            "pageid":42099944,
            "ns":0,
            "title":"Satirical cartography"
         },
         {  
            "pageid":33566568,
            "ns":0,
            "title":"Settlement geography"
         },
         {  
            "pageid":9710174,
            "ns":0,
            "title":"Transport geography"
         },
         {  
            "pageid":24644075,
            "ns":0,
            "title":"Vernacular geography"
         },
         {  
            "pageid":5329197,
            "ns":0,
            "title":"Visual geography"
         },
         {  
            "pageid":716309,
            "ns":14,
            "title":"Category:Cartography"
         },
         {  
            "pageid":2021084,
            "ns":14,
            "title":"Category:Economic geography"
         },
         {  
            "pageid":2245786,
            "ns":14,
            "title":"Category:Geodemography"
         },
         {  
            "pageid":1111700,
            "ns":14,
            "title":"Category:Human geography"
         },
         {  
            "pageid":7774333,
            "ns":14,
            "title":"Category:Military geography"
         },
         {  
            "pageid":2153059,
            "ns":14,
            "title":"Category:Physical geography"
         },
         {  
            "pageid":1898464,
            "ns":14,
            "title":"Category:Political geography"
         },
         {  
            "pageid":6645804,
            "ns":14,
            "title":"Category:Regional geography"
         },
         {  
            "pageid":44706236,
            "ns":14,
            "title":"Category:Settlement geography"
         },
         {  
            "pageid":6517504,
            "ns":14,
            "title":"Category:Topography"
         },
         {  
            "pageid":1086902,
            "ns":14,
            "title":"Category:Toponymy"
         },
         {  
            "pageid":41335672,
            "ns":14,
            "title":"Category:Transportation geography"
         },
         {  
            "pageid":24727902,
            "ns":14,
            "title":"Category:Vernacular geography"
         }
      ]
   }
}

I would really appreciate it if you could point me in the right direction to solve this.

Thank you all for your help and guidance.
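
Each entry in the response above carries an `ns` (namespace) field, with 0 on the article entries and 14 on the "Category:" entries, so the split could also be made before any text file is written. A minimal sketch, assuming the response is already available as a Python dict; the function name `split_titles` and the trimmed-down `sample` are illustrative and not part of the original script:

```python
import json


def split_titles(response):
    """Return (article_titles, category_titles) from a categorymembers response."""
    members = response["query"]["categorymembers"]
    # In MediaWiki, namespace 14 is the Category namespace
    categories = [m["title"] for m in members if m["ns"] == 14]
    articles = [m["title"] for m in members if m["ns"] != 14]
    return articles, categories


# Trimmed-down sample in the same shape as the response shown above
sample = json.loads("""
{"query": {"categorymembers": [
  {"pageid": 5259784, "ns": 0, "title": "Animal geography"},
  {"pageid": 716309, "ns": 14, "title": "Category:Cartography"}
]}}
""")

articles, categories = split_titles(sample)
print(articles)
print(categories)
```

Filtering on `ns` avoids depending on how the title happens to be spelled; each list can then be written to its own file with `'\n'.join(...)`.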


3 Answers

Stack Overflow user

Accepted answer

Posted on 2016-06-02 04:30:26

You can try this:

# Read the combined list and sort each line into one of two lists
with open('file.txt', 'r') as f:
    data = []
    category = []

    for line in f:
        # Lines keep their trailing newline, so they can be written back unchanged
        if line.startswith('Category'):
            category.append(line)
        else:
            data.append(line)

# 'with' closes both output files even if a write fails
with open('category.txt', 'w') as cat_file, open('data.txt', 'w') as data_file:
    cat_file.write(''.join(category))
    data_file.write(''.join(data))

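It reads file.txt line by line and tests whether each line starts with "Category". If it does, the line is appended to the category list; otherwise it is appended to the data list.

After processing the file, the program joins all the lines in each list and writes them to category.txt and data.txt.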
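
The same idea can be sketched as a reusable function that writes each line as soon as it is read, so neither list has to be held in memory. The function name `split_file`, its parameters, and the returned counts are assumptions made for illustration, not part of the answer:

```python
def split_file(src, cat_path, data_path, prefix="Category:"):
    """Copy lines starting with `prefix` to cat_path and all others to data_path."""
    counts = {"category": 0, "data": 0}
    with open(src, "r") as f, \
            open(cat_path, "w") as cat_file, \
            open(data_path, "w") as data_file:
        for line in f:  # iterating over the file yields one line at a time
            if line.startswith(prefix):
                cat_file.write(line)
                counts["category"] += 1
            else:
                data_file.write(line)
                counts["data"] += 1
    return counts
```

With the file names used above, the call would be `split_file('file.txt', 'category.txt', 'data.txt')`.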

Hope this helps.

Votes: 0

Stack Overflow user

Posted on 2016-06-02 04:35:12

To test whether a line in the file starts with "Category:", simply do the following:

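# Output file names are illustrative; use whatever names you need
with open("file.txt", "r") as f, \
        open("category.txt", "w") as cat_file, \
        open("data.txt", "w") as data_file:
    for line in f.read().splitlines():
        if line.startswith("Category:"):
            # "Category:" lines go to one new file
            cat_file.write(line + "\n")
        else:
            # all other lines go to the other new file
            data_file.write(line + "\n")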

Votes: 1

Stack Overflow user

Posted on 2016-06-02 04:37:49

Thank you for telling me to use "in".

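query = 'Category:'

# Open all three files in one with-statement so they are closed (and flushed) on exit
with open('List.text', 'r') as f1, \
        open('WordWithCat.text', 'w') as f2, \
        open('WordwithoutCat.text', 'w') as f3:
    for line in f1.read().splitlines():
        # 'in' matches the query anywhere in the line, not only at the start
        if query in line:
            f2.write(line + '\n')
        else:
            f3.write(line + '\n')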

It turned out not to be as complicated as I thought. Thank you all for your help and guidance.
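
One caveat about the `in` test: it matches "Category:" anywhere in a line, whereas `startswith` matches only at the beginning. For the list in the question both give the same result, but they can differ on other input. A small sketch of the difference; the third title is made up purely for illustration:

```python
prefix = "Category:"
lines = [
    "Human geography",
    "Category:Toponymy",
    "Notes on Category:Toponymy",  # made-up title, for illustration only
]

# Substring test: true when the prefix appears anywhere in the line
by_substring = [line for line in lines if prefix in line]
# Prefix test: true only when the line begins with the prefix
by_prefix = [line for line in lines if line.startswith(prefix)]

print(by_substring)  # also contains the made-up title
print(by_prefix)
```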

Votes: 0
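The original page content is provided by Stack Overflow. Translation support is provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/37582513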
