I'm trying to use Python to put all of this market research data into a CSV file. Right now it lives in an unstructured .txt file.
Big Hit Entertainment is among the top media & adtech startups for 2020
Big Hit Entertainment is a South Korean entertainment company that currently manages soloist Lee Hyun and idol group BTS. It helps bring the music and content from various sources in one place on its innovative platform.
Founded Year: 2005
Headquarters: Seoul, Seoul-t’ukpyolsi, South Korea
Website: www.ibighit.com
Twitter: www.twitter.com/bighitent
Founders: Bang Si-Hyuk
One97 Communications is among the top media & adtech startups for 2020
One97 is a startup that delivers mobile content and commerce services to millions of mobile consumers. It does so through India’s most widely deployed telecom applications cloud platform.
Founded Year: 2000
Headquarters: Noida, Uttar Pradesh, India
Website: www.one97.com
LinkedIn: www.linkedin.com/company/one97-communications-limited
Twitter: www.twitter.com/One97
Founders: Vijay Shekhar Sharma
Woowa Bros is among the top media & adtech startups for 2020
WOOWA BROS is a Korean developer of smartphone applications and advertising platforms. Amongst other services, their portfolio also includes local marketing and web services solutions and products. Some of these are sports events app and a food delivery app.
Founded Year: 2011
Headquarters: Seoul, Seoul-t’ukpyolsi, South Korea
Website: www.woowahan.com
LinkedIn: www.linkedin.com/company/woowa-bros-
Twitter: www.twitter.com/smartbaedal
Founders: Bong Jin Kim
Wochit is among the top media & adtech startups for 2020
Wochit is revolutionizing the short-form video platform. The cloud-based video creation platform helps brands and storytellers to instantly react to any story and economically scale branded studio-quality video production.
Founded Year: 2012
Headquarters: Yehud, HaMerkaz, Israel
Website: www.wochit.com
LinkedIn: www.linkedin.com/company/wochit
Twitter: www.twitter.com/wochit
Founders: Dror Ginzberg, Ran Oz
Pixellot is among the top media & adtech startups for 2020
Pixellot is a leading sports media company. They provide automatic production solutions for the amateur and semi-professional market. With their patented technology, this startup is able to streamline the production workflow by deploying an unmanned multi-camera system that covers the entire field. The company also makes use of advanced algorithms to enable dynamic coverage of the flow of play and highlight generation.
Founded Year: 2013
Headquarters: Petah Tiqwa, HaMerkaz, Israel
Website: www.pixellot.tv
LinkedIn: www.linkedin.com/company/pixellotltd
Twitter: www.twitter.com/pixellotltd
Founders: Gal Oz, Miky Tamir
Maimai is among the top media & adtech startups for 2020
Maimai is a China-based career and social-networking platform. In just a few years since its launch, the company has gathered tens of millions of users and surpasses LinkedIn in China for most used professional social networking sites.
Founded Year: 2013
Headquarters: Beijing, Beijing, China
Website: www.maimai.cn
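Founders: Fan Lin
I'd like to get it into a format where each company's name is a row header and the column headers are the attributes listed under each company.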
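I'm not sure where to go from here, or whether resources for this already exist. Thanks!
Note: this isn't the whole .txt file, just a segment.
Edit: what if the .txt looks like the following? Neither solution seems to work when the structure is different.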
NestAway is one of the top proptech startups for 2020
This Bangalore-based startup is a home rental network that aims to provide better rental solutions via design and technology. Their motto is to assist customers in booking, finding, and moving into a rental home of choice across Indian cities. All of this is made possible within an application. They also help their customer’s move-in, ask for services from tap leakage to door lock broken, rental payment, etc. Alongside this, they also assist customers in moving out. Here is some more information about this venture, one of the top proptech startups for 2020.
Founding Year: 2015
Headquarters: Bangalore, India
Website: www.nestaway.com
LinkedIn: www.linkedin.com/company/9334060/
Founders: Amarendra Sahu, Deepak Dhar, Jitendra Jagadev, Smruti Parida
Ucommune is one of the top proptech startups for 2020
This startup offers co-working space solutions. They also have provision for long-term leasing, hot desk, and corporate customization and professional solutions. They provide services to small-to-medium enterprises across China, Singapore, New York City, San Francisco in California, and London in the United Kingdom. Here is some more information about this venture, one of the top proptech startups for 2020.
Founding Year: 2015
Headquarters: Beijing, China
Website: www.ucommune.com
LinkedIn: www.linkedin.com/company/ucommune
Founders: Mao Daqing
Answered 2020-05-21 16:44:33
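Fortunately, the text file has a structure we can use to tell when each record starts. The trick is to accumulate description lines until the "Founded Year" metadata line appears. At that point, we can take key/value pairs from the following lines that contain a colon, and assume the record ends when the key/value pairs end.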
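The key/value step relies on splitting each metadata line at its first colon only, so a value that itself contains a colon stays intact, while a line with no colon signals that the metadata block is over. A minimal sketch of that idea; `split_key_value` is a hypothetical helper, and the `https://` URL is an illustrative value not taken from the file. Note that a description line containing a colon would be misread as metadata under this rule.

```python
def split_key_value(line):
    """Split 'Key: value' at the first colon; return None if there is no colon."""
    key, sep, value = line.partition(":")
    if not sep:
        # No colon at all: not a metadata line, so the metadata block has ended
        return None
    return key.strip(), value.strip()


print(split_key_value("Founded Year: 2005"))                # ('Founded Year', '2005')
print(split_key_value("Website: https://www.ibighit.com"))  # ('Website', 'https://www.ibighit.com')
print(split_key_value("Wochit is among the top media & adtech startups for 2020"))  # None
```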
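Edit
This can turn into a game of whack-a-mole: you have to keep adjusting the conditions to make up for inconsistencies in the text. Eventually the inconsistencies may be too large to account for, and the text has to be "normalized" by hand. I added a second check, for the "Founding Year" variant, to deal with this one.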
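Instead of adding another `startswith` check for every variant label, one option is to map known variants onto a canonical column name in one place. A small sketch; the `KEY_ALIASES` table and `canonical_key` helper are hypothetical and only cover the "Founding Year" variant seen in the second sample.

```python
# Hypothetical alias table: variant label -> canonical column name
KEY_ALIASES = {
    "Founding Year": "Founded Year",
}


def canonical_key(key):
    """Return the canonical column name for a metadata label."""
    key = key.strip()
    return KEY_ALIASES.get(key, key)


print(canonical_key("Founding Year"))  # Founded Year
print(canonical_key("Headquarters"))   # Headquarters
```

New variants then only need a new entry in the table rather than another branch in the parsing loop.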
```python
import csv


def text_to_csv(infile, outfile):
    fields = ["Description", "Founded Year", "Headquarters",
              "Website", "Founders"]
    with open(infile) as in_file, open(outfile, 'w', newline='') as out_file:
        writer = csv.DictWriter(out_file, fieldnames=fields)
        writer.writeheader()
        row = {}
        description = []
        for line in in_file:
            line = line.strip()
            if not line:
                continue
            # read in description til metadata found
            if not line.startswith("Founded Year:") and not line.startswith("Founding Year:"):
                description.append(line)
                continue
            # metadata found: the year is whatever follows the first colon
            row["Founded Year"] = line.split(":", 1)[1].strip()
            for line in in_file:
                line = line.strip()
                if not line:
                    continue
                try:
                    key, val = line.split(":", 1)
                except ValueError:
                    # no colon: the metadata is over and this line opens the next record
                    break
                key = key.strip()
                if key in fields:
                    row[key] = val.strip()
            else:
                # reached end of file while reading metadata
                line = None
            # end of metadata: write the finished record
            if row and description:
                row["Description"] = " ".join(description)
                writer.writerow(row)
            row = {}
            # carry the line that ended the metadata into the next description
            description = [line] if line else []


text_to_csv("test.txt", "test.csv")
print(open("test.csv").read())
```
If the structure seen in this sample isn't kept throughout the whole document, this will end in tears.
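One way to depend less on the exact ordering is to start a new record whenever a title line appears, treat short "Label: value" lines as metadata, and fold every other line into the description. This is a sketch of that alternative, not the answer's method. It assumes every title contains "is among the top" or "is one of the top" (the two phrasings in the samples) and that labels are at most three words; `parse_records`, `write_csv` and the alias table are illustrative names.

```python
import csv
import re

# Assumption: each record opens with a title line such as
# "X is among the top ... for 2020" or "X is one of the top ... for 2020".
TITLE_RE = re.compile(r"(.+?) is (?:among|one of) the top\b")
# Assumption: a metadata label is one to three words followed by a colon.
META_RE = re.compile(r"^([A-Za-z]+(?: [A-Za-z]+){0,2}):\s*(.*)$")
# Hypothetical alias table for labels that vary between files
ALIASES = {"Founding Year": "Founded Year"}


def parse_records(lines):
    """Return one dict per company: Name, Description and any 'Label: value' fields."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        title = TITLE_RE.match(line)
        if title:
            # A title line starts a new record
            records.append({"Name": title.group(1), "Description": ""})
            continue
        if not records:
            continue  # skip anything before the first title line
        record = records[-1]
        meta = META_RE.match(line)
        if meta:
            key = ALIASES.get(meta.group(1), meta.group(1))
            record[key] = meta.group(2).strip()
        else:
            # Any other line is part of the free-text description
            record["Description"] = (record["Description"] + " " + line).strip()
    return records


def write_csv(records, path):
    """Write one row per company; fields a company lacks are left empty."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
```

Typical use would be `with open("test.txt") as f: write_csv(parse_records(f), "test.csv")`. Labels a company lacks (for example, Twitter for NestAway) come out as empty cells.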
Answered 2020-05-21 16:48:14
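My answer is similar to @tdelaney's, but uses regular expressions to do the job. Python's `re` module offers powerful ways to find specific patterns in unstructured text.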
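As a smaller illustration of the regex idea: with the `re.MULTILINE` flag, `^` and `$` match at the start and end of every line, so metadata lines can be pulled out directly without first replacing newlines with tabs. The label list in this hypothetical `META_RE` is an assumption drawn from the samples; labels not listed (such as Twitter) are simply skipped.

```python
import re

# Hypothetical per-line pattern; [ \t]* (rather than \s*) keeps an empty
# value from swallowing the following newline.
META_RE = re.compile(
    r"^(Found(?:ed|ing) Year|Headquarters|Website|Founders):[ \t]*(.*)$",
    re.MULTILINE,
)

sample = """Founded Year: 2005
Headquarters: Seoul, Seoul-t’ukpyolsi, South Korea
Website: www.ibighit.com
Twitter: www.twitter.com/bighitent
Founders: Bang Si-Hyuk"""

for key, value in META_RE.findall(sample):
    print(f"{key!r} -> {value!r}")
```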
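```python
import csv
import re


def text2csv(inname, outname):
    with open(inname, 'r') as f:
        # Flatten the file: newlines become tabs, so a blank line shows up as "\t\t".
        # Pad both ends with a tab so the first and last records can match too.
        data = '\t' + f.read().strip().replace('\n', '\t') + '\t'
    # Assumes the title, description and metadata blocks are separated by blank lines.
    info = re.findall(r'\t(.*?)\ is\ (.*?\t\t.*?)\t\t.*?Found(?:ed|ing)\ Year:\ (.*?)\tHeadquarters:\ (.*?)\tWebsite:\ (.*?)\t.*?\tFounders:\ (.*?)\t', data)
    with open(outname, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Name', 'Description', 'Founded Year', 'Headquarters', 'Website', 'Founders'])
        for i in info:
            # csv.writer quotes values containing commas, so commas no longer need stripping
            writer.writerow([' '.join(field.split()) for field in i])
```
https://stackoverflow.com/questions/61939060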