
How do I count the number of lines in a text?

Stack Overflow user
Asked on 2022-04-04 19:37:44
Answers: 3, Views: 74, Followers: 0, Votes: 0

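I have a list of texts like this, and I want to count the number of lines in it. I thought of splitting on ". ", but that creates extra lines for words like "a.m.". Please help!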

Code language: Python
[
 'when people hear ai they often think about sentient robots and magic boxes. ai today is much more mundane and simple—but that doesn’t mean it’s not powerful. another misconception is that high-profile research projects can be applied directly to any business situation. ai done right can create an extreme return on investments (rois)—for instance through automation or precise prediction. but it does take thought, time, and proper implementation. we have seen that success and value generated by ai projects are increased when there is a grounded understanding and expectation of what the technology can deliver from the c-suite down.', 
 '“artificial intelligence (ai) is a science and a set of computational technologies that are inspired by—but typically operate quite differently from—the ways people use their nervous systems and bodies to sense, learn, reason and take action.”3 lately there has been a big rise in the day-to-day use of machines powered by ai. these machines are wired using cross-disciplinary approaches based on mathematics, computer science, statistics, psychology, and more.4 virtual assistants are becoming more common, most of the web shops predict your purchases, many companies make use of chatbots in their customer service and many companies use algorithms to detect fraud.', 
 'ai and deep learning technology employed in office entry systems will bring proper time tracking of each employee. as this system tries to learn each person with an image processing technology whose data is feed forwarded to a deep learning model where deep learning isn’t an algorithm per se, but rather a family of algorithms that implements deep networks (many layers). these networks are so deep that new methods of computation, such as graphics processing units (gpus), are required to train them, in addition to clusters of compute nodes. so using deep learning we can take detect the employee using face and person recognition scan and through which login, logout timing is recorded. using an ai system we can even identify each employee’s entry time, their working hours, non-working hours by tracking the movement of an employee in the office so that system can predict and report hr for the salary for each employee based on their working hours. our system can take feed from cctv to track movements of employees and this system is capable of recognizing a person even he/she is being masked as in this pandemic situation by taking their iris scan. with this system installed inside the office, the following are some of the benefits:', 
 'for several countries, regulations insist that the employer must keep documents available that can demonstrate the working hours performed by each employee. in the event of control from the labor inspectorate or a dispute with an employee, the employer must be able to explain and justify the working hours for the company. this can be made easy as our system is tracking employee movements', 
 'this is about monitoring user connection times to detect suspicious access times. in the event where compromised credentials are used to log on at 3 a.m. on a saturday, a notification on this access could alert the it team that an attack is possibly underway.',  
 'to manage and react to employees’ attendance, overtime thresholds, productivity, and suspicious access times, our system records and stores detailed and interactive reporting on users’ connection times. these records allow you to better manage users’ connection times and provide accurate, detailed data required by management.', 
 '4)if you want to avoid paying overtime, make sure that your employees respect certain working time quotas or even avoid suspicious access. our system will alert the hr officer about each employee’s office in and out time so that they can accordingly take action.', 
 '5)last but not least it reduces human resource needs to keep track of the records and sending the report to hr and hr officials has to check through the report so this system will reduce times and human resource needs', 
 'with the use of ai and deep learning technologies, we can automate some routines stuff with more functionality which humans need more resources to keep track thereby reducing time spent on manual data entry works rather companies can think of making their position high in the competitive world.'
]
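The problem is easy to reproduce with the "a.m." paragraph from the list above. This minimal sketch (the variable names are only illustrative) splits that paragraph, which contains two sentences, on ". " as described in the question:

```python
# Paragraph copied from the question's list: it contains two sentences.
paragraph = (
    "this is about monitoring user connection times to detect suspicious "
    "access times. in the event where compromised credentials are used to "
    "log on at 3 a.m. on a saturday, a notification on this access could "
    "alert the it team that an attack is possibly underway."
)

# Naive approach from the question: treat every ". " as a sentence boundary.
pieces = paragraph.split(". ")
print(len(pieces))  # 3, although the paragraph has only 2 sentences
print(pieces[1])    # the second piece is cut off right after "3 a.m"
```

The extra piece comes from the space after "a.m.", which is exactly the over-count the question describes.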

3 Answers

Stack Overflow user

Posted on 2022-04-04 19:52:55

Sentence splitting is not a trivial task. I suggest using a ready-made library such as NLTK.

Code language: Python
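import nltk

nltk.download("punkt")  # one-time download of the Punkt model (recent NLTK versions need "punkt_tab")

text = "..."  # your raw text, e.g. " ".join(your_list_of_paragraphs)
sentences = nltk.sent_tokenize(text)
print(len(sentences))  # number of sentences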

This works fairly well, but don't expect perfect results.

Votes: 1

Stack Overflow user

Posted on 2022-04-04 19:44:47

"I have a list of texts like this, and I want to count the number of lines in it. I want to split it by '. '"

Good idea, but if you want to count the lines, why not simply take the length of the list?

Code language: Python
that_long_text_list = [...]  # the list of paragraphs from the question
length = len(that_long_text_list)  # one entry per paragraph

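Then you can add to this length the extra sentence breaks found with a regular expression, after removing abbreviations such as a.m. For that, see this question about regex pattern matching for abbreviations.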
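A minimal sketch of that regex idea, assuming a hand-maintained abbreviation list (ABBREVIATIONS, count_sentences and count_sentences_in_list are hypothetical names, not from the answer). The dots inside known abbreviations are removed first, then each paragraph is split wherever whitespace follows ".", "!" or "?", and the pieces are counted. It is only a heuristic: unlisted abbreviations, or footnote digits glued to a period (such as "more.4" in the question's text), are still miscounted.

```python
import re
from typing import Iterable

# Hypothetical, hand-maintained list of abbreviations whose dots do not end a sentence.
ABBREVIATIONS = ["a.m.", "p.m.", "e.g.", "i.e."]
ABBREV_RE = re.compile(r"\b(?:" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")")

# A sentence break is whitespace that directly follows ".", "!" or "?".
SENTENCE_BREAK_RE = re.compile(r"(?<=[.!?])\s+")


def count_sentences(paragraph: str) -> int:
    """Count sentences in one paragraph, ignoring dots inside known abbreviations."""
    # "a.m." -> "am", so the abbreviation no longer looks like a sentence end.
    cleaned = ABBREV_RE.sub(lambda m: m.group(0).replace(".", ""), paragraph)
    # Split at sentence breaks; skip empty pieces (e.g. from an empty string).
    return sum(1 for piece in SENTENCE_BREAK_RE.split(cleaned) if piece.strip())


def count_sentences_in_list(paragraphs: Iterable[str]) -> int:
    """Total sentence count over a list of paragraphs, such as the one in the question."""
    return sum(count_sentences(p) for p in paragraphs)


example = [
    "one sentence here.",
    "we log on at 3 a.m. on a saturday. then we alert the it team.",
]
print(count_sentences_in_list(example))  # 3
```

Counting per paragraph and summing gives the same result as starting from the list length and adding the extra breaks, and it also handles a last sentence that lacks a final period.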

Votes: 0

Stack Overflow user

Posted on 2022-04-04 20:00:21

Since we don't know how many characters fit on one line, we can't give an exact count. My suggestion is as follows:

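  1. Split the text into paragraphs, since a new paragraph always starts on a new line anyway. That part is easy: each item in the list is one paragraph.
  2. For each of those paragraphs, divide its number of characters by the number of characters per line and round up to the nearest whole number.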

So my code would look like this:

Code language: Python
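import math

raw_text = ["..."]  # your raw text as a list of paragraphs (strings)
chars_per_line = 80  # example value: however many characters fit on one of your lines
line_count = 0
for paragraph in raw_text:
    # Each paragraph starts on a new line; round its character count up to whole lines.
    line_count += math.ceil(len(paragraph) / chars_per_line)
print(line_count)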

It's not perfect, since it doesn't take syllabification (hyphenation) rules into account, but I think it's close enough.
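One refinement, assuming the concern is how text is wrapped on screen: a real line usually breaks between words, so it often holds fewer than chars_per_line characters. This sketch lets Python's standard textwrap module do the wrapping instead, assuming a fixed-width display of width characters and that each list item starts on a new line (count_wrapped_lines is a hypothetical helper):

```python
import textwrap
from typing import Iterable


def count_wrapped_lines(paragraphs: Iterable[str], width: int) -> int:
    """Count display lines when each paragraph is word-wrapped to `width` characters."""
    total = 0
    for paragraph in paragraphs:
        # textwrap.wrap breaks lines at whitespace (and after hyphens); it only
        # splits inside a word when that word is longer than `width`.
        lines = textwrap.wrap(paragraph, width=width)
        # wrap() returns [] for an empty paragraph; count it as one blank line.
        total += max(1, len(lines))
    return total


text = "aaaa bbbb cccc"  # 14 characters
print(count_wrapped_lines([text], width=8))  # 3 lines: "aaaa", "bbbb", "cccc"
```

For the same string, the character-based formula gives ceil(14 / 8) = 2, so the word-wrapped count can be noticeably higher when lines are narrow.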

Votes: 0
Original page content provided by Stack Overflow; translation provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/71742886
