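I have a list of texts like this -- I want to count the number of lines in it. I thought of splitting on ". ", but that creates extra lines for words like "a.m.". Please help!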
[
'when people hear ai they often think about sentient robots and magic boxes. ai today is much more mundane and simple—but that doesn’t mean it’s not powerful. another misconception is that high-profile research projects can be applied directly to any business situation. ai done right can create an extreme return on investments (rois)—for instance through automation or precise prediction. but it does take thought, time, and proper implementation. we have seen that success and value generated by ai projects are increased when there is a grounded understanding and expectation of what the technology can deliver from the c-suite down.',
'“artificial intelligence (ai) is a science and a set of computational technologies that are inspired by—but typically operate quite differently from—the ways people use their nervous systems and bodies to sense, learn, reason and take action.”3 lately there has been a big rise in the day-to-day use of machines powered by ai. these machines are wired using cross-disciplinary approaches based on mathematics, computer science, statistics, psychology, and more.4 virtual assistants are becoming more common, most of the web shops predict your purchases, many companies make use of chatbots in their customer service and many companies use algorithms to detect fraud.',
'ai and deep learning technology employed in office entry systems will bring proper time tracking of each employee. as this system tries to learn each person with an image processing technology whose data is feed forwarded to a deep learning model where deep learning isn’t an algorithm per se, but rather a family of algorithms that implements deep networks (many layers). these networks are so deep that new methods of computation, such as graphics processing units (gpus), are required to train them, in addition to clusters of compute nodes. so using deep learning we can take detect the employee using face and person recognition scan and through which login, logout timing is recorded. using an ai system we can even identify each employee’s entry time, their working hours, non-working hours by tracking the movement of an employee in the office so that system can predict and report hr for the salary for each employee based on their working hours. our system can take feed from cctv to track movements of employees and this system is capable of recognizing a person even he/she is being masked as in this pandemic situation by taking their iris scan. with this system installed inside the office, the following are some of the benefits:',
'for several countries, regulations insist that the employer must keep documents available that can demonstrate the working hours performed by each employee. in the event of control from the labor inspectorate or a dispute with an employee, the employer must be able to explain and justify the working hours for the company. this can be made easy as our system is tracking employee movements',
'this is about monitoring user connection times to detect suspicious access times. in the event where compromised credentials are used to log on at 3 a.m. on a saturday, a notification on this access could alert the it team that an attack is possibly underway.',
'to manage and react to employees’ attendance, overtime thresholds, productivity, and suspicious access times, our system records and stores detailed and interactive reporting on users’ connection times. these records allow you to better manage users’ connection times and provide accurate, detailed data required by management.',
'4)if you want to avoid paying overtime, make sure that your employees respect certain working time quotas or even avoid suspicious access. our system will alert the hr officer about each employee’s office in and out time so that they can accordingly take action.',
'5)last but not least it reduces human resource needs to keep track of the records and sending the report to hr and hr officials has to check through the report so this system will reduce times and human resource needs',
'with the use of ai and deep learning technologies, we can automate some routines stuff with more functionality which humans need more resources to keep track thereby reducing time spent on manual data entry works rather companies can think of making their position high in the competitive world.'
]
Posted on 2022-04-04 19:52:55
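Sentence splitting is not a trivial task. I suggest using an off-the-shelf library such as NLTK.
import nltk
text = "..."  # your raw text
sentences = nltk.sent_tokenize(text)
This works quite well, but don't expect perfect results.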
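Since the question has a list of paragraphs rather than one string, the tokenizer can be applied to each element and the counts summed. The following is a minimal sketch, not the answerer's own code: `count_sentences_nltk` and its `tokenizer` parameter are hypothetical names introduced here, and `that_long_text_list` stands for the list in the question. It assumes NLTK is installed and its Punkt sentence models have been downloaded (for example with `nltk.download("punkt")`; some newer NLTK releases ask for `"punkt_tab"` instead).

```python
from typing import Callable, Iterable, List, Optional


def count_sentences_nltk(
    paragraphs: Iterable[str],
    tokenizer: Optional[Callable[[str], List[str]]] = None,
) -> int:
    """Sum the number of sentences the tokenizer finds in each paragraph."""
    if tokenizer is None:
        # Imported lazily so a different tokenizer can be passed in
        # without NLTK being installed.
        import nltk

        tokenizer = nltk.sent_tokenize
    return sum(len(tokenizer(paragraph)) for paragraph in paragraphs)
```

Calling `count_sentences_nltk(that_long_text_list)` would then give the total for the whole list. The text in the question is all lowercase, which may cost the tokenizer some accuracy, so it is worth spot-checking a few paragraphs by hand.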
Posted on 2022-04-04 19:44:47
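"I have a list of texts like this -- I want to count the number of lines in it. I thought of splitting it by ". ""
Good idea, but if you want to count the number of lines, why not just take the length of the list?
length = len(that_long_text_list)
You can then add to this length the number of periods, after using a regular expression to exclude the ones that belong to abbreviations such as a.m. For that, see this question on regex pattern matching for abbreviations.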
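One possible reading of this suggestion, sketched with the standard library only: mask the periods inside known abbreviations, then split on sentence-ending punctuation followed by whitespace. The abbreviation list, the `<DOT>` placeholder and the function name `count_sentences_regex` are assumptions made for illustration; they do not come from the answer.

```python
import re

# Hypothetical abbreviation list; extend it to match your own data.
ABBREVIATIONS = ("a.m.", "p.m.", "e.g.", "i.e.")

# A sentence boundary: whitespace that directly follows ".", "!" or "?".
BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def count_sentences_regex(paragraph: str) -> int:
    """Count sentences, ignoring periods that belong to known abbreviations."""
    masked = paragraph.strip()
    for abbreviation in ABBREVIATIONS:
        # Replace the periods so the abbreviation no longer looks like a boundary.
        masked = masked.replace(abbreviation, abbreviation.replace(".", "<DOT>"))
    return len([part for part in BOUNDARY.split(masked) if part])
```

The total for the list would be `sum(count_sentences_regex(p) for p in that_long_text_list)`. A known limitation of this sketch: when an abbreviation is the last word of a sentence, the boundary after it is masked too, so that sentence is merged with the next one.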
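Posted on 2022-04-04 20:00:21
Since we don't know how many characters fit on a line by your definition, we can't be sure. My suggestion is as follows:
So my code would be as follows: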
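raw_text = ["..."]  # your raw text as a list of paragraphs
line_count = 0
chars_per_line = 80  # placeholder: replace with the number of characters that fit on one of your lines
for paragraph in raw_text:
    # number of full lines in this paragraph, plus one for the remainder
    line_count += (len(paragraph) // chars_per_line) + 1
It's not perfect, because it doesn't take hyphenation rules into account, but I think it's close enough.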
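If "line" here means a display line of fixed width, the standard library's `textwrap` module may give a closer estimate, because it breaks at word boundaries instead of dividing the character count. This is a sketch under that assumption: `count_wrapped_lines` is a hypothetical helper name, and `chars_per_line` plays the same role as in the code above.

```python
import textwrap
from typing import Iterable


def count_wrapped_lines(paragraphs: Iterable[str], chars_per_line: int) -> int:
    """Count the lines each paragraph takes when word-wrapped to a fixed width."""
    return sum(
        len(textwrap.wrap(paragraph, width=chars_per_line))
        for paragraph in paragraphs
    )
```

With its default settings, `textwrap.wrap` breaks a word only when the word alone is longer than the width, so this still ignores hyphenation rules, and an empty paragraph counts as zero lines.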
https://stackoverflow.com/questions/71742886