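Given the following text:
var text="unicorns! and rainbows? and, cupcakes.Hello this is splitting by sentences. However, I am not sure.";
I want to split it at every period. Because the text ends with a period, the split produces an empty string as the last element, as shown below.
(4) ["unicorns! and rainbows? and, cupcakes", "Hello this is splitting by sentences", " However, I am not sure", ""]
What is a good way to split at periods while accounting for the period at the end of the text?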
Posted on 2018-08-03 06:18:17
You can use .filter(Boolean) to remove any empty strings, like this:
var text="unicorns! and rainbows? and, cupcakes.Hello this is splitting by sentences. However, I am not sure.";
var splitText = text.split(".");
var nonEmpty = splitText.filter(Boolean);
// var condensed = text.split(".").filter(Boolean);
console.log(nonEmpty);
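This may look like an odd approach, but it is simple and efficient. Conceptually, it is equivalent to this:
var arr = ["foo", "bar", "", "baz", ""];
var nonEmpty = arr.filter(function (str) {
  return Boolean(str);
});
This relies on type coercion to determine whether a string is empty. The only string value that coerces to false is the empty string "". Every other string value coerces to true. That is why we can use the Boolean constructor to check whether a string is empty.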
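To make the coercion rule concrete, here is a small sketch that checks a few sample values. The sample values are made up for illustration. Note that a string containing only whitespace is not empty, so it coerces to true and survives the filter.

```javascript
// Sample values are hypothetical, chosen only to illustrate coercion.
const samples = ["", " ", "0", "false", "text"];

// Boolean(str) is false only for the empty string.
const coerced = samples.map(str => Boolean(str));
console.log(coerced); // [false, true, true, true, true]

// Consequently, filter(Boolean) keeps the whitespace-only string.
const kept = samples.filter(Boolean);
console.log(kept); // [" ", "0", "false", "text"]
```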
In addition, if you want to trim leading and trailing whitespace from each sentence, you can use the .trim() method, like this:
var text="unicorns! and rainbows? and, cupcakes.Hello this is splitting by sentences. However, I am not sure.";
var nonEmpty = text.split(".").filter(Boolean).map(str => str.trim());
console.log(nonEmpty);
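One caveat about the ordering: because filter(Boolean) runs before trim(), a piece that consists only of whitespace passes the filter and then becomes an empty string. A possible variation, assuming whitespace-only pieces should also be dropped, is to trim first and filter afterwards. The input string below is a made-up example, not from the question.

```javascript
// Hypothetical input with a whitespace-only piece between two periods.
const spaced = "First sentence. . Second sentence.";

// Filter first, then trim: the " " piece survives the filter and is trimmed to "".
const filterThenTrim = spaced.split(".").filter(Boolean).map(str => str.trim());
console.log(filterThenTrim); // ["First sentence", "", "Second sentence"]

// Trim first, then filter: the whitespace-only piece is removed.
const trimThenFilter = spaced.split(".").map(str => str.trim()).filter(Boolean);
console.log(trimThenFilter); // ["First sentence", "Second sentence"]
```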
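Posted on 2018-08-03 06:22:55
This is how String#split works (and it is logical, in a way). Nothing follows the final . in the string, so the last element is an empty string. If you want to get rid of the empty strings in the array, you can filter them out with Array#filter (using an arrow function to keep it short):
var result = text.split(".").filter(s => s); // an empty string is falsy, so it is excluded
Or use String#match with a simple regular expression, for example:
var result = text.match(/[^.]+/g); // matches any sequence of characters that are not a '.'
Example: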
var text="unicorns! and rainbows? and, cupcakes.Hello this is splitting by sentences. However, I am not sure.";
var resultFilter = text.split(".").filter(x => x);
var resultMatch = text.match(/[^.]+/g);
console.log("With filter:", resultFilter);
console.log("With match:", resultMatch);
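The split behaviour described in this answer can be checked on a tiny input. The sketch below also covers one difference between the two approaches: String#match with the g flag returns null when nothing matches, so a fallback to an empty array may be useful. The helper name splitOnPeriods is hypothetical.

```javascript
// A separator at the very end leaves an empty string after it.
const withTrailing = "a.b.".split(".");
console.log(withTrailing); // ["a", "b", ""]

// Without a trailing separator there is no empty element.
const withoutTrailing = "a.b".split(".");
console.log(withoutTrailing); // ["a", "b"]

// Hypothetical helper: match returns null when nothing matches, so fall back to [].
function splitOnPeriods(input) {
  return input.match(/[^.]+/g) || [];
}
console.log(splitOnPeriods("a.b.")); // ["a", "b"]
console.log(splitOnPeriods("..."));  // []
```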
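Posted on 2018-08-03 06:27:23
Adding filter(Boolean) to split certainly works as a workaround, but the problem can be handled directly (and flexibly) with a regular expression.
For example, you can match with a regular expression that discards periods entirely, or with one that keeps all periods (or other punctuation):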
const text = "unicorns! and rainbows? and, cupcakes.Hello this is splitting by sentences. However, I am not sure.";
// discard periods
console.log(text.match(/[^.]+/g));
// discard periods and leading whitespace
console.log([...text.matchAll(/(.+?)(?:\.\s*)/g)].map(e => e[1]));
// keep periods
console.log(text.match(/(.+?)\./g));
// keep periods but trim whitespace
console.log([...text.matchAll(/(.+?\.)\s*/g)].map(e => e[1]));
// discard various sentence-related punctuation
console.log(text.match(/[^.?!]+/g));
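The patterns that require a literal \. after each sentence skip a final sentence that has no closing period. If that case matters, one possible alternative is to pass a regular expression to split and then drop the empty piece. This is only a sketch: the helper name splitSentences and the sample string are assumptions, not part of the answers above.

```javascript
// Hypothetical helper: split on a period plus any whitespace that follows it.
function splitSentences(input) {
  return input
    .split(/\.\s*/)      // the separator swallows the period and following spaces
    .map(s => s.trim())  // remove any remaining leading/trailing whitespace
    .filter(Boolean);    // drop the empty piece left by a trailing period
}

// Made-up sample whose last sentence has no closing period.
const sample = "Hello this is splitting by sentences. However, I am not sure";
console.log(splitSentences(sample));
// ["Hello this is splitting by sentences", "However, I am not sure"]

// For comparison, a pattern that requires a period skips the unterminated sentence.
const requiresPeriod = sample.match(/(.+?)\./g);
console.log(requiresPeriod);
// ["Hello this is splitting by sentences."]
```

Like every approach in this thread, this treats each period as a sentence boundary, so abbreviations such as "e.g." or decimal numbers would also be split.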
https://stackoverflow.com/questions/51662845