Given input
let sentence = `browser's
emoji
rød
continuïteit a-b c+d
D-er går en
المسجد الحرام
٠١٢٣٤٥٦٧٨٩
তার মধ্যে আশ্চর্য`;
Desired output
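I want every word and every space wrapped in a <span> that indicates whether it is a word or a space.
Each <span> has a type attribute with a value; in the example below, "w" marks a word and "t" marks a space or punctuation mark.
Example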
<span type="w">D</span><span type="t">-</span>
<span type="w">er</span><span type="t"> </span>
<span type="w">går</span>
<span type="t"> </span><span type="w">en</span>
<span type="w">المسجد</span>
<span type="t"> </span><span type="w">الحرام</span>
<span type="t"> </span>
<span type="w">তার</span><span type="t"> </span>
<span type="w">মধ্যে</span><span type="t"> </span>
<span type="w">আশ্চর্য</span>
Ideas investigated
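Searching Stack Exchange
A question about splitting a Unicode string by characters led me to an answer that uses the Unicode property Grapheme_Base.
Using the split(/\w/) and split(/\W/) word boundaries
These split on ASCII only, as MDN states for the RegExp classes \w and \W:
\w and \W only match ASCII-based characters; for example, a to z, A to Z, 0 to 9, and _.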
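To make the ASCII limitation concrete, here is a minimal sketch. The names `sample`, `asciiSplit` and `unicodeSplit` are illustrative and not from the original post. It compares split(/\W/) with a split built on Unicode property escapes.

```javascript
// Illustrative only: `sample` is a made-up string ("går en", with a precomposed å).
const sample = "g\u00E5r en";

// \W treats the non-ASCII letter "å" as a non-word character, so the word is cut in two.
const asciiSplit = sample.split(/\W/);
console.log(asciiSplit); // [ 'g', 'r', 'en' ]

// With the u flag, \p{L} (letter), \p{M} (mark) and \p{N} (number) also cover non-ASCII scripts.
const unicodeSplit = sample.split(/[^\p{L}\p{M}\p{N}]+/u);
console.log(unicodeSplit); // [ 'går', 'en' ]
```

Note that this drops the separators; it only illustrates why \w and \W are not enough for this input.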
Using split("")
Using sentence.split("") splits the emoji into its individual UTF-16 code units.
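A small sketch of that behaviour, using a made-up one-emoji string (`face` is not from the original post): split("") cuts between UTF-16 code units, while string iteration walks code points.

```javascript
// Illustrative only: a single emoji, code point U+1F600, stored as a surrogate pair.
const face = "\u{1F600}";

// split("") cuts between UTF-16 code units, so the surrogate pair is torn apart.
const units = face.split("");
console.log(units.length); // 2

// Spreading a string iterates by code point instead.
const codePoints = [...face];
console.log(codePoints.length); // 1
```

Code points are still not graphemes: an emoji built from several code points (for example with a zero-width joiner) would be split by the spread as well.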
Unicode code point properties Grapheme_Base and Grapheme_Extend
const matchGrapheme =
/\p{Grapheme_Base}\p{Grapheme_Extend}|\p{Grapheme_Base}/gu;
let result = sentence.match(matchGrapheme);
console.log("Grapheme_Base (+Grapheme_Extend)", result);
This splits up every word, but the result still contains all of the content.
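One detail worth noting: the pattern above accepts at most one Grapheme_Extend code point after each base. A hedged variant (the name `matchGraphemeMulti` is hypothetical) uses `*` so that a base followed by several extending marks stays together. The test string is made up for illustration.

```javascript
// Assumption: `matchGraphemeMulti` is an illustrative name, not from the original post.
const matchGraphemeMulti = /\p{Grapheme_Base}\p{Grapheme_Extend}*/gu;

// "e" followed by two combining marks, U+0301 (acute) and U+0323 (dot below), then "x".
const stacked = "e\u0301\u0323x";

const single = stacked.match(/\p{Grapheme_Base}\p{Grapheme_Extend}|\p{Grapheme_Base}/gu);
const multi = stacked.match(matchGraphemeMulti);

// The original pattern skips the second mark, so the pieces no longer add up to the input.
console.log(single.join("") === stacked); // false
// The variant keeps both marks attached to "e".
console.log(multi.join("") === stacked); // true
```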
Unicode properties Punctuation and White_Space
const matchPunctuation = /[\p{Punctuation}\p{White_Space}]+/ug; // no "|" inside a character class: there it would match a literal pipe
let punctuationAndWhiteSpace = sentence.match(matchPunctuation);
console.log("Punctuation/White_Space", punctuationAndWhiteSpace);
This seems to pick out the non-words.
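Posted on 2022-02-28 13:16:32
By combining the Grapheme_Base/Grapheme_Extend result with the Punctuation/White_Space result, we can loop over the grapheme-split content and consume the punctuation list as we go.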
let sentence = `browser's
emoji
rød
continuïteit a-b c+d
D-er går en
المسجد الحرام
٠١٢٣٤٥٦٧٨٩
তার মধ্যে আশ্চর্য`;
const matchGrapheme = /\p{Grapheme_Base}\p{Grapheme_Extend}|\p{Grapheme_Base}/gu;
const matchPunctuation = /\p{Punctuation}|\p{White_Space}/ug;
// Split on \n or \r\n so that no stray \r is left at the end of a line.
sentence.split(/\r?\n/).forEach((line, i) => {
  console.log(`Line ${i} ${line}`);
  // match() returns null when nothing matches, so fall back to an empty array.
  const graphs = line.match(matchGrapheme) || [];
  const puncts = line.match(matchPunctuation) || [];
  console.log(graphs, puncts);
  const words = [];
  let word = "";
  const items = [];
  graphs.forEach((char) => {
    if (puncts.length > 0 && char === puncts[0]) {
      // A separator ends the current word; skip empty words (e.g. two separators in a row).
      if (word) {
        words.push(word);
        items.push({ type: "w", value: word });
      }
      word = "";
      items.push({ type: "t", value: char });
      puncts.shift();
    } else {
      word += char;
    }
  });
  // Flush the last word of the line.
  if (word) {
    words.push(word);
    items.push({ type: "w", value: word });
  }
  console.log("Words", words.join(" || "));
  console.log("Items", items[0]);
  // Rejoin wrapped in '<span>'
  const l = items.map((v) => `<span type="${v.type}">${v.value}</span>`).join("");
  console.log(l);
});
Posted on 2022-02-28 20:12:01
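As an alternative sketch (not the method of either answer), the same word / non-word split can be done in one pass with a single regular expression built from the general categories Letter, Mark and Number. The names `tokenPattern` and `wrapTokens` are hypothetical, and what counts as part of a word is an assumption: here anything that is not a letter, mark or number, including "+" and "'", ends a word.

```javascript
// Hypothetical helper: one alternation, a run of word characters or a run of anything else.
const tokenPattern = /([\p{L}\p{M}\p{N}]+)|([^\p{L}\p{M}\p{N}]+)/gu;

function wrapTokens(line) {
  let html = "";
  for (const match of line.matchAll(tokenPattern)) {
    // Group 1 is set for word runs; otherwise the match is whitespace or punctuation.
    const type = match[1] !== undefined ? "w" : "t";
    html += `<span type="${type}">${match[0]}</span>`;
  }
  return html;
}

console.log(wrapTokens("D-er g\u00E5r en"));
// <span type="w">D</span><span type="t">-</span><span type="w">er</span><span type="t"> </span><span type="w">går</span><span type="t"> </span><span type="w">en</span>
```

Unlike the loop above, consecutive separators such as ", " end up in one "t" span. The sketch also does no HTML escaping, so input containing "<" or "&" would need to be escaped before being inserted into a page.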
You can also use a combination of replace(), split() and join().
const sentence = `browser's
emoji
rød
continuïteit a-b c+d
D-er går en
المسجد الحرام
٠١٢٣٤٥٦٧٨٩
তার মধ্যে আশ্চর্য`;
const splitP = (sentence) => {
const oneLine = sentence.replace(/[\r\n]/g, " "); // replace all \r\ns by spaces
const splitted = oneLine.split(" ").filter(x => x); // split & filter out falsy values
return `<span>${splitted.join("</span><span>")}</span>`; // join with span tags
}
console.log(splitP(sentence));
Or, if you prefer a one-liner:
const sentence = `browser's
emoji
rød
continuïteit a-b c+d
D-er går en
المسجد الحرام
٠١٢٣٤٥٦٧٨٩
তার মধ্যে আশ্চর্য`;
const result = `<span>${sentence.replace(/[\r\n]/g, " ").split(" ").filter(x => x).join("</span><span>")}</span>`;
console.log(result);
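Note that the two snippets above discard the whitespace and emit no type attribute. If those are needed, one possible variation, which is not part of the answer as posted, is to split on a capturing group so that split() keeps the separators in its result. The helper name `splitKeepSpaces` is hypothetical, and punctuation stays attached to the neighbouring word in this sketch.

```javascript
// Hypothetical helper: split on whitespace runs but keep them, thanks to the capturing group.
const splitKeepSpaces = (text) =>
  text
    .split(/(\s+)/u)
    .filter((part) => part !== "") // drop empty strings produced at the edges
    .map((part) => {
      // A part made only of whitespace is a separator, anything else is treated as a word.
      const type = /^\s+$/u.test(part) ? "t" : "w";
      return `<span type="${type}">${part}</span>`;
    })
    .join("");

console.log(splitKeepSpaces("g\u00E5r en"));
// <span type="w">går</span><span type="t"> </span><span type="w">en</span>
```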
https://stackoverflow.com/questions/71295592