My code fails to detect operators when the query also contains non-English characters:
const OPERATOR_REGEX = new RegExp(
/(?!\B"[^"|“|”]*)\b(and|or|not|exclude)(?=.*[\s])\b(?![^"|“|”]*"\B)/,
'giu'
);
const query1 = '(Java or "化粧" or 化粧品)';
const query2 = '(Java or 化粧 or 化粧品)';
console.log(query1.split(OPERATOR_REGEX));
console.log(query2.split(OPERATOR_REGEX));
https://codepen.io/thewebtud/pen/vYraavd?editors=1111
The same regex, with the unicode flag set, detects all operators on regex101.com: https://regex101.com/r/FC84BH/1
How can I fix this for JS?
Posted on 2022-11-29 09:46:04
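In JavaScript, \b, \B and \w remain ASCII-based even with the u flag, so characters such as 化 are treated as non-word characters and no word boundary is found between a quote and 化.
Remember that \b (word boundary) can be written as
(?:(?<=^)(?=\w)|(?<=\w)(?=$)|(?<=\W)(?=\w)|(?<=\w)(?=\W))
and \B (non-word boundary) can be written as
(?:(?<=^)(?=\W)|(?<=\W)(?=$)|(?<=\W)(?=\W)|(?<=\w)(?=\w))
A Unicode-aware \w pattern is
[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]
Putting these together, here is an ECMAScript 2018+ solution (it relies on lookbehind and Unicode property escapes):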
// Unicode-aware word character (w) and its negation (nw)
const w = String.raw`[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]`;
const nw = String.raw`[^\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]`;
// Unicode-aware replacements for \b (uwb) and \B (unwb), built from lookarounds
const uwb = String.raw`(?:(?<=^)(?=${w})|(?<=${w})(?=$)|(?<=${nw})(?=${w})|(?<=${w})(?=${nw}))`;
const unwb = String.raw`(?:(?<=^)(?=${nw})|(?<=${nw})(?=$)|(?<=${nw})(?=${nw})|(?<=${w})(?=${w}))`;
// Same pattern as in the question, with \b and \B swapped for uwb and unwb
const OPERATOR_REGEX = new RegExp(
String.raw`(?!${unwb}"[^"“”]*)${uwb}(and|or|not|exclude)(?=.*\s)${uwb}(?![^"“”]*"${unwb})`,
'giu'
);
const query1 = '(Java or "化粧" or 化粧品)';
const query2 = '(Java or 化粧 or 化粧品)';
console.log(query1.split(OPERATOR_REGEX));
console.log(query2.split(OPERATOR_REGEX));
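To see the difference in isolation, here is a minimal sketch that compares the built-in \b with the hand-built boundary on the quoted word from query1. It repeats the w, nw and uwb definitions so that it runs on its own; the names asciiBoundary and unicodeBoundary are only illustrative and are not part of the original answer.

```javascript
// Unicode-aware word / non-word classes, repeated here so the sketch is self-contained.
const w = String.raw`[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]`;
const nw = String.raw`[^\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Connector_Punctuation}\p{Join_Control}]`;
// Lookaround-based replacement for \b.
const uwb = String.raw`(?:(?<=^)(?=${w})|(?<=${w})(?=$)|(?<=${nw})(?=${w})|(?<=${w})(?=${nw}))`;

// Built-in \b: neither '"' nor 化 is an ASCII word character,
// so no boundary is reported between them, even with the u flag.
const asciiBoundary = /\b化粧/u.test('"化粧"');

// Hand-built boundary: '"' is a non-word character and 化 is Alphabetic,
// so the position between them counts as a word boundary.
const unicodeBoundary = new RegExp(`${uwb}化粧`, 'u').test('"化粧"');

console.log(asciiBoundary);   // false
console.log(unicodeBoundary); // true
```

This is why the first or in query1 is skipped by the original regex: the trailing (?![^"“”]*"\B) lookahead finds a \B right after the opening quote and rejects the match, whereas the Unicode-aware version sees a word boundary there.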
https://stackoverflow.com/questions/74611855