I have a large text file in the following format.
{
"glossary": {
"title": "example glossary",
cm="私は今プログラミングーをしています";
"text2": "example glossary",
cm="私はABあああをしています"
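}
I need to comment out the lines that contain Japanese characters. Each of these lines starts with 4 or more tabs, and the number of tabs varies from line to line. I need to modify the file above as follows: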
{
"glossary": {
"title": "example glossary",
*/cm="私は今プログラミングーをしています";*/
"text2": "example glossary",
*/cm="私はABあああをしています";*/
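}
Environment:
★I can run a batch file.
★I can run VBScript.
★I can use Sakura Editor. (preferred)
★I cannot use or download third-party software.
Things I have tried:
Using regex ➞ I tried to replace the Japanese text with "" using the regexes \p{Hiragana}, then \p{Katakana}, then \p{Han}, but the symbols were still left in place.
Using VBA ➞ I tried to read each line of the text file with VBA and replace the matching lines with "*/". I don't know why, but it replaced the whole file. The code I used is below: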
Set objFSO = CreateObject("Scripting.FileSystemObject")
If objFSO.FileExists("C:\Users\s162138\Desktop\test.txt") then
    Set objFile = objFSO.OpenTextFile("C:\Users\s162138\Desktop\test.txt", 1)
    Do Until objFile.AtEndOfStream
        strLine = objFile.Readline
        If strNextLine = "cm=*" then
            strLine = "text"+ strLine + "text"
        End If
        strNewText = strLine + vbcrlf
    Loop
    Set objFile = Nothing
    Set objFile = objFSO.OpenTextFile("C:\Users\s162138\Desktop\test.txt", 2)
    objFile.Write strNewText
    Set objFile = Nothing
End If
I would appreciate it if anyone could help me.
Posted on 2020-08-19 04:16:06
Use the Japanese regex provided at https://gist.github.com/ryanmcgrath/982242 like this:
^([ \t]*)(.*?(?:[\u3000-\u303F]|[\u3040-\u309F]|[\u30A0-\u30FF]|[\uFF00-\uFFEF]|[\u4E00-\u9FAF]|[\u2605-\u2606]|[\u2190-\u2195]|\u203B).*?)([ \t]*)$
Replace with $1/*$2*/$3. See proof.
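To check what the pattern does outside an editor, here is a minimal Python sketch. Python is only my assumption for illustration (it is not one of the tools listed in the question), and the names JAPANESE_LINE and comment_japanese_lines are made up for this example. The pattern and replacement are the ones given above, with the replacement written in Python's \1 notation instead of $1.

```python
import re

# Pattern copied from the answer above. re.MULTILINE makes ^ and $ match at
# the start and end of every line instead of only the whole string.
JAPANESE_LINE = re.compile(
    r"^([ \t]*)"
    r"(.*?(?:[\u3000-\u303F]|[\u3040-\u309F]|[\u30A0-\u30FF]|[\uFF00-\uFFEF]"
    r"|[\u4E00-\u9FAF]|[\u2605-\u2606]|[\u2190-\u2195]|\u203B).*?)"
    r"([ \t]*)$",
    re.MULTILINE,
)


def comment_japanese_lines(text):
    """Wrap every line containing a character from the ranges above in /* */.

    Leading and trailing spaces or tabs stay outside the comment markers.
    Lines without such a character are returned unchanged.
    """
    return JAPANESE_LINE.sub(r"\1/*\2*/\3", text)


if __name__ == "__main__":
    # Hypothetical sample modelled on the file in the question.
    sample = (
        "{\n"
        '\t"title": "example glossary",\n'
        '\t\t\t\tcm="私は今プログラミングーをしています";\n'
        '\t"text2": "example glossary",\n'
        '\t\t\t\t\tcm="私はABあああをしています"\n'
        "}\n"
    )
    print(comment_japanese_lines(sample))
```

The sketch works on text that uses \n as the line separator. A file with Windows line endings would need them normalised first, which Python's default text mode does when reading.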
Explanation
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [ \t]*                   any character of: ' ', '\t' (tab) (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      [\u3000-\u303F]          punctuation
--------------------------------------------------------------------------------
    |                        OR
--------------------------------------------------------------------------------
      [\u3040-\u309F]          hiragana
--------------------------------------------------------------------------------
    |                        OR
--------------------------------------------------------------------------------
      [\u30A0-\u30FF]          katakana
--------------------------------------------------------------------------------
    |                        OR
--------------------------------------------------------------------------------
      [\uFF00-\uFFEF]          Full-width roman + half-width katakana
--------------------------------------------------------------------------------
    |                        OR
--------------------------------------------------------------------------------
      [\u4E00-\u9FAF]          Common and uncommon kanji
--------------------------------------------------------------------------------
    |                        OR
--------------------------------------------------------------------------------
      [\u2605-\u2606]          Stars
--------------------------------------------------------------------------------
    |                        OR
--------------------------------------------------------------------------------
      [\u2190-\u2195]          arrows
--------------------------------------------------------------------------------
    |                        OR
--------------------------------------------------------------------------------
      \u203B                   Weird asterisk thing
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    [ \t]*                   any character of: ' ', '\t' (tab) (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \3
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
https://stackoverflow.com/questions/63467809
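The ranges in the explanation can also be checked one character at a time. The following is a rough Python sketch for illustration only (Python, the JAPANESE_RANGES table and the classify helper are my own assumptions, not part of the original answer). The code points are copied from the alternation in the pattern and the labels follow the explanation.

```python
# (first code point, last code point, label) for each alternative in the pattern.
JAPANESE_RANGES = [
    (0x3000, 0x303F, "punctuation"),
    (0x3040, 0x309F, "hiragana"),
    (0x30A0, 0x30FF, "katakana"),
    (0xFF00, 0xFFEF, "full-width roman / half-width katakana"),
    (0x4E00, 0x9FAF, "kanji"),
    (0x2605, 0x2606, "stars"),
    (0x2190, 0x2195, "arrows"),
    (0x203B, 0x203B, "reference mark"),
]


def classify(ch):
    """Return the label of the first range that contains ch, or None."""
    code_point = ord(ch)
    for first, last, label in JAPANESE_RANGES:
        if first <= code_point <= last:
            return label
    return None


if __name__ == "__main__":
    # Characters taken from the sample lines in the question, plus ASCII "A".
    for ch in "私はプーA★":
        print(repr(ch), classify(ch))
```

One consequence worth noting: ★ (U+2605) is inside the stars range, so a line containing only that symbol and ASCII text would also be wrapped in comment markers by the pattern. Plain ASCII letters such as the AB in the second sample line match none of the ranges.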