文章/答案/技术大牛

发布

社区首页 >问答首页 >Regex/注释日语文本的方法

问Regex/注释日语文本的方法
EN

Stack Overflow用户

提问于 2020-08-18 19:37:08

回答 1查看 76关注 0票数 1

我有一个大的文本文件，格式如下。

{
    "glossary": {
        "title": "example glossary",
        cm="私は今プログラミングーをしています"; 
        "text2": "example glossary",
        cm="私はABあああをしています"
}

我需要注释掉包含日语字符的行。此行开头有4个或多个选项卡。每行的制表符计数各不相同。我需要修改上面的文件，如下：

{
    "glossary": {
        "title": "example glossary",
        */cm="私は今プログラミングーをしています";*/
        "text2": "example glossary",
        */cm="私はABあああをしています";*/
}

环境：

★我可以运行一个批处理文件。

★我可以运行VB脚本。

★我可以使用樱花编辑器。(首选)

★我无法使用/下载第三方软件。

我试过的东西。

使用正则表达式➞，我尝试将日语文本替换为"“使用正则表达式\p{平假名}，然后\p{片假名}，然后\p{Han}，但这些仍然是符号。

使用VBA时，我尝试使用vba读取文本文件的每一行，并将匹配的行替换为"*/“。我不知道为什么，但它替换了整个文件。我使用的代码如下：

Set objFSO = CreateObject("Scripting.FileSystemObject")
If objFSO.FileExists("C:\Users\s162138\Desktop\test.txt") then
Set objFile = objFSO.OpenTextFile("C:\Users\s162138\Desktop\test.txt", 1)

Do Until objFile.AtEndOfStream
strLine = objFile.Readline
If strNextLine = "cm=*" then
strLine = "text"+ strLine + "text"
End If

strNewText = strLine + vbcrlf
Loop
Set objFile = Nothing

Set objFile = objFSO.OpenTextFile("C:\Users\s162138\Desktop\test.txt", 2)
objFile.Write strNewText
Set objFile = Nothing
End If

如果有人能帮助我，我将不胜感激..

regex

vba

回答 1

Stack Overflow用户

发布于 2020-08-19 04:16:06

像这样使用https://gist.github.com/ryanmcgrath/982242提供的日语正则表达式：

^([ \t]*)(.*?(?:[\u3000-\u303F]|[\u3040-\u309F]|[\u30A0-\u30FF]|[\uFF00-\uFFEF]|[\u4E00-\u9FAF]|[\u2605-\u2606]|[\u2190-\u2195]|\u203B).*?)([ \t]*)$

替换为$1/*$2*/$3。参见proof。

说明

                         EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [ \t]*                   any character of: ' ', '\t' (tab) (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      [\u3000-\u303F]          punctuation
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u3040-\u309F]          hiragana
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u30A0-\u30FF]          katakana
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\uFF00-\uFFEF]          Full-width roman + half-width katakana
                               
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u4E00-\u9FAF]          Common and uncommon kanji
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u2605-\u2606]          Stars
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      [\u2190-\u2195]          arrows
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      \u203B                    Weird asterisk thing
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    [ \t]*                   any character of: ' ', '\t' (tab) (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \3
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/63467809

复制

相似问题

问Regex/注释日语文本的方法
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Regex/注释日语文本的方法EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Regex/注释日语文本的方法
EN