
JavaScript: cleaning up an .srt file

Stack Overflow user
Asked 2022-11-21 09:32:12
Answers 3 | Views 69 | Followers 0 | Votes 0

I have an .srt file with text like this:

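19
00:01:05,100 --> 00:01:08,820
countries such as Spain. Another factor to

20
00:01:08,820 --> 00:01:11,850
consider is the southern tip of Spain's coast

21
00:01:11,850 --> 00:01:15,060
being so close to northern Africa could have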

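I found the code below, which cleans up the file well, except that it leaves the initial cue numbers in place (these can be anywhere from one to four digits long).

Result:

19 countries such as Spain. Another factor to consider is the southern tip of Spain's coast being so close to northern Africa

Any idea how to remove these numbers?

Here is my code: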

Code language: javascript
 <script>
            document.querySelector('#files').addEventListener('change', (e) => {
                
                let files = e.target.files,
                    i = 0,
                    reader = new FileReader;
            
                
                reader.onload = (e) => {
                    //console.log(files[i].name, e.target.result);
                    var fileName = files[i].name;
                    var text = e.target.result;

                    text = text.replace(/WEBVTT[\r\n]/,"");
                    text = text.replace(/NOTE duration:.*[\r\n]/,"");
                    text = text.replace(/NOTE language:.*[\r\n]/,"");
                    text = text.replace(/NOTE Confidence:.+\d/g,"");
                    text = text.replace(/NOTE recognizability.+\d/g,"");
                    text = text.replace(/[\r\n].+-.+-.+-.+-.+/g,"");
                    text = text.replace(/[\r\n].+ --> .+[\r\n]/g,"");
                    text = text.replace(/.[\r\n]. --> .+[\r\n]/g,"");
                    text = text.replace(/[\n](.)/g," $1");
                    text = text.replace(/[\r\n]+/g,"");
                    text = text.replace(/^ /,"");
                
                    var heading = document.createElement('h3');
                    document.body.appendChild(heading);
                    heading.innerHTML = "Transcript for '" + files[i].name + "'";
                
                    var copyButton = document.createElement('button');
                    document.body.appendChild(copyButton);
                    copyButton.onclick = function() {copyToClip(text,fileName); };
                    copyButton.innerHTML = "Copy transcript";
                    copyButton.className = "copyButton";
                
                    var div = document.createElement('div');
                    document.body.appendChild(div);
                    div.className = "cleanVTTText";
                    div.innerHTML = text;
            
                    //console.log(files[i].name, text);
                    console.log(files[i].name);
                    
                    
                    if (i++ < files.length - 1) {
                        reader.readAsText(files[i]);
                    } else {
                        console.log('done');
                        
                    }
                };
                
                reader.readAsText(files[i]);
            
            }, false);
            
            function copyToClip(str,fileName) {
                function listener(e) {
                e.clipboardData.setData("text/html", str);
                e.clipboardData.setData("text/plain", str);
                e.preventDefault();
                }
                document.addEventListener("copy", listener);
                document.execCommand("copy");
                document.removeEventListener("copy", listener);
                alert("Copied transcript to clipboard:\n'"+fileName+"'");
            };     
            </script>

3 Answers

Stack Overflow user

Accepted answer

Posted 2022-11-22 08:33:00

For this problem, adding the following line of code works:

Code language: javascript
text = text.replace(/\n?\d*?\n?^.* --> [012345]{2}:.*$/mg ,"");  
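To see what this line does, here is a minimal sketch that applies it to the sample cues from the question and then collapses the remaining line breaks. The function name `stripCueHeaders` and the `sample` string are illustrative, and the sketch assumes LF line endings; with CRLF files the `\n?` parts would not match the `\r`, so the cue numbers would likely stay in place.

```javascript
// The line from the answer, wrapped in a function (the name is hypothetical).
function stripCueHeaders(text) {
    return text.replace(/\n?\d*?\n?^.* --> [012345]{2}:.*$/mg, "");
}

// Sample cues from the question, joined with LF ("\n") line endings (an assumption).
const sample = [
    "19",
    "00:01:05,100 --> 00:01:08,820",
    "countries such as Spain. Another factor to",
    "",
    "20",
    "00:01:08,820 --> 00:01:11,850",
    "consider is the southern tip of Spain's coast",
    "",
    "21",
    "00:01:11,850 --> 00:01:15,060",
    "being so close to northern Africa could have",
].join("\n");

// Remove each cue number together with its timestamp line,
// then collapse the leftover line breaks into single spaces.
const transcript = stripCueHeaders(sample).replace(/\s+/g, " ").trim();
console.log(transcript);
```

In the question's code this line would presumably need to run before the existing replacements that remove the `-->` lines, since it relies on the timestamp line still being present to recognise the number above it.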
Votes 0

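Stack Overflow user

Posted 2022-11-21 09:39:55

Instead of using replace, I came up with a different solution: use split to break the string on line breaks, which gives you an array from which you can build the final string however you like.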

Code language: javascript
let text = `19
00:01:05,100 --> 00:01:08,820
countries such as Spain. Another factor to

20
00:01:08,820 --> 00:01:11,850
consider is the southern tip of Spain's coast

21
00:01:11,850 --> 00:01:15,060
being so close to northern Africa could have`
let newtext = text.split('\n').filter(el => el !== '' && !el.includes('-->') && el.match(/[^A-Za-z0-9\-_]/) );
console.log(newtext);
console.log('Just for example:');
console.log(`${newtext[0]} ${newtext[1]} ${newtext[2]}`);
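The character-class test in the filter above keeps only lines that contain something other than letters, digits, `-` or `_`, so it would also drop a caption that is a single word, and on CRLF input the trailing `\r` would let the number lines through. A hedged variant of the same split-and-filter idea is sketched below; the function name `srtToText` is hypothetical, and it assumes that a line made up only of digits is always a cue number.

```javascript
// Split-and-filter sketch: keep only caption lines and join them with spaces.
// Assumption: a line consisting solely of digits is a cue number, never caption text.
function srtToText(srt) {
    return srt
        .split(/\r?\n/)                      // handle both LF and CRLF line endings
        .map(line => line.trim())
        .filter(line =>
            line !== "" &&                   // skip blank separator lines
            !line.includes("-->") &&         // skip timestamp lines
            !/^\d+$/.test(line))             // skip cue-number lines
        .join(" ");
}

const srtSample = [
    "19",
    "00:01:05,100 --> 00:01:08,820",
    "countries such as Spain. Another factor to",
    "",
    "20",
    "00:01:08,820 --> 00:01:11,850",
    "consider is the southern tip of Spain's coast",
].join("\r\n");

console.log(srtToText(srtSample));
```

The trade-off is that a caption consisting only of a number (for example a year on its own line) would be removed as well.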

Votes 0

Stack Overflow user

Posted 2022-11-21 15:29:44

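You can remove every stretch of text that starts with an optional blank line, followed by a line holding only an integer, followed by a timestamp line whose two times are separated by " --> " (including any terminating line breaks).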

The optional WEBVTT line is not part of the original SRT format, but it can be removed in the same pass:

Code language: javascript
function cleanCrt(text) {
    return text.replace(/^(WEBVTT\b.*)?[\r\n]*\d+(?:\r\n?|\n)[\d:,.]* --> [\d:,.]*[\r\n]*/gm, "");
}

const text = `WEBVTT
1
00:00:00,940 --> 00:00:04,630
Donkeys were first domesticated around 6000 years

2
00:00:04,630 --> 00:00:08,620
ago in northern Africa and Egypt, primarily for

3
00:00:08,620 --> 00:00:12,820
their milk and their meat. And around 2000 years

4
00:00:12,820 --> 00:00:15,970
ago, donkeys were used as draft animals, carrying 

5
00:00:15,970 --> 00:00:19,720
silk from the Pacific Ocean to the Mediterranean

6
00:00:19,720 --> 00:00:23,350
along the silk route. This was in return for trade [...]

19
00:01:05,100 --> 00:01:08,820
countries such as Spain. Another factor to

20
00:01:08,820 --> 00:01:11,850
consider is the southern tip of Spain's coast

21
00:01:11,850 --> 00:01:15,060
being so close to northern Africa could have`;

console.log(cleanCrt(text));
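The pattern can be easier to follow when it is split into the pieces described in the prose. The sketch below rebuilds the same regular expression from labelled parts and adds a final step that joins the remaining caption lines into one paragraph, which is the output the question asks for. The names `parts`, `cuePattern` and `srtToParagraph` are illustrative and not part of the answer.

```javascript
// The answer's pattern, assembled from commented pieces.
const parts = [
    String.raw`^(WEBVTT\b.*)?`,          // optional WEBVTT header line
    String.raw`[\r\n]*`,                 // optional blank line(s) before a cue
    String.raw`\d+(?:\r\n?|\n)`,         // a line holding only the cue number
    String.raw`[\d:,.]* --> [\d:,.]*`,   // the timestamp line
    String.raw`[\r\n]*`,                 // any line breaks that terminate it
];
const cuePattern = new RegExp(parts.join(""), "gm");

// Remove the cue headers, then turn the leftover line breaks into spaces.
function srtToParagraph(text) {
    return text
        .replace(cuePattern, "")
        .replace(/\s*[\r\n]+\s*/g, " ")
        .trim();
}

const cues = [
    "WEBVTT",
    "19",
    "00:01:05,100 --> 00:01:08,820",
    "countries such as Spain. Another factor to",
    "",
    "20",
    "00:01:08,820 --> 00:01:11,850",
    "consider is the southern tip of Spain's coast",
].join("\n");

console.log(srtToParagraph(cues));
```

Because each cue number must sit on its own line directly above a timestamp line, numbers inside the caption text (such as "6000 years") are left alone.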

Votes 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/74516782
