Search - Tencent Cloud Developer Community - Tencent Cloud

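From column: owent
POJ 3267 The Cow Lexicon: Solution Report
POJ 3267 The Cow Lexicon is a DP problem. My approach is as follows: 1. Let deleteNum[pos] be the minimum number of characters that must be deleted from the input string at position pos; 2. If the length of the input string is
Published 2018-08-01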
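The excerpt above stops before the recurrence is stated, so the following is only a minimal sketch of one common way to solve this problem, not the author's code. It assumes that `delete_num[pos]` means the minimum number of deletions needed for the suffix starting at `pos`; the function name `min_deletions` and the greedy left-to-right matching of each word are illustrative choices.

```python
def min_deletions(message, words):
    """Minimum characters to delete so that `message` becomes a sequence of dictionary words."""
    n = len(message)
    # delete_num[pos]: fewest deletions needed for the suffix message[pos:].
    delete_num = [0] * (n + 1)
    for pos in range(n - 1, -1, -1):
        # Option 1: simply delete the character at pos.
        delete_num[pos] = delete_num[pos + 1] + 1
        # Option 2: start a dictionary word at pos, skipping characters that do not fit.
        for word in words:
            if not word or word[0] != message[pos]:
                continue
            k = 0      # how many characters of `word` have been matched
            j = pos    # scan position in `message`
            while j < n and k < len(word):
                if message[j] == word[k]:
                    k += 1
                j += 1
            if k == len(word):
                # Characters in message[pos:j] that are not part of the word are deleted.
                skipped = (j - pos) - len(word)
                delete_num[pos] = min(delete_num[pos], delete_num[j] + skipped)
    return delete_num[0]


if __name__ == "__main__":
    print(min_deletions("browndcodw", ["cow", "milk", "white", "black", "brown", "farmer"]))
```

Matching each word greedily from `pos` gives the earliest possible end position for that word, which is enough here because any later end position costs at least as many deletions.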
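From column: 深度学习自然语言处理
[Paper Walkthrough] IJCAI 2019: A CNN Model Based on Lexicon Rethinking for Chinese NER
, especially Lattice-LSTM; Lexicon conflict problem: when a character in a sentence may be related to several words in the lexicon, RNN-based models have difficulty deciding among them. Contributions: the paper summarizes three contributions: it designs a CNN architecture that incorporates lexicon information into Chinese NER and effectively speeds up model training; it designs a Rethinking mechanism to handle the lexicon ... The LR-CNN model consists mainly of two parts, "Lexicon-Based CNNs" and "Refining Networks with Lexicon Rethinking". Lexicon-Based CNNs: first, the input sentence is represented as ... Finally, the authors draw the following conclusions from ablation experiments. Removing lexicon information: lexicon information is very useful for character-based Chinese NER. Removing the rethinking mechanism: the rethinking mechanism effectively improves the results of the model once lexicon information has been fused in (because it can resolve conflicts between characters and words in the lexicon). Removing both lexicon information and the rethinking mechanism: by comparing "removing only lexicon information" with "removing both lexicon and rethinking
Published 2020-03-06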
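From column: AIGC 先锋科技
Lexicon3D: Exploring Visual Foundation Models for Complex 3D Scene Understanding!
The authors name their work _Lexicon3D_, a unified probing architecture and the first comprehensive evaluation of visual foundation models for 3D scene understanding. 3 Probing Visual Encoders for Scene Understanding: the goal of Lexicon3D is to evaluate how different visual foundation models perform on complex scene understanding tasks. Reference [1]. Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding.
Edited 2024-09-13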
From column: 开发与安全
"Learn Python the Hard Way" Exercise 48: Advanced User Input
The test file tests/lexicon_tests.py, taken from the website: from nose.tools import * from ex48 import lexicon def test_directions (): assert_equal(lexicon.scan("north"), [('direction', 'north')]) result = lexicon.scan("north ("go"), [('verb', 'go')]) result = lexicon.scan("go kill eat") assert_equal(result, [('verb', means importing the lexicon module from ex48, i.e. we now need to write a lexicon.py file in the ex48 directory, which mainly implements the scan function. Following the hints on the website, my own implementation is as follows: #! Run the test command, i.e. use lexicon_tests.py to test the functions in lexicon.py; the output is as follows: simba@ubuntu:~/Documents/code/python/projects/ex48
Published 2017-12-28
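The author's lexicon.py is cut off in the excerpt above, so the following is a hedged sketch of what a `scan` function satisfying the quoted assertions might look like. Only `north` as a direction and `go` as a verb are confirmed by the quoted tests; the other word-list entries and the `'error'` label for unknown words are assumptions made for illustration.

```python
# Word lists are illustrative assumptions; only 'north' -> 'direction' and
# 'go' -> 'verb' are confirmed by the quoted test file.
WORD_TYPES = {
    "direction": ["north", "south", "east", "west"],
    "verb": ["go", "kill", "eat"],
}

# Invert the table once so that each word maps directly to its type.
LOOKUP = {word: kind for kind, words in WORD_TYPES.items() for word in words}


def scan(sentence):
    """Split a sentence on whitespace and tag each token as a (type, word) tuple."""
    result = []
    for token in sentence.split():
        # Unknown words get the assumed 'error' label instead of raising.
        kind = LOOKUP.get(token, "error")
        result.append((kind, token))
    return result


if __name__ == "__main__":
    print(scan("north"))
    print(scan("go kill eat"))
```

Building the inverted `LOOKUP` dictionary keeps each token lookup to a single dictionary access, instead of searching every word list in turn.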
From column: 成长道路
A Detailed Look at the Implementation of the Jcseg Tokenizer
jcseg.keeppunctuations=@%.&+ ####about the lexicon #prefix of lexicon file. lexicon.prefix=lex #suffix of lexicon file. lexicon.suffix=lex #abusolte path of the lexicon file. lexicon/1;D:/jcseg/lexicon/2 (WinNT) #lexicon.path=C:/Users/admin/Downloads/jcseg-1.9.2/lexicon lexicon.path =D:/workspace/lexicon #Wether to load the modified lexicon file auto.
Published 2017-12-28
From column: 杨丝儿的小站
SP Modules Review Contents (2)
Unilex is an ‘accent-independent’ lexicon based on the Unisyn database Classifies phones by keywords ‘Putt’ → STRUT class Use this to describe phonemic variation in English dialects/accents A single lexicon to encode different accents: run lexicon through accent specific rules to produce accent specific lexica Phoneset choice Unilex is more generalizable than CMUDict Unilex more compact: 1 base lexicon + rules
Edited 2022-11-15
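From column: 小鹏的专栏
[Speech Recognition] kaldi -- aidatatang_200zh Script Walkthrough: Dictionary Preparation
Input: text (the word-segmented transcripts of all recordings; if you use your own data without manual segmentation, you may need to segment it beforehand with a tool such as jieba). Output: the data/local/dict folder (containing extra_questions.txt, lexicon.txt Collect all words in the dataset [from text] -> generate words.txt - split the words of the whole dataset into ch and en vocabularies -> generate words-{en,ch}.txt 2. Generate the [English pronunciation dictionary]: generate lexicon-en.txt from the CMU dictionary (the dataset words found in the dictionary, with their CMU phonemes) - download and install g2p_model (a word-to-phoneme model, used to convert OOV words) - generate lexicon-en-oov.txt (produced with g2p_model: OOV words and their CMU phonemes *because words-en-oov contains mixed Chinese-English words such as [VISA卡], conversion fails for them and 21 entries are lost; it is unclear whether this affects later steps*) - generate lexicon-en-phn.txt (merge in-vocab and oov lexicon) - replace the CMU phonemes that cannot be converted between CMU and pinyin with phonemes that can be converted
Published 2021-07-19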
From column: JasonhavenDai
Building a Text Vector Space in Natural Language Processing 1. Encyclopedia 2. Source Code 3. References:
).splitlines() ''' Step 1: Basic term frequencies frequencies: count the frequency (number of occurrences) of each word in every line of the text; the problem is that the resulting text spaces differ in size build_lexicon word in doc.split(): c[word]+=1 counters.append(c) return counters def build_lexicon (corpus): lexicon=set() for doc in corpus: lexicon.update([w for w in doc.split()]) return lexicon def tf(term,doc): return freq(term,doc) def freq(term,doc): return doc.split frequencies(doc_list) # for counter in counters: # print(counter) vocabulary=build_lexicon
Published 2018-04-11
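The excerpt above notes that per-document word counts live in spaces of different sizes and then builds a shared lexicon. The sketch below completes that idea under stated assumptions: `build_lexicon` and `tf` follow the names in the excerpt, while `tf_vectors` and the sorted vocabulary order are hypothetical additions for illustration.

```python
def build_lexicon(corpus):
    """Collect every distinct whitespace-separated word across all documents."""
    lexicon = set()
    for doc in corpus:
        lexicon.update(doc.split())
    return lexicon


def tf(term, doc):
    """Raw term frequency: how many times `term` occurs in `doc`."""
    return doc.split().count(term)


def tf_vectors(corpus):
    """Map every document onto the same vocabulary so all vectors share one length."""
    # Sorting is an assumption made here to get a stable column order.
    vocabulary = sorted(build_lexicon(corpus))
    vectors = [[tf(word, doc) for word in vocabulary] for doc in corpus]
    return vocabulary, vectors


if __name__ == "__main__":
    vocab, vecs = tf_vectors(["a b a", "b c"])
    print(vocab)
    print(vecs)
```

Because every document is projected onto the same vocabulary, the vectors can be compared directly, which the per-document counters alone do not allow.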
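From column: 生信宝典
生信宝典 Foolproof Series (5): Literature Mining to Find the Regulatory Network of a Given Gene
Concept Lexicon Limits Search: check this option if you need to restrict the search results to a particular species. Concept Lexicon: usually a species-related option; it affects how Use aliases is evaluated and how search results are extracted, but it is not used to restrict query results. Interaction Lexicon: controls how strictly interactions are judged. (replace sxbd with your username) Interaction Lexicon: the effect of each of the previously mentioned limit, relax, and empty settings is recorded in the file interaction-lexicon-map.txt, whose contents are as follows Concept Lexicon: this is controlled by the file concept-lexicon-map.txt, which by default includes KEGG annotation information and gene alias information for common species.
Published 2018-02-05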
From column: 小七的各种胡思乱想
Chinese NER Notes 3: SoftLexicon and Other Lexicon Enhancement Methods Explained, with Code
default = {'B' : set(), 'M' : set(), 'E' : set(), 'S' :set()} soft_lexicon = [deepcopy(default) for i soft_lexicon[i+1]['E'].add(word) else: soft_lexicon[i]['B'].add(word) soft_lexicon[j]['E'].add(word) for k in range(i+1, j): soft_lexicon [k]['M'].add(word) for key, val in soft_lexicon[i].items(): if not val: soft_lexicon ] Simple-Lexicon：Simplify the Usage of Lexicon in Chinese NER https://zhuanlan.zhihu.com/p/77788495 https
Edited 2022-03-22
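The code in the excerpt above is fragmentary, so here is a minimal sketch of how per-character B/M/E/S word sets could be built, assuming a plain set of lexicon words and a brute-force substring scan. The function name `build_soft_lexicon`, the `max_len` parameter, and the example vocabulary are hypothetical; how the original handles empty sets is cut off in the excerpt and is not reproduced here.

```python
def build_soft_lexicon(sentence, vocab, max_len=4):
    """For each character, collect lexicon words in which it is Begin, Middle, End, or Single."""
    soft_lexicon = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        # Try every substring starting at i with at most max_len characters.
        for j in range(i, min(i + max_len, n)):
            word = sentence[i:j + 1]
            if word not in vocab:
                continue
            if i == j:
                soft_lexicon[i]["S"].add(word)      # single-character word
            else:
                soft_lexicon[i]["B"].add(word)      # first character of the word
                soft_lexicon[j]["E"].add(word)      # last character of the word
                for k in range(i + 1, j):
                    soft_lexicon[k]["M"].add(word)  # interior characters
    return soft_lexicon


if __name__ == "__main__":
    sets = build_soft_lexicon("中山西路", {"中山", "中山西路", "山西", "西", "路"})
    for char, groups in zip("中山西路", sets):
        print(char, groups)
```

A real implementation would more likely use a trie for the lexicon lookup; the substring scan is used here only to keep the sketch short.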
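From column: python3
Reading Excel Files with Python
The software can be downloaded from http://www.lexicon.net/sjmachin/xlrd.htm. row_data = sh.row_values(i) row_list.append(row_data) xlrd module contents: detailed help for the xlrd module is available on its home page at http://www.lexicon.net
Published 2020-01-14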
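From column: 小鹏的专栏
[Speech Recognition] kaldi -- aidatatang_200zh Script Walkthrough: run.sh
MFCC features ## utt2spk, spk2utt are used for CMVN # Dictionary preparation ## Input: text ## Output: the data/local/dict folder (containing extra_questions.txt, lexicon.txt silence_phones.txt, nonsilence_phones.txt, optional_silence.txt, and other files) local/prepare_dict.sh || exit 1; ## Mainly generates phoneme-related dictionaries, for example: lexicon.txt
Published 2021-07-19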
From column: SnailTyan
CRNN Paper Translation: Chinese-English Side by Side
and achieves remarkable performances in both lexicon-free and lexicon-based scene text recognition tasks In lexicon-free mode, predictions are made without any lexicon. 2.3.3 Lexicon-based transcription In lexicon-based mode, each test sample is associated with a lexicon Each image has been associated to a 50-words lexicon and a 1k-words lexicon. Red bars: lexicon search time per sample. Tested on the IC03 dataset with the 50k lexicon.
Published 2017-12-28
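From column: 自然语言处理
NLP Series (1) pkuseg-python: A High-Accuracy Chinese Word Segmentation Toolkit
我爱北京天安门') # perform segmentation print(text) loading model finish ['我', '爱', '北京', '天安门'] Code example 2: setting a user-defined dictionary import pkuseg lexicon = ['北京大学', '北京天安门'] # words in the user dictionary should be kept whole during segmentation seg = pkuseg.pkuseg(user_dict=lexicon) # load the model with the given user dictionary text = seg.cut /models', nthread=20) 5 Parameter description pkuseg.pkuseg(model_name='msra', user_dict='safe_lexicon') model_name The default 'safe_lexicon' refers to a Chinese dictionary that we provide (pip only). Users can pass in an iterator containing custom words. pkuseg.test(readFile, outputFile, model_name='msra', user_dict='safe_lexicon', nthread=10) readFile
Published 2019-02-13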
From column: 数据分析与挖掘
Basic Usage of HIT LTP: Word Segmentation, POS Tagging, Dependency Parsing, Named Entity Recognition, Semantic Role Labeling
/model/ltp_data_v3.4.0/" self.segmentor = Segmentor() # load_with_lexicon loads a custom dictionary self.segmentor.load_with_lexicon(os.path.join(LTP_DIR, "cws.model"),os.path.join(LTP_DIR, "user_dict.txt ")) self.postagger = Postagger() self.postagger.load_with_lexicon(os.path.join(LTP_DIR
Published 2021-04-27
From column: NewBeeNLP
FLAT: How to Do Chinese NER
In recent years, many papers have worked on lexicon enhancement for Chinese NER. One approach is to embed word-level information into character vectors (ACL 2020: Simplify the Usage of Lexicon in Chinese NER Lexicon Rethink CNN (IJCAI 2019)[5]: the authors propose a CNN with a rethink mechanism to resolve the lexicon conflict problem of Lattice LSTM. Encoding the lattice structure with a GNN: Lexicon-based Graph Network (EMNLP 2019)[6] Collaborative Graph Network (EMNLP 2019 https://arxiv.org/abs/1908.05969 [4] Lattice LSTM (ACL 2018): https://arxiv.org/abs/1805.02023 [5] Lexicon Rethink CNN(IJCAI 2019): https://www.ijcai.org/Proceedings/2019/0692.pdf [6] Lexicon-based Graph Network
Published 2021-03-03
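From column: mathor
Human Language Processing: Speech Recognition
This requires a vocabulary table, usually called a Lexicon. Taking English as an example: the table contains the phoneme representation of every word, so it has as many rows as there are English words. As you can imagine, the table has a great many entries. The two have different phoneme sets and Lexicons. Grapheme: the smallest unit of writing. For English, the graphemes are the 26 letters; for Chinese, the graphemes are the roughly 4000+ commonly used Chinese characters. The size of the Chinese grapheme set is 3755 (level-1 characters) + 3008 (level-2 characters) + 16 (punctuation marks). It is worth noting that this choice is Lexicon free: it does not need phoneticians to help build a complex, specialized Lexicon. But the drawbacks of some approaches are obvious: the phoneme approach needs the help of a lexicon and is not end-to-end; the word approach usually has a token set of more than 100k, which makes decoding complex; the byte approach, to be truly universal, inevitably requires an extremely large training corpus
Published 2020-07-27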
From column: 自然语言处理
pyltp Tutorial
熊高雄你吃饭了吗') print(type(words)) print('\t'.join(words)) segmentor.release() Output: 熊高雄你吃饭了吗 4.3 Using a custom dictionary lexicon , the model name is `cws.model` from pyltp import Segmentor segmentor = Segmentor() # initialize the instance segmentor.load_with_lexicon (cws_model_path, 'lexicon') # load the model; the second argument is the path to your external dictionary file words = segmentor.segment('亚硝酸盐是一种化学物质') print('\t'.join(words)) segmentor.release() Output: [INFO] 2018-08-16 19:18:03 loaded 2 lexicon entries 亚硝酸盐
Published 2018-08-28
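From column: 小鹏的专栏
[Speech Recognition] kaldi -- aidatatang_200zh Script Walkthrough: Language Model Training
Input: data/local/train/text data/local/dict/lexicon.txt Output: data/local/lm (containing text.no_oov, word.counts, unigram.counts replace the file-name index in train/text with <UNK> word.counts counts the occurrences of each word in text.no_oov, sorted in descending order of count unigram.counts merges text.no_oov and dict/lexicon.txt
Published 2021-07-19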
From column: 呱牛笔记
Porting sherpa-onnx to rv1106 & rv1109 & rv1126 to Implement TTS
/vits-icefall-zh-aishell3/model.onnx \ > --vits-lexicon=. /vits-icefall-zh-aishell3/lexicon.txt \ > --vits-tokens=. /szh-aishell3/model.onnx --vits-lexicon=./vits-icefall-zh-aishell3/lexicon.txt --vtts-rule-fsts=. /rule-10-0.wav 'WIFI配置完成' /home/alientek/rv1106/sherpa-onnx/sherpa-onnx/csrc/lexicon.cc:ConvertTextToToken
Edited 2024-04-10
