The word-vector set is generated from the file at this GitHub link: https://github.com/jianwei76/SoliAudit/blob/master/va/features/op.origin.csv.xz

The gen_doc() function converts the op.origin.csv.xz file into a .txt document file:
```
import logging

import pandas as pd
import word2vec

opfile = 'op.origin.csv.xz'  # downloaded and uploaded to the Google Colab folder
binfile = 'model.bin'  # new file created to save the model generated by word2vec

def op_name(op):
    # Strip trailing digits from an opcode, e.g. PUSH1 -> PUSH
    return op.rstrip('0123456789')

def filter_op(op_line):
    filter_ops = [op_name(op) for op in op_line.split()]
    return ' '.join(filter_ops)

def gen_doc(opfile, docfile):
    op = pd.read_csv(opfile, compression='xz', index_col=0)
    op.dropna(inplace=True)
    op['Opcodes'] = op['Opcodes'].apply(filter_op)
    # NOTE: the step that writes the filtered opcodes to docfile
    # is missing from the snippet as posted

def get_model(opfile, binfile, size=5):
    docfile = 'op-doc.tmp.txt'
    gen_doc(opfile, docfile)
    logging.info('Training opcode word2vec...in=%s, out=%s, word-embed-size=%d' % (docfile, binfile, size))
    word2vec.word2vec(docfile, binfile, size=size, verbose=True)
    return word2vec.load(binfile)
```
The following code snippet:
```
op_vecs = [ opline_to_vec(row['Opcodes'], w2v) for idx, row in data.iterrows() ]
```
invokes this function:
```
def opline_to_vec(line, w2v):
    print('inside oplinetovec func')
    ops = line.split()
    print('ops and line.split done')
    vec = np.zeros((len(ops), w2v.vectors.shape[1]))
    print('vec computed')
    for i, op in enumerate(ops):
        print('each vec i values')
        vec[i] = w2v.get_vector(op_name(op))  # <-- this line raises the KeyError
        print(vec[i])
    print('returning from opline_to_vec')
    return vec
```
Output of op tem.txt:
```
CALLDATASIZE SUB DUP ADD SWAP DUP DUP CALLDATALOAD PUSH AND SWAP PUSH ADD SWAP SWAP SWAP SWAP POP POP POP PUSH JUMP JUMPDEST PUSH MLOAD DUP DUP DUP MSTORE PUSH ADD SWAP POP POP PUSH MLOAD DUP SWAP SUB SWAP RETURN JUMPDEST PUSH PUSH DUP CALLDATASIZE SUB DUP ADD SWAP DUP DUP CALLDATALOAD PUSH AND SWAP PUSH ADD SWAP SWAP SWAP SWAP DUP CALLDATALOAD SWAP PUSH ADD SWAP DUP ADD DUP CALLDATALOAD SWAP PUSH ADD SWAP SWAP SWAP SWAP SWAP SWAP SWAP SWAP SWAP POP POP POP PUSH JUMP JUMPDEST STOP JUMPDEST CALLVALUE DUP ISZERO PUSH JUMPI PUSH DUP REVERT JUMPDEST POP PUSH PUSH DUP
```
I have marked the line that raises the error (vec[i] = w2v.get_vector(op_name(op))):
```
/usr/local/lib/python3.7/dist-packages/word2vec/wordvectors.py in ix(self, word)
     36         Returns the index on `self.vocab` and `self.vectors` for `word`
     37         """
---> 38         return self.vocab_hash[word]
     39
     40     def word(self, ix):

KeyError: 'CALLDATASIZE'
```
It would be great if you could help.
Posted on 2022-04-23 16:14:56
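It looks like you are asking the word-vector model for a word, 'CALLDATASIZE', that it does not know.
Where did this set of word vectors come from? (Did you train it yourself, or import it from somewhere else? How did you load it?)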
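Do you expect it to have a vector for that unusual opcode? If so, skip the other surrounding steps and check for just this one word, then work back to the earlier step that you believe should have created its word vector.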
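One way to run that check is to compare the opcodes in the training document against the model's vocabulary. The sketch below is only an illustration: find_missing_ops is a hypothetical helper, and the small dict stands in for w2v.vocab_hash, the word-to-index mapping that appears in the traceback.

```python
def op_name(op):
    # Same normalisation as in the question: PUSH1 -> PUSH
    return op.rstrip('0123456789')


def find_missing_ops(doc_text, vocab):
    """Return the opcode names found in doc_text that vocab lacks, sorted."""
    seen = {op_name(token) for token in doc_text.split()}
    return sorted(seen - set(vocab))


# Stand-in data; with the real model, pass the contents of the
# document file and w2v.vocab_hash instead (assumption).
doc_text = 'CALLDATASIZE SUB DUP1 ADD SWAP2'
vocab = {'SUB': 0, 'DUP': 1, 'ADD': 2, 'SWAP': 3}

missing = find_missing_ops(doc_text, vocab)
print(missing)
```

If an opcode shows up as missing even though it occurs in the document file, the problem most likely lies in how the model was trained or loaded rather than in the lookup code.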
If it is reasonable for the set to lack this word, and you cannot fix that gap, then change your code to handle the case, perhaps by ignoring the word.
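A minimal sketch of the "ignore the word" option follows. It assumes the loaded model exposes vocab_hash and get_vector, as the traceback and the question's code suggest; known_op_vectors and the TinyVectors stand-in are hypothetical names used only so the example runs without a trained model.

```python
def op_name(op):
    # Same normalisation as in the question: PUSH1 -> PUSH
    return op.rstrip('0123456789')


def known_op_vectors(line, w2v):
    """Return (vectors, skipped): vectors for known opcodes, names of unknown ones."""
    vectors, skipped = [], []
    for op in line.split():
        name = op_name(op)
        if name in w2v.vocab_hash:
            vectors.append(w2v.get_vector(name))
        else:
            skipped.append(name)  # unknown to the model, so ignore it
    return vectors, skipped


class TinyVectors:
    """Hypothetical stand-in for the loaded word2vec model."""

    def __init__(self, table):
        self.vocab_hash = {word: i for i, word in enumerate(table)}
        self._rows = list(table.values())

    def get_vector(self, word):
        return self._rows[self.vocab_hash[word]]


w2v = TinyVectors({'PUSH': [0.1, 0.2], 'ADD': [0.3, 0.4]})
vectors, skipped = known_op_vectors('PUSH1 CALLDATASIZE ADD', w2v)
print(len(vectors), skipped)
```

Returning the skipped names alongside the vectors makes it easy to log how much of each contract is being dropped. With the real model, the list of vectors can be converted with np.array(vectors) to get the matrix that opline_to_vec returned.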
https://stackoverflow.com/questions/71956134