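Q: Optimizing simple pattern matching in a byte array

Stack Overflow user

Asked 2015-05-21 12:57:11

4 answers · 1.3K views · 0 followers · 1 vote

For a code snippet I have to look for a specific byte pattern in a byte array. That is easy enough, but I wonder whether the code can be simplified or even optimized: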

package anti_virus;

import java.nio.file.Files;
import java.nio.file.Paths;

public class Main {

    public static void main(String[] args) throws Exception {
        byte[] virus = Files.readAllBytes(Paths.get("C:/Users/Nick/Desktop/Uni/infected.com"));

        byte[] payload = new byte[]{0x56, 0x69, 0x72, 0x75, 0x73, (byte)0xB4, 0x40, (byte) 0xBB, 0x01,
                0x00, (byte) 0xB9, 0x05, 0x00, (byte) 0xBA, 0x0, 0x0, (byte) 0xCD, 0x21};

        // payload[14] and payload[15] have varying values

        for (int i = 0; i < virus.length; i++) {
            if ((virus[i] == payload[0]) && (virus[i+1] == payload[1]) && (virus[i+2] == payload[2]) &&
                (virus[i+3] == payload[3]) && (virus[i+4] == payload[4]) && (virus[i+5] == payload[5]) &&
                (virus[i+6] == payload[6]) && (virus[i+7] == payload[7]) && (virus[i+8] == payload[8]) &&
                (virus[i+9] == payload[9]) && (virus[i+10] == payload[10]) && (virus[i+11] == payload[11]) &&
                (virus[i+12] == payload[12]) && (virus[i+13] == payload[13]) && (virus[i+16] == payload[16]) &&
                (virus[i+17] == payload[17])) {
                  System.out.println("This file is probably a Virus!");
                  return;
            }
        }

        System.out.println("This file is no Virus.");
    }
}

bytearray

java

pattern-matching

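4 Answers

Stack Overflow user

Accepted answer

Answered 2015-05-21 13:12:54

Yes, it can be simplified and optimized:

You could use the KMP algorithm (on the first 14 bytes). It runs in O(payload.length + virus.length) for an arbitrary payload, rather than O(payload.length * virus.length). (Your code currently does better than O(payload.length * virus.length) for one reason only: 0x56 occurs only as the first element of the payload array.)
Even if you keep your algorithm, I would use a loop to make the code shorter and more readable. I would also fix the source of the ArrayIndexOutOfBoundsException in your loop: you access indices i, ..., i+13, i+16, i+17 of the virus array, and the loop condition lets i grow as large as virus.length-1.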
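
The KMP suggestion above could be sketched roughly as follows. This is a minimal illustration, not the answerer's code: the class name KmpSignatureScan and its method names are hypothetical, and it assumes the signature consists of a fixed prefix (14 bytes here), a run of wildcard bytes (2 here), and a fixed tail that is compared directly once the prefix has matched.

```java
import java.util.Arrays;

// Hypothetical helper: KMP on the fixed prefix, direct comparison of the tail.
final class KmpSignatureScan {

    // fail[i] is the length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] buildFailureTable(byte[] pattern) {
        int[] fail = new int[pattern.length];
        int k = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) {
                k = fail[k - 1];
            }
            if (pattern[i] == pattern[k]) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Returns the first offset where the signature matches, or -1.
    // The first prefixLen bytes are matched with KMP, the next wildcardLen
    // bytes are ignored, and the remaining bytes are compared directly.
    // Assumes prefixLen >= 1 and prefixLen + wildcardLen <= payload.length.
    static int indexOfSignature(byte[] data, byte[] payload, int prefixLen, int wildcardLen) {
        byte[] prefix = Arrays.copyOfRange(payload, 0, prefixLen);
        int[] fail = buildFailureTable(prefix);
        int k = 0; // number of prefix bytes currently matched
        for (int i = 0; i < data.length; i++) {
            while (k > 0 && data[i] != prefix[k]) {
                k = fail[k - 1];
            }
            if (data[i] == prefix[k]) {
                k++;
            }
            if (k == prefixLen) {
                int start = i - prefixLen + 1;
                if (tailMatches(data, payload, start, prefixLen + wildcardLen)) {
                    return start;
                }
                k = fail[k - 1]; // keep scanning for a later occurrence
            }
        }
        return -1;
    }

    // Compares payload[from..] against data at the given start offset,
    // refusing any candidate that would run past the end of data.
    static boolean tailMatches(byte[] data, byte[] payload, int start, int from) {
        if (start + payload.length > data.length) {
            return false;
        }
        for (int j = from; j < payload.length; j++) {
            if (data[start + j] != payload[j]) {
                return false;
            }
        }
        return true;
    }
}
```

With the payload from the question, a call such as KmpSignatureScan.indexOfSignature(virus, payload, 14, 2) would report the offset of the first hit, and a result of -1 would mean the signature was not found. The length check in tailMatches also avoids the out-of-bounds access described above.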

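3 votes

Stack Overflow user

Answered 2015-05-21 14:16:23

Your code is fairly good: it takes a reasonable 21 ms on a non-virus 6 MB file. But I found it better to use an inner loop for the first 14 bytes. You also have to take care of the trailing bytes.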

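// Assumes byte[] virus and byte[] payload are already in scope.
long begin = System.currentTimeMillis();
long end;
for (int i = 0; i <= virus.length - payload.length; i++) {
    boolean bFound = true; // reset for every candidate offset
    for (int j = 0; j < 14; j++) {
        // payload[14] and payload[15] have varying values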
        if (virus[i+j] != payload[j]) {
            bFound = false;
            break;
        }
    }
    if ((bFound) && (virus[i+16] == payload[16]) && (virus[i+17] == payload[17])) {
        end = System.currentTimeMillis();
        System.out.println("time : "+(end-begin)+" ms");
        System.out.println("This file is probably a Virus!");
        return;
    }
}
end = System.currentTimeMillis();
System.out.println("time : "+(end-begin)+" ms");
System.out.println("This file is not a Virus.");

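This first optimization gives a reasonable 14 ms (about 33% less CPU time).

Another optimization, if you can read your file as integers, is to do wide comparisons of 4 bytes at a time. You should also pad the payload to a multiple of 4 bytes.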

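// Assumes int[] virusInt and int[] payloadInt hold the data packed 4 bytes per int,
// with the payload zero-padded to a multiple of 4 bytes.
// Note: this only detects a signature that starts at a 4-byte-aligned offset.
long begin = System.currentTimeMillis();
long end;
for (int i = 0; i <= virusInt.length - payloadInt.length; i++) {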
    if ((virusInt[i] == payloadInt[0]) && 
        (virusInt[i+1] == payloadInt[1]) && 
        (virusInt[i+2] == payloadInt[2]) &&
        ((virusInt[i+3]&0xFFFF0000) == payloadInt[3]) && 
        ((virusInt[i+4]&0xFFFF0000) == payloadInt[4])) {
           end = System.currentTimeMillis();
           System.out.println("time : "+(end-begin)+" ms");
           System.out.println("This file is probably a Virus!");
           return;
       }
}
end = System.currentTimeMillis();
System.out.println("time : "+(end-begin)+" ms");
System.out.println("This file is not a Virus.");

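This gives me an even better 2 ms (about 90% less CPU time). Of course, I am not counting the time to convert to an int array, since I assume you will load the file as an int array and that your payload is an int array too. I have not tried long (64 bits in Java), but it might be a bit faster still.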
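
The answer does not show the conversion step it leaves out of the timing. One way it might look, as a sketch under stated assumptions: the class IntPacking and the method toIntsBigEndian are hypothetical names, the input is zero-padded to a multiple of 4 bytes, and bytes are packed most significant first, which is the layout under which a 0xFFFF0000 mask keeps the first two bytes of each 4-byte group.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper for packing a byte array into ints, 4 bytes per int.
final class IntPacking {

    // Zero-pads data to a multiple of 4 bytes and reinterprets it as
    // big-endian ints, so data[0] ends up in the top 8 bits of ints[0].
    static int[] toIntsBigEndian(byte[] data) {
        int paddedLength = (data.length + 3) / 4 * 4;
        byte[] padded = Arrays.copyOf(data, paddedLength); // pads with zeros
        int[] ints = new int[paddedLength / 4];
        ByteBuffer.wrap(padded)
                .order(ByteOrder.BIG_ENDIAN)
                .asIntBuffer()
                .get(ints);
        return ints;
    }
}
```

Applied to the 18-byte payload from the question, this yields 5 ints, and the two varying bytes land in the low 16 bits of the fourth int. Because the comparison then advances one int at a time, a signature that starts at an offset that is not a multiple of 4 would go undetected.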

1 vote

Stack Overflow user

Answered 2015-05-21 13:10:26

Something like this checks for the signature anywhere in the array, but it has not been tested thoroughly.

public static void main(String[] args) throws Exception {
    // java.nio.file.Files/Paths used here; the original called an unspecified FileUtil helper
    byte[] virus = Files.readAllBytes(Paths.get("c:/x.txt"));
    byte[] payload = "def".getBytes();

    for (int i = 0; i < virus.length; i++) {
        if ((i + payload.length) <= virus.length) {
            boolean found = true;
            for (int j = 0; j < payload.length; j++) {
                if (virus[i + j] != payload[j]) {
                    found = false;
                    break;
                }
            }

            if (found) {
                System.out.println("This file is probably a Virus!");
                return;
            }
        } else {
            break;
        }
    }

    System.out.println("This file is no Virus.");
}
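
The loop above compares every payload byte, whereas the signature in the question has two bytes that vary. A possible way to cover that case is a parallel wildcard array, sketched here with hypothetical names (MaskedSignatureMatcher, indexOf, and the wildcard parameter are illustrative and not taken from any answer):

```java
// Hypothetical helper: naive scan that skips positions marked as wildcards.
final class MaskedSignatureMatcher {

    // Returns the first offset at which every non-wildcard pattern byte
    // equals the corresponding data byte, or -1 if there is none.
    static int indexOf(byte[] data, byte[] pattern, boolean[] wildcard) {
        if (pattern.length != wildcard.length) {
            throw new IllegalArgumentException("pattern and wildcard must have equal length");
        }
        // The bound keeps i + j inside data for every j < pattern.length.
        for (int i = 0; i + pattern.length <= data.length; i++) {
            if (matchesAt(data, pattern, wildcard, i)) {
                return i;
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, byte[] pattern, boolean[] wildcard, int start) {
        for (int j = 0; j < pattern.length; j++) {
            if (!wildcard[j] && data[start + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }
}
```

For the payload in the question, wildcard would be an 18-element array with only indices 14 and 15 set to true. The scan remains O(payload.length * virus.length) in the worst case, so this trades the speed of the other approaches for brevity.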

0 votes

Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/30374471
