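I have a problem: I need to encode DNA code in both directions (encode and decode). I found half of the solution here: How to encode chars in 2-bits? in java, but it does not work for me. There are 4 symbols: A - 00 bits, C - 01 bits, G - 10 bits, T - 11 bits. The result gets encoded, but how do I display it when printing, with the characters separated? I need the encoded output as hexadecimal symbols.
My code: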
public static String compileDnk(String input) {
    if (input.isEmpty()) {
        System.out.println("wrong command format");
        return "";
    }
    input = input.toUpperCase();
    if (!input.matches("[ACGT]+")) {
        System.out.println("wrong command format");
        return "";
    }
    byte store = 0;
    // getByChar and getByChar2 are not shown in the question
    for (char subString : input.toCharArray()) {
        store = setByte(store, getByChar(subString), getByChar2(subString));
    }
    //But this output is: 11111111111111111111111111100000
    return Integer.toBinaryString(store);
}

private static byte setByte(byte store, int index, int value) {
    store = (byte)(store & ~(0x3 << (2 * index)));
    return (byte) (store | (value & 0x3) << (2 * index));
}
Posted on 2022-02-23 13:05:00
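Two things explain the output in the question: a single byte only has room for four 2-bit codes, and Integer.toBinaryString receives the byte widened to a signed int, so a byte with its high bit set prints with 24 leading ones. A minimal sketch of formatting one byte correctly by masking with 0xFF first; ByteFormatDemo, toBits and toHex are hypothetical names used only for illustration.

```java
public class ByteFormatDemo {
    // Mask to 0..255 first so a negative byte is not sign-extended,
    // then left-pad to 8 binary digits.
    static String toBits(byte b) {
        return String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0');
    }

    // Two upper-case hex digits per byte, zero-padded.
    static String toHex(byte b) {
        return String.format("%02X", b & 0xFF);
    }

    public static void main(String[] args) {
        byte store = (byte) 0xE0; // bits 1110 0000, negative as a signed byte
        System.out.println(Integer.toBinaryString(store)); // 11111111111111111111111111100000
        System.out.println(toBits(store));                 // 11100000
        System.out.println(toHex(store));                  // E0
    }
}
```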
This is the best I could do. It is all fairly simple.
public static void main(String[] args) throws Exception
{
    String dna = "CGATAAG";
    System.out.println(encode(dna));       //7 63 08
    System.out.println(decode("7 63 8"));  //CGATAAG
    System.out.println(decode("7 63 08")); //CGATAAG
}

public static String encode(String dna)
{
    final int size = dna.length();
    // Map each nucleotide to its 2-bit code as a digit: A=0, C=1, G=2, T=3
    dna = dna.replace("A","0").replace("C","1").replace("G","2").replace("T","3");
    // Four nucleotides fit into one byte
    final byte[] bytes = new byte[(int)Math.ceil(size / 4D)];
    for (int i = 0, count = 0, index = 0; i < size; ++i, ++count) {
        final int value = dna.charAt(i) - '0';
        // The first nucleotide of each group of four goes into the two most significant bits
        switch (count) {
            case 0: bytes[index] |= (value & 3) << 6; break;
            case 1: bytes[index] |= (value & 3) << 4; break;
            case 2: bytes[index] |= (value & 3) << 2; break;
            default: bytes[index] |= value & 3; break;
        }
        // After the fourth nucleotide, move on to the next byte
        if (count == 3) { count = -1; ++index; }
    }
    // Output: the length, then the bytes as space-separated hex
    return String.format("%d %s", size, toHexString(bytes, 0, bytes.length, false, ' '));
}

private static final char[] hex = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
private static final char[] HEX = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' };
// Converts `quantity` bytes starting at `startIndex` into hex digits.
// Passing '\r' as the delimiter means "no delimiter".
public static String toHexString(byte[] bytes, int startIndex, int quantity, boolean lowercase, char delimiter)
{
    if (quantity == 0) return "";
    final char[] hexDigit = lowercase ? hex : HEX;
    final boolean hasDelimiter = delimiter != '\r';
    final char[] out = hasDelimiter ? new char[quantity * 3] : new char[quantity << 1];
    final int bound = startIndex + quantity;
    for (int i = 0; startIndex != bound; startIndex++) {
        final int value = bytes[startIndex] & 0xFF; // mask so negative bytes are not sign-extended
        out[i++] = hexDigit[value >> 4];            // high nibble
        out[i++] = hexDigit[value & 0x0F];          // low nibble
        if (hasDelimiter) out[i++] = delimiter;
    }
    // With a delimiter, leave off the trailing one
    return hasDelimiter ? new String(out, 0, (quantity * 3) - 1) : new String(out, 0, quantity << 1);
}
public static String decode(String encodedDna)
{
    final char[] dna = { 'A', 'C', 'G', 'T' };
    final String[] fields = encodedDna.split(" ");
    // The first field is the number of nucleotides, the rest are hex bytes
    final int size = Integer.parseInt(fields[0], 10);
    final char[] chars = new char[size];
    int index = 0, count = 0;
    for (int i = 1; i < fields.length; ++i) {
        final int value = Integer.parseInt(fields[i], 16);
        // Read up to four 2-bit codes per byte, stopping once `size` nucleotides are decoded
        for (int ii = 0; ii < 4 && count < size; ++ii, ++count) {
            switch (ii) {
                case 0: chars[index++] = dna[(value & 0xC0) >> 6]; break;
                case 1: chars[index++] = dna[(value & 0x30) >> 4]; break;
                case 2: chars[index++] = dna[(value & 0x0C) >> 2]; break;
                default: chars[index++] = dna[value & 0x03]; break;
            }
        }
    }
    return new String(chars);
}
Posted on 2022-02-23 19:23:29
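The switch on count in encode() above places the k-th nucleotide of each group of four at shift 6, 4, 2 or 0, which is 6 - 2k. A minimal sketch of the same packing with that shift computed directly; PackedDna, pack and encode here are hypothetical names, the index in "ACGT" stands in for the replace() chain, and String.format("%02X", ...) stands in for the hand-written toHexString.

```java
public class PackedDna {
    // Index in this string is the 2-bit code: A=00, C=01, G=10, T=11
    private static final String BASES = "ACGT";

    // Packs four nucleotides per byte, first nucleotide in the two most significant bits.
    static byte[] pack(String dna) {
        byte[] bytes = new byte[(dna.length() + 3) / 4];
        for (int i = 0; i < dna.length(); i++) {
            int code = BASES.indexOf(dna.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("not a nucleotide: " + dna.charAt(i));
            }
            // Position i % 4 within the byte maps to shift 6, 4, 2, 0
            bytes[i / 4] |= code << (6 - 2 * (i % 4));
        }
        return bytes;
    }

    // Same "length hex hex ..." layout as the answer's encode()
    static String encode(String dna) {
        StringBuilder sb = new StringBuilder().append(dna.length());
        for (byte b : pack(dna)) {
            sb.append(' ').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("CGATAAG")); // 7 63 08
    }
}
```

Computing the shift removes the per-position switch; the trade-off is that the layout is implied by arithmetic rather than spelled out case by case.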
Here is one approach.
Setup
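First, set up some maps to convert between encoded and unencoded values. The maps used for encoding and decoding have different types, so that each is convenient for the method that uses it.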
// Map.of requires Java 9 or later
static Map<Character, Byte> dnaToCode =
        Map.of('A', (byte) 0, 'G', (byte) 2, 'C', (byte) 1, 'T', (byte) 3);
static Map<Integer, String> codeToDna =
        Map.of(0, "A", 2, "G", 1, "C", 3, "T");
Encoding
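The general idea is to iterate over the DNA strand, map each nucleotide to its 2-bit equivalent, convert each pair of nucleotides into one hex digit, and return the resulting string.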
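Processing: take the nucleotides two at a time. Shift the first 2-bit value left by two, store the next value (if there is one) in the low two bits, and append the resulting hex digit. Based on the value of i, insert a delimiter (a space after every two hex digits, i.e. after every four nucleotides).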
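To make the pair-to-digit arithmetic concrete, here is a small sketch. PairToHex and pairToHex are hypothetical names, and it uses the index in "ACGT" as the 2-bit code instead of the dnaToCode map (the values are the same: A=0, C=1, G=2, T=3). It assumes the input contains only A, C, G and T.

```java
public class PairToHex {
    private static final String BASES = "ACGT";           // 2-bit code = index
    private static final String HEX = "0123456789ABCDEF";

    // The first nucleotide becomes the high two bits of the nibble, the second the low two bits.
    // A lone trailing nucleotide (odd-length strand) is padded with 00, as encode() does.
    static char pairToHex(String pair) {
        int nibble = BASES.indexOf(pair.charAt(0)) << 2;
        if (pair.length() > 1) {
            nibble |= BASES.indexOf(pair.charAt(1));
        }
        return HEX.charAt(nibble);
    }

    public static void main(String[] args) {
        System.out.println(pairToHex("GT")); // 10 11 -> B
        System.out.println(pairToHex("CA")); // 01 00 -> 4
        System.out.println(pairToHex("T"));  // 11 00 -> C
    }
}
```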
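static String hexDigits = "0123456789ABCDEF";

public static String encode(String dna) {
    StringBuilder encoded = new StringBuilder();
    char[] n = dna.toCharArray();
    int len = dna.length();
    // Take two nucleotides at a time; each pair becomes one hex digit
    for (int i = 0; i < len; i += 2) {
        byte b = dnaToCode.get(n[i]);
        b <<= 2;
        // A lone trailing nucleotide is padded with 00
        if (i < len - 1) {
            b |= dnaToCode.get(n[i + 1]);
        }
        encoded.append(hexDigits.charAt(b));
        // Space after every two hex digits (every four nucleotides)
        if (i % 4 == 2) {
            encoded.append(" ");
        }
    }
    // trim() drops the trailing space that can follow the last group
    return len + " " + encoded.toString().trim();
}
Decoding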
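Decoding essentially reverses the encoding: split on spaces, read the length, and turn each hex group back into 2-bit codes. A two-digit group holds four nucleotides; a one-digit group (the trailing partial byte) holds two, and the final substring(0, len) drops any padding.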
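The bit extraction in decode() can be checked on a single group. A minimal sketch, using the hypothetical names GroupToDna and decodeGroup and the index in "ACGT" in place of the codeToDna map: a two-digit group is read at shifts 6, 4, 2, 0 and a one-digit group at shifts 2, 0.

```java
public class GroupToDna {
    private static final String BASES = "ACGT"; // 2-bit code = index

    // Decodes one space-separated hex group: two digits hold four nucleotides,
    // a single digit (trailing partial byte) holds two.
    static String decodeGroup(String group) {
        int val = Integer.parseInt(group, 16);
        int firstShift = group.length() > 1 ? 6 : 2;
        StringBuilder sb = new StringBuilder();
        for (int shift = firstShift; shift >= 0; shift -= 2) {
            sb.append(BASES.charAt((val >> shift) & 3));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeGroup("4C")); // 01 00 11 00 -> CATA
        System.out.println(decodeGroup("72")); // 01 11 00 10 -> CTAG
        System.out.println(decodeGroup("C"));  // 11 00 -> TA (the A is padding)
    }
}
```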
public static String decode(String code) {
    String[] s = code.split(" ");
    int len = Integer.parseInt(s[0]);
    StringBuilder decoded = new StringBuilder();
    for (int i = 1; i < s.length; i++) {
        int val = Integer.parseInt(s[i], 16);
        // A two-digit group carries two extra codes in its high nibble
        if (s[i].length() > 1) {
            decoded.append(codeToDna.get(val >> 6));
            decoded.append(codeToDna.get((val >> 4) & 3));
        }
        decoded.append(codeToDna.get((val >> 2) & 3));
        decoded.append(codeToDna.get(val & 3));
    }
    // Drop the padding nucleotide added for odd-length strands
    return decoded.toString().substring(0, len);
}
Testing
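Here is a way to test the code.
It does 7 runs.
Each run generates a random DNA strand of 1 to 10 nucleotides, encodes it, decodes the result, and prints all three.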
// Needs java.util.Random and java.util.stream.Collectors
public static void testit() {
    Random r = new Random();
    for (int i = 0; i < 7; i++) {
        // Random strand length from 1 to 10
        int size = r.nextInt(10) + 1;
        String generated =
                r.ints(size, 0, 4).mapToObj(codeToDna::get)
                        .collect(Collectors.joining());
        String encoded = encode(generated);
        String decoded = decode(encoded);
        System.out.println("orig = " + generated);
        System.out.println("encoded = " + encoded);
        System.out.println("decoded = " + decoded);
        System.out.println();
    }
}
It prints something like:
orig = GGTAACTC
encoded = 8 AC 1D
decoded = GGTAACTC

orig = CACCTATTT
encoded = 9 45 CF C
decoded = CACCTATTT

orig = TTCGGCACTA
encoded = 10 F6 91 C
decoded = TTCGGCACTA

orig = CATACTAG
encoded = 8 4C 72
decoded = CATACTAG

orig = ACCCCG
encoded = 6 15 6
decoded = ACCCCG

orig = GCATT
encoded = 5 93 C
decoded = GCATT

orig = AGTCAGG
encoded = 7 2D 28
decoded = AGTCAGG

https://stackoverflow.com/questions/71235236