首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >Rosalind字符串算法问题

Rosalind字符串算法问题
EN

Code Review用户
提问于 2015-10-06 15:14:32
回答 2查看 197关注 0票数 2

我已经开始通过一些罗莎琳德字符串算法问题来学习锈病。

如果有人想指出可能的改进,或者其他什么,那就太好了。在subs和lcsm解决方案中,有几件事我特别好奇。

main.rs:

代码语言:javascript
复制
mod utils;
mod string_algorithms;

fn main() {

    //string_algorithms::dna();
    //string_algorithms::rna();
    //string_algorithms::revc();
    //string_algorithms::gc();
    //string_algorithms::subs();
    string_algorithms::lcsm();

    println!("done!");
}

utils.rs:

代码语言:javascript
复制
use std::io::Read;
use std::io::Write;
use std::fs::File;

pub fn read_file(abbr: &str) -> String {

    let name = "Input/rosalind_".to_string() + abbr + ".txt";

    let mut input = String::new();

    File::open(name)
        .unwrap()
        .read_to_string(&mut input)
        .unwrap();

    input
}

use std::collections::BTreeMap;
use std::char;

#[allow(dead_code)]
pub fn read_fasta(abbr: &str) -> BTreeMap<String, String> {

    let input = read_file(abbr);

    // read format:
    // >NAME\n
    // STRBLAHBLAHBLAH\n
    // STILLTHESAMESTR\n
    // >NAME\n
    // ...

    let mut result = BTreeMap::new();

    for x in input.split('>').skip(1) {

        let mut iter = x.split(char::is_whitespace);

        let name = iter.next().unwrap().to_string();
        let dna = iter.collect::<String>();

        result.insert(name, dna);
    }

    result
}

pub fn write_file(abbr: &str, output: &str) {

    let name = "Output/rosalind_".to_string() + abbr + ".txt";

    File::create(name)
        .unwrap()
        .write(output.as_bytes())
        .unwrap();
}

string_algorithms.rs:

代码语言:javascript
复制
use utils;

#[allow(dead_code)]
pub fn dna() {

    // Read DNA string from file and count A, C, G, T characters.
    // Is there a way to do this without the Counter class?

    let abbr = "dna";

    let input = utils::read_file(abbr);

    struct Counter {
        a: u32, c: u32, g: u32, t: u32,
    }

    impl Counter {
        fn new() -> Counter {
            Counter{
                a: 0u32, c: 0u32, g: 0u32, t: 0u32,
            }
        }
    }

    let count = input.chars().fold(Counter::new(), |mut total, ch| {
            total.a += (ch == 'A') as u32;
            total.c += (ch == 'C') as u32;
            total.g += (ch == 'G') as u32;
            total.t += (ch == 'T') as u32;

            total
        });

    let output = format!("{} {} {} {}", count.a, count.c, count.g, count.t);

    println!("{}", output);

    utils::write_file(abbr, &output);
}

#[allow(dead_code)]
pub fn rna() {

    // Read DNA string from file and replace all T characters with U.
    // (Easy enough...)

    let abbr = "rna";

    let input = utils::read_file(abbr);

    let output = input.replace("T", "U");

    println!("{}", output);

    utils::write_file(abbr, &output);
}

#[allow(dead_code)]
pub fn revc() {

    // Read DNA string from file, reverse it, then swap A with T and C with G.

    let abbr = "revc";

    let input = utils::read_file(abbr);

    let output : String = input.chars().rev().map(|mut ch| {
            if ch == 'A' { ch = 'T'; }
            else if ch == 'T' { ch = 'A'; }
            else if ch == 'C' { ch = 'G'; }
            else if ch == 'G' { ch = 'C'; }
            ch
        }).collect();

    println!("{}", output);

    utils::write_file(abbr, &output);
}

#[allow(dead_code)]
pub fn gc() {

    // Read Name / DNA String pairs from file...
    // Find string with highest percentage of C and G.

    let abbr = "gc";

    let input = utils::read_fasta(abbr);

    let mut max = ("", 0f32);

    for (k, v) in input.iter() {

        let gc_count = v.chars().filter(|&ch| ch == 'G' || ch == 'C').count();
        let gc_percent = 100f32 * gc_count as f32 / v.len() as f32;

        if gc_percent > max.1 {
            max = (&k, gc_percent);
        }
    }

    let output = format!("{} {}", max.0, max.1);

    println!("{}", output);

    utils::write_file(abbr, &output);
}

use std::char;

#[allow(dead_code)]
pub fn subs() {

    // Read string and substring from file.
    // Count the number of occurrences of the substring in the whole string
    // (including overlapping matches!).

    let abbr = "subs";

    let input = utils::read_file(abbr);

    // Extract whole and substring from file format: "wholestring\nsubstring"

    let mut iter = input.split(char::is_whitespace);

    let whole = iter.next().unwrap().to_string();
    let sub = iter.collect::<String>(); // Why doesn't next.unwrap().to_string() work here too?

    assert!(!whole.is_empty());
    assert!(!sub.is_empty());
    assert!(whole.len() >= sub.len());

    let mut positions = Vec::<usize>::new();

    for i in 0..((whole.len() - sub.len()) + 1) {

        let m = whole.chars().skip(i)
            .zip(sub.chars())
            .all(|(w, s)| w == s);

        if m {
            positions.push(i + 1);
        }
    }

    let output = positions.iter()
        .map(|p| p.to_string())
        .collect::<Vec<_>>()
        .join(" ");

    println!("{}", output);

    utils::write_file(abbr, &output);
}

pub fn lcsm() {

    // Read Name / DNA string pairs from file.
    // Find the longest substring present in all the strings.

    // Work through all substrings of the first string (starting with the longest),
    // and check if it's present in all the other strings too.

    // (Could sort the strings so the shortest is first,
    // but they're all about the same length so it makes no difference...)

    let abbr = "lcsm";

    let input = utils::read_fasta(abbr);

    let first: &String = input.values().next().unwrap();

    for length in (1..(first.len() + 1)).rev() {

        let mut start = 0;

        loop {

            let end = start + length;

            // BAD: This copies the string... how to avoid this?
            let sub = first.chars().skip(start).take(length).collect::<String>();

            if input.values().skip(1).all(|x| x.contains(&sub)) {

                println!("{}", sub);

                utils::write_file(abbr, &sub);

                return;
            }

            if end == first.len() {
                break;
            }

            start += 1;
        }
    }

    assert!(false);
}
EN

回答 2

Code Review用户

回答已采纳

发布于 2015-10-07 06:12:15

我也是生锈的新手,这可能不是最好的选择,但我有一个小小的建议,那就是我认为这比你得到的要好。

在这一特定部分:

代码语言:javascript
复制
let output : String = input.chars().rev().map(|mut ch| {
            if ch == 'A' { ch = 'T'; }
            else if ch == 'T' { ch = 'A'; }
            else if ch == 'C' { ch = 'G'; }
            else if ch == 'G' { ch = 'C'; }
            ch
}).collect();

您可以将其更改为(利用几乎所有事物都是表达式):

代码语言:javascript
复制
let output : String = input.chars().rev().map(|ch| {
             match ch {
                 'A' => 'T',
                 'T' => 'A',
                 'C' => 'G',
                 'G' => 'C',
                  _  => ch
             }
}).collect();
票数 3
EN

Code Review用户

发布于 2015-10-07 13:20:52

使用MAG的建议,我还改进了"dna“解决方案如下:

代码语言:javascript
复制
let (mut a, mut c, mut g, mut t) = (0u32, 0u32, 0u32, 0u32);

for ch in input.chars() {
    match ch {
        'A' => a += 1,
        'C' => c += 1,
        'G' => g += 1,
        'T' => t += t,
        _ => (),
    };
}
票数 0
EN
页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://codereview.stackexchange.com/questions/106746

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档