Repeated DNA Sequences

題意：

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

解題思路：

暴力法，把每十個字元存成一個字串放在hashset中，再一一比對。非常耗時。

hashset加bit manipulation法

A = 00，C = 01，G = 10，T = 11。

int key = 0, key = key << 2 | code(A|C|G|T)。

因為我們當第i-(i+9)的字串要改為(i+1)-(i+10)的字串時，只需要去掉第i個字元並加入第(i+10)個字元即可，我們可以利用bit manipulation用shift的方式並or起來即可，非常省時。

public class Solution {    
    public List<String> findRepeatedDnaSequences(String s) {
        Set<Integer> words = new HashSet<>();
        Set<Integer> doubleWords = new HashSet<>();
        List<String> rv = new ArrayList<>();
        char[] map = new char[26];
        //map['A' - 'A'] = 0;
        map['C' - 'A'] = 1;
        map['G' - 'A'] = 2;
        map['T' - 'A'] = 3;

        for(int i = 0; i < s.length() - 9; i++) {
            int v = 0;
            for(int j = i; j < i + 10; j++) {
                v <<= 2;
                v |= map[s.charAt(j) - 'A'];
            }
            if(!words.add(v) && doubleWords.add(v)) {
                rv.add(s.substring(i, i + 10));
            }
        }
        return rv;
    }
}

程式碼如下：

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        if (s == null || s.length() < 10) {
            return res;
        }

        HashSet<Integer> set = new HashSet<Integer>();
        int len = s.length();
        for (int i = 0; i <= len - 10; i++) {
            String str = s.substring(i, i + 10);
            int key = genKey(str);
            if (set.contains(key) && !res.contains(str)) {
                res.add(str);
            } else {
                set.add(key);
            }
        }

        return res;
    }

    private int genKey(String str) {
        HashMap<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);

        int res = 0;
        for (int i = 0; i < str.length(); i++) {
            res = res << 2 | map.get(str.charAt(i));
        }
        return res;
    }
}

Repeated DNA Sequences

Repeated DNA Sequences

Reference

results matching ""

No results matching ""