OK, then, how long an n-gram is a genetic word? The individual nucleotides A, T, G and C are only 1-grams, and with just four of them turning up everywhere, they are pretty useless as search terms. So some fuzzy math is required. Liang says DNA sequences follow Zipf's law, which states that a word's frequency is roughly inversely proportional to its rank in the frequency table; one consequence is that in any long document, about half the distinct words appear only once. That pattern can be used to find an average length for DNA "words."
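
To make the idea concrete, here is a minimal Python sketch of one way such an estimate could be made. It is not Liang's actual procedure, which the article does not describe. It assumes that a plausible "word" length is the n at which about half of the distinct n-grams in a sequence occur exactly once. The function names (`ngram_counts`, `singleton_fraction`, `closest_to_half`) and the 0.5 target are illustrative assumptions.

```python
from collections import Counter


def ngram_counts(sequence, n):
    """Count every overlapping n-gram (substring of length n) in the sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def singleton_fraction(sequence, n):
    """Return the fraction of distinct n-grams that occur exactly once."""
    counts = ngram_counts(sequence, n)
    if not counts:
        # Sequence is shorter than n, so there are no n-grams to measure.
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


def closest_to_half(sequence, max_n):
    """Return the n in 1..max_n whose singleton fraction is nearest 0.5.

    Assumption: this selection rule is only an illustration of the
    "half the words appear once" heuristic, not a published method.
    """
    return min(
        range(1, max_n + 1),
        key=lambda n: abs(singleton_fraction(sequence, n) - 0.5),
    )
```

Calling `closest_to_half(dna, 20)` on a string `dna` of A, T, G and C characters returns the candidate length under this assumption. Short n-grams repeat constantly, so almost none are singletons, while very long ones almost never repeat, so nearly all are. The length of interest lies somewhere in between.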