GenomeScope

Estimate genome heterozygosity, repeat content, and size from sequencing reads using a kmer-based statistical approach.

Run GenomeScope

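The form asks for:

  Description: a name for this analysis.
  Kmer length: the kmer size used for counting (the -m value given to jellyfish count, e.g. 21).
  Read length: the length of the sequencing reads.
  Max kmer coverage: an optional upper coverage cutoff (see the note on high copy-number DNA below).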
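If you are unsure what to enter for Read length, you can tabulate the lengths directly from your FASTQ files. The following is a minimal sketch. The function name read_length_counts is hypothetical, and it assumes standard 4-line FASTQ records. For untrimmed Illumina data nearly all reads share one length, so the most frequent value is usually the one to use.

```shell
# Hypothetical helper: tabulate read lengths in an uncompressed FASTQ file.
# Assumes standard 4-line FASTQ records (header, sequence, '+', qualities),
# so the sequence is on every 4th line starting at line 2.
# Prints "length count" pairs, most frequent length first.
# Reads from the named file(s), or from standard input if none are given.
read_length_counts() {
  awk 'NR % 4 == 2 { n[length($0)]++ }
       END { for (len in n) print len, n[len] }' "$@" |
    sort -k2,2nr
}

# Usage: the first line shows the most common read length.
#   read_length_counts reads.fastq | head -n 1
#   zcat reads.fastq.gz | read_length_counts | head -n 1
```

For trimmed reads the lengths will vary, and the table shows how widely.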

Instructions

Upload the kmer count histogram produced by Jellyfish (example file: inputk21.hist).

Instructions for running Jellyfish:

  1. Download and install jellyfish from: http://www.genome.umd.edu/jellyfish.html#Release
  2. Count kmers using jellyfish:

    $ jellyfish count -C -m 21 -s 1000000000 -t 10 *.fastq -o reads.jf

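    Adjust the hash size (-s) and thread count (-t) to suit your server. The -s option sets the initial size of Jellyfish's hash table in kmers (here 1 billion) and is the main driver of memory use; -t sets the number of threads (here 10). The kmer length (-m) may need to be adjusted if you have low coverage or a high error rate. Always count canonical kmers (-C), so that each kmer and its reverse complement are counted together, since reads come from both DNA strands.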

  3. Export the kmer count histogram

    $ jellyfish histo -t 10 reads.jf > reads.histo

    Again, the thread count (-t) should be scaled to match your server.

  4. Upload reads.histo to GenomeScope
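Before uploading, it can help to sanity-check the histogram exported in step 3. The following is a minimal sketch of the classic back-of-the-envelope estimate: genome size is roughly the number of kmer occurrences outside the low-coverage error peak divided by the coverage at the main peak. This is not GenomeScope's statistical model. The function name rough_genome_size is hypothetical, and it assumes the two-column "coverage count" lines written by jellyfish histo. For a heterozygous genome the tallest peak may be the heterozygous one at about half the homozygous coverage, which would roughly double this estimate.

```shell
# Hypothetical helper: rough genome size from a jellyfish histogram.
# This is the classic back-of-the-envelope estimate, NOT GenomeScope's model:
#   1. walk down the low-coverage error peak to the first local minimum (trough),
#   2. take the tallest bin at or after the trough as the main coverage peak,
#   3. size ~= (sum of coverage * count from the trough onward) / peak coverage.
# Input: "coverage count" lines, from the named file(s) or standard input.
rough_genome_size() {
  awk '
    { cov[NR] = $1 + 0; cnt[NR] = $2 + 0; n = NR }
    END {
      if (n == 0) exit 1
      t = 1
      while (t < n && cnt[t + 1] <= cnt[t]) t++             # descend the error peak
      p = t
      for (i = t; i <= n; i++) if (cnt[i] > cnt[p]) p = i   # main peak
      total = 0
      for (i = t; i <= n; i++) total += cov[i] * cnt[i]     # kmer occurrences
      printf "trough=%d peak=%d size=%.0f\n", cov[t], cov[p], total / cov[p]
    }' "$@"
}

# Usage:
#   rough_genome_size reads.histo
```

If this rough number is wildly different from GenomeScope's estimate, the histogram, kmer length, or max kmer coverage setting is worth a second look.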

Note: High-copy-number DNA such as chloroplast sequence can confuse the model. To avoid this, set a max kmer coverage so that kmers with higher coverage are excluded from the analysis. The default, -1, applies no filter.
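To choose a max kmer coverage value, you can check how much of the histogram lies above a candidate cutoff. The following is a minimal sketch. The function name tail_above is hypothetical, it assumes two-column "coverage count" input, and it assumes the cutoff excludes coverage values strictly greater than the value you set.

```shell
# Hypothetical helper: size of the high-coverage tail above a candidate cutoff.
# Usage: tail_above CUTOFF [histogram-file ...]   (reads stdin if no file)
# Assumes "coverage count" lines and that the cutoff excludes coverage values
# strictly greater than CUTOFF.
tail_above() {
  local cutoff="$1"
  shift
  awk -v c="$cutoff" '
    {
      inst = $1 * $2                 # kmer occurrences in this bin
      total += inst
      if ($1 + 0 > c + 0) { above += inst; distinct += $2 }
    }
    END {
      printf "distinct=%.0f instances=%.0f fraction=%.4f\n",
             distinct, above, (total > 0 ? above / total : 0)
    }' "$@"
}

# Usage: compare a few cutoffs before choosing a max kmer coverage.
#   for c in 100 500 1000; do printf '%s: ' "$c"; tail_above "$c" reads.histo; done
```

A cutoff well above the main coverage peak that still removes a noticeable fraction of kmer occurrences suggests a high-copy-number component such as chloroplast DNA.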
