GenomeScope 2.0

Estimate genome heterozygosity, repeat content, and size from sequencing reads using a k-mer-based statistical approach.

Run GenomeScope

Description

K-mer length

Ploidy

Max k-mer coverage

Average k-mer coverage for polyploid genome

Instructions

Upload results from running Jellyfish or KMC. Example: inputk21.hist
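Whichever tool produces the histogram, a quick local check before uploading can catch a truncated or malformed file. The sketch below assumes the histogram is plain text with two whitespace-separated integer columns (k-mer coverage, then the number of distinct k-mers at that coverage) and that coverage increases row by row. The function name check_histo is hypothetical and is not part of GenomeScope, Jellyfish, or KMC.

```shell
#!/bin/sh
# check_histo FILE  (hypothetical helper, not part of any of the tools)
# Succeeds only if FILE is non-empty and every row is "<coverage> <count>"
# with non-negative integers and strictly increasing coverage.
check_histo() {
  awk '
    # Every row must be exactly two non-negative integers.
    NF != 2 || $1 !~ /^[0-9]+$/ || $2 !~ /^[0-9]+$/ {
      print "line " NR ": expected <coverage> <count>, got: " $0 > "/dev/stderr"
      bad = 1; exit
    }
    # Coverage values must increase from one row to the next.
    NR > 1 && $1 + 0 <= prev {
      print "line " NR ": coverage is not increasing" > "/dev/stderr"
      bad = 1; exit
    }
    { prev = $1 + 0 }
    # An empty file is also treated as invalid.
    END { exit (bad || NR == 0) }
  ' "$1"
}
```

For example, `check_histo reads.histo && echo "looks OK"` prints the message only when the file passes these checks. Passing does not guarantee that the model will fit; it only rules out obvious formatting problems.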

Instructions for running Jellyfish:

  1. Download and install jellyfish from: http://www.genome.umd.edu/jellyfish.html#Release
  2. Count k-mers using jellyfish:

    $ jellyfish count -C -m 21 -s 1000000000 -t 10 *.fastq -o reads.jf

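    Note you should adjust the hash size (-s) and threads (-t) parameters according to your server. This example will use 10 threads and an initial hash of 1 billion entries. The -s value counts hash entries, not bytes, so the actual memory used depends on the k-mer length and counter size. The k-mer length (-m) may need to be lowered if you have low coverage or a high error rate. You should always use "canonical k-mers" (-C).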

  3. Export the k-mer count histogram

    $ jellyfish histo -t 10 reads.jf > reads.histo

    Again, the thread count (-t) should be scaled according to your server.

  4. Upload reads.histo to GenomeScope

Instructions for running KMC:

  1. Download and install KMC from: http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=kmc&subpage=download
  2. Count k-mers using KMC:

    $ mkdir tmp

    $ ls *.fastq > FILES

    $ kmc -k21 -t10 -m64 -ci1 -cs10000 @FILES reads tmp/

    Note you should adjust the memory (-m) and threads (-t) parameters according to your server. This example will use 10 threads and 64GB of RAM. The k-mer length (-k) may need to be lowered if you have low coverage or a high error rate. The lower bound (-ci) excludes k-mers that occur fewer than that many times; -ci1 keeps k-mers seen only once, which KMC would otherwise drop by default. The counter maximum (-cs) caps stored counts at that value rather than excluding k-mers. FILES is a file with a list of input files, one per line.

  3. Export the k-mer count histogram

    $ kmc_tools transform reads histogram reads.histo -cx10000

    The upper bound (-cx) gives the cutoff for the histogram.

  4. Upload reads.histo to GenomeScope

Note: High copy-number DNA such as chloroplasts can confuse the model. Set a max k-mer coverage to avoid this. The default is -1, meaning no filter.
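To illustrate what a max k-mer coverage cutoff does to the input, the sketch below drops every histogram row whose coverage exceeds a chosen maximum, and passes everything through when the maximum is -1. It assumes the same two-column histogram format as above. The function name filter_histo is hypothetical, and this is only an illustration of the effect of the filter, not necessarily how GenomeScope applies it internally.

```shell
#!/bin/sh
# filter_histo MAX FILE  (hypothetical helper for illustration)
# Prints the rows of FILE whose coverage (column 1) is <= MAX.
# A negative MAX such as -1 disables the filter and prints every row.
filter_histo() {
  awk -v max="$1" 'max + 0 < 0 || $1 + 0 <= max + 0' "$2"
}
```

For example, `filter_histo 1000 reads.histo > reads.max1000.histo` keeps only the rows with coverage up to 1000, so very high-copy k-mers such as those from organelle DNA no longer appear in the filtered file.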
