[PMID:26651996] [BMC Genomics]

VariantSpark: population scale clustering of genotype information

“We hence developed a purpose-built approach in SPARK to perform machine learning tasks on genomic data, such as clustering of individual genomes. We utilise SPARK’s machine learning library, MLlib, and provide an interface to the standard variant data format, Variant Call Format (VCF) [4], which opens up the application of MLlib’s different machine learning algorithms to a wide range of genotype-based analysis tasks.”
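The quote describes two steps: read genotypes out of VCF, then hand them to a clustering algorithm so that individual genomes can be grouped. The short Python sketch below illustrates that pipeline. It is not VariantSpark's code and does not use Spark. It turns each sample's `GT` calls into a vector of alternate-allele counts (0, 1 or 2 per variant), then clusters the vectors with a small hand-written k-means that stands in for MLlib's `KMeans`, so the example runs without a cluster. The function names, the toy VCF and the rule that counts missing calls as reference are all assumptions made for illustration.

```python
from typing import Dict, List

def parse_vcf_genotypes(vcf_text: str) -> Dict[str, List[int]]:
    """Map each sample to its alt-allele counts (0, 1, 2), one entry per variant row."""
    samples: List[str] = []
    dosages: Dict[str, List[int]] = {}
    for line in vcf_text.strip().splitlines():
        if line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            dosages = {s: [] for s in samples}
            continue
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            alleles = call.split(":")[gt_index].replace("|", "/").split("/")
            # Assumption for this sketch: missing alleles ('.') count as reference.
            dosages[sample].append(sum(1 for a in alleles if a not in ("0", ".")))
    return dosages

def kmeans(points: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain Lloyd's k-means; a stand-in for Spark MLlib's KMeans, not its API."""
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def cluster_samples(vcf_text: str, k: int) -> Dict[str, int]:
    """Hypothetical helper: one genotype vector per individual, then cluster."""
    dosages = parse_vcf_genotypes(vcf_text)
    names = list(dosages)
    labels = kmeans([[float(v) for v in dosages[n]] for n in names], k)
    return dict(zip(names, labels))

# Toy VCF with four individuals and three variants (made-up data).
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
               "S1", "S2", "S3", "S4"]),
    "\t".join(["1", "100", ".", "A", "G", ".", "PASS", ".", "GT", "0/0", "1/1", "0/0", "1/1"]),
    "\t".join(["1", "200", ".", "C", "T", ".", "PASS", ".", "GT", "0/0", "1/1", "0/1", "1/1"]),
    "\t".join(["1", "300", ".", "G", "A", ".", "PASS", ".", "GT", "0/1", "1|1", "0/0", "0/1"]),
])

if __name__ == "__main__":
    print(cluster_samples(EXAMPLE_VCF, k=2))
```

On this toy input, S1 and S3 end up in one cluster and S2 and S4 in the other. VCF stores one row per variant, so the parser has to transpose the data to get one vector per individual. The quote's approach needs that step before any MLlib algorithm can be applied to whole genomes.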
