Detection and filtering of single nucleotide polymorphisms (SNPs) for fish population genetics in the Mekong Delta

Doan Vu Thinh1, Vu Dang Ha Quyen2,4, Truong Thi Oanh2, Tran Linh Thuoc3, Dang Thuy Binh2

1Faculty of Information Technology, Nha Trang University, 02 Nguyen Dinh Chieu, Nha Trang, Viet Nam; 2Institute of Biotechnology and Environment, Nha Trang University, 02 Nguyen Dinh Chieu, Nha Trang, Viet Nam; 3Department of Molecular & Environmental Biotechnology, University of Science, 227 Nguyen Van Cu Str., Dist. 5, Ho Chi Minh City, Viet Nam; 4PhD student in Biotechnology, University of Science, 227 Nguyen Van Cu Str., Dist. 5, Ho Chi Minh City, Viet Nam

Over the past decades, single nucleotide polymorphisms (SNPs) have proved to be valuable markers for population genetics. Recently, several molecular techniques and bioinformatics approaches have been applied to detect and filter SNPs. This study aims to compare different pipelines for SNP discovery using sequence data from Blackhand paradise fish (Polynemus melanochir) populations in the Mekong Delta, Vietnam. A reference genome (comprising more than 7,000 reads) was obtained by comparing reads from different genotypes using the de novo assembly strategy of the Rainbow software. SNP variants and insertions/deletions were detected with the VarScan and FreeBayes tools, which yielded 24,106 and 32,322 variants, respectively. The raw SNPs from FreeBayes were filtered by minimum read depth, quality score, and removal of missing data (with vcftools), and then by allele frequency, allele balance, SNP type, and allele count (with vcffilter), whereas the VarScan output could be filtered only with vcftools. As a result, high-quality SNPs (7,517 from VarScan and 2,657 from FreeBayes) were used for population structure analysis in STRUCTURE 2.3.4. With the SNPs filtered from FreeBayes, Polynemus melanochir exhibited highly disconnected populations across the Vietnamese Mekong River at a fine geographic scale. In contrast, the SNP data from VarScan showed weak structure, or high connectivity, among the fish populations. The pipeline needs to be optimized for the detection and filtering of SNPs from high-throughput next-generation sequencing data.

Keywords: SNPs, Polynemus melanochir, VarScan, FreeBayes
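
To make the filtering criteria named in the abstract more concrete, the following is a minimal Python sketch of per-site filtering on quality score, read depth, allele frequency, allele balance, and SNP type. It is not the authors' pipeline, which used vcftools and vcffilter. The function names (parse_info, passes_filters, filter_vcf) and every threshold are hypothetical choices made for illustration, and the sketch assumes the INFO column carries DP, AF, and AB tags as FreeBayes-style output commonly does. Removal of missing genotype data is not covered here.

```python
# Illustrative sketch only: thresholds and function names are assumptions,
# not the settings or tools used in the study.

def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=20;AF=0.5;AB=0.45' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item and item != ".":
            info[item] = True  # flag-style entry without a value
    return info


def passes_filters(line, min_qual=30.0, min_depth=10, min_af=0.05,
                   ab_range=(0.25, 0.75)):
    """Return True if one tab-separated VCF record passes all site filters."""
    fields = line.rstrip("\n").split("\t")
    ref, alt, qual, info_field = fields[3], fields[4], fields[5], fields[7]
    info = parse_info(info_field)

    # SNP type: keep only biallelic single-base substitutions (drops indels).
    if "," in alt or len(ref) != 1 or len(alt) != 1:
        return False
    # Quality score.
    if qual == "." or float(qual) < min_qual:
        return False
    # Minimum total read depth at the site.
    if int(info.get("DP", 0)) < min_depth:
        return False
    # Minimum alternate allele frequency.
    if float(info.get("AF", 0.0)) < min_af:
        return False
    # Allele balance, checked only when the tag is present. Assumption:
    # values near 0 are kept because they can indicate a site with no
    # heterozygous calls rather than a skewed one.
    if "AB" in info:
        ab = float(info["AB"])
        if not (ab < 0.01 or ab_range[0] < ab < ab_range[1]):
            return False
    return True


def filter_vcf(lines, **thresholds):
    """Yield header lines unchanged and only the records that pass."""
    for line in lines:
        if line.startswith("#") or passes_filters(line, **thresholds):
            yield line
```

A caller could pass an open VCF file object to filter_vcf and write the yielded lines to a new file. Changing thresholds such as min_depth or min_af changes how many SNPs survive, which is one way the number of retained markers can differ between pipelines.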
