Abstract
Recent developments in DNA sequencing technology have given rise to many statistical methods for Rare Variant Association Studies (RVASs), such as burden tests and sequence kernel association tests. However, these methods usually require large samples and can lose power in association studies with small samples. In this study, we propose two statistical approaches for RVASs that are applicable when the sample size is not large. Our approaches are based on the Hamming distance, which measures the dissimilarity of Single Nucleotide Polymorphism (SNP) components between cases and controls. Existing Hamming distance-based methods mainly analyse common variants. For rare variant data with a small sample size, we extend two existing methods by weighting variants according to their minor allele frequency (MAF). Through simulation studies, we show that the proposed approaches control type 1 error rates and are more powerful, even with very small sample sizes. They also work well regardless of the direction of the causal SNP effects. When applied to real data, the methods identified true causal genes well. Based on these results, we conclude that the proposed methods are powerful for small-sample data.
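As a rough illustration of the idea of an MAF-weighted Hamming distance, the sketch below compares genotype vectors of cases and controls and up-weights mismatches at rarer variants. It is not the authors' method: the abstract does not state the weight function or the test statistic, so the weight 1/sqrt(MAF(1 - MAF)), the genotype coding (minor-allele counts 0, 1, 2), and the function names `maf_weights`, `weighted_hamming` and `mean_between_group_distance` are all assumptions made for this example.

```python
import math


def maf_weights(mafs):
    """Per-SNP weights from minor allele frequencies (each strictly in (0, 1)).

    Assumed form for illustration only: 1 / sqrt(p * (1 - p)),
    which gives rarer variants larger weights.
    """
    return [1.0 / math.sqrt(p * (1.0 - p)) for p in mafs]


def weighted_hamming(g1, g2, weights):
    """Sum of weights over the SNP positions where two genotype vectors differ."""
    if not (len(g1) == len(g2) == len(weights)):
        raise ValueError("genotype vectors and weights must have equal length")
    return sum(w for a, b, w in zip(g1, g2, weights) if a != b)


def mean_between_group_distance(cases, controls, weights):
    """Average weighted Hamming distance over all case-control pairs."""
    total = sum(
        weighted_hamming(case, control, weights)
        for case in cases
        for control in controls
    )
    return total / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical toy data: 2 cases, 2 controls, 3 SNPs.
    mafs = [0.5, 0.2, 0.01]
    weights = maf_weights(mafs)
    cases = [[0, 1, 1], [1, 1, 0]]
    controls = [[0, 0, 0], [1, 0, 0]]
    print(mean_between_group_distance(cases, controls, weights))
```

In a real analysis, a between-group distance of this kind would still need a reference distribution (for example, from permuting case and control labels) before it could serve as a test; that step is not shown here.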
Original language | English |
---|---|
Pages (from-to) | 301-314 |
Number of pages | 14 |
Journal | International Journal of Data Mining and Bioinformatics |
Volume | 21 |
Issue number | 4 |
Publication status | Published - 2018 |
Bibliographical note
Publisher Copyright: © 2018 Inderscience Enterprises Ltd.
Keywords
- Hamming distance
- MAF
- Minor allele frequency
- RVASs
- Rare variant association studies