# Time-stamp: <2006-07-26 19:45:29 zaykind>
[written by Dmitri Zaykin]

This directory contains programs to accompany

    Dmitri V. Zaykin, Zhaoling Meng, Margaret G. Ehm. 2006.
    Contrasting Linkage-Disequilibrium Patterns between Cases and
    Controls as a Novel Association-Mapping Method.
    Am. J. Hum. Genet., 78:737-746.

(1) The scripts are written in R, which can be installed from
http://www.r-project.org/

(2) If you don't have the following packages installed, type at the
R command prompt:

    install.packages("mice")
    install.packages("ellipse")

(3) The data format is as in "testdat-recoded.txt": the phenotype is
in column 1 (0: controls; 1: cases). The remaining columns are SNP
genotypes, one column per SNP. Genotypes should be coded as -1,0,1
("AA" -> -1; "Aa" -> 0; "aa" -> 1; missing value -> NA). It doesn't
matter which of the two alleles is considered A or a. Missing
phenotype values (in column 1) are not allowed.

(4) Getting p-values for the correlation and Delta-prime based
statistics:

    source("LD-contrast.r")
    LDcontrast("testdat-recoded.txt", "corr", 1000)
    LDcontrast("testdat-recoded.txt", "dprime", 1000)

(5) Plotting LD based on correlation and Delta-prime:

    source("LD-contrast-Plot.r")
    LDplot("testdat-recoded.txt", "corr")
    LDplot("testdat-recoded.txt", "dprime")

Notes:
------

(N1) The speed of the Dprime-based analysis can be greatly improved
if you have a GNU C++ compiler. On a Linux system, the source can be
compiled simply as

    R CMD SHLIB DprKK.cpp

This only needs to be done once, to create a shared library,
"DprKK.so" (which must reside in the same directory as the rest of
the scripts). Step (4) above is then modified by sourcing the file
"LD-contrast-C.r" instead of "LD-contrast.r":

    source("LD-contrast-C.r")
    LDcontrast("testdat-recoded.txt", "corr", 1000)
    LDcontrast("testdat-recoded.txt", "dprime", 1000)

(N2) Missing data handling uses multiple imputation via polytomous
logistic regression, as implemented in the package MICE, referenced
in the paper.
Good performance of polytomous logistic regression was reported in
[1]. The presence of missing values considerably slows down the
calculations. The algorithm implemented here is as follows. First,
generate the mean statistic value via multiple imputations for the
original data set (hardcoded as MxImp). Then use a single imputation
for each phenotype permutation during the p-value computation.

(N3) Only the Z2 statistic is currently fully implemented. For the
principal component-based analysis (Z1 statistic), there is code in
More/ldtstk.r; however, this script has so far been tested only
without missing data.

(N4) SNPs can be recoded to -1,0,1 with the help of a perl script in
"More/recode_to_-1_0_1.pl" as

    recode_to_-1_0_1.pl x testdat.txt > testdat-recoded.txt

where 'x' denotes the original missing value, which will be replaced
by NA. A limitation of the script is that alleles are assumed to be
coded as 1 or 2 only (see "More/testdat.txt").

References:
-----------

1. OW Souverein, AH Zwinderman, MWT Tanck. 2006. Multiple Imputation
   of Missing Genotype Data for Unrelated Individuals. Annals of
   Human Genetics (in press).
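
As an illustration of the -1,0,1 genotype coding described in step
(3), here is a minimal R sketch. It is not the perl script from note
(N4) and is not part of the distributed programs. The function name
recode_genotype and the input layout (one two-character string such
as "11", "12" or "22" per genotype, with "x" marking a missing value)
are assumptions made for this example only; the actual layout of
"More/testdat.txt" may differ.

```r
# Hypothetical helper (an assumption, not part of the distributed scripts):
# recode two-character genotype strings with alleles "1" and "2" into -1,0,1.
recode_genotype <- function(g, missing = "x") {
  g <- as.character(g)
  out <- rep(NA_integer_, length(g))   # missing or unrecognized -> NA
  ok <- !is.na(g) & g != missing & nchar(g) == 2
  a1 <- substr(g[ok], 1, 1)
  a2 <- substr(g[ok], 2, 2)
  valid <- a1 %in% c("1", "2") & a2 %in% c("1", "2")
  # Count copies of allele "2" (0, 1 or 2) and shift to -1, 0, 1.
  code <- (a1 == "2") + (a2 == "2") - 1L
  code[!valid] <- NA_integer_
  out[ok] <- code
  out
}

# Toy data: phenotype in column 1 (0: controls; 1: cases),
# followed by one column per SNP.
raw <- data.frame(pheno = c(0, 1, 1, 0),
                  snp1  = c("11", "12", "x", "22"),
                  snp2  = c("22", "21", "11", "x"),
                  stringsAsFactors = FALSE)

recoded <- raw
recoded[-1] <- lapply(raw[-1], recode_genotype)  # recode SNP columns only
print(recoded)
```

Which homozygote is mapped to -1 is arbitrary here, in line with step
(3): it does not matter which of the two alleles is treated as A.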