MCD file format

WinQTLCart .MCD source data files use tokens to indicate the meaning of data. This topic describes valid tokens used in .MCD source data files.

The line numbers in this topic refer to the sample .MCD file, "NewMcd.mcd," which is included as part of the WinQTLCart distribution.

Token #FileID (Line 1)
The file's ID number; usually a 10-digit number.

Token /* and */ (Lines 2 – 5)
Insert multiple-line comments between these tokens.

Token #bychromosome (Line 6)
Indicates start of chromosome information.

Token // (Lines 6 – 10)
Insert one-line comments after the double-slash.
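
A program that reads this format would typically discard both comment styles before looking for tokens. The following is a minimal Python sketch of that step, not WinQTLCart's own parser. The function name strip_comments is hypothetical, and the sketch assumes that neither /* nor // ever occurs inside a data value.

```python
import re

def strip_comments(text):
    """Remove /* ... */ and // comments from .MCD text (hypothetical helper)."""
    # Multi-line comments first: non-greedy match, DOTALL so '.' spans newlines.
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    # Then one-line comments: everything from // to the end of the line.
    return re.sub(r"//[^\n]*", "", text)

# Made-up sample text, for illustration only.
sample = """#FileID 1234567890   // file id
/* a comment
   over two lines */
-type position"""
tokens = strip_comments(sample).split()
```

After stripping, tokens holds only the four meaningful words of the sample.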

Token -type (Line 7)
Indicates how marker positions are numbered along the chromosome. It takes one parameter, either "position" or "interval".

Position indicates that each number is a marker's position measured from the left telomere of the current chromosome, so the numbers must be in increasing order.

Interval indicates that each number is the distance from a marker to the next marker, so the last number in the series, and only the last, must be zero.

Example (the same 13 markers under each -type setting):

position 0.0 9.3 17.2 29.9 38.7 52.8 57.8 72.4 76.6 93.2 97.0 115.5 116.5
interval 9.3 7.9 12.7 8.8 14.1 5.0 14.6 4.2 16.6 3.8 18.5 1.0 0.0
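
The two numberings carry the same information, so one can be derived from the other. The Python sketch below illustrates the relationship with the values above. The function names are hypothetical, and the conversion back to positions assumes the first marker sits at 0.0, because interval data do not record where the first marker lies.

```python
def positions_to_intervals(positions):
    """Convert 'position' values to 'interval' values (distance after each marker)."""
    if not positions:
        return []
    gaps = [round(b - a, 6) for a, b in zip(positions, positions[1:])]
    return gaps + [0.0]  # nothing follows the last marker

def intervals_to_positions(intervals, first=0.0):
    """Convert 'interval' values to positions.

    `first` is the position of the first marker; 0.0 is an assumption.
    """
    if not intervals:
        return []
    positions = [first]
    for gap in intervals[:-1]:  # the trailing 0.0 belongs to the last marker
        positions.append(round(positions[-1] + gap, 6))
    return positions

positions = [0.0, 9.3, 17.2, 29.9, 38.7, 52.8, 57.8,
             72.4, 76.6, 93.2, 97.0, 115.5, 116.5]
intervals = positions_to_intervals(positions)
restored = intervals_to_positions(intervals)
```

Rounding to six decimals only hides floating-point noise; it is a presentation choice, not part of the format.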

Token -function (Line 8)
Indicates which map function is used to convert the recombination frequency (r) between markers into a distance in Morgans (M). The parameter is an integer from 1 to 8. Haldane and Kosambi are the two most useful map functions.

Code	Reference	Note
1	Haldane (1919)	Default
2	Kosambi (1944)
3	Morgan (1994)	"Fixed"
4	Carter and Falconer (1951)
5	Rao et al. (1979)	0 ≤ p ≤ 1
6	Sturt (1976)	L
7	Felsenstein (1979)	-∞ < K < ∞, K ≠ 2
8	Karlin (1984)	Binomial, N > 0

The following Haldane and Kosambi formulas convert a marker distance from r to M, or vice versa (dM is the distance in Morgans).
------------------------------
Haldane: dM = -0.5 ln(1 - 2r)
         r = 0.5 (1 - exp(-2dM))
Kosambi: dM = 0.25 ln((1 + 2r)/(1 - 2r))
         r = (1 - exp(-4dM))/(2(1 + exp(-4dM)))
------------------------------
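
The formulas translate directly into code. The Python sketch below is for checking values by hand and is not taken from WinQTLCart. The function names are hypothetical, distances are in Morgans (divide a cM value by 100 first), and r is restricted to 0 ≤ r < 0.5, where the logarithms are defined.

```python
import math

def _check_r(r):
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must satisfy 0 <= r < 0.5")

def haldane_distance(r):
    """Haldane: r -> distance in Morgans."""
    _check_r(r)
    return -0.5 * math.log(1 - 2 * r)

def haldane_r(d):
    """Haldane: distance in Morgans -> r."""
    return 0.5 * (1 - math.exp(-2 * d))

def kosambi_distance(r):
    """Kosambi: r -> distance in Morgans."""
    _check_r(r)
    return 0.25 * math.log((1 + 2 * r) / (1 - 2 * r))

def kosambi_r(d):
    """Kosambi: distance in Morgans -> r."""
    e = math.exp(-4 * d)
    return (1 - e) / (2 * (1 + e))

# r = 0.1 is roughly 0.1116 M under Haldane and 0.1014 M under Kosambi.
d_haldane = haldane_distance(0.1)
d_kosambi = kosambi_distance(0.1)
```

Each pair of functions is an inverse pair, so converting r to a distance and back returns the original r up to rounding error.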

Token -Units (Line 9)
Indicates the unit of the marker positions. There are three choices:

cM (centiMorgan)

M (Morgan)

r (recombination frequency). If you choose this unit, the -function token should be 3 (Morgan).

Token -chromosome (Line 10)
Indicates the total number of chromosomes in the source data.

Token -maximum (Line 11)
Indicates the maximum number of markers on any one chromosome in the source data.

Token -named (Line 12)
Either yes or no.

Yes means markers have names.

No means markers do not have names.

Token -start and -end (Line 13 and Line 55)
Use these tokens to start and end the marker position data for all chromosomes.

Token -Chromosome (Lines 14, 28, and 41)
Indicates the chromosome name. The marker position data for this chromosome start on the next line.

Token #bycross (Line 56)
Indicates that cross information is to begin.

Token -SampleSize (Line 57)
Indicates the sample size, that is, the number of individuals.

Token -Cross (Line 58)
Indicates the code for the cross type, or mating design (see the following table).

Code	Design	Examples
B_i	Backcross to P_i	B1, B2
B_ij	Backcross j times to P_i	B13, B25
SF_i	Selfed generation i intercross	SF2, SF6
RF_i	Randomly mated generation i intercross	RF2, RF3
RI0	Doubled haploid	RI0
RI1	Recombinant inbred via selfing	RI1
RI2	Recombinant inbred via sib mating	RI2
T(B_i)SF_j	Testcross of SF_j to P_i	T(B1)SF3
T(SF_i+j)SF_i	Testcross of SF_i for j generations	T(SF4)SF3
T(B_j)RF_i	Testcross of RF_i to P_j	T(B1)RF3
T(D3)SF_i	Design III	T(D3)SF5

Token -traits (Line 59)
Indicates the number of traits in the source data.

Token -otraits (Line 60)
Indicates the number of other traits in the source data. An other trait (also called a categorical trait) is a trait with qualitative or categorical values, such as sex or color. Other traits can be used as factors that are "regressed out" in regression analysis: the quantitative trait of interest is regressed on the categorical trait, and the residuals are used as the phenotypes in the analysis.
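
To make "regressed out" concrete: with a single categorical factor, a least-squares regression on the factor's levels fits each level's mean, so the residual for an individual is its trait value minus the mean of its group. The Python sketch below shows that calculation. The function name and the data are made up for illustration, and nothing here describes how WinQTLCart performs the step internally.

```python
from collections import defaultdict

def regress_out(values, factor):
    """Residuals of a quantitative trait after removing one categorical factor."""
    groups = defaultdict(list)
    for value, level in zip(values, factor):
        groups[level].append(value)
    # Fitted value for each individual is the mean of its factor level.
    means = {level: sum(vs) / len(vs) for level, vs in groups.items()}
    return [value - means[level] for value, level in zip(values, factor)]

# Hypothetical data: one quantitative trait and sex as the other trait.
trait = [5.0, 5.3, 6.2, 4.1, 5.5]
sex = ["M", "F", "M", "M", "M"]
residuals = regress_out(trait, sex)
```

The mean of the four M values is 5.2, so the first residual is about -0.2; the single F individual has a residual of zero because it is its own group mean.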

Token -missingtrait (Line 61)
Indicates the symbol for a missing trait value.

Token -case (Line 62)
Either yes or no.

Yes means all comparisons are case sensitive.

No means all names of individuals, markers, and traits are converted to lower case before they are compared.

Token -TranslationTable (Line 63)
This token and the table that follows it (the next 6 lines of data) define how marker genotype data are translated. The table has six rows and three columns (18 positions). Here is the default translation table:

-TranslationTable
AA 2 2
Aa 1 1
aa 0 0
A- 12 12
a- 10 10
-- -1 -1

The first column is the genotype. The program assumes that the A allele is diagnostic for the High (parental 1) line and the a allele is diagnostic for the Low (parental 2) line. A minus sign (-) means the allele is unknown (missing). Dominant as well as co-dominant markers can be encoded.

The middle column is the code written to the output for each genotype.

The third column is how you code the marker genotype data in this source data file. Just about any set of tokens can be used for the third column (corresponding to your dataset), but DO NOT change the first two columns.

The above TranslationTable maps 2 to 2, 1 to 1, 0 to 0, etc. If you encoded your P1 homozygotes as BB, heterozygotes as Bb, etc., your translation table might appear as:

-TranslationTable
AA 2 BB
Aa 1 Bb
aa 0 bb
A- 12 B-
a- 10 b-
-- -1 --

Anything in the data file that is not recognized (that is, does not match an entry in column 3) becomes unknown (-1) in the output.

Important: All 18 tokens must follow the -TranslationTable token, and the first two columns cannot be altered. Alter only the last column.
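
The translation rule can be pictured as a lookup from the third column to the second. The Python sketch below is an illustration under that reading, not WinQTLCart's implementation. The function names are hypothetical, and the sketch ignores the -case setting.

```python
def parse_translation_table(text):
    """Build {your token (column 3): output code (column 2)} from 18 tokens."""
    tokens = text.split()
    if len(tokens) != 18:
        raise ValueError("a translation table needs exactly 18 tokens")
    rows = [tokens[i:i + 3] for i in range(0, 18, 3)]
    return {user: int(code) for _genotype, code, user in rows}

def translate(genotypes, lookup):
    """Translate data tokens; anything unrecognized becomes -1 (unknown)."""
    return [lookup.get(token, -1) for token in genotypes]

# The default table from this topic.
default_table = """
AA 2 2
Aa 1 1
aa 0 0
A- 12 12
a- 10 10
-- -1 -1
"""
lookup = parse_translation_table(default_table)
codes = translate(["2", "1", "0", "12", "x"], lookup)
```

Here "x" matches nothing in column 3, so it is reported as -1, the same code as a genotype entered as missing.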

Token -start markers and -stop markers (Line 70 and Line 109)
Use these tokens to start and end the marker genotype data for all chromosomes. Keep the chromosomes and markers in the same order as in the data between the -start and -end tokens above.

You can organize the data by marker or by individual:

Order by marker: For each marker, provide the genotype data for all individuals. The order of the individuals must be the same for every marker.

Order by individual: For each individual, provide the genotype data for all markers (all chromosomes). The order of chromosomes and markers must be the same for every individual.

-start individuals markers
Ind1 2 2 1 1 2 2 2 2 2
Ind2 2 2 2 1 2 1 2 2 1
Ind3 2 2 2 2 2 1 2 1 1
Ind4 2 1 2 2 2 2 1 1 1
Ind5 2 1 2 2 2 2 1 1 1
-stop individuals markers

Note that when the data are ordered by individual, the tokens are -start individuals markers and -stop individuals markers, and the first column is the individual's label.
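
The by-individual layout above is the transpose of the by-marker layout. The Python sketch below reads such a block and reorders it by marker. The function names are hypothetical, and the sketch assumes one individual per line with whitespace-separated genotype tokens, as in the example.

```python
def parse_individual_markers(text):
    """Return {individual label: [genotype tokens]} from a by-individual block."""
    rows = {}
    inside = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[:3] == ["-start", "individuals", "markers"]:
            inside = True
        elif fields[:3] == ["-stop", "individuals", "markers"]:
            break
        elif inside:
            rows[fields[0]] = fields[1:]  # first column is the label
    return rows

def to_marker_order(rows):
    """Transpose: one list per marker, individuals in their original order."""
    return [list(column) for column in zip(*rows.values())]

block = """
-start individuals markers
Ind1 2 2 1 1 2 2 2 2 2
Ind2 2 2 2 1 2 1 2 2 1
Ind3 2 2 2 2 2 1 2 1 1
Ind4 2 1 2 2 2 2 1 1 1
Ind5 2 1 2 2 2 2 1 1 1
-stop individuals markers
"""
rows = parse_individual_markers(block)
by_marker = to_marker_order(rows)
```

Each entry of by_marker lists one marker's genotypes for Ind1 through Ind5, which is the shape the by-marker ordering expects.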

Token -start traits and -stop traits (Line 110 and Line 114)
Use these tokens to start and end the trait values. The data should be organized by trait; that is, for each trait, give the trait value of every individual. If the data are organized by individual, then for each individual, give the value of every trait.

-start individuals traits 2 Trait_1 Trait_2 named
Ind1 5.0 15.0
Ind2 5.3 15.3
Ind3 6.2 16.2
Ind4 4.1 24.1
Ind5 5.5 25.5
-stop individuals traits

Note that the tokens are different, and that the number of traits and the trait labels follow the token -start individuals traits. The label "named" means the data include trait names.
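
As an illustration of how the header line can be read, the Python sketch below splits it into the trait count, the trait labels, and the "named" flag, then collects the values per trait. It assumes the header has exactly the shape shown in the example. The function name is hypothetical, and so is "." as the default missing-value symbol; the real symbol is whatever the -missingtrait token declares.

```python
def parse_individual_traits(text, missing="."):
    """Return {trait name: {individual label: value or None}}."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, body = lines[0], lines[1:]
    if header[:3] != ["-start", "individuals", "traits"]:
        raise ValueError("expected '-start individuals traits ...'")
    count = int(header[3])            # number of traits
    names = header[4:4 + count]       # trait labels follow the count
    if len(names) != count or header[-1].lower() != "named":
        raise ValueError("this sketch handles only named traits")
    traits = {name: {} for name in names}
    for fields in body:
        if fields[0] == "-stop":
            break
        label, values = fields[0], fields[1:]
        for name, value in zip(names, values):
            traits[name][label] = None if value == missing else float(value)
    return traits

block = """
-start individuals traits 2 Trait_1 Trait_2 named
Ind1 5.0 15.0
Ind2 5.3 15.3
Ind3 6.2 16.2
Ind4 4.1 24.1
Ind5 5.5 25.5
-stop individuals traits
"""
traits = parse_individual_traits(block)
```

Indexing by trait first, then by individual, gives the same view of the data as the by-trait ordering.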

Token -start otraits and -stop otraits (Line 115 and Line 118)
Use these tokens to start and end the other trait values. The data must be organized by other trait; that is, for each other trait, give the value for every individual. If you order the data by individual, then for each individual, give the value of every other trait.

Example:

-start individuals otraits 2 sex brood named
Ind1 M 1
Ind2 F 1
Ind3 M 0
Ind4 M 1
Ind5 M 1
-stop individuals otraits

Note that the tokens are different, and that the number of other traits and the other trait labels follow the token -start individuals otraits. The label "named" means the data include other trait names.

Token -quit and -end (Line 119 and Line 120)
These tokens indicate the end of the source data file.