We recommend uploading phenotype data and genotype data as two separate files through Harvard Dataverse.
Genotype Data Files
Accepted formats are FastQ or VCF. If you upload a VCF, please describe:
- How your VCF was generated
- What software was used and which thresholds were applied
- Whether the variants were filtered and, if so, how
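Some of this information may already be recorded in the VCF header, which can help when writing the description. The VCF specification defines `##`-prefixed meta-information lines such as `##FILTER=<...>`, and many tools also write a `##source=` line. The sketch below collects those lines. The helper name `summarize_vcf_header` is hypothetical, and treating any meta-line key that contains "Command" as a caller's command line is an assumption: it varies by tool and is not guaranteed.

```python
from typing import Dict, Iterable, List


def summarize_vcf_header(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect VCF meta-lines that often record how the file was produced.

    Only the '##' meta-information lines at the top of a VCF are read; the
    loop stops at the '#CHROM' column header. Keys containing 'Command' are
    kept on the assumption that the variant caller wrote its command line
    there (this differs between tools and is not guaranteed).
    """
    summary: Dict[str, List[str]] = {"source": [], "FILTER": [], "command": []}
    for raw in lines:
        if not raw.startswith("##"):
            break  # '#CHROM' header reached: no more meta-lines
        line = raw.rstrip("\n")
        key = line[2:].split("=", 1)[0]
        if key == "source":
            summary["source"].append(line)
        elif key == "FILTER":
            summary["FILTER"].append(line)
        elif "Command" in key:
            summary["command"].append(line)
    return summary
```

Because the function accepts any iterable of lines, an open file handle works directly, e.g. `with open("isolates.vcf") as handle: print(summarize_vcf_header(handle))`. For a bgzip-compressed VCF, `gzip.open(path, "rt")` from the standard library yields text lines in the same way.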
Paired-End FastQ Files
If you upload paired-end FastQ files, please label them as follows:
- isolate-name-here_R1.fastq
- isolate-name-here_R2.fastq
Replace “isolate-name-here” with your desired isolate or strain name.
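Before uploading, it can help to check that every isolate has both an R1 and an R2 file named this way. The following is a minimal sketch, assuming uncompressed `.fastq` names exactly as in the scheme above; the function name `pair_fastq_files` and the regular expression are illustrative, not part of any Dataverse tooling.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Naming scheme: <isolate>_R1.fastq and <isolate>_R2.fastq.
# Assumption: only the uncompressed '.fastq' suffix is accepted here.
FASTQ_NAME = re.compile(r"^(?P<isolate>.+)_R(?P<read>[12])\.fastq$")


def pair_fastq_files(names: Iterable[str]) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Group paired-end FastQ file names by isolate.

    Returns (pairs, problems): pairs maps each isolate name to its
    (R1, R2) file names; problems lists names that do not fit the
    scheme and isolates that are missing one of the two reads.
    """
    reads: Dict[str, Dict[str, str]] = {}
    problems: List[str] = []
    for name in names:
        match = FASTQ_NAME.match(name)
        if match is None:
            problems.append(f"unexpected name: {name}")
            continue
        reads.setdefault(match["isolate"], {})[match["read"]] = name
    pairs: Dict[str, Tuple[str, str]] = {}
    for isolate, found in sorted(reads.items()):
        if "1" in found and "2" in found:
            pairs[isolate] = (found["1"], found["2"])
        else:
            problems.append(f"missing mate for isolate: {isolate}")
    return pairs, problems
```

Passing it the output of `os.listdir()` for your upload folder gives a quick pre-submission check; the isolate names it returns are the ones the phenotype file's ID field would need to match.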
Phenotype Data Files
- We recommend a tab-delimited or comma-delimited file; Excel files are also acceptable.
- Please include an ID field whose values match the names (or isolate names) of the FastQ or VCF files you upload.
- Drug resistance results should state the drug concentration tested. For example, if isoniazid resistance was tested at a concentration of 0.2, the field containing the resistance status should be named inh_0.2; name the fields for other drugs in the same way. Please give the concentration units and the testing method in the description of your file during the Dataverse upload.
- Please include when the sample was collected (month and year) and where (city and country). If GIS or longitude/latitude coordinates are available, please include these as well. Please label these fields date, city, country, long, and lat, respectively. Please do not list postal or zip codes, due to privacy concerns.
- Data from traditional fingerprinting methods (RFLP, MIRU-VNTR, spoligotyping) are also helpful.
- Any fully de-identified host data, e.g. age or comorbidities such as diabetes, is also acceptable.
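A phenotype file following these conventions might be produced and checked as sketched below, using only Python's standard `csv` module. Several details are assumptions, not requirements of the guidelines: the `id` column header, the R/S coding of resistance status, the YYYY-MM date format, and the `<drug>_<concentration>` regular expression. The rows are illustrative placeholders, not real data, and the helper names are hypothetical.

```python
import csv
import io
import re
from typing import Iterable, List, Set

# Column layout following the conventions above. The 'id' header, the
# R/S resistance coding and the YYYY-MM date format are assumptions.
FIELDS = ["id", "inh_0.2", "date", "city", "country", "long", "lat"]

# '<drug>_<concentration>' such as inh_0.2; units belong in the Dataverse
# file description rather than in the column name.
DRUG_COLUMN = re.compile(r"^[a-z]+_\d+(?:\.\d+)?$")


def write_example_table() -> str:
    """Build a small tab-delimited phenotype table from placeholder values."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerow({"id": "TB001", "inh_0.2": "R", "date": "2015-03",
                     "city": "Lima", "country": "Peru", "long": "-77.04", "lat": "-12.05"})
    # Coordinates left empty when they are not available.
    writer.writerow({"id": "TB002", "inh_0.2": "S", "date": "2016-11",
                     "city": "Lima", "country": "Peru", "long": "", "lat": ""})
    return buffer.getvalue()


def drug_columns(fieldnames: Iterable[str]) -> List[str]:
    """Return the header fields that follow the <drug>_<concentration> pattern."""
    return [name for name in fieldnames if DRUG_COLUMN.match(name)]


def ids_without_sequence_data(table: str, isolates: Set[str]) -> List[str]:
    """List phenotype IDs that match no uploaded FastQ/VCF isolate name."""
    reader = csv.DictReader(io.StringIO(table), delimiter="\t")
    return [row["id"] for row in reader if row["id"] not in isolates]
```

The `isolates` set would come from the isolate names in your FastQ or VCF file names; any ID returned by `ids_without_sequence_data` cannot be linked to genotype data. Switching `delimiter="\t"` to `","` produces the comma-delimited variant instead.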