A good baseline for this type of research in human genetics is the ACMG's "Standards and guidelines for the interpretation of sequence variants". It is a guideline written for clinicians, and it gives a good sense of what counts as good variant data, what counts as bad variant data, and how to set a confidence level.
Try to consolidate data from:
## Population databases
1. Population sequencing databases (allele frequencies rather than GWAS results)
* Exome Aggregation Consortium
* 1000 Genomes Project (the data is available in VCF format, which is pretty much what you need)
2. SNV databases
* dbSNP (it is uncurated, so you have to be hypercritical)
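
Since the 1000 Genomes data comes as VCF, here is a minimal sketch of pulling per-allele frequencies out of VCF text using only the Python standard library. It assumes the `INFO` column carries an `AF` entry with one value per ALT allele; the function name `parse_vcf_sites` and the example records are made up for illustration and are not real variants. For real files a dedicated VCF library is probably a better choice than hand-parsing.

```python
import io


def parse_vcf_sites(handle):
    """Yield one dict per ALT allele from a VCF stream (site columns only)."""
    for line in handle:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref, alt, _qual, _filt, info = fields[:8]
        # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
        info_map = dict(
            item.split("=", 1) if "=" in item else (item, "")
            for item in info.split(";")
        )
        alts = alt.split(",")
        # Assumption: AF holds one frequency per ALT allele, in the same order.
        afs = info_map.get("AF", "").split(",")
        for i, alt_allele in enumerate(alts):
            has_af = i < len(afs) and afs[i] not in ("", ".")
            yield {
                "chrom": chrom,
                "pos": int(pos),
                "id": vid,
                "ref": ref,
                "alt": alt_allele,
                "af": float(afs[i]) if has_af else None,
            }


# Made-up records for illustration only.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t1000\trsA\tA\tG\t100\tPASS\tAC=5;AF=0.01\n"
    "1\t2000\t.\tC\tT,G\t100\tPASS\tAF=0.2,0.0005;DB\n"
)

records = list(parse_vcf_sites(io.StringIO(EXAMPLE_VCF)))
# Keep only alleles with a known frequency below an arbitrary 1% cutoff.
rare = [r for r in records if r["af"] is not None and r["af"] < 0.01]
```

Splitting multi-allelic sites into one record per ALT allele makes it easier to match variants against the other sources later, because every record is then identified by chromosome, position, reference allele and alternate allele.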
## Disease databases
* ClinVar (be aware that "12% of interpreted variants have ≥2 submitters in ClinVar, and 21% are interpreted differently")
* OMIM
* HGMD (it is a gold standard because it is curated, but unless you pay for it you can only search for individual variants; you cannot download the data to collect it in bulk)
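
Because the same variant can be interpreted differently by different submitters, it may help to flag disagreements while consolidating. Below is a rough sketch, assuming you have already reduced each source to plain records keyed by chromosome, position, reference allele and alternate allele. The record layout, the `interpretation` field and the function name `flag_conflicts` are hypothetical and do not mirror any database's export format; the example submissions are made up.

```python
from collections import defaultdict


def flag_conflicts(submissions):
    """Group submissions by variant and report whether interpretations disagree."""
    by_variant = defaultdict(set)
    for sub in submissions:
        key = (sub["chrom"], sub["pos"], sub["ref"], sub["alt"])
        # Normalise case and whitespace so "Benign" and "benign " compare equal.
        by_variant[key].add(sub["interpretation"].strip().lower())
    return {
        key: {"interpretations": sorted(calls), "conflicting": len(calls) > 1}
        for key, calls in by_variant.items()
    }


# Made-up submissions for illustration only.
EXAMPLE_SUBMISSIONS = [
    {"chrom": "1", "pos": 1000, "ref": "A", "alt": "G", "interpretation": "Pathogenic"},
    {"chrom": "1", "pos": 1000, "ref": "A", "alt": "G", "interpretation": "Benign"},
    {"chrom": "2", "pos": 500, "ref": "C", "alt": "T", "interpretation": "Benign"},
    {"chrom": "2", "pos": 500, "ref": "C", "alt": "T", "interpretation": "benign "},
]

summary = flag_conflicts(EXAMPLE_SUBMISSIONS)
```

This treats any difference in wording as a conflict, so "pathogenic" versus "likely pathogenic" would also be flagged. Whether that is too strict depends on the confidence levels you decide on.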