Some user generated databases
Monday, August 8th, 2011
KEI is interested in the development of sustainable mechanisms to strengthen the evidence for public policy decisions. One element of this work concerns user generated databases, an area of considerable interest, but mixed experience, in recent years. The following are examples of several such projects, beginning with the excellent Ensembl project, followed by several others of varying degrees of success in their implementation.
As this brief list shows, there are all sorts of ways to design and manage user generated databases. In some cases, the database services seem to be set up mainly to showcase a technology or an idea for a platform. In other cases, the database is a focused effort to serve a practical and well-identified user interest. Some are run by for-profit companies, others by non-profits, individuals or communities. The databases take different approaches to database design, attention to standards for data formats, and governance, among other issues.
The Ensembl project produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online. The Ensembl project was started in 1999, some years before the draft human genome was completed. Even at that early stage it was clear that manual annotation of 3 billion base pairs of sequence would not be able to offer researchers timely access to the latest data. The goal of Ensembl was therefore to automatically annotate the genome, integrate this annotation with other available biological data and make all this publicly available via the web. Since the website’s launch in July 2000, many more genomes have been added to Ensembl and the range of available data has also expanded to include comparative genomics, variation and regulatory data.
The number of people involved in the project has also steadily increased. Currently, the Ensembl group consists of between 40 and 50 people, divided into a number of teams.