Creating and maintaining accurate meta-data for a database is important, but often overlooked.
Meta-data is especially important when working on large databases with many collaborators. You may know what you've done, but to others it's not always obvious. Equally, come back to your database after a few months away and you may not remember what you did, what units the data are in, or its source.
I am working on a large collaborative project and received a database with relatively little meta-data. I am now spending the day deciphering what each variable is and will probably have to consult my collaborators for more information at some point.
If you're using Excel, it's often easiest to add another worksheet tab to the file, titled 'meta-data', rather than listing meta-data in a separate document that could easily become detached from the database.
Here, then, is a basic outline of the meta-data that should always be included.
1. A few lines describing who created the database and when, who it was received from if it was emailed to you, and what it is about.
2. A list of variable names as they appear in the raw database; a brief description of each variable; the coding or units; the type of variable (categorical, continuous, binary, percentage, etc.); whether it's a response or explanatory variable; the shortened name used for the variable in any R scripts.
I often find that this sort of meta-data is useful when writing a paper and rarely a waste of time to compile, because you end up with an almost ready-to-go table about your data that could be added to the paper or its supplemental information. When I read papers with lots of variables I find such tables very helpful.
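As a rough illustration of the variable list in point 2, here is a minimal sketch in Python (standard library only) of how such a meta-data table might be stored and checked against a dataset. The column headings, the example variables, and the helper names `undocumented` and `to_csv` are all my own assumptions for the example, not a standard of any kind.

```python
import csv
import io

# One column per item in the outline; these headings are an assumption.
FIELDS = ["raw_name", "description", "units_or_coding", "type", "role", "r_name"]

# Hypothetical variables, purely for illustration.
metadata = [
    {
        "raw_name": "Body mass (g)",
        "description": "Adult body mass",
        "units_or_coding": "grams",
        "type": "continuous",
        "role": "response",
        "r_name": "mass",
    },
    {
        "raw_name": "Site",
        "description": "Sampling site",
        "units_or_coding": "A = upland; B = lowland",
        "type": "categorical",
        "role": "explanatory",
        "r_name": "site",
    },
]


def undocumented(data_columns, metadata):
    """Return the data columns that have no entry in the meta-data table."""
    documented = {row["raw_name"] for row in metadata}
    return [col for col in data_columns if col not in documented]


def to_csv(metadata):
    """Render the meta-data table as CSV text, ready to paste or save."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(metadata)
    return buffer.getvalue()


if __name__ == "__main__":
    # "Rainfall" appears in the data but has no meta-data entry yet.
    print(undocumented(["Body mass (g)", "Site", "Rainfall"], metadata))
    print(to_csv(metadata))
```

A check like `undocumented` is handy when you receive a database from someone else: any column it returns is one you will need to ask your collaborators about.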