This lesson was recently cloned and expanded from Data Carpentry's "Better Spreadsheet Skills for Ecologists". Data Carpentry's aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. The lessons below were designed for those interested in working with data in .
Good data organization is the foundation of your research project. Most researchers have data or do data entry in spreadsheets. Spreadsheet programs are very useful graphical interfaces for designing data tables and handling very basic data quality control functions.
Much of your time as a manager or researcher will be spent in this 'data wrangling' stage.
It could be manipulating data to get it in a database, or modifying output from a database to do something. Examples: Data summaries for a project or institutional report, database migration, preparing for analysis in another software, share a temporary list of taxon names for a student working in a collection, or a data list for sharing and visualizing content in your collection.
Regarding data migration: Spreadsheets are very useful to visualize snapshots of your data. Your data workflow for cleaning data will be different based on your data transformation issues, but there are a few first steps to help you organize your process:
Understand the data you are merging. Carefully define the fields from the export database and map to the import database.
Make a list of the data transformations/data cleaning you need to do. This document is very useful, as most data cleaning efforts occur more than once.
Data cleaning is iterative. Not between datasets, but on the same dataset. This is the reason we suggest repeatable methods for data cleaning.
If you have a lot to do and its very challenging, it might be time to ask for help or learn another method. If you know what you need to do very specifically, it becomes clearer and easier (cheaper) when you do reach out for help.
In this lesson, we’re going to talk about:
Commands may differ a bit between programs, but general ideas for thinking about spreadsheets is the same. Group yourselves.
Spreadsheets encompass a lot of the things we need to be able to do as researchers. We can use them for:
In this lesson, we’re going to talk about: