Create a long-form dataset with tidyr or reshape2. Aug 2015
There are at least two R packages that let you reshape data into long format. This is a step in the process of getting tidy data. Many functions need your data in this format, for example when creating a chart with multiple lines with ggplot2.
When do you need this?
When your data is stacked, with one column per treatment:
year  1   2
2013  54  65
2014  34  90
2015  89  100
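To follow along in R, the table above can be rebuilt as a data frame. This is a minimal sketch: the object name data matches the one used in the examples in this post, and check.names = FALSE is an assumption needed to keep the column names 1 and 2 exactly as they appear in the table.

```r
# Rebuild the stacked table as a data frame.
# check.names = FALSE keeps the column names "1" and "2";
# without it, data.frame() would rename them to "X1" and "X2".
data <- data.frame(
  year = c(2013, 2014, 2015),
  `1`  = c(54, 34, 89),
  `2`  = c(65, 90, 100),
  check.names = FALSE
)

print(names(data))
```

Because 1 and 2 are not syntactic R names, they have to be quoted or wrapped in backticks whenever they are referred to by name.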
This form violates two thirds of Hadley Wickham’s rules for tidy data:
Each variable forms a column
Each observation forms a row
Each data set contains information on only one observational unit of analysis (e.g., families, participants, participant visits)
The same data in best-practice long form:
year  treatment  result
2013  1          54
2013  2          65
2014  1          34
2014  2          90
2015  1          89
2015  2          100
Now every variable has its own column and every observation is a row.
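To illustrate why the long form matters, here is a minimal sketch of the multi-line chart mentioned at the start, drawn with ggplot2. The object names data_long and p are chosen for this example, and the values are the ones from the long-form table.

```r
library(ggplot2)

# The long-form table, typed in directly.
data_long <- data.frame(
  year      = rep(c(2013, 2014, 2015), each = 2),
  treatment = rep(c(1, 2), times = 3),
  result    = c(54, 65, 34, 90, 89, 100)
)

# One line per treatment: the treatment column is mapped to colour,
# which is only possible because all results sit in a single column.
p <- ggplot(data_long, aes(x = year, y = result, colour = factor(treatment))) +
  geom_line()

print(p)
```

With the stacked version of the data, each treatment column would need its own geom_line() call.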
Two packages to get long-form data
The central function in reshape2 is called melt.
Example with dataset above:
library(reshape2)
# measure.vars must match the column names of the data;
# the stacked table above uses "1" and "2"
data_long <- melt(data,
                  id.vars = "year",
                  measure.vars = c("1", "2"),
                  variable.name = "treatment",
                  value.name = "result")
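id.vars: which columns identify an observation and stay as they are (here: year)
measure.vars: which columns need to be melted into one column
variable.name: what is the new column holding the old column names called? default: variable
value.name: what is the value column called? default: value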
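Putting the pieces together, a self-contained sketch of the melt call on the data from the beginning of this post might look like this. The data frame construction and the final sorting step are additions for illustration. melt returns the rows grouped by treatment, so sorting by year gives the row order of the long-form table shown earlier.

```r
library(reshape2)

# The stacked data; check.names = FALSE keeps the column names "1" and "2".
data <- data.frame(year = c(2013, 2014, 2015),
                   `1` = c(54, 34, 89),
                   `2` = c(65, 90, 100),
                   check.names = FALSE)

data_long <- melt(data,
                  id.vars = "year",
                  measure.vars = c("1", "2"),
                  variable.name = "treatment",
                  value.name = "result")

# melt stacks the measured columns one after another, so the rows
# come out grouped by treatment; sort by year, then treatment.
data_long <- data_long[order(data_long$year, data_long$treatment), ]
print(data_long)
```

Note that melt stores the new treatment column as a factor; convert it with as.character() or as.integer(as.character()) if you need another type.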
Learn more about reshape2 in a tutorial: An Introduction to reshape2
The equivalent function in tidyr is gather:
library(tidyr)
# gather(data, key, value, columns): gather every column except year
data_long <- gather(data, treatment, result, -year)
Suppose you have 5 different treatments and the header row of your stacked data looks like this:
year 1 2 3 4 5
The code is then:
# year is column 1, so the five treatment columns are columns 2 to 6
data_long <- gather(data, treatment, result, 2:6)
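For comparison, here is a self-contained sketch of the tidyr route on the two-treatment data from the beginning of this post. The data frame construction and the object names long_gather and long_pivot are added for illustration. In tidyr 1.0.0 and later, gather is marked as superseded in favour of pivot_longer, so the second call shows what should be the equivalent with the newer function, assuming such a version is installed.

```r
library(tidyr)

# The stacked data; check.names = FALSE keeps the column names "1" and "2".
data <- data.frame(year = c(2013, 2014, 2015),
                   `1` = c(54, 34, 89),
                   `2` = c(65, 90, 100),
                   check.names = FALSE)

# gather(data, key, value, columns): -year gathers every column except year.
long_gather <- gather(data, key = "treatment", value = "result", -year)

# Assumed equivalent with the newer interface (requires tidyr >= 1.0.0).
long_pivot <- pivot_longer(data, cols = -year,
                           names_to = "treatment", values_to = "result")

print(long_gather)
print(long_pivot)
```

Both results contain the same six observations. The row order can differ between the two functions, so sort by year and treatment before comparing them.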
A tutorial for tidyr: Data Processing with dplyr & tidyr
Both packages are by Hadley Wickham, and they can do much more than make stacked data long.