The goal of unjoin is to provide unjoin for data frames.
This is exactly part of what tidyr::nest does, but with two differences: both tables, main and data, are returned, and a key column is added that links main with the rows in data.
Install unjoin from CRAN:
install.packages("unjoin")
You can install the development version of unjoin from GitHub with:
# install.packages("devtools")
devtools::install_github("hypertidy/unjoin")
This is a basic example which shows you how to unjoin a data frame.
library(unjoin)
unjoin(iris)
#> $.idx0
#> # A tibble: 1 x 1
#> .idx0
#> <int>
#> 1 1
#>
#> $data
#> # A tibble: 150 x 6
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species .idx0
#> <dbl> <dbl> <dbl> <dbl> <fct> <int>
#> 1 5.1 3.5 1.4 0.2 setosa 1
#> 2 4.9 3 1.4 0.2 setosa 1
#> 3 4.7 3.2 1.3 0.2 setosa 1
#> 4 4.6 3.1 1.5 0.2 setosa 1
#> 5 5 3.6 1.4 0.2 setosa 1
#> 6 5.4 3.9 1.7 0.4 setosa 1
#> 7 4.6 3.4 1.4 0.3 setosa 1
#> 8 5 3.4 1.5 0.2 setosa 1
#> 9 4.4 2.9 1.4 0.2 setosa 1
#> 10 4.9 3.1 1.5 0.1 setosa 1
#> # … with 140 more rows
#>
#> attr(,"class")
#> [1] "unjoin"
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
iris %>% unjoin(Species)
#> $.idx0
#> # A tibble: 3 x 2
#> Species .idx0
#> <fct> <int>
#> 1 setosa 1
#> 2 versicolor 2
#> 3 virginica 3
#>
#> $data
#> # A tibble: 150 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width .idx0
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 5.1 3.5 1.4 0.2 1
#> 2 4.9 3 1.4 0.2 1
#> 3 4.7 3.2 1.3 0.2 1
#> 4 4.6 3.1 1.5 0.2 1
#> 5 5 3.6 1.4 0.2 1
#> 6 5.4 3.9 1.7 0.4 1
#> 7 4.6 3.4 1.4 0.3 1
#> 8 5 3.4 1.5 0.2 1
#> 9 4.4 2.9 1.4 0.2 1
#> 10 4.9 3.1 1.5 0.1 1
#> # … with 140 more rows
#>
#> attr(,"class")
#> [1] "unjoin"
iris %>% unjoin(Species, Petal.Width)
#> $.idx0
#> # A tibble: 27 x 3
#> Species Petal.Width .idx0
#> <fct> <dbl> <int>
#> 1 setosa 0.2 2
#> 2 setosa 0.4 4
#> 3 setosa 0.3 3
#> 4 setosa 0.1 1
#> 5 setosa 0.5 5
#> 6 setosa 0.6 6
#> 7 versicolor 1.4 11
#> 8 versicolor 1.5 12
#> 9 versicolor 1.3 10
#> 10 versicolor 1.6 13
#> # … with 17 more rows
#>
#> $data
#> # A tibble: 150 x 4
#> Sepal.Length Sepal.Width Petal.Length .idx0
#> <dbl> <dbl> <dbl> <int>
#> 1 5.1 3.5 1.4 2
#> 2 4.9 3 1.4 2
#> 3 4.7 3.2 1.3 2
#> 4 4.6 3.1 1.5 2
#> 5 5 3.6 1.4 2
#> 6 5.4 3.9 1.7 4
#> 7 4.6 3.4 1.4 3
#> 8 5 3.4 1.5 2
#> 9 4.4 2.9 1.4 2
#> 10 4.9 3.1 1.5 1
#> # … with 140 more rows
#>
#> attr(,"class")
#> [1] "unjoin"
This is used to build topological data structures, with a kind of inside-out version of a nested data frame. Whether it’s of broader use is unclear.
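As a rough illustration of the idea, the split into a key table and a data table can be sketched in base R. The helper unjoin_sketch below is hypothetical and is not the package's implementation; in particular it numbers keys in order of first appearance, so its key values may differ from the ones unjoin assigns.

```r
# Hypothetical helper for illustration only (not part of the unjoin package).
# Splits `x` into a table of distinct key combinations and a table holding the
# remaining columns, linked by an integer key column.
unjoin_sketch <- function(x, cols, key_col = ".idx0") {
  # One row per distinct combination of the chosen columns
  main <- unique(x[, cols, drop = FALSE])
  rownames(main) <- NULL
  main[[key_col]] <- seq_len(nrow(main))

  # Build a string signature per row so each row of x can be matched to main
  signature <- function(d) do.call(paste, c(unname(as.list(d)), sep = "\r"))
  idx <- match(signature(x[, cols, drop = FALSE]),
               signature(main[, cols, drop = FALSE]))

  # Keep the remaining columns and attach the key
  data <- x[, setdiff(names(x), cols), drop = FALSE]
  data[[key_col]] <- idx

  # Name the first table after the key column, as in the printed output above
  stats::setNames(list(main, data), c(key_col, "data"))
}

res <- unjoin_sketch(iris, c("Species", "Petal.Width"))

# Joining the two tables on the key restores every original column
back <- merge(res$data, res$.idx0, by = ".idx0")
```

Merging the two tables on the key column gives back all 150 rows with the original columns, plus the key itself.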
There is a record here of some of the thinking that led to unjoin: https://github.com/r-gris/babelfish
The function unjoin replaces the method here: http://rpubs.com/cyclemumner/iout_nest
(d2 <- iris %>% unjoin(Species, Petal.Width))
#> $.idx0
#> # A tibble: 27 x 3
#> Species Petal.Width .idx0
#> <fct> <dbl> <int>
#> 1 setosa 0.2 2
#> 2 setosa 0.4 4
#> 3 setosa 0.3 3
#> 4 setosa 0.1 1
#> 5 setosa 0.5 5
#> 6 setosa 0.6 6
#> 7 versicolor 1.4 11
#> 8 versicolor 1.5 12
#> 9 versicolor 1.3 10
#> 10 versicolor 1.6 13
#> # … with 17 more rows
#>
#> $data
#> # A tibble: 150 x 4
#> Sepal.Length Sepal.Width Petal.Length .idx0
#> <dbl> <dbl> <dbl> <int>
#> 1 5.1 3.5 1.4 2
#> 2 4.9 3 1.4 2
#> 3 4.7 3.2 1.3 2
#> 4 4.6 3.1 1.5 2
#> 5 5 3.6 1.4 2
#> 6 5.4 3.9 1.7 4
#> 7 4.6 3.4 1.4 3
#> 8 5 3.4 1.5 2
#> 9 4.4 2.9 1.4 2
#> 10 4.9 3.1 1.5 1
#> # … with 140 more rows
#>
#> attr(,"class")
#> [1] "unjoin"
We can chain unjoins together, but make sure each unjoin in the chain uses a different key_col.
unjoin(iris, Species, key_col = "vertex") %>% unjoin(Petal.Width, vertex, key_col = "branch")
#> $vertex
#> # A tibble: 3 x 2
#> Species vertex
#> <fct> <int>
#> 1 setosa 1
#> 2 versicolor 2
#> 3 virginica 3
#>
#> $branch
#> # A tibble: 27 x 3
#> Petal.Width vertex branch
#> <dbl> <int> <int>
#> 1 0.2 1 2
#> 2 0.4 1 4
#> 3 0.3 1 3
#> 4 0.1 1 1
#> 5 0.5 1 5
#> 6 0.6 1 6
#> 7 1.4 2 11
#> 8 1.5 2 13
#> 9 1.3 2 10
#> 10 1.6 2 15
#> # … with 17 more rows
#>
#> $data
#> # A tibble: 150 x 4
#> Sepal.Length Sepal.Width Petal.Length branch
#> <dbl> <dbl> <dbl> <int>
#> 1 5.1 3.5 1.4 2
#> 2 4.9 3 1.4 2
#> 3 4.7 3.2 1.3 2
#> 4 4.6 3.1 1.5 2
#> 5 5 3.6 1.4 2
#> 6 5.4 3.9 1.7 4
#> 7 4.6 3.4 1.4 3
#> 8 5 3.4 1.5 2
#> 9 4.4 2.9 1.4 2
#> 10 4.9 3.1 1.5 1
#> # … with 140 more rows
#>
#> attr(,"class")
#> [1] "unjoin"
Also, there’s no escape hatch here: you can’t “unjoin” your way to normal nirvana. Each unjoin needs to carry the previous unjoin’s key with it, so you just end up with one big link table with no attributes. It needs some kind of group semantics to cut the chain.
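To make the key-carrying concrete, here is a minimal base R sketch that builds the same three-table shape by hand and then walks the chain back. It does not call the package, the table names simply mirror the vertex/branch example above, and the key values are assigned by this sketch, so they will not necessarily match the ones unjoin produces.

```r
# Level 1: one row per Species, with an integer key called "vertex"
vertex <- data.frame(Species = unique(iris$Species))
vertex$vertex <- seq_len(nrow(vertex))

# Replace Species by its key in the working table
x <- merge(iris, vertex, by = "Species")
x$Species <- NULL

# Level 2: one row per (Petal.Width, vertex) pair, keyed by "branch".
# Note that the previous key (vertex) has to travel along in this table.
branch <- unique(x[, c("Petal.Width", "vertex")])
rownames(branch) <- NULL
branch$branch <- seq_len(nrow(branch))

# The remaining data only carries the most recent key
data <- merge(x, branch, by = c("Petal.Width", "vertex"))
data <- data[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "branch")]

# Walking the chain back: data -> branch -> vertex
rejoined <- merge(merge(data, branch, by = "branch"), vertex, by = "vertex")
```

Here branch is already mostly a link table: apart from Petal.Width it only holds keys, which is the pattern described above.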
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.