Remove duplicate rows based on specified grouping variables

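This function removes duplicate rows from a data frame, keeping the first occurrence of each unique combination of the specified grouping variables. Only the grouping variables are compared when deciding what counts as a duplicate; each kept row retains all of its columns.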
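To make "first occurrence" concrete, here is a base-R illustration on the `carb` column of `mtcars`. It is not `deduplicate_by()` itself. `duplicated()` flags every row whose key already appeared higher up, so negating it keeps the first row per key. Passing `fromLast = TRUE`, shown only for contrast, keeps the last row per key instead.

```r
# Base-R illustration only; this is not deduplicate_by() itself.
# duplicated() marks every row whose `carb` value already appeared earlier,
# so negating it keeps the first row for each value.
keep_first <- mtcars[!duplicated(mtcars["carb"]), ]

# For contrast: fromLast = TRUE scans bottom-up and keeps the last row instead.
keep_last <- mtcars[!duplicated(mtcars["carb"], fromLast = TRUE), ]

# mtcars has ten cars with carb == 4; each approach keeps a different one.
rownames(keep_first)[keep_first$carb == 4]  # "Mazda RX4"
rownames(keep_last)[keep_last$carb == 4]    # "Ford Pantera L"
```

Both results have 6 rows, one per distinct `carb` value, which matches the row count of the first example.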

deduplicate_by(.data, ...)

Arguments

.data: A data frame or tibble.
...: One or more unquoted column names. Rows that share the same values in all of these columns are treated as duplicates.
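Because `...` takes bare column names rather than strings, the function has to capture them unevaluated. The sketch below assumes a base-R approach with `substitute()`. `deduplicate_by_sketch()` is a hypothetical name, and this is not the package's actual implementation.

```r
# Hypothetical sketch, not the package's implementation.
deduplicate_by_sketch <- function(.data, ...) {
  # substitute() returns the unevaluated call list(carb, mpg); dropping its
  # first element leaves the bare symbols, which deparse() turns into strings.
  exprs <- as.list(substitute(list(...)))[-1]
  cols <- vapply(exprs, deparse, character(1))

  if (length(cols) == 0) {
    stop("Supply at least one grouping column.")
  }
  unknown <- setdiff(cols, names(.data))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }

  # Compare rows on the grouping columns only, but keep every column.
  .data[!duplicated(.data[cols]), , drop = FALSE]
}

nrow(deduplicate_by_sketch(mtcars, carb))
nrow(deduplicate_by_sketch(mtcars, carb, mpg))
```

The two calls return 6 and 29 rows, matching the row counts printed in the Examples.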

Value

A data frame (returned as a tibble, as the examples show) that contains all columns of .data and, in the original row order, only the first occurrence of each unique combination of the grouping variables.

Examples

# \donttest{
# Remove duplicates based on a single column
mtcars %>% deduplicate_by(carb)
#> # A tibble: 6 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#> 2  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#> 3  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#> 4  16.4     8  276.   180  3.07  4.07  17.4     0     0     3     3
#> 5  19.7     6  145    175  3.62  2.77  15.5     0     1     5     6
#> 6  15       8  301    335  3.54  3.57  14.6     0     1     5     8

# Remove duplicates based on multiple columns
mtcars %>% deduplicate_by(carb, mpg)
#> # A tibble: 29 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  3  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  4  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  5  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  6  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  7  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  8  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#>  9  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> 10  17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
#> # ℹ 19 more rows
# }
