Aggregate a data frame on a given column and display another column

itqueen 2021. 1. 10. 19:44

I have a data frame in R of the following form:

> head(data)
  Group Score Info
1     1     1    a
2     1     2    b
3     1     3    c
4     2     4    d
5     2     3    e
6     2     1    f

I would like to aggregate the Score column by Group using the max function:

> aggregate(data$Score, list(data$Group), max)

  Group.1         x
1       1         3
2       2         4

However, I would also like to display the Info value associated with the maximum of the Score column for each group. I have no idea how to do this. My desired output would be:

  Group.1         x        y
1       1         3        c
2       2         4        d

Any hints?

First, split the data using split (here z is the data frame from the question):

split(z,z$Group)

Then, for each chunk, select the row with the maximum Score:

lapply(split(z,z$Group),function(chunk) chunk[which.max(chunk$Score),])

Finally, reduce the list back to a data.frame by do.call-ing rbind:

do.call(rbind,lapply(split(z,z$Group),function(chunk) chunk[which.max(chunk$Score),]))

Result:

  Group Score Info
1     1     3    c
2     2     4    d

One line, no magic spells, fast, and the result has good names =)
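
The three steps above could be wrapped in a small reusable function. This is only a sketch: the helper name max_row_per_group and its arguments are made up for illustration and are not part of the original answer, and z is rebuilt here from the question's sample data.

```r
# Sample data from the question, rebuilt so the example is self-contained.
z <- data.frame(
  Group = c(1, 1, 1, 2, 2, 2),
  Score = c(1, 2, 3, 4, 3, 1),
  Info  = c("a", "b", "c", "d", "e", "f")
)

# Hypothetical helper (not from the original answer):
# split by group, keep the first max row of each chunk, bind the rows back.
max_row_per_group <- function(df, group_col, value_col) {
  chunks <- split(df, df[[group_col]])
  picked <- lapply(chunks, function(chunk) {
    chunk[which.max(chunk[[value_col]]), , drop = FALSE]
  })
  do.call(rbind, picked)
}

result <- max_row_per_group(z, "Group", "Score")
print(result)
```

Passing the column names as strings keeps the helper in base R; if two rows tie for the maximum, only the first is kept because of which.max.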

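A base R solution is to combine the output of aggregate() with a merge() step. I find the formula interface to aggregate() a little more useful than the standard interface, partly because the names on the output are nicer, so I'll use that.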

The aggregate() step is

maxs <- aggregate(Score ~ Group, data = dat, FUN = max)

and the merge() step is simply

merge(maxs, dat)

This gives us the desired output:

R> maxs <- aggregate(Score ~ Group, data = dat, FUN = max)
R> merge(maxs, dat)
  Group Score Info
1     1     3    c
2     2     4    d

You could, of course, put this into a one-liner (the intermediate step was more for exposition):

merge(aggregate(Score ~ Group, data = dat, FUN = max), dat)

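The main reason I used the formula interface is that it returns a data frame with the correct names for the merge step; these are the column names of the original data set dat. The output of aggregate() needs to have the correct names so that merge() knows which columns in the original and aggregated data frames match.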
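
To check which columns merge() would match on by default, one can compare the column names of the two data frames, since merge() joins on the names they share. A small sketch, with dat rebuilt from the question's sample data (the variable name shared is just for illustration):

```r
# Sample data from the question, rebuilt so the example is self-contained.
dat <- data.frame(
  Group = c(1, 1, 1, 2, 2, 2),
  Score = c(1, 2, 3, 4, 3, 1),
  Info  = c("a", "b", "c", "d", "e", "f")
)

# Formula interface: output columns keep the original names.
maxs <- aggregate(Score ~ Group, data = dat, FUN = max)

# merge() defaults to by = intersect(names(x), names(y)).
shared <- intersect(names(maxs), names(dat))
print(shared)
```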

The standard interface gives odd names, whichever way you call it:

R> aggregate(dat$Score, list(dat$Group), max)
  Group.1 x
1       1 3
2       2 4
R> with(dat, aggregate(Score, list(Group), max))
  Group.1 x
1       1 3
2       2 4

You can use merge() on those outputs, but you need to do more work telling R which columns match up.
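
As a rough illustration of that extra work, the by.x and by.y arguments of merge() can name the matching columns explicitly. This sketch rebuilds dat from the question's sample data; the names maxs_std and res are only for illustration.

```r
# Sample data from the question, rebuilt so the example is self-contained.
dat <- data.frame(
  Group = c(1, 1, 1, 2, 2, 2),
  Score = c(1, 2, 3, 4, 3, 1),
  Info  = c("a", "b", "c", "d", "e", "f")
)

# Standard interface: the output columns are named Group.1 and x.
maxs_std <- aggregate(dat$Score, list(dat$Group), max)

# Tell merge() which columns of each data frame correspond.
res <- merge(maxs_std, dat,
             by.x = c("Group.1", "x"),
             by.y = c("Group", "Score"))
print(res)
```

The key columns in the result keep the names from the first data frame (Group.1 and x), so they would still need renaming afterwards.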

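Here is a solution using the plyr package.

The following line of code essentially tells ddply to first group your data by Group, and then, within each group, to return the subset of rows where Score equals the maximum score in that group.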

library(plyr)
ddply(data, .(Group), function(x)x[x$Score==max(x$Score), ])

  Group Score Info
1     1     3    c
2     2     4    d

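And, as @SachaEpskamp points out, this can be further simplified to:

ddply(data, .(Group), function(x)x[which.max(x$Score), ])

(Note that which.max returns only the first maximum, so this version keeps one row per group; the == version above returns every row tied for the maximum.)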
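
A quick base R check of how which.max treats ties, compared with an explicit equality test. The vector scores is made-up data for illustration only.

```r
# Made-up scores with a tie for the maximum at positions 2 and 3.
scores <- c(2, 5, 5, 1)

# which.max() returns the index of the first maximum only.
first_max <- which.max(scores)

# An equality test returns every index tied for the maximum.
all_max <- which(scores == max(scores))

print(first_max)
print(all_max)
```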

The plyr package can be used for this. With the ddply() function you can split a data frame on one or more columns, apply a function, and return a data frame; with the summarize() function you can then use the columns of the split data frame as variables to build the new data frame:

dat <- read.table(textConnection('Group Score Info
1     1     1    a
2     1     2    b
3     1     3    c
4     2     4    d
5     2     3    e
6     2     1    f'))

library("plyr")

ddply(dat,.(Group),summarize,
    Max = max(Score),
    Info = Info[which.max(Score)])
  Group Max Info
1     1   3    c
2     2   4    d

A late answer, but here is an approach using data.table:

library(data.table)
DT <- data.table(dat)

DT[, .SD[which.max(Score),], by = Group]

Or, if it is possible to have more than one equally highest score:

DT[, .SD[which(Score == max(Score)),], by = Group]

Note that (from ?data.table):

.SD is a data.table containing the Subset of x's Data for each group, excluding the group column(s)

To add to Gavin's answer: prior to the merge, it is possible to get aggregate to use proper names when not using the formula interface:

aggregate(data[, "Score", drop = FALSE], list(Group = data$Group), max)

This is basically how I think of the problem:

my.df <- data.frame(group = rep(c(1,2), each = 3), 
        score = runif(6), info = letters[1:6])
my.agg <- with(my.df, aggregate(score, list(group), max))
my.df.split <- with(my.df, split(x = my.df, f = group))
my.agg$info <- unlist(lapply(my.df.split, FUN = function(x) {
            x[which(x$score == max(x$score)), "info"]
        }))

> my.agg
  Group.1         x info
1       1 0.9344336    a
2       2 0.7699763    e

I don't have a high enough reputation to comment on Gavin Simpson's answer, but I wanted to warn that there seems to be a difference in the default treatment of missing values between the standard syntax and the formula syntax for aggregate.

#Create some data with missing values 
a<-data.frame(day=rep(1,5),hour=c(1,2,3,3,4),val=c(1,NA,3,NA,5))
  day hour val
1   1    1   1
2   1    2  NA
3   1    3   3
4   1    3  NA
5   1    4   5

#Standard syntax
aggregate(a$val,by=list(day=a$day,hour=a$hour),mean,na.rm=T)
  day hour   x
1   1    1   1
2   1    2 NaN
3   1    3   3
4   1    4   5

#Formula syntax.  Note the index for hour 2 has been silently dropped.
aggregate(val ~ hour + day,data=a,mean,na.rm=T)
  hour day val
1    1   1   1
2    3   1   3
3    4   1   5
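
One possible way to keep such rows with the formula syntax is to override its default na.action (na.omit) with na.pass, so that rows with missing values reach FUN and na.rm can deal with them. This is a sketch on the same data as above, not part of the original answer; the name kept is only for illustration.

```r
# Same data as above, with missing values in val.
a <- data.frame(day = rep(1, 5), hour = c(1, 2, 3, 3, 4), val = c(1, NA, 3, NA, 5))

# na.action = na.pass stops the formula method from dropping NA rows
# before grouping; na.rm = TRUE is then passed on to mean().
kept <- aggregate(val ~ hour + day, data = a, FUN = mean,
                  na.rm = TRUE, na.action = na.pass)
print(kept)
```

With this call, hour 2 should appear again with NaN, matching what the standard syntax reports.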

ReferenceURL : https://stackoverflow.com/questions/6289538/aggregate-a-dataframe-on-a-given-column-and-display-another-column
