pandas dataframe: get the first row of each group
I have a pandas DataFrame like the following:
import pandas as pd
df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value': ["first","second","second","first","second","first","third","fourth",
                             "fifth","second","fifth","first","first","second","third","fourth","fifth"]})
I want to group it by ["id","value"] and get the first row of each group.
    id   value
0    1   first
1    1  second
2    1  second
3    2   first
4    2  second
5    3   first
6    3   third
7    3  fourth
8    3   fifth
9    4  second
10   4   fifth
11   5   first
12   6   first
13   6  second
14   6   third
15   7  fourth
16   7   fifth
Expected outcome:
id   value
1    first
2    first
3    first
4   second
5    first
6    first
7   fourth
I've tried the following, but it only gives me the first row of the DataFrame. Any help with this is appreciated.
In [25]: for index, row in df.iterrows():
   ....:     df2 = pd.DataFrame(df.groupby(['id','value']).reset_index().ix[0])
>>> df.groupby('id').first()
     value
id
1    first
2    first
3    first
4   second
5    first
6    first
7   fourth
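A related option, not part of the original answer, is `DataFrame.drop_duplicates`, which keeps the first row it sees for each `id`. A minimal sketch, assuming the question's DataFrame: here it yields the same `id`/`value` pairs as `first()`, but it keeps whole rows (and their original index labels), whereas `first()` picks the first non-null value column by column, so the two can differ when data is missing.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'value': ["first", "second", "second", "first", "second", "first", "third",
              "fourth", "fifth", "second", "fifth", "first", "first", "second",
              "third", "fourth", "fifth"],
})

# keep='first' (the default) keeps the first occurrence of each id,
# in the original row order, and leaves the original index labels intact.
firsts = df.drop_duplicates(subset='id', keep='first')
print(firsts)
```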
If you need id as a column:
>>> df.groupby('id').first().reset_index()
   id   value
0   1   first
1   2   first
2   3   first
3   4  second
4   5   first
5   6   first
6   7  fourth
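The same table can also be produced without the extra `reset_index()` call by passing `as_index=False` to `groupby`. A short sketch, assuming the question's DataFrame; the grouping key then stays an ordinary column and the result gets a default integer index.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'value': ["first", "second", "second", "first", "second", "first", "third",
              "fourth", "fifth", "second", "fifth", "first", "first", "second",
              "third", "fourth", "fifth"],
})

# as_index=False keeps 'id' as a column instead of moving it into the index
out = df.groupby('id', as_index=False).first()
print(out)
```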
To get the first n records of each group, you can use head():
>>> df.groupby('id').head(2).reset_index(drop=True)
    id   value
0    1   first
1    1  second
2    2   first
3    2  second
4    3   first
5    3   third
6    4  second
7    4   fifth
8    5   first
9    6   first
10   6  second
11   7  fourth
12   7   fifth
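To see what `head(n)` does per group, the same rows can be selected with `cumcount()`, which numbers the rows within each group 0, 1, 2, ... in their original order. This is an equivalent formulation for illustration, assuming the question's DataFrame, not how pandas implements `head`.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'value': ["first", "second", "second", "first", "second", "first", "third",
              "fourth", "fifth", "second", "fifth", "first", "first", "second",
              "third", "fourth", "fifth"],
})

n = 2
# Position of each row inside its id group: 0 for the first row, 1 for the second, ...
pos = df.groupby('id').cumcount()
# Keeping positions below n keeps the first n rows of every group, in original order
first_n = df[pos < n].reset_index(drop=True)
print(first_n)
```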
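This will give you the second row of each group (positions are zero-indexed; nth(0) returns the same values as first() as long as there is no missing data, because first() takes the first non-null value in each column):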
df.groupby('id').nth(1)
Docs: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group
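The difference between `nth(0)` and `first()` only shows up when values are missing. A minimal sketch with a small hypothetical frame (not from the thread) in which group 1 starts with a NaN: `first()` skips it, `nth(0)` returns it. The index of `nth`'s result differs between pandas versions (since pandas 2.0 `nth` behaves as a filter and keeps the original row labels), so the sketch compares values only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of group 1 has a missing 'value'
df = pd.DataFrame({'id': [1, 1, 2], 'value': [np.nan, 'b', 'c']})

g = df.groupby('id')['value']
firsts = g.first()  # first non-null value per group
nth0 = g.nth(0)     # literally the first row per group, NaN included

print(firsts.tolist())
print(nth0.tolist())
```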
Maybe this is what you want (the top 3 rows of each group, ranked by pop):
import pandas as pd
idx = pd.MultiIndex.from_product([['state1','state2'], ['county1','county2','county3','county4']])
df = pd.DataFrame({'pop': [12,15,65,42,78,67,55,31]}, index=idx)
                pop
state1 county1   12
       county2   15
       county3   65
       county4   42
state2 county1   78
       county2   67
       county3   55
       county4   31
(df.groupby(level=0, group_keys=False)
   .apply(lambda x: x.sort_values('pop', ascending=False))
   .groupby(level=0)
   .head(3))

Out[29]:
                pop
state1 county3   65
       county4   42
       county2   15
state2 county1   78
       county2   67
       county3   55
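A variant that avoids the per-group `apply` is to sort the whole frame once and then take `head(3)` per group. This is an alternative technique, not the answer's own: it selects the same six rows, but because `head` preserves the sorted order, rows from the two states come out interleaved by `pop` instead of grouped by state.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([['state1', 'state2'],
                                  ['county1', 'county2', 'county3', 'county4']])
df = pd.DataFrame({'pop': [12, 15, 65, 42, 78, 67, 55, 31]}, index=idx)

# One global sort by pop (descending), then keep the first 3 rows of each state;
# head() keeps rows in their current (sorted) order.
top3 = df.sort_values('pop', ascending=False).groupby(level=0).head(3)
print(top3)
```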