How to implement "in" and "not in" for a pandas DataFrame

How can I achieve the equivalents of SQL's IN and NOT IN?

I have a list of the values I want to match. Here is the scenario:

    import pandas as pd

    df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
    countries = ['UK', 'China']

    # pseudo-code (this does not actually work in pandas):
    # df[df['countries'] not in countries]

My current approach is as follows:

    import pandas as pd

    df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
    countries = pd.DataFrame({'countries': ['UK', 'China'], 'matched': True})

    # IN
    df.merge(countries, how='inner', on='countries')

    # NOT IN
    not_in = df.merge(countries, how='left', on='countries')
    not_in = not_in[pd.isnull(not_in['matched'])]

However, this seems like a horrible kludge. Can anyone improve on it?
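If you prefer to keep a merge-based approach, DataFrame.merge accepts indicator=True, which adds a _merge column recording whether each row was found in the left frame only or in both. That removes the need for the artificial 'matched' flag. The sketch below assumes the lookup keys are unique (duplicate keys would duplicate rows in the result); the names lookup, in_rows and not_in_rows are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
# Hypothetical lookup table: only the key column, no 'matched' flag needed.
lookup = pd.DataFrame({'countries': ['UK', 'China']})

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
merged = df.merge(lookup, how='left', on='countries', indicator=True)

# IN: rows found in both frames.
in_rows = merged.loc[merged['_merge'] == 'both', ['countries']]
# NOT IN: rows found only in the left frame (an anti-join).
not_in_rows = merged.loc[merged['_merge'] == 'left_only', ['countries']]

print(in_rows)
print(not_in_rows)
```

For a plain membership test on one column, the isin approach in the answers is simpler; the indicator variant is mainly useful when you already need a join.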

You can use pd.Series.isin: something.isin(somewhere) for IN, and ~something.isin(somewhere) for NOT IN:

    >>> df
      countries
    0        US
    1        UK
    2   Germany
    3     China
    >>> countries
    ['UK', 'China']
    >>> df.countries.isin(countries)
    0    False
    1     True
    2    False
    3     True
    Name: countries, dtype: bool
    >>> df[df.countries.isin(countries)]
      countries
    1        UK
    3     China
    >>> df[~df.countries.isin(countries)]
      countries
    0        US
    2   Germany
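The same idea extends to several columns at once: DataFrame.isin accepts a dict mapping column names to allowed values. A minimal sketch, assuming a hypothetical extra 'continent' column; note that columns absent from the dict come back all False, so the keyed columns are selected first and then combined with .all(axis=1).

```python
import pandas as pd

df = pd.DataFrame({
    'countries': ['US', 'UK', 'Germany', 'China'],
    'continent': ['America', 'Europe', 'Europe', 'Asia'],  # hypothetical column
})

# Hypothetical allowed values per column.
wanted = {'countries': ['UK', 'China', 'US'], 'continent': ['Europe', 'Asia']}

# Restrict to the keyed columns, then require a match in every one of them.
mask = df[list(wanted)].isin(wanted).all(axis=1)

print(df[mask])    # IN on every listed column
print(df[~mask])   # NOT IN: fails at least one column
```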

I have been doing generic row-wise filtering like this:

    criterion = lambda row: row['countries'] not in countries
    not_in = df[df.apply(criterion, axis=1)]
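The apply version is flexible because the lambda can contain any per-row logic, but it calls a Python function once per row, whereas isin does the membership test over the whole column in one call. For a simple IN / NOT IN check the two give the same rows, as this small comparison shows (via_apply and via_isin are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# Row-wise: the lambda runs once per row and returns a bool.
criterion = lambda row: row['countries'] not in countries
via_apply = df[df.apply(criterion, axis=1)]

# Vectorised: a single isin call, negated with ~ for NOT IN.
via_isin = df[~df['countries'].isin(countries)]

# Both keep US and Germany with their original index labels.
print(via_apply.equals(via_isin))
```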

An alternative solution using the .query() method:

    In [5]: df.query("countries in @countries")
    Out[5]:
      countries
    1        UK
    3     China

    In [6]: df.query("countries not in @countries")
    Out[6]:
      countries
    0        US
    2   Germany
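Inside a query string, @name refers to a local Python variable, and a list literal can be written inline as well. A small check that, for this data, query selects the same rows as boolean indexing with isin (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# '@countries' pulls in the local list defined above.
in_rows = df.query("countries in @countries")
not_in_rows = df.query("countries not in @countries")

# A list literal written directly in the query string.
literal = df.query("countries in ['UK', 'China']")

print(in_rows.equals(df[df['countries'].isin(countries)]))
print(not_in_rows.equals(df[~df['countries'].isin(countries)]))
print(literal.equals(in_rows))
```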

I wanted to filter out the rows of dfbc whose BUSINESS_ID also appears in the BUSINESS_ID column of dfProfilesBusIds.

Finally got it working:

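    # Keep only the dfbc rows whose BUSINESS_ID does not appear in dfProfilesBusIds
    # (~ negates the boolean mask; equivalent to comparing it with == False)
    dfbc = dfbc[~dfbc['BUSINESS_ID'].isin(dfProfilesBusIds['BUSINESS_ID'])]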
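Since isin accepts any list-like, including a column of another DataFrame, the same pattern gives both the IN and the NOT IN split across two frames. A sketch with made-up sample data standing in for dfbc and dfProfilesBusIds (the 'amount' column and all values are hypothetical):

```python
import pandas as pd

# Hypothetical sample data; the real frames in the question are not shown.
dfbc = pd.DataFrame({'BUSINESS_ID': [101, 102, 103, 104],
                     'amount': [10, 20, 30, 40]})
dfProfilesBusIds = pd.DataFrame({'BUSINESS_ID': [102, 104, 999]})

# Boolean mask: True where the ID also exists in the profiles frame.
has_profile = dfbc['BUSINESS_ID'].isin(dfProfilesBusIds['BUSINESS_ID'])

with_profile = dfbc[has_profile]       # IN
without_profile = dfbc[~has_profile]   # NOT IN

print(without_profile)
```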
