
Python: how do I create a new column in a DataFrame from an existing column using a condition?


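I have a column that holds all the data and looks like this (the values that need to be separated carry a marker like (c)):

UK (c)
London
Wales
Liverpool
US (c)
Chicago
New York
San Francisco
Seattle
Australia (c)
Sydney
Perth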

I want to split it into two columns, like this:

London          UK
Wales           UK
Liverpool       UK
Chicago         US
New York        US
San Francisco   US
Seattle         US
Sydney          Australia
Perth           Australia

Question 2: what if the countries don't have a marker like (c)?

Start with str.extract and ffill, then drop the redundant rows:

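import pandas as pd
# country name from the "Country (c)" header rows, forward-filled to the cities below
df['country'] = (
    df['data'].str.extract(r'(.*)\s+\(c\)', expand=False).ffill())
# keep only the city rows (those without the literal "(c)" marker)
df[~df['data'].str.contains('(c)', regex=False)].reset_index(drop=True)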

            data    country
0         London         UK
1          Wales         UK
2      Liverpool         UK
3        Chicago         US
4       New York         US
5  San Francisco         US
6        Seattle         US
7         Sydney  Australia
8          Perth  Australia
Where,

df['data'].str.extract(r'(.*)\s+\(c\)', expand=False).ffill()

0            UK
1            UK
2            UK
3            UK
4            US
5            US
6            US
7            US
8            US
9     Australia
10    Australia
11    Australia
Name: country, dtype: object
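The pattern (.*)\s+\(c\) matches strings of the form "country (c)" and extracts the country name. Anything that does not match the pattern becomes NaN, so you can conveniently forward-fill the rows.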
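A self-contained sketch of this approach, with the sample input rebuilt from the question (the column name data follows the answer above). It prints the raw extraction before ffill so you can see which rows come back as NaN; treat it as an illustration rather than a tested recipe for every pandas version.

```python
import pandas as pd

# Sample input rebuilt from the question (column name 'data' as in the answer above)
df = pd.DataFrame({'data': ['UK (c)', 'London', 'Wales', 'Liverpool',
                            'US (c)', 'Chicago', 'New York', 'San Francisco', 'Seattle',
                            'Australia (c)', 'Sydney', 'Perth']})

# Only "Country (c)" rows match; every other row comes back as NaN
extracted = df['data'].str.extract(r'(.*)\s+\(c\)', expand=False)
print(extracted.tolist())

# Forward-fill: each city inherits the nearest country header above it
df['country'] = extracted.ffill()

# Drop the header rows, which still contain the literal "(c)"
result = df[~df['data'].str.contains('(c)', regex=False)].reset_index(drop=True)
print(result)
```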


Using split, np.where and ffill: this splits on "(c)".
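The code for this answer did not survive, so here is a minimal sketch of how split, np.where and ffill could fit together. The column name places and the exact sequence of steps are assumptions, not the author's original code; str.split with regex=False needs pandas 1.4 or later.

```python
import numpy as np
import pandas as pd

# Assumed input: one column 'places' mixing "Country (c)" headers and city names
df = pd.DataFrame({'places': ['UK (c)', 'London', 'Wales', 'Liverpool',
                              'US (c)', 'Chicago', 'New York', 'San Francisco', 'Seattle',
                              'Australia (c)', 'Sydney', 'Perth']})

# Header rows are the ones containing the literal marker "(c)"
mask = df['places'].str.contains('(c)', regex=False).to_numpy()

# Split on the literal "(c)"; the piece before the marker is the country name
before_marker = df['places'].str.split('(c)', n=1, regex=False).str[0].str.strip()

# Keep the name only on header rows (NaN elsewhere), then forward-fill downwards
df['country'] = pd.Series(np.where(mask, before_marker, np.nan), index=df.index).ffill()

# Finally drop the header rows
df = df[~mask].reset_index(drop=True)
print(df)
```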

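You can first locate the cities that end with "(c)", extract the country name, and use it to fill a new country column.

The same extract match can then be used to locate the rows to drop, i.e. the header rows: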
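The original code for this answer is missing. Below is a sketch of the idea under stated assumptions: the column is called city, and a single anchored str.extract both yields the country names and marks which rows to drop, so no second string test is needed. The regex is an assumption chosen to match only values that end in "(c)".

```python
import pandas as pd

df = pd.DataFrame({'city': ['UK (c)', 'London', 'Wales', 'Liverpool',
                            'US (c)', 'Chicago', 'New York', 'San Francisco', 'Seattle',
                            'Australia (c)', 'Sydney', 'Perth']})

# One match, anchored so only values ENDING in "(c)" count; the group is the country
match = df['city'].str.extract(r'^(.*?)\s*\(c\)$', expand=False)

# Matched names fill the new column and are forward-filled to the cities below
df['country'] = match.ffill()

# Reuse the same match: rows where it succeeded are the headers to drop
df = df[match.isna()].reset_index(drop=True)
print(df)
```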


You can do the following:

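import pandas as pd
data = ['UK (c)','London','Wales','Liverpool','US (c)','Chicago','New York','San Francisco','Seattle','Australia (c)','Sydney','Perth']
df = pd.DataFrame(data, columns=['city'])
# header rows like "UK (c)" give the country (marker and trailing space removed); other rows get None
df['country'] = df.city.apply(lambda x: x.replace('(c)', '').strip() if '(c)' in x else None)
# forward-fill each country down to the cities below it
df['country'] = df['country'].ffill()
# drop the header rows
df = df[~df['city'].str.contains(r'\(c\)')]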
Output

+-----+----------------+-----------+
|     |     city       |  country  |
+-----+----------------+-----------+
|  1  | London         | UK        |
|  2  | Wales          | UK        |
|  3  | Liverpool      | UK        |
|  5  | Chicago        | US        |
|  6  | New York       | US        |
|  7  | San Francisco  | US        |
|  8  | Seattle        | US        |
| 10  | Sydney         | Australia |
| 11  | Perth          | Australia |
+-----+----------------+-----------+
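You can also combine the two:

str.contains looks for "(c)" and returns True for each index where it is present. Where this condition is True, the country value is taken from that row.

Using endswith, ffill + str.strip: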

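# header rows such as "UK (c)" supply the country; all other rows become NaN
df['country'] = df.loc[df.city.str.endswith('(c)'), 'city']
df['country'] = df['country'].ffill()
# before stripping, a header row's city equals its country, so this drops the headers
df = df[df.city.ne(df.country)].copy()
# remove the literal "(c)" marker and the surrounding whitespace
df['country'] = df['country'].str.replace('(c)', '', regex=False).str.strip()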
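A note on str.strip('(c)'): its argument is treated as a set of characters to remove from both ends, not as a substring. It therefore stops at the space and leaves "UK " behind, and it would also eat a leading "c". The lowercase value 'cuba (c)' below is a hypothetical input chosen only to show that second effect.

```python
import pandas as pd

s = pd.Series(['UK (c)', 'Australia (c)', 'cuba (c)'])

# strip('(c)') removes any of '(', 'c', ')' from both ends and stops at the space
stripped = s.str.strip('(c)')

# Removing the literal substring, then surrounding whitespace, avoids both problems
replaced = s.str.replace('(c)', '', regex=False).str.strip()

print(stripped.tolist())
print(replaced.tolist())
```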

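extract(r'(.*)\s+\(c\)') saves you the .str.strip() step.

What if the countries don't have a marker like (c)?

@Tsatsa In that case you would probably need to build a list of countries and use isin.

This is a somewhat fun string-manipulation problem and, by the usual standards of this tag, a relatively decent question, with sample data and a clearly specified expected output. I'm not complaining ¯\_(ツ)_/¯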
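For Question 2, here is a minimal sketch of the isin suggestion. It assumes you maintain the list of country names yourself (known_countries is a hypothetical name), and that the input is the same data without any "(c)" marker.

```python
import pandas as pd

# Hypothetical input without any "(c)" marker
df = pd.DataFrame({'city': ['UK', 'London', 'Wales', 'Liverpool', 'US', 'Chicago',
                            'New York', 'San Francisco', 'Seattle', 'Australia',
                            'Sydney', 'Perth']})

# Assumption: you maintain the list of names that act as country headers
known_countries = ['UK', 'US', 'Australia']

is_country = df['city'].isin(known_countries)

# Keep the value only on country rows (NaN elsewhere), then forward-fill to the cities
df['country'] = df['city'].where(is_country).ffill()

# Drop the country header rows
df = df[~is_country].reset_index(drop=True)
print(df)
```

The result is only as good as the list: a country missing from known_countries would be treated as a city and assigned to the previous country.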
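import numpy as np
# 'places' holds both the "Country (c)" header rows and the city rows
mask = df['places'].str.contains('(c)', regex=False)
df['country'] = np.where(mask, df['places'], np.nan)
# drop the literal "(c)" marker and trailing space, then forward-fill
df['country'] = df['country'].str.replace('(c)', '', regex=False).str.strip().ffill()
df = df[~mask]
df
            places     country
1          London         UK
2           Wales         UK
3       Liverpool         UK
5         Chicago         US
6        New York         US
7   San Francisco         US
8         Seattle         US
10         Sydney  Australia
11          Perth  Australia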