Python: How do I extract each value in a column of comma-separated strings into its own row?
I have a CSV that I am importing into a DataFrame. I am trying to split a column that contains a set of comma-separated values into rows:
df_supplier = pd.read_csv(wf['local_filename'])
print(list(df_supplier))
col = 'Commodities (Use Ctrl to select multiple)'
melt_col = 'Supplier (DTRM ID)'
df_supplier_commodities = df_supplier.loc[:, col]\
.apply(pd.Series)\
.reset_index()\
.melt(id_vars=melt_col)\
.dropna()\
.loc[:[melt_col, col]]\
.set_index(melt_col)
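This is the code I have come up with so far. Yes, I know the column headers are ridiculous, but I don't make the CSVs. The file contains the following headers: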
['Supplier (DTRM ID)', 'Status', 'Sent for Approval Date', 'Approval Date', 'Legal Company Name', 'Supplier ID', 'Company Description (Owner To Complete)', 'Parent Supplier ID', 'Parent Supplier Name', 'List of Affiliates', 'Category Manager', 'Country', 'DUNS code', 'Trade register name', 'Commodities (Use Ctrl to select multiple)', 'Default Commodity', 'City', 'State', 'Payment Terms', 'Deactivated', 'Tag', 'Created by', 'Creation Date']
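The headers I need are Supplier (DTRM ID) and Commodities (Use Ctrl to select multiple). A supplier can have multiple commodities assigned to a single supplier ID, so each commodity row should carry the appropriate supplier ID.
The code fails with the following error: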
Traceback (most recent call last):
File "/home/ec2-user/determine_etl/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2656, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Supplier (DTRM ID)'
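But print(list(df_supplier)) shows that the key is there. What am I doing wrong?
To make sure I am being clear, here is an example of how the data is laid out in the DataFrame: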
+--------------------+---------------------------------------------+
| Supplier (DTRM ID) | Commodities (Use Ctrl to select multiple)   |
+--------------------+---------------------------------------------+
| 12333              | Strawberry, Raspberry, Flamingo, Snozzberry |
+--------------------+---------------------------------------------+
Here is the output I am trying to get:
+--------------------+-------------------------------------------+
| Supplier (DTRM ID) | Commodities (Use Ctrl to select multiple) |
+--------------------+-------------------------------------------+
| 12333              | Strawberry                                |
| 12333              | Raspberry                                 |
| 12333              | Flamingo                                  |
| 12333              | Snozzberry                                |
+--------------------+-------------------------------------------+
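I thought my code would do this, but it tells me that Supplier (DTRM ID) is not a valid key. See the traceback.
It sounds like you have something like the following: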
df = pd.DataFrame({
'A': ['11, 5.1, 2.8','6, 4, 0','0, 2, 0']
})
A
0 11, 5.1, 2.8
1 6, 4, 0
2 0, 2, 0
That is, a single column A holding delimited values.
You can do the following to put each value into its own column:
df['A'].str.split(',', expand = True)
You will get the following:
0 1 2
0 11 5.1 2.8
1 6 4 0
2 0 2 0
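with columns 0, 1, 2. You can then use .rename to change the column names and .T to transpose them into rows. Without a sample DataFrame, it is hard to understand exactly what you are trying to do.
Edit:
This works for me: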
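As a minimal sketch of the .rename and .T steps mentioned above, using the same single-column DataFrame: the labels 'first', 'second' and 'third' are hypothetical names chosen only for illustration.

```python
import pandas as pd

# Same example data as in the answer: one column of delimited strings.
df = pd.DataFrame({'A': ['11, 5.1, 2.8', '6, 4, 0', '0, 2, 0']})

# Split each string into separate columns labelled 0, 1, 2.
wide = df['A'].str.split(',', expand=True)

# Give the positional columns descriptive names (hypothetical labels).
wide = wide.rename(columns={0: 'first', 1: 'second', 2: 'third'})

# Transpose so that the former columns become the rows.
as_rows = wide.T

print(wide)
print(as_rows)
```

Note that splitting on ',' alone keeps the space that follows each comma, so the values may need .str.strip() afterwards.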
pd.concat([df['Supplier (DTRM ID)'], df['Commodities (Use Ctrl to select multiple)'].str.split(',', expand = True)], axis = 1)\
.melt(id_vars=['Supplier (DTRM ID)'])\
.sort_values(by = 'Supplier (DTRM ID)')\
.rename(columns = {'value': 'Commodities (Use Ctrl to select multiple)'})\
.drop(columns = ['variable'])\
.dropna()
The trailing backslashes (line continuations) are only there for readability.
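For anyone who wants to try the approach without the original CSV, here is a self-contained sketch of the same concat/melt chain. The first row comes from the question's sample; the second supplier (12334 with two commodities) is hypothetical and only there to show why .dropna() is needed. The whitespace stripping and the stable sort are additions of this sketch, not part of the code above. Unlike the chain in the question, the ID column is kept next to the split values before melt is called, so melt can find it.

```python
import pandas as pd

id_col = 'Supplier (DTRM ID)'
val_col = 'Commodities (Use Ctrl to select multiple)'

# Row 1 is the question's sample; row 2 is a hypothetical extra supplier.
df = pd.DataFrame({
    id_col: [12333, 12334],
    val_col: ['Strawberry, Raspberry, Flamingo, Snozzberry', 'Steak, Lobster'],
})

# Keep the ID column next to the split values so melt can use it as id_vars.
wide = pd.concat([df[id_col], df[val_col].str.split(',', expand=True)], axis=1)

long = (
    wide.melt(id_vars=[id_col])
        .sort_values(by=id_col, kind='mergesort')   # stable sort keeps item order
        .rename(columns={'value': val_col})
        .drop(columns=['variable'])
        .dropna()                                    # drop padding from shorter rows
        .assign(**{val_col: lambda d: d[val_col].str.strip()})  # remove leading spaces
        .reset_index(drop=True)
)

print(long)
```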
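The best option is to use pandas.Series.str.split, and then pandas.DataFrame.explode on the resulting lists.
import pandas as pd
df = pd.DataFrame({'Supplier': [12333, 12334], 'Commodities': ['Strawberry, Raspberry, Flamingo, Snozzberry', 'Steak, Lobster, Salmon, Tuna']})
display(df)
   Supplier                                  Commodities
0     12333  Strawberry, Raspberry, Flamingo, Snozzberry
1     12334                 Steak, Lobster, Salmon, Tuna
Split the strings into a list
df['Commodities'] = df['Commodities'].str.split(', ')
Explode the lists
df = df.explode('Commodities').reset_index(drop=True)
display(df)
   Supplier Commodities
0     12333  Strawberry
1     12333   Raspberry
2     12333    Flamingo
3     12333  Snozzberry
4     12334       Steak
5     12334     Lobster
6     12334      Salmon
7     12334        Tuna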
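The same split-then-explode idea can be sketched against the column names from the question. This is only an illustration under a few assumptions: DataFrame.explode requires pandas 0.25 or newer, the second row with uneven spacing is hypothetical, and stripping whitespace after the explode is an addition here for data whose separators are not consistently ', '.

```python
import pandas as pd

id_col = 'Supplier (DTRM ID)'
val_col = 'Commodities (Use Ctrl to select multiple)'

# Row 1 is the question's sample; row 2 is hypothetical, with messy spacing.
df = pd.DataFrame({
    id_col: [12333, 12334],
    val_col: ['Strawberry, Raspberry, Flamingo, Snozzberry', 'Steak,Lobster ,  Salmon'],
})

out = (
    df[[id_col, val_col]]
      .assign(**{val_col: df[val_col].str.split(',')})   # strings -> lists
      .explode(val_col)                                   # one row per list item
      .reset_index(drop=True)
      .assign(**{val_col: lambda d: d[val_col].str.strip()})  # tidy whitespace
)

print(out)
```

If the supplier ID should be the index, as in the question's original chain, out.set_index(id_col) can be applied at the end.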
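Does this answer your question?
Thank you for providing an updated answer based on a newer version of pandas than the one available when this question was asked.
@Shenanigator Happy to help.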