Python Pandas dataframe: frequency plot with hue based on different columns that share the same string entries


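I am analysing this Kaggle dataset:

I have created a dataframe containing all completed journeys where the start station ('StartStn') and end station ('EndStn') are not the same, each with its associated information.

I have created a frequency plot of the start stations and a separate frequency plot of the end stations (see figures below):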

Figure 1 code:
complete['StartStn'].value_counts()[:20].plot(kind='bar')

Figure 2 code:
complete['EndStn'].value_counts()[:20].plot(kind='bar')

Here is a sample of the dataframe, taking only these two columns:

IN:

complete[['StartStn','EndStn']].sample(10)

OUT:

        StartStn             EndStn
102417  Leytonstone          East Ham
995246  Walthamstow Central  Piccadilly Circus
1102327 Earls Court          Holborn
604323  Stratford            Shepherd's Bush Und
481718  Warren Street        Walthamstow Central
2344106 Marble Arch          Northolt
1234444 Colliers Wood        Holborn
1408620 Earls Court          Marble Arch
465436  Tottenham Court Rd   Mile End
1580309 Woodside Park        Hammersmith D
As you can see, many stations, such as 'Walthamstow Central', appear in both columns.

Question:

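Using seaborn, matplotlib or pandas, how can I create a frequency plot for all stations with a StartStn vs EndStn hue (i.e. on the same axes)?

All I have managed so far is a frequency plot of all stations that combines the frequencies from 'StartStn' and 'EndStn':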

stations = pd.concat([complete['StartStn'],complete['EndStn']],axis=0)
stations.value_counts()[:10].plot(kind='bar')
This gives me the following output:
Most popular stations (start or end)

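Any suggestions would be much appreciated.

Many thanks,

Prince Beni

You can use seaborn's countplot with a hue that distinguishes StartStn from EndStn, so that each station gets two bars.
Please find suitable code below. I tried it on your 10-row sample.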

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from collections import OrderedDict

# Rebuild the 10-row sample from the question (station names as shown there)
startstn = ['Leytonstone','Walthamstow Central','Earls Court','Stratford','Warren Street','Marble Arch','Colliers Wood',
            'Earls Court','Tottenham Court Rd','Woodside Park']
endstn = ['East Ham','Piccadilly Circus','Holborn',"Shepherd's Bush Und",'Walthamstow Central','Northolt',
          'Holborn','Marble Arch','Mile End','Hammersmith D']
df = pd.DataFrame(data={'StartStn':startstn,'EndStn':endstn})
print(df)

# Build two long-format frames: one row per station visit, with 'hue' marking its role
df['hue'] = 'Start'
df['Stations'] = df['StartStn']
df_start = df[['Stations','hue']]
df['hue'] = 'End'
df['Stations'] = df['EndStn']
df_end = df[['Stations','hue']]

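# Bar order: start stations by descending frequency, then stations seen only as an end station
orderstart = df['StartStn'].value_counts()
startstnlist = orderstart.index.tolist()
orderend = df['EndStn'].value_counts()
endstnlist = orderend.index.tolist()
order = startstnlist+endstnlist
# Drop duplicates while keeping the first occurrence of each station
order = list(OrderedDict.fromkeys(order))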

# Stack both roles and draw one bar per role for each station
df_concatenated = pd.concat([df_start,df_end],ignore_index=True)
sns.countplot(data=df_concatenated,x='Stations', order=order,hue='hue')
plt.show()
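Edit: I have added a piece of code so that the chart is ordered. The order is given by start-station frequency, followed by any stations that appear only as end stations.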
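
As a pandas-only alternative to the countplot approach, the two frequency tables can be placed side by side and drawn as a grouped bar chart. The following is a minimal sketch, not part of the original answer: the function name start_end_counts and its top_n parameter are hypothetical names introduced for illustration, and ranking stations by combined start plus end frequency is an assumption about what "most popular" should mean here.

```python
import pandas as pd


def start_end_counts(df, start_col="StartStn", end_col="EndStn", top_n=20):
    """Return per-station counts with one column per role (hypothetical helper)."""
    # Building a DataFrame from two Series aligns them on the union of station names
    counts = pd.DataFrame({
        "Start": df[start_col].value_counts(),
        "End": df[end_col].value_counts(),
    })
    # A station missing from one column has no count there, so treat it as zero
    counts = counts.fillna(0).astype(int)
    # Assumption: rank by combined frequency and keep the top_n busiest stations
    ranked = counts.sum(axis=1).sort_values(ascending=False).index[:top_n]
    return counts.loc[ranked]


# Small sample taken from the rows shown in the question
sample = pd.DataFrame({
    "StartStn": ["Leytonstone", "Walthamstow Central", "Earls Court", "Warren Street"],
    "EndStn": ["East Ham", "Piccadilly Circus", "Holborn", "Walthamstow Central"],
})
table = start_end_counts(sample, top_n=5)
print(table)
```

On the full dataset, something like start_end_counts(complete).plot(kind='bar') should then draw two bars per station on the same axes (this requires matplotlib to be installed). Because the counting happens before plotting, only top_n rows reach the plotting call, which may help when the dataframe has millions of rows.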