Pandas: How to get matplotlib to plot all outliers in a different color, not just one


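I have a basic scatter plot and want to show all of the outliers in a different color. I define an outlier as a value more than two standard deviations from the mean. The code I wrote shows only one outlier, whereas I want every outlier drawn in a different color: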

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = pd.read_csv('1fXr31hcEemkYxLyQ1aU1g_50fc36ee697c4b158fe26ade3ec3bc24_Banknote-authentication-dataset- (1).csv')
data = np.array(data)
mean = np.mean(data, 0)
min = np.min(data,0)
max = np.max(data,0)
normed = (data - min) / (max - min)
mean = np.mean(normed, 0)
std_dev = np.std (normed, 0)
fig, graph = plt.subplots()
graph.scatter(normed [:,0], normed [:,1])
graph.scatter(mean[0], mean[1])
outliers = normed[normed>2*std_dev]
graph.scatter(outliers [0], outliers [1], c='red')
plt.show
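
A likely reason the code above draws only one red point: indexing a 2-D array with a 2-D boolean mask returns a flat 1-D array of the matching elements, so outliers[0] and outliers[1] are two single numbers rather than an x column and a y column. (Separately, plt.show without parentheses never calls the function.) The small sketch below illustrates the difference; the array and the threshold values are made up for illustration and are not taken from the dataset.

```python
import numpy as np

# Hypothetical 3x2 array standing in for the normalized data
normed = np.array([[0.1, 0.9],
                   [0.2, 0.3],
                   [0.7, 0.95]])
threshold = np.array([0.5, 0.5])  # illustrative per-column threshold

# A 2-D boolean mask picks individual elements and flattens the result,
# so the pairing between x and y values is lost
flat = normed[normed > threshold]

# A 1-D mask with one entry per row keeps whole (x, y) rows together
row_mask = (normed > threshold).any(axis=1)
rows = normed[row_mask]

print(flat.shape)  # 1-D: individual elements
print(rows.shape)  # 2-D: complete rows, usable as rows[:, 0], rows[:, 1]
```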

A simple approach is to create a new column in the DataFrame that identifies the outliers, then pass it to the c parameter of plt.scatter():

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame({'x' : np.random.normal(0, size = 100),
                   'y' : np.random.normal(0, size = 100)})

# Means of x and y
x_mean = df['x'].mean()
y_mean = df['y'].mean()

# Lower and upper bounds: mean -/+ 2 standard deviations
x_lo, x_hi = x_mean - 2 * df['x'].std(), x_mean + 2 * df['x'].std()
y_lo, y_hi = y_mean - 2 * df['y'].std(), y_mean + 2 * df['y'].std()

# New column: True if x or y falls outside mean +/- 2 times the standard deviation
df['outlier'] = ~(df['x'].between(x_lo, x_hi) & df['y'].between(y_lo, y_hi))

# Here we use the indicator to signify the color that point should be assigned
# (with the reversed colormap 'RdYlGn_r', False is drawn green and True red)
plt.scatter(df['x'],
            df['y'],
            s = 15,
            c = df['outlier'],
            cmap = 'RdYlGn_r')
plt.xlabel('X')
plt.ylabel('Y')

# I just added a couple of reference lines so you can see that the flagged points are indeed below or above the mean +/- 2 times the standard deviation
plt.axvline(x_mean, linestyle = '--', color = 'k')
plt.axhline(y_mean, linestyle = '--', color = 'k')
plt.axvline(x_lo, linestyle = ':', color = 'k')
plt.axvline(x_hi, linestyle = ':', color = 'k')
plt.axhline(y_lo, linestyle = ':', color = 'k')
plt.axhline(y_hi, linestyle = ':', color = 'k')
plt.show()
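
If you would rather stay with the NumPy array from the question than build a DataFrame, the same idea can be sketched with a row-wise boolean mask. This is only a minimal sketch: outlier_rows and plot_outliers are hypothetical helper names, and it assumes an outlier is any row where at least one column lies more than two standard deviations from that column's mean.

```python
import numpy as np

def outlier_rows(values, n_std=2.0):
    """Return one boolean per row: True where any column lies more than
    n_std standard deviations away from that column's mean."""
    values = np.asarray(values, dtype=float)
    deviation = np.abs(values - values.mean(axis=0))
    return (deviation > n_std * values.std(axis=0)).any(axis=1)

def plot_outliers(values, mask):
    """Scatter the first two columns, drawing the masked rows in red."""
    # Imported here so outlier_rows can be used without matplotlib installed
    import matplotlib.pyplot as plt
    values = np.asarray(values, dtype=float)
    fig, ax = plt.subplots()
    ax.scatter(values[~mask, 0], values[~mask, 1], s=15, label='inliers')
    ax.scatter(values[mask, 0], values[mask, 1], s=15, c='red', label='outliers')
    ax.legend()
    return fig, ax
```

With the normed array from the question you would call mask = outlier_rows(normed), then plot_outliers(normed, mask) followed by plt.show(). Drawing the two groups with separate scatter calls also gives each group its own legend entry.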