Python: How do I find the correlation between columns starting with Open_ and Close_?

Tags: python, pandas, correlation

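I am trying to find the correlation between the open and close prices of 150 cryptocurrencies.

Each cryptocurrency's data is stored in its own CSV file, which looks like this: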

|---------------------|------------------|------------------|
|         Date        |       Open       |       Close      |
|---------------------|------------------|------------------|
| 2019-02-01 00:00:00 |    0.00001115    |    0.00001119    |
|---------------------|------------------|------------------|
| 2019-02-01 00:05:00 |    0.00001116    |    0.00001119    |
|---------------------|------------------|------------------|
|         .           |        .         |         .        |
I would like to find the correlation between the Close and Open columns of every cryptocurrency.

As of right now, my code looks like this:

temporary_dataframe = pandas.DataFrame()
for csv_path, coin in zip(all_csv_paths, coin_name):
    data_file = pandas.read_csv(csv_path)
    temporary_dataframe[f"Open_{coin}"] = data_file["Open"]
    temporary_dataframe[f"Close_{coin}"] = data_file["Close"]
# Create all_open based on temporary_dataframe data.


corr_file = all_open.corr() 
print(corr_file.unstack().sort_values().drop_duplicates())
This is part of the output (the output has a shape of (43661,)):


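The problem is that I do not want to see the correlation:

  • between two columns that both start with Close_ (e.g. Close_USD_BTC and Close_ETH_BTC)
  • between two columns that both start with Open_ (e.g. Open_USD_BTC and Open_ETH_BTC)
  • between columns of the same coin (e.g. Open_USD_BTC and Close_USD_BTC)
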
In short, the perfect output would look like this:

Open_TNT_BTC     Close_QKC_BTC      0.996229
Open_ETH_BTC     Close_TNT_BTC      0.996312
Open_ADA_BTC     Close_ETC_BTC      0.996423
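(P.S. I am pretty sure this is not the most elegant way to do what I am doing. If anyone has suggestions on how to improve this script, I would be more than happy to hear them.)


Thank you very much for your help.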

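This is rather messy, but it at least shows one option.

I am generating some random data, with suffixes (coin names) that are simpler than in your case.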

import string
import numpy as np
import pandas as pd


#Generate random data
prefix = ['Open_','Close_']
suffix = string.ascii_uppercase #All uppercase letter to simulate coin-names

var1 = [None] * 100
var2 = [None] * 100

for i in range(len(var1)) :
    var1[i] = prefix[np.random.randint(0,len(prefix))] + suffix[np.random.randint(0,len(suffix))]
    var2[i] = prefix[np.random.randint(0,len(prefix))] + suffix[np.random.randint(0,len(suffix))]

df = pd.DataFrame(data = {'var1': var1, 'var2':var2 })

# One boolean column per drop scenario
df['DropScenario_1'] = df['var1'].str.startswith(prefix[0]) & df['var2'].str.startswith(prefix[0])  # Both are Open_
df['DropScenario_2'] = df['var1'].str.startswith(prefix[1]) & df['var2'].str.startswith(prefix[1])  # Both are Close_
df['DropScenario_3'] = df['var1'].str[-1] == df['var2'].str[-1]  # Both suffixes (coin names) are the same

# Combine all scenarios
df['DropScenario_Final'] = df['DropScenario_1'] | df['DropScenario_2'] | df['DropScenario_3']

# Keep only the rows that match none of the scenarios
df = df[~df['DropScenario_Final']]

# Drop the helper columns
df = df.drop(columns=['DropScenario_1', 'DropScenario_2', 'DropScenario_3', 'DropScenario_Final'])
Hope this helps.


Note: if you find the secret key to bitcoin trading and do not end up on r/wallstreetbets, I am charging 5% ;)
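
A different route from the answer above, offered only as a sketch: instead of flagging unwanted pairs, slice the correlation matrix down to its Open_ rows and Close_ columns, then drop the pairs that share a coin name. The data here is random and the coin names (`AAA_BTC` and so on) are made up for illustration; `frame` stands in for the asker's `temporary_dataframe`, and the sketch assumes every column is named `Open_<coin>` or `Close_<coin>`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for temporary_dataframe: random prices for made-up coins.
rng = np.random.default_rng(0)
coins = ["AAA_BTC", "BBB_BTC", "CCC_BTC"]
frame = pd.DataFrame(
    {f"{kind}_{coin}": rng.random(50) for coin in coins for kind in ("Open", "Close")}
)

corr = frame.corr()

# Keep only Open_ rows and Close_ columns, so Open/Open and Close/Close
# pairs never appear in the first place.
open_cols = [c for c in corr.index if c.startswith("Open_")]
close_cols = [c for c in corr.columns if c.startswith("Close_")]
cross = corr.loc[open_cols, close_cols]

# Flatten to a Series indexed by (Open_x, Close_y).
pairs = cross.T.unstack()

# Strip the prefixes to get the coin names, then drop same-coin pairs.
open_coin = pairs.index.get_level_values(0).str[len("Open_"):]
close_coin = pairs.index.get_level_values(1).str[len("Close_"):]
result = pairs[open_coin != close_coin].sort_values()

print(result)
```

Because each (Open, Close) pair appears only once in the sliced block, there is no need for `drop_duplicates()`, which compares values and could in principle discard two different pairs that happen to have the same correlation.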

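Thanks for the quick reply. That is exactly what I want, and I also do not want to compare a coin with itself (e.g. Open_USD_BTC and Close_USD_BTC).

Sorry, I wrote my comment too quickly; you had already stated your problem very clearly. (I deleted my comment.)

@VegardKT I tried to explain it with bullet points. I will try to edit it now to make it clearer.

This is what I would do: create a boolean array for each of your conditions, holding true/false depending on whether the current row matches that condition. Combine these arrays, then simply drop every row where the final array is true.

Have you tried working with these rows? It might help you get through your dataset quickly.

Haha, doing my best, and hoping to get there one day.

I do not quite understand where I should put your code, before or after my own. It would be great if you could incorporate my code, or at least a few of my variables, into the answer! Thanks a lot.

This code will not work inside your code. It is an example that shows you how to solve the problem.

I am having trouble understanding it, because I do not know how your variables work.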
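
As a rough illustration of the "one boolean array per condition" idea from the comments, written with the asker's variable names: the sketch below builds a small `temporary_dataframe` from random numbers (the real one would come from the CSV files), unstacks its correlation matrix, and masks out the unwanted pairs. The coin names and the random data are assumptions made for the example, and it assumes each column is named `Open_<coin>` or `Close_<coin>`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data read from the CSV files.
rng = np.random.default_rng(1)
coin_name = ["USD_BTC", "ETH_BTC", "TNT_BTC"]
temporary_dataframe = pd.DataFrame()
for coin in coin_name:
    temporary_dataframe[f"Open_{coin}"] = rng.random(50)
    temporary_dataframe[f"Close_{coin}"] = rng.random(50)

# Series indexed by (column_a, column_b), as in the question.
pairs = temporary_dataframe.corr().unstack()
first = pairs.index.get_level_values(0)
second = pairs.index.get_level_values(1)

# One boolean array per condition the asker wants to exclude.
both_open = first.str.startswith("Open_") & second.str.startswith("Open_")
both_close = first.str.startswith("Close_") & second.str.startswith("Close_")
# Everything after the first underscore is treated as the coin name.
same_coin = first.str.split("_", n=1).str[1] == second.str.split("_", n=1).str[1]

# Combine the conditions and keep only rows where none of them is true.
drop = both_open | both_close | same_coin
kept = pairs[~drop]

# Each pair still appears twice, as (Open, Close) and (Close, Open);
# keep only the Open-first direction.
kept = kept[kept.index.get_level_values(0).str.startswith("Open_")].sort_values()

print(kept)
```

With three coins this leaves six rows, one for each Open/Close pair of two different coins.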