Python: How do I find the correlation between columns starting with Open_ and Close_?
I am trying to find the correlation between the opening and closing prices of 150 cryptocurrencies. Each cryptocurrency's data is stored in its own CSV file, which looks like this:
|---------------------|------------------|------------------|
| Date | Open | Close |
|---------------------|------------------|------------------|
| 2019-02-01 00:00:00 | 0.00001115 | 0.00001119 |
|---------------------|------------------|------------------|
| 2019-02-01 00:05:00 | 0.00001116 | 0.00001119 |
|---------------------|------------------|------------------|
| . | . | . |
I want to find the correlation between the Close and Open columns of every cryptocurrency.

Right now, my code looks like this:

    import pandas

    temporary_dataframe = pandas.DataFrame()
    for csv_path, coin in zip(all_csv_paths, coin_name):
        data_file = pandas.read_csv(csv_path)
        temporary_dataframe[f"Open_{coin}"] = data_file["Open"]
        temporary_dataframe[f"Close_{coin}"] = data_file["Close"]

    # Create all_open based on temporary_dataframe data.

    corr_file = all_open.corr()
    print(corr_file.unstack().sort_values().drop_duplicates())

This is part of the output (the full output has shape (43661,)):

    Open_TNT_BTC  Close_QKC_BTC    0.996229
    Open_ETH_BTC  Close_TNT_BTC    0.996312
    Open_ADA_BTC  Close_ETC_BTC    0.996423
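The problem is that I do not want to see the following correlations:
- between two columns that both start with Close_ (e.g. Close_USD_BTC and Close_ETH_BTC)
- between two columns that both start with Open_ (e.g. Open_USD_BTC and Open_ETH_BTC)
- between the two columns of the same coin (e.g. Open_USD_BTC and Close_USD_BTC)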
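(P.S. I am fairly sure the way I am doing this is not the most elegant one. If anyone has suggestions on how to improve this script, I would be more than happy to hear them.)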
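Thank you very much for your help!

This is rather messy, but it at least shows one option. I am generating some random data and making the suffixes (the coin names) simpler than in your case.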
    import string

    import numpy as np
    import pandas as pd

    # Generate random data
    prefix = ['Open_', 'Close_']
    suffix = string.ascii_uppercase  # All uppercase letters, to simulate coin names
    var1 = [None] * 100
    var2 = [None] * 100
    for i in range(len(var1)):
        var1[i] = prefix[np.random.randint(0, len(prefix))] + suffix[np.random.randint(0, len(suffix))]
        var2[i] = prefix[np.random.randint(0, len(prefix))] + suffix[np.random.randint(0, len(suffix))]
    df = pd.DataFrame(data={'var1': var1, 'var2': var2})

    # Flag each scenario we want to drop (one boolean column per scenario)
    df['DropScenario_1'] = df.apply(lambda row: (prefix[0] in row.var1) and (prefix[0] in row.var2), axis=1)  # Both are Open_
    df['DropScenario_2'] = df.apply(lambda row: (prefix[1] in row.var1) and (prefix[1] in row.var2), axis=1)  # Both are Close_
    df['DropScenario_3'] = df.apply(lambda row: row.var1[-1] == row.var2[-1], axis=1)  # Both suffixes are the same

    # Combine all scenarios
    df['DropScenario_Final'] = df['DropScenario_1'] | df['DropScenario_2'] | df['DropScenario_3']
    # Keep only the part of the df that we want
    df = df[~df['DropScenario_Final']]
    # Drop our helper columns
    df = df.drop(['DropScenario_1', 'DropScenario_2', 'DropScenario_3', 'DropScenario_Final'], axis=1)
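With column names like those in the question, the same filtering can also be done directly on the correlation matrix. The following is only a minimal sketch, not the code from this answer: the coin names and random prices are made up, and `all_open` merely stands in for the combined frame from the question. It keeps the Open rows and Close columns of the matrix, then blanks out the same-coin cells before flattening.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the combined frame from the question:
# random "prices" for three made-up coin names.
rng = np.random.default_rng(0)
coins = ["ETH_BTC", "ADA_BTC", "TNT_BTC"]
all_open = pd.DataFrame(
    {f"{kind}_{coin}": rng.random(50) for coin in coins for kind in ("Open", "Close")}
)

open_cols = [c for c in all_open.columns if c.startswith("Open_")]
close_cols = [c for c in all_open.columns if c.startswith("Close_")]

# Keep only the Open-vs-Close block of the full correlation matrix.
cross = all_open.corr().loc[open_cols, close_cols]

# True where the row and the column refer to the same coin.
open_coins = np.array([c[len("Open_"):] for c in open_cols])
close_coins = np.array([c[len("Close_"):] for c in close_cols])
same_coin = open_coins[:, None] == close_coins[None, :]

# Replace same-coin cells with NaN, flatten, and drop the NaNs.
pairs = cross.mask(same_coin).stack().dropna().sort_values()
print(pairs)
```

Because only one block of the matrix is selected, each Open/Close pair appears once, so no `drop_duplicates()` call is needed in this sketch.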
Hope this helps.
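Note: if you find the secret key to bitcoin trading instead of ending up on r/wallstreetbets, I will be charging 5% ;)

Thanks for the quick reply, this is exactly what I wanted. I also do not want a coin to be compared with itself (e.g. Open_USD_BTC and Close_USD_BTC).

Sorry, I wrote my comment too quickly; you had already stated your problem very clearly. (I have deleted my comment.)

@VegardKT I tried to explain it with bullet points. I will try to edit it now to make it clearer.

This is what I would do: create a boolean array for each of your conditions, holding true/false depending on whether the current row matches that condition. Combine these arrays, then simply drop every row where the final array is true.

Have you tried using these lines? They might help you get through your dataset quickly.

Haha, doing my best, and hoping to get there one day.

I do not quite understand where I should put your code, before or after the code I quoted. It would be great if you could incorporate my code, or at least a few of my variables, into the answer! Thanks a lot.

This code will not work if dropped into your code. It is an example that shows you how to solve the problem.

I find it hard to follow, because I do not know how your variables work.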
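The boolean-array idea described in the comments can be sketched against column names like the asker's. This is only an illustration under assumptions: the sample coins and random values are invented, `all_open` stands in for the asker's combined frame, and the names `pairs`, `left`, `right`, and `wanted` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up sample data using the Open_<coin> / Close_<coin> naming from the question.
rng = np.random.default_rng(1)
coins = ["USD_BTC", "ETH_BTC", "ADA_BTC"]
all_open = pd.DataFrame(
    {f"{kind}_{coin}": rng.random(40) for coin in coins for kind in ("Open", "Close")}
)

# Long format: one row per (column, column) pair with its correlation.
pairs = all_open.corr().stack().rename("corr").reset_index()
pairs.columns = ["var1", "var2", "corr"]

# Split each name on the first underscore only: "Open_USD_BTC" -> "Open", "USD_BTC".
left = pairs["var1"].str.split("_", n=1, expand=True)
right = pairs["var2"].str.split("_", n=1, expand=True)

# One boolean array per unwanted scenario.
both_open = (left[0] == "Open") & (right[0] == "Open")
both_close = (left[0] == "Close") & (right[0] == "Close")
same_coin = left[1] == right[1]

# Drop every row that matches at least one scenario.
wanted = pairs[~(both_open | both_close | same_coin)]

# Each remaining pair still appears twice (A/B and B/A); keep the Open-first copy.
wanted = wanted[wanted["var1"].str.startswith("Open_")].sort_values("corr")
print(wanted)
```

Filtering on the prefix of the first column, rather than calling `drop_duplicates()` on the values, keeps distinct pairs that happen to share the same correlation value.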