
How to create a transition matrix for a column in Python?

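How can I convert column B into a transition matrix in Python?

The matrix should be 19 x 19, where 19 is the number of unique values in column B. The dataset has 432 rows in total.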


time                A          B
2017-10-26 09:00:00  36       816
2017-10-26 10:45:00  43       816
2017-10-26 12:30:00  50       998
2017-10-26 12:45:00  51       750
2017-10-26 13:00:00  52       998
2017-10-26 13:15:00  53       998
2017-10-26 13:30:00  54       998
2017-10-26 14:00:00  56       998
2017-10-26 14:15:00  57       834
2017-10-26 14:30:00  58      1285
2017-10-26 14:45:00  59      1288
2017-10-26 23:45:00  95      1285
2017-10-27 03:00:00  12      1285
2017-10-27 03:30:00  14      1285
                             ... 
2017-11-02 14:00:00  56       998
2017-11-02 14:15:00  57       998
2017-11-02 14:30:00  58       998
2017-11-02 14:45:00  59       998
2017-11-02 15:00:00  60       816
2017-11-02 15:15:00  61       275
2017-11-02 15:30:00  62       225
2017-11-02 15:45:00  63      1288
2017-11-02 16:00:00  64      1088
2017-11-02 18:15:00  73      1285
2017-11-02 20:30:00  82      1285
2017-11-02 21:00:00  84      1088
2017-11-02 21:15:00  85      1088
2017-11-02 21:30:00  86      1088
2017-11-02 22:00:00  88      1088
2017-11-02 22:30:00  90      1088
2017-11-02 23:00:00  92      1088
2017-11-02 23:30:00  94      1088
2017-11-02 23:45:00  95      1088


The matrix should contain the number of transitions between the values:

B       ...   1088   1288   ...
...
1088    ...      8      2   ...
...

(each cell is the number of transitions between the two values)
.


I used your data to build a DataFrame with only column B, but the same approach should work for any column.

text = '''time                A          B
2017-10-26 09:00:00  36       816
2017-10-26 10:45:00  43       816
2017-10-26 12:30:00  50       998
2017-10-26 12:45:00  51       750
2017-10-26 13:00:00  52       998
2017-10-26 13:15:00  53       998
2017-10-26 13:30:00  54       998
2017-10-26 14:00:00  56       998
2017-10-26 14:15:00  57       834
2017-10-26 14:30:00  58      1285
2017-10-26 14:45:00  59      1288
2017-10-26 23:45:00  95      1285
2017-10-27 03:00:00  12      1285
2017-10-27 03:30:00  14      1285
2017-11-02 14:00:00  56       998
2017-11-02 14:15:00  57       998
2017-11-02 14:30:00  58       998
2017-11-02 14:45:00  59       998
2017-11-02 15:00:00  60       816
2017-11-02 15:15:00  61       275
2017-11-02 15:30:00  62       225
2017-11-02 15:45:00  63      1288
2017-11-02 16:00:00  64      1088
2017-11-02 18:15:00  73      1285
2017-11-02 20:30:00  82      1285
2017-11-02 21:00:00  84      1088
2017-11-02 21:15:00  85      1088
2017-11-02 21:30:00  86      1088
2017-11-02 22:00:00  88      1088
2017-11-02 22:30:00  90      1088
2017-11-02 23:00:00  92      1088
2017-11-02 23:30:00  94      1088
2017-11-02 23:45:00  95      1088'''

import pandas as pd

B = [int(row[29:].strip()) for row in text.split('\n') if 'B' not in row]
df = pd.DataFrame({'B': B})

I get the unique values in the column so I can use them later to build the matrix.

numbers = sorted(df['B'].unique())
print(numbers)

[225, 275, 750, 816, 834, 998, 1088, 1285, 1288]
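
I create a shifted column C, so each row holds both values of a pair: the current value in B and the next value in C.

df['C'] = df['B'].shift(-1)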
print(df)

       B       C
0    816   816.0
1    816   998.0
2    998   750.0
3    750   998.0
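
To make the direction of the shift concrete, here is a minimal sketch on the first four values of B. The column names `next` and `previous` are only illustrative and do not appear in the answer.

```python
import pandas as pd

# First four values of column B from the sample data
s = pd.Series([816, 816, 998, 750], name='B')

demo = pd.DataFrame({
    'B': s,
    'next': s.shift(-1),      # value from the following row (NaN on the last row)
    'previous': s.shift(1),   # value from the preceding row (NaN on the first row)
})
print(demo)
```

With shift(-1) each row reads as "B is followed by next"; with shift(1) each row reads as "previous is followed by B". The NaN row is dropped automatically by groupby.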

I group by ['B', 'C'] so I can count the pairs.

groups = df.groupby(['B', 'C'])
counts = {i[0]:(len(i[1]) if i[0][0] != i[0][1] else 0) for i in groups} # don't count (816,816)
# counts = {i[0]:len(i[1]) for i in groups} # count even (816,816)
print(counts)

{(225, 1288.0): 1, (275, 225.0): 1, (750, 998.0): 1, (816, 275.0): 1, (816, 816.0): 0, (816, 998.0): 1, (834, 1285.0): 1, (998, 750.0): 1, (998, 816.0): 1, (998, 834.0): 1, (998, 998.0): 0, (1088, 1088.0): 0, (1088, 1285.0): 1, (1285, 998.0): 1, (1285, 1088.0): 1, (1285, 1285.0): 0, (1285, 1288.0): 1, (1288, 1088.0): 1, (1288, 1285.0): 1}

Now I can build the matrix. Using numbers and counts, I create each column as a Series (with the correct index) and add it to the matrix.

matrix = pd.DataFrame()

for x in numbers:
    matrix[x] = pd.Series([counts.get((x,y), 0) for y in numbers], index=numbers)

print(matrix)

Result (each column is the starting value, each row is the value it transitions to; same-value pairs are set to 0):

      225  275  750  816  834  998  1088  1285  1288
225     0    1    0    0    0    0     0     0     0
275     0    0    0    1    0    0     0     0     0
750     0    0    0    0    0    1     0     0     0
816     0    0    0    0    0    1     0     0     0
834     0    0    0    0    0    1     0     0     0
998     0    0    1    1    0    0     0     1     0
1088    0    0    0    0    0    0     0     1     1
1285    0    0    0    0    1    0     1     0     1
1288    1    0    0    0    0    0     0     1     0

Full example

text = '''time                A          B
2017-10-26 09:00:00  36       816
2017-10-26 10:45:00  43       816
2017-10-26 12:30:00  50       998
2017-10-26 12:45:00  51       750
2017-10-26 13:00:00  52       998
2017-10-26 13:15:00  53       998
2017-10-26 13:30:00  54       998
2017-10-26 14:00:00  56       998
2017-10-26 14:15:00  57       834
2017-10-26 14:30:00  58      1285
2017-10-26 14:45:00  59      1288
2017-10-26 23:45:00  95      1285
2017-10-27 03:00:00  12      1285
2017-10-27 03:30:00  14      1285
2017-11-02 14:00:00  56       998
2017-11-02 14:15:00  57       998
2017-11-02 14:30:00  58       998
2017-11-02 14:45:00  59       998
2017-11-02 15:00:00  60       816
2017-11-02 15:15:00  61       275
2017-11-02 15:30:00  62       225
2017-11-02 15:45:00  63      1288
2017-11-02 16:00:00  64      1088
2017-11-02 18:15:00  73      1285
2017-11-02 20:30:00  82      1285
2017-11-02 21:00:00  84      1088
2017-11-02 21:15:00  85      1088
2017-11-02 21:30:00  86      1088
2017-11-02 22:00:00  88      1088
2017-11-02 22:30:00  90      1088
2017-11-02 23:00:00  92      1088
2017-11-02 23:30:00  94      1088
2017-11-02 23:45:00  95      1088'''

import pandas as pd

B = [int(row[29:].strip()) for row in text.split('\n') if 'B' not in row]
df = pd.DataFrame({'B': B})

numbers = sorted(df['B'].unique())
print(numbers)

df['C'] = df['B'].shift(-1)  # C holds the next value of B
print(df)

groups = df.groupby(['B', 'C'])
counts = {i[0]:(len(i[1]) if i[0][0] != i[0][1] else 0) for i in groups} # don't count (816,816)
# counts = {i[0]:len(i[1]) for i in groups} # count even (816,816)
print(counts)

matrix = pd.DataFrame()

for x in numbers:
    matrix[str(x)] = pd.Series([counts.get((x,y), 0) for y in numbers], index=numbers)

print(matrix)
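
As a more compact alternative to the loop above, the same kind of count matrix can be sketched with `pd.crosstab`. This is a swapped-in technique, not the answer's method. The column names `from` and `to` are made up for this sketch, and the layout is transposed relative to the matrix above: rows are the starting value and columns are the value it transitions to. Same-value pairs are counted here rather than set to 0.

```python
import pandas as pd

# Column B from the sample data above (33 rows)
B = [816, 816, 998, 750, 998, 998, 998, 998, 834, 1285, 1288, 1285, 1285, 1285,
     998, 998, 998, 998, 816, 275, 225, 1288, 1088, 1285, 1285, 1088, 1088, 1088,
     1088, 1088, 1088, 1088, 1088]

# Pair every value with the one that follows it
pairs = pd.DataFrame({'from': B[:-1], 'to': B[1:]})

# Reindex so the matrix is square even if a value never appears as a source or target
states = sorted(set(B))
matrix = (pd.crosstab(pairs['from'], pairs['to'])
            .reindex(index=states, columns=states, fill_value=0))
print(matrix)
```

The total of all cells equals the number of rows minus one, because the last row has no successor.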

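EDIT:

Symmetric counts, so that counts[(A,B)] == counts[(B,A)], with values greater than 10 negated afterwards:

# the earlier one-liner is replaced by the loop below
# counts = {i[0]:(len(i[1]) if i[0][0] != i[0][1] else 0) for i in groups} # don't count (816,816)
counts = {}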
for pair, group in groups:
    if pair[0] != pair[1]:  # don't count (816,816)

        #counts[(A,B)] = len((A,B)) + len((B,A)) 
        if pair not in counts:
            counts[pair] = len(group) # put first value
        else:
            counts[pair] += len(group) # add second value

        #counts[(B,A)] = len((A,B)) + len((B,A)) 
        if (pair[1],pair[0]) not in counts:
            counts[(pair[1],pair[0])] = len(group) # put first value
        else:
            counts[(pair[1],pair[0])] += len(group) # add second value
    else:  
        counts[pair] = 0 # (816,816) gives 0

#counts[(A,B)] == counts[(B,A)]

counts_2 = {}               
for pair, count in counts.items():
    if count > 10 :
        counts_2[pair] = -count
    else:
        counts_2[pair] = count

matrix = pd.DataFrame()

for x in numbers:
    matrix[str(x)] = pd.Series([counts_2.get((x,y), 0) for y in numbers], index=numbers)

print(matrix)

The same counting written as a normal for loop:

counts = {}
for pair, group in groups:
    if pair[0] != pair[1]:  # don't count (816,816)
        counts[pair] = len(group)
    else:  
        counts[pair] = 0

Negating the value when it is greater than 10:

counts = {}
for pair, group in groups:
    if pair[0] != pair[1]:  # don't count (816,816)
        count = len(group)
        if count > 10 :
            counts[pair] = -count
        else:
            counts[pair] = count
    else:  
        counts[pair] = 0
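
The symmetric counting and the sign flip from the EDIT can also be expressed on the matrix itself. The following is only a sketch under assumptions: it builds the directed counts with `pd.crosstab` (not the answer's groupby loop), and the names `directed`, `sym` and `signed` are illustrative.

```python
import numpy as np
import pandas as pd

# Column B from the sample data above
B = [816, 816, 998, 750, 998, 998, 998, 998, 834, 1285, 1288, 1285, 1285, 1285,
     998, 998, 998, 998, 816, 275, 225, 1288, 1088, 1285, 1285, 1088, 1088, 1088,
     1088, 1088, 1088, 1088, 1088]

states = sorted(set(B))
pairs = pd.DataFrame({'from': B[:-1], 'to': B[1:]})

# Directed counts as a plain NumPy array (rows: from, columns: to)
directed = (pd.crosstab(pairs['from'], pairs['to'])
              .reindex(index=states, columns=states, fill_value=0)
              .to_numpy())

sym = directed + directed.T              # count(A -> B) + count(B -> A)
np.fill_diagonal(sym, 0)                 # same-value pairs give 0
signed = np.where(sym > 10, -sym, sym)   # negate counts greater than 10

result = pd.DataFrame(signed, index=states, columns=states)
print(result)
```

In this small sample no off-diagonal count exceeds 10, so the sign flip has no visible effect; it would only matter on the full dataset.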

Another pandas-based approach. Note that I used shift(1), so column C holds the previous value and B is the next number that C transitions to:

text = '''time                A          B
2017-10-26 09:00:00  36       816
2017-10-26 10:45:00  43       816
2017-10-26 12:30:00  50       998
2017-10-26 12:45:00  51       750
2017-10-26 13:00:00  52       998
2017-10-26 13:15:00  53       998
2017-10-26 13:30:00  54       998
2017-10-26 14:00:00  56       998
2017-10-26 14:15:00  57       834
2017-10-26 14:30:00  58      1285
2017-10-26 14:45:00  59      1288
2017-10-26 23:45:00  95      1285
2017-10-27 03:00:00  12      1285
2017-10-27 03:30:00  14      1285
2017-11-02 14:00:00  56       998
2017-11-02 14:15:00  57       998
2017-11-02 14:30:00  58       998
2017-11-02 14:45:00  59       998
2017-11-02 15:00:00  60       816
2017-11-02 15:15:00  61       275
2017-11-02 15:30:00  62       225
2017-11-02 15:45:00  63      1288
2017-11-02 16:00:00  64      1088
2017-11-02 18:15:00  73      1285
2017-11-02 20:30:00  82      1285
2017-11-02 21:00:00  84      1088
2017-11-02 21:15:00  85      1088
2017-11-02 21:30:00  86      1088
2017-11-02 22:00:00  88      1088
2017-11-02 22:30:00  90      1088
2017-11-02 23:00:00  92      1088
2017-11-02 23:30:00  94      1088
2017-11-02 23:45:00  95      1088'''

import pandas as pd

B = [int(row[29:].strip()) for row in text.split('\n') if 'B' not in row]
df = pd.DataFrame({'B': B})
# alternative approach
df['C'] = df['B'].shift(1)  # C is the previous value, so each row is a transition C -> B

df['counts'] = 1  # add an arbitrary counts column for the group by

# group together the combinations then unstack to get the matrix
trans_matrix = df.groupby(['B', 'C']).count().unstack()

# make the columns a bit neater
trans_matrix.columns = trans_matrix.columns.droplevel()
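
The result is a matrix with the B values as rows and the C values as columns.

I think this is correct: for example, when you look at column 225 you can see that it transitions to 1288. To get a transition probability matrix, you only need to divide the counts by the sample size for each value.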
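
The normalisation step can be sketched as follows. This assumes that "sample size" means the number of transitions leaving each value, and it uses a from/to layout (rows are the starting value) rather than the B/C layout above; both are assumptions of this sketch, as are the names `pairs`, `counts` and `probs`.

```python
import pandas as pd

# Column B from the sample data above
B = [816, 816, 998, 750, 998, 998, 998, 998, 834, 1285, 1288, 1285, 1285, 1285,
     998, 998, 998, 998, 816, 275, 225, 1288, 1088, 1285, 1285, 1088, 1088, 1088,
     1088, 1088, 1088, 1088, 1088]

pairs = pd.DataFrame({'from': B[:-1], 'to': B[1:]})

# Count each (from, to) pair and pivot into a matrix, filling missing pairs with 0
counts = pairs.groupby(['from', 'to']).size().unstack(fill_value=0)

# Divide every row by its total so each row becomes a probability distribution
probs = counts.div(counts.sum(axis=1), axis=0)
print(probs.round(3))
```

Each row of `probs` then sums to 1. A value that only occurs on the last row would have no outgoing transitions and would need separate handling.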

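
If you use pandas then add the tag pandas. In pure Python you can use zip(B, B[1:]) to create the pairs and Counter() to count them. It needs more work to fill a list/matrix with this data. In pandas you can use shift() to create a column like B[1:] and groupby to count them. Again, it needs more work to fill a new df with the results.

Thanks Furas. But the number of transitions in the result is greater than the number of rows. I think it should be equal to the number of rows. Also, how can we fill in 0 when the transition is between the same numbers? Example: if the transition is (1088.0, 1088.0): 411, then we should fill in 0 in place of 411.

It should be rows-1, because the last row has no transition. Now I see the problem: it has to be len(i[1]) instead of i[1].size, because the group has len(i[1]) rows but every row has two elements, so i[1].size = 2*len(i[1]).

counts = {i[0]: (len(i[1]) if i[0][0] != i[0][1] else 0) for i in groups} should give (1088.0, 1088.0): 0

Can we create a distance matrix for the above example? If the transition count is 10, the distance should be the inverse. (Here column A is random points on a map. More transitions between two points means the distance between them is smaller.)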
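
The pure-Python route mentioned in the comments, zip(B, B[1:]) with Counter(), can be sketched like this. The nested-list layout (rows are the starting value, columns the next value) is an assumption of this sketch.

```python
from collections import Counter

# Column B from the sample data above
B = [816, 816, 998, 750, 998, 998, 998, 998, 834, 1285, 1288, 1285, 1285, 1285,
     998, 998, 998, 998, 816, 275, 225, 1288, 1088, 1285, 1285, 1088, 1088, 1088,
     1088, 1088, 1088, 1088, 1088]

# Pair each value with its successor and count the pairs
pair_counts = Counter(zip(B, B[1:]))

# Fill a square matrix; Counter returns 0 for pairs that never occur
states = sorted(set(B))
matrix = [[pair_counts[(a, b)] for b in states] for a in states]

print('      ' + ' '.join(f'{s:>5}' for s in states))
for state, row in zip(states, matrix):
    print(f'{state:>5} ' + ' '.join(f'{n:>5}' for n in row))
```

The counts add up to len(B) - 1, which matches the rows-1 remark above.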