Reformatting (reshaping) a data table from long to wide using MySQL

MySQL novice here, trying to make the transition from R.

I have a data table with two columns, a level-2 id and a nested id, similar to the following:

level2id | nestedid |
1        | 1        |
1        | 2        |
1        | 3        |
2        | 1        |
2        | 2        |
...

I would like to use MySQL to restructure the data into a new table like this:

level2id | nestedid1 | nestedid2 | nestedid3 |
1        | 1         | 2         | 3         |
2        | 1         | 2         |           |
...

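That way I can later run joins to pull in information on the nested ids and build aggregate values for variables related to the level2 id. In R this is simple using reshape on "time-varying" data, but I can't find an obvious solution for this particular format (i.e., when the data is not organized as attribute names and attribute values in columns). Thanks in advance!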

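While you can't do this as a SELECT, you can achieve it with INSERT statements. The inserts only work if level2id is the primary key of the new table or there is a unique index on level2id.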

Table structure:

CREATE TABLE `table2` (
  `level2id` int(11) NOT NULL DEFAULT '0',
  `nestedid1` int(11) DEFAULT NULL,
  `nestedid2` int(11) DEFAULT NULL,
  `nestedid3` int(11) DEFAULT NULL,
  PRIMARY KEY (`level2id`)
) ENGINE=InnoDB;

INSERT statements (replace table1 with the name of your old table):

INSERT INTO table2 (level2id, nestedid1) SELECT level2id, nestedid FROM table1 WHERE nestedid = 1 ON DUPLICATE KEY UPDATE nestedid1 = nestedid;
INSERT INTO table2 (level2id, nestedid2) SELECT level2id, nestedid FROM table1 WHERE nestedid = 2 ON DUPLICATE KEY UPDATE nestedid2 = nestedid;
INSERT INTO table2 (level2id, nestedid3) SELECT level2id, nestedid FROM table1 WHERE nestedid = 3 ON DUPLICATE KEY UPDATE nestedid3 = nestedid;
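
ON DUPLICATE KEY UPDATE is a MySQL extension to standard SQL. More details are in the MySQL documentation for INSERT ... ON DUPLICATE KEY UPDATE.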
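
To try the INSERT approach end to end, here is a minimal sketch with the sample rows from the question. The table names table1_demo and wide_demo are hypothetical, and the nested columns are declared nullable (an assumption on my part) so that a level2id with fewer than three nested ids simply keeps NULL in the unused columns.

```sql
-- Hypothetical long table holding the sample rows from the question.
DROP TABLE IF EXISTS table1_demo;
CREATE TABLE table1_demo (
  level2id INT NOT NULL,
  nestedid INT NOT NULL,
  PRIMARY KEY (level2id, nestedid)
) ENGINE=InnoDB;
INSERT INTO table1_demo (level2id, nestedid)
VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2);

-- Hypothetical wide table; the primary key on level2id is what makes
-- ON DUPLICATE KEY UPDATE fire for the second and third insert.
DROP TABLE IF EXISTS wide_demo;
CREATE TABLE wide_demo (
  level2id  INT NOT NULL,
  nestedid1 INT NULL,
  nestedid2 INT NULL,
  nestedid3 INT NULL,
  PRIMARY KEY (level2id)
) ENGINE=InnoDB;

-- One pass per target column: the first pass creates the rows,
-- later passes hit the existing key and only fill in their column.
INSERT INTO wide_demo (level2id, nestedid1)
SELECT level2id, nestedid FROM table1_demo WHERE nestedid = 1
ON DUPLICATE KEY UPDATE nestedid1 = table1_demo.nestedid;

INSERT INTO wide_demo (level2id, nestedid2)
SELECT level2id, nestedid FROM table1_demo WHERE nestedid = 2
ON DUPLICATE KEY UPDATE nestedid2 = table1_demo.nestedid;

INSERT INTO wide_demo (level2id, nestedid3)
SELECT level2id, nestedid FROM table1_demo WHERE nestedid = 3
ON DUPLICATE KEY UPDATE nestedid3 = table1_demo.nestedid;

-- Rows after the three passes: (1, 1, 2, 3) and (2, 1, 2, NULL).
SELECT * FROM wide_demo ORDER BY level2id;
```

Note that this relies on the nested ids being literally 1, 2 and 3, as in the question; other id values would need a different WHERE condition per column.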

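I ran into a similar problem. You may want to look at dynamic pivoting in SQL. However, I would not recommend relying only on the reshape command in R.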
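
For what it's worth, when the set of target columns is known up front, a pivot is often written as conditional aggregation in a single query. The following is only a sketch under that assumption; the table name long_demo is hypothetical and the data are the sample rows from the question.

```sql
-- Hypothetical long table with the question's sample rows.
DROP TABLE IF EXISTS long_demo;
CREATE TABLE long_demo (
  level2id INT NOT NULL,
  nestedid INT NOT NULL,
  PRIMARY KEY (level2id, nestedid)
);
INSERT INTO long_demo (level2id, nestedid)
VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2);

-- Each CASE keeps the value only on the matching row and yields NULL
-- elsewhere; MAX ignores NULLs, so it collapses the group to that value.
SELECT level2id,
       MAX(CASE WHEN nestedid = 1 THEN nestedid END) AS nestedid1,
       MAX(CASE WHEN nestedid = 2 THEN nestedid END) AS nestedid2,
       MAX(CASE WHEN nestedid = 3 THEN nestedid END) AS nestedid3
FROM long_demo
GROUP BY level2id
ORDER BY level2id;
```

Prefixing the query with CREATE TABLE some_name AS would store the result as a new table. A dynamic pivot builds the same list of MAX(CASE ...) expressions from the data instead of typing them by hand.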

You can use MySQL to generate a MySQL program that solves this problem:

USE test;

/*Create long input table 'test' with variables of varying length*/
DROP TABLE IF EXISTS nums;
CREATE TABLE nums (id INT(2));
INSERT INTO nums
VALUES 
(0), (1), (2), (3), (4), (5), (6), (7);

DROP TABLE IF EXISTS test;
CREATE TABLE test (id INT(2), var VARCHAR(5), attribute VARCHAR(6), PRIMARY KEY (id, var));
INSERT INTO test
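/* FLOOR + CAST AS UNSIGNED: MySQL's CAST expects SIGNED/UNSIGNED rather than INT */
SELECT nums3.*, REPEAT(CHAR(97+RAND()*24), CAST(FLOOR(6*RAND()) AS UNSIGNED)) AS attribute
 FROM (SELECT DISTINCT nums2.id1 as id, CONCAT('var', LPAD(CAST(FLOOR(16*RAND()) AS UNSIGNED),2,'0')) AS var  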
 FROM (SELECT DISTINCT nums.id as id1, nums1.id as id2 FROM nums, nums as nums1) AS nums2) AS nums3;

/*Create SQL program to convert long to wide format (R: reshape)*/
SELECT DISTINCT CONCAT('DROP TABLE IF EXISTS result;\nCREATE TABLE result (id INT(2),
', GROUP_CONCAT(CONCAT(field) SEPARATOR ', '), ');')
FROM 
(SELECT DISTINCT CONCAT(var, CONCAT(' VARCHAR(', max(length(attribute)), ')')) AS field
FROM test GROUP BY var) AS fields

UNION

SELECT CONCAT("INSERT INTO result \nSELECT DISTINCT test.id, ", GROUP_CONCAT(var SEPARATOR '.attribute, '),
".attribute FROM (SELECT DISTINCT id FROM test) AS test") 
FROM (SELECT DISTINCT var FROM test ORDER BY var) as vars

UNION

SELECT CONCAT("LEFT JOIN test AS ", var, " ON test.id = ", var, ".id AND ", var, ".var=", '"', var, '"' )
FROM (SELECT DISTINCT var FROM test ORDER BY var) as vars

UNION

SELECT ";" ;

/*Copy output to screen editor, delete '|' symbols and superfluous white spaces.
Then copy to MySQL prompt, run by pressing 'enter' key and view 'result'*/
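
If the copy-and-paste step is a nuisance, the generated statement can instead be run directly with a prepared statement. This is a sketch of that variant, not the method above: it uses conditional aggregation rather than one LEFT JOIN per variable, the table long_attr_demo and its rows are made up for illustration, and it assumes the values in var are plain names without backticks.

```sql
-- Hypothetical long table in the same shape as 'test' above.
DROP TABLE IF EXISTS long_attr_demo;
CREATE TABLE long_attr_demo (
  id        INT NOT NULL,
  var       VARCHAR(5) NOT NULL,
  attribute VARCHAR(6),
  PRIMARY KEY (id, var)
);
INSERT INTO long_attr_demo (id, var, attribute)
VALUES (1, 'var01', 'a'), (1, 'var02', 'bb'),
       (2, 'var01', 'ccc'), (3, 'var03', 'd');

-- GROUP_CONCAT output is cut off at group_concat_max_len bytes
-- (1024 by default), so raise it when there are many variables.
SET SESSION group_concat_max_len = 100000;

-- Build one "MAX(CASE WHEN var = '<name>' THEN attribute END) AS `<name>`"
-- expression per distinct variable; QUOTE() escapes the string literal.
SELECT GROUP_CONCAT(
         CONCAT('MAX(CASE WHEN var = ', QUOTE(var),
                ' THEN attribute END) AS `', var, '`')
         ORDER BY var SEPARATOR ', ')
  INTO @cols
  FROM (SELECT DISTINCT var FROM long_attr_demo) AS vars;

-- Assemble and run the pivot query without leaving the session.
SET @sql = CONCAT('SELECT id, ', @cols,
                  ' FROM long_attr_demo GROUP BY id ORDER BY id');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

With the rows above this returns one row per id and one column per variable (var01, var02, var03), with NULL where an id has no value for that variable.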

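"I would like to use MySQL to restructure the data into a new table like this:" - that is a very bad idea. What is the original reason for doing this?

I think this is unlikely to be doable in MySQL. If you are already using R, I would suggest doing the restructuring in R. You could try the package that runs SQL-like queries on data frames.

There are many reasons. In my case, I need to collect information on individuals (nestedid) and aggregate it at the household level (level2). I don't want a bunch of cross-tabs, though, because the specific relationships between nestedids matter...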