PHP: Numerical classification of varchar values


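I have a database containing three tables:

  • practices - 8 fields
  • patients - 47 fields
  • exacerbations - 11 fields

Most of the fields in these tables are stored as varchar; the others are integers, doubles and dates.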

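I have to convert this data into numerically classified data so that a statistician can use it to look for patterns. To do that, I have to convert each varchar field into an integer representing the class the string belongs to. For example, "severity" has the following possible string values:

  • Mild
  • Moderate
  • Severe
  • Very Severe

This field in the patients table has a finite list of string values that can appear. Other fields can contain an unlimited range of string values, and those cannot be classified until they have actually turned up in my database (unless I implement some form of intelligent approach).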
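For a field with a finite list of values, such as severity, the conversion can be as simple as a fixed lookup array. The following is only a minimal sketch: the integer codes, the constant names and the `classifyValue()` helper are assumptions made for illustration, and 99 is an arbitrary sentinel for strings that have not been classified yet.

```php
<?php
// Hypothetical codes for the "severity" field; the numbering is an assumption,
// chosen so that a larger number means a more severe case.
const SEVERITY_CODES = [
    'mild'        => 1,
    'moderate'    => 2,
    'severe'      => 3,
    'very severe' => 4,
];

// Sentinel class for strings that have no code yet (assumed value).
const UNCLASSIFIED = 99;

/**
 * Return the integer class for a raw varchar value.
 * The comparison ignores case and surrounding whitespace.
 */
function classifyValue(?string $raw, array $codes, int $fallback = UNCLASSIFIED): int
{
    if ($raw === null) {
        return $fallback;
    }
    $key = strtolower(trim($raw));
    return $codes[$key] ?? $fallback;
}

echo classifyValue(' Severe ', SEVERITY_CODES), "\n";
echo classifyValue('not seen before', SEVERITY_CODES), "\n";
```

Fields with an open-ended set of values cannot use a hard-coded array like this, which is why they need a lookup that is built from the data itself.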

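For now, I am just trying to work out the best way to convert every field of every entry in each of the three tables into a numeric value. So far I only have incomplete pseudocode in mind.

Given that the system currently holds more than 100 practices, 15,000 patients and 55,000 exacerbations, can anyone suggest the best way to run this process so that it is efficient? It does not have to be implemented in PHP, although I would prefer that. Any advice on how to structure this would be appreciated, as I am not sure my approach is the best one.

Over the next two years this process will have to run once a month, while the database grows to a total of 100,000 patients.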

It might help to know how the fields are used.

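I have managed to build my own solution, which runs in a reasonable time. For anyone interested, or anyone who runs into a similar problem, here is the approach I used:

A PHP script is run as a cron job by calling php scriptName.php [database name]. For each table in the database (other than the lookup tables used by this process), the script builds a classification table. Setting up a classification creates a new table that mirrors the layout of the base table but allows NULL in every field, and then inserts an empty row for each row in the base table. The process then works through each table field by field, updating every row with the correct class for that field.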
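The two phases described above (profile the distinct values of a field, then classify each row) can be illustrated without a database. This is a rough in-memory sketch, not the script itself: the sample rows, the `profileField()` and `classifyField()` helpers and the class codes in `$map` are all made up for the example, and the map is assumed to be filled in separately once the distinct values are known. The fallback of 99 mirrors the `COALESCE(tc.map1,99)` used in the script.

```php
<?php
// Made-up rows standing in for one varchar column of a base table.
$rows = [
    ['entryId' => 1, 'severity' => 'Mild'],
    ['entryId' => 2, 'severity' => 'Severe'],
    ['entryId' => 3, 'severity' => 'Mild'],
    ['entryId' => 4, 'severity' => 'Not recorded'],
];

/** Profiling step: each distinct value of $field with its occurrence count. */
function profileField(array $rows, string $field): array
{
    $counts = array_count_values(array_column($rows, $field));
    arsort($counts); // most frequent value first
    return $counts;
}

/** Classification step: entryId => class, using $fallback for unmapped values. */
function classifyField(array $rows, string $field, array $map, int $fallback = 99): array
{
    $classes = [];
    foreach ($rows as $row) {
        $classes[$row['entryId']] = $map[$row[$field]] ?? $fallback;
    }
    return $classes;
}

// Hypothetical value => class map, assumed to be assigned after profiling.
$map = ['Mild' => 1, 'Severe' => 3];

print_r(profileField($rows, 'severity'));
print_r(classifyField($rows, 'severity', $map));
```

In the real script the profile lives in a lookup table per field and the classes are written into the classify_ table, but the flow is the same.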

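I am sure I could optimise this to reduce the current complexity, but for now I will keep using this method until the script's run time is no longer acceptable.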

Script code:
    <?php
    // application.php is assumed to open the MySQL connection and define db_query().
    include ("../application.php");
    
    profileDatabase("coco");
    classifyDatabase("coco");
    
    function profileDatabase($database) {
        mysql_select_db($database);
        $query = "SHOW TABLES";
        $result = db_query($query);
        $dbProfile = array();
        while ($obj = mysql_fetch_array($result)) {
            // Skip lookup and classify_ tables: their names contain an underscore.
            if (!preg_match("/_/", $obj[0])) {
                $dbProfile[$obj[0]] = profileTable($obj[0]);
            }
        }
        return $dbProfile;
    }
    
    function profileTable($table) {
        $tblProfile = array();
        $query = "DESCRIBE $table";
        $result = db_query($query);
        while ($obj = mysql_fetch_array($result)) {
            // Only free-text varchar columns get a lookup table; ids, dates and dates of birth are skipped.
            if (preg_match("/varchar/", $obj[1]) && (!preg_match("/Id/", $obj[0]) && !preg_match("/date/", $obj[0]) && !preg_match("/dob/", $obj[0]))) {
                $x = createLookup($obj[0], $table);
                $arr = array($obj[0], $x);
                $tblProfile[] = $arr;
            }
        }
        return $tblProfile;
    }
    
    function getDistinctValues($field, $table) {
        $distinct = array();
        // GROUP BY already yields one row per value, so DISTINCT is not needed.
        $query = "SELECT `$field` AS `value`, COUNT(`$field`) AS `no` FROM `$table` GROUP BY `$field` ORDER BY `no` DESC";
        $result = db_query($query);
        while ($obj = mysql_fetch_array($result)) {
            $distinct[] = $obj;
        }
        return $distinct;
    }
    
    function createLookup($field, $table) {
        $query = "CREATE TABLE IF NOT EXISTS `" . $table . "_" . $field . "`
    (
    `id` int(5) NOT NULL auto_increment,
    `value` varchar(255) NOT NULL,
    `no` int(5) NOT NULL,
    `map1` int(3) NOT NULL DEFAULT 0,
    `map2` int(3) NOT NULL DEFAULT 0,
    PRIMARY KEY  (`id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1";
        db_query($query);
        $distinct = getDistinctValues($field, $table);
        $count = count($distinct);
        foreach ($distinct as $val) {
            // Escape with the connection-aware function rather than addslashes().
            $value = mysql_real_escape_string($val['value']);
            $lookup = "`" . $table . "_" . $field . "`";
            $rs = db_query("SELECT id FROM $lookup WHERE value = '$value' LIMIT 1");
            if (mysql_num_rows($rs) == 0) {
                // New value: map1 and map2 are not set here.
                $sql = "INSERT INTO $lookup (value, no) VALUES ('$value', " . (int) $val['no'] . ")";
            } else {
                // Known value: only refresh its occurrence count.
                $sql = "UPDATE $lookup SET no = " . (int) $val['no'] . " WHERE value = '$value'";
            }
            db_query($sql);
        }
        return $count;
    }
    
    function classifyDatabase($database) {
        mysql_select_db($database);
        $query = "SHOW TABLES";
        $result = db_query($query);
        $dbProfile = array();
        while ($obj = mysql_fetch_array($result)) {
            if (!preg_match("/_/", $obj[0])) {
                classifyTable($obj[0]);
                //echo "Classfied $obj[0]\n";
            }
        }
    }
    
    function classifyTable($table) {
        // setupClassifyTable() is idempotent (CREATE TABLE IF NOT EXISTS, INSERT IGNORE),
        // so call it on every run: rows added to $table since the last run need a
        // matching empty row in classify_$table before they can be updated.
        setupClassifyTable($table);
    
        $query = "DESCRIBE $table";
        $result = db_query($query);
        while ($obj = mysql_fetch_array($result)) {
            if (preg_match("/varchar/", $obj[1]) && (!preg_match("/Id/", $obj[0]) && !preg_match("/date/", $obj[0]) && !preg_match("/dob/", $obj[0]))) {
                $rs = db_query("
            SELECT t.entryId, t.$obj[0], COALESCE(tc.map1,99) as 'group' FROM $table t 
            LEFT JOIN " . $table . "_$obj[0] tc ON t.$obj[0] = tc.value 
            ORDER BY tc.map1 ASC");
                while ($obj2 = mysql_fetch_object($rs)) {
                    $sql = "UPDATE classify_$table SET $obj[0] = $obj2->group WHERE entryId = $obj2->entryId";
                    db_query($sql);
                }
            } else {
                if ($obj[0] != "entryId") {
                    $rs = db_query("
            SELECT t.entryId, t.$obj[0] as 'value' FROM $table t");
                    while ($obj2 = mysql_fetch_object($rs)) {
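                        // Copy non-varchar values across unchanged, keeping NULL as NULL.
                        $value = is_null($obj2->value) ? "NULL" : "'" . mysql_real_escape_string($obj2->value) . "'";
                        $sql = "UPDATE classify_$table SET $obj[0] = $value WHERE entryId = $obj2->entryId";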
                        db_query($sql);
                    }
                }
            }
        }
    }
    
    function setupClassifyTable($table) {
        $tblProfile = array();
        $query = "DESCRIBE $table";
        $result = db_query($query);
        $create = "CREATE TABLE IF NOT EXISTS `classify_$table` (";
        while ($obj = mysql_fetch_array($result)) {
            if (preg_match("/varchar/", $obj[1]) && (!preg_match("/Id/", $obj[0]) && !preg_match("/date/", $obj[0]) && !preg_match("/dob/", $obj[0]))) {
                //echo $obj[1]. " matches<br/>";
                $create .= "$obj[0] int(3) NULL,";
            } else {
                $create .= "$obj[0] $obj[1] NULL,";
            }
        }
        $create .= "PRIMARY KEY(`entryId`)) ENGINE=MyISAM DEFAULT CHARSET=latin1";
        db_query($create);
        // Seed one empty row per base-table row in a single statement.
        db_query("INSERT IGNORE INTO classify_$table (entryId) SELECT entryId FROM $table");
    }
    
    ?>
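
One possible optimisation, as a sketch only: the loops in classifyTable() send one UPDATE per row, which is where most of the run time is likely to go as the tables grow. MySQL's multi-table UPDATE can classify a whole column in one statement instead. The helper below is hypothetical (`buildClassifyUpdate()` is not part of the script); it only builds the SQL string, and it assumes the table and column names come from SHOW TABLES / DESCRIBE output rather than from user input.

```php
<?php
/**
 * Build a single UPDATE ... JOIN statement that classifies one varchar column.
 * Values with no row in the lookup table receive $fallback, matching the
 * COALESCE(tc.map1, 99) used in the row-by-row version.
 */
function buildClassifyUpdate(string $table, string $field, int $fallback = 99): string
{
    $lookup = "`{$table}_{$field}`";
    return "UPDATE `classify_{$table}` c"
         . " JOIN `{$table}` t ON t.entryId = c.entryId"
         . " LEFT JOIN {$lookup} tc ON t.`{$field}` = tc.value"
         . " SET c.`{$field}` = COALESCE(tc.map1, {$fallback})";
}

echo buildClassifyUpdate('patients', 'severity'), "\n";
```

The resulting string could be passed to db_query() once per varchar column, in place of the inner while loop.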
    