SQL Server: Unicode through ODBC
Tags: sql-server, r, unicode
I have created an ODBC connection to MS SQL Server, and it works well with normal data. However, when the data contains "HKSCS" characters, they turn into ?.
Here is the table structure (simplified):
ODBC settings:
Odbc32.dll: 6.1.7601.23403
Driver: SQL Server Native Client 11.0
Option:
Use ANSI quoted identifiers
Use ANSI nulls, paddings and warnings
Perform translation for character data
Sample data:
╔════════════════════════╦═════════════╗
║ TraditionalChineseName ║ EnglishName ║
╠════════════════════════╬═════════════╣
║ 邨 ║ estate ║
║ 衞生 ║ health ║
╚════════════════════════╩═════════════╝
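Collation in SQL Server: SQL_Latin1_General_CP1_CI_AS
The results come out correctly in SSMS and in a .NET program (connected through the SQL Server driver), but not over the ODBC connection.
Goal: I want to pass the data to R and plot it.
However, when the data is stored in a data.frame, those HKSCS characters turn into ?.
In addition, if I plot the data, none of the non-English characters display correctly.
Question: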
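I tried taking the result, pasting it into RStudio, and building a data.frame from it. It displays correctly, but the characters are stored in a different format. I am just wondering whether it is possible to convert these characters.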
After several attempts, I ended up converting the strings in SQL into "Unicode numbers" and parsing them in R.
In short:
"邨" -> "37032" -> "\u90A8" -> "邨"
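The chain can be checked for a single character directly in R. This is only a minimal sketch using base R functions (`utf8ToInt`, `sprintf`, `strtoi`, `intToUtf8`); the variable names are made up for illustration and the character is written as an escape so the snippet stays ASCII.

```r
# Walk one character through the chain: character -> decimal -> \uXXXX -> character.
ch <- "\u90A8"                               # the first sample character, written as an escape
code_point <- utf8ToInt(ch)                  # decimal code point
escaped <- sprintf("\\u%04X", code_point)    # six ASCII characters: backslash, u, 4 hex digits
hex_digits <- substring(escaped, 3)          # drop the leading backslash and u
restored <- intToUtf8(strtoi(hex_digits, base = 16L))

print(code_point)            # 37032
print(escaped)               # print() shows the backslash doubled
print(identical(restored, ch))
```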
First, in SQL, convert each Chinese character to its hex-based Unicode number:
;with targetTable as (
    -- simulate the table from the database
    select 1 as ID, N'邨' as TraditionalChineseName, 'estate' as EnglishName union
    select 2 as ID, N'衞生' as TraditionalChineseName, 'health' as EnglishName
), ctx as (
    -- sequential numbers (at most 8000), used as character positions
    select top (8000) n = row_number() over (order by Number)
    from master.dbo.spt_values order by Number
), tc as (
    -- take the character at position n, get its code point with unicode() (decimal),
    -- convert it to varbinary(4) and then to a hex string (style 2, e.g. 000090A8),
    -- and keep the last 4 hex digits behind a \u prefix
    select f.ID,
           '\u' + right(convert(varchar(10), convert(varbinary(4), unicode(substring(f.TraditionalChineseName, x.n, 1))), 2), 4) as unicodeStr,
           substring(f.TraditionalChineseName, x.n, 1) as charStr,
           x.n
    from ctx x
    inner join targetTable f with (nolock)
        on x.n <= len(f.TraditionalChineseName)
)
-- concatenate the \uXXXX pieces of each row in character order
select distinct s.ID, s.EnglishName,
    (
        select u1.unicodeStr as [text()]
        from tc u1
        where u1.ID = s.ID
        order by u1.n
        for xml path('')
    ) TraditionalChineseName
from targetTable s (nolock)
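To cross-check what the query should return for a given string, the same per-character escaping can be mirrored in R. This is a rough sketch, not part of the original answer: `escape_bmp` is a hypothetical helper name, and because the query keeps only the last four hex digits, the sketch assumes every code point is at most U+FFFF and stops otherwise.

```r
# Hypothetical helper mirroring the query: one \uXXXX group per character.
escape_bmp <- function(x) {
  vapply(x, function(s) {
    cp <- utf8ToInt(enc2utf8(s))          # code points of every character in s
    # Assumption: like the query, only four hex digits are kept per character
    stopifnot(all(cp <= 0xFFFF))
    paste0(sprintf("\\u%04X", cp), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

# The two sample names, written as escapes so the snippet stays ASCII
sample_names <- c("\u90A8", "\u885E\u751F")
expected <- escape_bmp(sample_names)
print(expected)   # print() shows each backslash doubled
```

Comparing `expected` with the TraditionalChineseName column returned by `sqlQuery` is one way to confirm the query behaves as intended.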
In R, retrieve the result set with sqlQuery:
library(RODBC)
# qry is assumed to hold the SQL query above as a character string
myconn <- odbcConnect(dsn = "ODBC", uid = "...", pwd = "...")
dat <- sqlQuery(channel = myconn, query = qry, stringsAsFactors = FALSE)
close(myconn)
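With this approach, the characters can be displayed in any chart or table, and there is no need to change the locale in R. The result set returned by the query looks like this: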
+----+-------------+------------------------+
| ID | EnglishName | TraditionalChineseName |
+----+-------------+------------------------+
| 1 | estate | \u90A8 |
| 2 | health | \u885E\u751F |
+----+-------------+------------------------+
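A plausible reason this column survives the ODBC connection is that it contains only ASCII characters, so no character-set translation has anything to replace with ?. That is my reading rather than something stated in the answer; the sketch below only checks the ASCII part, using the two values copied from the table (`escaped_names` and `is_ascii` are illustrative names).

```r
# Values copied from the result table; in R source each backslash is doubled
escaped_names <- c("\\u90A8", "\\u885E\\u751F")

# TRUE when every character of the string has a code point below 128
is_ascii <- vapply(escaped_names,
                   function(s) all(utf8ToInt(s) < 128L),
                   logical(1), USE.NAMES = FALSE)

print(is_ascii)
print(nchar(escaped_names))   # six characters per escaped code point
```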
Then convert the \uXXXX strings back into characters in R:
# Parse the escaped text as a quoted R string literal, which turns \uXXXX into the real character
convertUnicode <- function(x) {
  parse(text = paste0("'", x, "'"))[[1]]
}
kvp <- data.frame(ID = dat$ID,
                  TraditionalChineseName = unlist(lapply(dat$TraditionalChineseName, convertUnicode)),
                  EnglishName = dat$EnglishName,
                  stringsAsFactors = FALSE)
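Because parse() treats each value as R source text, a stray quote in the data would make convertUnicode fail. A possible alternative, sketched here under the assumption that every value consists only of \uXXXX groups (as the query produces), decodes the hex digits directly. `decode_u_escapes` is a hypothetical name, not part of RODBC or base R.

```r
# Hypothetical decoder: split on the literal \u marker and convert each hex group.
decode_u_escapes <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    # "\\u885E\\u751F" splits into "", "885E", "751F"; drop the leading empty piece
    hex <- strsplit(s, "\\u", fixed = TRUE)[[1]][-1]
    intToUtf8(strtoi(hex, base = 16L))
  }, character(1), USE.NAMES = FALSE)
}

# Values as they appear in the result set (backslashes doubled in R source)
labels <- decode_u_escapes(c("\\u90A8", "\\u885E\\u751F"))
print(nchar(labels))
```

It could replace the unlist(lapply(...)) call, for example `TraditionalChineseName = decode_u_escapes(dat$TraditionalChineseName)`.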