SQL Server R Services - outputting data to a database table, performance


sql-server r


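I noticed that the rx* functions (e.g. rxKmeans, rxDataStep) insert data into a SQL Server table row by row when the outFile parameter is set to a table. This is obviously very slow, and something like a bulk insert would be preferable. Can this be achieved, and if so, how?

Currently I am trying to insert about 14 million rows into a table by calling the rxKmeans function with the outFile parameter specified, and it takes about 20 minutes.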

Example of my code:

clustersLogInitialPD <-  rxKmeans(formula = ~LogInitialPD
                     ,data = inDataSource
                     ,algorithm = "Lloyd" 
                     ,centers = start_c
                     ,maxIterations = 1
                     ,outFile = sqlLogPDClustersDS
                     ,outColName = "ClusterNo"
                     ,overwrite = TRUE
                     ,writeModelVars = TRUE
                     ,extraVarsToWrite = c("LoadsetId", "ExposureId")
                     ,reportProgress = 0
)

clustersLogInitialPD

I have also raised this issue.
I have run into this as well, and I know of two reasonable solutions.

Solution 1: use the output data frame option (OutputDataSet):
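/* Time to write data back to SQL from R */
SET STATISTICS TIME ON
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL
    DROP TABLE #tmp
CREATE TABLE #tmp (a FLOAT NOT NULL, b INT NOT NULL);
DECLARE @numRows INT = 1000000
INSERT INTO #tmp (a, b)
EXECUTE sys.sp_execute_external_script
      @language = N'R'
     ,@script = N'OutputDataSet

Is your input data set a SQL table or a data frame/xdf file? If the former, have you tried switching to the InSqlServer compute context?

inDataSource is a SQL query, and I did set the compute context to SQL Server.

Has anyone solved this? Especially when inDataSource is not a SQL query or table but a data.frame created in R code?

Bob, thanks, the notes and code snippets you provided are very helpful. I hope Microsoft will address this in the future so that workarounds are no longer necessary.

I had the same problem and ended up writing a bulk insert package around the SQL Server BULK INSERT command (sorry, it is not open source due to company policy). It was a lot of work and raised even more problems (encoding, text column delimiters, UTF-8 not working, setSPN madness…). Is there a way to vote for this with Microsoft to push for a proper solution?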
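With the output data frame option, the R script passed through @script only has to leave its result in a data frame named OutputDataSet; SQL Server receives it as a result set and INSERT INTO ... EXECUTE sys.sp_execute_external_script stores it, so the rx* outFile write path is not used. The sketch below shows what such a script body could look like for the clustering in the question. It rests on stated assumptions: base R's stats::kmeans stands in for rxKmeans, and InputDataSet (which SQL Server would normally supply from @input_data_1) is simulated with made-up values so the snippet runs in plain R.

```r
# Hypothetical stand-in for the InputDataSet that SQL Server would pass in
# via @input_data_1; the column names mirror the question's variables.
InputDataSet <- data.frame(
  LoadsetId    = 1L,
  ExposureId   = 1:6,
  LogInitialPD = c(-5.1, -5.0, -4.9, -1.1, -1.0, -0.9)
)

# Hypothetical starting centers, playing the role of start_c in the question
start_c <- matrix(c(-5, -1), ncol = 1)

# stats::kmeans stands in for rxKmeans: Lloyd's algorithm, one iteration.
# With iter.max = 1 kmeans warns that it did not converge, so that is muted.
fit <- suppressWarnings(
  kmeans(InputDataSet[, "LogInitialPD", drop = FALSE],
         centers = start_c, iter.max = 1, algorithm = "Lloyd")
)

# Whatever is assigned to OutputDataSet is returned to SQL Server as a
# result set, which INSERT INTO ... EXECUTE can write into a table.
OutputDataSet <- data.frame(
  LoadsetId    = InputDataSet$LoadsetId,
  ExposureId   = InputDataSet$ExposureId,
  LogInitialPD = InputDataSet$LogInitialPD,
  ClusterNo    = fit$cluster
)
print(OutputDataSet)
```

Inside SQL Server, only the part from kmeans onward would go into @script, and the target table in the INSERT statement would need columns compatible with the four columns of OutputDataSet.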
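Solution 2: write the data frame to a file and load it with bcp.exe:

# Creates a bcp format file needed to insert data into a table.
# This should be run once during code development to generate the format file for a given task; the file is then kept with the .R file that uses it.
# Note: the <<<...>>> tokens in this code are placeholders for your own settings (bcp folder, server, database, credentials) and must be replaced before it will run.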
createBcpFormatFile <- function(formatFileName, tableName) {
  # Command to generate BCP file format for importing data into SQL Server

  # https://msdn.microsoft.com/en-us/library/ms162802.aspx
  # format creates a format file based on the option specified (-n, -c, -w, or -N) and the table or view delimiters. When bulk copying data, the bcp command can refer to a format file, which saves you from re-entering format information interactively. The format option requires the -f option; creating an XML format file, also requires the -x option. For more information, see Create a Format File (SQL Server). You must specify nul as the value (format nul).
  # -c Performs the operation using a character data type. This option does not prompt for each field; it uses char as the storage type, without prefixes and with \t (tab character) as the field separator and \r\n (newline character) as the row terminator. -c is not compatible with -w.
  # -x Used with the format and -f format_file options, generates an XML-based format file instead of the default non-XML format file. The -x does not work when importing or exporting data. It generates an error if used without both format and -f format_file.
  ## Bob: -x not used because we currently target bcp version 8 (default odbc driver compatibility that is installed everywhere)
  # -f If -f is used with the format option, the specified format_file is created for the specified table or view. To create an XML format file, also specify the -x option. For more information, see Create a Format File (SQL Server).
  # -t field_term Specifies the field terminator. The default is \t (tab character). Use this parameter to override the default field terminator. For more information, see Specify Field and Row Terminators (SQL Server).
  # -S server_name [\instance_name] Specifies the instance of SQL Server to which to connect. If no server is specified, the bcp utility connects to the default instance of SQL Server on the local computer. This option is required when a bcp command is run from a remote computer on the network or a local named instance. To connect to the default instance of SQL Server on a server, specify only server_name. To connect to a named instance of SQL Server, specify server_name\instance_name.
  # -U login_id Specifies the login ID used to connect to SQL Server.
  # -P password Specifies the password for the login ID. If this option is not used, the bcp command prompts for a password. If this option is used at the end of the command prompt without a password, bcp uses the default password (NULL).

  bcpPath <- .pathToBcpExe()
  parsedTableName <- parseName(tableName)
  # We can't use the -d option for BCP and instead need to fully qualify a table (database.schema.table)
  # -d database_name Specifies the database to connect to. By default, bcp.exe connects to the user’s default database. If -d database_name and a three part name (database_name.schema.table, passed as the first parameter to bcp.exe) is specified, an error will occur because you cannot specify the database name twice. If database_name begins with a hyphen (-) or a forward slash (/), do not add a space between -d and the database name.
  fullyQualifiedTableName <- paste0(parsedTableName["dbName"], ".", parsedTableName["schemaName"], ".", parsedTableName["tableName"])

  bcpOptions <- paste0("format nul -c -f ", formatFileName, " -t, ", .bcpConnectionOptions())

  commandToRun <- paste0(bcpPath, " ", fullyQualifiedTableName, " ", bcpOptions)
  result <- .bcpRunShellThrowErrors(commandToRun)
}
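createBcpFormatFile ends up running a command of the form bcp.exe database.schema.table format nul -c -f formatFile -t, connectionOptions. The hypothetical helper below (buildBcpFormatCommand is an illustrative name, not part of the original code) builds the same string from explicit arguments instead of the <<<...>>> placeholders, so the command can be inspected or unit-tested without running bcp. The path, server and database names in the usage are made up.

```r
# Hypothetical helper: builds the same "format nul" command string that
# createBcpFormatFile passes to the shell, without executing anything.
buildBcpFormatCommand <- function(bcpPath, dbName, schemaName, tableName,
                                  formatFileName, connectionOptions) {
  # bcp rejects -d together with a three-part name, so qualify the table fully
  fullyQualifiedTableName <- paste(dbName, schemaName, tableName, sep = ".")
  # -c: character mode, -f: format file to create, -t,: comma field terminator
  bcpOptions <- paste0("format nul -c -f ", formatFileName, " -t, ",
                       connectionOptions)
  paste0(bcpPath, " ", fullyQualifiedTableName, " ", bcpOptions)
}

cmd <- buildBcpFormatCommand("C:/tools/bcp.exe", "MyDb", "dbo", "temp_bob",
                             "test2.fmt", " -S MYSERVER -T")
cat(cmd, "\n")
```

The doubled space before -S comes from the leading space in the connection options, exactly as in the original; the shell ignores it.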


# Save a data frame (data) using file format (formatFilePath) to a table on the database (tableName)
bcpDataToTable <- function(data, formatFilePath, tableName) {
  numRows <- nrow(data)

  # write file to disk
  ptm <- proc.time()

  tmpFileName <- tempfile("bcp", tmpdir=getwd(), fileext=".csv")
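  # na="" writes missing values as empty fields, which bcp loads as NULL (or the
  # column default); a literal "NA" would be rejected by numeric columns.
  # With quote=FALSE, a comma inside a text value would break the comma-separated layout.
  write.table(data, file=tmpFileName, quote=FALSE, row.names=FALSE, col.names=FALSE, sep=",", na="")
  # Bob: note that one can make this significantly faster by switching over to use the readr package (readr::write_csv)
  #readr::write_csv(data, tmpFileName, col_names=FALSE, na="")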

  # bcp file to server time start
  mid <- proc.time()

  bcpPath <- .pathToBcpExe()
  parsedTableName <- parseName(tableName)
  # We can't use the -d option for BCP and instead need to fully qualify a table (database.schema.table)
  # -d database_name Specifies the database to connect to. By default, bcp.exe connects to the user’s default database. If -d database_name and a three part name (database_name.schema.table, passed as the first parameter to bcp.exe) is specified, an error will occur because you cannot specify the database name twice. If database_name begins with a hyphen (-) or a forward slash (/), do not add a space between -d and the database name.
  fullyQualifiedTableName <- paste0(parsedTableName["dbName"], ".", parsedTableName["schemaName"], ".", parsedTableName["tableName"])
  bcpOptions <- paste0(" in ", tmpFileName, " ", .bcpConnectionOptions(), " -f ", formatFilePath, " -h TABLOCK")

  commandToRun <- paste0(bcpPath, " ", fullyQualifiedTableName, " ", bcpOptions)

  result <- .bcpRunShellThrowErrors(commandToRun)

  cat(paste0("time to save dataset to disk (", numRows, " rows):\n"))
  print(mid - ptm)
  cat(paste0("overall time (", numRows, " rows):\n"))
  # print() is needed: inside a function a non-final expression is not auto-printed
  print(proc.time() - ptm)

  unlink(tmpFileName)
}
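The format file is generated with -c and -t,, so the temporary file written by bcpDataToTable has to be plain comma-separated text with no header, no row names and no quotes. The base-R check below shows that layout on made-up data. The na = "" argument is an assumption about how missing values should be handled: it turns them into empty fields, which bcp loads as NULL instead of failing on the text NA.

```r
# Made-up data frame containing a missing value
data <- data.frame(x = c(1L, NA, 3L), label = c("a", "b", "c"))

tmpFileName <- tempfile("bcp", fileext = ".csv")
# Same layout as bcpDataToTable: no header, no row names, no quotes,
# comma separated; na = "" writes missing values as empty fields.
write.table(data, file = tmpFileName, quote = FALSE, row.names = FALSE,
            col.names = FALSE, sep = ",", na = "")

lines <- readLines(tmpFileName)
print(lines)
unlink(tmpFileName)
```

If the data can contain commas inside character values, a different field terminator would have to be used both here (sep) and in the format file (-t).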

# Examples:
# createBcpFormatFile("test2.fmt", "temp_bob")
# data <- data.frame(x=sample(1:40, 1000, replace=TRUE))
# bcpDataToTable(data, "test2.fmt", "test_bcp_1")

#####################
#                   #
# Private functions #
#                   #
#####################

# Path to bcp.exe. bcp.exe is currently from version 8 (SQL 2000); newer versions depend on newer SQL Server ODBC drivers and are harder to distribute by simply copying the executable
.pathToBcpExe <- function() {
  paste0(<<<bcpFolder>>>, "/bcp.exe")
}

# Function to convert warnings from shell into errors always
.bcpRunShellThrowErrors <- function(commandToRun) {
  tryCatch({
    shell(commandToRun)
  }, warning=function(w) {
    conditionMessageWithoutPassword <- gsub(<<<connectionStringSqlPassword>>>, "*****", conditionMessage(w), fixed=TRUE) # Do not print SQL passwords in errors
    stop("Converted from warning: ", conditionMessageWithoutPassword)
  })
}
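Per its comment, .bcpRunShellThrowErrors turns warnings from shell() into errors so a failed bcp call stops the pipeline, and gsub with fixed = TRUE removes the password from the message first. The self-contained sketch below reproduces that pattern; fakeShell and the password value are hypothetical replacements for shell() and the <<<connectionStringSqlPassword>>> placeholder, so it runs anywhere.

```r
# Hypothetical stand-ins for the password placeholder and for a shell call
# that signals a warning containing the full command line.
sqlPassword <- "s3cret!"
fakeShell <- function(cmd) warning("command failed: ", cmd)

runThrowingErrors <- function(commandToRun, password) {
  tryCatch(
    fakeShell(commandToRun),
    warning = function(w) {
      # fixed = TRUE treats the password literally rather than as a regex
      msg <- gsub(password, "*****", conditionMessage(w), fixed = TRUE)
      stop("Converted from warning: ", msg, call. = FALSE)
    }
  )
}

err <- tryCatch(runThrowingErrors("bcp.exe t in f.csv -P s3cret!", sqlPassword),
                error = function(e) conditionMessage(e))
print(err)
```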

# The connection options needed to establish a connection to the client database
.bcpConnectionOptions <- function() {
  if (<<<useTrustedConnection>>>) {
    return(paste0(" -S ", <<<databaseServer>>>, " -T"))
  } else {
    return(paste0(" -S ", <<<databaseServer>>>, " -U ", <<<connectionStringLogin>>>," -P ", <<<connectionStringSqlPassword>>>))
  }
}
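.bcpConnectionOptions chooses between bcp's -T (trusted connection) and -U/-P (SQL Server login) flags. A hypothetical, placeholder-free variant is sketched below; the function name and its arguments are illustrative and replace the <<<databaseServer>>>, <<<useTrustedConnection>>>, <<<connectionStringLogin>>> and <<<connectionStringSqlPassword>>> placeholders.

```r
# Hypothetical parameterized version of .bcpConnectionOptions
bcpConnectionOptions <- function(server, trusted = TRUE, login = NULL,
                                 password = NULL) {
  if (trusted) {
    # -T: connect with a trusted connection (integrated security)
    paste0(" -S ", server, " -T")
  } else {
    if (is.null(login) || is.null(password)) {
      stop("login and password are required when trusted = FALSE")
    }
    # -U / -P: SQL Server authentication with a login and password
    paste0(" -S ", server, " -U ", login, " -P ", password)
  }
}

print(bcpConnectionOptions("MYSERVER"))
print(bcpConnectionOptions("MYSERVER\\SQL2016", trusted = FALSE,
                           login = "r_user", password = "pw"))
```

Failing early when credentials are missing gives a clearer message than letting bcp prompt for a password inside a non-interactive shell call.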

###################
# Other functions #
###################

# Mirrors SQL Server parseName function
parseName <- function(databaseObject) {
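  splitName <- strsplit(databaseObject, '.', fixed=TRUE)[[1]]
  if (length(splitName)==3){
    dbName <- splitName[1]
    schemaName <- splitName[2]
    tableName <- splitName[3]
  } else if (length(splitName)==2){
    # A two-part name (schema.table) uses the default database, not the server name
    dbName <- <<<databaseName>>>
    schemaName <- splitName[1]
    tableName <- splitName[2]
  } else if (length(splitName)==1){
    # An empty schema yields database..table, i.e. the user's default schema
    dbName <- <<<databaseName>>>
    schemaName <- ""
    tableName <- splitName[1]
  } else {
    stop("Expected a one-, two- or three-part object name: ", databaseObject)
  }

  return(c(tableName=tableName, schemaName=schemaName, dbName=dbName))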
}
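createBcpFormatFile and bcpDataToTable turn parseName's result into SQL Server's database.schema.object form, where an empty schema part (db..table) falls back to the user's default schema. The sketch below is a hypothetical, placeholder-free variant (parseNameWithDefault and its defaultDb argument are illustrative names) that can be run and tested directly; it rejects names with more than three parts with an explicit error.

```r
# Hypothetical parameterized variant of parseName: defaultDb replaces the
# <<<databaseName>>> placeholder used for one- and two-part names.
parseNameWithDefault <- function(databaseObject, defaultDb) {
  parts <- strsplit(databaseObject, ".", fixed = TRUE)[[1]]
  if (length(parts) == 3) {
    dbName <- parts[1]; schemaName <- parts[2]; tableName <- parts[3]
  } else if (length(parts) == 2) {
    dbName <- defaultDb; schemaName <- parts[1]; tableName <- parts[2]
  } else if (length(parts) == 1) {
    # Empty schema: "db..table" means the user's default schema
    dbName <- defaultDb; schemaName <- ""; tableName <- parts[1]
  } else {
    stop("Expected a one-, two- or three-part name: ", databaseObject)
  }
  c(tableName = tableName, schemaName = schemaName, dbName = dbName)
}

p <- parseNameWithDefault("dbo.test_bcp_1", defaultDb = "MyDb")
print(paste(p["dbName"], p["schemaName"], p["tableName"], sep = "."))
print(paste(parseNameWithDefault("temp_bob", "MyDb")[c("dbName", "schemaName", "tableName")],
            collapse = "."))
```

The two calls print "MyDb.dbo.test_bcp_1" and "MyDb..temp_bob", the forms bcp accepts as its first argument.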