RevoScaleR: rxPredict, number of parameters does not match number of variables

Stack Overflow user

Asked 2016-08-05 13:29:02

3 answers, 1.4K views, 0 followers, score 1

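I set up R using Microsoft's "Data Science End-to-End Walkthrough", and their example works fine.

That example (New York taxi data) uses non-categorical variables (distance, taxi fare, etc.) to predict a categorical variable (1 or 0, indicating whether a tip was paid).

I am trying to predict a similar binary output from categorical input variables, using linear regression (the rxLinMod function), and I get an error.

The error says that the number of parameters does not match the number of variables, but it looks to me as if the "number of variables" is actually the number of levels within each factor (variable).

To reproduce

Create a table called example in SQL Server: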

USE [my_database];
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
CREATE TABLE [dbo].[example](
    [Person] [nvarchar](max) NULL,
    [City] [nvarchar](max) NULL,
    [Bin] [integer] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY];

Insert the data:

insert into [dbo].[example] values ('John','London',0);
insert into [dbo].[example] values ('Paul','New York',0);
insert into [dbo].[example] values ('George','Liverpool',1);
insert into [dbo].[example] values ('Ringo','Paris',1);
insert into [dbo].[example] values ('John','Sydney',1);
insert into [dbo].[example] values ('Paul','Mexico City',1);
insert into [dbo].[example] values ('George','London',1);
insert into [dbo].[example] values ('Ringo','New York',1);
insert into [dbo].[example] values ('John','Liverpool',1);
insert into [dbo].[example] values ('Paul','Paris',0);
insert into [dbo].[example] values ('George','Sydney',0);
insert into [dbo].[example] values ('Ringo','Mexico City',0);

I also use a SQL function that returns the variables in table format, because that is what the Microsoft example requires. Create the function formatAsTable:

USE [my_database];
GO
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO
-- GO is the SSMS/sqlcmd batch separator: CREATE FUNCTION must start its own batch
CREATE FUNCTION [dbo].[formatAsTable] (
@City nvarchar(max)='',
@Person nvarchar(max)='')
RETURNS TABLE
AS
  RETURN
  (
  -- Add the SELECT statement with parameter references here
  SELECT
    @City AS City,
    @Person AS Person
  );

Now we have a table with two categorical variables, Person and City.

Let's start predicting. In R, run the following:

library(RevoScaleR)
# Set up the database connection
connStr <- "Driver=SQL Server;Server=<servername>;Database=<dbname>;Uid=<uid>;Pwd=<password>"
sqlShareDir <- paste("C:\\AllShare\\",Sys.getenv("USERNAME"),sep="")
sqlWait <- TRUE
sqlConsoleOutput <- FALSE
cc <- RxInSqlServer(connectionString = connStr, shareDir = sqlShareDir, 
                    wait = sqlWait, consoleOutput = sqlConsoleOutput)
rxSetComputeContext(cc)
# Set the SQL which gets our data base
sampleDataQuery <- "SELECT * from [dbo].[example] "
# Set up the data source
inDataSource <- RxSqlServerData(sqlQuery = sampleDataQuery, connectionString = connStr, 
                                colClasses = c(City = "factor",Bin="logical",Person="factor"
                                ),
                                rowsPerRead=500)

Now build the linear regression model.

isWonObj <- rxLinMod(Bin ~ City+Person,data = inDataSource)

Inspect the model object:

isWonObj

Note that it looks like this:

...
Total independent variables: 11 (Including number dropped: 3)
...

Coefficients:
                           Bin
(Intercept)       6.666667e-01
City=London      -1.666667e-01
City=New York     4.450074e-16
City=Liverpool    3.333333e-01
City=Paris        4.720871e-16
City=Sydney      -1.666667e-01
City=Mexico City       Dropped
Person=John      -1.489756e-16
Person=Paul      -3.333333e-01
Person=George          Dropped
Person=Ringo           Dropped

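It says there are 11 variables, which is fine, because that is the total number of factor levels (6 cities and 4 people) plus the intercept.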
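The count of 11 and the three dropped terms can be checked outside RevoScaleR. The sketch below uses base R only and rebuilds the twelve rows from the INSERT statements above as a local data frame (the name `example_df` is made up for illustration). It assumes that rxLinMod creates one indicator column per factor level plus an intercept, which is what the coefficient listing suggests; that is an inference from the output, not documented behaviour.

```r
# Rebuild the example table locally (same rows as the INSERT statements).
example_df <- data.frame(
  Person = rep(c("John", "Paul", "George", "Ringo"), times = 3),
  City   = c("London", "New York", "Liverpool", "Paris",
             "Sydney", "Mexico City", "London", "New York",
             "Liverpool", "Paris", "Sydney", "Mexico City"),
  Bin    = c(0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0),
  stringsAsFactors = TRUE
)

# One indicator column per level of each factor, plus the intercept.
n_columns <- nlevels(example_df$City) + nlevels(example_df$Person) + 1

# Build that full indicator matrix and count the linearly independent columns.
full_design <- cbind(
  Intercept = 1,
  model.matrix(~ City - 1, data = example_df),
  model.matrix(~ Person - 1, data = example_df)
)
n_independent <- qr(full_design)$rank
n_dropped <- ncol(full_design) - n_independent

cat(n_columns, "columns,", n_dropped, "redundant\n")
```

Two of the redundant columns are the usual ones (the City indicators sum to the intercept, and so do the Person indicators). The third is specific to this data: John and George only appear in London, Liverpool and Sydney, while Paul and Ringo only appear in the other three cities.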

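Now, when I try to predict the Bin value from City and Person, I get an error.

First I format the City and Person I want a prediction for as a table. Then I predict using that as input.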

sq<-"SELECT City, Person FROM [dbo].[formatAsTable]('London','George')"
pred<-RxSqlServerData(sqlQuery = sq,connectionString = connStr
                      , colClasses = c(City = "factor",Person="factor"))

If you inspect the pred object, it looks as expected:

> head(pred)
    City Person
1 London George

Now, when I try to predict, I get an error.

scoredOutput <- RxSqlServerData(
  connectionString = connStr,
  table = "binaryOutput"
)

rxPredict(modelObject = isWonObj, data = pred, outData = scoredOutput, 
          predVarNames = "Score", type = "response", writeModelVars = FALSE, overwrite = TRUE,checkFactorLevels = FALSE)

The error says:

INTERNAL ERROR: In rxPredict, the number of parameters does not match the number of  variables: 3 vs. 11.

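I can see where the 11 comes from, but I only supplied 2 values to the prediction query, so I cannot see where the 3 comes from, or why there is a problem.

Any help is much appreciated!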

sql-server

revolution-r

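3 Answers

Stack Overflow user

Answered 2016-08-05 15:52:52

The answer seems to lie in how R treats factor variables, although the error message could distinguish more clearly between factors, levels, variables and parameters.

It seems that the inputs used to generate a prediction cannot simply be characters or factors without levels. They need to be factors with the same levels as the corresponding variables used when the model was fitted.

So the following lines: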

sq<-"SELECT City, Person FROM [dbo].[formatAsTable]('London','George')"
pred<-RxSqlServerData(sqlQuery = sq,connectionString = connStr
                      , colClasses = c(City = "factor",Person="factor"))

...should be replaced with:

sq<-"SELECT City, Person FROM [dbo].[formatAsTable]('London','George')"

column_information<-list(
  City=list(type="factor",levels=c("London","New York","Liverpool","Paris","Sydney","Mexico City")),
  Person=list(type="factor",levels=c("John","Paul","George","Ringo")),
  Bin=list(type="logical")
)

pred<-RxSqlServerData(sqlQuery = sq,connectionString = connStr
                      ,colInfo=column_information,
                      stringsAsFactors=FALSE)

I have seen other examples where categorical variables seemed to work without this, but perhaps the levels were there anyway.
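Typing the levels by hand gets tedious and is easy to get out of order. A minimal sketch, in base R only, of deriving the same list structure from a training data frame that already holds the factors. The names `training_df` and `build_col_info` are hypothetical, and the result would still have to be passed to RxSqlServerData through colInfo as in the code above.

```r
# Small stand-in for the training data; levels are given explicitly so
# their order is the one we want to reuse at prediction time.
training_df <- data.frame(
  City   = factor(c("London", "New York", "Liverpool"),
                  levels = c("London", "New York", "Liverpool")),
  Person = factor(c("John", "Paul", "George"),
                  levels = c("John", "Paul", "George")),
  Bin    = c(FALSE, FALSE, TRUE)
)

# Hypothetical helper: one list(type, levels) entry per factor column.
build_col_info <- function(df) {
  factor_columns <- names(df)[vapply(df, is.factor, logical(1))]
  lapply(df[factor_columns], function(column) {
    list(type = "factor", levels = levels(column))
  })
}

column_information <- build_col_info(training_df)
str(column_information)
```

Non-factor columns such as Bin are skipped, so they would need their own entries if the prediction query returned them.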

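I hope this saves someone as much time as I lost on it!

Edit in response to SLSvenR

I think my point about keeping the same levels as the training data still stands.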

fac <- c("one", "two", "three")
val = c(1, 2, 3)
trainingData <- data.frame(fac, val, stringsAsFactors = TRUE)
lmModel <- lm(val ~ fac, data = trainingData)
print(summary(lmModel))
predictionData = data.frame(fac = c("one", "three", "one", "one"), stringsAsFactors = TRUE)
lmPred <- predict(lmModel, newdata = predictionData)
lmPred
# The result is OK:
# 1 2 3 4
# 1 3 1 1

levels(predictionData$fac)<-levels(trainingData$fac)
# rxLinMod() and rxPredict() behave different:
rxModel <- rxLinMod(val ~ fac, data = trainingData)
rxPred <- rxPredict(rxModel, data = predictionData, writeModelVars = TRUE,checkFactorLevels = TRUE)
rxPred
# This result appears correct to me.

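I cannot say whether this behaviour is good or bad. However, one way around it seems to be to apply the levels of the training data to the test set, which I think you can do on the fly.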
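There are two different ways to "apply the training levels" in base R, and they do not behave the same. The sketch below is illustrative only (all variable names are made up): assigning to `levels()` renames the existing integer codes, while rebuilding the factor with `factor(..., levels = )` re-encodes the values. The example above avoids the difference only when both columns are factors whose levels sort alphabetically in the same order.

```r
# Training levels in a non-alphabetical order, to make the difference visible.
train_levels <- c("one", "two", "three")
new_values   <- c("one", "three", "one", "one")

# A factor built from the new data alone has only the observed levels, sorted.
observed <- factor(new_values)
levels(observed)

# Re-encoding against the training levels keeps every label intact.
recoded <- factor(new_values, levels = train_levels)
as.character(recoded)
as.integer(recoded)

# Overwriting the levels renames the codes: "three" silently becomes "two".
relabelled <- observed
levels(relabelled) <- train_levels
as.character(relabelled)
```

Re-encoding is the safer of the two whenever the level order of the training data is not guaranteed to match the order derived from the new data.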

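Score 0

Stack Overflow user

Answered 2016-09-26 14:58:01

Are you sure that specifying colInfo solved the problem? There seems to be a general issue in rxPredict itself, not just in combining rxPredict with SQL Server: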

# lm() and predict() don't have a problem with missing factor levels ("two" in this case):
fac <- c("one", "two", "three")
val = c(1, 2, 3)
trainingData <- data.frame(fac, val, stringsAsFactors = TRUE)
lmModel <- lm(val ~ fac, data = trainingData)
print(summary(lmModel))
predictionData = data.frame(fac = c("one", "three", "one", "one"), stringsAsFactors = TRUE)
lmPred <- predict(lmModel, newdata = predictionData)
lmPred
# The result is OK:
# 1 2 3 4
# 1 3 1 1

# rxLinMod() and rxPredict() behave different:
rxModel <- rxLinMod(val ~ fac, data = trainingData)
rxPred <- rxPredict(rxModel, data = predictionData, writeModelVars = TRUE)
# The following error is thrown:
# "INTERNAL ERROR: In rxPredict, the number of parameters does not match
# the number of  variables: 3 vs. 4."
# checkFactorLevels = FALSE doesn't help here, it actually seems to just
# check the order of factor levels.
levels(predictionData$fac) <- c("two", "three", "one")
rxPred <- rxPredict(rxModel, data = predictionData, writeModelVars = TRUE)
# The following error is thrown (twice):
# ERROR:order of factor levels in the data are inconsistent with
# the order of the model coefficients:fac = two versus fac = one. Set
# checkFactorLevels = FALSE to ignore.
rxPred <- rxPredict(rxModel, data = predictionData, checkFactorLevels = FALSE, writeModelVars = TRUE)
rxPred
#   val_Pred    fac
#1  1           two
#2  3           three
#3  1           two
#4  1           two
# This looks suspicious at best. While the prediction values are still
# correct if you look only at the order of the records in trainingData,
# the model variables are messed up.

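In my scenario I have one factor with about 10,000 levels (known only when the model is created) and several more factors with about 5 levels each (known before the model is created). It seems impossible to specify the levels for all of them in the "correct" order when calling rxPredict().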
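One reading that fits both error messages in this thread (3 vs. 11 in the question, 3 vs. 4 here) is that the first number is the intercept plus the number of factor levels present in the prediction data, and the second is the same count for the training data. This is inferred from the numbers alone, not from RevoScaleR documentation. The helper `count_terms` below is hypothetical and uses base R only.

```r
# Hypothetical helper: intercept plus one term per level of each factor.
count_terms <- function(...) {
  factors <- list(...)
  1 + sum(vapply(factors, nlevels, integer(1)))
}

# Question: a single prediction row versus the 12-row training table.
question_pred  <- count_terms(factor("London"), factor("George"))
question_train <- count_terms(
  factor(c("London", "New York", "Liverpool", "Paris", "Sydney", "Mexico City")),
  factor(c("John", "Paul", "George", "Ringo"))
)

# This answer: "two" never appears in the prediction data.
answer_pred  <- count_terms(factor(c("one", "three", "one", "one")))
answer_train <- count_terms(factor(c("one", "two", "three")))

cat(question_pred, "vs.", question_train, "\n")
cat(answer_pred, "vs.", answer_train, "\n")
```

If this reading is right, the mismatch disappears only when the prediction factors carry the full set of training levels, whether or not every level occurs in the rows being scored.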

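Score 0

Stack Overflow user

Answered 2016-10-04 06:59:23

Just setting the factor levels (levels(predictionData$fac) <- levels(trainingData$fac)) does avoid the error, but it also causes the model to use the wrong factor indices, which you can see if writeModelVars is set to TRUE. Setting the nearly 10,000 levels of my factor in the colInfo of RxSqlServerData caused the application to hang, although the query was passed to SQL Server correctly. I changed strategy: I load the data into a data frame without any factors and then apply rxFactors to it: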

rxSetComputeContext("local")

sqlPredictQueryDS <- RxSqlServerData(connectionString = sqlConnString, sqlQuery = sqlQuery, stringsAsFactors = FALSE)

predictQueryDS = rxImport(sqlPredictQueryDS)

# allItems holds the full set of levels used when the model was trained
if ("Artikelnummer" %in% colnames(predictQueryDS)) {
  predictQueryDS <- rxFactors(predictQueryDS, factorInfo = list(Artikelnummer = list(levels = allItems)))
}

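Besides setting the required factor levels, rxFactors also reorders the factor indices. I am not saying the colInfo solution is wrong; perhaps it just does not work for factors with "too many" levels.

Score 0

Original page content provided by Stack Overflow.

Original link: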

https://stackoverflow.com/questions/38790530
