我有一个客户数据库,我想编写一个查询,根据我的客户的特征向我提供客户的分层样本。我正在使用一个调查猴子使用的公式(代码中的URL),该公式根据z得分,误差幅度和百分比值给我样本数量。我还使用R中的odbc包将此数据直接发送到R。
stratified_sample <- function() {
library(DBI)
library(odbc)
con <- dbConnect(odbc(),Driver = "ODBC_DRIVER",Server = "SERVER",Database = "DATABASE",Authentication = "Authentication")
dbGetQuery(con,"--Parameters for sample size calc
DeclARE @ZScore DECIMAL(4,3) = 1.96,@MOE DECIMAL(4,3) = 0.05,@PctVal DECIMAL(4,3) = 0.5;
--All customers meeting criteria
IF OBJECT_ID('tempdb..#CustPop') IS NOT NULL
DROP TABLE #CustPop;
SELECT DISTINCT
c.CustomerIDNumber,c.Gender,c.Generation,c.CharactertisticA,c.CharactertisticB,c.CharactertisticC
INTO #CustPop
FROM [DATABASE].TableC C")
dbGetQuery(con,"--Find sample sizes by demographic combination
IF OBJECT_ID('tempdb..#SampleSizes') IS NOT NULL
DROP TABLE #SampleSizes;
SELECT c.Gender,c.CharactertisticC,COUNT(DISTINCT c.CustomerIDNumber) AS DistinctCust,--Formula courtesy of SurveyMonkey: https://www.surveymonkey.com/mp/sample-size-calculator/
ROUND(
((SQUARE(@ZScore) * (@PctVal * (1 - @PctVal))) / SQUARE(@MOE))
/ (1
+ ((SQUARE(@ZScore) * (@PctVal * (1 - @PctVal)))
/ (SQUARE(@MOE) * COUNT(DISTINCT c.CustomerIDNumber))
)
),0
) AS SampleSize
INTO #SampleSizes
FROM #CustPop c
GROUP BY c.Gender,c.CharactertisticC;")
dbGetQuery(con,"--Assign row numbers randomly for each customer
IF OBJECT_ID('tempdb..#CustRowNum') IS NOT NULL
DROP TABLE #CustRowNum;
SELECT c.CustomerIDNumber,ROW_NUMber() OVER (PARTITION BY c.Gender,c.CharactertisticC
ORDER BY NEWID()
) AS RowNum,c.CharactertisticC
INTO #CustRowNum
FROM #CustPop c;")
dbGetQuery(con,"--Find only customers with RowNum <= SampleSize
IF OBJECT_ID('tempdb..#CustSample') IS NOT NULL
DROP TABLE #CustSample;
SELECT DISTINCT
c.CustomerIDNumber
INTO #CustSample
FROM #CustRowNum c
JOIN #SampleSizes s
ON s.CharactertisticA= c.CharactertisticA
AND s.Gender = c.Gender
AND s.Generation = c.Generation
AND s.CharactertisticB= c.CharactertisticB
AND s.CharactertisticC = c.CharactertisticC
WHERE c.RowNum <= s.SampleSize;")
get_query_data <- dbGetQuery(con,"select * from #CustSample")
return(get_query_data)
}
问题是,当我尝试ORDER BY NEWID()
时,我的函数在倒数第二个子句上失败,因为它不是公认的内置函数名。也许我需要避免在order by子句中使用NEWID。我以前使用过RODBC包,但从未遇到过此问题。还有其他人遇到过这个问题吗?
谢谢