Who’s Running Your R Code In Your SQL 2016 Procedure?

So you’ve got an instance of SQL 2016 running and you want to play with running R code inside SQL stored procedures. You’ve enabled external scripts and given permissions to some users to execute those scripts. You open Management Studio and execute sp_execute_external_script with some simple R code.

Do you know what account that code is running under?

WhoamI

You’re not running as yourself, even though that’s the account you signed into SSMS as.

You’re not running under the server account that SQL or SQL Launchpad run under.

You’re running as a new account created when you installed SQL R Service In Database for the purpose of running R code.

imageOn the server SQL creates a new Windows group, SQLRUserGroup and 20 new Windows accounts, MSSQLServer01 through 20. If you’re running a named instance the names will be SQLRUserGroup<instance> and <instance>01 through 20, respectively. If you need more than 20 accounts you can add more on the Advanced tab for the properties of the SQL Launchpad service. You can also specify if you need the passwords changed based on a password policy. The passwords themselves are generated by keys inside the RegisterRext.exe.config file. If you connect to R Server from a client you’ll also need to grant database access to the Windows group, it doesn’t have and database rights by default.

 

There’s a few ways to see which account is running. You can query the R environment, like in the following example.

image

You can also use the built in dynamic management views. Sys.dm_external_script_requests will show the accounts that any currently running R code are using.

image

This solves the issue in my post last week, about not being able to access a file stored on a network share using R code inside a SQL 2016 stored procedure. In my case I just had to extend the read permissions on the share. But it’s good to know when you run any external scripts.

References

Modify the User Account Pool for SQL Server R Services – MSDN
https://msdn.microsoft.com/en-us/library/mt591988.aspx

Monitoring Reports for R on SQL Server 2016 – Ginger Grant
http://www.desertislesql.com/wordpress1/?p=1549

Permissions question in R and SQL 2016
https://www.reddit.com/r/SQLServer/comments/54lfz7/permissions_question_in_r_sql_server_2016/

 

Can’t Read text file on Network Share Using R Inside Stored Procedure

The Problem

I can’t seem to read from a .txt file when I use read.csv inside a SQL 2016 stored procedure using R if that .txt file is on a network shared directory.

The Setup

I am running a personal domain in my home lab. The domain controller is running on an old pc I’ve repurposed to run Windows 2012 R2. On that server I run virtual machines using Hyper-V, one of which is a domain server with SQL 2016 Developer Edition installed. My desktop pc and laptop run Windows 10 and are members of my home domain. Both also have SQL 2016 Developer as well as R 3.3.1, RStudio 1.0.44 and R Tools for Visual Studio 0.4. In all cases SQL is configured to run external scripts.

On my domain controller I’m running a weather station program that writes to a comma delimited log file. There’s a separate file for each month. Each file has a row added every 10 minutes, and each row has 27 columns . There’s a separate text file that contains header information for the logs delimited by the pipe (|) character.

The Scenario

What I’m trying to do should be pretty easy. I want to use R code inside a stored procedure to read all the log files created by my weather program and store the results in a database. I also want to read the current monthly file on a regular basis, at least once a day. Once the data is in the database I’ll create some mobile reports with data and charts I can read on my phone. I’ll also be able to use my own data to play with local weather predicting. I thought it would make a pretty cool demo for my R sessions.

However when I run my stored procedure to read the logs I receive an error that there is no such file if I map the share as a drive, or I’m using an invalid parameter if I try to access the share directly.

What is even more confusing is that the same R code works if I run it from any other R interface; R console, Microsoft R Open console, RStudio, or RTVS.Since I can read the files in other ways from my client I’m ruling out permission or firewall issues.

The Code

The following code examples work in RStudio, R console, MS R Open console, and RTVS when run on the client. This connects to the share…

file_name <- paste(as.character(Sys.Date(), format = "%b%y"),"log.txt", sep="")
weather <- read.csv(file.path("//PERTELLVM01/Cumulus/Data", file_name),stringsAsFactors = FALSE, header=FALSE)
weathernames <- read.csv(file.path("//PERTELLVM01/Cumulus", "monthlyfileheader.txt"),sep = "|", header = TRUE)
names(weather) <- names(weathernames)
weather[,1] <- as.POSIXct(weather[,1], "%d/%m/%y", tz = "")
head(weather[, c(1:6)])

…and this connects when the share is mapped…

file_name <- paste(as.character(Sys.Date(), format = "%b%y"),"log.txt", sep="")
weather <- read.csv(file.path("w:/Data", file_name),
stringsAsFactors = FALSE, header=FALSE)
weathernames <- read.csv(file.path("w:", "monthlyfileheader.txt"),
sep = "|", header = TRUE)
names(weather) <- names(weathernames)
weather[,1] <- as.POSIXct(weather[,1], "%d/%m/%y", tz = "")
head(weather[, c(1:6)])

…and both return the correct data.

  Date..dd.mm.yy.  Time Temperature Humidity Dew.point Wind.speed
1      2016-11-01 00:00        59.7       67      48.8        9.2
2      2016-11-01 00:10        59.7       67      48.8        4.5
3      2016-11-01 00:20        59.9       68      49.3        5.4
4      2016-11-01 00:30        60.1       68      49.5        1.6
5      2016-11-01 00:40        60.3       69      50.1        3.8
6      2016-11-01 00:50        60.6       69      50.4        5.4

Now here’s the SQL statement running the R code when the share is mapped.

DECLARE @RScript NVARCHAR(MAX) = N'
file_name <- paste(as.character(Sys.Date(), format = "%b%y"),"log.txt", sep="")
weather <- read.csv(file.path("w:/Data", file_name),
stringsAsFactors = FALSE, header=FALSE)
weathernames <- read.csv(file.path("w:", "monthlyfileheader.txt"),
sep = "|", header = TRUE)
names(weather) <- names(weathernames)
weather[,1] <- as.POSIXct(weather[,1], "%d/%m/%y", tz = "")
OutputDataSet <- head(weather[, c(1:6)])'

EXECUTE sp_execute_external_script
@language = N'R',
@script = @RScript
WITH RESULT SETS (([Date] DATE, [Time] TIME(0), Temperature NUMERIC(4, 1), Humidity INT,
DewPoint NUMERIC(5, 1), WindSpeed NUMERIC(3, 1)));

image

Slightly different message when trying to access the share directly.

image

The Wrap Up

Now there are other ways I can import the data; copying the files I need to my client before running my R stored procedures or using SSIS packages to do all the importing just to name two. But I should be able to access the file directly with my procedure.

I’m curious if anyone else has run into this issue, and how they’ve solved it. Leave a comment if you can offer any insights. I’ve copied some sample files which you can access at https://github.com/jayape/Sample-Data.

Slides and Code from Chi Suburban SQL Group – R You Ready For SQL 2016?

Hi everyone,

Thanks to all who attended my session on R and SQL 2016 last night at the Chicago Suburban SQL User Group. And thanks to Andy and Lowry for inviting me. I had a great time, meeting everyone and demoing my favorite new feature in SQL 2016, using R inside SQL.

I’ve sent the slide deck and code I used in my demos to Andy for distribution to the group. I’ve also pushed them out to a github repository, at https://github.com/jayape/R-and-SQL. Unfortunately I can’t copy the database. But the one I use is just a copy of a ReportServer database from a SQL instance running SSRS. The code will work by just changing the name of the database. If you don’t have access to a ReportServer database leave me a comment and I’ll create a small one.

Now that my vacation and speaking obligations are over I’m going to start writing again. My next post will be covering what changed between the pre and post releases of SQL 2016, at least as far as R Services goes. After that I’ll cover using R Services without the SQL part.

SQL 2016 Release Day

Just a real quick post while I’m waiting for SQL 2016 RTM to show up as a download on MSDN.

You can install SQL 2016 in an Azure VM. There’s templates there to get you up and running on a default instance without waiting. Most all features are installed by default, with the exception of R Services, Polybase, Reporting Services Share Point add-ins, and Distributed Relay components.

Speaking at #SQLSaturdayIowa in June

#SQLSaturdayIowa is just about a month away, on June 11th. This year I’m honored to have been chosen to present my session, R You Ready For SQL 2016? You can learn more about #SQLSaturdayIowa on the website.

SQL Saturdays are free one day events held in different cities around the world, usually more than 1 per week. When you sign up you’ll see many of the same sessions and speakers you’d see at the bigger SQL PASS Summit. There’s content for every level, beginner to advanced, on development or admin topics. And now is a great time to attend, with SQL 2016 being released on June 1st. This will be a good way to see many of the new features in the new release. Check out and mark the sessions you’re interested in.

SQL Saturday events also include pre-conference sessions, full day events where you can get extra training on a variety of topics. And while these sessions cost $125, that’s a cheap price for an all day session with an expert. Iowa City has 5 pre-cons this year:

  • Ben Miller has 2 sessions; Powershell for the DBA in 0 –60 and Powershell for the DBA in 60 – 120. If you have to administer a number of SQL instances you’ve probably been looking for a way of simplifying everyday tasks. Powershell is probably what you need.
  • Kathi Kellenberger is hosting a Query Tuning Workshop. Query tuning is something a lot of us do on a day to day basis. Wouldn’t it be nice to learn how to do it effectively?
  • David Klee’s topic is SQL Server Infrastructure Tuning for Availability and Performance. The hardware SQL runs on is a magic box, something I for one don’t know much about. If you want to find out about the SAN your database is running on, or how to set up SQL in a virtual environment, this session is for you.
  • Bill Pearson has a session on Getting Started with Power BI. Data analytics and visualizations are very hot topics now. This is a great session to find out about Power BI, both the service and desktop versions, and how to gather data from different sources to create some neat looking dashboards.

Speaking of new features, my session is on one of them, running R inside of SQL. This will be the third SQL Saturday I’ve done this session at; I also presented in Chicago and Madison. I originally put the session together from a SQL standpoint, showing how you can incorporate R inside stored procedures. But I received a lot of great feedback from people who wanted to learn more about R itself. So for this session I’m including some R basics, just to show some of what you can do.

I’m looking forward to meeting everyone in Iowa City on June 11th. Stop in at my session and say hi!

Updating

Hello, all.

No, I didn’t stop blogging. I’ve run into an issue with my site’s host that doesn’t allow me to post remotely. And I’m kind of waiting until SQL 2016 comes out on June 1st. But until then, here’s what’s new.

I presented a session on R and SQL 2016 at SQL Saturdays in Chicago and Madison, and I’ll be repeating it at SQL Saturday in Iowa City on June 11th (SQLSaturday Iowa City). I’ll have a separate post on that in the next week or so. Iowa is where I attended my first SQL Saturday event so this will be kind of like a homecoming for me. I’m really excited to be returning. Go ahead and sign up now, there’s a great list of speakers. And checkout the pre-cons, a full day of SQL training at a VERY reasonable cost.

A little bit on R in SQL 2016. When I presented in Chicago I used CTP3. In Madison I upgraded to RC2. I’ve skipped RC3, mainly because the RTM release is just a few weeks away. But RC2 had some big improvements, at least from the installation side. It included the installation of Microsoft R Open and Microsoft R Server. So no more hunting down the correct version of each. I’m looking forward to using all of the new R features.

Until next time,
John

Error When Installing Microsoft R Open 3.2.3 and MKL

This post covers a problem I ran into when installing and running MRO 3.2.3 and MKL, and a workaround if you have the same issue.

Over the weekend I was setting up a new workstation to use in an upcoming session. The OS is a 64 bit virtual machine running Windows 8.1, hosted on my laptop.The workstation has no software installed and it has all patches applied as of January 30th of this year.

The first program I installed was the new Microsoft R Open, version 3.2.3. I downloaded my copy from the Microsoft R Application Network. If you’re not familiar with it yet, MRAN is the new portal for Microsoft’s R software. After installing R Open I downloaded and installed the Math Kernel Library. You need the MKL if you want to use multi-threading in your R scripts. Finally I installed R Studio.

Opening R Studio for the first time gave me the following error.

“The program can’t start because MSVCR120.dll is missing from your computer. Try reinstalling the program to fix this problem.”

The file referenced is the Visual C++ Redistributable library and definitely should not have been missing.

I restored my virtual machine from an earlier checkpoint (I love checkpoints in Hyper-V!) and tried again. This time I ran MRO after installing it but before I installed the MKL. No problems, so I installed MKL and reran MRO. And the error came up again. So it’s definitely something with this version of the MKL.

Fortunately there’s an easy fix. There is an open user forum you can see at Google groups. This issue is near the top. Scroll to near the bottom of the thread and you’ll see that the solution is to reinstall the file. You can get the package from https://www.microsoft.com/en-us/download/details.aspx?id=40784.

Configuring RLauncher.config for SQL Server Launchpad

I’ve been playing with R in SQL 2016 CTP3 lately to find out how it works. One of the components is the RLauncher.config file that is used by the SQL Server Launchpad service. This is the service that runs the Advanced Analytics Extensions.

I wrote previously on changing the working directory for R session here. Today I’m going to look at the rest of the options.

  • MEMORY_LIMIT_PERCENT – this is the maximum amount of memory available to all R sessions on the server. The default is 20, with the range being 0 to 100. It’s recommended to not set it to 100% to leave memory for other processes.
  • TRACE_LEVEL – specifies how verbose the logs for the Launchpad are. The default is 1 (for errors), other values are 2 (warnings) and 3 (information).
  • USER_POOL_SIZE – Controls the maximum number of user pool accounts that can launch R sessions. This is a read only setting and the default is 10. Re-run RegisterRExt.exe to change the value.But I’ve seen on other MSDN pages that the default value is 20. And I don’t see any argument in RegisterRExe to change the value. There are 2 commands, /install and /uninstall, and 3 arguments: instance, user, and password. But this is beta software so this will most likely change.

There are other settings, but MSDN says to leave them alone.

  • JOB_CLEANUP_ON_EXIT – For debugging, cleans up when sessions are completed. The default value is 1 (enabled), other value is 0 (disabled).
  • LOG_DIRECTORY – where the logs are stored, by default <instance path>\MSSQL\Log.
  • MPI_HOME – directory for Microsoft MPI binaries. The default is C:\Program Files\Microsoft MPI
  • RHOME – directory for RRO files. Default is C:\Program Files\RRO\RRO-3.2.2-for-RRE-7.5.0\R-3.2.2.
  • WORKING_DIRECTORY – root directory of all temporary working directories for R sessions. The default is <instance path>\MSSQL\ExtensibilityData.

Remember that this is still just CTP bits. Anything and everything can change before SQL 2016 goes live.

Microsoft R Open and RTVS

A few little tidbits you may have missed from last week’s Microsoft R Server announcement.

  • There will be an R Tools for Visual Studio plug-in, similar to the Python Tools for Visual Studio. It will contain Intellisense, syntax-aware editing, debugging, and a command-line REPL. This add-in won’t give you anything you don’t already have in R Studio but you may be more comfortable here if you already do a lot of development in Visual Studio. There’s no release date yet but it’s probably 2 – 3 months out. There’s a preview on YouTube that you can watch here.
  • Revolution R Open is now Microsoft R Open.This is what you’ll need to install server-side to run R inside of SQL 2016. It will be updated on January 19th to include R 3.2.3 (the latest version of R).
  • You can get Microsoft R Open at the MRAN site (Microsoft R Applications Portal).It’s similar to CRAN (Comprehensive R Archive Network). You can browse all the available packages, over 7700 the last time I looked.

Some Random Thoughts to begin 2016

Hello everyone, and welcome to 2016. Well, a few days late anyways.

Currently I’m working with using R inside of SQL 2016. I’m putting together notes for both future blog posts and a new session I hope to present at SQL events later this year. This post is just some random thoughts that have popped into my brain that don’t fit into full posts.

  • As of January 4th there are new downloads for Microsoft R Server on the MSDN Subscriptions site as well as RRE for Windows. Different flavors for Hadoop and Terradata Db on Red Hat and SUSE Linux. I hadn’t heard of these before seeing a tweet about them. I’m not seeing a whole lot of information about them yet but it may just be rebranding after Microsoft acquired Revolution Analytics last year. Here’s a little blurb about it on ZDNet, and a description on MSDN.

“Microsoft R Server is a fast and cost-effective enterprise-class big data advanced analytics platform supporting a variety of big data statistics,  predictive modeling and machine learning capabilities. Microsoft R Server includes Open Source R and is fully compatible with R scripts, functions and CRAN packages, and offers a variety of analytics capabilities including exploratory data analysis, model building and model deployment.”

  • The version of R released with RRE Open was 3.2.2, at least when I grabbed it from the Revolution Analytics site after SQL 2016 CTP3 was released. The current version of R, if you download it from the CRAN site, is 3.2.3. I know it’s not a major release, just some minor enhancements. But it made me think about how Microsoft will handle updates to R. Will they be included in future SQL service packs or do you have to patch them yourself?

 

  • SQL 2005 will reach the end of it’s life cycle on April 12th 2016. I know I’m not fully ready for it but I’m working on it. My issue is with old applications that will break if we upgrade. We’re upgrading those apps now but we’ll still need to evaluate and decide what version of SQL we should move to. I’m also wondering how many 2005 or earlier instances are still running in production with no plans for upgrading? And if you’re running SQL 2008 it’s probably not a bad idea to start planning now.