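Goal
Track an R model on a remote MLflow tracking server (running in Kubernetes). The model is developed on a local machine, in RStudio running inside a Docker container.
Setup
From my research, I need to build an RStudio image with conda installed. After that, I want to run the example from the MLflow documentation.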
Dockerfile
FROM rocker/rstudio
USER root
ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"
ENV MLFLOW_BIN=/root/miniconda3/bin/mlflow
ENV MLFLOW_PYTHON_BIN=/root/miniconda3/bin/python
RUN apt-get update
RUN apt-get install -y wget && rm -rf /var/lib/apt/lists/*
RUN wget \
https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& mkdir /root/.conda \
&& bash Miniconda3-latest-Linux-x86_64.sh -b \
&& rm -f Miniconda3-latest-Linux-x86_64.sh
RUN conda --version
RUN R -e 'install.packages("mlflow")'
RUN R -e 'install.packages("glmnet")'
RUN R -e 'install.packages("carrier")'
RUN pip install -U mlflow==1.19.0
train.R (example adapted from here)
# The data set used in this example is from http://archive.ics.uci.edu/ml/datasets/Wine+Quality
# P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
# Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.
library(mlflow)
library(glmnet)
library(carrier)
set.seed(40)
# Read the wine-quality csv file
data <- read.csv("wine-quality.csv")
# Split the data into training and test sets. (0.75, 0.25) split.
sampled <- sample(1:nrow(data), 0.75 * nrow(data))
train <- data[sampled, ]
test <- data[-sampled, ]
# The predicted column is "quality" which is a scalar from [3, 9]
train_x <- as.matrix(train[, !(names(train) == "quality")])
test_x <- as.matrix(test[, !(names(train) == "quality")])
train_y <- train[, "quality"]
test_y <- test[, "quality"]
alpha <- mlflow_param("alpha", 0.5, "numeric")
lambda <- mlflow_param("lambda", 0.5, "numeric")
Sys.setenv(MLFLOW_S3_ENDPOINT_URL="<EP>")
Sys.setenv(AWS_ACCESS_KEY_ID="<some_key>")
Sys.setenv(AWS_SECRET_ACCESS_KEY="<some_secret>")
mlflow_set_experiment("Wine R experiment")
mlflow_set_tracking_uri("<http...blabla>")
with(mlflow_start_run(), {
  model <- glmnet(train_x, train_y, alpha = alpha, lambda = lambda, family = "gaussian", standardize = FALSE)
  predictor <- crate(~ glmnet::predict.glmnet(!!model, as.matrix(.x)), !!model)
  predicted <- predictor(test_x)
  rmse <- sqrt(mean((predicted - test_y) ^ 2))
  mae <- mean(abs(predicted - test_y))
  r2 <- as.numeric(cor(predicted, test_y) ^ 2)
  message("Elasticnet model (alpha=", alpha, ", lambda=", lambda, "):")
  message(" RMSE: ", rmse)
  message(" MAE: ", mae)
  message(" R2: ", r2)
  mlflow_log_param("alpha", alpha)
  mlflow_log_param("lambda", lambda)
  mlflow_log_metric("rmse", rmse)
  mlflow_log_metric("r2", r2)
  mlflow_log_metric("mae", mae)
  mlflow_log_model(predictor, "model")
})
This is a POC, so I don't mind the insecure environment variables.
I run the container like this:
docker run --rm -p 8787:8787 -e PASSWORD=password --mount type=bind,source=$(pwd)/mlflow/examples/r_wine,target=/home/rstudio rstudio-mlflow
Problem
Every time I run the file from RStudio, the last line (mlflow_log_model(predictor, "model")) fails with this error:
cannot start processx process '/root/miniconda3/bin/mlflow' (system error 13, Permission denied) @unix/processx.c:608 (processx_exec)
Listing the conda bin folder from the RStudio terminal also gives "Permission denied". Can you help me install conda correctly in the RStudio image?
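Posted on 2021-08-29 12:43:34
RStudio is installed with a local user, rstudio. The Dockerfile is here. The rstudio user has no permissions on the /root/ folder, which is why you get the "Permission denied" error when you try to execute mlflow from RStudio.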
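To confirm which part of the path is the obstacle, a small shell check can walk the path from the root down and report the first component the current user cannot enter or execute. This is only a minimal sketch: first_blocked is a hypothetical helper name (not part of MLflow, conda, or the rocker image), and it assumes an absolute path and a POSIX shell.

```shell
#!/bin/sh
# first_blocked (hypothetical helper): print the first component of an
# absolute path that the current user cannot traverse (directory) or
# execute (file). Returns 0 if every component is accessible, 1 otherwise.
first_blocked() {
    rest=${1#/}      # remaining path, without the leading slash
    current=""       # prefix checked so far
    while [ -n "$rest" ]; do
        part=${rest%%/*}              # next component
        if [ "$part" = "$rest" ]; then
            rest=""                   # that was the last component
        else
            rest=${rest#*/}           # drop the component and its slash
        fi
        [ -z "$part" ] && continue    # skip empty parts from "//"
        current="$current/$part"
        # -x means "searchable" for directories and "executable" for files;
        # it is also false when the component does not exist.
        if [ ! -x "$current" ]; then
            printf '%s\n' "$current"
            return 1
        fi
    done
    return 0
}
```

Run from the RStudio terminal (that is, as the rstudio user) with first_blocked /root/miniconda3/bin/mlflow. If /root is the component that user cannot enter, the function prints /root and returns 1. Against an install location the user can reach, it prints nothing and returns 0.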
The following Dockerfile works fine.
FROM rocker/rstudio
RUN useradd -g rstudio -m conda
USER conda
WORKDIR /home/conda
ENV PATH="/home/conda/miniconda3/bin:${PATH}"
ARG PATH="/home/conda/miniconda3/bin:${PATH}"
ENV MLFLOW_BIN=/home/conda/miniconda3/bin/mlflow
ENV MLFLOW_PYTHON_BIN=/home/conda/miniconda3/bin/python
RUN wget \
https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& bash Miniconda3-latest-Linux-x86_64.sh -b \
&& rm -f Miniconda3-latest-Linux-x86_64.sh
RUN conda --version
USER root
RUN R -e 'install.packages("mlflow")'
RUN R -e 'install.packages("glmnet")'
RUN R -e 'install.packages("carrier")'
RUN pip install -U mlflow==1.19.0
https://stackoverflow.com/questions/68665324