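I'd like to know whether anyone has run into the same problem before.
I'm trying to extract the task list for a job through the history server's REST API. However, I only get 20 rows back, whereas the Spark UI shows all of the tasks (more than 100). I've attached a screenshot and the logs from the history server.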


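In the screenshot you can see that 121 tasks are displayed in the UI (for lack of space I won't attach a screenshot of all 121 tasks), but when I query the REST API I get only 20 rows, no matter which tool I use.
Here is the history server log: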
16/04/15 09:23:00 INFO history.HistoryServer: Registered signal handlers for [TERM, HUP, INT]
16/04/15 09:23:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/15 09:23:01 INFO spark.SecurityManager: Changing view acls to: abrandon
16/04/15 09:23:01 INFO spark.SecurityManager: Changing modify acls to: abrandon
16/04/15 09:23:01 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(abrandon); users with modify permissions: Set(abrandon)
16/04/15 09:23:01 INFO history.FsHistoryProvider: Replaying log path: file:/tmp/spark-events/application_1460638681315_0004
16/04/15 09:23:01 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/04/15 09:23:01 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:18080
16/04/15 09:23:01 INFO util.Utils: Successfully started service on port 18080.
16/04/15 09:23:01 INFO history.HistoryServer: Started HistoryServer at http://172.16.100.1:18080
16/04/15 09:23:02 INFO history.FsHistoryProvider: Replaying log path: file:/tmp/spark-events/application_1460638681315_0005
16/04/15 09:23:03 INFO history.FsHistoryProvider: Replaying log path: file:/tmp/spark-events/application_1460638681315_0002
16/04/15 09:23:03 INFO history.FsHistoryProvider: Replaying log path: file:/tmp/spark-events/application_1460638681315_0008
16/04/15 09:23:03 INFO history.FsHistoryProvider: Replaying log path: file:/tmp/spark-events/application_1460638681315_0001
16/04/15 09:23:03 INFO history.FsHistoryProvider: Replaying log path: file:/tmp/spark-events/application_1460638681315_0006
16/04/15 09:23:03 INFO history.FsHistoryProvider: Replaying log path: file:/tmp/spark-events/application_1460638681315_0007
16/04/15 09:23:03 INFO history.FsHistoryProvider: Replaying log path: file:/tmp/spark-events/application_1460638681315_0003
16/04/15 09:23:22 INFO spark.SecurityManager: Changing view acls to: abrandon
16/04/15 09:23:22 INFO spark.SecurityManager: Changing modify acls to: abrandon
16/04/15 09:23:22 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(abrandon); users with modify permissions: Set(abrandon)
16/04/15 09:23:22 INFO history.FsHistoryProvider: Replaying log path: file:/tmp/spark-events/application_1460638681315_0007
16/04/15 09:23:22 INFO spark.SecurityManager: Changing acls enabled to: false
16/04/15 09:23:22 INFO spark.SecurityManager: Changing admin acls to:
16/04/15 09:23:22 INFO spark.SecurityManager: Changing view acls to: abrandon
16/04/15 09:26:44 INFO core.PackagesResourceConfig: Scanning for root resource and provider classes in the packages:
org.apache.spark.status.api.v1
16/04/15 09:26:48 INFO core.ScanningResourceConfig: Root resource classes found:
class org.apache.spark.status.api.v1.ApiRootResource
16/04/15 09:26:48 INFO core.ScanningResourceConfig: Provider classes found:
class org.apache.spark.status.api.v1.JacksonMessageWriter
16/04/15 09:26:48 INFO application.WebApplicationImpl: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
16/04/15 09:26:49 WARN inject.Errors: The following warnings have been detected with resource and/or provider classes:
WARNING: A sub-resource method, public scala.collection.Seq org.apache.spark.status.api.v1.OneStageResource.stageData(int), with URI template, "", is treated as a resource method

Answered 2016-04-15 12:53:41
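It doesn't seem to be documented, but the taskList endpoint paginates its results to keep responses from getting too large. The default page size is 20, as you can see in the source code: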
@GET
@Path("/{stageAttemptId: \\d+}/taskList")
def taskList(
    @PathParam("stageId") stageId: Int,
    @PathParam("stageAttemptId") stageAttemptId: Int,
    @DefaultValue("0") @QueryParam("offset") offset: Int,
    @DefaultValue("20") @QueryParam("length") length: Int,
    @DefaultValue("ID") @QueryParam("sortBy") sortBy: TaskSorting): Seq[TaskData] = {
  withStageAttempt(stageId, stageAttemptId) { stage =>
    val tasks = stage.ui.taskData.values.map{AllStagesResource.convertTaskData}.toIndexedSeq
      .sorted(OneStageResource.ordering(sortBy))
    tasks.slice(offset, offset + length) // <--- here!
  }
}
So either use the offset parameter to fetch the next page, or add ?length=200 to the URL to get everything in a single page.
(I haven't tried it myself, though.)
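For illustration, here is a rough Python sketch of both options: building a taskList URL with explicit offset and length values, and walking the pages until a short one comes back. It is untested against a real history server. The helper names (task_list_url, fetch_all_tasks, http_fetch_page) are made up for this example, the URL layout assumes the usual /api/v1/applications/<app-id>/stages/<stage-id>/<attempt-id>/taskList path, and the response is assumed to be a JSON array of task objects.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def task_list_url(base_url, app_id, stage_id, attempt_id, offset=0, length=20):
    """Build a taskList URL with explicit paging parameters.

    Hypothetical helper: the path layout is an assumption about the
    history server's /api/v1 REST routes.
    """
    path = (f"{base_url.rstrip('/')}/api/v1/applications/{app_id}"
            f"/stages/{stage_id}/{attempt_id}/taskList")
    return f"{path}?{urlencode({'offset': offset, 'length': length})}"


def fetch_all_tasks(fetch_page, page_size=20):
    """Collect all tasks by requesting consecutive pages.

    fetch_page(offset, length) must return a list of tasks. The loop
    stops at the first page shorter than page_size, which mirrors the
    tasks.slice(offset, offset + length) behaviour on the server side.
    """
    tasks = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        tasks.extend(page)
        if len(page) < page_size:
            return tasks
        offset += page_size


def http_fetch_page(base_url, app_id, stage_id, attempt_id):
    """Return a fetch_page callable that queries the server over HTTP.

    Assumes the endpoint answers with a JSON array of task objects.
    """
    def fetch_page(offset, length):
        url = task_list_url(base_url, app_id, stage_id, attempt_id, offset, length)
        with urlopen(url) as resp:
            return json.load(resp)
    return fetch_page
```

With the server from the log above, a call might look like fetch_all_tasks(http_fetch_page("http://172.16.100.1:18080", "application_1460638681315_0007", 0, 0)), where stage 0 and attempt 0 are placeholders for whichever stage you are inspecting. Passing the fetcher in as a callable keeps the paging loop independent of the HTTP client, so the same loop works with any tool.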
https://stackoverflow.com/questions/36644214