Question: [Scala/Scalding]: map ID to name

Stack Overflow user

Asked 2014-04-16 08:23:41

Answers: 1 | Views: 215 | Followers: 0 | Votes: 0

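I am fairly new to Scalding, and I am trying to write a Scalding program that takes two datasets as input: 1) book_id_title: ('id, 'title), which contains the mapping between book IDs and book titles, both strings; 2) book_sim: ('id1, 'id2, 'sim), which contains the similarity between pairs of books, identified by their IDs.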

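The goal of the Scalding program is to replace each (id1, id2) pair in book_sim with the respective titles by looking up the book_id_title table. However, I am unable to retrieve the title. I would appreciate it if someone could help me with the getTitle() function below.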

My Scalding code is as follows:

  // read in the mapping between book id and title from a csv file
  val book_id_title =
    Csv(book_file, fields = book_format)
      .read
      .project('id, 'title)

  // read in the similarity data from a csv file and map the ids to the titles
  // by calling getTitle function
  val result =
    book_sim
      .map(('id1, 'id2) -> ('title1, 'title2)) {
        pair: (String, String) => (getTitle(pair._1), getTitle(pair._2))
      }
      .write(out)

  // function that searches for the id and retrieves the title
  def getTitle(search_id: String) = {
    val btitle =
      book_id_title
        .filter('id) { id: String => id == search_id } // extract row matching the id
        .project('title) // get the title
  }

Thanks

map

filter

scalding

1 Answer

Stack Overflow user

Answered 2014-04-16 12:49:46

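Hadoop is a batch-processing system, and there is no way to look up data by index. Instead, you need to join book_id_title and book_sim on the ID, probably twice: once for the left ID and once for the right ID. Something like: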

book_sim.joinWithSmaller('id1 -> 'id, book_id_title).joinWithSmaller('id2 -> 'id, book_id_title)

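I am not very familiar with the fields-based API, so treat the code above as pseudocode. You will also need to add the appropriate projections. Hopefully it still gives you the idea.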
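For what it is worth, here is a minimal sketch of how the two joins and the projections might look as a complete fields-based job. It is not from the original answer. The job name BookTitleJoinJob, the argument names (books, sims, output), the intermediate field names ('leftId, 'rightId) and the Tsv sink are all assumptions made for illustration. The lookup table is renamed once per join so that its fields do not collide with each other or with the fields already in book_sim.

```scala
import com.twitter.scalding._

// Hypothetical job: argument names and the Tsv sink are assumptions, not from the question.
class BookTitleJoinJob(args: Args) extends Job(args) {

  // (id, title) lookup table, as described in the question
  val bookIdTitle =
    Csv(args("books"), fields = ('id, 'title))
      .read
      .project(('id, 'title))

  // (id1, id2, sim) similarity pairs
  val bookSim =
    Csv(args("sims"), fields = ('id1, 'id2, 'sim))
      .read

  // One renamed copy of the lookup table per join, so field names stay distinct
  val leftTitles  = bookIdTitle.rename(('id, 'title) -> ('leftId, 'title1))
  val rightTitles = bookIdTitle.rename(('id, 'title) -> ('rightId, 'title2))

  bookSim
    .joinWithSmaller('id1 -> 'leftId, leftTitles)   // attach the title for id1
    .joinWithSmaller('id2 -> 'rightId, rightTitles) // attach the title for id2
    .project(('title1, 'title2, 'sim))              // keep only titles and similarity
    .write(Tsv(args("output")))
}
```

Note that joinWithSmaller performs an inner join by default, so any pair whose id1 or id2 has no entry in book_id_title is dropped from the output. If the title table is very small, joinWithTiny may be a better fit, but that depends on your data.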

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/23096837
