How to generate n-grams in Scala?

Stack Overflow user

Asked on 2011-11-24 14:55:06

3 answers, 5.5K views, 0 followers, 8 votes

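I am trying to implement the n-gram-based Dissociated Press algorithm in Scala. How can I generate n-grams for large files? For example, for a file containing "the bee is the bee of the bees":

1. First, it has to pick a random n-gram, for example "the bee".
2. Then it has to look for n-grams that start with the last (n-1) words of that n-gram, for example "bee of".
3. It prints the last word of that n-gram, and then repeats.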

Could you give me some hints on how to do this? Sorry for the inconvenience.

Tags: scala, n-gram
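The three steps in the question describe the Dissociated Press random walk over word n-grams. Below is a minimal sketch of that walk, assuming whitespace-separated words and n >= 2; `DissociatedPressSketch`, `nGrams`, `buildTable` and `generate` are hypothetical names, not taken from the question or the answers. It precomputes a map from each (n-1)-word prefix to the words that follow it, so step 2 becomes one map lookup instead of a scan over all n-grams.

```scala
import scala.annotation.tailrec
import scala.util.Random

// Hypothetical sketch of the Dissociated Press walk described in the question.
object DissociatedPressSketch {

  // All complete n-word windows; sliding(n) returns one short window when
  // words.length < n, which is filtered out here.
  def nGrams(words: Vector[String], n: Int): Vector[Vector[String]] =
    words.sliding(n).filter(_.length == n).toVector

  // Maps each (n-1)-word prefix to every word that follows it in the text.
  def buildTable(words: Vector[String], n: Int): Map[Vector[String], Vector[String]] = {
    require(n >= 2, "n must be at least 2")
    nGrams(words, n)
      .groupBy(_.init)
      .map { case (prefix, grams) => prefix -> grams.map(_.last) }
  }

  // Step 1: start from a random n-gram. Steps 2-3: look up the n-grams whose first
  // n-1 words equal the last n-1 words emitted so far, append the last word of a
  // randomly chosen one, and repeat. Stops early at a prefix no n-gram starts with.
  def generate(words: Vector[String], n: Int, steps: Int, rng: Random): Vector[String] = {
    val table = buildTable(words, n)
    val grams = nGrams(words, n)

    @tailrec
    def loop(out: Vector[String], remaining: Int): Vector[String] =
      if (remaining <= 0) out
      else
        table.get(out.takeRight(n - 1)) match {
          case Some(next) => loop(out :+ next(rng.nextInt(next.length)), remaining - 1)
          case None       => out
        }

    if (grams.isEmpty) Vector.empty
    else loop(grams(rng.nextInt(grams.length)), steps)
  }

  def main(args: Array[String]): Unit = {
    val words = "the bee is the bee of the bees".split(" ").toVector
    println(generate(words, n = 2, steps = 10, rng = new Random(42)).mkString(" "))
  }
}
```

Passing a seeded `scala.util.Random` makes a run reproducible. When the current prefix has no continuation (with n = 2, the final word "bees" never starts a bigram), this sketch simply stops; restarting from a new random n-gram would be an equally reasonable choice.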

3 answers

Stack Overflow user

Answered on 2011-11-24 15:08:46

Your question could be a bit more specific, but here is my attempt.

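val words = "the bee is the bee of the bees"
// sliding(2) yields every window of two consecutive words (the bigrams);
// mkString(" ") keeps the words apart (plain mkString would print "thebee")
words.split(' ').sliding(2).foreach(p => println(p.mkString(" ")))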

Votes: 14
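The one-liner in the answer above only produces bigrams. A hedged generalization to any n is sketched below; `SlidingNGrams` and `ngrams` are hypothetical names. One detail worth handling: `sliding(n)` on a collection with fewer than n elements returns the whole collection as a single, shorter window, so the sketch keeps only windows that are exactly n words long. It also splits on runs of whitespace rather than a single space.

```scala
// Hypothetical helper generalizing the sliding(2) one-liner to any n.
object SlidingNGrams {
  // Word n-grams joined with single spaces. sliding(n) yields one short window
  // when the input has fewer than n words, so that window is filtered out.
  def ngrams(text: String, n: Int): List[String] = {
    require(n >= 1, "n must be positive")
    text
      .split("\\s+")
      .filter(_.nonEmpty) // a leading blank produces an empty first token
      .sliding(n)
      .filter(_.length == n)
      .map(_.mkString(" "))
      .toList
  }

  def main(args: Array[String]): Unit = {
    ngrams("the bee is the bee of the bees", 2).foreach(println)
    println(ngrams("bee", 2)) // List(): a single word has no bigrams
  }
}
```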

Stack Overflow user

Answered on 2013-05-24 09:58:58

You could try an approach parameterized by n:

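val words = "the bee is the bee of the bees"
val w = words.split(" ")

val n = 4
// For each size i in 1..n, slide a window of i words over the input,
// turn every window into a List, and concatenate the results.
val ngrams = (1 to n).flatMap(i => w.sliding(i).map(_.toList))
ngrams foreach println

Output:
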
List(the)
List(bee)
List(is)
List(the)
List(bee)
List(of)
List(the)
List(bees)
List(the, bee)
List(bee, is)
List(is, the)
List(the, bee)
List(bee, of)
List(of, the)
List(the, bees)
List(the, bee, is)
List(bee, is, the)
List(is, the, bee)
List(the, bee, of)
List(bee, of, the)
List(of, the, bees)
List(the, bee, is, the)
List(bee, is, the, bee)
List(is, the, bee, of)
List(the, bee, of, the)
List(bee, of, the, bees)

Votes: 5
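The output above contains repeated n-grams; for example, `List(the, bee)` appears twice. If frequencies are needed, the same n-grams can be counted with `groupBy`. The following is a sketch under that assumption, with hypothetical names `NGramCounts`, `allNGrams` and `counts`; it uses `groupBy(identity)` rather than Scala 2.13's `groupMapReduce` so that it also compiles on 2.12.

```scala
// Hypothetical helper: count how often each n-gram (of every size 1..n) occurs.
object NGramCounts {
  // Same n-grams as the answer above; windows shorter than i are dropped.
  def allNGrams(words: Seq[String], n: Int): Seq[List[String]] =
    (1 to n).flatMap(i => words.sliding(i).filter(_.length == i).map(_.toList))

  // Groups identical n-grams together and keeps only the size of each group.
  def counts(words: Seq[String], n: Int): Map[List[String], Int] =
    allNGrams(words, n).groupBy(identity).map { case (gram, occurrences) => gram -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val words = "the bee is the bee of the bees".split(" ").toSeq
    // Print only the n-grams that occur more than once, most frequent first.
    counts(words, 4).filter(_._2 > 1).toSeq.sortBy(-_._2).foreach(println)
  }
}
```

For the sample sentence, `counts(words, 4)` maps `List(the)` to 3 and both `List(bee)` and `List(the, bee)` to 2; every other n-gram occurs once.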

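Stack Overflow user

Answered on 2013-12-17 12:48:58

Here is a stream-based approach that computes the n-grams lazily, one at a time, to keep memory use low while they are generated.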

object ngramstream extends App {

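  // Applies f to each n-gram in turn; same effect as st.foreach(f).
  def process(st: Stream[Array[String]])(f: Array[String] => Unit): Stream[Array[String]] = st match {
    case x #:: xs => {
      f(x)
      process(xs)(f)
    }
    case _ => Stream[Array[String]]()
  }

  // Lazily concatenates the 2-grams, 3-grams, ..., n-grams of words.
  // (Stream is deprecated since Scala 2.13; LazyList is its replacement.)
  def ngrams(n: Int, words: Array[String]) = {
    // exclude 1-grams
    (2 to n).map { i => words.sliding(i).toStream }
      .foldLeft(Stream[Array[String]]()) {
        (a, b) => a #::: b // appends b after a, lazily
      }
  }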

  val words = "the bee is the bee of the bees"
  val n = 4
  val ngrams2 = ngrams(n, words.split(" "))

  process(ngrams2) { x =>
    println(x.toList)
  }

}

Output:

List(the, bee)
List(bee, is)
List(is, the)
List(the, bee)
List(bee, of)
List(of, the)
List(the, bees)
List(the, bee, is)
List(bee, is, the)
List(is, the, bee)
List(the, bee, of)
List(bee, of, the)
List(of, the, bees)
List(the, bee, is, the)
List(bee, is, the, bee)
List(is, the, bee, of)
List(the, bee, of, the)
List(bee, of, the, bees)

Votes: 4
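The question mentions large files. In the answer above, the input is already a fully loaded array, and because `ngrams2` is a `val`, the memoizing `Stream` holds on to every n-gram it has evaluated. The sketch below instead reads lines from an `Iterator` and slides over the words directly; it assumes whitespace-separated words and that n-grams may span line breaks, and `StreamingNGrams` is a hypothetical name. Apart from the current line, only a window of n words is buffered.

```scala
import scala.io.Source

// Hypothetical streaming sketch: n-grams are produced on demand from an iterator.
object StreamingNGrams {
  // Turns an iterator of text lines into an iterator of word n-grams. Windows
  // span line boundaries; withPartial(false) suppresses a short window when the
  // input has fewer than n words.
  def ngrams(lines: Iterator[String], n: Int): Iterator[Seq[String]] = {
    require(n >= 1, "n must be positive")
    lines
      .flatMap(line => line.split("\\s+").iterator)
      .filter(_.nonEmpty)
      .sliding(n)
      .withPartial(false)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: args(0) is the path to a whitespace-separated text file.
    val source = Source.fromFile(args(0))
    try ngrams(source.getLines(), 3).foreach(g => println(g.mkString(" ")))
    finally source.close()
  }
}
```

Because the result is a plain `Iterator`, it can be consumed only once; feeding it into a counting or lookup structure, as in the question's algorithm, is up to the caller.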

Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/8258963
