Spark wordcount: counting words

houjibofa2050 · published 2018-12-13 14:58:06

Input file 1.txt:

hello world
hello tom
hello lucy
tom lucy
hello python
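As a cross-check of what the Spark job should produce, the same count can be computed in plain Python with `collections.Counter`, without Spark. This is a minimal sketch that assumes the five lines above are the whole file and that words are separated by single spaces, the same splitting rule the Spark code uses; the helper name `count_words` is hypothetical.

```python
from collections import Counter

def count_words(lines):
    """Count space-separated words across lines (hypothetical helper, same split rule as the Spark job)."""
    counter = Counter()
    for line in lines:
        counter.update(line.split(' '))
    return counter

# The contents of 1.txt, one string per line
sample = [
    "hello world",
    "hello tom",
    "hello lucy",
    "tom lucy",
    "hello python",
]

counts = count_words(sample)
print(sorted(counts.items()))
# [('hello', 4), ('lucy', 2), ('python', 1), ('tom', 2), ('world', 1)]
```

Spark's `collect()` returns the same pairs, but in no guaranteed order, which is why the check sorts them.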

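# -*- coding:utf-8 -*-
import os
import shutil

from pyspark import SparkContext

inputpath = '1.txt'
outputpath = 'result'

# Run Spark locally with a single worker thread; 'wordcount' is the application name
sc = SparkContext('local', 'wordcount')

# Read the file: each RDD element is one line of text
lines = sc.textFile(inputpath)
# Split every line into words
words = lines.flatMap(lambda line: line.split(' '))
# Turn each word into a (word, 1) pair, then sum the 1s per word
counts = words.map(lambda word: (word, 1)).reduceByKey(lambda x, y: x + y)

# Print the results; collect() brings all pairs back to the driver
result = counts.collect()
print(result)
for word, count in result:
    print(word, count)

# Delete the output directory first: saveAsTextFile fails if it already exists
if os.path.exists(outputpath):
    shutil.rmtree(outputpath, ignore_errors=True)

# Write the counts to the output directory (one part-* file per partition)
counts.saveAsTextFile(outputpath)

# Release the resources held by the SparkContext
sc.stop()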
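Note that `saveAsTextFile` writes a directory, not a single file: one `part-NNNNN` file per partition plus a `_SUCCESS` marker. The sketch below reads every part file under `result` back into one dict. It assumes PySpark wrote each `(word, count)` pair as its `str()` form, for example `('hello', 4)`; `load_counts` is a hypothetical helper name, not part of PySpark.

```python
import ast
import glob
import os

def load_counts(outputpath):
    """Merge every part-* file under outputpath into one {word: count} dict (hypothetical helper).

    Assumes each non-empty line is the str() of a (word, count) tuple, e.g. "('hello', 4)".
    The _SUCCESS marker does not match 'part-*' and is skipped.
    """
    counts = {}
    for part in sorted(glob.glob(os.path.join(outputpath, 'part-*'))):
        with open(part, encoding='utf-8') as f:
            for line in f:
                line = line.strip()
                if line:
                    # literal_eval safely parses the tuple text without running arbitrary code
                    word, count = ast.literal_eval(line)
                    counts[word] = count
    return counts
```

After the job above has run, `load_counts('result')` gives the counts as a dict. Because `reduceByKey` leaves each word in exactly one partition, merging the files by plain assignment never overwrites a count.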