Install
- Download and uncompress the JDK.
- Set the Java environment variables. The block below sets the variables for all three tools (Java, sbt and Spark):

```shell
# Java
export JAVA_HOME=/home/automation/java/jdk1.8.0_161
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:.
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
# sbt and Spark
export SBT_HOME=/home/automation/spark/sbt/sbt
export SPARK_HOME=/home/automation/spark/spark-2.2.1-bin-hadoop2.7
export PATH=$PATH:$SBT_HOME/bin:$SPARK_HOME/bin
```

- Download and uncompress Spark.
- Set the Spark environment variables (`SPARK_HOME` and `PATH`, included in the block above).
- Install sbt, the build tool used to build Scala projects.
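Once the exports are in place, it helps to confirm that each variable is set and points to an existing directory before moving on. The following is a minimal bash sketch; the function name `check_home_vars` is hypothetical, and it only checks that the directories exist, not which versions are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each variable name given, report whether it is set
# and whether it points to an existing directory. Returns 1 if any check fails.
check_home_vars() {
  local status=0 name value
  for name in "$@"; do
    value="${!name:-}"   # bash indirect expansion: the value of the variable named $name
    if [ -z "$value" ]; then
      echo "$name: not set"
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name: $value (directory not found)"
      status=1
    else
      echo "$name: $value"
    fi
  done
  return "$status"
}

# Usage: check the variables exported above, then confirm the tools resolve on PATH.
# check_home_vars JAVA_HOME SBT_HOME SPARK_HOME && java -version && spark-submit --version
```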
Doc
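- Open the interactive shell (`spark-shell`)
Load the input file into a Dataset (for a text file, `spark.read.textFile(path)`), then call the [Dataset API](https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset) to process it.
- Write a runnable application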
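The application consists of two files, `build.sbt` and `SimpleApp.scala`, which must be placed in the directory layout sbt expects: `build.sbt` at the project root and `SimpleApp.scala` under `src/main/scala/`.

```scala
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
    // Read the file as a Dataset[String] (one element per line) and cache it,
    // because it is scanned twice below
    val logData = spark.read.textFile(logFile).cache()
    // Count the lines that contain "a" and the lines that contain "b"
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    spark.stop()
  }
}
```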
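The project layout described above can also be created from the shell. This is a minimal bash sketch; the function name `init_spark_project` is hypothetical. The `build.sbt` contents follow the Spark quick-start example: prebuilt Spark 2.2.1 packages are compiled against Scala 2.11, but the exact patch release `2.11.8` is an assumption, so adjust it and the Spark version to match your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the layout sbt expects and write a minimal build.sbt.
#   <dir>/build.sbt
#   <dir>/src/main/scala/   <- copy SimpleApp.scala here
init_spark_project() {
  local dir="$1"
  mkdir -p "$dir/src/main/scala" || return 1
  # Quoted 'EOF' writes the heredoc verbatim (no shell expansion inside it).
  cat > "$dir/build.sbt" <<'EOF'
name := "Simple Project"

version := "1.0"

// Spark 2.2.1 prebuilt packages use Scala 2.11 (patch release is an assumption)
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.1"
EOF
}
```

After copying `SimpleApp.scala` into `src/main/scala/`, run `sbt package` in the project directory and submit the jar, for example `spark-submit --class SimpleApp --master "local[4]" target/scala-2.11/simple-project_2.11-1.0.jar`. The jar name is derived from `name`, `scalaVersion` and `version` in `build.sbt`, so it changes if any of those do.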
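If some dependencies cannot be downloaded during `sbt package` because of a proxy, download them offline and put them into the corresponding directories under `~/.sbt/preloaded/`.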
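A minimal sketch of that copy step, assuming the offline-downloaded artifacts are already arranged in the directory structure sbt expects under `preloaded/` (that layout is not described here, so the source path in the usage line is only a placeholder). The function name `preload_sbt_artifacts` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy artifacts downloaded elsewhere into ~/.sbt/preloaded/
# so that sbt can resolve them without going through the proxy.
# Assumption: the source directory already mirrors the layout expected under
# preloaded/; this function does not rearrange files.
preload_sbt_artifacts() {
  local src="$1"
  local dest="$HOME/.sbt/preloaded"
  if [ ! -d "$src" ]; then
    echo "source directory not found: $src" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  # Copy the contents of $src (including hidden files) into $dest,
  # preserving the subdirectory structure.
  cp -R "$src"/. "$dest"/
}

# Usage (placeholder path):
# preload_sbt_artifacts /path/to/offline-artifacts && sbt package
```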