Scala is very easy to get started with but hard to master, especially because it is a very context-aware language.

In this post I’ll be covering the fundamentals of Scala using the notes I’ve taken from two excellent resources:

  • Spark with Scala - Professional Development Seminar with Elephant Scale’s Sujee Maniyam. Sujee is a seasoned Big Data practitioner and his company delivers excellent training in Big Data and Data Science technologies. I attended this workshop on August 5th, 2017 at Intel, Santa Clara. If your company is interested in Big Data related training, check out his company at this link.

  • Scala and Spark for Big Data and Machine Learning by Jose Portilla. This is a well-delivered and accessible online course to get you started with Scala and Spark for Big Data. It also includes some theory on machine learning and practical exercises to follow along. It is hosted on Udemy; check out this link.

Content


General Notes on Scala

Variables and Values: Declaration

Strings: Slicing and Simple Regex

Collections: Lists, Sets, Maps, Arrays

Control Flow: If, For, While

Logical Operators: And, Or, Not

Collection Methods: Map, Filter, Reduce



General Notes on Scala



Scala facts

  • As opposed to Java, which is imperative (“we specify how to do it”), Scala is declarative: we specify “what” to do.

  • Scala is a functional language. Functions are first-class objects and can be passed as arguments.

  • Scala is a pure object-oriented language. Even numbers and functions are objects.

    Take the sum: 1 + 2 executes as the method call 1.+(2)

  • Interoperates with Java and runs on the Java Virtual Machine.

Scala and Spark

  • Scala wins in performance over other supported languages such as Python

  • Works seamlessly with the Hadoop environment

  • Designed with two goals in mind: to be concise and functional

  • An expression is a single unit of code that returns a value.

  • A statement is a unit of code that doesn’t return a useful value; in Scala, a statement is an expression that returns Unit (see the sketch below).
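
A minimal sketch of the distinction, with illustrative values:

val max = if (3 > 2) 3 else 2 // if is an expression: max: Int = 3
println(max)                  // println runs for its side effect; it returns Unit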



Variables and Values



Declaration

The difference between var and val in Scala is immutability.

var stands for variable (mutable)

val stands for value (immutable or constant)

Syntax

val <name>: <type> = <literal>

var <name>: <type> = <literal>

Sample Usage

val someval: Int = 1

Which reads:

“Declare a value named someval, of type Int (integer), equal to one.”

Notes

  • var can be reassigned, but only to a value of the same data type (see the sketch below).
  • Scala infers data types; in practice, explicit type annotations are typically omitted.
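
A quick sketch of both notes, with illustrative names:

var count: Int = 1
count = 2          // OK: reassigned to another Int
// count = "two"   // compile error: type mismatch
val fixed = 10
// fixed = 11      // compile error: reassignment to val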


Strings



Syntax

val <name>: String = "<literal>"

Sample Usage

// Declaration

val st: String = "This is an awesome string."

// Slicing first character

st.charAt(0)

res0: Char = T

// Slicing a range of characters
st slice (0,4)

res1: String = This

// Basic Regex for Substrings

st contains "awesome"

res2: Boolean = true

Notes

  • Strings are enclosed by double quotes.
  • Remember, everything in Scala is an object. Use the String methods.
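
The section title mentions simple regex; a minimal extra sketch using the standard matches method (the pattern is illustrative):

st matches ".*awesome.*" // Boolean = true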


Collections



Types

  1. List: An ordered collection (a sequence)
  2. Set: Unordered collection with no duplicates
  3. Map: Collection of key/value pairs
  • Collections are parameterized by a type using [Type]
  • The type is often inferred from the contents, not explicitly set.

Sample Usage for List

val bikes = List[String]("Colnago", "Cervelo")

bikes: List[String] = List(Colnago, Cervelo)

// Or let Scala infer the type

val bikes = List("Colnago", "Cervelo")

bikes: List[String] = List(Colnago, Cervelo)

// Slicing

bikes slice (0,1)

res3: List[String] = List(Colnago)

// Ranges

val arr = 1 to 10

arr: scala.collection.immutable.Range.Inclusive = Range 1 to 10

Sample Usage for Set

val abc = Set("a", "a", "b", "c") // duplicates filtered out

abc: scala.collection.immutable.Set[String] = Set(a, b, c)

Sample Map Usage

val abc123 = Map(("a",1), ("b",2), ("c",3)) //key, value pairs

val abc123 = Map("a"-> 1, "b"->2, "c"->3)

// To add new element, use mutable version of Map

// Use arrow notation

val ab12 = collections.mutable.Map(("a",1), ("b",2))

ab12 += ("c" -> 3)

Notes

  • There are mutable and immutable versions of collections
  • Scala defaults to immutable
  • All collections support iteration
  • Sequences like Lists or Arrays support indexed access
  • Maps provide access by key (see the sketch below)
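
A short sketch of iteration and element access, reusing the bikes list and abc123 map from above:

for (bike <- bikes) println(bike) // iteration: prints Colnago, then Cervelo
bikes(0)                          // indexed access: String = Colnago
abc123("a")                       // access by key: Int = 1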


Control Flow



Syntax - If Statement

if (boolean) {
    do something
} else if (boolean) {
    do something else
} else {
    do something else if no boolean is true
}

Syntax - For Statement

for (item <- iterable) {
    do something
}

Syntax - While Statement

while (boolean) {
    do something
}
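
A small runnable sketch putting the three together (values are illustrative):

val nums = List(1, 2, 3)
for (n <- nums) {
    if (n % 2 == 0) println(s"$n is even") else println(s"$n is odd")
}
var i = 0
while (i < 3) {
    println(i)
    i += 1
}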


Logical Operators



Syntax

// And

&&

// Or

||

// Not

!
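
A quick sketch (values are illustrative):

val a = true
val b = false
a && b // Boolean = false
a || b // Boolean = true
!a     // Boolean = false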


Collection Methods: Map, Filter, Reduce



Useful Collection Type Methods

  • Map Operations: Apply a function to each element in a collection (N elements in, N elements out)

  • Sub-Collection Retrieval (take, filter, slice): Return a sub-collection identified by an index range or a predicate (see the sketch after this list)

  • Folds/Reductions: Apply a binary operation to successive elements, producing a single result
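
Since take has no sample below, here is a minimal sketch of sub-collection retrieval (list contents are illustrative):

val nums = List(10, 20, 30, 40)
nums take 2       // List(10, 20)
nums slice (1, 3) // List(20, 30)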

Sample Usage - Map

var li = List(1,2,3,4)

// Square the list using map.

// Much like a lambda function in Python.

var squareList = li.map(x => x*x)

squareList: List[Int] = List(1, 4, 9, 16)

Sample Usage - Filter

  • Only elements that pass the filter are kept in the new collection.

var li = List(1,2,3,4)

// Much like a lambda function in Python.

var isEvenList = li.filter(x => x % 2 == 0)

isEvenList: List[Int] = List(2, 4)

Sample Usage - Reduce

  • Applies a function to successive elements to produce a single, “reduced” result
  • Ex. Sum: Sum all elements in the list and produce a single Int

var li = List(1,2,3,4)

// Much like a lambda function in Python.

var reducedList = li.reduce((a, b) => a + b)

reducedList: Int = 10
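
These methods compose naturally; a short sketch chaining filter, map, and reduce on the same list:

li.filter(x => x % 2 == 0).map(x => x * x).reduce((a, b) => a + b) // Int = 20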





That is all I intended to cover in this post! It turned out to be an overview of Scala’s syntax and should give you an idea of how easy it is to get started with Scala.

If you want to practice some coding questions, I’ve put together a few in this repo.

And finally, to remind ourselves of the motivation for using Spark / Scala, I’ll close with an extract from Russell Jurney’s excellent book Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark:

There is great interest in bringing new tools to bear on formerly intractable problems, to derive entirely new products from raw data, to refine raw data into profitable insight, and to productize and productionize insight in new kinds of analytics applications. These tools are processor cores and disk spindles, paired with visualization, statistics, and machine learning. This is data science.
