Scala (short for "scalable language") is a statically typed language that runs on the Java Virtual Machine (JVM) and interoperates closely with Java; it is a separate language, not an extension of Java. It is one of the de facto languages for practical big data work; Apache Spark, for instance, is written in Scala. It is also a useful tool for data scientists because it supports both anonymous functions and higher-order functions. In this article, we list 10 Scala libraries for data science enthusiasts.
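As a small illustration of the two language features mentioned above, the sketch below passes anonymous functions both to a user-defined higher-order function and to the standard `map` method. The names `HigherOrderDemo` and `applyTwice` are illustrative only and do not come from any library.

```scala
object HigherOrderDemo {
  // A higher-order function: it takes another function `f` as a parameter
  def applyTwice(f: Double => Double, x: Double): Double = f(f(x))

  // Anonymous functions (lambdas) passed directly as arguments
  val squaredTwice: Double = applyTwice(x => x * x, 3.0)       // (3 * 3) squared = 81.0
  val scaled: List[Double] = List(1.0, 2.0, 3.0).map(_ * 10.0) // List(10.0, 20.0, 30.0)

  def main(args: Array[String]): Unit = {
    println(squaredTwice)
    println(scaled)
  }
}
```

The same pattern, passing small functions into generic operations, is what most of the libraries below build on.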
(The list is in alphabetical order)
1| Breeze
Breeze is a set of libraries for machine learning and numerical computing and is part of the ScalaNLP umbrella project. At its core is a numerical processing library for Scala that provides linear algebra, numerical computing, and optimisation routines. It aims to offer a generic and powerful, yet still efficient, approach to machine learning.
2| Breeze-viz
Breeze-viz is a visualisation library for Scala, backed by Breeze and by JFreeChart, a prominent Java charting library. It also provides a MATLAB-like "image" command.
3| DeepLearning.scala
DeepLearning.scala is a deep learning toolkit for Scala that combines object-oriented and functional programming constructs. It is a simple library for creating statically typed, dynamic neural networks from map/reduce and other higher-order functions. With this library you write code almost the same way as an ordinary Scala program; the only difference is that code based on DeepLearning.scala is differentiable, which lets it evolve by continuously adjusting its parameters.
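To show what "differentiable code" buys you, here is a minimal sketch that is not DeepLearning.scala's API or implementation: it swaps in forward-mode automatic differentiation with dual numbers, chosen only because it fits in a few lines. The loss is written as ordinary Scala with `map` and `reduce`, yet evaluating it on a `Dual` also yields the derivative with respect to the parameter `w`, which gradient descent then uses to tune `w`. All names (`Dual`, `loss`, `fit`) are hypothetical.

```scala
object DualNumberDemo {
  // A dual number carries a value and its derivative with respect to one parameter
  final case class Dual(value: Double, deriv: Double) {
    def +(that: Dual): Dual = Dual(value + that.value, deriv + that.deriv)
    def -(that: Dual): Dual = Dual(value - that.value, deriv - that.deriv)
    // Product rule: (uv)' = u'v + uv'
    def *(that: Dual): Dual = Dual(value * that.value, deriv * that.value + value * that.deriv)
  }

  // Constants have zero derivative
  def constant(c: Double): Dual = Dual(c, 0.0)

  // Mean squared error of the model y = w * x (data must be non-empty).
  // Written with map/reduce; because it computes on Dual, it also yields d(loss)/dw.
  def loss(w: Dual, data: Seq[(Double, Double)]): Dual = {
    val squaredErrors = data.map { case (x, y) =>
      val e = w * constant(x) - constant(y)
      e * e
    }
    squaredErrors.reduce(_ + _) * constant(1.0 / data.size)
  }

  // Gradient descent: repeatedly move w against the derivative of the loss
  def fit(data: Seq[(Double, Double)], steps: Int, learningRate: Double): Double =
    (1 to steps).foldLeft(0.0) { (w, _) =>
      val grad = loss(Dual(w, 1.0), data).deriv // seed dw/dw = 1
      w - learningRate * grad
    }

  def main(args: Array[String]): Unit = {
    val data = Seq((1.0, 2.0), (2.0, 4.0), (3.0, 6.0)) // generated from y = 2x
    println(fit(data, steps = 200, learningRate = 0.05)) // approaches 2.0
  }
}
```

Real deep learning frameworks typically differentiate with respect to many parameters at once, so this single-parameter version only conveys the principle.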
4| Epic
Epic is a structured prediction framework for Scala that includes classes for training high-accuracy syntactic parsers, part-of-speech taggers, named entity recognisers, and more. It is distributed under the Apache License, Version 2.0, and can be used programmatically or from the command line, with either pre-trained models or models that developers have trained themselves. Epic supports three kinds of models: parsers, sequence labellers, and segmenters. Parsers produce syntactic representations of sentences; sequence labellers, such as part-of-speech taggers, assign a label to each word; and segmenters break a sentence into a sequence of fields.
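The difference between a sequence labeller's output (one tag per token) and a segmenter's output (labelled spans) can be sketched in plain Scala. This is not Epic's API: the `Segment` type and `toSegments` function are hypothetical, and the BIO tagging scheme ("B-" begins a span, "I-" continues it, "O" is outside) is an assumption chosen for illustration.

```scala
object SegmentsDemo {
  // A labelled span covering tokens from start (inclusive) to end (exclusive)
  final case class Segment(label: String, start: Int, end: Int)

  // Turn per-token BIO tags into segments
  def toSegments(tags: Seq[String]): Vector[Segment] = {
    val done = Vector.newBuilder[Segment]
    var open: Option[(String, Int)] = None // label and start of the span being built

    def close(end: Int): Unit = {
      open.foreach { case (label, start) => done += Segment(label, start, end) }
      open = None
    }

    for ((tag, i) <- tags.zipWithIndex) {
      val label = tag.drop(2)
      if (tag.startsWith("I-") && open.exists(_._1 == label)) {
        () // the current span continues
      } else if (tag.startsWith("B-") || tag.startsWith("I-")) {
        close(i) // a new span starts (a stray "I-" is treated like "B-")
        open = Some((label, i))
      } else {
        close(i) // "O" ends any open span
      }
    }
    close(tags.size)
    done.result()
  }

  def main(args: Array[String]): Unit = {
    val tokens = Vector("Martin", "Odersky", "created", "Scala", "at", "EPFL")
    val tags = Vector("B-PER", "I-PER", "O", "B-MISC", "O", "B-ORG")
    for (s <- toSegments(tags))
      println(s"${s.label}: ${tokens.slice(s.start, s.end).mkString(" ")}")
  }
}
```

Running `main` prints one line per span: `PER: Martin Odersky`, `MISC: Scala`, and `ORG: EPFL`.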
5| Apache PredictionIO
Apache PredictionIO is an open-source machine learning server built on top of a state-of-the-art open-source stack, aimed at developers and data scientists who want to create predictive engines for any machine learning task. PredictionIO chose Scala over Java as its JVM language primarily because of Scala's advantages for functional programming. The server lets you quickly build and deploy an engine as a web service in production, respond to dynamic queries in real time, speed up machine learning modelling with systematic processes and pre-built evaluation measures, simplify data infrastructure management, and more.
6| Saddle
Saddle is a high-performance data manipulation library for Scala that provides array-backed, indexed, one- and two-dimensional data structures, vectorised numerical calculations, automatic data alignment, and robustness to missing values. It is licensed under the Apache License, Version 2.0, and is described by its authors as the easiest and most expressive way to program with structured data on the JVM.
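Automatic data alignment means that when two indexed series are combined, values are matched by index label rather than by position, and labels present on only one side become missing values instead of causing an error. The plain-Scala sketch below illustrates that general idea with `Option` standing in for a missing value; `AlignmentDemo`, `Series`, `addAligned`, and `sumPresent` are hypothetical names, and this is not Saddle's API or its exact join semantics.

```scala
import scala.collection.immutable.SortedMap

object AlignmentDemo {
  // A toy "series": values keyed by an index label, kept sorted by key
  type Series = SortedMap[String, Double]

  // Align on the union of both indexes and add element-wise;
  // a label missing from either side yields None
  def addAligned(a: Series, b: Series): SortedMap[String, Option[Double]] =
    (a.keySet ++ b.keySet).foldLeft(SortedMap.empty[String, Option[Double]]) { (acc, k) =>
      acc.updated(k, for { x <- a.get(k); y <- b.get(k) } yield x + y)
    }

  // Aggregate while skipping missing values rather than failing
  def sumPresent(s: SortedMap[String, Option[Double]]): Double = s.values.flatten.sum

  def main(args: Array[String]): Unit = {
    val first = SortedMap("2020" -> 1.0, "2021" -> 2.0)
    val second = SortedMap("2021" -> 10.0, "2022" -> 20.0)
    val total = addAligned(first, second)
    println(total)             // 2020 and 2022 map to None; 2021 maps to Some(12.0)
    println(sumPresent(total)) // 12.0
  }
}
```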
7| ScalaLab
ScalaLab is an efficient scientific programming environment for the JVM. Its main strengths are the speed and flexibility of its numerical code, and a user-friendly interface is a major design priority. ScalaLab's MATLAB-like mathematical domain-specific language, called ScalaSci, is implemented as an internal domain-specific language, that is, one embedded in Scala itself.
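An internal DSL is ordinary host-language code made to read like a specialised notation. The toy sketch below shows how Scala's symbolic method names and `apply` make MATLAB-style expressions such as `a * eye(2) + ones(2, 2)` valid Scala; the `Mat` class and the `ones`/`eye` helpers are hypothetical and are not ScalaSci's actual API (note also that indexing here is 0-based, unlike MATLAB).

```scala
object MiniMatlab {
  // A tiny dense matrix; symbolic method names give it MATLAB-like operators
  final case class Mat(rows: Vector[Vector[Double]]) {
    def nRows: Int = rows.size
    def nCols: Int = if (rows.isEmpty) 0 else rows.head.size

    // Element-wise addition (assumes equal shapes)
    def +(that: Mat): Mat =
      Mat(rows.zip(that.rows).map { case (r1, r2) => r1.zip(r2).map { case (x, y) => x + y } })

    // Matrix product: entry (i, j) is the dot product of row i and column j
    def *(that: Mat): Mat = {
      require(nCols == that.nRows, "inner dimensions must agree")
      val cols = that.rows.transpose
      Mat(rows.map(r => cols.map(c => r.zip(c).map { case (x, y) => x * y }.sum)))
    }

    // Transpose, written m.t
    def t: Mat = Mat(rows.transpose)

    // Element access, written m(i, j)
    def apply(i: Int, j: Int): Double = rows(i)(j)
  }

  def ones(n: Int, m: Int): Mat = Mat(Vector.fill(n, m)(1.0))
  def eye(n: Int): Mat = Mat(Vector.tabulate(n, n)((i, j) => if (i == j) 1.0 else 0.0))

  def main(args: Array[String]): Unit = {
    val a = Mat(Vector(Vector(1.0, 2.0), Vector(3.0, 4.0)))
    val b = a * eye(2) + ones(2, 2) // reads much like the MATLAB expression
    println(b.rows)
  }
}
```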
8| Smile
Statistical Machine Intelligence and Learning Engine (Smile) is a fast and comprehensive machine learning engine. Smile provides hundreds of advanced algorithms behind a clean interface, so applications can be written quickly in Java, Scala, or any other JVM language. Its Scala API also offers high-level operators that make it easy to build machine learning apps. The engine covers almost every area of machine learning, including classification, regression, clustering, association rule mining, feature selection, manifold learning, multidimensional scaling, and genetic algorithms.
9| Summingbird
Summingbird is a library for writing MapReduce programs that look like native Scala or Java collection transformations and executing them on a number of well-known distributed MapReduce platforms, including Storm and Scalding. The same Summingbird program can run in batch mode on Scalding or in real-time mode on Storm.
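For comparison, here is the classic word-count job written against plain, in-memory Scala collections: a "map" step that emits `(word, 1)` pairs followed by a "reduce" step that sums counts per key. This only illustrates the collection-transformation style the paragraph refers to; it does not use Summingbird's API, and the object and method names are hypothetical.

```scala
object WordCountDemo {
  // "Map" step: split each line into lower-cased words and emit (word, 1) pairs
  def toPairs(lines: Seq[String]): Seq[(String, Long)] =
    lines.flatMap(_.toLowerCase.split("\\W+").filter(_.nonEmpty)).map(word => word -> 1L)

  // "Reduce" step: add up the counts that share a key
  def sumPerWord(pairs: Seq[(String, Long)]): Map[String, Long] =
    pairs.groupBy(_._1).map { case (word, ps) => word -> ps.map(_._2).sum }

  def wordCount(lines: Seq[String]): Map[String, Long] = sumPerWord(toPairs(lines))

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("to be or", "not to be")))
}
```

In a distributed setting, the same two steps would be spread across machines, with the per-key sums merged from partial results.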
10| Vegas
Vegas is a data visualisation library for Scala. It provides a Scala API for declarative, statistical data visualisations in which you can work with data files as well as Spark DataFrames and perform filtering, transformations, and aggregations as part of the plotting specification. The library works by compiling Scala code down to strongly typed JSON specifications.
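Vegas builds on Vega-Lite, so a plot described in Scala ends up as a Vega-Lite JSON document that a browser can render. The sketch below mimics the "typed Scala in, JSON spec out" idea with a hypothetical `BarChart` case class rendered by hand to a minimal Vega-Lite-style bar-chart spec; it is not Vegas's API, and real code would use a JSON library rather than string building.

```scala
object SpecDemo {
  // A hypothetical, strongly typed bar chart: categories on x, numbers on y
  final case class BarChart(xField: String, yField: String, data: Seq[(String, Double)])

  // Minimal JSON string escaping (backslashes and double quotes only)
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Build a JSON object from already-rendered values
  private def obj(fields: (String, String)*): String =
    fields.map { case (k, v) => s"${quote(k)}: $v" }.mkString("{", ", ", "}")

  // "Compile" the typed description into a Vega-Lite-style JSON spec
  def toSpec(c: BarChart): String = {
    val values = c.data.map { case (x, y) => obj(c.xField -> quote(x), c.yField -> y.toString) }
    obj(
      "mark" -> quote("bar"),
      "data" -> obj("values" -> values.mkString("[", ", ", "]")),
      "encoding" -> obj(
        "x" -> obj("field" -> quote(c.xField), "type" -> quote("nominal")),
        "y" -> obj("field" -> quote(c.yField), "type" -> quote("quantitative"))
      )
    )
  }

  def main(args: Array[String]): Unit =
    println(toSpec(BarChart("country", "population", Seq("A" -> 1.5, "B" -> 2.0))))
}
```

Because the chart is a typed value until the last step, mistakes such as passing a number where a category is expected are caught by the compiler rather than by the renderer.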