While Python and R have long dominated data science programming, one tool that rose in parallel is Weka. Developed at the University of Waikato, New Zealand, Weka stands for Waikato Environment for Knowledge Analysis. The software was built specifically for machine learning and data mining, and it comprises tools for data preparation, classification, regression, clustering, association rule mining, and visualisation. This article gives a lowdown on what Weka is and how it is proving useful for data science researchers.
Weka And Data Mining
Weka is developed primarily in Java. The first version, however, was not Java-based: it was built around the Tool Command Language (Tcl), and Weka was initially used to perform data analysis in agricultural domains. After Weka moved to Java, its applications spread to data mining tasks in other areas such as education and research.
Here is a short list of standard ML algorithms in Weka. After each algorithm comes the name Weka uses for it, which is its class name under the weka.classifiers package.
- Linear Regression: functions.LinearRegression
- Logistic Regression: functions.Logistic
- Naive Bayes: bayes.NaiveBayes
- Decision Tree (specifically the C4.5 variety): trees.J48
- k-Nearest Neighbors (also called KNN): lazy.IBk
- Support Vector Machines (also called SVM): functions.SMO
- Neural Network: functions.MultilayerPerceptron
- Random Forest: trees.RandomForest
- Bootstrap Aggregation (also called Bagging): meta.Bagging
- Stacked Generalization (also called Stacking or Blending): meta.Stacking
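Because every name in the list above is a class under Weka's weka.classifiers package, a Java program can refer to these algorithms by their fully qualified names. The sketch below is plain Java with no Weka dependency: it only builds those names and uses Class.forName to report whether each class can be found, which will only succeed if weka.jar has been put on the classpath (an assumption about your setup). The class name WekaClassNames and the display labels are illustrative, not part of Weka.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Maps the algorithms listed above to their fully qualified Weka class names. */
public class WekaClassNames {
    private static final String ROOT = "weka.classifiers.";
    private static final Map<String, String> SHORT_NAMES = new LinkedHashMap<>();

    static {
        // Illustrative labels on the left; Weka's package-relative class names on the right.
        SHORT_NAMES.put("Linear Regression", "functions.LinearRegression");
        SHORT_NAMES.put("Logistic Regression", "functions.Logistic");
        SHORT_NAMES.put("Naive Bayes", "bayes.NaiveBayes");
        SHORT_NAMES.put("Decision Tree (C4.5)", "trees.J48");
        SHORT_NAMES.put("k-Nearest Neighbors", "lazy.IBk");
        SHORT_NAMES.put("Support Vector Machine", "functions.SMO");
        SHORT_NAMES.put("Neural Network", "functions.MultilayerPerceptron");
        SHORT_NAMES.put("Random Forest", "trees.RandomForest");
        SHORT_NAMES.put("Bagging", "meta.Bagging");
        SHORT_NAMES.put("Stacking", "meta.Stacking");
    }

    /** Returns e.g. "weka.classifiers.trees.J48"; throws for an unlisted algorithm. */
    public static String fullyQualified(String algorithm) {
        String shortName = SHORT_NAMES.get(algorithm);
        if (shortName == null) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm);
        }
        return ROOT + shortName;
    }

    /** True if the class can be located, i.e. weka.jar is on the classpath. */
    public static boolean onClasspath(String algorithm) {
        try {
            // initialize = false: only look the class up, do not run its static initializers
            Class.forName(fullyQualified(algorithm), false, WekaClassNames.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String algorithm : SHORT_NAMES.keySet()) {
            System.out.printf("%-24s %-45s %s%n", algorithm, fullyQualified(algorithm),
                    onClasspath(algorithm) ? "available" : "not on classpath");
        }
    }
}
```

With Weka on the classpath, one of these fully qualified names can then be passed to weka.classifiers.AbstractClassifier.forName(name, options) to obtain a configured classifier instance.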
All In A Single Interface
An open source tool released under the GNU General Public License, Weka offers several graphical interfaces for getting started with data mining, known as Explorer, Experimenter and KnowledgeFlow, along with a Simple CLI. Newer versions also include a Workbench, which is not present in the GUI window of earlier versions. Weka can be downloaded from its official website.
- Explorer lets a user tinker with the data, helps them work out how it can be transformed for analysis, and lets them choose which algorithms to apply to it.
- Experimenter runs the algorithms chosen by the user, typically several of them across one or more datasets, and provides a detailed statistical comparison of their results.
- KnowledgeFlow lets the user design, as a visual data flow, how data moves through the algorithms in the project.
- Simple CLI is Weka's command line interface. Users can type commands to work with the project instead of relying on the Explorer or another GUI. A benefit here is that it places less of a memory burden on Weka than the GUI does.
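To illustrate the command-driven workflow, the sketch below assembles the kind of line a user might type into Simple CLI to train J48 and evaluate it with 10-fold cross-validation, using Weka's general evaluation options -t (training file) and -x (number of folds). It is plain Java that only builds the command text. The dataset path data/iris.arff, the jar name weka.jar and the class WekaCliCommand are placeholders, not part of Weka.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Builds the words of a Weka command that trains a classifier and
 * cross-validates it. Pass a null classpath for the Simple CLI form
 * (Weka is already loaded there); pass a jar path for a regular shell.
 */
public class WekaCliCommand {

    public static List<String> build(String classpath, String classifier,
                                     String trainingFile, int folds) {
        if (folds < 2) {
            throw new IllegalArgumentException("cross-validation needs at least 2 folds");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        if (classpath != null) {
            cmd.add("-cp");            // only needed outside Simple CLI
            cmd.add(classpath);
        }
        cmd.add(classifier);           // fully qualified classifier class
        cmd.add("-t");                 // -t: training file (ARFF)
        cmd.add(trainingFile);
        cmd.add("-x");                 // -x: number of cross-validation folds
        cmd.add(Integer.toString(folds));
        return cmd;
    }

    public static void main(String[] args) {
        // "data/iris.arff" is a hypothetical path; adjust it to your dataset.
        System.out.println(String.join(" ",
                build(null, "weka.classifiers.trees.J48", "data/iris.arff", 10)));
        System.out.println(String.join(" ",
                build("weka.jar", "weka.classifiers.trees.J48", "data/iris.arff", 10)));
    }
}
```

In Simple CLI the first printed line can be typed as-is; the second form, with -cp weka.jar, is what the same run looks like from an ordinary terminal. The returned list could also be handed to ProcessBuilder to launch the run from another Java program.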
Weka has hundreds of algorithms for classification, data preprocessing, clustering and more. All of these tasks can be performed easily because the implementations are already built in and accessible from a single interface. Ian Witten, one of the creators of Weka, describes how the software's features can be applied to data mining problems.
“One way of using WEKA is to apply a learning method to a dataset and analyze its output to learn more about the data. Another is to use learned models to generate predictions on new instances. A third is to apply several different learners and compare their performance in order to choose one for prediction. In the interactive WEKA interface, you select the learning method you want from a menu.”
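Witten's third use, applying several learners and comparing their performance before choosing one, is the kind of comparison Weka's Experimenter automates. The plain-Java sketch below (no Weka dependency) illustrates the idea on a hypothetical toy dataset: it cross-validates a majority-class learner, the idea behind Weka's rules.ZeroR, against a 1-nearest-neighbour learner, roughly what lazy.IBk does with k = 1, and keeps whichever scores higher. The class and method names are illustrative; Weka's own cross-validation also shuffles and stratifies the folds, which this sketch skips for determinism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CompareLearners {

    /** Trains on labelled points and returns a predictor (labels are 0 or 1). */
    interface Learner {
        Predictor train(List<double[]> xs, List<Integer> ys);
    }

    interface Predictor {
        int predict(double[] x);
    }

    /** Predicts the most frequent training label, the idea behind Weka's rules.ZeroR. */
    static final Learner MAJORITY = (xs, ys) -> {
        int ones = 0;
        for (int y : ys) {
            ones += y;
        }
        int majority = ones * 2 > ys.size() ? 1 : 0; // ties go to label 0
        return x -> majority;
    };

    /** Predicts the label of the closest training point, like lazy.IBk with k = 1. */
    static final Learner ONE_NN = (xs, ys) -> x -> {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < xs.size(); i++) {
            double dist = 0;
            for (int j = 0; j < x.length; j++) {
                double diff = x[j] - xs.get(i)[j];
                dist += diff * diff; // squared Euclidean distance
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return ys.get(best);
    };

    // Hypothetical toy data: two well-separated clusters labelled 0 and 1.
    static final List<double[]> XS = new ArrayList<>();
    static final List<Integer> YS = new ArrayList<>();

    static {
        double[][] points = {{1, 1}, {1, 2}, {2, 1}, {2, 2}, {8, 8}, {8, 9}, {9, 8}, {9, 9}};
        for (int i = 0; i < points.length; i++) {
            XS.add(points[i]);
            YS.add(i < 4 ? 0 : 1);
        }
    }

    /** k-fold cross-validated accuracy; instance i is tested in fold i % k (no shuffling). */
    static double crossValidate(Learner learner, List<double[]> xs, List<Integer> ys, int k) {
        int correct = 0;
        for (int fold = 0; fold < k; fold++) {
            List<double[]> trainX = new ArrayList<>();
            List<Integer> trainY = new ArrayList<>();
            for (int i = 0; i < xs.size(); i++) {
                if (i % k != fold) {
                    trainX.add(xs.get(i));
                    trainY.add(ys.get(i));
                }
            }
            Predictor p = learner.train(trainX, trainY);
            for (int i = fold; i < xs.size(); i += k) {
                if (p.predict(xs.get(i)) == ys.get(i)) {
                    correct++;
                }
            }
        }
        return (double) correct / xs.size();
    }

    public static void main(String[] args) {
        double majority = crossValidate(MAJORITY, XS, YS, 4);
        double oneNn = crossValidate(ONE_NN, XS, YS, 4);
        System.out.println(String.format(Locale.ROOT, "majority: %.2f, 1-NN: %.2f", majority, oneNn));
        System.out.println("chosen: " + (oneNn > majority ? "1-NN" : "majority"));
    }
}
```

Running it prints "majority: 0.50, 1-NN: 1.00" followed by "chosen: 1-NN". In the Weka GUI the same decision is made by picking each learner from a menu and reading off the cross-validated accuracies, with no code involved.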
This is why Weka requires no code for machine learning: its built-in software environment makes it possible. Even for a Java-based ML project, there is no need to write the algorithm code again, since Weka already ships the implementations. In fact, using Weka does not require knowing Java at all; the GUI or CLI takes care of that part.
Other Avenues That Weka Can Explore
One reason this software fell behind with ML users is its narrow focus on data mining. Had it advanced further into other areas of data science, such as richer data visualisation, Weka might be as popular as Python or R. Another factor is Weka's Java environment: not every user is comfortable with Java, and some may avoid Weka for this reason. Users also feel that the interface is old-fashioned and could be improved with more visual features.
Despite the criticism, one interesting development is a deep learning package called WekaDeeplearning4j, built to bring deep learning into Weka, with the backend provided by the Deeplearning4j Java library. If Weka adds more capabilities like this to its platform, it could grow into a tool for tackling ML problems in general.