One of the key features of Python is its numerous libraries and packages. In this article, we will list down the popular packages and libraries in Python that are being widely used for numeric and scientific applications.
(The list is in no particular order).
Officially released in 2000-01, SciPy is free and open source library used for scientific computing and technical computing. The library consists of modules for optimisation,
image processing, FFT, special functions and signal processing.
The SciPy package includes algorithms and functions which are the crux of Python scientific computing capabilities. The sub-package includes:
- io: used for the standard input and output
- lib: this function is used to wrap python external libraries
- signal: used for processing signal tools
- sparse: used for algorithms related to sparse matrix
- spatial: widely used to determine paths in KD-trees, nearest neighbor and distance functions.
- optimise: used to optimise algorithms which include linear programming.
- linals: used for the regular linear algebra applications.
- interpolate: used for the integration of tools
- intergate: applied for integration of numerical tools
- fftpack: this subpackage helps for the discretion Fourier to transform algorithms
- cluster: the package consists of hierarchical clustering, vector quantisation, and K-means.
- misc: used for the miscellaneous utility applications.
- special: used to switch in special functions.
- weave: a tool to convert C/C++ codes to python programming.
- ndimage: used for wide range of functions in multi-dimensional image processing.
- stats: used for better understanding and analysing of statistical functions.
- constants: this algorithm includes physical specification and conversion components.
Pandas is the most important data analysis library of Python. Being open source, it is used for analysing data with Python. It can take data formats of CSV or TSV files, or a SQL database and convert it into Python data frames with rows and columns which is similar to tables in statistical formats. The package makes comparisons with dictionaries with the aid of ‘for’ loops which are very easy to understand and operate.
Python 2.7 and above versions are required to install Pandas package. We need to import the Panda’s library into the memory to work with it. The following codes can be run to implement different operations on pandas.
- Import pandas as pd (importing pandas library to memory), it is highly suggested to import the library as pd because next time when we want to use the application we need not mention the package full name instead we can name as pd, this avoids confusion.
- pd.read_filetype() (to open the desired file)
- pd.DataFrame() (to convert a specified python object)
- df.to_filetype (filename) (to save a data frame you are currently working with)
The advantage of using Pandas is that it can perform a bunch of functions on the tables we have created. The following are some functions that can be performed on selected data frames.
- df.median()-to get the median of each column
- df.mean()-to get the mean of all columns
- df.max()-to get the highest value of a column
- df.min()-to get the minimum value of a column
- df.std()-to get the standard deviation of each column.
- df.corr()-to specify the relationship between columns of a data frame.
- df.count()-to get the number of non-null values in each column of the data frame.
Developed by Fernando Perez in the year 2001, IPython is a command shell which is designed for interactive calculation in various programming languages. It offers self-examination, rich media, shell syntax, tab completion, and history.
IPython is a browser-based notebook interface which supports code, text, mathematical expressions, inline plots and various media for interactive data visualisation with the use of GUI (Graphic User Interface) toolkits for flexible and rectifiable interpreters to load into one’s own projects.
IPython architecture contributes to parallel and distributed computing. It facilitates for the enhanced parallel applications of various styles of parallelism such as:
- Customer user defined prototypes
- Task Parallelism
- Data Parallelism
- Message cursory using M.P.I (Message Passing Interface)
- Multiple programs, multiple data (MIMD) parallelism
- A single program, multiple data (SPMD) parallelism
Better known as Numpy, numeric Python has developed a module for Python, mostly written in C. Numpy guarantees swift execution as it is accumulated with mathematical and numerical functions.
Robust Python with its dynamic data structures, efficient implementation of multi-dimensional arrays and matrices, Numpy assures accurate calculations with matrices and arrays.
We need to import Numpy into memory to perform numerical operations.
- Import numpy as np (to import Numpy into memory)
- A_values=[20,30,40,50] (defining a list)
- A=np.array(A_values) (to convert list into one dimensional numpy array)
- print(A) (to get one dimensional array displayed)
- print(A*9/5 +32) (to turn values in the list into degrees fahrenheit)
Simply known as NLP, Natural Language Processing library is used to build applications and services that can understand and analyse human languages and data. One of the sub-libraries which are widely used in NLP is NLTK (Natural Language Toolkit). It has an active discussion forum through which they give hands-on guidance on programming basic topics such as computational linguistics, comprehensive API documentation, linguistics to engineers, students, industries and researchers. NLTK is an open source free community-driven project which is accessible for operating systems such as Windows, MAC OS X, and Linux. The implementations of NLP are:
- Search engines (eg: Yahoo, Google, firefox etc) they use NLP to optimise the search results for users.
- Social websites like Facebook, Twitter use NLP for the news feed. The NLP algorithms understand the interests of the users and show related posts.
- Spam filters: unlike the traditional spam filters, the NLP has driven spam filters to understand what the mail is about and decides whether it is a spam or not.
NLP includes well known and advanced sub-libraries which are very effective in mathematical calculations.
- NLTK, which handles text analysis and related problems. Having over 50 corpora and lexicons, 9 stemmers and handful of algorithms NLTK is very popular for education and research. The application involves a deep learning and analysing process which makes it one of the tough libraries in NLP
- TextBlob, which is a simple library for text analysis
- Stanford core NLP, a library that includes entity recognition, pattern understanding, parsing, tagging etc.
- SpaCy, which presents the best algorithm for the purpose
- Gensim, which is used for topic prototypes and document similarity analysis