Big Data engineering is one of the most demanding specialisations in India today. In an interview with Analytics India Magazine, an upGrad data scientist revealed that there is more to the field of Big Data than popular job roles such as Data Scientist, Machine Learning Engineer, and Data Architect. In this article, we list five current openings in Big Data technology.
- Designing databases and data pipelines for storing and processing large, sometimes-unstructured datasets for use with an analytics platform.
- Executing batch jobs on a custom-built computing cluster, using standard ETL tools or custom code in SQL, Java, and Python.
- Working closely with the data analytics team to build a robust suite of libraries for extracting data from the databases.
- Creating libraries for data quality assurance or data sanity checks.
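The last responsibility above can be made concrete with a minimal sketch. Assuming batches arrive as Python dicts (the field names `id` and `value` below are hypothetical, not from the listing), a basic sanity-check helper might look like:

```python
# Minimal data-sanity-check sketch (illustrative; field names are hypothetical).

def sanity_check(records, required_fields, key_field):
    """Report missing required fields and duplicate keys in a batch."""
    missing = 0
    seen, duplicates = set(), 0
    for rec in records:
        if any(rec.get(f) is None for f in required_fields):
            missing += 1
        key = rec.get(key_field)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"rows": len(records),
            "rows_missing_fields": missing,
            "duplicate_keys": duplicates}

batch = [
    {"id": 1, "value": 10.0},
    {"id": 2, "value": None},   # missing value
    {"id": 1, "value": 7.5},    # duplicate id
]
report = sanity_check(batch, required_fields=["id", "value"], key_field="id")
print(report)  # → {'rows': 3, 'rows_missing_fields': 1, 'duplicate_keys': 1}
```

In production such checks would typically run against a warehouse table rather than in-memory dicts, but the contract — count the violations, report them, never silently drop rows — stays the same.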
- Experience with SQL (MySQL), Columnar (MariaDB/InfiniDB) as well as NoSQL (Cassandra) databases
- Familiarity with programming best practices, design patterns, version control systems
- A sound understanding of parallel/distributed programming
- Willingness to go the extra mile to complete assigned work
- Good command of Java or Python
- The ability to work effectively with people from a variety of backgrounds
Location: Gurgaon, New Delhi, Noida
- Working with large databases and creating algorithms
- Working with the data science team to ensure there is no gap between the technology and the models.
- Strong coding skills in Python
- Experience with databases and cloud platforms, as well as Solr search
- Knowledge of working with large data sets is essential.
- Manage large scale Hadoop cluster environments, handling all Hadoop environment builds, cluster setup, performance tuning, and ongoing monitoring.
- Evaluate and recommend systems software and hardware for the enterprise system including capacity modelling.
- Work with the core production support personnel in IT and Engineering to automate deployment and operation of the infrastructure.
- Manage, deploy, and configure infrastructure with Ansible or other automation toolsets.
- Capacity planning and implementation of new/upgraded hardware and software releases, as well as storage infrastructure.
- Monitoring the Linux community and reporting important changes/enhancements to the team.
- 2 years of professional experience supporting production medium to large scale Linux environments.
- 2 years of professional experience working with Hadoop (HDFS & MapReduce) and related technology stack.
- A deep understanding of Hadoop design principles, cluster connectivity, security, and the factors that affect distributed system performance.
- Experience with Kafka, HBase, HDFS, YARN, Spark, and Hortonworks is mandatory.
- MapR and MySQL experience a plus.
- Understanding of automation tools (Ansible)
- Expert experience with at least one of the following languages: Python, Perl, Ruby, or Bash.
- Prior experience with remote monitoring and event handling using Nagios, ELK.
- Solid ability to create automation with Ansible or a shell.
- Good collaboration & communication skills, the ability to participate in an interdisciplinary team.
- Strong written communication and documentation skills.
- Knowledge of best practices related to security, performance, and disaster recovery.
- Someone who can contribute to the success of the company as an individual contributor
- Good problem-solving skills
- A proactive, self-driven individual with a passion for innovation
- An ethical professional with dedication and determination to make a game-changing impact on the product
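For the monitoring requirement above, it may help to recall the convention Nagios-compatible checks follow: the plugin exits 0 for OK, 1 for WARNING, and 2 for CRITICAL. A minimal sketch in stdlib Python, with an illustrative disk-usage metric and assumed thresholds (not from the listing), could look like:

```python
# Sketch of a Nagios-style check: status 0=OK, 1=WARNING, 2=CRITICAL.
# The disk-usage metric and the 80%/90% thresholds are illustrative assumptions.
import shutil

def check_disk_usage(path="/", warn=80.0, crit=90.0):
    """Return (status_code, message) following Nagios plugin conventions."""
    usage = shutil.disk_usage(path)
    pct = usage.used / usage.total * 100
    if pct >= crit:
        return 2, f"CRITICAL - disk usage {pct:.1f}%"
    if pct >= warn:
        return 1, f"WARNING - disk usage {pct:.1f}%"
    return 0, f"OK - disk usage {pct:.1f}%"

status, message = check_disk_usage()
print(message)
```

A real deployment would wire the returned status into the scheduler's exit code (e.g. `raise SystemExit(status)` when run as a script) so Nagios or an ELK alert pipeline can act on it.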
- At least 6+ years of industry experience
- Strong understanding of Java (or Scala) and object-oriented programming
- Web service development
- Good understanding of Big Data technologies such as Hive and Spark, and familiarity with the Hadoop ecosystem.
- Good understanding of databases and SQL.
- Familiarity with NoSQL databases such as Cassandra and Elasticsearch, and with NoSQL schema design.
- Familiarity with the networking and security domains
- Familiarity with various design and architectural patterns
- Understanding of the fundamental design principles behind building long-term scalable applications
- Proficiency with Agile methodologies/practices
- Ability to drive high standards of product quality
Location: New Delhi
- The individual is expected to work on the development and management of a Big Data platform which shall be used to handle billions of data points from various sources in real time. The platform serves multiple purposes – it is used for predefined and ad-hoc user-generated queries, real-time Machine Learning algorithms, and various business metrics and reports.
- The individual is expected to create the data flow pipeline and set up workflows to handle the various kinds of requirements being served by the platform.
- A Big Data engineer who will work on collecting, storing, processing, and analysing huge sets of data.
- The primary focus will be on choosing optimal solutions to use for these purposes, then maintaining, implementing, and monitoring them.
- You will also be responsible for integrating them with the architecture used across the company.
- Selecting and integrating any Big Data tools and frameworks required to provide requested capabilities
- Implementing the ETL process and building, maintaining, monitoring the data pipeline
- Monitoring data quality and enabling system-owned resolution channels
- Monitoring performance and advising any necessary infrastructure changes
- Ability to rationalise data science models through an operationalisation platform: version, archive, and manage model deployment
- Defining data retention policies
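The last point, defining data retention policies, can be sketched in a few lines. Assuming records carry a `created_at` timestamp and a 30-day window (both assumptions for illustration, not from the listing), a retention filter might look like:

```python
# Sketch of a simple data-retention filter (illustrative; the 30-day
# window and the record layout are assumptions, not from the job spec).
from datetime import datetime, timedelta, timezone

def apply_retention(records, days=30, now=None):
    """Keep only records whose 'created_at' falls inside the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [r for r in records if r["created_at"] >= cutoff]

now = datetime(2024, 1, 31, tzinfo=timezone.utc)
records = [
    {"id": "a", "created_at": datetime(2024, 1, 30, tzinfo=timezone.utc)},
    {"id": "b", "created_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
]
kept = apply_retention(records, days=30, now=now)
print([r["id"] for r in kept])  # → ['a']
```

At the scale described here (billions of data points), the same policy would of course be expressed as a partition-dropping job in HDFS or a TTL setting in the datastore rather than an in-memory filter, but the policy itself — a cutoff derived from a retention window — is the same.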
- Experience in Big Data and Machine Learning Technologies/Frameworks
- Hands-on with Scala, Apache Spark, Hadoop, HDFS, MLlib, Airflow
- Ability to work in a container-based environment (Docker, Kubernetes)
- Experience with data science tools – R and Python
- Education/Qualification: BE/B.Tech from a premier institute
- Relevant Experience (Minimum): 1 – 5 years