We are looking for a Big Data Engineer who will collect, store, process, and analyze huge data sets.
You will be a member of a team that develops and implements advanced algorithms and data pipelines to extract, classify, merge, and deliver new insights and business value from heterogeneous structured and unstructured data sets.
The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them.
You will have the chance to learn multiple technologies and to work with thought leaders in the Big Data space. You will also be responsible for integrating these solutions with the architecture used across the company.
Responsibilities
Work with consultant teams on specific customer deliverables as required.
Integrate any Big Data tools and frameworks required to provide the requested capabilities.
Design and implement a data lake.
Monitor performance and advise on any necessary configuration and infrastructure changes.
Debug and resolve Hadoop (YARN, MapReduce, Spark, etc.) issues.
Advise on and implement data lake security using Kerberos, Knox, Ranger, SSL, etc.
Skills and Qualifications
Master's or Bachelor's degree in Computer Science or a related field, with a minimum of one year of experience
Proficient understanding of distributed computing principles
Experience managing Hadoop clusters, with all included services, using Apache Ambari, Cloudera Manager, or the MapR Control System
Proficiency with Hadoop v2, MapReduce, HDFS, YARN, and Tez
Experience building stream-processing systems using solutions such as Storm, Spark Streaming, or NiFi
Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala, and of workflow schedulers such as Oozie
Working knowledge of Apache Spark
Experience with integration of data from multiple data sources
Experience with NoSQL databases, such as HBase, Cassandra, MongoDB
Knowledge of various ETL techniques and frameworks, such as Flume
Experience with various messaging systems, such as Kafka or RabbitMQ
Experience with Big Data ML toolkits, such as Mahout, Spark MLlib, or H2O
Good understanding of Lambda Architecture, along with its advantages and drawbacks
Experience with any of the following Hadoop distributions: Cloudera, MapR, or Hortonworks
Training or certification in any Hadoop distribution is a plus.
Completion of any MOOCs is an advantage.