We are supporting the hiring of Big Data Engineers with experience in Scala and Apache Spark. You will be working in a team of highly talented engineers and data scientists. Your main responsibility will be to write highly performant, scalable code that runs on top of their Big Data platform (Spark/Hive/Impala/Hadoop). You will also work closely with the Data Science team to support them in the ETL process (including cohort-building efforts).
A typical day might include:
- Working in a cross-functional team - alongside talented Engineers and Data Scientists
- Building scalable, high-performance code
- Mentoring less experienced colleagues within the team
- Implementing ETL and feature-extraction pipelines
- Monitoring cluster (Spark/Hadoop) performance
- Working in an Agile environment
- Refactoring and moving our current libraries and scripts to Scala/Java
- Enforcing coding standards and best practices
- Working in a geographically dispersed team
- Working in an environment with a significant number of unknowns - both technically and functionally.
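To give a flavour of the feature-extraction work mentioned above, here is a toy sketch in plain Scala (standard library only; a real pipeline would run on Spark, and the input format, field names and feature are made up for illustration):

```scala
// Toy feature-extraction step: parse CSV-style event rows and derive one
// feature per user (total event count). In production this logic would be
// expressed over Spark DataFrames/RDDs rather than in-memory collections.
object FeatureExtraction {
  case class Event(userId: String, eventType: String)

  // Hypothetical input format: "userId,eventType"; malformed rows are dropped
  def parse(line: String): Option[Event] =
    line.split(",") match {
      case Array(user, kind) => Some(Event(user.trim, kind.trim))
      case _                 => None
    }

  // Aggregate raw lines into a per-user event-count feature
  def eventCounts(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(parse).groupBy(_.userId).map { case (u, es) => u -> es.size }

  def main(args: Array[String]): Unit = {
    val raw = Seq("u1,click", "u1,view", "u2,click", "malformed")
    println(eventCounts(raw).toList.sorted)
  }
}
```

The same shape (parse, filter out bad records, group, aggregate) carries over directly to Spark, where `groupBy`/`map` become distributed transformations.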
Ideally you will have:
- Essential: strong experience in Apache Spark and Scala
- Strong analytical and problem-solving skills, with a personal interest in subjects such as maths/statistics, machine learning and AI
- Practical experience of test-driven development (TDD)
- Practical experience of ScalaTest
- Practical experience of SQL
- Practical experience of Git (or a similar version-control platform, such as Bitbucket)
- Experience refactoring code with scale and production in mind
- Experience with integration of data from multiple data sources
- Experience with NoSQL databases such as HBase, Cassandra or MongoDB
- Experience with any of the following distributions of Hadoop - Cloudera/MapR/Hortonworks
- Solid knowledge of data structures and algorithms
Bonus points for:
- Degree in Computer Science (or similar) with experience developing software in a commercial environment - alternatively, a relevant combination of education, training and work experience
- Other functional languages, such as Haskell and Clojure
- Big Data ML toolkits such as Mahout, SparkML and H2O
- Apache Kafka, Apache Ignite and Druid
- Container technologies such as Docker
- Cloud platform technologies such as DC/OS, Marathon, Apache Mesos, Kubernetes and Apache Brooklyn