Position Overview
Role : Data Engineer
Location : San Diego, CA 92129 (onsite)
Rate : $75/hr.
1. Design, build, and performance-tune Apache Spark workloads using Spark SQL and PySpark for complex transformations (JSON/semi-structured data, nested structures, window functions, joins, aggregations).
2. Profile and optimize Spark jobs: partitioning, shuffles, join strategies, skew, memory/spill, and right-sized resource usage—especially on EMR Serverless—for large-scale and petabyte-scale data.
3. Support customers and monitor pipelines around the clock, under strict SLAs for fixing issues and reinstating affected pipelines.
4. Implement reusable patterns for incremental loads, deduplication and CDC-style processing.
5. Build and maintain ETL/ELT on AWS EMR Serverless (Spark), with S3 as the data lake layer: partitioning, compression, external tables, and layouts that support fast Spark and downstream SQL.
workload...