⏰ Full-time

PySpark - Data Architect

🏢 Virtusa

Location
📍 Dubai, United Arab Emirates

Posted
📅 June 18, 2026

Work Type
⏰ Full-time

Position Overview

Responsibilities

Data Pipeline Development: Design, develop, and maintain highly scalable, optimized ETL pipelines using PySpark on the Cloudera Data Platform, ensuring data integrity.
Ingestion: Implement and manage data ingestion processes from a variety of sources (relational databases, APIs, file systems) into the data lake or data warehouse.
Transformation and Processing: Use PySpark to process, cleanse, and transform large datasets into meaningful formats that support analytical and business needs.
Optimization: Tune the performance of PySpark code and Cloudera components to optimize resource utilization and reduce ETL runtime.
Quality and Validation: Implement data quality checks, monitoring, and validation routines to ensure data accuracy and reliability throughout the pipeline.
Orchestration: Automate data workflows using Apache Oozie, Airflow, or similar orchestration tools within the Cloudera environment.
...
                


Job Details

⏰

Employment Type

Full-time

📊