Capgemini [CGEMJP00305194]
Possible 3-Month CTH | No Fees | Do Not Re-Post | Confidential. Submit candidates under their legal name and use only the Capgemini template.
GTD ID#: UUTXZH
Role Name: Senior Spark Data Engineer
Location: Chicago, IL (onsite)
Job Description
Job Summary: We are seeking an experienced Senior Data Engineer with strong expertise in Apache Spark to help build and manage scalable, secure, and efficient data platforms. This role will be instrumental in designing data architectures and pipelines that support both advanced analytics and governed data access across the organization. You will work with cross-functional teams to enable data discovery, lineage, and compliance while delivering high-performance data processing systems.
Key Responsibilities:
• Design, build, and optimize scalable data pipelines using Apache Spark.
• Manage and govern data access and metadata using AWS DataZone.
• Implement and enforce data access controls, lineage tracking, and data classification.
• Integrate data across cloud platforms and on-prem systems into unified data lakes and warehouses.
• Partner with data analysts, scientists, and product teams to deliver clean, reliable, and well-governed datasets.
• Develop and automate ingestion, transformation, and quality validation workflows.
• Ensure data compliance and security policies are implemented consistently across the platform.
• Contribute to architecture and governance strategy for enterprise-scale data platforms.
• Support performance tuning, troubleshooting, and monitoring of Spark jobs and data pipelines.
• Mentor junior engineers and support team development through code reviews and documentation.
Required Skills & Qualifications:
• Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
• 5+ years of experience in data engineering with production-level data systems.
• Expertise in Apache Spark (PySpark or Scala), including performance tuning and optimization.
• Strong experience with AWS DataZone for data governance and access management.
• Proficiency in SQL and modern data architecture concepts (e.g., lakehouse, Delta Lake).
• Hands-on experience with cloud platforms (AWS, Azure, or GCP), especially their data services (e.g., S3, ADLS, Redshift, Synapse).
• Experience with orchestration and data tooling such as Airflow, Athena, or similar.
• Building frameworks to load and ingest data from source files and RDBMS systems.
• Designing and developing data layers through framework configuration using ABCR metadata.
• Using PySpark/Scala to load data, create schemas, process data, and publish to Kafka.
• Optimizing Spark jobs using PySpark.
• Performing data processing such as aggregations, joins, and filters per business rules.
• Strong knowledge of data governance, lineage, access control, and compliance frameworks.
• Familiarity with DevOps and infrastructure-as-code tools (e.g., Terraform, Git, CI/CD pipelines).
Preferred Qualifications:
• Experience with Spark Structured Streaming, Kafka, or similar real-time systems.
• Working knowledge of enterprise metadata management and data catalog tools.
• Prior experience implementing role-based access control (RBAC), row-level security, and data masking.
• Exposure to MLflow, Feature Store, and integration of data pipelines with ML models.
• Contributions to open-source or community initiatives around Spark or Databricks.
PLEASE SEND CANDIDATES WITH THE FOLLOWING INFORMATION. MISSING INFORMATION WILL RESULT IN AUTOMATIC REJECTION. ADD A PHOTO ID AT THE END OF THE RESUME.
• Legal name:
• Rate:
• Location (City and State):
• Willing to relocate?
• Availability to join:
• Availability to interview:
• Open to CTH:
• Phone #:
• Mobile #:
• Email address:
• Visa type and expiration date:
• Hiring status (C2C/W2/1099):
• Previous Capgemini contractor?
• Previous Capgemini full-time employee?
• Time slots for an interview:
• If the resource has a visa, what company owns it?
• Are you working directly with the contractor’s visa holder?
Estefania Ocheita ERM | SubCo Staffing Center Capgemini North America
Enable Skills-Based Hiring: No
Additional Details
• Global Grade: C
• Named Job Posting? (if Yes, needs to be approved by SCSC): No
• Remote work possibility: No
• Global Role Family: 60242 (P) Data Management
• Global Technical Skills Family: 6341 (T) Data Science and Analytics
• Local Role Name: Senior Spark Data Engineer
• Local Skills: Hannah Mutch
• Languages Required: English