Design, develop, and support data engineering, data modeling, and data integration solutions, with a primary focus on accelerating data landing and curation in a Databricks data lake house. Build and maintain reliable, well-governed pipelines that ingest data from source systems into the lake house and curate it through a layered (medallion) architecture into trusted, analytics-ready datasets. The role also carries a strong reporting and data-analysis focus: partnering with business users to build semantic data models, dashboards, and reports, and performing hands-on analysis to answer business questions. The Data Engineer will help establish the data foundation that powers AI and machine learning initiatives, ensuring high-quality, well-documented, AI-ready data products.
Key Responsibilities
• Build, optimize, and support pipelines that land data from source systems into the Databricks lake house and curate it through a layered (medallion) architecture into trusted, analytics-ready datasets.
• Produce and maintain high-quality, well-governed, documented, AI-ready data products that serve as the foundation for AI and machine learning initiatives.
• Implement data quality, governance, and monitoring controls (e.g., Unity Catalog, automated testing, alerting) across lake house pipelines.
• Develop and maintain reporting and analytics solutions — semantic data models, dashboards, and reports — and perform ad-hoc querying to support business decision-making.
• Gather requirements for, design, and develop new data integrations and enhancements to existing integrations.
• Partner with business users and the Business Relationship Management team on requirements gathering, testing, and support of existing integrations, analytics, and reporting.
• Create and maintain documentation and process flows for integration solutions.
Required Experience & Skills
• Minimum 5 years of IT/technology experience spanning data analysis, data engineering, and/or data integration, with a focus on building and curating pipelines in a cloud data lake or lake house environment.
• At least 3 years writing SQL/NoSQL queries, with specific experience in MS SQL Server, Oracle, and/or Postgres.
• Hands-on experience with a modern cloud data platform / lake house (Databricks, Microsoft Fabric, Snowflake, or comparable). Databricks strongly preferred.
• Demonstrated experience landing data from diverse source systems into a data lake or lake house and curating it through a medallion (bronze-silver-gold) architecture into clean, conformed, analytics-ready datasets.
• Strong Python skills for data engineering, including PySpark.
• Working knowledge of data quality, data governance, and pipeline reliability practices — automated testing, monitoring, alerting, and orchestration of batch and incremental/streaming workloads.
• Experience designing simplified data models for integrations, analytics, and reporting; comfortable performing hands-on data analysis and ad-hoc querying.
• Experience extracting data from source systems via web services (SOAP, REST, Web APIs), XML, and CSV/Excel exports.
• Experience building the data foundation and automation pipelines for analytics and AI/ML initiatives, and partnering with business users on LLM/GenAI use cases.
• Bachelor's degree in Information Systems, IT, or a related technical discipline — or equivalent demonstrated technical proficiency.
• Strong interpersonal and communication skills; fluent in English (oral and written).
Preferred / Nice-to-Have
• Experience with cloud data warehouses (e.g., Snowflake, Synapse) and Spark SQL.
• Experience with performance tuning, partitioning, and optimization of queries and pipelines.
• Modern LLM architectures and GenAI frameworks — retrieval-augmented generation (RAG), embeddings and vector databases, prompt orchestration, and integrating LLMs into data products and pipelines.
• Familiarity with using LLMs in automation development and with vector/embedding data.
• Experience in the Oil & Gas domain.