Senior Data Engineer
Liminal
Liminal is a global market intelligence and strategic advisory firm specializing in digital identity, financial crime and compliance, and IT security technology solutions across industries, while also serving the private equity and venture capital community. Founded in 2016, Liminal offers strategic and analytical services supporting executive decision-making at all product and business lifecycle stages. We advise some of the world’s most prominent business leaders, investors, and policymakers on building, acquiring, and investing in the next generation of solutions and technologies. We provide access to proprietary data and analysis, strategic frameworks, and integrated insights on the industry’s only market intelligence platform.
Every major company in the world has started focusing on the next generation of digital identity technologies as a necessity for continued growth and security. Our team works with a myriad of organizations, from Fortune 100s to startups, across industries including financial services, technology, telecommunications, and the P2P economy. At Liminal, we help businesses build solutions, execute strategies, invest intelligently, and connect with key decision-makers. We know that it’s in the sharing of discovery and insights that groundwork is laid, problems are solved, and entire sectors advance at the speed of light. Keeping information to ourselves delays progress for all. At Liminal, we don't just respond to the market; we define it.
About the role
This role focuses on building and maintaining robust data architectures, pipelines, and systems that support the effective collection, storage, and processing of data across multiple departments. The Senior Data Engineer will play a pivotal role in ensuring the scalability, reliability, and performance of our data systems. Drawing on a strong background in data engineering, cloud infrastructure, and data pipeline automation, the Senior Data Engineer will work on projects from initial design through deployment, supporting the seamless integration of data workflows into product and operational teams.
What you'll do
Cross-Department Data Solutions:
- Collaborate with various departments to understand data needs, assess technical feasibility, and design efficient data engineering solutions to support organizational initiatives.
- Implement scalable data workflows that optimize data availability, quality, and accessibility for AI, business analytics, and other internal teams.
- Support product teams in transitioning mature data pipelines and systems to ensure alignment with product goals and technical requirements.
Data Pipeline Development & Optimization:
- Design, implement, and maintain data pipelines that ingest, process, and transform large-scale datasets for internal applications, including AI and machine learning models.
- Build efficient ETL (Extract, Transform, Load) processes that streamline the movement of data between systems, databases, and analytics platforms (a minimal example is sketched after this list).
- Optimize data flows to ensure high performance, low latency, and scalability, adapting pipelines to handle both batch and real-time processing.
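To make the batch side of this concrete, the sketch below shows a minimal daily ETL pipeline using the Airflow TaskFlow API (Airflow is named in the qualifications). This is an illustrative sketch only: the DAG id, task bodies, and sample records are hypothetical, and it assumes Airflow 2.x.

```python
# A minimal sketch of a daily batch ETL pipeline, assuming Airflow 2.x and the
# TaskFlow API. The DAG id, task bodies, and sample records are hypothetical
# placeholders for illustration only.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_daily_etl():
    @task
    def extract() -> list[dict]:
        # Stand-in for pulling raw records from an upstream source system.
        return [{"id": 1, "amount": "42.50"}, {"id": 2, "amount": "13.00"}]

    @task
    def transform(records: list[dict]) -> list[dict]:
        # Normalize types so downstream analytics can rely on a stable schema.
        return [{"id": r["id"], "amount": float(r["amount"])} for r in records]

    @task
    def load(records: list[dict]) -> None:
        # Stand-in for writing to a warehouse such as BigQuery or Redshift.
        print(f"Loaded {len(records)} rows")

    load(transform(extract()))


example_daily_etl()
```

Splitting extract, transform, and load into separate tasks keeps each step independently retryable and observable.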
Cloud Infrastructure & System Integration:
- Develop and maintain cloud-based data infrastructure on a major cloud platform (e.g., AWS, Azure, GCP, or similar), ensuring data systems are robust, cost-effective, and performant.
- Implement data storage solutions and distributed databases that ensure seamless integration with other internal systems.
- Leverage cloud services for scalable data processing and storage, ensuring that infrastructure can support growing datasets and organizational demands.
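As a small illustration of the storage integration described above, here is a sketch of writing a processed file to cloud object storage, assuming AWS S3 via boto3 (one of the platforms named in this posting); the bucket, key, and local file are hypothetical placeholders, and an equivalent pattern applies to Cloud Storage or Azure Blob Storage.

```python
# A minimal sketch of writing a processed file to cloud object storage,
# assuming AWS S3 via boto3; the bucket, key, and local file are hypothetical.
import boto3


def upload_to_s3(local_path: str, bucket: str, key: str) -> None:
    """Upload a local file to S3 so downstream systems can read it."""
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)


if __name__ == "__main__":
    upload_to_s3("daily_export.parquet", "example-data-bucket", "exports/daily_export.parquet")
```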
Data Quality & Governance:
- Establish data validation processes to ensure data quality, consistency, and integrity across all pipelines and systems (a minimal validation example is sketched after this list).
- Collaborate with data scientists and analysts to ensure data is structured and formatted for optimal use in analytics and AI applications.
- Ensure compliance with data governance policies and best practices for data privacy, security, and auditability.
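Below is a minimal sketch of the kind of record-level validation step described above, in plain Python; the required fields and rules are hypothetical and would normally be driven by a shared schema or a data quality framework.

```python
# A minimal sketch of a record-level validation step. The required fields and
# rules are hypothetical placeholders for illustration only.
from dataclasses import dataclass


@dataclass
class ValidationResult:
    valid: list[dict]
    rejected: list[dict]


def validate_records(records: list[dict]) -> ValidationResult:
    """Split records into valid rows and rejects based on simple integrity checks."""
    valid, rejected = [], []
    for r in records:
        has_required_fields = {"id", "amount"} <= r.keys()
        amount_is_numeric = isinstance(r.get("amount"), (int, float))
        if has_required_fields and amount_is_numeric and r["amount"] >= 0:
            valid.append(r)
        else:
            rejected.append(r)
    return ValidationResult(valid=valid, rejected=rejected)


if __name__ == "__main__":
    result = validate_records([{"id": 1, "amount": 10.0}, {"id": 2, "amount": "oops"}])
    # Rejected rows could be routed to a quarantine table for later review.
    print(len(result.valid), "valid,", len(result.rejected), "rejected")
```

Routing rejected rows to a quarantine location rather than dropping them keeps pipelines auditable.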
Automation & Monitoring:
- Implement automation for data processing workflows, reducing manual intervention and ensuring consistent delivery of high-quality data.
- Set up monitoring and alerting systems for pipeline health, performance metrics, and data anomalies to proactively address any issues (a brief freshness-check example is sketched after this list).
- Continuously optimize existing data systems and pipelines to improve performance, reduce errors, and enhance reliability.
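The sketch below illustrates one simple form of pipeline health monitoring, a data freshness check; the threshold and alert hook are hypothetical, and in practice the alert would be emitted through a monitoring service such as CloudWatch or Cloud Monitoring rather than standard output.

```python
# A minimal sketch of a pipeline freshness check. The lag threshold and the
# alerting mechanism are hypothetical placeholders for illustration only.
from datetime import datetime, timedelta, timezone


def check_freshness(last_loaded_at: datetime, max_lag: timedelta = timedelta(hours=2)) -> bool:
    """Return True if the most recent load is within the allowed lag, otherwise alert."""
    lag = datetime.now(timezone.utc) - last_loaded_at
    if lag > max_lag:
        # Stand-in for a real alerting integration (e.g., paging or chat notification).
        print(f"ALERT: data is stale by {lag - max_lag}")
        return False
    return True


if __name__ == "__main__":
    check_freshness(datetime.now(timezone.utc) - timedelta(hours=3))
```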
Documentation & Collaboration:
- Maintain comprehensive documentation of data architectures, data pipeline designs, and system integrations to facilitate clear communication and collaboration.
- Document technical workflows, processes, and system configurations to ensure smooth handoffs and enable other teams to leverage data assets effectively.
- Collaborate with cross-functional teams, including data scientists, product developers, and business stakeholders, to ensure data solutions align with organizational goals.
Qualifications
- 5+ years of experience in data engineering, data architecture, and system design, with extensive experience building and optimizing large-scale data systems.
- Proficiency in Python, including object-oriented programming (OOP) and knowledge of software development best practices such as design patterns.
- Strong understanding of SQL and experience with relational database management systems such as PostgreSQL or MySQL, as well as NoSQL solutions such as MongoDB.
- Experience with cloud-based platforms (e.g., GCP, AWS, Azure), particularly services for data storage, processing, and orchestration (e.g., BigQuery, Redshift, Synapse, S3, Cloud Storage).
- Solid experience in data pipeline development, including stream and batch processing, ETL frameworks, and workflow orchestration tools like Airflow.
- Experience with containerization technologies, including Docker, and orchestration tools like Kubernetes, ECS, AKS, or GKE.
- Familiarity with CI/CD pipelines and version control systems (e.g., Git) and the ability to integrate cloud services into these workflows for automated deployments.
- Proven ability to implement data security and privacy best practices, including encryption, access controls, and governance.
- Strong problem-solving skills, with demonstrated ability to debug and optimize data pipelines and cloud-based architectures in production environments.
- Excellent communication and collaboration skills, with the ability to work within cross-functional teams and engage with both technical and non-technical stakeholders.
- Familiarity with monitoring tools (e.g., CloudWatch, Google Cloud Monitoring (formerly Stackdriver), Azure Monitor) for tracking pipeline health, performance, and error reporting.
Nice to Have
- Experience with Cloud Run / Cloud Run Functions.
- Familiarity with Apache Spark, which could be valuable as data volumes increase.
- Understanding of REST APIs and their role in data integration.
- Exposure to data modeling for AI and machine learning pipelines.