Job details

Company: Paige
Salary: Salary to be Agreed
Remote?: Yes

Job Description:

Paige is a software company helping pathologists and clinicians make faster, more informed diagnostic and treatment decisions by mining decades of data from the world’s experts in cancer care. We are leading a digital transformation in pathology by leveraging advanced Artificial Intelligence (AI) technology to create value for the oncology clinical team. Paige is the first company to develop clinical-grade AI tools for the pathologist, which resulted in our receiving FDA breakthrough designation for our first product.

We’re seeking a Data Engineer who will work on the development and support of software applications, tools and data management pipelines for research and clinical purposes. Following modern product development practices, you will also assist in the design, implementation and maintenance of tools that extract and manipulate data from various sources, including in-house and external databases.

This is an extraordinary opportunity to be part of a high-performing team and to pursue a life-changing mission with unique technical challenges! This position can be fully remote for U.S.-based applicants.

Responsibilities

Work on Data Warehouse, Data Lake and BI projects and architectures at Paige.
Create and implement ETL pipelines that enable the extraction, transformation and transfer of large amounts of structured and unstructured data from various filesystems and databases, destined for the development of computational pathology algorithms.
Handle the challenges that come with managing terabytes of data.
Build tools to manage, automate and monitor our data and data processing infrastructure.
Design, develop and integrate software tools into existing resources.
Be responsible for the design, coding, testing, packaging, debugging, documentation and deployment of software systems.
Work independently to produce required functional, technical, and user documentation (e.g., business requirements, functional and technical specifications, system architecture, data flows, end-user training requirements) on assigned projects.
Collaborate with data engineers, scientists, engineers, IT operations and medical doctors to build tools that manipulate data in order to build a new generation of artificial intelligence applications for cancer detection and treatment.

Requirements

Experience in architecting, implementing and testing data processing pipelines (e.g. Spark, Beam, ...) and data mining / data science algorithms, either on-premise or in a cloud environment.
Experience in administering and ingesting data into standard data warehouses (e.g. Amazon Redshift, Microsoft SQL Server, Google BigQuery or Snowflake).
Experience architecting data warehouses and/or data lakes for large amounts of structured and unstructured data.
Experience with data lakes and expertise in designing and maintaining a BI solution.
Experience with workflow management tools and platforms, such as Airflow.
Extensive experience in Python programming, or a related language.
Experience with RDBMS and NoSQL databases (e.g. MongoDB).
Experience in packaging and deploying applications on-premise and in the cloud (e.g. AWS).
Familiarity with modern development practices and DevOps.
Interest in building non-standard medical software applications, in collaboration with medical partners.
Cross-disciplinary and strong analytic skills.
Bachelor’s degree in computer science or a related field, or equivalent years of experience.
3+ years of industry experience as a data engineer.