Mumbai, Maharashtra, India · Posted: May 8, 2023 · Full Time
Job Description
Responsibilities:
Key responsibility is to design and develop data pipelines, including solution architecture, prototyping, and development of data extraction, transformation/processing, cleansing/standardisation, and loading into the Data Warehouse at real-time/near-real-time frequency.
Source data may be structured, semi-structured, and/or unstructured.
Provide technical expertise to design efficient data ingestion solutions that consolidate data from RDBMSs, APIs, messaging queues, web logs, images, audio files, documents, etc. across enterprise applications, SaaS applications, and external third-party sites or APIs, using ETL/ELT, API integrations, Change Data Capture, Robotic Process Automation, custom Python/Java code, etc.
Development of complex data transformations using Talend (Big Data edition), Python/Java transformations within Talend, SQL/Python/Java UDXs, AWS S3, etc., to load an OLAP Data Warehouse in structured/semi-structured form.
Development of data models and creation of transformation logic to populate those models for faster data consumption with simple SQL.
Implementing automated audit & quality assurance checks in the data pipeline.
Document & maintain data lineage & catalogue to enable data governance
Coordination with BIU, IT, and other stakeholders to provide best-in-class data pipeline solutions, exposing data via APIs and loading it into downstream systems, NoSQL databases, etc.
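As a minimal sketch of the extract-cleanse-load flow the responsibilities above describe (the table name, columns, sample rows, and the in-memory SQLite target are all illustrative assumptions, not part of the role):

```python
import sqlite3

# Illustrative raw "source" rows, standing in for an RDBMS/API/queue extract;
# field names and values are hypothetical.
raw_rows = [
    {"id": "1", "email": " Alice@Example.COM ", "amount": "120.50"},
    {"id": "2", "email": "bob@example.com", "amount": "80"},
    {"id": "2", "email": "bob@example.com", "amount": "80"},  # duplicate record
]

def transform(rows):
    """Cleanse/standardise: trim and lower-case emails, cast types,
    and de-duplicate on the business key."""
    seen, out = set(), []
    for r in rows:
        if r["id"] in seen:
            continue
        seen.add(r["id"])
        out.append((int(r["id"]), r["email"].strip().lower(), float(r["amount"])))
    return out

# Load into an in-memory SQLite table standing in for the warehouse target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, email TEXT, amount REAL)")
conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", transform(raw_rows))
conn.commit()

print(conn.execute("SELECT COUNT(*), SUM(amount) FROM fact_orders").fetchone())
# (2, 200.5)
```

In a production pipeline the extract would come from a connector (Talend job, CDC feed, API call) and the load would target the OLAP warehouse, but the transform stage follows the same shape.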
Requirements:
Programming experience using Python / Java / PL/SQL to create functions / UDXs.
Extensive technical experience with SQL on RDBMS platforms (Teradata / Vertica / AWS Redshift / Azure Synapse / SAP HANA / Snowflake, etc.), including code optimization techniques.
Strong ETL/ELT skill set using Talend Big Data Edition (including custom coding).
Experience with Talend CDC / AWS DMS / GoldenGate / Striim.
Experience and expertise in implementing complex data pipelines, including semi-structured and unstructured data processing.
Expertise in designing efficient data ingestion solutions that consolidate data from RDBMSs, APIs, messaging queues, web logs, images, audio files, documents, etc. across enterprise applications, SaaS applications, and external third-party sites or APIs, using ETL/ELT, API integrations, Change Data Capture, Robotic Process Automation, custom Python/Java code, etc.
Know-how on any NoSQL DB (DynamoDB, MongoDB, Cosmos DB, etc.).
Knowledge of event stream processing.
Ability to understand business functionality, processes, and flows, and to analyze business problems.
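To illustrate the event stream processing skill listed above, here is a minimal sketch of tumbling-window aggregation over an event stream, using plain Python in place of a streaming engine (the event schema, timestamps, and amounts are hypothetical):

```python
from collections import defaultdict

# Hypothetical event stream, standing in for Kafka/Kinesis records;
# "ts" is an event timestamp in seconds.
events = [
    {"ts": 0, "user": "a", "amount": 10.0},
    {"ts": 12, "user": "b", "amount": 5.0},
    {"ts": 65, "user": "a", "amount": 7.5},
    {"ts": 70, "user": "a", "amount": 2.5},
]

def tumbling_window_sums(stream, window_s=60):
    """Sum amounts per user in fixed (tumbling) time windows,
    a basic event-stream-processing pattern."""
    windows = defaultdict(float)
    for e in stream:
        bucket = e["ts"] // window_s  # index of the window this event falls in
        windows[(bucket, e["user"])] += e["amount"]
    return dict(windows)

print(tumbling_window_sums(events))
# {(0, 'a'): 10.0, (0, 'b'): 5.0, (1, 'a'): 10.0}
```

A production deployment would do the same grouping continuously over an unbounded stream (e.g. with Kafka Streams or Spark Structured Streaming) and handle late-arriving events, but the windowing logic is the same idea.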