AI, Data Engineering, and Document Automation
AI Document Extraction & Automation PoC
Tech: LLM, OCR, AI agents, Python, JSON pipelines
Summary:
Built an end-to-end pipeline combining OCR-based layout extraction, hybrid image+text processing, and LLM-based structured extraction.
Delivered multi-page PDF evaluation, scoring, and automated JSON generation for inspection and contract documents.
Highlights:
- Page-level OCR + layout understanding
- Multi-item JSON output per file
- Accuracy evaluation (DeepEval-style)
- Error analysis & iterative improvement
- Custom Python ingestion + preprocessing
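The accuracy evaluation step can be illustrated with a minimal sketch. This is not the project's code and does not use the DeepEval API; it only shows one plausible way to score multi-item JSON output against hand-labelled ground truth with per-field exact matching. The function name `field_accuracy` and the sample fields (`item`, `result`) are hypothetical.

```python
from typing import Any


def field_accuracy(
    predicted: list[dict[str, Any]],
    expected: list[dict[str, Any]],
) -> dict[str, float]:
    """Per-field exact-match accuracy, pairing items by position.

    Assumption: extracted items come out in the same order as the
    ground-truth items; a missing predicted item counts as all-wrong.
    """
    hits: dict[str, int] = {}
    totals: dict[str, int] = {}
    for i, exp in enumerate(expected):
        pred = predicted[i] if i < len(predicted) else {}
        for field, want in exp.items():
            totals[field] = totals.get(field, 0) + 1
            if pred.get(field) == want:
                hits[field] = hits.get(field, 0) + 1
    return {field: hits.get(field, 0) / totals[field] for field in totals}


# Hypothetical ground truth and model output for one inspection document.
expected = [
    {"item": "pump", "result": "pass"},
    {"item": "valve", "result": "fail"},
]
predicted = [
    {"item": "pump", "result": "pass"},
    {"item": "valve", "result": "pass"},
]

scores = field_accuracy(predicted, expected)
print(scores)
```

Per-field scores like these make error analysis easier than a single document-level score, since they show which fields the extraction gets wrong most often.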
DynamoDB → Data Warehouse ETL Migration
Client: Marubeni Corporation
Tech: AWS (Lambda, Glue, S3), Python, SQL, Power BI
Designed and implemented a new DWH schema and ETL pipeline.
Transformed transactional data into a star schema for analytics.
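As a rough illustration of the star-schema transformation (a plain-Python sketch, not the Glue/Lambda implementation used in the project), flat transactional records can be split into a dimension table with surrogate keys and a fact table that references them. All field names and sample rows below are hypothetical.

```python
def to_star_schema(transactions):
    """Split flat transaction records into a customer dimension and a fact table."""
    customer_dim = {}  # natural key -> dimension row carrying a surrogate key
    fact_rows = []
    for tx in transactions:
        natural_key = tx["customer_id"]
        if natural_key not in customer_dim:
            customer_dim[natural_key] = {
                "customer_key": len(customer_dim) + 1,  # surrogate key
                "customer_id": natural_key,
                "customer_name": tx["customer_name"],
            }
        # The fact row keeps only measures and foreign keys to dimensions.
        fact_rows.append({
            "order_id": tx["order_id"],
            "customer_key": customer_dim[natural_key]["customer_key"],
            "amount": tx["amount"],
        })
    return list(customer_dim.values()), fact_rows


# Hypothetical transactional records, as they might arrive from a key-value store.
transactions = [
    {"order_id": "o1", "customer_id": "c1", "customer_name": "Acme", "amount": 120},
    {"order_id": "o2", "customer_id": "c2", "customer_name": "Globex", "amount": 80},
    {"order_id": "o3", "customer_id": "c1", "customer_name": "Acme", "amount": 45},
]

dims, facts = to_star_schema(transactions)
```

Here `dims` holds one row per distinct customer and `facts` holds one row per order, so analytics tools such as Power BI can join on `customer_key` rather than on repeated descriptive fields.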
Databricks + Airflow Large-Scale ETL
Client: Optus (via Tech Mahindra)
Tech: Databricks (PySpark), Airflow, Delta Lake
Implemented long-running transformation jobs, optimized performance, and resolved critical pipeline failures.
Scraping & Distributed Computing System
Tech: Java, Python, AWS Lambda, EC2, Redis, S3, SQS
Designed and deployed a system to crawl millions of ecommerce product pages with distributed workers and failover handling.
Real-time CRM → SaaS Integration (ServiceNow)
Tech: Azure, ServiceNow API, Python, Data modeling
Built real-time integrations between CRM and external SaaS services.
Handled cleansing, transformation, schema design, and monitoring.
Power BI Dashboards for Hierarchical Data
Developed analytics dashboards from complex nested data structures, enabling drill-down visualization for internal stakeholders.
Transport Data ETL for IC-card Passenger Analytics
Processed bus route geographic data, mapped IC-card usage, and generated statistical insights for transport planning.
Skills
- AI / LLM: OCR+LLM pipelines, GPT-4o, embeddings, evaluation
- Data Engineering: ETL, pipelines, DWH, modeling
- Cloud: AWS, Azure, Databricks, Airflow
- Programming: Python, Java, SQL
- Tools: LangChain, LangGraph, Vertex AI Matching Engine, Power BI, Tableau
Contact
📧 nueki@manoriworks.com
🌐 AI-Powered Document Automation: https://manoriworks.com/en