Portfolio (Selected Projects)

AI, Data Engineering, and Document Automation

AI Document Extraction & Automation PoC

Tech: LLM, OCR, AI agents, Python, JSON pipelines

Summary:

Built a full pipeline combining OCR (layout extraction), image+text hybrid processing, and LLM-based structured extraction.

Delivered multi-page PDF evaluation, scoring, and automated JSON generation for inspection and contract documents.

Highlights:

  • Page-level OCR + layout understanding
  • Multi-item JSON output per file
  • Accuracy evaluation (DeepEval-style)
  • Error analysis & iterative improvement
  • Custom Python ingestion + preprocessing
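
The accuracy evaluation step can be illustrated with a minimal sketch, assuming each extracted item and its ground-truth counterpart are flat dictionaries compared by exact match. The function names `field_accuracy` and `evaluate_file` and the sample fields are hypothetical; they are not taken from the project code or from the DeepEval API.

```python
from typing import Any


def field_accuracy(expected: dict[str, Any], extracted: dict[str, Any]) -> float:
    """Return the fraction of expected fields whose extracted value matches exactly."""
    if not expected:
        return 1.0
    hits = sum(1 for key, value in expected.items() if extracted.get(key) == value)
    return hits / len(expected)


def evaluate_file(expected_items: list[dict], extracted_items: list[dict]) -> dict:
    """Score a multi-item file pairwise by position; a missing item scores 0."""
    scores = []
    for i, exp in enumerate(expected_items):
        ext = extracted_items[i] if i < len(extracted_items) else {}
        scores.append(field_accuracy(exp, ext))
    mean = sum(scores) / len(scores) if scores else 0.0
    return {"per_item": scores, "mean": mean}


# Hypothetical sample: one inspection document containing two items.
expected = [
    {"item": "pump", "result": "pass"},
    {"item": "valve", "result": "fail"},
]
extracted = [
    {"item": "pump", "result": "pass"},
    {"item": "valve", "result": "pass"},  # one wrong field
]
report = evaluate_file(expected, extracted)
```

Per-item scores like these make error analysis straightforward: low-scoring items point to the pages or fields that need prompt or preprocessing changes.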

DynamoDB → Data Warehouse ETL Migration

Client: Marubeni Corporation

Tech: AWS (Lambda, Glue, S3), Python, SQL, Power BI

Designed and implemented a new DWH schema and ETL pipeline.

Transformed transactional data into a star schema for analytics.
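
As a rough illustration of the star-schema transformation, the sketch below splits flat transaction records into one dimension and one fact table. The record fields (`customer`, `amount`) and the function name `to_star_schema` are assumptions made for the example, not the client's actual schema.

```python
def to_star_schema(transactions: list[dict]) -> tuple[dict[str, int], list[dict]]:
    """Split flat transactions into a customer dimension and a fact table."""
    customer_dim: dict[str, int] = {}  # natural key -> surrogate key
    facts: list[dict] = []
    for tx in transactions:
        # Assign a new surrogate key the first time a customer is seen.
        key = customer_dim.setdefault(tx["customer"], len(customer_dim) + 1)
        # The fact row keeps only the surrogate key and the measure.
        facts.append({"customer_key": key, "amount": tx["amount"]})
    return customer_dim, facts


# Hypothetical transactional rows.
transactions = [
    {"customer": "A", "amount": 100},
    {"customer": "B", "amount": 50},
    {"customer": "A", "amount": 25},
]
customer_dim, facts = to_star_schema(transactions)
```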

Databricks + Airflow Large-Scale ETL

Client: Optus (via Tech Mahindra)

Tech: Databricks (PySpark), Airflow, Delta Lake

Implemented long-running transformation jobs, optimized performance, and resolved critical pipeline failures.

Scraping & Distributed Computing System

Tech: Java, Python, AWS Lambda, EC2, Redis, S3, SQS

Designed and deployed a system that crawls millions of e-commerce product pages using distributed workers with failover handling.
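
The failover idea can be sketched in a single process, assuming a retry-then-dead-letter policy similar in spirit to an SQS queue with a dead-letter queue. The in-memory `deque`, the `run_crawl` function, and the `flaky_fetch` stand-in are all hypothetical; the deployed system used distributed workers rather than one loop.

```python
from collections import deque
from typing import Callable


def run_crawl(urls: list[str], fetch: Callable[[str], str], max_attempts: int = 3):
    """Process URLs from a queue, requeueing failures up to max_attempts."""
    queue = deque((url, 1) for url in urls)
    results: dict[str, str] = {}
    dead_letter: list[str] = []
    while queue:
        url, attempt = queue.popleft()
        try:
            results[url] = fetch(url)
        except Exception:
            if attempt < max_attempts:
                queue.append((url, attempt + 1))  # retry later
            else:
                dead_letter.append(url)  # give up and record for inspection
    return results, dead_letter


calls: dict[str, int] = {}


def flaky_fetch(url: str) -> str:
    """Stand-in fetcher: /b fails once, /c always fails."""
    calls[url] = calls.get(url, 0) + 1
    if url.endswith("/c") or (url.endswith("/b") and calls[url] == 1):
        raise ConnectionError(url)
    return f"<html>{url}</html>"


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
results, dead_letter = run_crawl(urls, flaky_fetch)
```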

Real-time CRM → SaaS Integration (ServiceNow)

Tech: Azure, ServiceNow API, Python, data modeling

Built real-time integrations between CRM and external SaaS services.

Handled cleansing, transformation, schema design, and monitoring.

Power BI Dashboards for Hierarchical Data

Developed analytics dashboards from complex nested data structures, enabling drill-down visualization for internal stakeholders.
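
One common way to prepare nested data for drill-down visuals is to flatten it into rows that carry their position in the hierarchy. The sketch below assumes each node is a dictionary with `name`, an optional `value`, and optional `children`; that shape and the `flatten` helper are illustrative assumptions, not the project's actual data model.

```python
def flatten(node: dict, path: tuple[str, ...] = ()) -> list[dict]:
    """Depth-first flatten of a nested node into rows with path and depth."""
    current = path + (node["name"],)
    rows = [
        {
            "path": "/".join(current),
            "depth": len(current),
            "value": node.get("value", 0),
        }
    ]
    for child in node.get("children", []):
        rows.extend(flatten(child, current))
    return rows


# Hypothetical organisation tree.
tree = {
    "name": "Company",
    "children": [
        {"name": "Sales", "value": 10, "children": [{"name": "East", "value": 4}]},
        {"name": "Ops", "value": 7},
    ],
}
rows = flatten(tree)
```

Each row can then be loaded as a table in which the `path` and `depth` columns drive the drill-down levels.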

Transport Data ETL for IC-card Passenger Analytics

Processed bus route geographic data, mapped IC-card usage, and generated statistical insights for transport planning.
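
A minimal sketch of mapping card taps onto route geography, assuming each tap carries a latitude and longitude and is assigned to the nearest stop by great-circle distance. The stop names, coordinates, and function names are invented for illustration; the real data may already have carried stop identifiers.

```python
import math
from collections import Counter


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def boardings_per_stop(stops: dict[str, tuple[float, float]],
                       taps: list[tuple[float, float]]) -> Counter:
    """Assign each tap to its nearest stop and count taps per stop."""
    counts: Counter = Counter()
    for lat, lon in taps:
        nearest = min(stops, key=lambda name: haversine_km(lat, lon, *stops[name]))
        counts[nearest] += 1
    return counts


# Hypothetical stops and tap locations.
stops = {"Station": (35.000, 135.000), "Hospital": (35.010, 135.000)}
taps = [(35.001, 135.000), (35.009, 135.000), (35.0005, 135.0002)]
counts = boardings_per_stop(stops, taps)
```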

Skills

  • AI / LLM: OCR+LLM pipelines, GPT-4o, embeddings, evaluation
  • Data Engineering: ETL, pipelines, DWH, modeling
  • Cloud: AWS, Azure, Databricks, Airflow
  • Programming: Python, Java, SQL
  • Tools: LangChain, LangGraph, Vertex AI Matching Engine, Power BI, Tableau

Contact

📧 nueki@manoriworks.com

🌐 AI-Powered Document Automation: https://manoriworks.com/en

🔗 LinkedIn