A selection of recent work across data engineering, AI-powered document automation, and cloud-based ETL development.
1. Inspection Report Automation (OCR + Structured Data Extraction)
Industry: Property / Facility Management
Documents: Building inspection reports, routine inspections, defect reports
Technologies: Azure Document Intelligence, OCR, Layout analysis, Python, Azure OpenAI
Overview
Developed an end-to-end workflow to convert inspection report PDFs into structured JSON/Excel outputs.
The workflow performs OCR, layout parsing, table extraction, field normalisation, defect identification, and summary generation.
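A minimal sketch of the layout/table-extraction step, shown here with the azure-ai-formrecognizer Python client for Azure Document Intelligence; the endpoint, key, and the assumption that the first table row is a header are illustrative placeholders rather than the production configuration.

```python
"""Sketch: inspection PDF -> layout analysis -> table rows -> normalised JSON."""
import json
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

ENDPOINT = "https://<resource>.cognitiveservices.azure.com/"  # placeholder
KEY = "<api-key>"                                             # placeholder

client = DocumentAnalysisClient(ENDPOINT, AzureKeyCredential(KEY))

with open("inspection_report.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-layout", document=f)
result = poller.result()

records = []
for table in result.tables:
    # Rebuild each detected table as {row_index: {column_index: text}}.
    rows = {}
    for cell in table.cells:
        rows.setdefault(cell.row_index, {})[cell.column_index] = cell.content.strip()
    # Simplifying assumption: the first row of each table is its header.
    header = rows.pop(0, {})
    for row in rows.values():
        records.append({header.get(i, f"col_{i}"): v for i, v in row.items()})

print(json.dumps(records, indent=2, ensure_ascii=False))
```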
Outcome / Impact
- Reduced manual review time from 20–40 minutes to 1–2 minutes per report
- Achieved 83–92% extraction accuracy after tuning against client formats
- Enabled automatic defect summaries and action lists
- Integrated 10,000+ historical reports into a searchable knowledge base
2. Contract & Policy Document Extraction / Classification
Industry: Corporate / Legal / Real Estate
Documents: Tenancy agreements, supplier contracts, internal policies, compliance documents
Technologies: OCR, Azure OpenAI embeddings, Vector Search, Python
Overview
Built a document ingestion pipeline that extracts key clauses, dates, obligations, risks, and renewal conditions from contract PDFs.
Classification and metadata tagging were added for large document archives.
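A simplified sketch of the semantic-retrieval step, assuming an Azure OpenAI embedding deployment named text-embedding-3-small; the endpoint, key, and clause texts are illustrative placeholders, and the full pipeline also handled extraction, tagging, and schema output.

```python
"""Sketch: embed extracted clauses and retrieve the most similar one for a query."""
import numpy as np
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com/",  # placeholder
    api_key="<api-key>",                                    # placeholder
    api_version="2024-02-01",
)

clauses = [
    "The agreement renews automatically for 12 months unless terminated in writing.",
    "Either party may terminate with 90 days' written notice.",
    "The supplier indemnifies the client against third-party IP claims.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

clause_vecs = embed(clauses)
query_vec = embed(["What are the renewal conditions?"])[0]

# Cosine similarity against every stored clause; the highest score wins.
scores = clause_vecs @ query_vec / (
    np.linalg.norm(clause_vecs, axis=1) * np.linalg.norm(query_vec)
)
print(clauses[int(scores.argmax())])
```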
Outcome / Impact
- Automated extraction of 20–40 key fields per contract
- Implemented standard JSON schema for downstream systems
- Achieved consistent clause identification across multiple contract formats
- Enabled instant retrieval of similar clauses using semantic search
3. Routine Property Inspection Summary Generator
Industry: Real Estate Property Management
Documents: Routine inspection reports (property condition, photos, notes)
Technologies: OCR, Python, Image–text validation, Summarisation models
Overview
Developed a system that combines text from inspectors’ notes with room-by-room photos.
The workflow detects inconsistencies, extracts condition ratings, and generates a structured summary.
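A rough illustration of the rating-normalisation and inconsistency checks; the rating scale, defect keywords, and field names below are hypothetical stand-ins for the client-specific rules.

```python
"""Sketch: normalise room condition ratings and flag note/rating mismatches."""
from dataclasses import dataclass

RATING_SCALE = {"excellent": 5, "good": 4, "fair": 3, "poor": 2, "damaged": 1}
DEFECT_WORDS = {"crack", "leak", "mould", "damage", "broken", "stain"}

@dataclass
class RoomEntry:
    room: str
    rating_text: str   # inspector's free-text rating
    notes: str         # OCR'd inspector notes
    photo_count: int

def summarise(entries):
    summary = []
    for e in entries:
        rating = RATING_SCALE.get(e.rating_text.lower().strip())
        has_defect_note = any(w in e.notes.lower() for w in DEFECT_WORDS)
        flags = []
        if has_defect_note and (rating or 0) >= 4:
            flags.append("rating/notes mismatch")   # notes mention a defect but rating is high
        if e.photo_count == 0:
            flags.append("no photos attached")
        summary.append({"room": e.room, "rating": rating, "flags": flags})
    return summary

print(summarise([RoomEntry("Kitchen", "Good", "small crack above the sink", 2)]))
```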
Outcome / Impact
- Standardised reporting quality across inspectors
- Automated creation of summary sheets for owners/landlords
- Reduced manual summarisation time by 70–90%
- Improved consistency in property condition evaluations
4. ETL Pipeline Development for a National Telecom Operator (OPTUS)
Industry: Telecommunications
Technologies: Databricks, PySpark, Airflow, SQL, Delta Lake, AWS
Overview
Participated in the development of large-scale ETL pipelines for nationwide operational data.
Responsibilities included pipeline design, job optimisation, data quality monitoring, and Airflow orchestration.
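A condensed, Databricks-style sketch of one such job; source paths, column names, and the 5% null threshold are placeholders, and Delta Lake support is assumed on the cluster.

```python
"""Sketch: PySpark job with a simple data-quality gate before a Delta write."""
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("usage_etl").getOrCreate()

raw = spark.read.json("s3://<bucket>/landing/usage/")   # placeholder path

cleaned = (
    raw.dropDuplicates(["record_id"])
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("customer_id").isNotNull())
)

# Data-quality gate: abort before writing if too many source rows lack the key.
null_ratio = raw.filter(F.col("customer_id").isNull()).count() / max(raw.count(), 1)
if null_ratio > 0.05:
    raise ValueError(f"customer_id null ratio too high: {null_ratio:.2%}")

(cleaned.write.format("delta")
        .mode("overwrite")
        .partitionBy("event_date")
        .save("s3://<bucket>/curated/usage/"))          # placeholder path
```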
Outcome / Impact
- Improved job reliability and reduced processing time
- Implemented robust data quality checks
- Handled multi-terabyte distributed processing
- Ensured production-grade deployment and documentation
5. Supplier Data Warehouse for a Trading & Manufacturing Group (Marubeni)
Industry: Trading / Manufacturing
Technologies: AWS (Lambda, DynamoDB, S3, Glue), Python, SQL
Overview
Designed and implemented an ETL pipeline migrating supplier data from DynamoDB into a structured warehouse schema.
Developed Python modules for data transformation and schema alignment.
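A stripped-down sketch of the DynamoDB extraction and flattening step; table and column names are placeholders, the parquet write assumes pandas with pyarrow installed, and the production pipeline staged the output to S3 via Glue.

```python
"""Sketch: paginate a DynamoDB table and flatten items for warehouse loading."""
import boto3
import pandas as pd

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("suppliers")          # placeholder table name

items, kwargs = [], {}
while True:
    page = table.scan(**kwargs)
    items.extend(page["Items"])
    if "LastEvaluatedKey" not in page:
        break
    kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

# Align nested items to the dimension-table columns of the warehouse schema.
df = pd.json_normalize(items)[["supplier_id", "name", "country", "updated_at"]]
df.to_parquet("dim_supplier.parquet", index=False)
```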
Outcome / Impact
- Automated daily ingestion and transformation
- Designed star schema for analytics
- Enabled downstream Power BI dashboards
- Improved data consistency across business units
6. BI Integration for ServiceNow and External SaaS
Industry: IT Service / CRM
Technologies: ServiceNow, API Integration, Azure SQL, Data Factory, Python
Overview
Built real-time integrations between ServiceNow and external SaaS platforms.
Performed data cleansing, transformation, and modelling for BI reporting.
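A minimal sketch of the ServiceNow side of the integration, paginating the standard Table API with the requests library; the instance URL, credentials, and field list are placeholders, and loading into Azure SQL happens downstream.

```python
"""Sketch: pull incident records from the ServiceNow Table API in pages."""
import requests

BASE = "https://<instance>.service-now.com/api/now/table/incident"  # placeholder
AUTH = ("<user>", "<password>")                                      # placeholder

def fetch_incidents(batch=500):
    offset, rows = 0, []
    while True:
        resp = requests.get(
            BASE,
            auth=AUTH,
            headers={"Accept": "application/json"},
            params={
                "sysparm_limit": batch,
                "sysparm_offset": offset,
                "sysparm_fields": "number,short_description,state,sys_updated_on",
            },
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json()["result"]
        rows.extend(page)
        if len(page) < batch:
            return rows
        offset += batch

incidents = fetch_incidents()   # downstream: cleanse, model, load into Azure SQL
```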
Outcome / Impact
- Reduced manual reconciliation work
- Delivered real-time operational dashboards
- Improved reliability of cross-system data
7. Large-Scale Web Scraping & Distributed Processing (DataSection)
Industry: E-commerce / Market Intelligence
Technologies: Python, Java, AWS EC2, Redis, S3, SQS, Distributed crawling
Overview
Designed and built a distributed web scraping architecture that collects data on millions of products.
Implemented proxy pools, retry logic, failover mechanisms, and automated pipelines.
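A pared-down sketch of the retry and proxy-rotation logic; proxy addresses, timeouts, and backoff parameters are illustrative only.

```python
"""Sketch: fetch with proxy rotation and exponential-backoff retries."""
import random
import time
import requests

PROXIES = ["http://proxy-a:8080", "http://proxy-b:8080"]   # placeholder pool

def fetch(url, max_retries=4):
    for attempt in range(max_retries):
        proxy = random.choice(PROXIES)
        try:
            resp = requests.get(
                url,
                proxies={"http": proxy, "https": proxy},
                timeout=15,
            )
            if resp.status_code == 200:
                return resp.text
        except requests.RequestException:
            pass  # fall through to the backoff and retry below
        # Exponential backoff with jitter before rotating to another proxy.
        time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"giving up on {url} after {max_retries} attempts")
```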
Outcome / Impact
- Scaled to millions of records per day
- Reduced scraping failure rate with smart retry logic
- Achieved stable and cost-efficient distributed crawling
8. ML-Based Document Search (Azure + LangChain)
Industry: Professional Services
Technologies: Azure OpenAI, Azure Search, Python, LangChain
Overview
Developed a hybrid search system that combines embeddings-based retrieval, keyword fallback, and metadata filtering.
Used for document discovery across multiple departments.
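A simplified illustration of one way to merge keyword and embedding result lists (reciprocal rank fusion); the deployed system ran on Azure Search with metadata filtering and keyword fallback, so this sketch shows the blending idea rather than the production code.

```python
"""Sketch: merge keyword and vector result lists with reciprocal rank fusion."""
def reciprocal_rank_fusion(result_lists, k=60):
    """Each result list is document ids in ranked order; higher fused score = better."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_12", "doc_07", "doc_31"]   # from the keyword index
vector_hits  = ["doc_07", "doc_44", "doc_12"]   # from the embedding index
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```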
Outcome / Impact
- Improved search precision compared to keyword-only systems
- Reduced time spent locating relevant documents
- Supported bilingual content (English/Japanese)
Core Technical Capabilities
- OCR / layout analysis
- Document classification and extraction
- LLM-based summarisation and validation
- Databricks / PySpark ETL
- Airflow / orchestration
- AWS / Azure cloud pipelines
- API integrations
- Data modelling and warehouse design
Contact
For detailed case studies or sample outputs, please reach out:
📧 nueki@manoriworks.com
🌐 AI-Powered Document Automation: https://manoriworks.com/en