Mid-level AI Engineer / Data Engineer

Job Location:

Philadelphia, PA - USA

Monthly Salary: Not Disclosed

Posted on: 30+ days ago

Vacancies: 1 Vacancy

Job Summary

About the Role
Recruiters must run checklists; keywords are underlined.
We are building a platform that converts unstructured financial data (emails, corporate actions, index announcements) into high-quality structured datasets used by financial institutions.
This is not a typical LLM wrapper role.
You will work on systems that:

Extract data from noisy, inconsistent sources
Validate and reconcile outputs across multiple inputs
Ensure correctness, traceability, and auditability

The challenge is not just applying LLMs; it's making them reliable in production for financial workflows.
What You'll Work On

Designing pipelines that process high-volume financial documents (batch and near real-time)
Building LLM-powered extraction workflows (classification, parsing, summarization)
Implementing validation layers (rule-based and model-based) to reduce hallucinations
Developing retrieval systems using embeddings and vector search
Architecting end-to-end systems: ingestion, processing, storage, serving
Ensuring data quality, observability, and fault tolerance
Collaborating with product to turn messy data into usable financial intelligence

Core Requirements

Strong Python and backend/data engineering experience
Experience building production data pipelines (ETL, streaming, or async systems)
Solid understanding of distributed systems and failure modes
Experience working with LLM-based systems in production:
- Prompt design
- Output validation
- Retry/fallback strategies
- Evaluation and monitoring
Experience with data storage systems (SQL, NoSQL)
Familiarity with cloud infrastructure (AWS or similar)

Preferred Experience

Experience with RAG / vector search systems
Background in financial data or capital markets
Experience with streaming systems (Kafka, etc.)
Experience building multi-step or agent-style workflows

What Makes This Role Interesting

Work on high-accuracy AI systems where correctness matters
Solve real problems around:
- LLM reliability and hallucination mitigation
- Data consistency across conflicting sources
- Real-time vs correctness tradeoffs
Build systems used in financial decision-making workflows
High ownership over core architecture in an early-stage environment

Nice to Know (but not required)

Experience with orchestration tools (Airflow, etc.)
Exposure to evaluation frameworks for LLMs
Experience working with large-scale document processing

Tech Stack (Representative, not exhaustive)

Python, APIs, async processing
LLM APIs, embeddings
SQL / NoSQL databases
Cloud infrastructure (AWS)
Data pipelines and streaming systems
Vector Databases

* Candidates should have 6-8 years of software development/engineering, including AI and data engineering experience
* Candidates should have worked in investment management or investment banking, processing financial market data, with experience in data pipelines, RAG, and vector databases
* Candidates should be fluent in Python, API development, and streaming systems such as Kafka or similar
* Preference for people who have worked at BlackRock, Fidelity Investments, Vanguard, State Street Global Advisors, E*TRADE, Charles Schwab, etc.

At Vanguard Group, an investment management company that deals with mutual funds, index funds, ETFs, etc. Candidates must come from this business domain, or they won't understand what to do.
