13.10.2025

AI Dataset Architect & Workflow Consultant

AI dataset architect / annotation workflow consultant for a cyber-intelligence project

We’re seeking an experienced AI dataset architect / annotation workflow consultant to help design and prototype a scalable data and labeling pipeline for a cyber-intelligence dataset project.

Start

20.10.2025

Duration

2 months

Location

Remote

Allocation

50%

Scope:
  • Define schemas and data formats for supervised / reward / rationale datasets.
  • Build ingestion and normalization scripts for provided raw data.
  • Set up a lightweight labeling or enrichment interface (e.g. Label Studio, Streamlit).
  • Deliver documentation and simple QA tools for deduplication, sampling, and validation.

You’ll have:
  • Access to domain experts and developer support.
  • Clean data extracts (no data collection required).
  • Flexible, outcome-based work (remote within the EU).

Ideal background:
  • Proven experience with LLM dataset design (SFT, RLHF, or analytical corpora).
  • Strong Python (pandas / pyarrow) and Hugging Face datasets skills.
  • Familiarity with labeling tools and dataset documentation best practices.

Deliverables: Schema pack, ingestion pipeline, labeling prototype, docs, and QA toolkit.

Contact the agent:

Timo Heikkinen

Partner

+358 40 5894400

timo.heikkinen@rootsof.ai