AI Dataset Architect & Workflow Consultant

AI dataset architect / annotation workflow consultant for a cyber-intelligence project

We’re seeking an experienced AI dataset architect / annotation workflow consultant to help design and prototype a scalable data and labeling pipeline for a cyber-intelligence dataset project.

Start: 20.10.2025
Duration: 2 months
Location: Remote
Allocation: 50%

Scope:

Define schemas and data formats for supervised / reward / rationale datasets.
Build ingestion and normalization scripts for provided raw data.
Set up a lightweight labeling or enrichment interface (e.g. Label Studio, Streamlit).
Deliver documentation and simple QA tools for deduplication, sampling, and validation.
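
To give a feel for the kind of schema and QA tooling the scope describes, here is a minimal sketch in plain Python. It is an illustration only, not the project's actual design: the `Record` fields, the label set, and all function names are hypothetical, and the deduplication shown is exact matching on normalized text rather than any near-duplicate method.

```python
import hashlib
import random
import unicodedata
from dataclasses import dataclass

# Hypothetical label set; the real taxonomy would come from the domain experts.
ALLOWED_LABELS = {"benign", "suspicious", "malicious"}


@dataclass(frozen=True)
class Record:
    """One supervised example. All field names are illustrative assumptions."""
    record_id: str
    text: str
    label: str
    rationale: str = ""


def normalize_text(text: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def validate(record: Record) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not record.record_id:
        problems.append("missing record_id")
    if not normalize_text(record.text):
        problems.append("empty text")
    if record.label not in ALLOWED_LABELS:
        problems.append(f"unknown label: {record.label!r}")
    return problems


def content_key(record: Record) -> str:
    """Hash of the normalized, case-folded text, for exact-duplicate detection."""
    canonical = normalize_text(record.text).casefold()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(records: list[Record]) -> list[Record]:
    """Keep the first record for each content key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = content_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def sample_for_review(records: list[Record], k: int, seed: int = 0) -> list[Record]:
    """Draw a reproducible random sample for manual QA."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))
```

In a real pipeline the same checks would more likely run over pandas or pyarrow tables, and the fixed seed in `sample_for_review` is there so that a QA sample can be reproduced later.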

You’ll have:

Access to domain experts and developer support.
Clean data extracts (no data collection required).
Flexible, outcome-based work (remote within EU).

Ideal background:

Proven experience with LLM dataset design (SFT, RLHF, or analytical corpora).
Strong Python (pandas / pyarrow) and Hugging Face datasets skills.
Familiarity with labeling tools and dataset documentation best practices.

Deliverables: Schema pack, ingestion pipeline, labeling prototype, docs, and QA toolkit.

Read more about the project details when logged in, or contact the agent.

Timo Heikkinen

Partner

+358 40 5894400

timo.heikkinen@rootsof.ai

Töölönlahdenkatu 3B, 00100 Helsinki, info@rootsof.ai