How long should an AI engineer take-home actually take?

Most take-homes are scoped for a four-to-eight-hour window, and you should treat that window as a hard budget rather than a suggestion. The goal is not to build the most ambitious system you can imagine; it is to ship one complete, working slice that proves you can scope, build, evaluate and document. If the brief gives no time limit, set your own at around a weekend of focused work, then write up what you would do with more time. Reviewers respect a finished narrow project far more than an unfinished sprawling one.

Should I deploy my take-home project to a live URL?

If you can, yes — a live demo is one of the highest-signal things in the whole submission, because recruiters engage far more with runnable code and a clickable demo than with a static repository. A reviewer who can use your project in thirty seconds will give it far more credit than one who has to clone and configure it. If a full deployment is not realistic in the time budget, a thirty-second screen recording plus a one-command Docker run is a strong substitute. Never make the reviewer fight your environment.

What goes in the README for a take-home submission?

Structure it as Problem, Demo link, Stack, Architecture, Approach and key decisions, Evaluation, Results, and How to run, and keep it readable in two minutes. The two sections that separate strong submissions are Approach and key decisions, where you explain what you chose and what you deliberately left out, and Evaluation, where you report a real number from a small golden set. A README that tells the story of your reasoning, not just your dependency list, is what converts the work into an interview.

Should I build a RAG app or a tool-using agent for a take-home?

For a tight four-to-eight-hour window, a retrieval-augmented generation question-answering app is usually the safer choice, because retrieval is the baseline AI engineering skill in 2026 and a RAG slice is easy to scope and to evaluate against known answers. A tool-using agent shows more orchestration range but is far easier to over-scope, and a half-working agent reads worse than a complete RAG app. Pick the agent only if the brief asks for it or if you can genuinely bound it to two or three tools.

How do I stand out on a take-home without prior AI job experience?

Show your work. A clean commit history, a small eval harness with a real score, an honest note on failure modes, and an explanation of when and why you used AI in the project will set you apart from candidates with more experience but a thinner submission. Recruiters increasingly reward a project that spans the whole workflow — from raw input to a deployed, evaluated system — over scattered tutorial clones. You do not need a job title to demonstrate production instincts; you need one finished, well-evidenced slice.

The AI Engineer Take-Home: How to Ship a Project That Gets You Hired

The take-home is the new interview

As of June 2026, if you are interviewing for an AI engineering role in Bengaluru or London, the moment that decides the outcome is rarely a live coding round and almost never a quiz on transformer internals. It is the take-home: a scoped brief — build a retrieval system, an evaluation harness, a small agent — that you take away, build over an evening or a weekend, and submit as a repository with a README. Technical interviews have become increasingly project-based, and for good reason. A take-home reveals the things a whiteboard cannot: how you scope ambiguity, how you make trade-offs under a time budget, how you evaluate your own work, and whether you can write down your reasoning so someone else can follow it.

This piece is the sibling to our guide on the AI engineer portfolio that gets you hired. That article is about what projects to build over months to prove you can ship. This one is narrower and more urgent: it is about the single assignment in front of you right now — the take-home you have been sent, or the spec interview you are about to face — and how to turn it from a stressful weekend into the thing that earns the offer. The two work together: your portfolio is the long game; the take-home is the close.

The reason the take-home rewards preparation is that most candidates approach it backwards. They treat it as a test of how much they can build, and so they over-scope, run out of time, and submit something half-finished with a one-line README. Reviewers see this pattern constantly. The candidates who stand out do the opposite: they scope ruthlessly, finish one complete slice, evaluate it with a real number, and document their decisions. As of June 2026, showing your work — even unfinished work, openly reasoned about — is one of the clearest ways to set yourself apart. The rest of this guide is the playbook for doing exactly that, from what a reviewer looks for in the first ninety seconds to how you turn the finished assignment into a profile the people hiring can actually find.

What reviewers actually look for in under 90 seconds

A take-home reviewer is usually a working engineer with a queue of submissions and limited time. As of June 2026, the evidence on how reviewers behave is consistent and a little brutal: recruiters spend under ten seconds on a résumé, scan an AI portfolio in roughly ninety seconds, and engage around eighty per cent more with a project that has runnable code or a live demo than with a static repository. Your submission is read fast and skimmed for a small number of high-signal artefacts. If they are present and obvious, you progress; if the reviewer has to dig for them, you often do not.

Three things dominate that ninety-second scan: a deployed or runnable demo, a quantified result, and a clear architecture sketch of how the pieces fit. The table below sets the weak version most submissions ship against the strong version that survives the scan, and adds a fourth signal that rewards a slightly closer read: a record of your decisions and trade-offs.

What the reviewer scans for	Weak version	Strong version
Deployed / runnable demo	"Clone the repo, set six env vars, then run it." The reviewer never does.	A live URL at the top of the README, or a one-command Docker run plus a 30-second screen recording.
Quantified impact	"Works well in my testing." No numbers, no eval set.	"Correctness 0.86 on a 50-question golden set; p95 latency 1.9s." A measured result the reviewer can trust.
Architecture diagram	No diagram; the reviewer must read every file to understand the flow.	One simple sketch — boxes and arrows — showing input, retrieval, model, output, and where evaluation hooks in.
Decisions and trade-offs	Silent — no record of what was chosen or skipped.	A short "key decisions" section: what you used, what you deliberately left out, and why.

The underlying lesson is that the take-home is graded on legibility as much as on code. A reviewer who can see, within ninety seconds, that your project runs, that it has been measured, and that it was built with deliberate trade-offs will read your submission generously. One who cannot find those signals assumes the worst, because the queue is long and the safe default is to move on. Everything in the sections that follow (scope, README, evidence of work, deployment) is in service of making those signals impossible to miss. For a fuller picture of what the people on the other side of the table weigh, our reporting on what hiring managers look for in AI engineers in 2026 is a useful companion.

Choosing the right project scope

Scope is where take-homes are won or lost, and it happens before you write a line of code. The single most common failure is over-scoping: trying to build an ambitious system in a four-to-eight-hour window, running out of time, and submitting something that works in one path and crashes in three others. A reviewer would far rather see a narrow project that is complete, evaluated and deployed than a broad one that is half-built. As of June 2026, the briefs you are most likely to receive cluster into three archetypes. The table below sets out what each one proves and the specific trap to avoid.

Take-home archetype	What it proves	The over-scoping trap to avoid
RAG question-answering app — answers questions from a small document set (PDFs, policies, a wiki)	Retrieval is the baseline AI skill in 2026; you can chunk, retrieve, ground an answer in sources, and refuse when the documents do not cover the question	Don't ingest a giant corpus or build a fancy re-ranker. Use a handful of documents, a simple vector store, and spend the saved time on evaluation and grounding.
Evaluation harness — scores a model or pipeline against a golden set with a clear metric	You think about quality the way production teams do: you can define a metric, build a reproducible test set, and report an honest number	Don't try to evaluate every dimension at once. Pick one metric (correctness or faithfulness), 30–50 examples, and make it reproducible with a single command.
Tool-using agent — completes a task by chaining two or three tool calls	Orchestration instinct: planning, tool selection, recovering from a failed step, and stopping within a budget	Don't give the agent ten tools and open-ended goals. Bound it to two or three tools and a narrow task, or it will sprawl and never reach a finished state.
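The refusal behaviour named in the RAG row can be sketched without any external service. What follows is a minimal sketch, assuming a toy bag-of-words similarity stands in for real embeddings; the names `retrieve` and `answer_or_refuse`, the refusal message, and the `0.2` threshold are all hypothetical choices for illustration, not part of any library.

```python
import math
import re
from collections import Counter

REFUSAL = "I can't answer that from the provided documents."

def tokenize(text: str) -> Counter:
    # Lowercased word counts; a stand-in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, chunks: list[str]) -> tuple[str, float]:
    # Return the best-matching chunk together with its similarity score.
    q = tokenize(question)
    scored = [(chunk, cosine(q, tokenize(chunk))) for chunk in chunks]
    return max(scored, key=lambda pair: pair[1])

def answer_or_refuse(question: str, chunks: list[str], min_score: float = 0.2) -> str:
    # Refuse when there is nothing to search or nothing is similar enough.
    if not chunks:
        return REFUSAL
    chunk, score = retrieve(question, chunks)
    if score < min_score:
        return REFUSAL
    # A real pipeline would pass the chunk to the model as context;
    # this sketch simply returns it as the cited source.
    return f"Source: {chunk}"
```

The threshold is the part worth tuning against your golden set: include a few questions the documents do not cover, and count a refusal on those as a pass.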

If the brief lets you choose, the RAG question-answering app is usually the safest pick for a tight window, because retrieval is the most expected skill in 2026 and a RAG slice is the easiest of the three to scope and to evaluate against known answers. The evaluation harness is the quiet over-performer: it is unglamorous, but it directly demonstrates the production thinking that reviewers most want and that candidates most often lack. The tool-using agent shows the widest range but carries the highest over-scoping risk, so choose it only if the brief asks for it or if you can genuinely bound it. For a sense of the engineering discipline a production agent demands before you commit to one, our walkthrough on how to build a production AI agent is worth a read.
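To make "genuinely bound it" concrete, here is a minimal sketch of an agent loop with a fixed tool allow-list and a hard step budget. It assumes the planner is any callable that returns the next tool call or `None` when the task is finished; in a real submission that would be a model call. Here it is a scripted stand-in, and `search_docs`, `summarise`, `run_agent` and `scripted_planner` are hypothetical names.

```python
from typing import Callable, Optional

Step = tuple[str, str]                  # (tool name, argument)
History = list[tuple[str, str, str]]    # (tool name, argument, result)

# Two hypothetical tools: the whole of the agent's bounded tool set.
def search_docs(query: str) -> str:
    return f"passage about {query}"

def summarise(text: str) -> str:
    return text[:40]

TOOLS: dict[str, Callable[[str], str]] = {
    "search_docs": search_docs,
    "summarise": summarise,
}

def run_agent(
    task: str,
    planner: Callable[[str, History], Optional[Step]],
    tools: dict[str, Callable[[str], str]] = TOOLS,
    max_steps: int = 4,
) -> dict:
    history: History = []
    for _ in range(max_steps):
        step = planner(task, history)
        if step is None:                # planner signals the task is finished
            return {"status": "done", "history": history}
        name, arg = step
        if name not in tools:           # allow-list: nothing outside the tool set runs
            history.append((name, arg, "error: unknown tool"))
            continue
        try:
            result = tools[name](arg)
        except Exception as exc:        # record the failure so the planner can recover
            result = f"error: {exc}"
        history.append((name, arg, result))
    # The budget ran out before the planner declared the task finished.
    return {"status": "budget_exhausted", "history": history}

def scripted_planner(task: str, history: History) -> Optional[Step]:
    # Stand-in for a model: search first, summarise the result, then stop.
    if not history:
        return ("search_docs", task)
    if len(history) == 1:
        return ("summarise", history[0][2])
    return None
```

The two exit paths are the point of the design: the loop either ends because the planner says it is done, or it ends because the budget is spent, and the returned status tells the reviewer which one happened.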

Pro tip

Spend the first thirty minutes of your time budget writing a one-paragraph scope statement before you open an editor: what the system will do, what it will explicitly not do, and what your single evaluation metric will be. Paste that paragraph into the top of your README. It both keeps you honest while you build and shows the reviewer that you scoped deliberately rather than ran out of time — turning a constraint into a signal of judgement.

The README that wins: Problem → Approach → Impact

The README is the interface to your work, and on a take-home it carries more weight than on any portfolio piece, because it is read under time pressure by someone deciding whether to advance you. The structure that wins is the same storytelling arc that strong portfolios use — Problem → Approach → Impact — expanded into the sections a reviewer scans for. Lead with the problem and the demo link, make your key decisions explicit, and report a real number. A reviewer should be able to read it in two minutes and come away knowing what you built, that it works, and why you built it the way you did.

The annotated template below is the one to clone. The two sections that separate strong submissions from average ones are Approach & key decisions — where you show judgement by naming what you deliberately left out — and Evaluation, where you prove quality with a number rather than asserting it. (README prose stays in US English, as is conventional for code artefacts.)

# Project name — one-line description of what it does

## Problem
The real problem this solves and who it is for. One short paragraph.
Then your scope statement: what it does NOT do, and why (the time budget).

## Demo
Live URL: https://your-demo.example.com        # highest-signal line in the file
30-second screen recording: link               # fallback if no live URL

## Stack
Model(s), retrieval / vector store, framework, API layer, deployment target.
Keep it to one block — the reviewer wants the shape, not a dependency dump.

## Architecture
A simple sketch: input -> retrieve -> model -> answer, and where eval hooks in.
ASCII boxes are fine; the point is that the flow is legible at a glance.

## Approach & key decisions
What you chose and — crucially — what you deliberately skipped, and why.
e.g. "Used a flat vector store, not a re-ranker, to spend the time on evals."
This section is where you demonstrate engineering judgement.

## Evaluation
How quality is measured: the golden set, the metric, and how to reproduce it.
    python eval.py            # runs the golden set, prints the score

## Results
- Correctness on golden set: 0.86 (43/50)
- p95 latency: 1.9s
- Known failure mode: weak on multi-hop questions across two documents

## How to run
    docker build -t app .
    docker run -p 8000:8000 --env-file .env app
    # then open http://localhost:8000

Notice what the template forces you to do. It pushes the demo link to the top, where the ninety-second scan lands. It demands an explicit account of your trade-offs, which is exactly the judgement signal that distinguishes a builder from someone who follows a tutorial. And it requires a number, an honest failure mode, and a one-command run path. A take-home submitted with this README reads as the work of someone who has shipped before — even if it is, in fact, your first. That gap between perceived and actual seniority is precisely what a strong README buys you, and it is the cheapest edge available on the whole assignment.
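A quick pre-submission check can confirm that your README still carries every section of the template. This is a minimal sketch, assuming the headings use the `## ` style shown in the template; `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names, and the check only looks at heading text, not at what is written beneath each one.

```python
import re

# Section headings taken from the README template above.
REQUIRED_SECTIONS = [
    "Problem",
    "Demo",
    "Stack",
    "Architecture",
    "Approach & key decisions",
    "Evaluation",
    "Results",
    "How to run",
]

def missing_sections(readme_text: str) -> list[str]:
    # Collect every level-two heading, compared case-insensitively.
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^##\s+(.+)$", readme_text, flags=re.MULTILINE)
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]
```

Reading the file with `open("README.md", encoding="utf-8").read()` and printing the returned list is enough; an empty list means every heading is present.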

Showing your work: commits, evals, a live demo

The phrase to internalise is show your work. As of June 2026, showing your reasoning — even where the result is unfinished — is one of the strongest ways to stand out, because it lets a reviewer see the engineer behind the artefact. Three forms of evidence do most of the heavy lifting: a clean commit history that tells the story of how you built it, a small evaluation harness that proves you measure quality, and a demo the reviewer can actually use. Run your submission against this checklist before you send it.

Commit history that reads as a narrative. Small, logically ordered commits with clear messages ("add retrieval", "add eval harness", "handle empty query") show how you decomposed the problem. A single "initial commit" dump hides all of that and reads as code pasted at the end.
A golden set and a reproducible eval. Thirty to fifty representative inputs with expected outputs, plus a single command that runs them and prints a score. The number matters less than the fact that the harness exists at all.
An honest failure-mode note. State plainly where the system is weak. Reviewers trust a candidate who names a limitation far more than one who claims none exist.
A demo the reviewer can use in 30 seconds. A live URL is best; a one-command Docker run plus a short screen recording is a strong fallback. Never make the reviewer fight your environment.
A note on when and why you used AI. Explaining where you applied a model, and where you deliberately did not, is now a crucial signal in its own right — it shows you reach for AI as a tool with judgement, not by reflex.

The evaluation harness is worth building even when the brief does not explicitly ask for one, because it is the single highest-signal artefact you can add for the least effort. It need not be sophisticated. The minimal version below loads a small golden set, runs your system over it, and prints a score — exactly the proof a reviewer is scanning for. (Code in US English.)

import json
from my_system import answer  # your pipeline: question -> answer

def load_golden(path="golden.json"):
    # golden.json: [{"question": "...", "expected": "..."}, ...]
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def is_correct(got: str, expected: str) -> bool:
    # lenient substring match; swap in exact match, embedding similarity or an LLM judge
    return expected.strip().lower() in got.strip().lower()

def main():
    golden = load_golden()
    if not golden:  # avoid dividing by zero on an empty golden set
        raise SystemExit("golden.json is empty: nothing to score")
    passed = 0
    for i, case in enumerate(golden, 1):
        got = answer(case["question"])
        ok = is_correct(got, case["expected"])
        passed += int(ok)
        print(f"[{i:>2}] {'PASS' if ok else 'FAIL'}  {case['question'][:48]}")
    score = passed / len(golden)
    print(f"\nScore: {passed}/{len(golden)} = {score:.2f}")

if __name__ == "__main__":
    main()

That is fewer than thirty lines, yet it transforms your submission from a claim into a proof. It prints a per-case pass or fail so a reviewer can see what works, and a final score they can quote in their notes. Commit it as its own step, reference it in the README's Evaluation section, and you have shown the production instinct that most candidates only assert. For the question patterns that the same reviewers tend to probe in the follow-up conversation, our breakdown of the AI engineer interview and its five question clusters maps closely onto what a take-home is testing.
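The Results section of the README template also reports a p95 latency, which the harness above does not measure. The following is a minimal sketch of one way to get that number, assuming a nearest-rank percentile is acceptable for a sample this small; `percentile` and `time_calls` are hypothetical helper names.

```python
import math
import time
from typing import Callable

def percentile(values: list[float], pct: float) -> float:
    # Nearest-rank percentile: the smallest sample such that at least
    # pct per cent of all samples are less than or equal to it.
    if not values:
        raise ValueError("no samples to summarise")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def time_calls(fn: Callable[[str], str], questions: list[str]) -> list[float]:
    # Wall-clock seconds for each call, measured with a monotonic timer.
    latencies = []
    for question in questions:
        start = time.perf_counter()
        fn(question)
        latencies.append(time.perf_counter() - start)
    return latencies
```

Wired into the harness, this would look like `latencies = time_calls(answer, [case["question"] for case in golden])` followed by `print(f"p95 latency: {percentile(latencies, 95):.2f}s")`. With only thirty to fifty samples the p95 rests on the slowest two or three calls, so it is worth stating the sample size next to the figure.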

Watch out

The most expensive take-home mistake is the silent one: submitting a repository you have never run from a clean checkout. It works on your machine because of an env var you set weeks ago or a file that never got committed. The reviewer clones it, it fails on the first command, and your submission is dead before they read a word of your code. Before you send anything, clone your own repo into a fresh directory, follow your own README exactly as a stranger would, and confirm it runs. If the demo is not live, attach the screen recording. Make running your work effortless, because a project the reviewer cannot start is a project they cannot reward.
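The clean-checkout test can itself be scripted so that it is cheap to repeat before every send. This is a minimal sketch, assuming `git` is installed and that the commands from your README can be run non-interactively; `minimal_env`, `run_steps` and `check_clean_checkout` are hypothetical names, and keeping only `PATH` and `HOME` is an assumption about what a stranger's shell would provide.

```python
import os
import shlex
import subprocess
import tempfile

def minimal_env() -> dict[str, str]:
    # Keep only the basics, so API keys and other variables set on
    # your own machine cannot mask a missing setup step.
    return {key: os.environ[key] for key in ("PATH", "HOME") if key in os.environ}

def run_steps(commands: list[str], cwd: str, env: dict[str, str]) -> bool:
    # Run each README command in order and stop at the first failure.
    for command in commands:
        print(f"$ {command}")
        completed = subprocess.run(shlex.split(command), cwd=cwd, env=env)
        if completed.returncode != 0:
            print(f"FAILED: {command}")
            return False
    return True

def check_clean_checkout(repo_url: str, commands: list[str]) -> bool:
    # Clone into a throwaway directory, then follow the README from there.
    with tempfile.TemporaryDirectory() as workdir:
        clone = subprocess.run(["git", "clone", repo_url, workdir])
        if clone.returncode != 0:
            return False
        return run_steps(commands, cwd=workdir, env=minimal_env())

# Example call (the URL is a placeholder, not a real repository):
# check_clean_checkout(
#     "https://example.com/you/take-home.git",
#     ["docker build -t app .", "python eval.py"],
# )
```

If a step fails only under the stripped environment, that is usually the forgotten variable or uncommitted file the paragraph above warns about, and it belongs in the README's run instructions.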

Indian and UK hiring context: what differs

The take-home is near-universal across the AI engineering market, but the context around it differs between India and the UK in ways worth knowing. In India — Bengaluru, Hyderabad, Pune, the NCR — take-homes are common at both fast-moving startups and the global capability centres of multinationals, and they often sit alongside a structured live round. Bengaluru startups in particular lean on the take-home as a fast filter, and they tend to reward candidates who ship a complete, deployed slice over those who optimise for theoretical depth. In the UK — London scale-ups, Cambridge research-adjacent firms, the emerging regional hubs — take-homes are equally standard, with a slightly stronger cultural expectation that the assignment be bounded and that employers respect your time, which makes ruthless scoping read as professionalism rather than as cutting corners.

The compensation backdrop differs too, and it shapes how much leverage a strong submission gives you. Pay bands, seniority expectations and the premium on demonstrable production experience vary markedly between the two markets and across cities within each. Rather than quote figures that age quickly, we keep the detail in one place: our guide to AI engineer pay in 2026 lays out the benchmarks for both India and the UK. The practical point for the take-home is that a submission which clearly demonstrates you can take a system from raw input to a deployed, evaluated result is exactly the evidence that moves you up a band — in either market, the gap between a learner and a builder is what the offer is priced on.

One shared truth across both markets: the take-home is increasingly the great equaliser. A candidate without a brand-name employer on their CV can outshine one who has it, simply by submitting a tighter, better-evidenced project. If you are making the move into the field, our six-month roadmap from software engineer to AI engineer sets out how to build the underlying skills the take-home then lets you prove.

Turn the take-home into a profile

Here is the part most candidates miss. The take-home you just built does not have to be a single-use artefact you send to one employer and forget. It is a finished, deployed, evaluated project — precisely the kind of proof-of-work that, gathered onto a profile, gets you found by the next employer without you applying at all. The work is done; the only question is whether you let it keep paying off.

A Verified Builder profile on AI Tech Connect is the single link that gathers your take-home, your portfolio and your story in the place where the people hiring across India and the UK already look. And there is a reason to claim it now rather than later: AI Tech Connect awards a permanent Founding Builder badge to the earliest verified profiles, and those founding spots are limited by design. The badge is a scarce, credible marker that you were here first — exactly the kind of signal that stands out to a reviewer scrolling a list. Once the founding cohort is full, it is full. If your take-home is ready, claiming a profile now is the difference between a Founding badge and a standard one. It takes about two minutes, costs nothing, and asks for no CV and no password.
