From Research Training to AI-Native Work
How I use AI in real data and modeling work at OpenAI
Jackie (Jiā Qí) Yin, UCSF, 03/31/2026
Research training taught me how to reason under uncertainty.
AI changed the speed and structure of that reasoning, but not the need for judgment.
Keep the opening centered on the throughline, not on biography. The audience should feel from the
start that this talk is relevant to how they already think and work.
Why This Talk / Who I Am
01
A path shaped by uncertainty, systems, and judgment
The story is less about titles and more about the kinds of problems I learned to solve.
2020
UW Biostatistics
PhD training in inference, uncertainty, and what makes a method trustworthy.
2020-2025
Microsoft
Applied science in forecasting and recommendation systems at product scale.
2025-2026
Copilot Notebooks
Context engineering, AI agent development, and LLM-as-judge evaluation.
Jan 2026
OpenAI
ML Scientist in Strategic Finance: B2B revenue forecasting and decision support.
Research training -> product systems -> AI-native workflows -> finance-facing modeling
Keep this directional rather than chronological. The important thread is that you came from a world
that cared about inference, then moved into systems and product work, and now work on decision
support under ambiguity.
What My Work Looks Like Today
02
The job is not just model building
A large part of the work is turning ambiguous business questions into quantitative ones.
Operating context
High-context decisions, evolving assumptions, established planning workflows
- Revenue forecasting for a fast-growing B2B business
- Understanding pipeline risk, timing, and uncertainty
- Supporting finance and planning decisions
My role
Translate intuition into structure
- Bridge domain intuition and mathematical structure
- Turn existing planning logic into reusable modeling workflows
- Support decisions, not just produce numbers
- Make the work more explainable, scalable, and reusable
Finance partners often have strong intuition about both downside risk and revenue potential. My job is to preserve that signal while turning it into a reusable modeling framework.
Emphasize that the core challenge is not replacing existing workflows. It is taking strong domain
intuition, scenario-based planning, and caveats, then turning them into a structure that can support
real planning decisions at scale.
Research Thinking vs. Industry Data and Modeling Work
03
The reasoning carries over. The objective function changes.
Reason under incomplete information
Separate signal from noise
Break big problems into parts
Iterate instead of expecting one-shot answers
Research
Go deeper on one problem
- Novelty and methodological contribution matter
- Technical elegance is often rewarded
- A result can stay specialized and still be valuable
- The audience is often willing to learn the method deeply
Industry data and modeling work
Solve the decision problem under constraints
- Speed, alignment, and implementation matter
- Methods need to be understandable enough to use
- Complexity has a maintenance and adoption cost
- Simple and useful can beat sophisticated and fragile
This is the bridge slide. It tells the audience their research habits still matter, but the output
of the work is different. You can also mention that AI makes it faster to adapt useful academic ideas.
Where AI Fits in My Workflow
04
Where AI Fits
Different task types need different levels of supervision, reuse, and judgment.
Ad hoc analysis
One-off questions
AI best at
Fast querying, plots, and first-pass code.
My role
Evaluate output and catch bad assumptions.
Operational workflow
Repeatable work
AI best at
Turning one-off workflows into reusable pipelines.
My role
Define structure and monitoring.
Novel modeling work
New modeling questions
AI best at
Exploring formulations and coding baselines fast.
My role
Choose the framing and decide what matters.
Cross-functional communication
Stakeholder translation
AI best at
Reframing technical findings for finance or business.
My role
Preserve nuance and avoid overclaiming.
This slide should feel practical. The important point is that “using AI” is not one thing. The
human supervisory structure changes depending on the kind of work.
Why I Treat AI Like Junior Scientists
05
AI as Junior Scientists
Can we predict whether a patient will respond to treatment?
Pre-AI: serial exploration
One person, one path at a time
Typical first pass: about 1 week
1. Code + run logistic regression
↓
2. Inspect, revise, then try trees
↓
3. If needed, test a neural baseline
AI-native: parallel first pass
Parallel method search
Typical first pass: about 30 minutes
Agent 1
Logistic regression track
Agent 2
Tree-based track
Agent 3
Neural baseline track
~1 week serial -> ~30 min parallel first pass
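The parallel first pass can be sketched in plain Python: three baseline families trained concurrently on the same split, with the human comparing results at the end. The toy dataset, model settings, and AUC metric are illustrative stand-ins, not the real pipeline.

```python
# Sketch of an AI-native "parallel first pass": three baseline tracks
# evaluated concurrently on one shared split. Toy data stands in for the
# real patient-response dataset; AUC is an assumed primary metric.
from concurrent.futures import ThreadPoolExecutor

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=0)

tracks = {
    "logistic": LogisticRegression(max_iter=1000),
    "trees": GradientBoostingClassifier(random_state=0),
    "neural": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
}

def run_track(item):
    name, model = item
    model.fit(X_tr, y_tr)
    return name, roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])

# Each "agent" explores its track independently; the human compares at the end.
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(run_track, tracks.items()))

for name, auc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: AUC = {auc:.3f}")
```

The point of the sketch is the shape, not the models: exploration fans out in parallel, and the judgment call (which track to pursue) stays with the human.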
The junior scientist metaphor is memorable because it is about management and judgment, not only capability.
AI is useful because it parallelizes early exploration while you keep the final judgment.
What I Need to Trust an AI Result
06
I do not treat confident language as correctness.
1. Data sanity
Check summary stats, historical patterns, and impossible values.
2. Model agreement
Reasonable models can differ. Big disagreement needs explanation.
3. Metric definition
Ask how the metric is defined, coded, and why it fits the goal.
4. Independent restatement
Use a fresh agent to restate the core logic in words or math.
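Checks 1 and 2 can be made concrete with a small sketch: a data sanity pass that flags impossible values, and a model-agreement pass that flags rows where two reasonable models disagree strongly. The column names, bounds, and disagreement threshold are invented for illustration.

```python
import numpy as np
import pandas as pd

def sanity_check(df: pd.DataFrame, bounds: dict) -> pd.DataFrame:
    """Return rows that violate hard domain bounds (impossible values)."""
    bad = pd.Series(False, index=df.index)
    for col, (lo, hi) in bounds.items():
        bad |= (df[col] < lo) | (df[col] > hi)
    return df[bad]

def disagreement(p_a: np.ndarray, p_b: np.ndarray, tol: float = 0.3) -> np.ndarray:
    """Indices where two models' predicted probabilities differ by more than tol."""
    return np.flatnonzero(np.abs(p_a - p_b) > tol)

# Illustrative data: an impossible age and an out-of-range dose.
df = pd.DataFrame({"age": [34, 51, -2], "dose_mg": [10.0, 250.0, 40.0]})
print(sanity_check(df, {"age": (0, 120), "dose_mg": (0, 200)}))

p_logistic = np.array([0.10, 0.55, 0.90])
p_trees = np.array([0.15, 0.20, 0.88])
print(disagreement(p_logistic, p_trees))  # row 1: disagreement needs explanation
```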
This slide is about trust standards rather than bug hunting. The point is that a research-trained
workflow does not trust fluent output by default; it earns trust by checking the highest-pressure points.
A Research Workflow Example
07
From Research Question to AI Execution Notes
Can we predict whether a patient will respond to treatment?
Step 1
Formalize the question
- What counts as response?
- When is prediction made?
- What inputs exist at that time?
Step 2
Write the quant setup
- Binary outcome
- Baseline features
- Train/validation split
- Primary metric
Step 3
Make execution notes
- Question and target definition
- Features and split
- Baselines to run
- Failure modes to watch
Step 4
Why this helps
- Think first, delegate second
- AI gets an execution spec
- Less ambiguity during implementation
Do the problem formulation yourself first. Then give AI an execution spec, not a vague idea.
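An execution-notes file for this running example might look like the sketch below. Every specific in it (the 12-week horizon, the exclusion rule, the 25% holdout) is an invented placeholder, not a real study design.

```markdown
# Execution notes: treatment-response prediction (illustrative sketch)

## Question and target
Predict whether a patient responds to treatment.
Target is binary: 1 if documented response by week 12, else 0.
Exclude patients lost to follow-up before week 12.

## Features and split
Baseline-only variables available at treatment start (no post-treatment inputs).
Train/validation split stratified by outcome; hold out 25% for validation.

## Baselines to run
1. Regularized logistic regression
2. Tree-based model
3. Neural baseline

## Failure modes to watch
- Target leakage from post-baseline features
- Class imbalance distorting the primary metric
- Metric not matching the decision the model supports
```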
This slide is about how you would actually start the project. The key move is to think through the
quantitative setup yourself, then write it down in a markdown file that AI can execute against.
A Research Workflow Example
08
After AI Runs: Review, Validate, and Explore
Once AI executes, the work becomes comparison, validation, and judgment.
Run set
What I ask AI to run
- Regularized logistic regression
- Tree-based model
- Neural baseline
Check 1
What I check first
- Target defined correctly
- Split is clean
- No leakage
- Metric matches the goal
Check 2
How I compare results
- Are differences reasonable?
- Stable across slices?
- Signals scientifically plausible?
- Claims stronger than evidence?
Explore
How I explore the method space
- If simple models are close, keep them
- If tree-based wins, inspect interactions
- If nothing is stable, revisit the setup
The bottleneck is not running more models. It is knowing when to trust the setup, when to simplify, and when to reformulate the question.
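The "stable across slices?" check can be sketched as a per-subgroup metric pass that flags slices deviating from the overall score. The synthetic data, slice names, and 0.05 tolerance are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic predictions: scores correlated with the true label, plus noise.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=600)
y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, size=600), 0, 1)
slices = rng.choice(["site_a", "site_b", "site_c"], size=600)

overall = roc_auc_score(y_true, y_score)
slice_auc = {}
for s in np.unique(slices):
    mask = slices == s
    slice_auc[s] = roc_auc_score(y_true[mask], y_score[mask])
    flag = " <- investigate" if abs(slice_auc[s] - overall) > 0.05 else ""
    print(f"{s}: AUC = {slice_auc[s]:.3f}{flag}")
```

A slice that drifts far from the overall metric is exactly the kind of signal that sends you back to the setup rather than to more models.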
This slide is about what happens after AI executes. The student-facing lesson is that model generation
is cheap; comparison, validation, and deciding when to go back to the problem setup are the real work.
Takeaways for Researchers
09
Research Training Becomes More Valuable
AI changes the shape of the work, but it increases the return on good judgment.
What to do
Start with the most painful bottlenecks
- Use AI first for acceleration, not blind delegation
- Separate exploration, implementation, and validation
- Treat AI like a collaborator, not an authority
- Keep final scientific or modeling judgment with the human
What changed for me
Judgment becomes more valuable, not less
- Problem formulation matters more
- Validation matters more
- Communication matters more
- Strong research habits become more important, not less
AI did not remove the need for research thinking. It made that thinking more leveraged, more parallel, and more visible in daily work.
End this slide as the main message slide. The point is not that AI replaces researchers. The point is that strong researchers are often better positioned to use AI well.
Questions Worth Leaving With
Closing thought
AI changes how we search, build, and explain.
The biggest shift for me is not that AI writes code. It is that AI changes how quickly I can
explore a problem, compare options, and turn reasoning into something other people can use.
Human value concentrates in decomposition, validation, prioritization, and taste.
My curiosity
What I'd love to learn from you
What kinds of data do you work with, and where does the workflow get painful?
Where do you think AI could help in your research, and where do you still not trust it?
What would make AI genuinely useful in your work, not just convenient?
Use this as the conversation slide. By this point the core content is done, so you can end on the closing
thought and then open the room with prompts that map back to their own lab workflows.
Refreshing My Personal Website
A simple before -> prompt -> after example of how I use Codex on real design and implementation work.
Before
Functional, but visually flat
Prompt
One natural-language request
After
Clearer hierarchy, stronger identity
Keep this page concrete and personal. The point is not that the website itself matters; the point is that
Codex can help move a vague design change from prompt to implementation quickly, with real before/after output.
Turning Workflows Into Codex Skills
If a workflow repeats, even on different data, it can often become a reusable skill.
What is a Codex skill?
Reusable workflow packaging
A skill bundles instructions, structure, and optional scripts into something Codex can reuse.
Typical pieces
`SKILL.md`, scripts, references, assets.
Why I make them
Repeatability is the value
The inputs change, but the methodology, checks, and output shape often stay similar.
So what?
Turn one successful run into reusable infrastructure.
How I build one
Start from one concrete run
Do one full run first. Keep the code, notes, and artifacts. Then ask Codex to turn it into a generic reusable skill for that workflow.
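A packaged skill might be laid out roughly like this. The layout below is an illustrative sketch, not a spec; only `SKILL.md`, scripts, references, and assets come from the description above, and the file names are invented.

```text
tabular-data-explorer/
├── SKILL.md         # instructions: when to use, steps, expected output shape
├── scripts/
│   └── explore.py   # reusable exploration script distilled from the first run
└── references/
    └── checks.md    # sanity checks to apply regardless of dataset
```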
Examples in my repo
Current examples
- `tabular-data-explorer`: explainable dataset exploration
- `arxiv-latest-summary`: readable paper digests
- `experiment-results-notebook`: structured research notebooks
github.com/jackie-jiaqi-yin/codex-skills
Explain that the value is not only personal speed. Once a repeated workflow is packaged as a skill,
other people can install it, reuse it, and start from a much better baseline.