UCSF Talk

From Research Training to AI-Native Work

How I use AI in real data and modeling work at OpenAI

Jackie (Jiā Qí) Yin, UCSF, 03/31/2026

Research training taught me how to reason under uncertainty. AI changed the speed and structure of that reasoning, but not the need for judgment.
Keep the opening centered on the throughline, not on biography. The audience should feel from the start that this talk is relevant to how they already think and work.

Why This Talk / Who I Am

01

A path shaped by uncertainty, systems, and judgment

The story is less about titles and more about the kinds of problems I learned to solve.

2020

UW Biostatistics

PhD training in inference, uncertainty, and what makes a method trustworthy.

2020-2025

Microsoft

Applied science in forecasting and recommendation systems at product scale.

2025-2026

Copilot Notebooks

Context engineering, AI agent development, and LLM-as-judge evaluation.

Jan 2026

OpenAI

ML Scientist in Strategic Finance: B2B revenue forecasting and decision support.

Research training -> product systems -> AI-native workflows -> finance-facing modeling
Keep this directional rather than chronological. The important thread is that you came from a world that cared about inference, then moved into systems and product work, and now work on decision support under ambiguity.

What My Work Looks Like Today

02

The job is not just model building

A large part of the work is turning ambiguous business questions into quantitative ones.

Operating context

High-context decisions, evolving assumptions, established planning workflows

  • Revenue forecasting for a fast-growing B2B business
  • Understanding pipeline risk, timing, and uncertainty
  • Supporting finance and planning decisions

My role

Translate intuition into structure

  • Bridge domain intuition and mathematical structure
  • Turn existing planning logic into reusable modeling workflows
  • Support decisions, not just produce numbers
  • Make the work more explainable, scalable, and reusable
Finance partners often have strong intuition about both downside risk and revenue potential. My job is to preserve that signal while turning it into a reusable modeling framework.
Emphasize that the core challenge is not replacing existing workflows. It is taking strong domain intuition, scenario-based planning, and caveats, then turning them into a structure that can support real planning decisions at scale.

Research Thinking vs. Industry Data and Modeling Work

03

The reasoning carries over. The objective function changes.

  • Reason under incomplete information
  • Separate signal from noise
  • Break big problems into parts
  • Iterate instead of expecting one-shot answers

Research

Go deeper on one problem

  • Novelty and methodological contribution matter
  • Technical elegance is often rewarded
  • A result can stay specialized and still be valuable
  • The audience is often willing to learn the method deeply

Industry data and modeling work

Solve the decision problem under constraints

  • Speed, alignment, and implementation matter
  • Methods need to be understandable enough to use
  • Complexity has a maintenance and adoption cost
  • Simple and useful can beat sophisticated and fragile
This is the bridge slide. It tells the audience their research habits still matter, but the output of the work is different. You can also mention that AI makes it faster to adapt useful academic ideas.

Where AI Fits in My Workflow

04

Where AI Fits

Different task types need different levels of supervision, reuse, and judgment.

Ad hoc analysis

One-off questions

AI best at

Fast querying, plots, and first-pass code.

My role

Evaluate output and catch bad assumptions.

Operational workflow

Repeatable work

AI best at

Turning one-off workflows into reusable pipelines.

My role

Define structure and monitoring.

Novel modeling work

New modeling questions

AI best at

Exploring formulations and coding baselines fast.

My role

Choose the framing and decide what matters.

Cross-functional communication

Stakeholder translation

AI best at

Reframing technical findings for finance or business.

My role

Preserve nuance and avoid overclaiming.

This slide should feel practical. The important point is that “using AI” is not one thing. The human supervisory structure changes depending on the kind of work.

Why I Treat AI Like Junior Scientists

05

AI as Junior Scientists

Can we predict whether a patient will respond to treatment?

Pre-AI: serial exploration

One person, one path at a time

Typical first pass: about 1 week

1. Code + run logistic regression
2. Inspect, revise, then try trees
3. If needed, test a neural baseline

AI-native: parallel first pass

Parallel method search

Typical first pass: about 30 minutes

Agent 1

Logistic regression track

Agent 2

Tree-based track

Agent 3

Neural baseline track

~1 week serial -> ~30 min parallel first pass
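The parallel first pass can be sketched with stdlib concurrency. The three track functions below are stand-ins for prompts handed to separate coding agents, and the shared spec fields are illustrative, not the real project's definitions:

```python
from concurrent.futures import ThreadPoolExecutor

# Shared execution spec handed to every track (fields are illustrative).
SPEC = {"target": "treatment_response", "metric": "AUC"}

# Stand-ins for the three agent tracks; in practice each would be a
# separate coding agent running its own method family against SPEC.
def logistic_track(spec):
    return {"track": "logistic regression", "metric": spec["metric"]}

def tree_track(spec):
    return {"track": "tree-based", "metric": spec["metric"]}

def neural_track(spec):
    return {"track": "neural baseline", "metric": spec["metric"]}

# Launch all three tracks at once and collect results side by side,
# so the human review step starts from a comparison, not a single run.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(t, SPEC)
               for t in (logistic_track, tree_track, neural_track)]
    results = [f.result() for f in futures]
```

The point of the pattern is that every track starts from the same spec, so the outputs are comparable by construction.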

The junior scientist metaphor is memorable because it is about management and judgment, not only capability. AI is useful because it parallelizes early exploration while you keep the final judgment.

Trusting AI Output

06

What I Need to Trust an AI Result

I do not treat confident language as correctness.

1. Data sanity

Check summary stats, historical patterns, and impossible values.

2. Model agreement

Reasonable models can differ. Big disagreement needs explanation.

3. Metric definition

Ask how the metric is defined, coded, and why it fits the goal.

4. Independent restatement

Use a fresh agent to restate the core logic in words or math.
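The data-sanity check is largely mechanical and can be sketched as a few assertions. The records, column names, and plausibility bounds below are illustrative placeholders:

```python
# Toy records; column names and values are illustrative.
records = [
    {"patient_id": "p01", "age": 54, "response": 1},
    {"patient_id": "p02", "age": 61, "response": 0},
    {"patient_id": "p03", "age": 47, "response": 1},
]

# Impossible values: no negative or absurd ages should survive.
impossible = [r for r in records if not 0 < r["age"] < 120]
assert not impossible, f"impossible ages: {impossible}"

# Target sanity: the outcome must be strictly binary.
assert {r["response"] for r in records} <= {0, 1}

# Summary stats to eyeball against historical patterns.
mean_age = sum(r["age"] for r in records) / len(records)
response_rate = sum(r["response"] for r in records) / len(records)
```

Checks 2 through 4 stay human judgment calls; this sketch only covers the part a script can enforce.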

This slide is about trust standards rather than bug hunting. The point is that a research-trained workflow does not trust fluent output by default; it earns trust by checking the highest-pressure points.

A Research Workflow Example

07

From Research Question to AI Execution Notes

Can we predict whether a patient will respond to treatment?

Step 1

Formalize the question

  • What counts as response?
  • When is prediction made?
  • What inputs exist at that time?

Step 2

Write the quant setup

  • Binary outcome
  • Baseline features
  • Train/validation split
  • Primary metric

Step 3

Make execution notes

  • Question and target definition
  • Features and split
  • Baselines to run
  • Failure modes to watch

Step 4

Why this helps

  • Think first, delegate second
  • AI gets an execution spec
  • Less ambiguity during implementation
Do the problem formulation yourself first. Then give AI an execution spec, not a vague idea.
This slide is about how you would actually start the project. The key move is to think through the quantitative setup yourself, then write it down in a markdown file that AI can execute against.
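The execution notes can be sketched as a small markdown spec written from Python. Every field value below is an illustrative placeholder mirroring the four note categories on the slide, not the real project's definitions:

```python
from pathlib import Path
import tempfile

# Hypothetical execution spec; all content is placeholder text.
SPEC_MD = """\
# Execution notes: treatment response prediction

## Question and target definition
Predict whether a patient responds to treatment (binary outcome).

## Features and split
Baseline features only; held-out validation split.

## Baselines to run
Regularized logistic regression, tree-based model, neural baseline.

## Failure modes to watch
Target leakage, dirty split, metric mismatched to the goal.
"""

# Write the spec where a coding agent can pick it up.
spec_path = Path(tempfile.mkdtemp()) / "execution_notes.md"
spec_path.write_text(SPEC_MD)
```

The file is the deliverable: the agent executes against the spec, and ambiguity gets resolved in the document rather than mid-implementation.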

A Research Workflow Example

08

After AI Runs: Review, Validate, and Explore

Once AI executes, the work becomes comparison, validation, and judgment.

Run set

What I ask AI to run

  • Regularized logistic regression
  • Tree-based model
  • Neural baseline

Check 1

What I check first

  • Target defined correctly
  • Split is clean
  • No leakage
  • Metric matches the goal

Check 2

How I compare results

  • Are differences reasonable?
  • Stable across slices?
  • Signals scientifically plausible?
  • Claims stronger than evidence?

Explore

How I explore the method space

  • If simple models are close, keep them
  • If tree-based wins, inspect interactions
  • If nothing is stable, revisit the setup
The bottleneck is not running more models. It is knowing when to trust the setup, when to simplify, and when to reformulate the question.
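The split check and the first comparison step can be sketched mechanically; the patient IDs, AUC values, and disagreement threshold below are hypothetical:

```python
# A clean split has no patient in both sets; overlap signals leakage.
train_ids = {"p01", "p02", "p03"}
valid_ids = {"p04", "p05"}
leaked = train_ids & valid_ids
assert not leaked, f"leakage between splits: {leaked}"

# Flag disagreement large enough to need an explanation.
# AUC values and the 0.05 threshold are hypothetical placeholders.
aucs = {"logistic": 0.71, "tree": 0.73, "neural": 0.72}
spread = max(aucs.values()) - min(aucs.values())
needs_explanation = spread > 0.05
```

The exploration step (keep, inspect, or reformulate) cannot be scripted; this only automates the gate in front of it.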
This slide is about what happens after AI executes. The student-facing lesson is that model generation is cheap; comparison, validation, and deciding when to go back to the problem setup are the real work.

Takeaways for Researchers

09

Research Training Becomes More Valuable

AI changes the shape of the work, but it increases the return on good judgment.

What to do

Start with the most painful bottlenecks

  • Use AI first for acceleration, not blind delegation
  • Separate exploration, implementation, and validation
  • Treat AI like a collaborator, not an authority
  • Keep final scientific or modeling judgment with the human

What changed for me

Judgment becomes more valuable, not less

  • Problem formulation matters more
  • Validation matters more
  • Communication matters more
  • Strong research habits become more important, not less
AI did not remove the need for research thinking. It made that thinking more leveraged, more parallel, and more visible in daily work.
End this slide as the main message slide. The point is not that AI replaces researchers. The point is that strong researchers are often better positioned to use AI well.

Closing / Discussion

10

Questions Worth Leaving With

Closing thought

AI changes how we search, build, and explain.

The strongest shift for me is not that AI writes code. It is that AI changes how quickly I can explore a problem, compare options, and turn reasoning into something other people can use.

Human value concentrates in decomposition, validation, prioritization, and taste.

My curiosity

What I'd love to learn from you

  • What kinds of data do you work with, and where does the workflow get painful?
  • Where do you think AI could help in your research, and where do you still not trust it?
  • What would make AI genuinely useful in your work, not just convenient?
Use this as the conversation slide. By this point the core content is done, so you can end on the closing thought and then open the room with prompts that map back to their own lab workflows.

A Real Codex Use Case

11

Refreshing My Personal Website

A simple before -> prompt -> after example of how I use Codex on real design and implementation work.

Before

Functional, but visually flat

Before redesign screenshot of personal website

Prompt

One natural-language request

Prompt screenshot used for Codex website redesign request

After

Clearer hierarchy, stronger identity

After redesign screenshot of personal website
Keep this page concrete and personal. The point is not that the website itself matters; the point is that Codex can help move a vague design change from prompt to implementation quickly, with real before/after output.

Another Real Use Case

12

Turning Workflows Into Codex Skills

If a workflow repeats, even on different data, it can often become a reusable skill.

What is a Codex skill?

Reusable workflow packaging

A skill bundles instructions, structure, and optional scripts into something Codex can reuse.

Typical pieces

`SKILL.md`, scripts, references, assets.

Why I make them

Repeatability is the value

The inputs change, but the methodology, checks, and output shape often stay similar.

So what?

Turn one successful run into reusable infrastructure.

How I build one

Start from one concrete run

Do one full run first. Keep the code, notes, and artifacts. Then ask Codex to turn it into a generic reusable skill for that workflow.

Examples in my repo

Current examples

  • `tabular-data-explorer`: explainable dataset exploration
  • `arxiv-latest-summary`: readable paper digests
  • `experiment-results-notebook`: structured research notebooks

github.com/jackie-jiaqi-yin/codex-skills
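One way to start the "one concrete run" step is to scaffold the skill layout programmatically. `SKILL.md` and the piece names come from the slide; the exact directory structure here is an illustrative convention, not a requirement:

```python
from pathlib import Path
import tempfile

# Scaffold a minimal skill package. SKILL.md is named on the slide;
# scripts/references/assets mirror the "typical pieces" list.
root = Path(tempfile.mkdtemp()) / "tabular-data-explorer"
for sub in ("scripts", "references", "assets"):
    (root / sub).mkdir(parents=True)

(root / "SKILL.md").write_text(
    "# tabular-data-explorer\n"
    "Instructions, expected inputs, and output shape go here.\n"
)
```

Starting from a kept run means the scaffold is populated with working code and notes rather than written from scratch.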

Explain that the value is not only personal speed. Once a repeated workflow is packaged as a skill, other people can install it, reuse it, and start from a much better baseline.