Annotations (12)
“One hypothesis is that quite early on, dyslexic people have to learn how to delegate. That's a skill that when people are not forced to learn, often very competent people don't become good at it until much later. But the dyslexic person is good at it right away, asking people to help read something for them. I found that there are certain things that some people are phenomenal at and others are horrible at.”— Brendan Foody
Leadership & Management · Psychology & Behavior · Business & Entrepreneurship
DUR_ENDURING
Dyslexia forces early delegation: constraint becomes advantage
“When one of the AI labs wants to teach their models how to be better at poetry, we'll find some of the best poets in the world that can help to measure success via creating evals and examples of how the model should behave. When we have these phenomenal poets that teach the models how to do things once, they're then able to apply those skills and that knowledge across billions of users, hence allowing us to pay $150 an hour for some of the best poets in the world.”— Brendan Foody
Economics & Markets · Business & Entrepreneurship · Technology & Engineering
DUR_ENDURING
Expert teaches once, billions benefit: fixed cost knowledge
“There's a lot of areas in law where the right way of approaching something is not written down or codified. It exists more in the heads of experts, at least not explicitly. It's those domains where there's a lot of taste that isn't well documented that the models will struggle immensely with because they either need those tokens in the pre-training data of doing these web-scale training runs, or they need it in the post-training data of having a legal expert from us to create those datasets.”— Brendan Foody
Technology & Engineering · Psychology & Behavior · Philosophy & Reasoning
DUR_ENDURING
Tacit knowledge: model needs explicit tokens or expert creation
“I would bike down to Safeway and buy donuts for $5 a dozen, go to my middle school, sell them for $2 each. Eventually my middle school called me into the principal's office to shut me down. Then I moved my donut stand about 50 feet over off of school campus so they couldn't police me. I paid my mom $20 a week to drive me in her minivan. I'd pay my friends in donuts because I perceived the cost of the donuts as my cost basis versus they perceived it as $2 each.”— Brendan Foody
Business & Entrepreneurship · Strategy & Decision Making · Economics & Markets
DUR_ENDURING
8th grade: arbitrage, regulatory evasion, wage arbitrage, predatory pricing
“Within each industry, we start out with surveys of hundreds of experts. Within consulting, we get experts that were previously at McKinsey, Bain, BCG, and other top consulting firms. Then we survey how do they spend their time, what percentage of their time is in customer meetings, is in online research, is in analysis, preparing deliverables for customers.”— Brendan Foody
Operations & Execution · Economics & Markets · Business & Entrepreneurship
DUR_ENDURING
Survey time allocation, convert to prompts: economic proxy
“One of the largest problems that people make is that they don't measure the actual skills and capabilities that they want someone to exhibit on the job. Instead of focusing on how do we measure how well this person does their investment analysis of the data room, they have this vibe space conversation of where did the person grow up, how similar are they, do they think they would enjoy hanging out together.”— Brendan Foody
Leadership & Management · Psychology & Behavior · Operations & Execution
DUR_ENDURING
Hiring error: cultural fit vibe vs skill measurement
“Give them a project. And grade them, in essence. I think that's the cleanest way to do it. Instead of the investment bankers doing the analysis, they'll build RL environments and train agents, and it'll be the same across consulting and software engineers and customer support and pretty much every knowledge work vertical.”— Brendan Foody
Leadership & Management · Operations & Execution · Business & Entrepreneurship
DUR_ENDURING
Hiring: give project, grade output vs interview vibes
“One of the largest inefficiencies in labor markets is that everything is disaggregated. When one of our friends is applying to a job, they would apply to a couple dozen jobs. When companies are considering who to hire, they'll consider a fraction of a percent of people in the economy. It feels like there needs to be a structural change where there is an aggregator that everyone applies to and every company hires from, facilitating this perfect flow of information.”— Brendan Foody
Economics & Markets · Business & Entrepreneurship · Technology & Engineering
DUR_ENDURING
Matching problem blocks aggregation: distribution without matching technology
“You want some degree of consensus of different exceptional people believing that they're each doing a good job, but you probably don't want too much consensus. Because you also want to get all of these edge case scenarios of what are the models doing that might deviate a little bit from what the norm is.”— Brendan Foody
Operations & Execution · Psychology & Behavior · Strategy & Decision Making
DUR_ENDURING
Need expert disagreement for edge cases, consensus for baseline
“One trend that we've found in talent assessment: it's more difficult to measure someone's slope versus how they'll develop on the job over a 6-month time horizon than it is to measure their y-intercept.”— Brendan Foody
Leadership & Management · Psychology & Behavior
DUR_ENDURING
Hiring: easier to measure current skill than learning rate
“The largest disconnect that we were seeing in AI research is that everyone was focused on academic evals like GPQA for PhD-level reasoning or IMO for Olympiad math, which were wholly disconnected from the outcomes that customers actually care about of how do we get the model to automate a medical diagnosis or a legal draft or preparing a certain financial analysis of a company.”— Brendan Foody
Strategy & Decision Making · Business & Entrepreneurship · Technology & Engineering
DUR_CONTEXTUAL
Academic benchmarks miss economic value: GPQA vs real work
“There are two key things the models struggle at that humans tend to be very good at. The first is these longer horizon tasks of not just something that we could do in a few hours, but something that might take us 50 or 100 hours to do. And then the second thing is integrating multiple tools with our response and going about doing these things, maybe interacting with people as one of those elements.”— Brendan Foody
Technology & Engineering · Psychology & Behavior · Operations & Execution
DUR_CONTEXTUAL
Models fail at: long tasks, multi-tool integration
Frameworks (2)
Time-Weighted Economic Value Measurement
Converting Expert Time Allocation into Automation Benchmarks
A methodology for measuring how much economic value AI systems create by surveying how experts actually spend their time, decomposing tasks into testable components, and scoring model performance against those components weighted by time allocation. Provides a bridge between academic benchmarks and commercial utility. A minimal scoring sketch follows the failure modes below.
Components
- Survey Expert Time Allocation
- Decompose Time Buckets into Prompts
- Create Expert Rubrics
- Weight Scores by Time Allocation
Prerequisites
- Access to domain experts willing to participate in time surveys
- Ability to create representative prompts
- Infrastructure to score model responses at scale
Success Indicators
- Expert agreement on time allocation (>80% consensus)
- Inter-rater reliability on rubrics (>0.7 correlation)
- Model scores correlate with real-world deployment success
Failure Modes
- Experts game the survey to make their work seem more valuable
- Prompts diverge from actual work over time
- Rubrics capture what's easy to measure rather than what's important
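A minimal sketch of the weighting step described above, in Python. The bucket names mirror the consulting example in the survey quote, but all percentages and scores are hypothetical illustrations, not survey data, and the structure is an assumption rather than a description of any actual pipeline:

```python
# Sketch: weight per-bucket model scores by surveyed expert time allocation.
# Bucket names follow the consulting example; all numbers are hypothetical.

# Fraction of an expert's week spent in each bucket (from the survey step).
time_allocation = {
    "customer_meetings": 0.30,
    "online_research": 0.25,
    "analysis": 0.25,
    "preparing_deliverables": 0.20,
}

# Average rubric score (0-1) the model earned on prompts drawn from each bucket.
model_scores = {
    "customer_meetings": 0.20,
    "online_research": 0.70,
    "analysis": 0.55,
    "preparing_deliverables": 0.60,
}

def time_weighted_value(allocation, scores):
    """Share of surveyed expert time the model handles acceptably."""
    assert abs(sum(allocation.values()) - 1.0) < 1e-6, "allocation must sum to 1"
    return sum(allocation[task] * scores.get(task, 0.0) for task in allocation)

print(f"Time-weighted score: {time_weighted_value(time_allocation, model_scores):.2f}")
```

The weighting is the point: a model that aces a bucket experts rarely spend time in moves the score less than modest gains on the largest bucket.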
Work-Sample Hiring Assessment
Project-Based Talent Evaluation to Minimize Cultural Bias
A hiring methodology that replaces or supplements interviews with graded work samples that mirror the actual job. Reduces cultural similarity bias and personality theater while increasing signal on actual capability. Applicable to any role where output quality is measurable. A minimal grading sketch follows the failure modes below.
Components
- Identify Core Job Outputs
- Design Realistic Projects
- Create Objective Grading Rubrics
- Supplement with Targeted Interviews
Prerequisites
- Clear understanding of core job outputs
- Ability to create realistic project simulations
- Multiple graders for calibration
Success Indicators
- Work sample scores predict on-the-job performance (validate after 6-12 months)
- Reduced cultural similarity in hires
- Lower attrition from poor job fit
Failure Modes
- Projects too long or complex, causing candidate dropout
- Rubrics capture surface features rather than deep quality
- Work sample doesn't correlate with job success (wrong skills tested)
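A minimal grading sketch for the work-sample step, in Python. The rubric criteria, weights, grader names, and scores are all hypothetical, and the spread check is a crude stand-in for a proper inter-rater reliability statistic:

```python
# Sketch: combine multiple graders' rubric scores for one candidate's project
# and flag low agreement. All criteria, weights, and scores are hypothetical.
from statistics import mean, pstdev

rubric_weights = {"correctness": 0.5, "structure": 0.3, "communication": 0.2}

# grader -> criterion -> score on a 1-5 scale
grades = {
    "grader_a": {"correctness": 4, "structure": 3, "communication": 4},
    "grader_b": {"correctness": 5, "structure": 3, "communication": 3},
    "grader_c": {"correctness": 4, "structure": 4, "communication": 4},
}

def weighted_score(scores):
    """Collapse one grader's rubric scores into a single weighted number."""
    return sum(rubric_weights[criterion] * value for criterion, value in scores.items())

per_grader = [weighted_score(s) for s in grades.values()]
candidate_score = mean(per_grader)
spread = pstdev(per_grader)  # crude agreement check across graders

print(f"Candidate score: {candidate_score:.2f} (grader spread: {spread:.2f})")
if spread > 0.5:
    print("Low agreement: recalibrate graders or tighten the rubric.")
```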
Mental Models (8)
Fixed Costs vs Marginal Costs
Economics · When the cost to produce the first unit is high but the cost to produce additional units is near zero, the economics change dramatically.
In Practice: Foody explaining why they can pay poets $150/hour (worked arithmetic below)
Demonstrated by Leg-bf-001
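A worked arithmetic sketch of the fixed-versus-marginal-cost point; the $150/hour rate comes from the quote above, while the hours and user count are invented for illustration:

```latex
% Hypothetical: a poet spends 100 hours at the quoted $150/hour building evals and examples.
\[
  \text{Fixed cost} = 100 \times \$150 = \$15{,}000
\]
% Amortized over the "billions of users" the model serves (say $10^9$):
\[
  \frac{\$15{,}000}{10^{9}} = \$0.000015 \text{ per user}
\]
% Near-zero marginal cost is what makes paying top experts a high hourly rate economical.
```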
Consensus vs Edge Case Tradeoff
Decision Making · In any quality control system, you can optimize for consensus or for capturing edge cases.
In Practice: Foody discussing how poetry evaluators should have some disagreement
Demonstrated by Leg-bf-001
Revealed Preference via Time Allocation
Economics · People's actual time allocation reveals their true priorities better than their stated preferences.
In Practice: Foody using expert time allocation as a proxy for economic value
Demonstrated by Leg-bf-001
Cultural Similarity Bias in Hiring
Psychology · People systematically over-weight cultural similarity in hiring decisions.
In Practice: Foody critiquing the common hiring mistake of optimizing for a 'vibe space' conversation over skill measurement
Demonstrated by Leg-bf-001
The Aggregation Paradox
Economics · Having all the distribution is insufficient to create market efficiency if you lack the matching technology.
In Practice: Why labor markets remain fragmented despite LinkedIn's distribution
Demonstrated by Leg-bf-001
Slope vs Y-Intercept Assessment
Mathematics · In evaluating people, companies, or investments, you can measure current state (y-intercept) or rate of improvement (slope); the slope is far harder to observe directly.
In Practice: Foody distinguishing between measuring current capability and learning rate in hiring
Demonstrated by Leg-bf-001
Arbitrage in Constrained Markets
Economics · When regulation, information asymmetry, or geographical constraints create price differences for identical goods, arbitrageurs can profit.
In Practice: Foody's 8th-grade donut stand story
Demonstrated by Leg-bf-001
Comparative Advantage Applied to Self
Economics · Ricardian comparative advantage applied to individuals. Success comes from identifying your relative strengths and organizing your life to leverage them.
In Practice: Foody discussing how dyslexia forced early understanding of personal comparative advantage
Demonstrated by Leg-bf-001
Connective Tissue (2)
Michael Polanyi's Tacit Knowledge ('We know more than we can tell')
Polanyi's observation that expert knowledge often exists as embodied skill that cannot be fully articulated applies directly to AI training challenges. Just as a master craftsman cannot fully explain their technique in words, domain experts cannot fully codify their decision-making into written rules. AI models face the same bottleneck: if the knowledge isn't in pre-training tokens or explicitly created in post-training data, the model cannot learn it. This explains why models excel at codified domains (law statutes, mathematics) but struggle with taste-driven domains (design, negotiation, poetry) where the rules exist only in expert heads.
Foody explaining why AI models struggle with legal expertise that isn't written down
Slope vs Y-Intercept in Linear Functions (Mathematical Analogy)
In y = mx + b, the y-intercept (b) is where you start, the slope (m) is your rate of change. In talent assessment, measuring someone's y-intercept (current capability) is relatively easy via work samples or tests. Measuring their slope (learning rate, trajectory, potential) is much harder because it requires observing performance over time under varying conditions. Most hiring processes over-index on y-intercept because it's measurable in an interview window. The best hiring processes attempt to measure slope through: reference checks on past learning, testing how quickly someone adapts to new information during the interview, or designing projects that require learning mid-execution. The mathematical analogy makes concrete why late bloomers are systematically undervalued: high slope, low y-intercept.
Foody distinguishing between measuring current skill versus development potential in hiring
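A small numeric illustration of the crossover implied by the analogy above, with invented numbers for two candidates:

```latex
% Hypothetical candidates modeled as capability(t) = b + m t, with t in months.
% Candidate A: strong today, slow growth.  Candidate B: weaker today, fast growth.
\[
  A(t) = 8 + 0.2\,t, \qquad B(t) = 5 + 1.0\,t
\]
% B overtakes A when 5 + 1.0t = 8 + 0.2t:
\[
  0.8\,t = 3 \;\Rightarrow\; t = 3.75 \text{ months}
\]
% An interview measures A(0) = 8 versus B(0) = 5 and favors A;
% a 6-month view favors B (A(6) = 9.2 versus B(6) = 11).
```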
Key Figures (2)
Reid Hoffman
1 mention · Co-founder of LinkedIn
Referenced as example of company with distribution but lacking matching technology.
Scott Sandell
1 mention · Venture capitalist at NEA
Glossary (1)
GPQA
DOMAIN_JARGON · Graduate-Level Google-Proof Q&A, a benchmark of PhD-level science questions used to test AI reasoning
“Everyone was focused on academic evals like GPQA for PhD-level reasoning or IMO for Olympiad math”
Key People (2)
Michael Polanyi
(1891–1976) · Hungarian-British polymath who wrote The Tacit Dimension
Scott Sandell
Venture capitalist at NEA
Concepts (3)
Reinforcement Learning (RL) Environment
CL_TECHNICAL · A system where an AI agent learns by trying actions, receiving rewards, and adjusting behavior
Tacit Knowledge
CL_PHILOSOPHY · Knowledge that is difficult to transfer via writing; skill demonstrated but hard to explain
Predatory Pricing
CL_ECONOMICS · Pricing below cost to drive competitors out, then raising prices once market dominance is achieved
Synthesis