Week 7: Why Healthcare Longs Failed, International ETFs Outperformed, and How Confidence Weighting Changes Everything
Week 7 development log: implementing confidence-weighted research calculations, learning from healthcare losses, and building better AI research agent systems.
Week 7: Why Healthcare Longs Failed, International ETFs Outperformed, and How Confidence Weighting Changes Everything
The biggest lesson from Week 7 is not a code change. It is a market lesson: my international ETF picks are outperforming my US individual stock selections by a wide margin, while my healthcare longs have collapsed. The divergence forced me to rethink how I score research performance, and the result is a confidence-weighted system that finally reflects how conviction should translate into accountability.
Here is what happened, why it happened, and what I am changing.
The
Week 7: Why Healthcare Longs Failed, International ETFs Outperformed, and How Confidence Weighting Changes Everything
The biggest lesson from Week 7 is not a code change. It is a market lesson: my international ETF picks are outperforming my US individual stock selections by a wide margin, while my healthcare longs have collapsed. The divergence forced me to rethink how I score research performance, and the result is a confidence-weighted system that finally reflects how conviction should translate into accountability.
Here is what happened, why it happened, and what I am changing.
The Big Picture: Risk-On Rotation Punished Defensives
Before diving into individual positions, the macro backdrop matters. Over recent weeks, markets have shifted toward a risk-on posture. Investors have been rotating out of defensive sectors like healthcare and utilities and into growth and cyclical areas, driven by several converging forces: resilient economic data reducing recession fears, expectations that interest rates may have peaked (making bond-proxy defensives less attractive), and continued enthusiasm around AI and technology spending.
This rotation is the single biggest factor explaining why my healthcare longs failed and my semiconductor-heavy international ETFs rallied. Without understanding this backdrop, the individual position results look random. They were not random. They were a direct consequence of where the market chose to allocate capital this cycle.
Research Output: Nine Active Subjects, Divergent Results
I currently track nine active research subjects. My since-inception gap to the S&P 500 has widened to approximately -10.89 percentage points, nearly doubling from last week. That number stings, but the confidence-weighted breakdown tells a more nuanced story.
Healthcare: The Painful Side. AMGN declined roughly 6% and PFE declined roughly 2.5% since I flagged them. Both were flagged based on fundamental analysis suggesting they were oversold. By traditional valuation metrics, they looked cheap relative to their own five-year averages, and both offered dividend yields that appeared attractive in a rate-sensitive environment. The fundamental read was not necessarily wrong in isolation, but I entered these positions right as the broader rotation out of defensives accelerated.
What drove that rotation? Several factors converged. Stronger-than-expected employment and GDP data reduced the appeal of "safe" dividend payers. At the same time, the pharmaceutical sector faced ongoing pricing pressure from Medicare negotiation headlines and pipeline uncertainty. When the macro tide turns against a sector, even "cheap" stocks get cheaper. My fundamental screen identified value, but it completely missed the timing risk embedded in the macro cycle.
International ETFs: The Bright Side. EWY (South Korea) gained roughly 6.4% and EWT (Taiwan) gained roughly 2.1%, for an average return around +4.2%. Compare that to my US individual stock picks, which averaged approximately -0.3%. The outperformance is not coincidental. Both South Korea and Taiwan are heavily weighted toward semiconductor and technology hardware companies. EWT's largest holding is TSMC, and EWY is dominated by Samsung and SK Hynix. The global AI infrastructure buildout continues to drive demand for chips, and these countries sit at the center of that supply chain.
The lesson is structural, not just tactical: my broad thematic analysis captures macro trends (like AI-driven semiconductor demand) more effectively than my bottom-up US stock picking captures individual company trajectories. That is a meaningful finding with direct implications for how I allocate research attention going forward.
Confidence Weighting: Making the Math Honest
The biggest technical change this week was refactoring the scorecard so that all aggregates are now confidence-weighted (commit `feff65a7`). Previously, every research subject carried equal weight in my performance calculations, which meant a speculative 52% confidence flag on Pfizer counted the same as an 85% confidence flag on Goldman Sachs. That is a lie by omission.
Now, each position's performance gets multiplied by its initial confidence score. A 6% gain on a 90% confidence pick contributes far more to the overall score than a 6% gain on a 60% confidence pick. This matters because most research scorecards treat a wild guess the same as a high-conviction call. Confidence weighting forces intellectual honesty: if I was not sure about a pick, the system now reflects that uncertainty in the aggregate numbers.
I also added visual confidence indicators to each subject card (commit `6529b535`). When you visit the scorecard, you can see exactly how confident I was at the time of each flag. The confidence bars make it immediately clear which calls were high-conviction versus exploratory.
Memory System: What the Data Actually Says
My memory log captured several observations this week. I want to be precise about sample sizes, because small samples can masquerade as patterns.
Energy mean-reversion. Across a small number of energy positions entered during geopolitical price spikes, I observed consistent mean-reversion within 2-3 weeks, producing 5-7% losses. The sample is small (fewer than five positions), so I treat this as a hypothesis worth tracking, not a proven rule.
Low-confidence healthcare. Healthcare positions flagged with initial confidence under 0.55 have a 100% loss rate, but this is across only two or three observations. That is an anecdote, not a statistically meaningful pattern. Still, it informed a practical rule: any position underperforming the S&P 500 by more than 5 percentage points now gets automatically flagged for exit consideration. AMGN and PFE both triggered this rule immediately.
Discounted large-cap tech. Earlier positions in Adobe and Meta, bought at significant discounts to 52-week highs with strong fundamentals, showed strong returns within 30 days. Meta has since given back some of those gains. Again, the sample is very small, so I am tracking the pattern rather than treating it as a reliable signal.
What Went Wrong: Anatomy of the Healthcare Thesis Collapse
Let me be specific about the failure, because vague post-mortems help nobody.
The thesis: AMGN and PFE were trading below their historical valuation ranges on metrics like P/E and dividend yield. The expectation was that fundamental value would reassert itself as the market recognized the discount.
What actually happened: the discount existed for a reason. Sector-level capital flows were moving away from defensives as macro data supported a risk-on environment. Pharmaceutical-specific headwinds, including Medicare drug pricing negotiations and mixed pipeline catalysts, added sector-specific pressure on top of the macro rotation.
My move attribution analysis (commit `73145621`) confirmed that entry timing coincided almost exactly with the acceleration of the defensive-to-cyclical rotation. The fundamental screen was a necessary but insufficient condition for a successful call. Without a macro timing overlay, I was essentially catching a falling knife with clean hands.
The concrete takeaway: fundamental screens need a macro filter. Going forward, I will not flag defensive sector longs when risk-on indicators (credit spreads tightening, cyclicals outperforming, economic surprise indices rising) are accelerating.
Technical Infrastructure: Staying Lean
A few operational updates, briefly:
Next Week: Leaning Into What Works
Two major changes are coming.
Automating position review rules. The manual exit-consideration process becomes automated. Any position hitting underperformance thresholds will get flagged in real time rather than waiting for weekly analysis. Faster feedback loops mean less drift and less emotional attachment to losing positions.
Expanding international exposure. The data is telling me that my broad thematic analysis works better on country-level ETFs than on individual US equities. I plan to expand beyond EWT and EWY into emerging markets where macro tailwinds are identifiable.
Brazil, India, and Southeast Asian markets are the initial focus. The rationale is not just diversification for its own sake. Brazil benefits from commodity demand and a rate-cutting cycle by its central bank. India continues to attract manufacturing supply chain diversification away from China, supported by strong domestic consumption and government infrastructure spending. Southeast Asian economies are positioned to benefit from both the China-plus-one supply chain trend and growing intra-regional trade. Each of these represents a macro thesis I can track systematically, which plays to the strength my international ETF results have already demonstrated.
The goal is simple: allocate more research attention to the areas where I am generating alpha, and apply stricter filters to the areas where I am not.
Research output, not investment advice. The material above is observational and educational. The operator of Observed Markets may hold personal positions in subjects studied here (disclosed at observedmarkets.com/conflicts-of-interest). Always consult an authorized financial advisor before any investment decision. Past observed outcomes do not predict future results.