- Stock Correlation Analysis Skill
- Finds and analyzes correlated stocks using historical price data from Yahoo Finance via `yfinance`. Routes to specialized sub-skills based on user intent.
- Important: This is for research and educational purposes only. Not financial advice. yfinance is not affiliated with Yahoo, Inc.
- Step 1: Ensure Dependencies Are Available
- Before running any code, install required packages if needed:
```python
import subprocess, sys

subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "yfinance", "pandas", "numpy"])
```
- Always include this at the top of your script.
- Step 2: Route to the Correct Sub-Skill
- Classify the user's request and jump to the matching sub-skill section below.
| User Request | Route To | Examples |
|---|---|---|
| Single ticker, wants to find related stocks | Sub-Skill A: Co-movement Discovery | "what correlates with NVDA", "find stocks related to AMD", "sympathy plays for TSLA" |
| Two or more specific tickers, wants relationship details | Sub-Skill B: Return Correlation | "correlation between AMD and NVDA", "how do LITE and COHR move together", "compare AAPL vs MSFT" |
| Group of tickers, wants structure/grouping | Sub-Skill C: Sector Clustering | "correlation matrix for FAANG", "cluster these semiconductor stocks", "sector peers for AMD" |
| Wants time-varying or conditional correlation | Sub-Skill D: Realized Correlation | "rolling correlation AMD NVDA", "when NVDA drops what else drops", "how has correlation changed" |
- If ambiguous, default to Sub-Skill A (Co-movement Discovery) for single tickers, or Sub-Skill B (Return Correlation) for two tickers.
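The routing rules above can be sketched as a small keyword classifier. This is only an illustration: `route_request`, the hint tuples, and the idea of passing in an already-extracted ticker list are assumptions, not part of the skill itself, and real requests will need more judgment than substring matching.

```python
# Hypothetical keyword hints; not an exhaustive list.
REALIZED_HINTS = ("rolling", "over time", "changed", "when", "regime")  # Sub-Skill D
CLUSTER_HINTS = ("matrix", "cluster", "peers", "group")                 # Sub-Skill C


def route_request(text, tickers):
    """Return 'A', 'B', 'C', or 'D' for a request and its extracted tickers."""
    lowered = text.lower()
    if any(hint in lowered for hint in REALIZED_HINTS):
        return "D"
    if any(hint in lowered for hint in CLUSTER_HINTS):
        return "C"
    # Ambiguous requests fall back on the ticker count:
    # one ticker goes to discovery, two or more to pairwise correlation.
    return "A" if len(tickers) <= 1 else "B"
```

For example, `route_request("rolling correlation AMD NVDA", ["AMD", "NVDA"])` matches a time-varying hint and is routed to Sub-Skill D before the ticker count is considered.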
- Defaults for all sub-skills
| Parameter | Default |
|---|---|
| Lookback period | `1y` (1 year) |
| Data interval | `1d` (daily) |
| Correlation method | Pearson |
| Minimum correlation threshold | 0.60 |
| Number of results | Top 10 |
| Return type | Daily log returns |
| Rolling window | 60 trading days |
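One way to keep these defaults in a single place is a frozen dataclass. The class name and field names below are hypothetical; only the values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrelationDefaults:
    """Shared defaults for all sub-skills (field names are illustrative)."""
    period: str = "1y"             # lookback period
    interval: str = "1d"           # data interval
    method: str = "pearson"        # correlation method
    min_correlation: float = 0.60  # minimum correlation threshold
    top_n: int = 10                # number of results
    return_type: str = "log"       # daily log returns
    rolling_window: int = 60       # trading days


DEFAULTS = CorrelationDefaults()
```

A frozen dataclass is used so that a sub-skill cannot accidentally mutate the shared defaults; a per-request override would be built with `dataclasses.replace(DEFAULTS, period="2y")`.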
- Sub-Skill A: Co-movement Discovery
- Goal: Given a single ticker, find stocks that move with it.
- A1: Build the peer universe
- You need 15-30 candidates. Do not use hardcoded ticker lists; build the universe dynamically at runtime. See references/sector_universes.md for the full implementation. The approach:
- Screen same-industry stocks using `yf.screen()` + `yf.EquityQuery` to find stocks in the same industry as the target
- Broaden to sector if the industry screen returns fewer than 10 peers
- Add thematic/adjacent industries: read the target's `longBusinessSummary` and screen 1-2 related industries (e.g., a semiconductor company → also screen semiconductor equipment)
- Combine, deduplicate, remove target ticker
- A2: Compute correlations
```python
import yfinance as yf
import pandas as pd
import numpy as np

def discover_comovement(target_ticker, peer_tickers, period="1y"):
    all_tickers = [target_ticker] + [t for t in peer_tickers if t != target_ticker]
    data = yf.download(all_tickers, period=period, auto_adjust=True, progress=False)

    # Extract close prices; yf.download returns MultiIndex (Price, Ticker) columns
    closes = data["Close"].dropna(axis=1, thresh=max(60, len(data) // 2))

    # Log returns
    returns = np.log(closes / closes.shift(1)).dropna()

    corr_series = returns.corr()[target_ticker].drop(target_ticker, errors="ignore")

    # Rank by absolute correlation
    ranked = corr_series.abs().sort_values(ascending=False)

    result = pd.DataFrame({
        "Ticker": ranked.index,
        "Correlation": [round(corr_series[t], 4) for t in ranked.index],
    })
    return result, returns
```
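The correlation step returns every peer, so the default threshold (0.60) and result count (top 10) still have to be applied before presenting. A minimal sketch, assuming a frame with the same `Ticker` and `Correlation` columns; the helper name `filter_top_correlations`, the placeholder tickers, and the numbers in `example` are made up for illustration.

```python
import pandas as pd


def filter_top_correlations(result, threshold=0.60, top_n=10):
    """Keep rows whose absolute correlation meets the threshold, capped at top_n."""
    strong = result[result["Correlation"].abs() >= threshold]
    # Order by absolute correlation so strong negative peers are not buried.
    order = strong["Correlation"].abs().sort_values(ascending=False).index
    return strong.loc[order].head(top_n).reset_index(drop=True)


# Illustrative values only, not real market data.
example = pd.DataFrame({
    "Ticker": ["AMD", "AVGO", "PEER_X", "PEER_Y"],
    "Correlation": [0.82, 0.78, 0.15, -0.71],
})
top = filter_top_correlations(example)
```

Filtering on the absolute value keeps strongly negative peers in the output, which matters because they are reported separately as potential hedges.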
- A3: Present results
- Show a ranked table with company names and sectors (fetch via `yf.Ticker(t).info.get("shortName")`):
| Rank | Ticker | Company | Correlation | Why linked |
|---|---|---|---|---|
| 1 | AMD | Advanced Micro Devices | 0.82 | Same industry: GPU/CPU |
| 2 | AVGO | Broadcom | 0.78 | AI infrastructure peer |
- Include:
- Top 10 positively correlated stocks
- Any notable negatively correlated stocks (potential hedges)
- Brief explanation of why each might be linked (sector, supply chain, customer overlap)
- Sub-Skill B: Return Correlation
- Goal: Deep-dive into the relationship between two (or a few) specific tickers.
- B1: Download and compute
```python
import yfinance as yf
import pandas as pd
import numpy as np

def return_correlation(ticker_a, ticker_b, period="1y"):
    data = yf.download([ticker_a, ticker_b], period=period, auto_adjust=True, progress=False)
    closes = data["Close"][[ticker_a, ticker_b]].dropna()
    returns = np.log(closes / closes.shift(1)).dropna()

    corr = returns[ticker_a].corr(returns[ticker_b])

    # Beta: how much does B move per unit move of A
    cov_matrix = returns.cov()
    beta = cov_matrix.loc[ticker_b, ticker_a] / cov_matrix.loc[ticker_a, ticker_a]

    # R-squared
    r_squared = corr ** 2

    # Rolling 60-day correlation for stability
    rolling_corr = returns[ticker_a].rolling(60).corr(returns[ticker_b])

    # Spread (log price ratio) for mean-reversion
    spread = np.log(closes[ticker_a] / closes[ticker_b])
    spread_z = (spread - spread.mean()) / spread.std()

    return {
        "correlation": round(corr, 4),
        "beta": round(beta, 4),
        "r_squared": round(r_squared, 4),
        "rolling_corr_mean": round(rolling_corr.mean(), 4),
        "rolling_corr_std": round(rolling_corr.std(), 4),
        "rolling_corr_min": round(rolling_corr.min(), 4),
        "rolling_corr_max": round(rolling_corr.max(), 4),
        "spread_z_current": round(spread_z.iloc[-1], 4),
        "observations": len(returns),
    }
```
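To make the relationship between the beta and correlation figures concrete, the sketch below uses synthetic returns (no market data, no network call) and checks that the covariance-based beta equals correlation scaled by the ratio of standard deviations. The column names `A` and `B`, the seed, and the simulated slope of 1.2 are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns: B is constructed to move about 1.2x A plus noise.
rng = np.random.default_rng(0)
a = rng.normal(0, 0.02, 500)
b = 1.2 * a + rng.normal(0, 0.01, 500)
returns = pd.DataFrame({"A": a, "B": b})

corr = returns["A"].corr(returns["B"])

# Beta of B on A, computed the same way as in return_correlation().
cov = returns.cov()
beta = cov.loc["B", "A"] / cov.loc["A", "A"]

# Equivalent form: beta = corr * std(B) / std(A).
beta_from_corr = corr * returns["B"].std() / returns["A"].std()
```

The two beta values agree up to floating-point error, which is why a high correlation with very different volatilities can still produce a beta far from 1.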
- B2: Present results
- Show a summary card:
| Metric | Value |
|---|---|
| Pearson Correlation | 0.82 |
| Beta (B vs A) | 1.15 |
| R-squared | 0.67 |
| Rolling Corr (60d avg) | 0.80 |
| Rolling Corr Range | [0.55, 0.94] |
| Rolling Corr Std Dev | 0.08 |
| Spread Z-Score (current) | +1.2 |
| Observations | 250 |
- Interpretation guide:
- Correlation > 0.80: Strong co-movement; these stocks are tightly linked
- Correlation 0.50–0.80: Moderate; shared sector drivers but independent factors too
- Correlation < 0.50: Weak; limited co-movement despite possible sector overlap
- High rolling std: Unstable relationship; correlation varies significantly over time
- |Spread Z| > 2: Unusual divergence from historical relationship
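The interpretation guide maps directly onto a pair of small helpers. This is a sketch: the function names are hypothetical, and treating exactly 0.80 as "moderate" and exactly 0.50 as "moderate" is an assumption about boundary handling that the guide does not specify.

```python
def interpret_correlation(corr):
    """Map a correlation value to the strength labels used in the guide."""
    if corr > 0.80:
        return "strong"
    if corr >= 0.50:
        return "moderate"
    return "weak"


def spread_is_unusual(spread_z, limit=2.0):
    """Flag a spread z-score more than `limit` standard deviations from its mean."""
    return abs(spread_z) > limit
```

With the summary card values above, `interpret_correlation(0.82)` gives `"strong"` and `spread_is_unusual(1.2)` is `False`, so the pair would be described as tightly linked with no unusual divergence.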
- Sub-Skill C: Sector Clustering
- Goal: Given a group of tickers, show the full correlation structure and identify clusters.
- C1: Build the correlation matrix
```python
import yfinance as yf
import pandas as pd
import numpy as np

def sector_clustering(tickers, period="1y"):
    data = yf.download(tickers, period=period, auto_adjust=True, progress=False)

    # yf.download returns MultiIndex (Price, Ticker) columns
    closes = data["Close"].dropna(axis=1, thresh=max(60, len(data) // 2))
    returns = np.log(closes / closes.shift(1)).dropna()
    corr_matrix = returns.corr()

    # Hierarchical clustering order
    from scipy.cluster.hierarchy import linkage, leaves_list
    from scipy.spatial.distance import squareform

    # Work on a writable NumPy copy: DataFrame.values can be read-only
    # when pandas copy-on-write is enabled.
    dist_matrix = (1 - corr_matrix.abs()).to_numpy(copy=True)
    np.fill_diagonal(dist_matrix, 0)
    # checks=False tolerates tiny floating-point asymmetry in the matrix
    condensed = squareform(dist_matrix, checks=False)
    linkage_matrix = linkage(condensed, method="ward")
    order = leaves_list(linkage_matrix)
    ordered_tickers = [corr_matrix.columns[i] for i in order]

    # Reorder matrix
    clustered = corr_matrix.loc[ordered_tickers, ordered_tickers]
    return clustered, returns
```
- Note: if `scipy` is not available, fall back to sorting by average correlation instead of hierarchical clustering.
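The fallback mentioned in the note could look like the sketch below. The helper name `order_by_average_correlation` is hypothetical, and excluding the diagonal from the average is an assumption about what "average correlation" should mean here; it requires at least two tickers.

```python
import pandas as pd


def order_by_average_correlation(corr_matrix):
    """Reorder a correlation matrix by each ticker's mean correlation to the others."""
    n = len(corr_matrix)
    # Subtract the self-correlation (1.0) and average over the other n - 1 tickers.
    avg = (corr_matrix.sum(axis=1) - 1.0) / (n - 1)
    order = avg.sort_values(ascending=False).index
    return corr_matrix.loc[order, order]


# Illustrative matrix with placeholder tickers, not real market data.
labels = ["X", "Y", "Z"]
example_corr = pd.DataFrame(
    [[1.0, 0.8, 0.2], [0.8, 1.0, 0.3], [0.2, 0.3, 1.0]],
    index=labels, columns=labels,
)
ordered = order_by_average_correlation(example_corr)
```

This ordering puts the most "central" tickers first and likely outliers last, but unlike hierarchical clustering it does not place members of the same cluster next to each other.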
- C2: Present results
- Full correlation matrix: formatted as a table. For more than 8 tickers, show as a heatmap description or highlight only the strongest/weakest pairs.
- Identified clusters: group tickers that have high intra-group correlation:
- Cluster 1: [NVDA, AMD, AVGO], avg intra-correlation 0.82
- Cluster 2: [AAPL, MSFT], avg intra-correlation 0.75
- Outliers: tickers with low average correlation to the group (potential diversifiers).
- Strongest pairs: top 5 highest-correlation pairs in the matrix.
- Weakest pairs: top 5 lowest/negative-correlation pairs (hedging candidates).
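Strongest and weakest pairs can be read off the upper triangle of the correlation matrix so that each pair appears once. A minimal sketch; `rank_pairs` and the placeholder matrix are illustrative, not part of the skill.

```python
import numpy as np
import pandas as pd


def rank_pairs(corr_matrix, top_n=5):
    """Return (strongest, weakest) pair tables from a square correlation matrix."""
    tickers = corr_matrix.columns
    # k=1 skips the diagonal, so self-pairs and mirrored duplicates are excluded.
    rows, cols = np.triu_indices(len(tickers), k=1)
    pairs = pd.DataFrame({
        "Ticker A": tickers[rows],
        "Ticker B": tickers[cols],
        "Correlation": corr_matrix.to_numpy()[rows, cols],
    })
    ranked = pairs.sort_values("Correlation", ascending=False).reset_index(drop=True)
    strongest = ranked.head(top_n)
    weakest = ranked.tail(top_n).iloc[::-1].reset_index(drop=True)
    return strongest, weakest


# Illustrative matrix with placeholder tickers, not real market data.
labels = ["X", "Y", "Z"]
pair_corr = pd.DataFrame(
    [[1.0, 0.8, 0.2], [0.8, 1.0, 0.3], [0.2, 0.3, 1.0]],
    index=labels, columns=labels,
)
strongest, weakest = rank_pairs(pair_corr)
```

When the matrix has fewer than `top_n` pairs, both tables simply contain every pair, sorted in opposite directions.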
- Sub-Skill D: Realized Correlation
- Goal: Show how correlation changes over time and under different market conditions.
- D1: Rolling correlation
```python
import yfinance as yf
import pandas as pd
import numpy as np

def realized_correlation(ticker_a, ticker_b, period="2y", windows=(20, 60, 120)):
    data = yf.download([ticker_a, ticker_b], period=period, auto_adjust=True, progress=False)
    closes = data["Close"][[ticker_a, ticker_b]].dropna()
    returns = np.log(closes / closes.shift(1)).dropna()

    # One rolling correlation series per window length, keyed like "20d"
    rolling = {}
    for w in windows:
        rolling[f"{w}d"] = returns[ticker_a].rolling(w).corr(returns[ticker_b])
    return rolling, returns
```
- D2: Regime-conditional correlation
```python
def regime_correlation(returns, ticker_a, ticker_b, condition_ticker=None):
    """Compare correlation across up/down/volatile regimes."""
    if condition_ticker is None:
        condition_ticker = ticker_a
    ret = returns[condition_ticker]

    regimes = {
        "All Days": pd.Series(True, index=returns.index),
        "Up Days (target > 0)": ret > 0,
        "Down Days (target < 0)": ret < 0,
        "High Vol (top 25%)": ret.abs() > ret.abs().quantile(0.75),
        "Low Vol (bottom 25%)": ret.abs() < ret.abs().quantile(0.25),
        "Large Drawdown (< -2%)": ret < -0.02,
    }

    results = {}
    for name, mask in regimes.items():
        subset = returns[mask]
        # Skip regimes with too few days for a meaningful estimate
        if len(subset) >= 20:
            results[name] = {
                "correlation": round(subset[ticker_a].corr(subset[ticker_b]), 4),
                "days": int(mask.sum()),
            }
    return results
```
- D3: Present results
- Rolling correlation summary table:

| Window | Current | Mean | Min | Max | Std |
|---|---|---|---|---|---|
| 20-day | 0.88 | 0.76 | 0.32 | 0.95 | 0.12 |
| 60-day | 0.82 | 0.78 | 0.55 | 0.92 | 0.08 |
| 120-day | 0.80 | 0.79 | 0.68 | 0.88 | 0.05 |
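A summary table with these columns can be built from the dictionary of rolling series that D1 returns. The helper below is a sketch; `summarize_rolling` is a hypothetical name, and the tiny input series is made up to keep the arithmetic easy to follow.

```python
import numpy as np
import pandas as pd


def summarize_rolling(rolling):
    """Build one summary row per window from {label: rolling correlation Series}."""
    rows = []
    for label, series in rolling.items():
        valid = series.dropna()  # the first (window - 1) values are NaN
        if valid.empty:
            continue
        rows.append({
            "Window": label,
            "Current": valid.iloc[-1],
            "Mean": valid.mean(),
            "Min": valid.min(),
            "Max": valid.max(),
            "Std": valid.std(),
        })
    return pd.DataFrame(rows).round(2)


# Illustrative input, not real market data.
summary = summarize_rolling({"20d": pd.Series([np.nan, 0.5, 0.7, 0.9])})
```

Windows that are longer than the available history produce an all-NaN series and are skipped rather than reported with empty statistics.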
- Regime correlation table:

| Regime | Correlation | Days |
|---|---|---|
| All Days | 0.82 | 250 |
| Up Days | 0.75 | 132 |
| Down Days | 0.87 | 118 |
| High Vol (top 25%) | 0.90 | 63 |
| Large Drawdown (< -2%) | 0.93 | 28 |
- Key insight: Highlight whether correlation increases during sell-offs (very common: "correlations go to 1 in a crisis"). This is critical for risk management.
- Trend: Is correlation trending higher or lower recently vs. its historical average?
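Both checks reduce to simple differences. The sketch below assumes the regime names produced by `regime_correlation` and a single rolling correlation series; the helper names and the 20-observation "recent" window are assumptions, and the example numbers are the illustrative values from the regime table.

```python
import pandas as pd


def sell_off_uplift(regime_results, base="All Days", stress="Large Drawdown (< -2%)"):
    """Stress-regime correlation minus all-days correlation, or None if a regime is missing."""
    if base not in regime_results or stress not in regime_results:
        return None  # the regime had fewer than 20 days and was skipped
    return regime_results[stress]["correlation"] - regime_results[base]["correlation"]


def recent_vs_history(rolling_series, recent=20):
    """Mean of the last `recent` rolling values minus the full-history mean."""
    valid = rolling_series.dropna()
    return valid.tail(recent).mean() - valid.mean()


uplift = sell_off_uplift({
    "All Days": {"correlation": 0.82, "days": 250},
    "Large Drawdown (< -2%)": {"correlation": 0.93, "days": 28},
})
trend = recent_vs_history(pd.Series([0.5, 0.5, 0.9, 0.9]), recent=2)
```

A positive uplift means correlation rose on large-drawdown days, and a positive trend means recent correlation sits above its historical average.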
- Step 3: Respond to the User
- After running the appropriate sub-skill, present results clearly:
- Always include
- The lookback period and data interval used
- The number of observations (trading days)
- Any tickers dropped due to insufficient data
- Always caveat
- Correlation is not causation: co-movement does not imply a causal link
- Past correlation does not guarantee future correlation: regimes shift
- Short lookback windows produce noisy estimates; longer windows smooth but may miss regime changes
- Practical applications (mention when relevant)
- Sympathy plays: Stocks likely to follow a peer's earnings/news move
- Pair trading: High-correlation pairs where the spread has diverged from its mean
- Portfolio diversification: Finding low-correlation assets to reduce risk
- Hedging: Identifying inversely correlated instruments
- Sector rotation: Understanding which sectors move together
- Risk management: Correlation spikes during stress, so diversification may fail when needed most
- Important: Never recommend specific trades. Present data and let the user draw conclusions.
- Reference Files
- references/sector_universes.md: Dynamic peer universe construction using yfinance Screener API
- Read the reference file when you need to build a peer universe for a given ticker.