recommendation-engine

Installs: 65
Rank: #11634

Install

npx skills add https://github.com/secondsky/claude-skills --skill recommendation-engine

Recommendation Engine

Build recommendation systems for personalized content and product suggestions.

Recommendation Approaches

| Approach | How It Works | Pros | Cons |
|----------|--------------|------|------|
| Collaborative | User-item interactions | Discovers hidden patterns | Cold start |
| Content-based | Item features | Works for new items | Limited discovery |
| Hybrid | Combines both | Best of both | Complex |

Collaborative Filtering

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity


class CollaborativeFilter:
    def __init__(self):
        self.user_item_matrix = None
        self.user_similarity = None
        self.item_similarity = None

    def fit(self, user_item_matrix):
        # Keep the interactions in CSR form; recommend_for_user() needs them
        self.user_item_matrix = csr_matrix(user_item_matrix)
        # User-based similarity
        self.user_similarity = cosine_similarity(self.user_item_matrix)
        # Item-based similarity
        self.item_similarity = cosine_similarity(self.user_item_matrix.T)

    def recommend_for_user(self, user_id, n=10):
        # Score each item by the similarity-weighted interactions of all users
        scores = self.user_item_matrix.T.dot(self.user_similarity[user_id])
        # Exclude already interacted items (column indices of the user's row)
        already_interacted = self.user_item_matrix[user_id].nonzero()[1]
        scores[already_interacted] = -np.inf
        return np.argsort(scores)[-n:][::-1]
```
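
The class above computes an item similarity matrix but only uses the user-based one when recommending. As a rough illustration of the item-based variant, the sketch below scores unseen items by their similarity to what a user has already interacted with. It uses a small dense NumPy matrix and a hand-rolled cosine similarity; the function names and the toy data are assumptions made for illustration, not part of the skill's reference code.

```python
import numpy as np


def item_cosine_similarity(user_item: np.ndarray) -> np.ndarray:
    """Cosine similarity between the item columns of a dense user-item matrix."""
    norms = np.linalg.norm(user_item, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items with no interactions
    normalized = user_item / norms
    return normalized.T @ normalized


def recommend_item_based(user_item: np.ndarray, user_id: int, n: int = 2) -> list:
    """Rank unseen items by similarity to the items the user interacted with."""
    sim = item_cosine_similarity(user_item)
    scores = (user_item[user_id] @ sim).astype(float)
    scores[user_item[user_id] > 0] = -np.inf  # exclude already interacted items
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked[:n] if np.isfinite(scores[i])]


# Toy data (hypothetical): 3 users x 4 items
toy = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 5.0, 0.0],
    [0.0, 0.0, 4.0, 5.0],
])
print(recommend_item_based(toy, user_id=0))
```

Item-based scoring tends to be cheaper to serve when there are far more users than items, since the item-item matrix can be precomputed and reused across users.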
Matrix Factorization (SVD)
```python
import numpy as np
from sklearn.decomposition import TruncatedSVD


class MatrixFactorization:
    def __init__(self, n_factors=50):
        self.svd = TruncatedSVD(n_components=n_factors)

    def fit(self, user_item_matrix):
        # Rows of user_factors are users; rows of item_factors are items
        self.user_factors = self.svd.fit_transform(user_item_matrix)
        self.item_factors = self.svd.components_.T

    def predict(self, user_id, item_id):
        # Predicted affinity is the dot product of the two latent vectors
        return np.dot(self.user_factors[user_id], self.item_factors[item_id])
```
Hybrid Recommender
```python
import numpy as np


class HybridRecommender:
    def __init__(self, collab_weight=0.7, content_weight=0.3):
        # ContentBasedFilter is not defined in this document, and both models
        # are expected to expose score(user_id) returning one score per item
        self.collab = CollaborativeFilter()
        self.content = ContentBasedFilter()
        self.weights = (collab_weight, content_weight)

    def recommend(self, user_id, n=10):
        collab_scores = self.collab.score(user_id)
        content_scores = self.content.score(user_id)
        # Weighted blend of the two score vectors
        combined = self.weights[0] * collab_scores + self.weights[1] * content_scores
        return np.argsort(combined)[-n:][::-1]
```
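
The hybrid recommender refers to a ContentBasedFilter that this document does not define. A minimal sketch of what such a class might look like follows, assuming a dense item-feature matrix and a score(user_id) method that returns one score per item. The fit signature, the profile construction, and the toy data are all assumptions for illustration.

```python
import numpy as np


class ContentBasedFilter:
    """Hypothetical content-based scorer: compares a user profile to item features."""

    def fit(self, user_item: np.ndarray, item_features: np.ndarray) -> "ContentBasedFilter":
        self.user_item = np.asarray(user_item, dtype=float)
        feats = np.asarray(item_features, dtype=float)
        norms = np.linalg.norm(feats, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # items without features keep an all-zero row
        self.item_features = feats / norms  # unit-length rows
        return self

    def score(self, user_id: int) -> np.ndarray:
        # User profile: interaction-weighted sum of the features of seen items
        profile = self.user_item[user_id] @ self.item_features
        norm = np.linalg.norm(profile)
        if norm == 0:
            return np.zeros(self.item_features.shape[0])  # no history: no signal
        return self.item_features @ (profile / norm)  # cosine similarity per item


# Toy data (hypothetical): 2 users x 3 items, 2 features per item
interactions = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
cbf = ContentBasedFilter().fit(interactions, features)
print(cbf.score(0))  # items whose features resemble item 0 score highest
print(cbf.score(1))  # a user with no history gets all zeros
```

Because the scores depend only on item features, a brand-new item can be scored as soon as its features are known, which is the "works for new items" advantage of the content-based approach.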
Evaluation Metrics
Precision@K, Recall@K
NDCG (ranking quality)
Coverage (catalog diversity)
A/B test conversion rate
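
The offline metrics in this list can be sketched in a few lines of plain Python. The function names below mirror the ones used elsewhere in this document, but the bodies are an assumed implementation, and the NDCG variant uses binary relevance (an item is either relevant or not), which is a simplification.

```python
import math


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k slots filled with relevant items."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in relevant) / k


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)


def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance NDCG: discounted gain relative to the ideal ordering."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


recs = [10, 7, 25, 3, 30]   # hypothetical ranked recommendations
truth = {10, 25, 30}        # hypothetical held-out relevant items
print(precision_at_k(recs, truth, k=5))
print(recall_at_k(recs, truth, k=5))
print(ndcg_at_k(recs, truth, k=5))
```

Precision and recall ignore order within the top-k, while NDCG drops when relevant items sit lower in the list, so the three are usually reported together.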
Cold Start Solutions
New users: Popular items, onboarding preferences, demographic-based
New items: Content-based bootstrapping, active learning
Exploration strategies: ε-greedy, Thompson sampling bandits

Quick Start: Build a Recommender in 5 Steps

```python
from scipy.sparse import csr_matrix
import numpy as np

# 1. Prepare user-item interaction matrix
# rows = users, cols = items, values = ratings/interactions
ratings_data = [(0, 5, 5), (0, 10, 4), (1, 5, 3), ...]  # (user, item, rating)
n_users, n_items = 1000, 5000
row_idx = [r[0] for r in ratings_data]
col_idx = [r[1] for r in ratings_data]
ratings = [r[2] for r in ratings_data]
user_item_matrix = csr_matrix((ratings, (row_idx, col_idx)), shape=(n_users, n_items))

# 2. Choose and train model
from recommendation_engine import ItemBasedCollaborativeFilter  # See references
model = ItemBasedCollaborativeFilter(similarity_metric='cosine', k_neighbors=20)
model.fit(user_item_matrix)

# 3. Generate recommendations
recommendations = model.recommend(user_id=42, n=10)
print(recommendations)  # [(item_id, score), ...]

# 4. Evaluate on test set
from evaluation_metrics import precision_at_k, recall_at_k
test_items = {42: {10, 25, 30}}  # True relevant items for user 42
rec_items = [item for item, score in recommendations]
precision = precision_at_k(rec_items, test_items[42], k=10)
recall = recall_at_k(rec_items, test_items[42], k=10)
print(f"Precision@10: {precision:.3f}, Recall@10: {recall:.3f}")

# 5. Handle cold start
from cold_start import PopularityRecommender
popularity_model = PopularityRecommender()
popularity_model.fit(interactions_with_timestamps)
new_user_recs = popularity_model.recommend(n=10)
```
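
Step 5 relies on a PopularityRecommender from the reference files, which are not shown here. As a rough idea of what a timestamp-aware popularity model could do, the sketch below counts interactions with exponential time decay so that recent activity weighs more. The (user, item, timestamp in days) tuple layout, the half_life_days parameter, and the decay scheme are assumptions and may differ from the reference implementation.

```python
from collections import defaultdict


class PopularityRecommender:
    """Hypothetical popularity model with exponential time decay."""

    def __init__(self, half_life_days: float = 30.0):
        self.half_life_days = half_life_days
        self.ranked_items = []

    def fit(self, interactions, now=None):
        # interactions: iterable of (user_id, item_id, timestamp_in_days)
        interactions = list(interactions)
        if now is None:
            now = max((t for _, _, t in interactions), default=0.0)
        scores = defaultdict(float)
        for _, item, t in interactions:
            age = max(now - t, 0.0)
            scores[item] += 0.5 ** (age / self.half_life_days)  # halves every half-life
        # Sort by decayed score, breaking ties by item id for determinism
        self.ranked_items = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return self

    def recommend(self, n=10):
        return self.ranked_items[:n]  # [(item_id, score), ...]


# Hypothetical events: item "a" was popular 60 days ago, "c" is popular now
events = [(1, "a", 0.0), (2, "a", 0.0), (3, "b", 60.0), (4, "c", 60.0), (5, "c", 30.0)]
popularity_demo = PopularityRecommender(half_life_days=30.0).fit(events)
print(popularity_demo.recommend(n=2))
```

Without decay, item "a" and item "c" would tie on raw counts; with a 30-day half-life the older interactions on "a" count for a quarter each, so it falls to the bottom.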
Known Issues Prevention
1. Popularity Bias
Problem
Recommending only popular items, ignoring long tail. Reduces diversity and serendipity.
Solution
Balance popularity with personalization, and re-rank for diversity:

```python
from typing import List, Tuple

import numpy as np
from sklearn.metrics.pairwise import cosine_distances


def diversify_recommendations(
    recommendations: List[Tuple[int, float]],
    item_features: np.ndarray,
    diversity_weight: float = 0.3,
) -> List[Tuple[int, float]]:
    """Re-rank to increase diversity while maintaining relevance."""
    selected = []
    candidates = recommendations.copy()
    while len(selected) < len(recommendations) and candidates:
        if not selected:
            # First item: highest score (assumes input is sorted by score, descending)
            selected.append(candidates.pop(0))
            continue
        # Compute diversity scores
        selected_features = item_features[[item for item, _ in selected]]
        diversity_scores = []
        for item, relevance in candidates:
            item_feature = item_features[item].reshape(1, -1)
            # Average distance to already selected items
            avg_distance = cosine_distances(item_feature, selected_features).mean()
            # Combined score: relevance + diversity
            combined = (1 - diversity_weight) * relevance + diversity_weight * avg_distance
            diversity_scores.append((item, relevance, combined))
        # Select item with best combined score
        best = max(diversity_scores, key=lambda x: x[2])
        selected.append((best[0], best[1]))
        candidates = [(i, s) for i, s, _ in diversity_scores if i != best[0]]
    return selected
```
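
To tell whether popularity bias is actually present, it helps to measure how much of the catalog the recommender surfaces. The sketch below computes catalog coverage and the share of recommendation slots taken by the most-recommended items. Both function names and the toy lists are assumptions for illustration.

```python
from collections import Counter


def catalog_coverage(recommendation_lists, n_items):
    """Share of the catalog that appears in at least one recommendation list."""
    recommended = {item for recs in recommendation_lists for item in recs}
    return len(recommended) / n_items


def top_item_share(recommendation_lists, top_n=1):
    """Share of all recommendation slots taken by the top_n most recommended items."""
    counts = Counter(item for recs in recommendation_lists for item in recs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(top_n)) / total


# Hypothetical output for three users over a 10-item catalog
lists = [[0, 1, 2], [0, 1, 3], [0, 2, 4]]
print(catalog_coverage(lists, n_items=10))
print(top_item_share(lists, top_n=1))
```

Low coverage combined with a high top-item share suggests the long tail is being ignored; comparing both numbers before and after re-ranking shows whether the diversity weight is having an effect.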
2. Data Sparsity (Matrix >99% Empty)
Problem
Collaborative filtering fails when most users have rated <1% of items.
Solution
Use matrix factorization (SVD, ALS) instead of memory-based CF:

```python
# ❌ Bad: User-based CF on sparse data (fails to find similar users)
user_cf = UserBasedCollaborativeFilter()
user_cf.fit(sparse_matrix)  # Most users have <10 ratings

# ✅ Good: Matrix factorization handles sparsity
from sklearn.decomposition import TruncatedSVD
svd = TruncatedSVD(n_components=50)
user_factors = svd.fit_transform(sparse_matrix)
item_factors = svd.components_.T
# Predict rating: user_factors[u] @ item_factors[i]
```

3. Cold Start Without Fallback
Problem
Recommender crashes or returns empty results for new users/items.
Solution
Always implement a fallback chain:

```python
def recommend_with_fallback(user_id, n=10):
    """Graceful degradation through fallback chain."""
    try:
        # Try personalized recommendations
        if has_sufficient_history(user_id, min_interactions=5):
            return collaborative_filter.recommend(user_id, n)
    except Exception as e:
        logger.warning(f"CF failed for user {user_id}: {e}")
    # Fallback 1: Demographic-based
    if user_demographics_available(user_id):
        return demographic_recommender.recommend(user_id, n)
    # Fallback 2: Popularity
    return popularity_recommender.recommend(n)
```
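
The fallback function above depends on helpers that are not defined in this document. A self-contained way to try out the same idea is a generic chain that walks an ordered list of recommenders and returns the first non-empty result. The chain function, the tier names, and the stub recommenders below are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def recommend_with_chain(user_id, recommenders, n=10):
    """Try each (name, callable) in order; return the first non-empty result."""
    for name, recommend in recommenders:
        try:
            recs = recommend(user_id, n)
        except Exception as exc:  # a failing tier must not break the request
            logger.warning("%s failed for user %s: %s", name, user_id, exc)
            continue
        if recs:
            return name, recs
    return "none", []


# Hypothetical tiers: enough history for one known user, popularity for everyone
history = {1: [101, 102, 103, 104, 105]}


def collaborative(user_id, n):
    if len(history.get(user_id, [])) < 5:
        raise LookupError("not enough history")
    return [201, 202][:n]


def popularity(user_id, n):
    return [1, 2, 3][:n]


chain = [("collaborative", collaborative), ("popularity", popularity)]
print(recommend_with_chain(1, chain, n=2))   # known user
print(recommend_with_chain(99, chain, n=2))  # new user falls through to popularity
```

Returning the name of the tier that answered makes it easy to log how often each fallback is hit, which is a useful signal for how severe the cold-start problem is in practice.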
4. Not Excluding Already-Interacted Items
Problem
Recommending items user already purchased/viewed wastes recommendation slots.
Solution
Always filter interacted items:

```python
# ✅ Correct: Exclude interacted items
user_items = user_item_matrix[user_id].nonzero()[1]
scores[user_items] = -np.inf  # Ensure they don't appear in top-K
recommendations = np.argsort(scores)[-n:][::-1]

# ❌ Wrong: Forgetting to filter
recommendations = np.argsort(scores)[-n:][::-1]  # May include already purchased!
```
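
One edge case the snippet above does not cover: when a user has fewer than n unseen items left, the masked entries still occupy the tail of the top-n slice. A small sketch that drops them follows; the helper name and the toy arrays are assumptions for illustration.

```python
import numpy as np


def top_n_unseen(scores: np.ndarray, seen: np.ndarray, n: int = 10) -> list:
    """Return up to n item indices with the highest scores, skipping seen items."""
    masked = np.asarray(scores, dtype=float).copy()  # leave the caller's array intact
    masked[seen] = -np.inf
    order = np.argsort(masked)[::-1][:n]
    # Drop masked entries that leak in when fewer than n unseen items remain
    return [int(i) for i in order if np.isfinite(masked[i])]


item_scores = np.array([0.9, 0.1, 0.8, 0.4])
seen_items = np.array([0, 2])  # items the user already interacted with
print(top_n_unseen(item_scores, seen_items, n=3))
```

Here only two unseen items exist, so the function returns two indices instead of padding the list with an already-seen item.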

5. Ignoring Implicit Feedback Confidence
Problem
Treating all clicks/views equally. 1 view ≠ 100 views.
Solution
Weight by interaction strength (view count, watch time, etc.):

```python
# For implicit feedback, use confidence weighting
confidence_matrix = 1 + alpha * np.log(1 + interaction_counts)
# In ALS: C_ui * (P_ui - X_ui)²
# Higher confidence for items with more interactions
```
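
To make the confidence formula concrete, the sketch below evaluates it on a tiny count matrix and plugs it into the weighted squared error named in the comment above. The value of alpha, the binary preference definition, and the toy counts are assumptions; alpha in particular should be tuned on validation data.

```python
import numpy as np

alpha = 40.0  # assumed scaling factor, not a recommended default
interaction_counts = np.array([
    [0, 1, 100],
    [3, 0, 0],
])

# Binary preference P_ui: 1 if the user interacted with the item at all
preference = (interaction_counts > 0).astype(float)
# Confidence C_ui grows with the log of the interaction count
confidence = 1.0 + alpha * np.log1p(interaction_counts)


def weighted_squared_error(predicted: np.ndarray) -> float:
    """Sum of C_ui * (P_ui - X_ui)^2 over all user-item pairs."""
    return float(np.sum(confidence * (preference - predicted) ** 2))


print(confidence.round(1))
print(weighted_squared_error(np.zeros_like(preference)))
```

The log keeps heavy users from dominating: 100 views earn more confidence than 1 view, but not 100 times more. Unobserved pairs keep a confidence of 1, so they still count as weak negatives.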

6. Not Evaluating Ranking Quality (Using Only Accuracy)
Problem
Low rating-prediction error (RMSE) doesn't guarantee good top-K recommendations.
Solution
Use ranking metrics (NDCG, MAP@K):

```python
# ❌ Bad: Only RMSE
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# ✅ Good: Ranking metrics for top-K evaluation
from evaluation_metrics import ndcg_at_k, mean_average_precision_at_k
# NDCG rewards putting highly relevant items first
ndcg = ndcg_at_k(recommendations, relevance_scores, k=10)
# MAP@K considers precision at each relevant item position
map_score = mean_average_precision_at_k(all_recommendations, ground_truth, k=10)
```
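
MAP@K is imported from a reference module that is not shown in this document. One common way to define it is sketched below in plain Python, assuming all_recommendations maps each user to a ranked list and ground_truth maps each user to a set of relevant items. Those input shapes and the normalization by min(len(relevant), k) are assumptions; the reference implementation may differ.

```python
def average_precision_at_k(recommended, relevant, k=10):
    """Mean of the precision values at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at_k(all_recommendations, ground_truth, k=10):
    """Average AP@K over users that have at least one ground-truth item."""
    users = [u for u in all_recommendations if ground_truth.get(u)]
    if not users:
        return 0.0
    total = sum(average_precision_at_k(all_recommendations[u], ground_truth[u], k)
                for u in users)
    return total / len(users)


# Hypothetical data: user 1 has hits at ranks 1 and 3, user 2 only at rank 3
all_recs = {1: [10, 20, 30], 2: [40, 50, 60]}
relevant_items = {1: {10, 30}, 2: {60}}
print(mean_average_precision_at_k(all_recs, relevant_items, k=3))
```

User 2 gets a low score even though the single relevant item was recommended, because it appears last; this sensitivity to position is what RMSE does not capture.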
7. Filter Bubble (Lack of Exploration)
Problem
Always recommending similar items limits discovery, reduces user engagement over time.
Solution
Implement an explore-exploit strategy:

```python
class ExploreExploitRecommender:
    def __init__(self, base_model, epsilon=0.1):
        self.base_model = base_model
        self.epsilon = epsilon  # 10% exploration

    def recommend(self, user_id, n=10):
        # Exploit: Use trained model for most recommendations
        n_exploit = int(n * (1 - self.epsilon))
        exploitative_recs = self.base_model.recommend(user_id, n=n_exploit)
        # Explore: Add random diverse items
        n_explore = n - n_exploit
        explored_items = sample_diverse_items(n_explore)
        return exploitative_recs + explored_items
```
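
The ε-greedy recommender above explores at a fixed rate regardless of what has been learned. Thompson sampling, the other strategy this document names, lets exploration shrink as evidence accumulates. A minimal Beta-Bernoulli sketch follows, using only the standard library; the class name, the click simulation, and the true click rates are hypothetical.

```python
import random


class ThompsonSamplingBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of items."""

    def __init__(self, item_ids, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior: one pseudo-success and one pseudo-failure per item
        self.successes = {item: 1 for item in item_ids}
        self.failures = {item: 1 for item in item_ids}

    def select(self):
        # Sample a plausible click rate per item and pick the best sample
        samples = {
            item: self.rng.betavariate(self.successes[item], self.failures[item])
            for item in self.successes
        }
        return max(samples, key=samples.get)

    def update(self, item, clicked):
        if clicked:
            self.successes[item] += 1
        else:
            self.failures[item] += 1


# Simulation with hypothetical true click rates
true_rates = {"a": 0.05, "b": 0.30, "c": 0.10}
bandit = ThompsonSamplingBandit(true_rates, seed=0)
env = random.Random(1)
picks = {item: 0 for item in true_rates}
for _ in range(2000):
    item = bandit.select()
    picks[item] += 1
    bandit.update(item, env.random() < true_rates[item])
print(picks)
```

Items with little data have wide posteriors and still get sampled occasionally, while clearly worse items are shown less and less, so no epsilon needs to be hand-tuned.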
When to Load References
Load reference files when you need detailed implementations:
Collaborative Filtering
Load references/collaborative-filtering-deep-dive.md for complete user-based and item-based CF implementations with similarity metrics (cosine, Pearson, Jaccard), scalability optimizations (sparse matrices, approximate nearest neighbors), and edge-case handling (cold start, sparsity)
Matrix Factorization
Load references/matrix-factorization-methods.md for SVD, ALS, and NMF implementations with hyperparameter tuning, implicit feedback handling, and advanced techniques (BPR, WARP)
Evaluation Metrics
Load references/evaluation-metrics-implementation.md for Precision@K, Recall@K, NDCG, coverage, diversity metrics, cross-validation strategies, and statistical significance testing (paired t-test, bootstrap confidence intervals)
Cold Start Solutions
Load references/cold-start-strategies.md for new user/item strategies (popularity-based, onboarding, demographic, content-based bootstrapping, active learning), explore-exploit approaches (ε-greedy, Thompson sampling), and hybrid fallback chains