© 2024 Felix Ng

Building a Daily Tech Trends Aggregator with Python
February 5, 2026 · Journal · 3 min read

The Problem

Staying on top of tech news is exhausting. Between Hacker News, Reddit, GitHub, and Product Hunt, I was spending 30+ minutes each morning just skimming headlines. Worse, I had no systematic way to identify which stories had true "viral potential" versus which were just noise.

The Solution

I built a Python-based aggregator that:

  1. Collects trending posts from multiple sources
  2. Scores each post for viral potential
  3. Categorizes by topic (AI/ML, Security, Product Launch, etc.)
  4. Delivers a daily report at 9 AM
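
The four steps above can be sketched as a small pipeline. This is an illustrative outline, not the actual implementation — the scoring and categorization stubs below are hypothetical stand-ins:

```python
def score_post(post):
    # Stand-in for the real viral-scoring function (comments weighted 3x)
    return post.get("points", 0) + 3 * post.get("comments", 0)

def categorize(post):
    # Naive keyword categorization (illustrative only)
    title = post.get("title", "").lower()
    if any(kw in title for kw in ("ai", "llm", "gpt")):
        return "AI/ML"
    if "cve" in title or "attack" in title:
        return "Security"
    return "General"

def build_daily_report(posts, top_n=10):
    """Score, rank, and categorize posts into a daily report."""
    ranked = sorted(posts, key=score_post, reverse=True)[:top_n]
    return [(categorize(p), score_post(p), p["title"]) for p in ranked]
```

The real version plugs in the full scoring function shown later and formats the result for delivery.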

Technical Architecture

Data Sources

Source            Method          Reliability
Hacker News       Firebase API    ✅ Excellent
GitHub Trending   Scraping        ✅ Good
Product Hunt      Crawl4AI        ⚠️ Moderate
Reddit            Pushshift API   ❌ Blocked on VPS
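
The Hacker News Firebase API needs no authentication, which is why it's the most reliable source. A minimal fetch might look like this (public v0 endpoints; error handling omitted):

```python
import json
import urllib.request

HN_API = "https://hacker-news.firebaseio.com/v0"

def item_url(item_id):
    """Build the URL for a single HN item."""
    return f"{HN_API}/item/{item_id}.json"

def fetch_top_stories(limit=10):
    """Fetch the current top stories (title, score, comment count, etc.)."""
    with urllib.request.urlopen(f"{HN_API}/topstories.json") as resp:
        ids = json.load(resp)[:limit]
    stories = []
    for story_id in ids:
        with urllib.request.urlopen(item_url(story_id)) as resp:
            stories.append(json.load(resp))
    return stories
```

Each item comes back as JSON with `score`, `descendants` (comment count), and `title` fields, which feed directly into the scorer.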

Viral Scoring Algorithm

I developed a weighted scoring system (0-100):

Viral Score = 
  Engagement (40%)      → Upvotes/Points normalized
  + Discussion (40%)    → Comments × 3 (weighted higher)
  + Relevance (20%)     → AI/Tech keywords matching
  + Controversy Bonus   → High comment/upvote ratio
  + Recency Bonus       → Posts < 6 hours old

The comment weighting is intentional—a post with 100 upvotes and 200 comments is often more interesting than one with 500 upvotes and 10 comments.
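
Plugging in a post with 177 points and 82 comments illustrates the weighting; the relevance term is an assumed value and the bonus terms are omitted:

```python
# Worked example of the weighting (bonuses omitted; numbers illustrative)
engagement_pts = min(40, 177 / 500 * 40)   # ≈ 14.16
discussion_pts = min(40, 82 / 100 * 40)    # = 32.8
relevance_pts  = 10                        # assume one keyword match
total = engagement_pts + discussion_pts + relevance_pts
print(round(total, 2))  # 56.96
```

The discussion term contributes more than twice the engagement term here, which is exactly the behavior the weighting is designed to produce.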

Key Libraries

  • Crawl4AI: Async web scraping with stealth mode
  • requests: Simple API calls for HN/GitHub
  • asyncio: Concurrent fetching
  • Cron: Scheduled execution at 9 AM daily
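
The scheduling piece is a single crontab entry; assuming the script lives at a path like `/home/felix/aggregator/main.py` (illustrative), it might be:

```shell
# Run the aggregator every day at 9:00 AM, appending output to a log
0 9 * * * /usr/bin/python3 /home/felix/aggregator/main.py >> /var/log/aggregator.log 2>&1
```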

Challenges Faced

1. Reddit Blocking

Reddit immediately blocked requests from my VPS IP. Solutions attempted:

  • User-Agent rotation ❌
  • Proxy consideration ⏸️ (cost tradeoff)
  • Pushshift API ✅ (limited data)
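
For reference, the User-Agent rotation attempt amounted to something like this (header strings abbreviated); it failed because Reddit blocks by IP, not just by header:

```python
import random

# A small pool of desktop browser User-Agent strings (abbreviated examples)
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:124.0) Gecko/20100101 Firefox/124.0",
]

def random_headers():
    """Pick a random User-Agent for each outgoing request."""
    return {"User-Agent": random.choice(USER_AGENTS)}
```

Each request then passes `headers=random_headers()`, which is enough for sites that only check the header but useless against IP-level blocking.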

2. Dynamic Content

Google Trends and Reddit load via JavaScript, making simple requests insufficient. Crawl4AI with headless browser solved this, but added latency.

3. Rate Limiting

Hacker News API is generous, but GitHub trending required careful request timing.
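
A simple way to space out the GitHub trending requests is a fixed delay between fetches. This is a sketch under that assumption — `fake_fetch` is a placeholder for the real HTTP call:

```python
import asyncio

async def fake_fetch(url):
    # Placeholder: a real version would issue an actual HTTP request
    return f"fetched {url}"

async def fetch_politely(urls, delay=2.0):
    """Fetch URLs one at a time, sleeping between requests to stay under rate limits."""
    results = []
    for url in urls:
        results.append(await fake_fetch(url))
        await asyncio.sleep(delay)
    return results
```

Sources with generous limits (like HN) can still be fetched concurrently; only the rate-sensitive ones need this serialized path.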

Sample Output

🔥 TOP 10 POSTS WITH THE HIGHEST VIRAL POTENTIAL

1. 🚀 [AI/ML] Notepad++ supply chain attack breakdown
   📰 Hacker News | 🔥 Viral Score: 58.5/100
   ⬆️ 177 | 💬 82
   
2. 📈 [Development] Xcode 26.3 – Coding agents in Xcode
   📰 Hacker News | 🔥 Viral Score: 51.5/100
   ⬆️ 238 | 💬 199

Results After 1 Week

  • Time saved: ~25 minutes/day
  • Stories analyzed: ~50/day across sources
  • Signal-to-noise ratio: Significantly improved
  • Missed stories: Near zero for major tech news

Future Improvements

  1. Sentiment Analysis: Auto-flag negative security coverage
  2. Email Integration: Deliver reports to inbox
  3. Slack Bot: Real-time alerts for high-viral stories
  4. Historical Tracking: Identify trending topic lifecycles

Code Snippet

The core scoring function:

def calculate_viral_score(item):
    score = 0
    
    # Engagement (40 points max): upvotes/points normalized against 500
    engagement = item.get('score', 0) + item.get('points', 0)
    score += min(40, (engagement / 500) * 40)
    
    # Discussion (40 points max): comments normalized against 100
    comments = item.get('comments', 0)
    score += min(40, (comments / 100) * 40)
    
    # Tech relevance keyword matching (20 points max)
    title = item.get('title', '')
    ai_keywords = ['ai', 'llm', 'gpt', 'machine learning']
    score += min(20, sum(10 for kw in ai_keywords if kw in title.lower()))
    
    return min(100, score)

Conclusion

Automation isn't just about saving time—it's about consistency. This aggregator ensures I never miss important tech news while filtering out the noise. The viral scoring adds a layer of intelligence that raw feeds can't provide.

Total build time: ~4 hours. ROI: break-even in less than a week.

Sometimes the best side projects solve your own daily frustrations.

Built with Python, Crawl4AI, and too much coffee.