Building a Daily Tech Trends Aggregator with Python
The Problem
Staying on top of tech news is exhausting. Between Hacker News, Reddit, GitHub, and Product Hunt, I was spending 30+ minutes each morning just skimming headlines. Worse, I had no systematic way to identify which stories had true "viral potential" versus which were just noise.
The Solution
I built a Python-based aggregator that:
- Collects trending posts from multiple sources
- Scores each post for viral potential
- Categorizes by topic (AI/ML, Security, Product Launch, etc.)
- Delivers a daily report at 9 AM
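Conceptually the whole thing is just collect → score → categorize → report. A minimal sketch of that pipeline (the three callables stand in for the components described below; all names here are mine, not the project's):

```python
def run_daily_report(sources, score, categorize, top_n=10):
    """Collect -> score -> categorize -> report (sketch).

    `sources` is a list of zero-argument fetchers, each returning a list
    of post dicts; `score` and `categorize` are the scoring and tagging
    functions. Names and signatures are illustrative.
    """
    posts = [post for fetch in sources for post in fetch()]
    for post in posts:
        post["viral_score"] = score(post)
        post["category"] = categorize(post)
    # Report the top-N posts by viral score
    return sorted(posts, key=lambda p: p["viral_score"], reverse=True)[:top_n]
```

Everything below is a concrete version of one of those four stages.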
Technical Architecture
Data Sources
| Source | Method | Reliability |
|---|---|---|
| Hacker News | Firebase API | ✅ Excellent |
| GitHub Trending | Scraping | ✅ Good |
| Product Hunt | Crawl4AI | ⚠️ Moderate |
| Reddit | Pushshift API | ⚠️ Limited (direct requests blocked on VPS) |
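Of these, Hacker News is by far the easiest to consume: the official Firebase API exposes `topstories.json` plus a per-item endpoint, no auth required. A sketch of the fetch-and-normalize step (the right-hand field names are the HN API's; the left-hand ones are my own convention):

```python
import requests

HN_API = "https://hacker-news.firebaseio.com/v0"

def parse_item(raw):
    """Normalize a raw HN item into the fields the aggregator uses."""
    return {
        "title": raw.get("title", ""),
        "points": raw.get("score", 0),
        "comments": raw.get("descendants", 0),
        "source": "Hacker News",
    }

def fetch_top_stories(limit=10):
    """Fetch and normalize the current top `limit` front-page stories."""
    ids = requests.get(f"{HN_API}/topstories.json", timeout=10).json()[:limit]
    return [
        parse_item(requests.get(f"{HN_API}/item/{i}.json", timeout=10).json())
        for i in ids
    ]
```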
Viral Scoring Algorithm
I developed a weighted scoring system (0-100):
```
Viral Score =
    Engagement (40%)     → upvotes/points, normalized
  + Discussion (40%)     → comments × 3 (weighted higher)
  + Relevance (20%)      → AI/tech keyword matches
  + Controversy bonus    → high comment/upvote ratio
  + Recency bonus        → posts < 6 hours old
```
The comment weighting is intentional—a post with 100 upvotes and 200 comments is often more interesting than one with 500 upvotes and 10 comments.
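With the weights above (and assuming engagement saturates at 500 points and discussion at 100 comments), that intuition holds numerically:

```python
def partial_score(points, comments):
    # Engagement and discussion components only (40 pts each); the
    # saturation constants (500 points, 100 comments) are illustrative
    engagement = min(40, points / 500 * 40)
    discussion = min(40, comments / 100 * 40)
    return engagement + discussion

hot_discussion = partial_score(100, 200)  # 8 + 40 (capped) = 48
quiet_upvotes = partial_score(500, 10)    # 40 + 4 = 44
```

The comment-heavy post wins despite 5× fewer upvotes, because discussion saturates its 40-point bucket while its engagement barely registers.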
Key Libraries & Tools
- Crawl4AI: Async web scraping with stealth mode
- requests: Simple API calls for HN/GitHub
- asyncio: Concurrent fetching
- cron: Scheduled execution at 9 AM daily (system scheduler, not a Python library)
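The concurrency win from asyncio is simple: every source is fetched in parallel, so a run takes about as long as the slowest source rather than the sum of all of them. A stdlib-only sketch, with stub fetchers standing in for the real API and scraper calls:

```python
import asyncio

async def fetch_source(name, delay):
    # Stand-in for a real fetcher (HN API call, Crawl4AI crawl, ...)
    await asyncio.sleep(delay)
    return {"source": name, "items": []}

async def collect_all():
    """Fetch all sources concurrently; wall time ~= the slowest fetch."""
    results = await asyncio.gather(
        fetch_source("Hacker News", 0.10),
        fetch_source("GitHub Trending", 0.20),
        fetch_source("Product Hunt", 0.15),
    )
    return {r["source"]: r["items"] for r in results}

sources = asyncio.run(collect_all())
```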
Challenges Faced
1. Reddit Blocking
Reddit immediately blocked requests from my VPS IP. Solutions attempted:
- User-Agent rotation ❌
- Proxy consideration ⏸️ (cost tradeoff)
- Pushshift API ✅ (limited data)
2. Dynamic Content
Google Trends and Reddit load via JavaScript, making simple requests insufficient. Crawl4AI with headless browser solved this, but added latency.
3. Rate Limiting
Hacker News API is generous, but GitHub trending required careful request timing.
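For GitHub, "careful request timing" boiled down to retry-with-exponential-backoff. A generic sketch (the retry count and delays are illustrative, not the exact values I run; `fetch` is any callable returning an HTTP status and body):

```python
import time

def with_backoff(fetch, retries=4, base_delay=2.0, sleep=time.sleep):
    """Retry `fetch` with exponential backoff while it reports HTTP 429.

    Delays double each attempt (2s, 4s, 8s, ...); `sleep` is injectable
    so the policy can be tested without actually waiting.
    """
    for attempt in range(retries):
        status, body = fetch()
        if status != 429:
            return status, body
        sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"still rate limited after {retries} attempts")
```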
Sample Output
```
🔥 TOP 10 POSTS WITH THE HIGHEST VIRAL POTENTIAL

1. 🚀 [AI/ML] Notepad++ supply chain attack breakdown
   📰 Hacker News | 🔥 Viral Score: 58.5/100
   ⬆️ 177 | 💬 82

2. 📈 [Development] Xcode 26.3 – Coding agents in Xcode
   📰 Hacker News | 🔥 Viral Score: 51.5/100
   ⬆️ 238 | 💬 199
```
Results After 1 Week
- Time saved: ~25 minutes/day
- Stories analyzed: ~50/day across sources
- Signal-to-noise ratio: Significantly improved
- Missed stories: Near zero for major tech news
Future Improvements
- Sentiment Analysis: Auto-flag negative security coverage
- Email Integration: Deliver reports to inbox
- Slack Bot: Real-time alerts for high-viral stories
- Historical Tracking: Identify trending topic lifecycles
Code Snippet
The core scoring function:
```python
def calculate_viral_score(item):
    """Score a post 0-100 for viral potential."""
    score = 0
    # Engagement (40 points max), saturating at 500 upvotes/points
    engagement = item.get('score', 0) + item.get('points', 0)
    score += min(40, (engagement / 500) * 40)
    # Discussion (40 points max) — comments weighted more heavily,
    # saturating at just 100 comments
    comments = item.get('comments', 0)
    score += min(40, (comments / 100) * 40)
    # Tech relevance keyword matching (20 points max)
    title = item.get('title', '').lower()
    ai_keywords = ['ai', 'llm', 'gpt', 'machine learning']
    score += min(20, sum(10 for kw in ai_keywords if kw in title))
    return min(100, score)
```
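The core function covers the three weighted components; the two bonuses from the formula are applied on top. A sketch of that step (the +5 bonus values and the ratio threshold are my assumptions here — the formula only says "high comment/upvote ratio" and "< 6 hours"):

```python
from datetime import datetime, timedelta, timezone

def apply_bonuses(score, item, now=None):
    """Add the controversy and recency bonuses on top of the base score."""
    now = now or datetime.now(timezone.utc)
    points = max(1, item.get('score', 0) + item.get('points', 0))
    # Controversy bonus: more comments than upvotes signals heated debate
    if item.get('comments', 0) / points > 1.0:
        score += 5
    # Recency bonus: posts less than 6 hours old
    posted = item.get('created_at')
    if posted and now - posted < timedelta(hours=6):
        score += 5
    return min(100, score)
```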
Conclusion
Automation isn't just about saving time—it's about consistency. This aggregator ensures I never miss important tech news while filtering out the noise. The viral scoring adds a layer of intelligence that raw feeds can't provide.
Total build time: ~4 hours. ROI: break-even in less than a week.
Sometimes the best side projects solve your own daily frustrations.
Built with Python, Crawl4AI, and too much coffee.
