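9 replies
Thanks
When I organize PyCon videos I usually write a small Python script that does three things: pull a summary from the transcript, compute a score, and store the result in a database.
First install a couple of libraries (sqlite3 ships with Python's standard library, so it does not need to be installed with pip):
pip install youtube-transcript-api textblob
The core code looks like this: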
import sqlite3

from youtube_transcript_api import YouTubeTranscriptApi
from textblob import TextBlob


class PyConVideoOrganizer:
    def __init__(self, db_path='pycon_videos.db'):
        self.conn = sqlite3.connect(db_path)
        self.create_table()

    def create_table(self):
        cursor = self.conn.cursor()
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS videos (
                id TEXT PRIMARY KEY,
                title TEXT,
                summary TEXT,
                sentiment_score REAL,
                duration INTEGER,
                year INTEGER
            )
        ''')
        self.conn.commit()

    def get_transcript(self, video_id):
        """Fetch the video's transcript and return (summary, sentiment score)."""
        try:
            # get_transcript is the classmethod from older releases of
            # youtube-transcript-api; newer releases changed this interface,
            # so check the version you have installed.
            transcript = YouTubeTranscriptApi.get_transcript(video_id)
            full_text = ' '.join(entry['text'] for entry in transcript)
            # Naive summary: the first 3 and the last 2 sentences
            sentences = full_text.split('. ')
            summary = '. '.join(sentences[:3] + sentences[-2:]) if len(sentences) > 5 else full_text
            # Sentiment score
            blob = TextBlob(full_text)
            sentiment = blob.sentiment.polarity  # ranges from -1 to 1
            return summary, sentiment
        except Exception as e:
            print(f"Failed to fetch transcript for {video_id}: {e}")
            return None, 0

    def add_video(self, video_id, title, duration, year):
        summary, score = self.get_transcript(video_id)
        if summary:
            cursor = self.conn.cursor()
            cursor.execute('''
                INSERT OR REPLACE INTO videos
                VALUES (?, ?, ?, ?, ?, ?)
            ''', (video_id, title, summary, score, duration, year))
            self.conn.commit()
            print(f"Added: {title} (score: {score:.2f})")

    def get_top_videos(self, limit=10):
        """Return the best videos ordered by sentiment score."""
        cursor = self.conn.cursor()
        cursor.execute('''
            SELECT title, summary, sentiment_score
            FROM videos
            ORDER BY sentiment_score DESC
            LIMIT ?
        ''', (limit,))
        return cursor.fetchall()


# Usage example
organizer = PyConVideoOrganizer()

# Add a few PyCon videos (YouTube video IDs required; these are placeholders)
videos = [
    ("dQw4w9WgXcQ", "Awesome PyTalk 2023", 1800, 2023),
    ("abc123def456", "Python Patterns", 2400, 2022)
]
for vid, title, dur, year in videos:
    organizer.add_video(vid, title, dur, year)

# Fetch the highest-scoring videos
top_videos = organizer.get_top_videos(5)
for title, summary, score in top_videos:
    print(f"\n{title} ({score:.2f})\n{summary[:200]}...")
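The summary step can be tried offline, without fetching any transcript. Below is a minimal sketch of the same "first 3 plus last 2 sentences" heuristic as a standalone function. The function name `summarize`, its `head` and `tail` parameters, and the sample text are made up for illustration, and it splits on `.`, `!` and `?` with a regular expression rather than on `'. '` alone, so it is a variation on the script's approach rather than a copy of it.

```python
import re


def summarize(text, head=3, tail=2):
    """Keep the first `head` and last `tail` sentences of `text`."""
    # Split after ., ! or ? followed by whitespace (a rough heuristic, not real NLP).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= head + tail:
        # Too short to trim: return everything.
        return " ".join(sentences)
    return " ".join(sentences[:head] + sentences[-tail:])


sample = "One. Two! Three? Four. Five. Six. Seven."
print(summarize(sample))
```

Auto-generated captions often come without punctuation, in which case either version finds a single "sentence" and falls back to the full text.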
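My scoring rule is simple: TextBlob analyzes the sentiment of the transcript text, and the higher the positive score, the more positive/interesting the content. Storing the results in SQLite makes them easy to query, for example sorted by year or by score.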
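As a rough sketch of the kind of query this allows, the snippet below filters by year and sorts by score. It uses an in-memory database and made-up rows so that it runs on its own. The helper name `videos_by_year`, its `min_score` parameter, and the sample data are hypothetical; only the `videos` table layout is taken from the script.

```python
import sqlite3


def videos_by_year(conn, year, min_score=0.0):
    """Return (title, sentiment_score) rows for one year, best score first."""
    cursor = conn.execute(
        """
        SELECT title, sentiment_score
        FROM videos
        WHERE year = ? AND sentiment_score >= ?
        ORDER BY sentiment_score DESC
        """,
        (year, min_score),
    )
    return cursor.fetchall()


# In-memory database with the same columns as the videos table; rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE videos (
        id TEXT PRIMARY KEY, title TEXT, summary TEXT,
        sentiment_score REAL, duration INTEGER, year INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO videos VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("id1", "Talk A", "summary a", 0.10, 1800, 2023),
        ("id2", "Talk B", "summary b", 0.35, 2400, 2023),
        ("id3", "Talk C", "summary c", 0.50, 1500, 2022),
    ],
)
conn.commit()

print(videos_by_year(conn, 2023))
```

Passing the year and threshold as `?` parameters keeps the query in the same style as the script's own `INSERT` and `LIMIT` statements.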
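For more precision you could add keyword extraction (TF-IDF via sklearn), or use spaCy's named entity recognition to tag the technical topics each video covers.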
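To make the keyword idea concrete, here is a minimal TF-IDF sketch. It is written with the standard library only so that it runs without extra installs; it is not sklearn's `TfidfVectorizer`, whose tokenization, smoothing, and normalization differ. The function names `tokenize` and `top_keywords`, the tiny stop-word list, and the sample transcripts are all made up for illustration.

```python
import math
import re
from collections import Counter

# A deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "with"}


def tokenize(text):
    """Lowercase, keep runs of letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_keywords(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by score (descending), then alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


docs = [
    "asyncio event loop and asyncio tasks in Python",
    "testing Python code with pytest fixtures and pytest plugins",
    "packaging Python projects with wheels",
]
for keywords in top_keywords(docs, k=2):
    print(keywords)
```

A term that appears in every transcript, such as "python" here, gets a score of zero, which is the point of the IDF part: it pushes down words that do not distinguish one talk from another.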
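One-line advice: automated processing plus a simple score lets you quickly filter for high-quality content.
I've gone through about 70% of the videos and noted the sessions I thought were good. I plan to write a blog post about them, but haven't started yet: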
<iframe src="https://www.youtube.com/embed/WVnACT48CkE" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/klaGx9Q_SOA" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/H4SS9yVWJYA" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/GBQAKldqgZs" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/T-TwcmT6Rcw" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/Jd8ulMb6_ls" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/zJ9z6Ge-vXs" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/RojaWIoBfOo" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/0Z45gcIwwrQ" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/dT2xjgUInhQ" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/HuuYwUxM-ZY" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/VJ0vibC_Hl0" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/BwC01zoSRBc" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/-7taKQnndfo" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/EGF4G2feXx4" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/MYucYon2-lk" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/ITksU31c1WY" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
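Sorry, I didn't realize that pasted YouTube links get parsed automatically…
No problem, thanks for sharing. These are topics I'm very interested in too.
I recently looked into the obstacles V2EX would face in upgrading from Python 2 to Python 3, and so far it looks basically feasible.
By the way, I've recently been contributing code to an open source project to help with its Python 2 to Python 3 upgrade. If v2 needs any help during the upgrade, feel free to email me and I'll do my best to assist.
I also plan to write a blog post about the process once the dust settles.
Thanks for the recommendation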
👍

