Sharing a few ways to summarize, score, and organize PyCon videos in Python



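I usually organize PyCon videos with a small Python script that does three things: fetch a summary, assign a score, and store the result in a database.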

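First install a couple of libraries (sqlite3 ships with Python's standard library, so it is not installed through pip):

pip install youtube-transcript-api textblob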

The core code looks like this:

import sqlite3
from youtube_transcript_api import YouTubeTranscriptApi
from textblob import TextBlob

class PyConVideoOrganizer:
    def __init__(self, db_path='pycon_videos.db'):
        self.conn = sqlite3.connect(db_path)
        self.create_table()
    
    def create_table(self):
        cursor = self.conn.cursor()
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS videos (
                id TEXT PRIMARY KEY,
                title TEXT,
                summary TEXT,
                sentiment_score REAL,
                duration INTEGER,
                year INTEGER
            )
        ''')
        self.conn.commit()
    
    def get_transcript(self, video_id):
        """Fetch the video's transcript; return a (summary, sentiment score) pair"""
        try:
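            # Note: get_transcript is the classmethod from pre-1.0 releases of
            # youtube-transcript-api; newer releases are believed to use
            # YouTubeTranscriptApi().fetch(video_id) instead.
            transcript = YouTubeTranscriptApi.get_transcript(video_id)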
            full_text = ' '.join([entry['text'] for entry in transcript])
            
            # Naive summary: keep the first 3 and the last 2 sentences
            sentences = full_text.split('. ')
            summary = '. '.join(sentences[:3] + sentences[-2:]) if len(sentences) > 5 else full_text
            
            # Sentiment analysis score
            blob = TextBlob(full_text)
            sentiment = blob.sentiment.polarity  # polarity score from -1 to 1
            
            return summary, sentiment
        except Exception as e:
            print(f"Failed to fetch transcript for {video_id}: {e}")
            return None, 0
    
    def add_video(self, video_id, title, duration, year):
        summary, score = self.get_transcript(video_id)
        if summary:
            cursor = self.conn.cursor()
            cursor.execute('''
                INSERT OR REPLACE INTO videos 
                VALUES (?, ?, ?, ?, ?, ?)
            ''', (video_id, title, summary, score, duration, year))
            self.conn.commit()
            print(f"Added: {title} (score: {score:.2f})")
    
    def get_top_videos(self, limit=10):
        """Return the top videos ordered by sentiment score"""
        cursor = self.conn.cursor()
        cursor.execute('''
            SELECT title, summary, sentiment_score 
            FROM videos 
            ORDER BY sentiment_score DESC 
            LIMIT ?
        ''', (limit,))
        return cursor.fetchall()

# Usage example
organizer = PyConVideoOrganizer()

# Add a few PyCon videos (YouTube video IDs are required; the IDs below are placeholders)
videos = [
    ("dQw4w9WgXcQ", "Awesome PyTalk 2023", 1800, 2023),
    ("abc123def456", "Python Patterns", 2400, 2022)
]

for vid, title, dur, year in videos:
    organizer.add_video(vid, title, dur, year)

# Fetch the highest-scoring videos
top_videos = organizer.get_top_videos(5)
for title, summary, score in top_videos:
    print(f"\n{title} ({score:.2f})\n{summary[:200]}...")
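
The script above needs network access, so here is a minimal offline sketch of just the "first 3 plus last 2 sentences" summary step. The `summarize` helper and its `head`/`tail` parameters are my own illustrative names, not part of any library. It splits after sentence-ending punctuation instead of on '. ', so each sentence keeps its period. One assumption to be aware of: auto-generated captions often have little or no punctuation, in which case the whole transcript may come back as a single "sentence".

```python
import re

def summarize(text, head=3, tail=2):
    """Return the first `head` and last `tail` sentences of `text` (hypothetical helper)."""
    # Split after '.', '!' or '?' followed by whitespace, keeping the punctuation.
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    # Short texts are returned whole rather than trimmed.
    if len(sentences) <= head + tail:
        return ' '.join(sentences)
    # Slice from an explicit index so that tail=0 yields no trailing sentences.
    return ' '.join(sentences[:head] + sentences[len(sentences) - tail:])

sample = "One. Two. Three. Four. Five. Six. Seven."
print(summarize(sample))
```

With the sample text, the middle sentences ("Four." and "Five.") are dropped and the rest is kept in order.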

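My scoring rule is simple: TextBlob analyzes the sentiment of the transcript text, and the higher the positive score, the more positive/engaging the content. Storing everything in SQLite makes querying easy; sorting by year or by score both work.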
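
As a rough sketch of the "sort by year or score" idea, the query below filters one year and orders by score. It uses an in-memory database with made-up rows so it runs on its own; the `top_videos_for_year` function name and the sample data are assumptions for illustration, and only the table layout follows the script above.

```python
import sqlite3

# In-memory database with the same columns as the videos table above.
conn = sqlite3.connect(':memory:')
conn.execute('''
    CREATE TABLE videos (
        id TEXT PRIMARY KEY, title TEXT, summary TEXT,
        sentiment_score REAL, duration INTEGER, year INTEGER
    )
''')

# Made-up rows, purely for demonstration.
rows = [
    ('v1', 'Talk A', 'summary A', 0.10, 1800, 2022),
    ('v2', 'Talk B', 'summary B', 0.35, 2400, 2023),
    ('v3', 'Talk C', 'summary C', 0.20, 1500, 2023),
]
conn.executemany('INSERT INTO videos VALUES (?, ?, ?, ?, ?, ?)', rows)

def top_videos_for_year(conn, year, limit=10):
    """Return (title, score) pairs for one year, best score first."""
    cursor = conn.execute(
        '''
        SELECT title, sentiment_score
        FROM videos
        WHERE year = ?
        ORDER BY sentiment_score DESC
        LIMIT ?
        ''',
        (year, limit),
    )
    return cursor.fetchall()

result = top_videos_for_year(conn, 2023)
print(result)
```

The year and the limit are passed as bound parameters rather than formatted into the SQL string, the same way the script above passes `limit`.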

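For more precision, you could add keyword extraction (with sklearn's TF-IDF), or use spaCy for named-entity recognition to tag the technical topics each video covers.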
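
To show what TF-IDF keyword extraction does, here is a dependency-free sketch of the idea. It is not sklearn's implementation: sklearn's `TfidfVectorizer` applies smoothing and normalization by default, so its scores would differ. The `top_keywords` function and the sample transcripts are hypothetical, and the tokenizer is a plain lowercase-letters regex.

```python
import math
import re
from collections import Counter

def top_keywords(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document (hypothetical helper)."""
    tokenized = [re.findall(r'[a-z]+', d.lower()) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Term frequency times log inverse document frequency (unsmoothed).
        scores = {
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results

# Made-up stand-ins for transcript text.
docs = [
    "asyncio event loop and asyncio tasks in python",
    "python packaging with wheels and packaging tools",
    "python typing and type hints",
]
keywords = top_keywords(docs, k=2)
print(keywords)
```

Words that appear in every document ("python", "and") get a score of zero, so the terms specific to each talk rise to the top.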

One-sentence advice: automated processing plus a simple score lets you filter for high-quality content quickly.

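I've gone through about 70% of the videos and noted the sessions I thought were good. I'm planning to write a blog post about them, but haven't started yet: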

<iframe src="https://www.youtube.com/embed/WVnACT48CkE" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/klaGx9Q_SOA" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/H4SS9yVWJYA" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/GBQAKldqgZs" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/T-TwcmT6Rcw" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/Jd8ulMb6_ls" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/zJ9z6Ge-vXs" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/RojaWIoBfOo" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/0Z45gcIwwrQ" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/dT2xjgUInhQ" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/HuuYwUxM-ZY" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/VJ0vibC_Hl0" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/BwC01zoSRBc" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/-7taKQnndfo" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/EGF4G2feXx4" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/MYucYon2-lk" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>
<iframe src="https://www.youtube.com/embed/ITksU31c1WY" class="embedded_video" allowfullscreen="" type="text/html" id="ytplayer" frameborder="0"></iframe>

Sorry, I didn't realize that pasted YouTube links get parsed automatically…

No problem, and thanks for sharing. These are topics I'm very interested in too.

I recently looked into the obstacles V2EX would face in upgrading from Python 2 to Python 3, and so far it looks basically feasible.

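By the way, I've recently been contributing code to an open-source project to help with its Python 2 to Python 3 upgrade. If V2EX needs help during its upgrade, feel free to email me and I'll do my best to assist.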

Once the dust settles, I also plan to write a blog post describing the process in detail.

Thanks for the recommendations.
