Using TensorFlow and the Google Object Detection API in Python to Detect Objects in a Movie (The Big Bang Theory as an Example)

This project uses Google's object detection API to recognize objects in a movie; it currently handles about 30 frames per second.

https://github.com/scotthuang1989/object_detection_with_tensorflow

http://www.tensorflownews.com/wp-content/uploads/2017/10/QQ20171022-0-1.jpg

Shared by 彼岸 from the TensorFlow QQ group (314148034).

Target

Detect people in a video or from a camera; you can choose which detection method to use.

Project Structure

From Google

  • object label and detection API data
  • object_detection

My module

  • myutil: helper functions
  • object_detection_tf.py: main module for detecting objects

Usage

Run the following command:

python object_detection_tf.py -v /home/scott/Videos/S11E03.mp4

Replace the video file path with one of your own; a minimal sketch of how such a -v flag might be parsed is shown below.
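The repository's own script is not reproduced here, but a minimal argparse sketch shows how a -v flag like this could be handled; parse_args and its help text are illustrative, not the actual code of object_detection_tf.py:

# Illustrative sketch of the -v/--video command-line flag; this is not the
# actual argument handling from object_detection_tf.py.
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description='Detect objects in a video file.')
    parser.add_argument('-v', '--video', required=True,
                        help='path to the video file to analyze')
    return parser.parse_args()

if __name__ == '__main__':
    args = parse_args()
    print(f'Running detection on {args.video}')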

Current Status

On my NVIDIA GTX 1060 (6 GB), the speed is about 25 frames per second, but GPU utilization stays below 30%, so I plan to use multiprocessing to speed it up; a sketch of one possible pipeline follows.
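One way to raise GPU utilization is to decode frames in a separate process so inference never waits on cv2. The sketch below only illustrates the frame-feeding pipeline under that assumption; run_inference is a hypothetical stand-in for the real detection call:

# Sketch: decode frames in a worker process and feed them to the GPU in
# batches. run_inference is a placeholder for the actual detection call.
import multiprocessing as mp
import cv2

def frame_reader(video_path, queue, batch_size=8):
    # Decode frames and push them to the queue in small batches
    cap = cv2.VideoCapture(video_path)
    batch = []
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        batch.append(frame)
        if len(batch) == batch_size:
            queue.put(batch)
            batch = []
    if batch:
        queue.put(batch)
    queue.put(None)  # sentinel: no more frames
    cap.release()

def run_inference(frames):
    # Placeholder for the real per-batch detection (e.g. a sess.run call)
    return [None] * len(frames)

if __name__ == '__main__':
    queue = mp.Queue(maxsize=4)  # bound the queue so decoding cannot run far ahead
    reader = mp.Process(target=frame_reader,
                        args=('/home/scott/Videos/S11E03.mp4', queue))
    reader.start()
    while True:
        batch = queue.get()
        if batch is None:
            break
        detections = run_inference(batch)  # GPU works while the next batch decodes
    reader.join()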

More machine learning resources: http://www.tensorflownews.com/



1 Reply

I understand you want to combine TensorFlow with Google's object detection API to analyze objects in episodes of The Big Bang Theory. One concept needs clarifying first, though: Google's object detection APIs are cloud services, while TensorFlow is a local deep learning framework. Normally you would pick one of the two approaches.

Option 1: TensorFlow Object Detection API (local deployment). This is the more common approach; the detection model runs directly on locally decoded video frames:

import cv2
import numpy as np
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

# 1. Load the pretrained model and labels
MODEL_NAME = 'ssd_mobilenet_v2_coco_2018_03_29'
PATH_TO_CKPT = f'{MODEL_NAME}/frozen_inference_graph.pb'
PATH_TO_LABELS = 'mscoco_label_map.pbtxt'

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
        tf.import_graph_def(od_graph_def, name='')

# Load the label map
category_index = label_map_util.create_category_index_from_labelmap(
    PATH_TO_LABELS, use_display_name=True)

# 2. Process video frames
def detect_objects_in_video(video_path, output_path):
    cap = cv2.VideoCapture(video_path)
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, 
                         (frame_width, frame_height))
    
    with detection_graph.as_default():
        with tf.compat.v1.Session(graph=detection_graph) as sess:
            while cap.isOpened():
                ret, frame = cap.read()
                if not ret:
                    break
                
                # Convert BGR to RGB and add a batch dimension
                image_np_expanded = np.expand_dims(
                    cv2.cvtColor(frame, cv2.COLOR_BGR2RGB), axis=0)
                
                # Look up the input and output tensors (these lookups could be hoisted out of the loop)
                image_tensor = detection_graph.get_tensor_by_name(
                    'image_tensor:0')
                boxes = detection_graph.get_tensor_by_name(
                    'detection_boxes:0')
                scores = detection_graph.get_tensor_by_name(
                    'detection_scores:0')
                classes = detection_graph.get_tensor_by_name(
                    'detection_classes:0')
                num_detections = detection_graph.get_tensor_by_name(
                    'num_detections:0')
                
                # Run detection on the current frame
                (boxes, scores, classes, num_detections) = sess.run(
                    [boxes, scores, classes, num_detections],
                    feed_dict={image_tensor: image_np_expanded})
                
                # Draw the detection results onto the frame
                vis_util.visualize_boxes_and_labels_on_image_array(
                    frame,
                    np.squeeze(boxes),
                    np.squeeze(classes).astype(np.int32),
                    np.squeeze(scores),
                    category_index,
                    use_normalized_coordinates=True,
                    line_thickness=2)
                
                out.write(frame)
    
    cap.release()
    out.release()
    cv2.destroyAllWindows()

# Example usage
detect_objects_in_video('big_bang_theory_episode.mp4', 'output_detected.mp4')
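The frozen graph referenced by PATH_TO_CKPT must exist locally before the code above can run. A download-and-extract sketch follows; the URL uses the TF1 detection model zoo naming convention, so verify it against the model zoo before relying on it:

# Fetch and unpack the frozen SSD MobileNet v2 graph used above. The URL
# follows the TF1 detection model zoo naming scheme; verify it is still valid.
import tarfile
import urllib.request

MODEL_NAME = 'ssd_mobilenet_v2_coco_2018_03_29'
MODEL_URL = ('http://download.tensorflow.org/models/object_detection/'
             f'{MODEL_NAME}.tar.gz')

archive_path = f'{MODEL_NAME}.tar.gz'
urllib.request.urlretrieve(MODEL_URL, archive_path)

# Extract only frozen_inference_graph.pb into ./<MODEL_NAME>/
with tarfile.open(archive_path) as tar:
    for member in tar.getmembers():
        if member.name.endswith('frozen_inference_graph.pb'):
            tar.extract(member)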

Option 2: Google Cloud Video Intelligence API (cloud service). If you do want a Google cloud API, note that object tracking in video is handled by the Video Intelligence API rather than the Vision API, and you need to install the client library and set up authentication first.
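A minimal setup sketch follows; the package name is the official google-cloud-videointelligence client, while the key-file path is a placeholder you must replace with your own service-account file:

# pip install google-cloud-videointelligence
#
# The client reads credentials from GOOGLE_APPLICATION_CREDENTIALS; the path
# below is a placeholder for your own service-account key file.
import os

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/service-account.json'

With the credentials in place, object tracking on a local file can be requested like this: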

from google.cloud import videointelligence_v1 as vi
import io

def detect_objects_google_cloud(video_path):
    client = vi.VideoIntelligenceServiceClient()
    
    with io.open(video_path, 'rb') as f:
        video_content = f.read()
    
    features = [vi.Feature.OBJECT_TRACKING]
    
    operation = client.annotate_video(
        request={
            "features": features,
            "input_content": video_content
        }
    )
    
    result = operation.result(timeout=180)
    
    # Parse the results: each ObjectTrackingAnnotation carries the entity,
    # a confidence score, and the video segment in which the object appears
    for annotation in result.annotation_results[0].object_annotations:
        print(f"Object: {annotation.entity.description}")
        print(f"Confidence: {annotation.confidence:.2%}")
        segment = annotation.segment
        print(f"Appears from {segment.start_time_offset} "
              f"to {segment.end_time_offset}")

Practical recommendations:

  • For continuous video analysis, the local TensorFlow approach is the better fit
  • You need to download a pretrained model first (e.g. one trained on COCO); see the download sketch after the Option 1 example above
  • When processing full TV episodes, consider sampling key frames to improve efficiency, as in the sketch after this list
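One simple way to act on the last point is to run detection on every Nth frame rather than on every frame; the stride of 10 and the run_detection call below are illustrative choices, not part of the original project:

# Yield every `stride`-th frame of the video as a crude key-frame proxy
import cv2

def sample_frames(video_path, stride=10):
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if index % stride == 0:
            yield index, frame
        index += 1
    cap.release()

# Example: run detection only on the sampled frames
# for idx, frame in sample_frames('big_bang_theory_episode.mp4', stride=10):
#     detections = run_detection(frame)  # hypothetical wrapper around sess.run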

Overall, the TensorFlow Object Detection API is the more practical choice here.
