interview-transcription

Installs: 38
Rank: #18752

Install

npx skills add https://github.com/jamditis/claude-skills-journalism --skill interview-transcription

# Interview transcription and management

Practical workflows for journalists managing interviews from preparation through publication.

## When to activate

- Preparing questions for an interview
- Processing audio/video recordings
- Creating or managing transcripts
- Organizing notes from multiple sources
- Building a source relationship database
- Generating timestamped quotes for fact-checking
- Converting recordings to publishable quotes

## Pre-interview preparation

### Research checklist

Before recording starts, you should already know:

### Source prep for: [Name]

**Background**

- Role/title:
- Organization:
- Why they're relevant to this story:
- Previous media appearances (note inconsistencies):

**Key questions (prioritized)**

1. [Must-ask question]
2. [Must-ask question]
3. [If time permits]

**Documents to reference**

- [ ] Bring/share [specific document]
- [ ] Ask about [specific claim/data point]

**Red lines**

- Topics they'll likely avoid:
- Sensitive areas to approach carefully:

## Recording setup

### Standard recording configuration

```python
RECORDING_SETTINGS = {
    'format': 'wav',       # Lossless for transcription
    'sample_rate': 44100,  # Standard quality
    'channels': 1,         # Mono is fine for speech
    'backup': True,        # Always run backup recorder
}
```

### File naming convention

```
YYYY-MM-DD_source-lastname_topic.wav
```

Example: `2024-03-15_smith_budget-hearing.wav`
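A small helper can enforce this convention automatically (a minimal sketch; `make_recording_name` is an illustrative helper, not part of the skill):

```python
from datetime import date


def _slug(text: str) -> str:
    """Lowercase and hyphenate so names stay shell-friendly."""
    return text.strip().lower().replace(' ', '-')


def make_recording_name(source_lastname: str, topic: str, on: date) -> str:
    """Build a filename following YYYY-MM-DD_source-lastname_topic.wav."""
    return f"{on.isoformat()}_{_slug(source_lastname)}_{_slug(topic)}.wav"


print(make_recording_name("Smith", "budget hearing", date(2024, 3, 15)))
# → 2024-03-15_smith_budget-hearing.wav
```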

### Two-device rule

Always record on two devices. Phone as backup minimum.

## Transcription workflows

### Automated transcription pipeline

```python
from pathlib import Path
import subprocess
import json


def transcribe_interview(audio_path: str, output_dir: str = "./transcripts") -> dict:
    """Transcribe using Whisper with speaker diarization.

    Returns transcript with timestamps.
    """
    Path(output_dir).mkdir(exist_ok=True)

    # Use whisper.cpp or OpenAI Whisper
    result = subprocess.run(
        ['whisper', audio_path,
         '--model', 'medium',
         '--output_format', 'json',
         '--output_dir', output_dir,
         '--language', 'en',
         '--word_timestamps', 'True'],
        capture_output=True,
    )

    # Load and return structured transcript
    json_path = Path(output_dir) / f"{Path(audio_path).stem}.json"
    with open(json_path) as f:
        return json.load(f)


def format_for_editing(transcript: dict) -> str:
    """Convert to journalist-friendly format with timestamps."""
    lines = []
    for segment in transcript.get('segments', []):
        timestamp = format_timestamp(segment['start'])
        text = segment['text'].strip()
        lines.append(f"[{timestamp}] {text}")
    return '\n\n'.join(lines)


def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS format."""
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = int(seconds % 60)
    return f"{h:02d}:{m:02d}:{s:02d}"
```

### Manual transcription template

For sensitive interviews or when AI transcription fails:

### Transcript: [Source] - [Date]

**Recording file:** [filename]
**Duration:** [XX:XX]
**Transcribed by:** [name]
**Verified against recording:** [ ] Yes / [ ] No

[00:00:15] **Q:** [Your question]

[00:00:45] **A:** [Source response - verbatim, including ums, pauses noted as (...)]

[00:01:30] **Q:** [Follow-up]

[00:01:42] **A:** [Response]

**Notes**

[Anything not captured in audio: gestures, documents shown, etc.]

**Potential quotes**

[00:01:42] "Quote that stands out" - context: [why it matters]

## Quote extraction and verification

### Pull quotes workflow

```python
from dataclasses import dataclass
from typing import Optional
import re


@dataclass
class Quote:
    text: str
    timestamp: str
    speaker: str
    context: str
    verified: bool = False
    used_in: Optional[str] = None


class QuoteBank:
    """Manage quotes from interview transcripts."""

    def __init__(self):
        self.quotes = []

    def extract_quote(self, transcript: str, start_time: str, end_time: str,
                      speaker: str, context: str) -> Optional[Quote]:
        """Extract and store a quote with metadata."""
        # Pull text between timestamps
        pattern = rf'\[{re.escape(start_time)}\](.+?)(?=\[\d|$)'
        match = re.search(pattern, transcript, re.DOTALL)
        if match:
            text = match.group(1).strip()
            quote = Quote(text=text, timestamp=start_time,
                          speaker=speaker, context=context)
            self.quotes.append(quote)
            return quote
        return None

    def verify_quote(self, quote: Quote, audio_path: str) -> bool:
        """Mark quote as verified against original recording."""
        # In practice: listen to audio at timestamp, confirm accuracy
        quote.verified = True
        return True

    def export_for_story(self) -> str:
        """Export verified quotes ready for publication."""
        output = []
        for q in self.quotes:
            if q.verified:
                output.append(f'"{q.text}"\n— {q.speaker}\n[Timestamp: {q.timestamp}]')
        return '\n\n'.join(output)
```

### Quote accuracy checklist

Before publishing any quote:

- [ ] Listened to original recording at timestamp
- [ ] Quote is verbatim (or clearly marked as paraphrased)
- [ ] Context preserved (not cherry-picked to change meaning)
- [ ] Speaker identified correctly
- [ ] Timestamp documented for fact-checker
- [ ] Source approved quote (if agreement made)

## Source management database

### Interview tracking schema

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
from enum import Enum


class SourceStatus(Enum):
    ACTIVE = "active"          # Currently engaged
    DORMANT = "dormant"        # Not recently contacted
    DECLINED = "declined"      # Refused to participate
    OFF_RECORD = "off_record"  # Background only


class InterviewType(Enum):
    ON_RECORD = "on_record"
    BACKGROUND = "background"
    DEEP_BACKGROUND = "deep_background"
    OFF_RECORD = "off_record"


@dataclass
class Source:
    name: str
    organization: str
    contact_info: dict  # email, phone, signal, etc.
    beat: str
    status: SourceStatus = SourceStatus.ACTIVE
    interviews: List['Interview'] = field(default_factory=list)
    notes: str = ""
    # Relationship tracking
    first_contact: Optional[datetime] = None
    trust_level: int = 1  # 1-5 scale


@dataclass
class Interview:
    source: str
    date: datetime
    interview_type: InterviewType
    recording_path: Optional[str] = None
    transcript_path: Optional[str] = None
    story_slug: Optional[str] = None
    key_quotes: List[str] = field(default_factory=list)
    follow_up_needed: bool = False
    notes: str = ""
```

### Quick source lookup

```python
def find_sources_for_story(sources: List[Source], topic: str,
                           beat: str = None) -> List[Source]:
    """Find relevant sources for a new story."""
    matches = []
    for source in sources:
        # Filter by beat if specified
        if beat and source.beat != beat:
            continue
        # Only suggest active sources
        if source.status != SourceStatus.ACTIVE:
            continue
        # Check if they've spoken on similar topics
        for interview in source.interviews:
            if topic.lower() in interview.notes.lower():
                matches.append(source)
                break
    # Sort by trust level
    return sorted(matches, key=lambda s: s.trust_level, reverse=True)
```

## Audio/video processing

### Batch processing multiple recordings

```python
from pathlib import Path
from concurrent.futures import ProcessPoolExecutor
import json


def batch_transcribe(recordings_dir: str, output_dir: str) -> dict:
    """Process all recordings in a directory."""
    recordings = list(Path(recordings_dir).glob('*.wav')) + \
                 list(Path(recordings_dir).glob('*.mp3')) + \
                 list(Path(recordings_dir).glob('*.m4a'))
    results = {}
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = {executor.submit(transcribe_interview, str(rec), output_dir): rec
                   for rec in recordings}
        for future in futures:
            rec = futures[future]
            try:
                transcript = future.result()
                results[rec.name] = {'status': 'success', 'transcript': transcript}
            except Exception as e:
                results[rec.name] = {'status': 'error', 'error': str(e)}
    return results
```

### Video interview extraction

```python
import subprocess


def extract_audio_from_video(video_path: str, output_path: str = None) -> str:
    """Extract audio track from video for transcription."""
    if output_path is None:
        output_path = video_path.rsplit('.', 1)[0] + '.wav'
    subprocess.run([
        'ffmpeg', '-i', video_path,
        '-vn',                   # No video
        '-acodec', 'pcm_s16le',  # WAV format
        '-ar', '44100',          # Sample rate
        '-ac', '1',              # Mono
        output_path,
    ], check=True)
    return output_path
```

## Legal and ethical considerations

### Consent documentation

### Recording consent record

**Date:**
**Source name:**
**Recording type:** [ ] Audio [ ] Video
**Interview type:** [ ] On record [ ] Background [ ] Off record

- [ ] Email confirmation of consent

Jurisdiction notes:

- Interview location state/country:
- Any specific restrictions agreed:

Agreed terms:

- [ ] Full attribution allowed
- [ ] Organization attribution only
- [ ] Anonymous source
- [ ] Review quotes before publication
- [ ] Embargo until [date]

### Two-party consent states (US)

California, Connecticut, Florida, Illinois, Maryland, Massachusetts, Michigan, Montana, Nevada, New Hampshire, Pennsylvania, and Washington require all-party consent. Always get explicit consent on the recording regardless of jurisdiction.

## Tools and resources

| Tool | Purpose | Notes |
| --- | --- | --- |
| Whisper | Local transcription | Free, accurate, private |
| Otter.ai | Cloud transcription | Real-time, speaker ID |
| Descript | Edit audio like text | Good for pulling clips |
| Rev | Human transcription | For sensitive/legal |
| Trint | Journalist-focused | Collaboration features |
| oTranscribe | Free web player | Manual transcription aid |
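The all-party-consent state list above can be encoded as a quick pre-interview lookup (a minimal sketch for illustration only; statutes change, so always confirm current law for the jurisdiction):

```python
# States generally requiring all-party consent for recording,
# per the list in the section above
ALL_PARTY_CONSENT_STATES = {
    "California", "Connecticut", "Florida", "Illinois", "Maryland",
    "Massachusetts", "Michigan", "Montana", "Nevada", "New Hampshire",
    "Pennsylvania", "Washington",
}


def needs_all_party_consent(state: str) -> bool:
    """Return True if the interview location requires everyone's consent."""
    return state.strip().title() in ALL_PARTY_CONSENT_STATES


print(needs_all_party_consent("florida"))   # → True
print(needs_all_party_consent("New York"))  # → False
```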
