Scraping and converting RealMedia streams to .mp4

This handy little script converts RealMedia streams (.rm, .ram, etc.) to .mp4.

The problem this script, crudely, solved began with a video server containing dozens of recorded lectures in RealMedia format, which were proving beastly to convert. Some were created and served by Midd’s own streaming media tool, while others were created by an Accordent system from long ago. Just downloading the .rm files from the server and running them through ffmpeg (or RealPlayer Converter for Windows) left the audio and video tracks wildly out of sync (see note 1), and endless tinkering with codecs and settings didn’t improve the situation. Left with few other options, I wrote this script to dump the A/V streams in real time and convert them to .mp4 via ffmpeg.

Note: the scripted method is SLOOOOOOW, wonky, and resource-heavy. Because the streams are dumped in real time, downloading and converting a .rm video takes around 130% of its playback duration.
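To get a feel for what that means for a whole archive, the 130% figure can be turned into a rough batch estimate. This is only a sketch: it assumes the ratio holds uniformly and that files are processed one at a time, and the function name and sample durations are invented for illustration.

```python
def estimated_hours(durations_minutes, factor=1.3):
    """Estimate wall-clock hours to dump and convert a batch of streams.

    factor=1.3 is the rough "130% of playback duration" figure quoted
    above; treat it as an approximation, not a measured constant.
    """
    total_minutes = sum(durations_minutes) * factor
    return total_minutes / 60.0


# Hypothetical batch: forty one-hour lectures
print(round(estimated_hours([60] * 40), 1))
```

For that made-up batch the estimate comes to roughly 52 hours, which is why "days or weeks" is not an exaggeration for a large collection.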

Available on GitHub

Requires:

  • ffmpeg (or avconv) — note: I am too ignorant to have an opinion on W[ever]TF is going on between these two projects and use whatever the repos give me, but I think RHEL (work computer) and Debian (home computer) are each providing only one of the two.  Pretty sure they share commands and are more or less identical, but this could be worth paying attention to in the future.
  • mplayer
  • libx264
  • libfaac
  • a plaintext list of URLs for the RealMedia streams
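Since distributions differ on whether they ship ffmpeg or avconv, a quick check before starting a long run can save some grief. The following is a minimal sketch in Python 3 (the script itself is Python 2); the function name is made up, and it only looks for the executables on PATH. It does not verify that the converter was built with libx264 or libfaac.

```python
import shutil


def find_tools(which=shutil.which):
    """Return (player, converter) paths, or raise if something is missing.

    `which` is injectable only so the lookup can be exercised without
    the tools installed; by default it searches PATH.
    """
    player = which("mplayer")
    # Accept whichever of the two converters the distribution ships
    converter = which("ffmpeg") or which("avconv")
    missing = []
    if player is None:
        missing.append("mplayer")
    if converter is None:
        missing.append("ffmpeg or avconv")
    if missing:
        raise RuntimeError("missing required tools: " + ", ".join(missing))
    return player, converter
```

Calling find_tools() once at startup fails fast with a readable message instead of partway through the first stream.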

Usage: python script.py {urllist.txt}

import urllib2
import urlparse
import os
import sys

# this script requires mplayer and ffmpeg (and Python 2)

# script assumes input (per argv[1]) is a plaintext
# list of HTTP URLs separated by UNIX style newlines
infile = open(sys.argv[1], 'r')

for url in infile:
  url = url.strip()
  if not url:
    continue  # skip blank lines
  # name the output after the last path component, swapping the extension
  path = urlparse.urlparse(url).path
  rmfile = os.path.basename(path)
  mpegfile = os.path.splitext(rmfile)[0] + '.mp4'
  # the body fetched from the HTTP URL is the stream address handed to mplayer
  streamurl = urllib2.urlopen(url).read().strip()
  # mplayer writes the dumped stream to ./stream.dump by default
  dumpCommand = 'mplayer -dumpstream "' + streamurl + '"'
  convCommand = 'ffmpeg -i stream.dump -c:v libx264 -c:a libfaac -b:a 32k "' + mpegfile + '"'
  cleanCommand = 'rm stream.dump'
  os.system(dumpCommand)
  os.system(convCommand)
  os.system(cleanCommand)

infile.close()

Turn it on, and let that sucker run for hours, days, or weeks until you’ve got what you want.
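For anyone on Python 3, where urllib2 and urlparse no longer exist, the same dump-then-convert loop can be sketched roughly as follows. This is not the script from the repository, just an approximation of it: the function names are invented, the commands are passed as argument lists so odd characters in URLs or filenames never reach a shell, and it assumes (as the original does) that the fetched metafile contains a single stream address and that mplayer dumps to stream.dump in the working directory.

```python
import os
import subprocess
from urllib.parse import urlparse
from urllib.request import urlopen

DUMP_FILE = "stream.dump"  # mplayer's default -dumpstream output name


def output_name(url):
    """Derive the .mp4 filename from the last path component of the URL."""
    base = os.path.basename(urlparse(url.strip()).path)
    return os.path.splitext(base)[0] + ".mp4"


def build_commands(stream_url, mp4_name):
    """Return the dump and convert commands as argument lists (no shell)."""
    dump = ["mplayer", "-dumpstream", stream_url]
    conv = ["ffmpeg", "-i", DUMP_FILE, "-c:v", "libx264",
            "-c:a", "libfaac", "-b:a", "32k", mp4_name]
    return dump, conv


def rip(url):
    """Fetch the metafile at `url`, dump the stream it names, convert it."""
    with urlopen(url) as response:
        stream_url = response.read().decode("ascii", "replace").strip()
    dump, conv = build_commands(stream_url, output_name(url))
    subprocess.run(dump, check=True)   # runs in real time
    subprocess.run(conv, check=True)
    os.remove(DUMP_FILE)


def main(list_path):
    """Process every non-blank line of the URL list."""
    with open(list_path) as infile:
        for line in infile:
            if line.strip():
                rip(line.strip())
```

With check=True a failed dump or conversion stops the run instead of silently moving on to the next URL; drop it if you would rather keep going and sort out the failures afterwards.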

Converting local .rm files

Later on there were a number of files from the old Accordent server for which a real-time stream was not an option, so I did eventually figure out that the sync problems were the result of some technical issues particular to RealMedia (not going to go into details here). Local .rm files could be made to play back correctly in mplayer with:

mplayer -autosync 120 -ao pulse -cache 8192 -ni [filename]

Conversion to .mp4 took still more effort, though, with the video and audio streams transcoded separately to get everything in sync:

mencoder [infile.rm] -oac pcm -ovc x264 -ni -o [newfile.rm]

Followed by:

ffmpeg -i [newfile.rm] -c:v libx264 -c:a libfaac -b:a 32k [outfile.mp4]
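The two commands above can be chained for a batch of local files with something like the following Python 3 sketch. The function names and the intermediate filename are assumptions for illustration. In particular the intermediate is given an .avi extension here because, as far as I know, mencoder writes AVI by default, whereas the commands above simply call it [newfile.rm].

```python
import os
import subprocess


def local_conversion_commands(infile, intermediate=None, outfile=None):
    """Build the mencoder and ffmpeg commands for one local .rm file."""
    base = os.path.splitext(infile)[0]
    intermediate = intermediate or base + ".resync.avi"  # placeholder name
    outfile = outfile or base + ".mp4"
    # Pass 1: re-encode with mencoder (PCM audio, x264 video, non-interleaved)
    encode = ["mencoder", infile, "-oac", "pcm", "-ovc", "x264",
              "-ni", "-o", intermediate]
    # Pass 2: same ffmpeg settings as the streaming script
    convert = ["ffmpeg", "-i", intermediate, "-c:v", "libx264",
               "-c:a", "libfaac", "-b:a", "32k", outfile]
    return encode, convert


def convert_local(infile):
    """Run both passes, then delete the intermediate file."""
    encode, convert = local_conversion_commands(infile)
    subprocess.run(encode, check=True)
    subprocess.run(convert, check=True)
    os.remove(encode[-1])
```

The PCM intermediate can get large, so removing it after each file keeps a long batch from filling the disk.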

For ~90% of the previously unsyncable local files, this method worked.  However, the script still has some utility in that it lets one pretty effectively rip RealMedia streams to which one has no other means of access.

See the results!

The videos, now converted to .mp4 and put up on Internet Archive, are available via the Middlebury College Digital Lecture Archive.

Notes:

  1. My beloved hometown’s own Walker Art Center has documented a Mac-based workflow that runs into the same sync issues, with a different solution, here.