UPDATE (10/03/2012): After having a long discussion on Subler's OCR'ing capabilities HERE, I've played a bit with SubRip to find out how it recognizes Blu-ray subtitles. For the test, I've used several BD discs, including Iron Sky and the international version of Red Cliff I.
Unfortunately, the current (1.50b5) version of SubRip is completely incompatible with HD VobSubs - that is, not only the original S_HDMV/PGS subs, but even the (standard-format) output files of BDSup2Sub.
This is not just an incompatibility with OCR, but even with the subpictures - that is, you can't just save the contents of the VobSub as a series of pictures, which you could then import into, say, OmniPage Pro (or other "serious" OCR apps) for character recognition.
If you, upon importing in BDSup2Sub, do downsize the individual images to PAL / NTSC by enabling the “Convert Resolution” checkbox and selecting either PAL or NTSC resolution in the drop-down list (see the annotations below):
then, you can create a VobSub file more or less compatible with SubRip. Unfortunately, about half of the frames will be completely skipped (unrecognized) by the app. An example run with the beginning of the English subtrack of Iron Sky, showing just garbage for an, otherwise, completely legal subtitle page:
and of Red Cliff I:
Unfortunately, this not only applies to the OCR mode, but also the plain image exporting mode (“Save subpictures as BMP”) - the majority of the exported images will be just empty.
All in all, you can't use SubRip to process BD subtitles in any way: neither OCR'ing nor image exporting works. Unfortunately, BDSup2Sub can't export a series of plain images for further OCR'ing in a third-party app either.
UPDATE (09/11/2012, even later):
To help you choose and configure an iOS player capable of displaying bitmap subtitles, I've done some additional work. Again, you'll want to prefer these kinds of (original) subtitles (subs for short) to recognized (OCR'ed) subs. While Subler's OCR engine is great, it has problems. For example, it doesn't support several languages, Swedish among them. By the way, this is why the demo M4V video (again, it's HERE; feel free to play with it, import it into iOS media players, check out its subtitle tracks etc.) has a pretty much messed-up OCR'ed Swedish track – unlike with Finnish and English, Subler couldn't use a Swedish dictionary when OCR'ing.
Even languages that do have dictionaries will have problems. For example, with the Finnish subtitle track of Iron Sky, Subler has a tendency to merge two original words into one while recognizing – and, of course, to “recognize” “Ä” as “A”. Therefore, consider OCR'ed textual subtracks as “fallback” ones for when there's absolutely no way of displaying the original, bitmap subs.
Therefore, I've played quite a bit with the two media players for iOS that I recommend for hardware video playback with bitmap subtitles, ProPlayer and AVPlayerHD. Note that the other two otherwise most recommended players can't do this: It's Playing can't render bitmap subs at all, and GoodPlayer can't render them while hardware decoding. HERE's a screenshot of GoodPlayer rendering the English subtrack of the test video while using (at 1080p, uselessly slow) software decoding. Interestingly, XBMC (which I still don't recommend for owners of the iPad 3 as it still lacks Retina support) can't render these bitmap subtitles at all.
Unfortunately, AVPlayerHD crashes right away when loading these files. Therefore, you'll need to stay with ProPlayer (AppStore link). The latter is a bit less capable than AVPlayerHD – but, fortunately, MP4 playback-wise, it's really reliable and knows something other players don't: in addition to textual ones, it can render bitmap subs on top of video played back using hardware acceleration. A screenshot of it doing so:
As with AVPlayerHD, in ProPlayer, you'll need to select the subtitle track before starting (or while pausing) playback by tapping Edit in the top right corner of the file list and, then, selecting the just-appearing right arrow next to the video. In the following screenshot, I've annotated both (as with all the screenshots of this update, click them for the original-sized ones):
In the new dialog, tap “Edit” in the “Subtitles” row (annotated below) and select a track from the list:
Unfortunately, unlike most players, ProPlayer doesn't make an attempt to show the user-applied (like “text” in the tutorial below) tags or even the language of the subs – all you get is a numbered list as you can see in the above screenshot. If you have several subtracks in a video, this can be really annoying. To quickly find out which number belongs to which track, open the MP4 file in Subler and check out the order of any subtitle AND text tracks.
For example, in the sample video I've provided you, you'll see the following. I've annotated every track that belongs to the above categories (either subtitle or a generic – in this case, chapter – track):
Based on this, it's very easy to decode what each number means. For example, if you tap “Subtitles 1” in ProPlayer's dialog, the OCR'ed English text will be shown etc. The full list is as follows:
0: Finnish TXT
1: English TXT
2: Swedish TXT
3: chapter track (don't tap it - nothing will be shown!)
4: Finnish VobSub
5: English VobSub
6: Swedish VobSub
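The numbering above is simply the zero-based position of each subtitle / text track in the order Subler lists them. As a rough sketch of that rule (the function name `number_tracks` is made up for illustration, and the track labels are typed in by hand from Subler's track list, not read from the file):

```shell
# Print each line of stdin prefixed with its zero-based position,
# mirroring how ProPlayer numbers the subtitle and text tracks.
number_tracks() {
  awk '{ printf "%d: %s\n", NR - 1, $0 }'
}

# Track labels in the order Subler shows them for the sample video.
printf '%s\n' "Finnish TXT" "English TXT" "Swedish TXT" "chapter track" \
  "Finnish VobSub" "English VobSub" "Swedish VobSub" | number_tracks
```

Remember to count generic text tracks (here, the chapter track) as well, otherwise every number after them will be off by one.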
Finally, as I've previously stated, the stock (built-in) Videos player renders the text (but, of course, not the bitmap!) subtitles added following my tutorial just fine (as does iTunes on the desktop):
UPDATE (09/11/2012):
1.) I've modified the last part of the tutorial - the one that explains how DVD-formatted (that is, VobSub) bitmap subs can be included in the target MP4 / MOV / M4V files so that capable, third-party players (for example, VLC on desktop computers and AVPlayerHD on iOS) can render them. The new version shows how you can render the subs at the bottom of the screen.
2.) I've uploaded a remuxed version of the original MKV video with both textual and VobSub subtracks HERE, should you want to give its playback a try without going through the entire conversion process. Note that this file is also optimized, which is essential should you want to stream the video from your iTunes to your Apple TV; I'll publish a separate article on this. I've talked about optimizing (for example, recognizing optimized M4V files) HERE (and HERE's a reader's feedback ;-) ).
UPDATE (some hours later):
1.) You don't need to issue any command-line commands to extract Blu-ray (BD) SUP subtracks from BD MKV files if you use the brother of the excellent (see THIS) MP4Tools, MKVTools. As with MP4Tools, you'll want to prefer downloading the beta from THIS page. For our purposes (subtrack extraction), it'll be just fine - the omissions / bugs in beta don't affect subtitle extracting.
All you need to do is as follows:
1, open the BD MKV file in MKVTools (“Open” button in the top right corner)
2, select the S_HDMV/PGS subtracks (topmost red rectangle in the screenshot below)
3, switch to the “Edit Tracks” tab (middle rectangle in the screenshot below)
4, click the “Go” button (bottom-most rectangle in the screenshot below):
Then, just go on with the BD SUP → VobSub conversion using BDSup2Sub. Note that, after the conversion, you won't be able to add the IDX file back to the MKV file with the same MKVTools using its “Add track” button. You'll still need to use MKVtoolnix for that (see the original article on adding IDX files back to MKV's).
Note: you do NOT need to use MKVTools at all, should you want to avoid paying for it or you want to use as few apps as possible - the original workflow works great, it's "just" a bit harder and slower as it involves manual invocations of mkvextract from Terminal.
2.) There is a discussion of Blu-ray and DVD ripping with subtitles HERE and HERE, respectively. Make sure you check out my posts there – for example, THIS (a quick DVD ripping and OCR'ing workflow) and THIS (BD sub conversion)!
Original article:
After yesterday's article on Subler, let me present some additional tips and tricks for this excellent remuxer tool. Today, I'll speak of a fairly new and really excellent feature of Subler: optical character recognition (OCR for short) to quickly recognize the subtitles in bitmaps – that is, the default subtitle formats of DVD's, Blu-ray discs and DVB broadcasts. With this feature, you can very easily convert even the subtitle tracks of your DVD's and Blu-ray discs for playback / rendering on iOS devices – something impossible with the original, bitmap (non-textual) subtitles using the stock Videos player.
Subler's approach is vastly different from that of SubRip, the traditionally used app to OCR bitmap subtitles (dedicated article with video). Subler doesn't require any kind of manual character training: it does everything itself, taking its language-specific data from standard, language-specific dictionaries.
To activate the latter, for non-English languages, just copy the file http://code.google.com/p/tesseract-ocr/source/browse/trunk/tessdata/<language code>.traineddata to ~/Library/Application Support/Subler/tessdata (after creating the directory). For example, for Finnish, you'll need the fin.traineddata file. You can copy several language files there.
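If you prefer doing this from Terminal, the directory creation and the copy can be sketched as below. This assumes you've already downloaded the language file yourself; the function name `install_traineddata` and the ~/Downloads location in the usage comment are only illustrations. Note the quoting, which is needed because of the space in “Application Support”.

```shell
# Copy a downloaded .traineddata file into Subler's tessdata directory,
# creating the directory first if it doesn't exist yet.
# $1 = path to the downloaded file
# $2 = target directory (optional; defaults to the location named in the article)
install_traineddata() {
  td_src=$1
  td_dest=${2:-"$HOME/Library/Application Support/Subler/tessdata"}
  mkdir -p "$td_dest" && cp "$td_src" "$td_dest/"
}

# Example (assumed download location):
# install_traineddata ~/Downloads/fin.traineddata
```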
After this, if you open / import MKV files containing bitmap subtitles (unless you manually override the default “everything should be OCR'ed”), the subtitle tracks will be OCR'ed and exported as textual.
Note that, in this article, I pay special attention to including both the textual (OCR'ed) subtrack and the original graphical bitmap-based subtitle track. While Subler's OCR support is excellent, there still might be cases where it recognizes something incorrectly. Then, it's better to be safe - you can always switch on displaying the embedded graphical subtrack in both desktop players like VLC and some third-party iOS ones like AVPlayerHD. (More on the latter in THIS article.) This way, you'll easily and quickly find out what has been recognized incorrectly and can avoid misunderstandings.
DVD subtitles (subs for short)
The workflow with DVD's is much simpler than both DVB (on which I'll publish a separate article) and Blu-ray sub imports: you won't need to use any additional software at all.
1, Open (File > Open) the MP4 / M4V file created by HandBrake (already having bitmap VobSub subs, as is explained HERE) from the original MKV created by MakeMKV (here, lupaus-title04-noburntinsubs.m4v; both this and the original MKV file can be downloaded from THIS article):
2, click the "+" button (annotated above) and select the original MKV file (also mentioned in Bullet 1). Deselect all the non-subtitle tracks (unless you also want to include for example additional audio tracks). The subtitle tracks' action will be “3GPP Text” meaning they will be OCR'ed - exactly what we need.
Click Add.
3, you'll see this:
You can safely save your file now. Note that you don't need to enable all the checkboxes of all the subtracks you want to save – while Subler only selects the first of them in the list, it'll, nevertheless, save them all.
After saving (during which Subler OCR's the just-added subtracks), the “Format” column of the just-imported subtitle tracks will change from “VobSub” to “3GPP Text”, showing they're now textual (annotated on the right; see below for the “Text” annotation on the left):
Here, you can also modify the name of the track so that you can easily see which track is bitmap and which is textual. For example, in the screenshot above, I've changed “Subtitle Track” to “Text” for all the textual subtracks (also annotated).
Now, VLC displays the new sub list the following way, making it easy to select the right subtrack based on its type (textual vs. bitmap):
Blu-ray subs
Unfortunately, as opposed to DVD subs, Subler doesn't support S_HDMV/PGS subtracks – the native sub format of Blu-ray discs. If you try to pass them through, Subler won't create a usable file; if you set Action to “3GPP Text” while opening the MKV file, no subtracks will be written to the target file.
Basically, you'll need to extract these subtracks, convert them to the Subler-friendly DVD-based IDX / SUB-format and re-add them to the MKV. Then, you'll already be able to add them, both in their original (bitmap) and OCR'ed form, to the target MP4's.
Let's start with extraction. Unfortunately, my long-time favorite, iMkvExtract, doesn't support extracting these subtracks – it just doesn't export anything if you select one or more S_HDMV/PGS subtracks.
For this tutorial, I've selected a part of the Blu-ray version of the excellent Iron Sky movie where German is spoken, so that I can provide you with a test video with three subtitle languages to play with: there are no English subtitles for English speech – and the Behind the Scenes section of the disc only contains Finnish subs, not English / Swedish ones (the Blu-ray is only sold in Finland; this is why there are not even Swedish subs here). The video chunk is HERE – feel free to download it and play with its subtracks.
1, get and install MKVtoolnix (fortunately, it's a simple DMG file). Start it.
2, click Add (annotated below) and load the MKV file:
3, in the “Tracks, chapters and tags” list, look for entries starting with “S_HDMV/PGS”. Immediately following this type, in the parentheses, there will be some (track) ID's: in the above screenshot (also annotated), these are 5, 6 and 7.
4, for the next part, you'll need to switch to the Terminal to access the command-line interface of the mkvextract program directly. Fortunately, it's part of MKVtoolnix so you don't need to install it separately.
If you've dragged MKVtoolnix to Applications/Video, just issue the following command in Terminal (assuming you're in the same directory as your source MKV file; if you aren't, use the absolute / relative path to the MKV file):
/Applications/Video/Mkvtoolnix.app/Contents/MacOS/mkvextract tracks MKVfilename trackID1:outputSUPfilename1 [trackID2:outputSUPfilename2 [trackIDN:outputSUPfilenameN]]
For example, in our case with three subtracks with ID's 5, 6, 7 and with a source MKV file named “IronSkyMAIN-rip.mkv”, the command will look as follows:
/Applications/Video/Mkvtoolnix.app/Contents/MacOS/mkvextract tracks IronSkyMAIN-rip.mkv 5:sup1.sup 6:sup2.sup 7:sup3.sup
An example screenshot with the results:
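If you'd rather not read the track IDs off the GUI, mkvmerge's identify mode (`mkvmerge -i`) lists them in the terminal. The sketch below turns such a listing into the `trackID:filename` arguments mkvextract expects. The function name `pgs_track_args`, the `supN.sup` naming and the exact wording of the listing lines (something like “Track ID 5: subtitles (S_HDMV/PGS)”) are assumptions; check what your MKVtoolnix version actually prints, and that its mkvextract uses the same IDs, before relying on it.

```shell
# Read an "mkvmerge -i" style listing on stdin and print mkvextract
# arguments ("ID:supN.sup") for every PGS subtitle track found.
# Assumes lines of the form: Track ID 5: subtitles (S_HDMV/PGS)
pgs_track_args() {
  sed -n 's/^Track ID \([0-9][0-9]*\): subtitles (.*PGS.*$/\1/p' |
    awk '{ printf "%s%s:sup%d.sup", (NR > 1 ? " " : ""), $1, NR }
         END { if (NR) print "" }'
}

# Usage (mkvmerge is assumed to sit next to mkvextract in the app bundle):
# BIN=/Applications/Video/Mkvtoolnix.app/Contents/MacOS
# args=$("$BIN/mkvmerge" -i IronSkyMAIN-rip.mkv | pgs_track_args)
# "$BIN/mkvextract" tracks IronSkyMAIN-rip.mkv $args   # $args left unquoted on purpose
```

For the sample file, this should produce the same `5:sup1.sup 6:sup2.sup 7:sup3.sup` arguments as the hand-written command above.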
Now, you'll need to convert these BD-specific sup files to traditional IDX / SUB pairs. Unfortunately, most of the traditional tools like SubtitleCreator 2.3rc1 (which I used in a previous article for DVB TS SUP -> IDX + SUB conversion) don't recognize the format; neither does SubMagic (which doesn't handle DVB TS SUP's either, BTW). The tool I recommend is, fortunately, fully OS X-compliant as it's written in Java: BDSup2Sub (dedicated thread). Just download BDSup2Sub.jar (the current, stable 4.0.1 version will be just fine) and double-click it.
When the GUI is displayed, select File > Load and load the SUP files, one by one. Just click OK on the first two dialogs to dismiss them; after that, select File > Save/Export and, there, after setting the export language, Save:
Now, to add the new, converted subtracks back to the MKV file, go back to MKVtoolnix and click the same Add button as above. Add the IDX files (only – no need to manually add the .sub files). You can mass-add them if you use the Cmd key while clicking for multiple selection. After adding the three of them, MKVtoolnix will show the following:
Now, just click “Start Muxing” at the bottom left. The MKV file will be muxed; now, with the DVD-format VobSub track, also compatible with Subler.
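The same muxing step can also be done from Terminal with mkvmerge, the command-line tool behind the GUI. A minimal sketch, assuming mkvmerge sits next to mkvextract inside the app bundle; the wrapper name `add_idx_tracks` and the output file name in the usage comment are made up for illustration:

```shell
# Path to mkvmerge; override MKVMERGE if your copy lives elsewhere.
MKVMERGE=${MKVMERGE:-/Applications/Video/Mkvtoolnix.app/Contents/MacOS/mkvmerge}

# Mux the source MKV together with one or more IDX files into a new MKV.
# $1 = output file; remaining arguments = source MKV followed by the IDX files
add_idx_tracks() {
  mux_out=$1
  shift
  "$MKVMERGE" -o "$mux_out" "$@"
}

# Usage (IDX names depend on what you saved from BDSup2Sub):
# add_idx_tracks IronSkyMAIN-rip-vobsub.mkv IronSkyMAIN-rip.mkv sup1.idx sup2.idx sup3.idx
```

As in the GUI, only the IDX files are listed; the matching .sub files are expected to sit next to them.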
Now, what you will need to do is straightforward.
1, Open the MKV file in Subler. Don't touch anything in the open dialog: do NOT try enabling the “S_HDMV/PGS” subtracks!
2, Click Add and, then, you can save your video right away (Cmd + S): it'll have the OCR'ed subtitle tracks.
If you also want to save, in the same target MP4 file, the DVD-compliant VobSub bitmap subtracks in addition to the just-created OCR'ed versions of them, you'll need to do exactly the same as was the case with DVD's. While still having the just-remuxed (target) MP4 file open in Subler, click + in the upper left corner, select the MKV file (again) and change every single VobSub track's action from the default 3GPP Text to Passthru; also, don't forget to disable all the non-VobSub-subtitle tracks (all audio / video etc. tracks) so that they aren't duplicated in the target file:
To avoid the bitmap subtitles being shown with extra large, blown-up characters, you'll also want to decrease their size after(!!!) saving (Cmd + S). (Changes made before exporting VobSub tracks won't be visible.) To do this, click each of the just-added VobSub subtitle tracks (not the older textual ones!) and enter 1920 in the first field after Scaled Size (and press Tab) and 540 in the second, instead of the original 640 and 480, respectively (if it shows 0, make sure you save the file first!):
Now, you can just save the file. (Again, here, you can also change the subtrack names to reflect their being bitmaps.)
Why just 540, you may ask? I've found it to work best. When keeping the default value (after entering 1920 in the first textfield, it'll be computed to be 1440 as can also be seen in THIS screenshot), the bitmap subs will be in the center of the screen as can be seen in the following screenshot (click it for the original-sized one):
After changing the default 1440 to 540, the subtitle will be a bit distorted (vertically scaled) but, at least, displayed at the bottom of the screen:
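The default 1440 is what you get when the original 640 x 480 track size is scaled to a width of 1920 with its 4:3 aspect ratio kept. A quick sketch of that arithmetic (the function name `scaled_height` is mine, and that Subler computes the value exactly this way is an assumption based on the numbers above):

```shell
# Height that keeps the original aspect ratio at a new width.
# $1 = new width, $2 = original width, $3 = original height
scaled_height() {
  echo $(( $1 * $3 / $2 ))
}

scaled_height 1920 640 480   # prints 1440
```

On a 1920 x 1080 video, a 1440-high subtitle canvas is taller than the picture itself, which would explain why its bottom rows end up around the middle of the screen; 540 is simply half the video height.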