ShinTakezou's Blog: The YT case (part one)

It is a problem that goes and returns back periodically: how can I download YouTube videos? It's not a problem that has a forever solution since YouTube changes things. Maybe it does so also to make it harder to download videos: we must "pass" through them, so they can control contents better (DRM!) and earn through traffic and data we produce using their service... Even supposing we're on the dark side of believing their shameless lies (rather than on the bright side of thinking they are just taming us turning our bodies in batteries that produce electro-money only for them), we could wish to download a video the author released after some permissive license.

Searching for already-made mass solutions we are just caught in the ads-worlds. So we seek a little bit differently, not too much, and we find interesting sites giving real solutions. I've started my alone researches but dropped them since I've found these working solutions (this is one of the cases you're happy to find people smarter and more efficient than you). Nonetheless I will write some of my researches here, they could be of interest for someone now or in the future.

But first let's see working solutions I've tried (my video target was always the same but I've no reasons why to think they should not work with other videos). Not everybody is able to use these solution and because of this I am working on a C# porting of one of these. Hope I will finish it (I am not a C# programmer, but it's time I start to taste it a bit...)

GAWK solution. This was the second found code I've tried and the first to work. Unluckly perl scripts (one-liners) by the same author (see his other post) failed. I suspect it could be only because of the User-Agent, but I've not done tests yet. If it would be so, there's an easy fix.

Youtube-dl.py; this one looks cool, it seems to support several sites (despite its name), it looks well-written, and the -g option can be specified if one wants to use his/her custom downloader: I've tried wget (without spoofing) and worked!

And now, the part no-one is interested in: my researches. Read at your own risk (if you can waste your time I suggest studying the code of the gawk or python solutions, rather than reading what follows). If you want to read it, consider it as a muddle of scattered thoughts; possible audience maybe should be a little bit computer literate.

Discontinued analysis/study of the YT case

Request URL is simply /watch?v=ID where ID is a video_id identifying the video. This brings us to /v/ID, through the browser it sends back a compressed swf file. Disassembling the file with flasm we see defining a set of variables; one seems to hold the URL of the skin of the player (another swf file at http://s.ytimg.com/yt/swf/cps-vfl165272.swf at least in this case); video_id set to the same value of ID; a variable sk holds a key, it changes every time I download the file. They may appear other variables "mirroring" URL "parameters". Searching it seems like the work this swf file does was indeed done by a simple html file in ancient ages...

At some point this swf contains code that seems to construct a URL, so I followed it a bit and wrote the following pieces. Not so interesting after all :( A a little bit more readable form of the flasm-flavoured flash disassembled code is

main = function ('clip') ( ... )
{
  loadClip = createEmptyMovieClip ...........
  r:2 = new MovieClipLoader .................
  clip.addCallback .......
  
  . (nothing interesting here...)
  .
  .
  
  r:3 = clip.swf
        i.e.  "http://s.ytimg.com/yt/swf/cps-vfl165272.swf"
                         1         2         3         4
               0123456789012345678901234567890123456789012
  r:4 = clip.swf.split("/")[2]
        gets the domain ("s.ytimg.com")
   
  r:4 == "s.ytimg.com"
  branchIfTrue label5

        i.e. is the domain s.ytimg.com? Yes, in this case, so
        branch
  .
  . (see CASE B code)
  .
label5:
  loadClip.loadClip(r:3, r:2)

        where r:3 is clip.swf, see above
          and r:2 is an instance of MovieClipLoader
}

In case the domain is not s.ytimg.com, it builds an URL; this is not the case but it could be interesting.

* CASE B *

r:5 = clip.swf.indexOf("-vfl")
          r:5 is 29 in my case
r:6 = clip.swf.indexOf(".swf")
         r:6 is 39 in my case
r:7 = clip.swf.indexOf("/swf/") + 5
         r:7 is 21 + 5 (index pointing to cps- or whatever
                comes after /swf/ part
r:8 = "cps"
if not (r:5 > -1) then
     // -vfl not found in the string clip.swf
     r:8 = clip.swf.substring(r:7, r:6)
           i.e. everything after /swf/, less the three letters ext
end if

r:9 = loadClip._url.split("/")[2]
r:3 = "http://" + r:9 + "/swf/" + r:8 + ".swf"
      i.e. builds
         http://domain of _url/swf/thing.swf

The interesting part seems to be when the domain is not s.ytimg.com. But it appears the _url variable of clip, which is not set nowhere here... in this case at least. The domain matchs so no need to have _url set, but I wonder when it does not match. I suspect indeed this one is the wrong swf to look for. Maybe URL parameters may change things and this very same code serves other "purposes" too. Interesting to note that this flash says

System.security.allowDomain("*")

So theoretially this swf is usable also externally; this is obvious thinking about embedding. The set of variables the code assigns is:

// the "cover" of the video
iurl = 'http://i4.ytimg.com/vi/ID/hqdefault.jpg'
el   = 'embedded'  // embedded where? in the default flash player?
fs   = '1'         // full screen
title = '...'
avg_rating = '4.7547...'  // wow how many digits for the rating!
video_id = 'ID'
length_seconds = '..' // number for its length
allow_embed = '1'     // interesting...
swf = 'http://s.ytimg.com/yt/swf/cps-vfl165272.swf'
// Security Key? ?
sk  = 'TK755pvEYU-oGqmzRTwz7fq1dipYreRnC' // or alike
rel = '1'
cr  = 'US'
eurl = ''

I am wondering what happens if allow_embed is 0 and I modify it into 1 and use this as "embedding" trampoline.

In the html page of the video (the one we get with /watch?v=ID there are alternate addresses, serving informations in JSON+OMBED or XML+OEMBED, oembed, flying around these we can find an address to use with the RTS Protocol, tried this road, mplayer understand the protocol, but it seems the Google RTSP Server does not like too much it and stop the connection. (SDP used too).

Once upon a time it existed a so called get_video API, it seems to work still but it is different the way we can get the needed parameters (see youtube-dl.py with -g option), which are different too. In the URL given by youtube-dl.py appear video_id (which is ID), t (token... could it be sk? They are not the same, it seems), eurl (null...), el (detailpage), ps (default), gl (US), hl (en); some appears in the analysed swf too. But the most important is for sure the token (t).

Discontinued. Youtube-dl.py works, I'll look how it acts and write something runnable on Windows machine by people not interested in installing python on their system (bad very bad).

ShinTakezou's Blog

2010-05-24

The YT case (part one)

Discontinued analysis/study of the YT case

1 comment:

Other links