It is a problem that keeps coming back: how can I download YouTube videos? There is no permanent solution, since YouTube keeps changing things. Maybe it does so partly to make downloading harder: we must "pass" through them, so they can control content better (DRM!) and profit from the traffic and data we produce while using their service... Even supposing we're on the dark side of believing their shameless lies (rather than on the bright side of thinking they are just taming us, turning our bodies into batteries that produce electro-money only for them), we might want to download a video whose author released it under a permissive license.
Searching for ready-made mainstream solutions, we just get caught in ad-land. So we search a little differently, not by much, and we find interesting sites offering real solutions. I started my own research but dropped it once I found these working solutions (this is one of those cases where you're happy to find people smarter and more efficient than you). Nonetheless, I will write down some of my research here; it could be of interest to someone now or in the future.
But first, let's look at the working solutions I've tried (my target video was always the same, but I have no reason to think they would not work with other videos). Not everybody is able to use these solutions, and because of this I am working on a C# port of one of them. I hope I will finish it (I am not a C# programmer, but it's time I started to taste it a bit...)
- GAWK solution. This was the second piece of code I tried and the first one that worked. Unluckily, the Perl scripts (one-liners) by the same author (see his other post) failed. I suspect it could be just because of the User-Agent, but I haven't tested that yet. If so, there's an easy fix.
- Youtube-dl.py: this one looks cool. It seems to support several sites (despite its name) and it looks well written. The -g option makes it print the media URL instead of downloading it, so you can use your own downloader: I tried wget (without spoofing) and it worked!
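For anyone wanting to script the -g route, here is a minimal Python sketch. It assumes a `youtube-dl` executable on the PATH (the command name is an assumption, since the script discussed here is youtube-dl.py). The names `get_direct_url`, `wget_command`, `make_request` and `BROWSER_UA` are hypothetical. The User-Agent string is only an example for the spoofing idea mentioned above for the Perl one-liners, and the sketch is untested against YouTube.

```python
import subprocess
import urllib.request

# Example browser-like User-Agent (hypothetical value, only for the spoofing idea).
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def get_direct_url(video_page_url, run=subprocess.run):
    """Ask youtube-dl for the direct media URL (-g) instead of downloading."""
    result = run(
        ["youtube-dl", "-g", video_page_url],
        capture_output=True,
        text=True,
        check=True,
    )
    # youtube-dl prints one URL per line; keep the first one.
    return result.stdout.strip().splitlines()[0]


def wget_command(direct_url, out_file):
    """Argument list equivalent to: wget -O out_file direct_url"""
    return ["wget", "-O", out_file, direct_url]


def make_request(direct_url, user_agent=BROWSER_UA):
    """A plain urllib request carrying a custom User-Agent header."""
    return urllib.request.Request(direct_url, headers={"User-Agent": user_agent})
```

Typical use would be `subprocess.run(wget_command(get_direct_url("http://www.youtube.com/watch?v=ID"), "video.flv"), check=True)`. Alternatively, you can skip wget and pass `make_request(url)` to `urllib.request.urlopen`. The `run` parameter exists only so the youtube-dl call can be replaced in tests.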
And now, the part no one is interested in: my research. Read at your own risk (if you have time to waste, I suggest studying the code of the gawk or Python solutions rather than reading what follows). If you do want to read it, consider it a muddle of scattered thoughts; readers should be at least a little computer literate.
Discontinued analysis/study of the YT case
The request URL is simply /watch?v=ID, where ID is the video_id identifying the video. This brings us to /v/ID which, requested through the browser, sends back a compressed swf file. Disassembling the file with flasm, we see that it defines a set of variables. One seems to hold the URL of the player skin (another swf file, at http://s.ytimg.com/yt/swf/cps-vfl165272.swf at least in this case). Another, video_id, is set to the same value as ID. A variable sk holds a key, which changes every time I download the file. Other variables "mirroring" URL "parameters" may appear too. From some searching, it seems that in ancient times the work this swf file does was done by a simple HTML page...
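To make the first steps concrete, here is a small Python sketch. It pulls ID out of a /watch URL, builds the corresponding /v/ID address, and undoes the "compressed swf" wrapping so the raw file can be inspected. It assumes the usual SWF header layout: a 3-byte signature ("CWS" for zlib-compressed, "FWS" for uncompressed), a version byte, and a 4-byte little-endian total length, with everything after those 8 bytes zlib-compressed in a CWS file. The host name and the function names are hypothetical.

```python
import struct
import zlib
from urllib.parse import parse_qs, urlparse


def video_id_from_watch_url(url):
    """/watch?v=ID -> ID"""
    return parse_qs(urlparse(url).query)["v"][0]


def player_swf_url(video_id, host="www.youtube.com"):
    """/v/ID serves the (compressed) player swf; the host is an assumption."""
    return "http://%s/v/%s" % (host, video_id)


def uncompress_swf(data):
    """Turn a zlib-compressed SWF ("CWS") into an uncompressed one ("FWS")."""
    signature, version = data[:3], data[3]
    if signature == b"FWS":
        return data  # already uncompressed
    if signature != b"CWS":
        raise ValueError("not a zlib-compressed SWF")
    body = zlib.decompress(data[8:])
    # The length field is the uncompressed total length (header included).
    declared = struct.unpack("<I", data[4:8])[0]
    if declared != 8 + len(body):
        raise ValueError("length field does not match the decompressed size")
    return b"FWS" + bytes([version]) + data[4:8] + body
```

The output of `uncompress_swf` is an ordinary uncompressed SWF, which is easier to poke at with generic tools before disassembling.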
At some point this swf contains code that seems to construct a URL, so I followed it a bit and wrote down the following pieces. Not so interesting after all :( A slightly more readable form of the flasm-flavoured disassembled Flash code is:
main = function ('clip') ( ... )
{
    loadClip = createEmptyMovieClip ...........
    r:2 = new MovieClipLoader .................
    clip.addCallback .......
    . (nothing interesting here...)
    .
    .
    r:3 = clip.swf
    r:4 = clip.swf.split("/")[2]
    r:4 == "s.ytimg.com"
    branchIfTrue label5
    .
    .
    .
label5:
    loadClip.loadClip(r:3, r:2)
}
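The same branch can be rendered in Python to make the control flow explicit. This is a sketch of the visible part only. `build_fallback` is a hypothetical placeholder for the elided code that runs when the domain check fails; it is not something recovered from the swf.

```python
def swf_domain(url):
    """ActionScript url.split("/")[2]: for "http://host/path" index 2 is the host."""
    return url.split("/")[2]


def choose_skin_url(clip_swf, build_fallback):
    """Return the URL that loadClip.loadClip() would end up loading.

    If the skin swf lives on s.ytimg.com it is loaded as-is (branchIfTrue label5);
    otherwise the elided path builds another URL, represented here by the
    hypothetical build_fallback callable.
    """
    if swf_domain(clip_swf) == "s.ytimg.com":
        return clip_swf
    return build_fallback(clip_swf)
```

With the skin URL found in this swf, `choose_skin_url("http://s.ytimg.com/yt/swf/cps-vfl165272.swf", ...)` returns the URL unchanged, which matches the "domain matches" case.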
If the domain is not s.ytimg.com, it builds a URL. That is not what happens here, but it could be interesting.
* CASE B *
r:5 = clip.swf.indexOf("-vfl")
r:6 = clip.swf.indexOf(".swf")
r:7 = clip.swf.indexOf("/swf/") + 5
r:8 = "cps"
if not (r:5 > -1) then
    r:8 = clip.swf.substring(r:7, r:6)
end if
r:9 = loadClip._url.split("/")[2]
r:3 = "http://" + r:9 + "/swf/" + r:8 + ".swf"
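Here is a line-by-line Python rendering of CASE B, with the register names kept in comments. `loader_url` stands in for `loadClip._url`, whose value is unknown here. Python slicing only approximates ActionScript's `substring`, and the two differ in edge cases (for example when ".swf" or "/swf/" is missing), so treat this as a reading aid.

```python
def case_b_url(clip_swf, loader_url):
    """Python rendering of CASE B; loader_url stands for loadClip._url."""
    vfl_pos = clip_swf.find("-vfl")          # r:5
    ext_pos = clip_swf.find(".swf")          # r:6
    name_start = clip_swf.find("/swf/") + 5  # r:7
    name = "cps"                             # r:8
    if not vfl_pos > -1:
        # No "-vfl" version tag: take the file name between "/swf/" and ".swf".
        name = clip_swf[name_start:ext_pos]
    host = loader_url.split("/")[2]          # r:9
    return "http://" + host + "/swf/" + name + ".swf"  # r:3
```

So a versioned skin (one containing "-vfl") always maps to cps.swf on the loader's host, while an unversioned one keeps its own name.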
The interesting part seems to be when the domain is not s.ytimg.com. But there the _url variable of loadClip appears, and it is not set anywhere here... at least in this case. The domain matches, so there is no need for _url to be set, but I wonder what happens when it does not match. I actually suspect this is the wrong swf to look at. Maybe URL parameters change things and this very same code serves other "purposes" too. It is interesting to note that this Flash file says
System.security.allowDomain("*")
So, theoretically, this swf can also be used externally; that is obvious when you think about embedding. The set of variables the code assigns is:
iurl = 'http://i4.ytimg.com/vi/ID/hqdefault.jpg'
el = 'embedded'
fs = '1'
title = '...'
avg_rating = '4.7547...'
video_id = 'ID'
length_seconds = '..'
allow_embed = '1'
swf = 'http://s.ytimg.com/yt/swf/cps-vfl165272.swf'
sk = 'TK755pvEYU-oGqmzRTwz7fq1dipYreRnC'
rel = '1'
cr = 'US'
eurl = ''
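If you want to compare dumps like this one across several downloads (for example, to watch sk change), a small Python parser for the `name = 'value'` listing helps. This sketch assumes the dump has exactly the one-assignment-per-line shape shown above; values containing a quote are not handled. The `thumbnail_matches` check only encodes the observation that iurl embeds the same ID.

```python
import re

# Matches lines of the form:  name = 'value'
VAR_LINE = re.compile(r"^\s*(\w+)\s*=\s*'([^']*)'\s*$")


def parse_var_dump(text):
    """Turn the name = 'value' listing into a dict; non-matching lines are skipped."""
    variables = {}
    for line in text.splitlines():
        m = VAR_LINE.match(line)
        if m:
            variables[m.group(1)] = m.group(2)
    return variables


def thumbnail_matches(variables):
    """True if iurl (.../vi/ID/hqdefault.jpg) contains the same video_id."""
    return "/vi/%s/" % variables["video_id"] in variables["iurl"]
```

Diffing two parsed dicts of the same video then shows directly which variables change between requests.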
I wonder what would happen if allow_embed were 0 and I changed it to 1, using this as an "embedding" trampoline.
In the HTML page of the video (the one we get with /watch?v=ID) there are alternate addresses serving information as JSON+oEmbed or XML+oEmbed (oEmbed). Poking around these, we can find an address to use with RTSP (Real Time Streaming Protocol). I tried this road: mplayer understands the protocol, but the Google RTSP server does not seem to like it much and drops the connection (SDP is used too).
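Finding those alternate addresses can be automated. Under the oEmbed discovery convention, the page carries `<link rel="alternate">` tags whose type is `application/json+oembed` or `text/xml+oembed`. The Python sketch below collects their href values using only the standard library. The sample href in the test is made up for illustration, and the sketch stops there; it does not cover how the RTSP address is reached from these endpoints.

```python
from html.parser import HTMLParser

OEMBED_TYPES = ("application/json+oembed", "text/xml+oembed")


class OEmbedLinkFinder(HTMLParser):
    """Collects the href of <link rel="alternate" type="...+oembed"> tags."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        if a.get("rel") == "alternate" and a.get("type") in OEMBED_TYPES:
            self.links[a["type"]] = a.get("href")


def find_oembed_links(page_html):
    """Map each oEmbed content type found in the page to its href."""
    finder = OEmbedLinkFinder()
    finder.feed(page_html)
    return finder.links
```

HTMLParser also routes self-closing `<link ... />` tags through `handle_starttag`, and it unescapes attribute values, so `&amp;` in an href comes back as `&`.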
Once upon a time there was a so-called get_video API. It seems to still work, but the way we get the needed parameters is different (see youtube-dl.py with the -g option), and so are the parameters themselves. The URL given by youtube-dl.py contains video_id (which is ID), t (a token... could it be sk? They don't seem to be the same), eurl (empty...), el (detailpage), ps (default), gl (US) and hl (en); some of these appear in the analysed swf too. But the most important one is surely the token (t).
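To compare such URLs (for instance, the t token against sk), a Python sketch for splitting and rebuilding them follows. The parameter names are the ones listed above. The host and the `/get_video` path are assumptions, and `build_get_video_url` and `get_video_params` are hypothetical helpers, not part of youtube-dl.py.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def get_video_params(url):
    """Flatten the query string of a get_video-style URL into a dict."""
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    return {key: values[0] for key, values in query.items()}


def build_get_video_url(video_id, token, host="www.youtube.com", **extra):
    """Hypothetical rebuild: host and path are assumptions; names follow youtube-dl.py's URL."""
    params = {"video_id": video_id, "t": token}
    params.update(extra)  # e.g. eurl, el, ps, gl, hl
    return "http://%s/get_video?%s" % (host, urlencode(params))
```

`keep_blank_values=True` matters here because eurl is usually empty and would otherwise be dropped from the parsed dict.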
Discontinued. Youtube-dl.py works; I'll look at how it does its job and write something that can be run on Windows machines by people not interested in installing Python on their system (bad, very bad).