Geek Blight - Origins of the youtube-dl project

Origins of the youtube-dl project

Posted on 2020-11-07T13:52Z. Updated on 2020-11-10T16:28Z.

As you may know, at the time of writing, youtube-dl’s repository on GitHub is blocked due to a DMCA takedown letter received by GitHub on behalf of the RIAA. While I cannot comment on the current maintainers’ plans or ongoing discussions, in light of the claims made in that letter I thought it would be valuable to write down, as the project creator and initial maintainer, the story of youtube-dl’s first years.

Copper thieves

All good stories need at least one villain, so I have arbitrarily chosen copper thieves as the villains of the story that set in motion what youtube-dl is today. Back in 2006 I was living in a town 5 to 10 kilometers away from Avilés, which is itself a small city or town in northern Spain. While people in Avilés enjoyed some nice infrastructure and services, including cable and ADSL Internet access, the area I lived in lacked those advantages. I was too far away from the telephone exchange to get ADSL, and copper thieves had been stealing the copper wires along the way to it. This caused occasional telephone service outages and made the telephone company replace those wires with weaker and thinner ones, knowing they would likely be stolen again. This had been going on for several years at that point.

This meant that, until then, my only option for home Internet access had been a dial-up connection with a 56k V.90 modem. In fact, connection quality was so poor that I had to limit the modem to 33.6 kbps mode so the connection would at least be stable. Actual download speeds rarely surpassed 4 KB/sec. YouTube was gaining popularity at the time, to the point that Google bought it at the end of that year.

Up all night to get some bits

Watching any YouTube video on the kind of connection I described above was certainly painful, as you can imagine. Any moderately big video would take ages to download. For example, if you do the math, a short 10 MB video would take over 40 minutes to download, making streaming impossible. A longer, higher-quality video would take several hours and render the connection unusable for other purposes while you waited for it, not to mention the possibility of the connection dropping and having to start the download again. Now imagine liking a specific video a lot after watching it and wanting to watch it a second or third time. Going through that process again was almost an act of masochism.
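For reference, the arithmetic behind that estimate, assuming 1 MB = 1024 KB and a sustained rate of 4 KB/s (the best case, since actual speeds rarely surpassed it):

```latex
t = \frac{10\ \text{MB}}{4\ \text{KB/s}}
  = \frac{10 \times 1024\ \text{KB}}{4\ \text{KB/s}}
  = 2560\ \text{s}
  \approx 42.7\ \text{min}
```

Any dip below 4 KB/s, or a dropped connection, only pushed that figure higher.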

This situation made me interested in the possibility of downloading the videos I was trying to watch: if the video was interesting, having a copy meant I could watch it several times easily. Also, if the downloader was any good, maybe the download process could be resumed if the connection was interrupted, as it frequently was.

At the time, there were other solutions to download videos from YouTube, including a quite popular Greasemonkey script. By pure chance, none of the few I tested were working when I tried them, so I decided to explore the possibility of creating my own tool. And that is, more or less, how youtube-dl was born. I made it a command-line program so it would be easy for me to use, and wrote it in Python because its extensive standard library made the job easy, with the nice side effect that the program would be platform independent.

An Ethereal start

The initial version of the program only worked for YouTube videos. It had almost no internal design whatsoever because none was needed: it was a simple script that went straight to the point. It was merely 223 lines long, with only 143 of them being actual code, 44 comments and 36 blank. The name was chosen out of pure convenience: youtube-dl was an obvious name, hard to forget, and it could be intuitively typed as “Y-O-U-TAB” in my terminal.

Having used Linux for several years at that point, I decided to publish the program under a free software license (MIT for those first versions) just in case someone found it useful. Back then, GitHub did not exist and we had to “make do” with SourceForge, which had a somewhat tedious form you needed to fill in to create a new project. So, instead of going to SourceForge, I quickly published it in the web space my Internet provider gave me. While uncommon today, it was normal back then for ISPs to give you an email address and some web space you could upload files to using FTP. That way, you could have your own personal website on the net. The first version ever made public was 2006.08.08, although I had probably been using the program for a few weeks at that point.

To create the program, I studied what the web browser did when playing a YouTube video in Firefox. If I recall correctly, Firefox didn’t yet have the development tools it has today to analyze network activity. Connections were mostly HTTP, and Wireshark, known as “Ethereal” until that year, proved invaluable for inspecting the network traffic coming in and out of my box when loading a YouTube video. I wrote youtube-dl with the specific goal of doing the same things the web browser did to retrieve the video. It even sent a User-Agent string copied verbatim from Firefox for Linux, to make sure the site would send the program the same version of the video web pages I had used to study the browser’s behavior.
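A minimal Python 3 sketch of that User-Agent trick (the original 2006 script would have used Python 2’s urllib2). The exact string the early youtube-dl sent is not given in this post, so the Firefox 1.5 for Linux value below is only an illustrative assumption, and VIDEO_ID is a placeholder:

```python
import urllib.request

# Illustrative Firefox-for-Linux User-Agent of that era; an assumption,
# NOT the verbatim string the original youtube-dl sent.
FIREFOX_UA = (
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.4) "
    "Gecko/20060508 Firefox/1.5.0.4"
)


def build_request(url):
    """Build a GET request that identifies itself as a desktop browser."""
    return urllib.request.Request(url, headers={"User-Agent": FIREFOX_UA})


req = build_request("http://www.youtube.com/watch?v=VIDEO_ID")
# urllib stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))
# Actually fetching the page would be: urllib.request.urlopen(req).read()
```

The point of the header is simply that the server cannot tell the script apart from the browser whose traffic was studied, so the page it returns matches what Wireshark showed.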

In addition, YouTube used Adobe Flash for its player back then. Videos were served as Flash Video (FLV) files, and all this meant a proprietary plugin was required to watch them in the browser (many will remember the dreaded libflashplayer.so library), which would have made any browser development tools useless. This proprietary plugin was a constant source of security advisories and problems. I used a Firefox extension called Flashblock that prevented the plugin from being loaded by default and replaced plugin-based embedded content in web pages with placeholder elements containing a clickable icon, so content would be loaded only on demand and the plugin library would not be used unless the user requested it.

Flashblock had two nice side effects apart from making the browsing experience more secure. On the one hand, it removed a lot of noisy and obnoxious ads from many web pages, which could also be a source of security problems when served by third parties. On the other hand, it made it easier to analyze how the video player downloaded videos. I would wait until the video page had finished loading completely and then start capturing traffic with Wireshark just before clicking the embedded player’s placeholder icon to let it load. This way, the only traffic to analyze was the plugin downloading the video player application and the application itself downloading the video.

It’s also worth noting that the Flash Player plugin back then already downloaded a copy of those videos to your hard drive (they were stored in /tmp under Linux), and many users relied on that to keep a copy without using additional tools. youtube-dl was simply more convenient because, for example, it could automatically retrieve the video title and use it to name the file appropriately.

Ahh, fresh meat!

The Flash Player plugin was eventually modified so videos wouldn’t be so easy to grab. One of the first measures was to unlink the video file after creating it, so the i-node would still exist and remain available to the process using it (until it was closed) while the file stayed invisible from the file system’s point of view. It was still possible to grab the file by using the /proc file system to examine the file descriptors held by the browser process, but with every one of those small steps youtube-dl turned out to be more and more convenient.
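A minimal, Linux-only Python sketch of why the unlinked file was still reachable: it creates a temporary file, keeps a descriptor open, removes the path, and then reopens the still-live i-node through /proc/self/fd. The file contents are placeholder bytes; grabbing the plugin’s cache from another process would mean looking under /proc/<browser PID>/fd instead, which requires running as the same user or having enough privileges.

```python
import os
import tempfile


def read_via_proc(fd):
    """Re-open an open (possibly unlinked) file through /proc/self/fd."""
    # Opening the /proc magic link creates a new open file description,
    # so reading starts at offset 0 regardless of where fd's offset is.
    with open(f"/proc/self/fd/{fd}", "rb") as f:
        return f.read()


# Create a file and keep a descriptor open, as the plugin did with its cache.
fd, path = tempfile.mkstemp()
os.write(fd, b"FLV placeholder bytes")
os.unlink(path)  # the name is gone, but the i-node lives on while fd is open

print(os.path.exists(path))  # False: invisible in the file system
print(read_via_proc(fd))     # b'FLV placeholder bytes': data still reachable
os.close(fd)                 # last reference dropped; the kernel frees it
```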

Like many free and open source enthusiasts back then, I used Freshmeat to subscribe to new releases of projects I was interested in. When I created youtube-dl, I also created a project entry for it on that website so users could easily get notified of new releases and see a change log listing new features, fixes and improvements. Freshmeat could also be browsed to find new and interesting projects, and its front page showed the latest updates, which usually amounted to only a few dozen a day. My guess is that this is how Joe Barr (rest in peace), an editor for linux.com, found out about the program and decided to write an article about it back in 2006. Linux.com was a bit different then, and I think it was one of the most frequently visited sites for Linux enthusiasts, together with other classics like Slashdot or Linux Weekly News. At least, it was for me.

From that point on, youtube-dl’s popularity started to grow and I began getting emails from time to time thanking me for creating and maintaining the program.

Measuring buckets of bits

Fast forward to 2008. youtube-dl’s popularity had kept growing slowly, and users frequently asked me to create similar programs to download from more sites, a request I had granted a few times. It was then that I decided to rewrite the program from scratch and make it support multiple video sites natively. I had some simple ideas for separating the program internals into several pieces. Simplifying, the two most important parts would be the file downloader, common to every website, and the information extractors: objects (classes) containing code specific to a video site. When given a URL or pseudo-URL, the information extractors would be queried to find out which one could handle that type of URL, and that one would then be asked to extract information about the video or list of videos, with the primary goal of obtaining the video URL or a list of video URLs with available formats, together with other metadata such as the video titles.
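A minimal Python sketch of that split, under stated assumptions: the class names echo the terminology above, but everything else (the VideoInfo fields, the example.com extractor, the URL patterns) is hypothetical and is not youtube-dl’s actual code. Each extractor answers “can you handle this URL?” through a regular expression, and the site-independent downloader asks them in turn.

```python
import re
from dataclasses import dataclass, field


@dataclass
class VideoInfo:
    """Metadata an extractor returns for one video (hypothetical fields)."""
    url: str
    title: str
    formats: list = field(default_factory=list)


class InfoExtractor:
    """Base class: one subclass per video site."""

    _VALID_URL = None  # regex for the URLs this extractor handles

    @classmethod
    def suitable(cls, url):
        """Answer the 'can you handle this URL?' query."""
        return cls._VALID_URL is not None and re.match(cls._VALID_URL, url) is not None

    def extract(self, url):
        raise NotImplementedError


class ExampleTubeIE(InfoExtractor):
    """Hypothetical extractor for a made-up site on example.com."""

    _VALID_URL = r"https?://(?:www\.)?example\.com/watch\?v=(?P<id>\w+)"

    def extract(self, url):
        video_id = re.match(self._VALID_URL, url).group("id")
        # A real extractor would download and parse the video page here;
        # this sketch just derives placeholder values from the ID.
        return [VideoInfo(url=f"https://cdn.example.com/{video_id}.flv",
                          title=f"Video {video_id}",
                          formats=["flv"])]


class FileDownloader:
    """Site-independent part: picks an extractor, then would fetch the files."""

    def __init__(self, extractors):
        self.extractors = list(extractors)

    def process(self, url):
        for ie in self.extractors:
            if ie.suitable(url):
                # Downloading each VideoInfo.url would happen here.
                return ie.extract(url)
        raise ValueError(f"no extractor can handle {url}")


downloader = FileDownloader([ExampleTubeIE()])
for info in downloader.process("https://www.example.com/watch?v=abc123"):
    print(info.title, info.url)
```

Supporting a new site then only means adding another subclass to the list; the downloader never needs to know site-specific details.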

I also took the chance to switch version control systems and change where the project was hosted. At that moment, Git was winning the distributed version control systems war for open source projects, but Mercurial also had a lot of users and, having tested both, I decided I liked it a bit more than Git. I started using it for youtube-dl and moved the project to Bitbucket, which was the natural choice: back then, Bitbucket could only host Mercurial repositories, while GitHub only hosted Git repositories. Both were launched in 2008 and were a breath of fresh air compared to SourceForge. The combination of compartmentalized per-user project namespaces (i.e. the name of your project did not have to be globally unique, only unique among your own projects) with distributed version control systems meant you could publish your personal projects on either site in a matter of minutes. In any case, migrating the project history to Git and moving the project to GitHub was still a couple of years away.

When rewriting the project, I should have taken the chance to rename it, no doubt, but I didn’t want to confuse existing users and kept the name in an effort to preserve what little popularity the program had.

The technological context at home also changed a bit that year. Mobile data plans started to gain traction and, at the end of the year, I got myself a 3G modem and a data plan that, for the first time, allowed me to browse the web at decent speeds. Even so, that didn’t make me stop using youtube-dl. I was paying 45 euros a month for a plan capped at 5GB of data per month. Connection speed was finally great but, doing the math, I could only use an average of around 150MB a day, which meant I had to be selective when using the network and avoid big downloads if possible. youtube-dl helped a lot to avoid downloading large video files multiple times.

Episode: a new home

Some time later, at the end of 2009, I moved and finally started living with my girlfriend (now my wife and the mother of my two children) in Avilés. For the first time, I started accessing the Internet using the type of connection and service that had been the standard for many of my friends and family for many years. I remember it was a 100/10 Mbps (down/up) cable connection with no monthly cap. That change definitely marked a turning point in how often I used youtube-dl and how much attention I paid to the project.

Not much later, I finally moved it to Git and GitHub, once the market had spoken and both tools were clearly the way to go. YouTube also started experimenting with HTML5 video, even if it wouldn’t become the default option until around 2015. By 2011 I had been working full-time as a software engineer for several years and, in general, I was not eager to get home and write more code tuning youtube-dl or implementing the most popular feature request, which I was probably not going to use personally.

In the second half of 2011, I was in the middle of another important personal software project and decided to step down as the youtube-dl maintainer, knowing I hadn’t been up to the task for several months. Philipp Hagemeister had proved to be a great coder and had some pending pull requests on GitHub with several fixes many people were interested in. I gave him commit access to my youtube-dl repo, and that’s mostly the end of the story on my side. The project’s Git master branch log shows I had a continuous stream of commits until March 2011, after which they jump to August 2011 to merge a fix by Philipp. Since then, there has only been a single clerical commit, in 2013, to change rg3.github.com to rg3.github.io in the source code. That was needed when GitHub moved user pages from USERNAME.github.com to USERNAME.github.io, if I recall correctly, to avoid security problems with malicious user web pages being served from its own official github.com domain.

While I was basically not involved as a developer of youtube-dl, for years the official project page kept sitting under my username at https://github.com/rg3/youtube-dl and https://rg3.github.io/youtube-dl/. I only had to show up when Philipp or other maintainers asked me to give commit access to additional developers, like Filippo Valsorda at the time or Sergey, one of the current maintainers. Unfortunately, in 2019 we had a small troll problem in the project issue tracker and only project owners were allowed to block users. This made us finally move the project to a GitHub organization where everyone with commit access was invited (although not everyone joined). The GitHub organization has allowed project maintainers to act more freely without me having to step in for clerical tasks every now and then.

I want to reiterate my most sincere thanks to the different project maintainers over these years, who greatly improved the code, built an actual community of contributors around it and made the project immensely more popular than it was when I stepped down almost 10 years ago, serving the needs of thousands of people along the way.

Offline and free

I’d like to stress once more that the purpose of youtube-dl as a tool has barely changed over its 14 years of existence. Both before and after the RIAA’s DMCA letter was received, many people have explained how they use youtube-dl with different goals in mind.

For me, it has always been about offline access to videos that are already available to the general public online. In a world of mobile networks and always-on Internet connections, you may wonder if that’s really needed. It must be, I guess, if Netflix, Amazon, Disney and HBO have all implemented similar functionality in their extremely popular streaming applications. On long road trips, trips abroad (especially with kids), on the underground, on an airplane, or in a place with poor connectivity or metered connections, having offline access to that review, report, podcast, lecture, piece of news or work of art is incredibly convenient.

An additional side effect of youtube-dl is enabling online access when the default online interface is not up to the task. The old proprietary Flash plugin was not available for every platform and architecture you might choose. Nowadays, web browsers can play video but may not always take advantage of available GPU-accelerated decoding, wasting large amounts of battery power along the way. youtube-dl can be combined with a native video player to make playing some videos possible and/or efficient. For example, mpv includes native youtube-dl support: you only need to feed it a URL from a supported video site, and it will use youtube-dl to access the video stream and play it without storing anything on your hard drive.

The default online interface may also lack accessibility features, make content navigation hard for some people or lack color blindness filters that, again, may be available in a native video player application.

Last, but not least, tools like youtube-dl allow people to access online videos using only free software. I know there are not many free, libre and open source software purists out there. I don’t consider myself one either, not by a long shot. Proprietary software is ever-present in our modern lives and is served to us every day in the form of vast amounts of JavaScript code for our web browsers to run, with many different purposes that are not always in the best interest of users. GDPR, with all its flaws and problems, is a testament to that. Accessing online videos using youtube-dl may give you a peace of mind that incognito mode, uBlock Origin or Privacy Badger can only barely approach.
