Compare commits


No commits in common. "master" and "dev" have entirely different histories.
master ... dev

10 changed files with 268 additions and 802 deletions

.gitignore (vendored)

```diff
@@ -1,6 +1,4 @@
 .idea
-targets.*
-!targets.sample.*
 # ---> Python
 # Byte-compiled / optimized / DLL files
```

Example systemd Service.md

````diff
@@ -1,14 +1,21 @@
 # Example systemd Service
-`/etc/systemd/system/youtube-dl.service`
+`/home/user/youtubedl-daemon.sh`
+```bash
+#!/bin/bash
+/usr/bin/python3 /home/user/automated-youtube-dl/downloader.py --daemon --sleep 60 "https://www.youtube.com/playlist?list=example12345" "/mnt/nfs/archive/YouTube/Example Playlist/"
+```
+`/lib/systemd/system/youtubedl.service`
 ```systemd
 [Unit]
 Description=Youtube-DL Daemon
 After=network-online.target
 [Service]
-ExecStart=/usr/bin/python3 /home/user/automated-youtube-dl/downloader.py --daemon --silence-errors --sleep 60 "https://www.youtube.com/playlist?list=example12345" "/mnt/nfs/archive/YouTube/Example Playlist/"
+ExecStart=/home/user/youtubedl-daemon.sh
 User=user
 Group=user
@@ -16,17 +23,9 @@ Group=user
 WantedBy=multi-user.target
 ```
-Now start the service:
+Now start the service
 ```bash
+chmod +x /home/user/youtubedl-daemon.sh
 sudo systemctl daemon-reload
-sudo systemctl enable --now youtube-dl
+sudo systemctl enable --now youtubedl
 ```
-You can watch the process with:
-```bash
-sudo journalctl -b -u youtube-dl.service
-```
````
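The example unit does not restart the daemon if it crashes. A sketch of a hardened variant (not from either branch; the `Restart*` and `Wants` options come from systemd itself, and the paths follow the example above):

```systemd
[Unit]
Description=Youtube-DL Daemon
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/home/user/youtubedl-daemon.sh
User=user
Group=user
# Restart the wrapper script if it exits with an error, but not in a tight loop.
Restart=on-failure
RestartSec=30

[Install]
WantedBy=multi-user.target
```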

README.md

````diff
@@ -1,111 +1,68 @@
 # automated-youtube-dl
 _Automated YouTube Archival._

 A wrapper for youtube-dl used for keeping very large amounts of data from YouTube in sync. It's designed to be simple and easy to use.

-I have a single, very large playlist that I add any videos I like to. This runs as a service on my NAS (see [Example systemd Service.md]).
+I have a single, very large playlist that I add any videos I like to. On my NAS, a service uses this program to download new videos (see [Example systemd Service.md]).

----
-## Project Status
-
-This project is archived. I was working on a web interface for this project but decided to just use [tubearchivist](https://github.com/tubearchivist/tubearchivist) rather than write my own. If tubearchivist does not meet my needs then I will restart work on this project.
-
----
-
 ### Features
-
 - Uses yt-dlp instead of youtube-dl.
-- Skips videos that are already downloaded.
+- Skip videos that are already downloaded, which makes checking a playlist for new videos quick because youtube-dl doesn't have to fetch the entire playlist.
 - Automatically update yt-dlp on launch.
 - Download the videos in a format suitable for archiving:
   - Complex `format` that balances video quality and file size.
   - Embedding of metadata: chapters, thumbnail, english subtitles (automatic too), and YouTube metadata.
 - Log progress to a file.
 - Simple display using `tqdm`.
 - Limit the size of the downloaded videos.
 - Parallel downloads.
-- Daemon mode for running as a system service.
+- Daemon mode.

 ### Installation
-
 ```bash
-sudo apt update && sudo apt install ffmpeg atomicparsley phantomjs
+sudo apt update && sudo apt install ffmpeg atomicparsley
 pip install -r requirements.txt
 ```

 ### Usage
-
-This program has 3 modes:
-
-**Direct-Download Mode:**
-In this mode, you give the downloader a URL to the media you want to download.
-`./downloader.py <video URL to download> --output <output directory>`
-
-**Config-File Mode:**
-In this mode, you give the downloader the path to a config file that contains the URLs of the media and where to download them to.
-The config file can be a YAML file or a TXT file with the URL to download on each line.
-When using the YAML file (see [targets.sample.yml]): `./downloader.py <path to the config file>`
-When using a TXT file: `./downloader.py <path to the config file> --output <output directory>`
-
-**Daemon Mode:**
-In this mode, the downloader will loop over the media you give it and sleep for a certain number of minutes.
-To run as a daemon, do:
-`/usr/bin/python3 /home/user/automated-youtube-dl/downloader.py --daemon --sleep 60 <video URL or config file path>`
-`--sleep` is how many minutes to sleep after completing all downloads.
-Daemon mode can take a URL (like direct-download mode) or a path to a config file (like config-file mode).
+`./downloader.py <URL to download or path of a file containing the URLs of the videos to download> <output directory>`
+To run as a daemon, do:
+`/usr/bin/python3 /home/user/automated-youtube-dl/downloader.py --daemon --sleep 60 <url> <output folder>`
+`--sleep` is how many minutes to sleep after completing all downloads.

 #### Folder Structure
 ```
 Output Directory/
 ├─ logs/
 │  ├─ youtube_dl-<UNIX timestamp>.log
 │  ├─ youtube_dl-errors-<UNIX timestamp>.log
 ├─ download-archive.log
 ├─ Example Video.mkv
 ```
 `download-archive.log` contains the videos that have already been downloaded. You can import videos you've already downloaded by adding their ID to this file.

 Videos will be saved using this name format:
 ```
-[%(id)s] [%(title)s] [%(uploader)s] [%(uploader_id)s]
+%(title)s --- %(uploader)s --- %(uploader_id)s --- %(id)s
 ```

 #### Arguments
 | Argument | Flag | Help |
 |---|---|---|
 | `--no-update` | `-n` | Don't update yt-dlp at launch. |
 | `--max-size` | | Max allowed size of a video in MB. Default: 1100. |
 | `--rm-cache` | `-r` | Delete the yt-dlp cache on start. |
 | `--threads` | | How many download processes to use (threads). Default is how many CPU cores you have. You will want to find a good value that doesn't overload your connection. |
 | `--daemon` | `-d` | Run in daemon mode. Disables progress bars and sleeps for the amount of time specified in `--sleep`. |
 | `--sleep` | | How many minutes to sleep when in daemon mode. |
-| `--silence-errors` | `-s` | Don't print any error messages to the console. Errors will still be logged in the log files. |
-| `--ignore-downloaded` | `-i` | Ignore videos that have been already downloaded and let youtube-dl handle everything. Videos will not be re-downloaded, but metadata will be updated. |
````
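Both branches name files with a yt-dlp output template, which behaves like Python `%`-formatting over the video's info dict. A quick sketch with made-up metadata (the dict values are invented; the field names `id`, `title`, `uploader`, `uploader_id` are real yt-dlp fields):

```python
# Hypothetical info dict for one video.
info = {
    'id': 'dQw4w9WgXcQ',
    'title': 'Example Video',
    'uploader': 'Example Channel',
    'uploader_id': 'examplechannel',
}

master_templ = '[%(id)s] [%(title)s] [%(uploader)s] [%(uploader_id)s]'     # master's format
dev_templ = '%(title)s --- %(uploader)s --- %(uploader_id)s --- %(id)s'    # dev's format

print(master_templ % info)  # → [dQw4w9WgXcQ] [Example Video] [Example Channel] [examplechannel]
print(dev_templ % info)     # → Example Video --- Example Channel --- examplechannel --- dQw4w9WgXcQ
```

master's bracketed form puts the stable video ID first, which avoids file names starting with a leading dash from the title.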

downloader.py

```diff
@@ -4,181 +4,87 @@ import logging.config
 import math
 import os
 import re
-import shutil
 import subprocess
 import sys
-import tempfile
 import time
 from multiprocessing import Manager, Pool, cpu_count
-from pathlib import Path
-from threading import Thread

-import yaml
-from appdirs import user_data_dir
 from tqdm.auto import tqdm

+import ydl.yt_dlp as ydl
 from process.funcs import get_silent_logger, remove_duplicates_from_playlist, restart_program, setup_file_logger
-from process.threads import bar_eraser, download_video
+from process.threads import download_video
 from ydl.files import create_directories, resolve_path
-from ydl.yt_dlp import YDL, update_ytdlp

+# logging.basicConfig(level=1000)
+# logging.getLogger().setLevel(1000)

-def signal_handler(sig, frame):
-    # TODO: https://www.g-loaded.eu/2016/11/24/how-to-terminate-running-python-threads-using-signals/
-    # raise ServiceExit
-    sys.exit(0)

-# signal.signal(signal.SIGTERM, signal_handler)
-# signal.signal(signal.SIGINT, signal_handler)

-url_regex = re.compile(r'^(?:http|ftp)s?://'  # http:// or https://
-                       r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
-                       r'localhost|'  # localhost...
-                       r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
-                       r'(?::\d+)?'  # optional port
-                       r'(?:/?|[/?]\S+)$', re.IGNORECASE)
-ansi_escape_regex = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
+urlRegex = re.compile(
+    r'^(?:http|ftp)s?://'  # http:// or https://
+    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
+    r'localhost|'  # localhost...
+    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
+    r'(?::\d+)?'  # optional port
+    r'(?:/?|[/?]\S+)$', re.IGNORECASE)
```
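Both branches gate their input on the same URL regex (only the variable name differs). A small self-contained sketch exercising it on the kind of inputs the script receives:

```python
import re

# Same pattern as in downloader.py.
url_regex = re.compile(
    r'^(?:http|ftp)s?://'  # http:// or https://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)

# A playlist URL matches; a bare filename does not, so it is treated as an input file.
print(bool(url_regex.match('https://www.youtube.com/playlist?list=example12345')))  # → True
print(bool(url_regex.match('targets.txt')))  # → False
```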
```diff
 parser = argparse.ArgumentParser()
 parser.add_argument('file', help='URL to download or path of a file containing the URLs of the videos to download.')
-parser.add_argument('--output', required=False, help='Output directory. Ignores paths specified in a YAML file.')
+parser.add_argument('output', help='Output directory.')
 parser.add_argument('--no-update', '-n', action='store_true', help='Don\'t update yt-dlp at launch.')
 parser.add_argument('--max-size', type=int, default=1100, help='Max allowed size of a video in MB.')
 parser.add_argument('--rm-cache', '-r', action='store_true', help='Delete the yt-dlp cache on start.')
-parser.add_argument('--threads', type=int, default=(cpu_count() - 1),
-                    help=f'How many download processes to use. Default: number of CPU cores (for your machine: {cpu_count()}) - 1 = {cpu_count() - 1}')
-parser.add_argument('--daemon', '-d', action='store_true',
-                    help="Run in daemon mode. Disables progress bars and sleeps for the amount of time specified in --sleep.")
+parser.add_argument('--threads', type=int, default=cpu_count(), help='How many download processes to use.')
+parser.add_argument('--daemon', '-d', action='store_true', help="Run in daemon mode. Disables progress bars and sleeps for the amount of time specified in --sleep.")
 parser.add_argument('--sleep', type=float, default=60, help='How many minutes to sleep when in daemon mode.')
-parser.add_argument('--download-cache-file-directory', default=user_data_dir('automated-youtube-dl', 'cyberes'),
-                    help='The path to the directory to track downloaded videos. Defaults to your appdata path.')
-parser.add_argument('--silence-errors', '-s', action='store_true',
-                    help="Don't print any error messages to the console.")
-parser.add_argument('--ignore-downloaded', '-i', action='store_true',
-                    help='Ignore videos that have been already downloaded and disable checks. Let youtube-dl handle everything.')
-parser.add_argument('--erase-downloaded-tracker', '-e', action='store_true', help='Erase the tracked video file.')
-parser.add_argument('--ratelimit-sleep', type=int, default=5,
-                    help='How many seconds to sleep between items to prevent rate-limiting. Does not affect time between videos as you should be fine since it takes a few seconds to merge everything and clean up.')
-parser.add_argument('--input-datatype', choices=['auto', 'txt', 'yaml'], default='auto',
-                    help='The datatype of the input file. If set to auto, the file will be scanned for a URL on the first line. '
-                         'If it is a URL, the filetype will be set to txt. If it is a key: value pair then the filetype will be set to yaml.')
-parser.add_argument('--log-dir', default=None, help='Where to store the logs. Must be set when --output is not.')
-parser.add_argument('--verbose', '-v', action='store_true')
-parser.add_argument('--verify', '-z', action='store_true', help='Run ffprobe on the downloaded files.')
+parser.add_argument('--silence-errors', '-s', action='store_true', help="Don't print any error messages to the console.")
 args = parser.parse_args()

 if args.threads <= 0:
     print("Can't have 0 threads!")
     sys.exit(1)

-if args.output:
-    args.output = resolve_path(args.output)
-    if args.log_dir:
-        args.log_dir = resolve_path(args.log_dir)
-elif not args.output and not args.log_dir:
-    args.log_dir = resolve_path(Path(os.getcwd(), 'automated-youtube-dl_logs'))
-    # print('Must set --log-dir when --output is not.')
-    # sys.exit(1)
-else:
-    args.log_dir = args.output / 'logs'
+args.output = resolve_path(args.output)

-args.download_cache_file_directory = resolve_path(args.download_cache_file_directory)

-# TODO: use logging for this
-if args.verbose:
-    print('Cache directory:', args.download_cache_file_directory)

 log_time = time.time()
```
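The dev branch's CLI takes `file` and `output` as positionals, while master makes `--output` optional. A trimmed, self-contained sketch of the dev-style interface (help strings shortened, sample argv invented):

```python
import argparse
from multiprocessing import cpu_count

parser = argparse.ArgumentParser()
parser.add_argument('file', help='URL to download or path of a file containing URLs.')
parser.add_argument('output', help='Output directory.')
parser.add_argument('--threads', type=int, default=cpu_count())
parser.add_argument('--daemon', '-d', action='store_true')
parser.add_argument('--sleep', type=float, default=60)

# Parse a hypothetical command line instead of sys.argv.
args = parser.parse_args(['https://www.youtube.com/playlist?list=example12345', '/mnt/videos', '--threads', '4', '-d'])
print(args.output, args.threads, args.daemon)  # → /mnt/videos 4 True
```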
```diff
-def load_input_file():
-    """
-    Get the URLs of the videos to download. Is the input a URL or file?
-    """
-    url_list = {}
-    if not re.match(url_regex, str(args.file)) or args.input_datatype in ('txt', 'yaml'):
-        args.file = resolve_path(args.file)
-        if not args.file.exists():
-            print('Input file does not exist:', args.file)
-            sys.exit(1)
-        input_file = [x.strip().strip('\n') for x in list(args.file.open())]
-        if args.input_datatype == 'yaml' or (re.match(r'^.*?:\w*', input_file[0]) and args.input_datatype == 'auto'):
-            with open(args.file, 'r') as file:
-                try:
-                    url_list = yaml.safe_load(file)
-                except yaml.YAMLError as e:
-                    print('Failed to load config file, error:', e)
-                    sys.exit(1)
-        elif args.input_datatype == 'txt' or (re.match(url_regex, input_file[0]) and args.input_datatype == 'auto'):
-            if not args.output:
-                args.output = resolve_path(Path(os.getcwd(), 'automated-youtube-dl_output'))
-                # print('You must specify an output path with --output when the input datatype is a text file.')
-                # sys.exit(1)
-            url_list[str(args.output)] = input_file
-        else:
-            print('Unknown file type:', args.input_datatype)
-            print(input_file)
-            sys.exit(1)
-        del input_file  # release file object
-        # Verify each line in the file is a valid URL.
-        # Also resolve the paths.
-        resolved_paths = {}
-        for directory, urls in url_list.items():
-            for item in urls:
-                if not re.match(url_regex, str(item)):
-                    print(f'Not a url:', item)
-                    sys.exit(1)
-            resolved_paths[resolve_path(directory)] = urls
-        url_list = resolved_paths
-    else:
-        # They gave us just a URL.
-        if not args.output:
-            # Set a default path.
-            args.output = resolve_path(Path(os.getcwd(), 'automated-youtube-dl_output'))
-            # print('You must specify an output path with --output when the input is a URL.')
-            # sys.exit(1)
-        url_list[str(args.output)] = [args.file]
-    return url_list

-url_list = load_input_file()

-# Create directories AFTER loading the file.
-create_directories(*url_list.keys(), args.download_cache_file_directory)

-def do_update():
-    if not args.no_update:
-        print('Updating yt-dlp...')
-        updated = update_ytdlp()
-        if updated:
-            print('Restarting program...')
-            restart_program()
-        else:
-            print('Up to date.')
+# Get the URLs of the videos to download. Is the input a URL or file?
+if not re.match(urlRegex, str(args.file)):
+    args.file = resolve_path(args.file)
+    if not args.file.exists():
+        print('Input file does not exist:', args.file)
+        sys.exit(1)
+    url_list = [x.strip().strip('\n') for x in list(args.file.open())]
+    # Verify each line in the file is a valid URL.
+    for i, line in enumerate(url_list):
+        if not re.match(urlRegex, line):
+            print(f'Line {i} not a url:', line)
+            sys.exit(1)
+else:
+    url_list = [args.file]

+if not args.no_update:
+    print('Checking if yt-dlp needs to be updated...')
+    updated = ydl.update_ytdlp()
+    if updated:
+        print('Restarting program...')
+        restart_program()
```
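master's `load_input_file()` normalizes every input shape (URL, txt file, YAML config) into a `{output_dir: [urls]}` mapping. A simplified stand-in for the txt path only (no YAML, no path resolution) that mirrors that contract; the function name and signature are illustrative, not from the repo:

```python
def load_urls(lines, output_dir):
    """Map one output directory to the non-blank URLs in a txt-style input."""
    urls = [line.strip() for line in lines if line.strip()]
    for url in urls:
        # Cheap scheme check standing in for the full URL regex.
        if not url.startswith(('http://', 'https://', 'ftp://', 'ftps://')):
            raise ValueError(f'Not a url: {url}')
    return {output_dir: urls}

mapping = load_urls(
    ['https://www.youtube.com/playlist?list=a\n', '\n', 'https://www.youtube.com/playlist?list=b\n'],
    '/mnt/videos')
print(mapping)  # → {'/mnt/videos': ['https://www.youtube.com/playlist?list=a', 'https://www.youtube.com/playlist?list=b']}
```

Keying the mapping by output directory is what lets one YAML config drive several playlists into several folders.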
```diff
 if args.rm_cache:
     subprocess.run('yt-dlp --rm-cache-dir', shell=True)

-# TODO: compress old log files

 if args.daemon:
     print('Running in daemon mode.')

-create_directories(args.log_dir)
+log_dir = args.output / 'logs'
+create_directories(args.output, log_dir)

-# TODO: log file rotation https://www.blog.pythonlibrary.org/2014/02/11/python-how-to-create-rotating-logs/
-# TODO: log to one file instead of one for each run
-file_logger = setup_file_logger('youtube_dl', args.log_dir / f'{str(int(log_time))}.log', level=logging.INFO)
-video_error_logger = setup_file_logger('video_errors', args.log_dir / f'{int(log_time)}-errors.log', level=logging.INFO)
+file_logger = setup_file_logger('youtube_dl', log_dir / f'youtube_dl-{str(int(log_time))}.log', level=logging.INFO)
+video_error_logger = setup_file_logger('youtube_dl_video_errors', log_dir / f'youtube_dl-errors-{int(log_time)}.log', level=logging.INFO)
 logger = get_silent_logger('yt-dl', silent=not args.daemon)

 def log_info_twice(msg):
     logger.info(msg)
-    file_logger.info(ansi_escape_regex.sub('', msg))
+    file_logger.info(msg)

 log_info_twice('Starting process.')
```
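`process/funcs.py` is not part of this diff, so the exact body of `setup_file_logger` is unknown; a minimal stand-in consistent with how both branches call it (`name`, `log_file`, optional `format_str` and `level`) might look like:

```python
import logging

def setup_file_logger(name, log_file, format_str='%(asctime)s - %(levelname)s - %(message)s', level=logging.INFO):
    # One isolated logger per name, appending to log_file.
    logger = logging.getLogger(name)
    logger.setLevel(level)
    handler = logging.FileHandler(log_file)
    handler.setFormatter(logging.Formatter(format_str))
    logger.addHandler(handler)
    logger.propagate = False  # keep messages out of the root logger
    return logger
```

The `format_str='%(message)s'` call used for `download_archive_logger` then writes bare video IDs, one per line, which is exactly the archive-file format described in the README.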
```diff
@@ -186,6 +92,8 @@ start_time = time.time()
 manager = Manager()
+download_archive_file = args.output / 'download-archive.log'

 def load_existing_videos():
     # Find existing videos.
@@ -194,39 +102,26 @@ def load_existing_videos():
     download_archive_file.touch()
     with open(download_archive_file, 'r') as file:
         output.update(([line.rstrip() for line in file]))
-    # Remove duplicate lines.
-    # Something may have gone wrong in the past so we want to make sure everything is cleaned up.
-    with open(download_archive_file) as file:
-        uniqlines = set(file.readlines())
-    fd, path = tempfile.mkstemp()
-    with os.fdopen(fd, 'w') as tmp:
-        tmp.writelines(set(uniqlines))
-    shutil.move(path, download_archive_file)
     return output

-status_bar = tqdm(position=2, bar_format='{desc}', disable=args.daemon, leave=False)
+downloaded_videos = load_existing_videos()
+print('Found', len(downloaded_videos), 'downloaded videos.')
+
+# Create this object AFTER reading in the download_archive.
+download_archive_logger = setup_file_logger('download_archive', download_archive_file, format_str='%(message)s')
+
+status_bar = tqdm(position=2, bar_format='{desc}', disable=args.daemon)

-def log_bar(log_msg, level):
-    status_bar.write(f'[{level}] {log_msg}')
-    if level == 'warning':
-        logger.warning(log_msg)
-    elif level == 'error':
-        logger.error(log_msg)
-    else:
-        logger.info(log_msg)
+def log_bar(msg, level):
+    status_bar.write(f'[{level}] {msg}')
+    if level == 'warning':
+        logger.warning(msg)
+    elif level == 'error':
+        logger.error(msg)
+    else:
+        logger.info(msg)

-# def log_with_video_id(log_msg, video_id, level, logger_obj):
-#     log_msg = f'{video_id} - {log_msg}'
-#     if level == 'warning':
-#         logger_obj.warning(log_msg)
-#     elif level == 'error':
-#         logger_obj.error(log_msg)
-#     else:
-#         logger_obj.info(log_msg)

 def print_without_paths(msg):
@@ -246,46 +141,33 @@ def print_without_paths(msg):
```
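master additionally rewrites the archive to drop duplicate IDs, going through a temp file so a crash mid-write cannot truncate the archive. Extracted as a standalone sketch (the function name is illustrative; the repo inlines this in `load_existing_videos`):

```python
import os
import shutil
import tempfile

def dedupe_archive(path):
    """Rewrite the archive file with duplicate lines removed (order not preserved)."""
    with open(path) as f:
        uniq = set(f.readlines())
    # Write to a temp file first, then atomically replace the original.
    fd, tmp_path = tempfile.mkstemp()
    with os.fdopen(fd, 'w') as tmp:
        tmp.writelines(uniq)
    shutil.move(tmp_path, path)
```

Note the trade-off: using a `set` loses the original ordering, which is harmless here because the archive is only ever membership-tested.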
```diff
 class ytdl_logger(object):
     def debug(self, msg):
-        file_logger.debug(self.__clean_msg(msg))
+        file_logger.debug(msg)
         # if msg.startswith('[debug] '):
         #     pass
         if '[download]' not in msg:
             print_without_paths(msg)

     def info(self, msg):
-        file_logger.info(self.__clean_msg(msg))
+        file_logger.info(msg)
         print_without_paths(msg)

     def warning(self, msg):
-        file_logger.warning(self.__clean_msg(msg))
-        if args.daemon:
-            logger.warning(msg)
-        else:
-            status_bar.write(msg)
+        file_logger.warning(msg)
+        log_bar(msg, 'warning')

     def error(self, msg):
-        file_logger.error(self.__clean_msg(msg))
-        if args.daemon:
-            logger.error(msg)
-        else:
-            status_bar.write(msg)
+        file_logger.error(msg)
+        log_bar(msg, 'error')

-    def __clean_msg(self, msg):
-        return ansi_escape_regex.sub('', msg)

-# TODO: https://github.com/TheFrenchGhosty/TheFrenchGhostys-Ultimate-YouTube-DL-Scripts-Collection/blob/master/docs/Scripts-Type.md#archivist-scripts
 # https://github.com/yt-dlp/yt-dlp#embedding-examples
 ydl_opts = {
-    # TODO: https://github.com/TheFrenchGhosty/TheFrenchGhostys-Ultimate-YouTube-DL-Scripts-Collection/blob/master/docs/Details.md
-    # https://old.reddit.com/r/DataHoarder/comments/c6fh4x/after_hoarding_over_50k_youtube_videos_here_is/
     'format': f'(bestvideo[filesize<{args.max_size}M][vcodec^=av01][height>=1080][fps>30]/bestvideo[filesize<{args.max_size}M][vcodec=vp9.2][height>=1080][fps>30]/bestvideo[filesize<{args.max_size}M][vcodec=vp9][height>=1080][fps>30]/bestvideo[filesize<{args.max_size}M][vcodec^=av01][height>=1080]/bestvideo[filesize<{args.max_size}M][vcodec=vp9.2][height>=1080]/bestvideo[filesize<{args.max_size}M][vcodec=vp9][height>=1080]/bestvideo[filesize<{args.max_size}M][height>=1080]/bestvideo[filesize<{args.max_size}M][vcodec^=av01][height>=720][fps>30]/bestvideo[filesize<{args.max_size}M][vcodec=vp9.2][height>=720][fps>30]/bestvideo[filesize<{args.max_size}M][vcodec=vp9][height>=720][fps>30]/bestvideo[filesize<{args.max_size}M][vcodec^=av01][height>=720]/bestvideo[filesize<{args.max_size}M][vcodec=vp9.2][height>=720]/bestvideo[filesize<{args.max_size}M][vcodec=vp9][height>=720]/bestvideo[filesize<{args.max_size}M][height>=720]/bestvideo[filesize<{args.max_size}M])+(bestaudio[acodec=opus]/bestaudio)/best',
-    'outtmpl': f'{args.output}/[%(id)s] [%(title)s] [%(uploader)s] [%(uploader_id)s].%(ext)s',
+    # leading dash can cause issues due to bash args so we surround the variables in brackets
     'merge_output_format': 'mkv',
     'logtostderr': True,
     'embedchapters': True,
     'writethumbnail': True,  # Save the thumbnail to a file. Embedding seems to be broken right now so this is an alternative.
     'embedthumbnail': True,
     'embeddescription': True,
     'writesubtitles': True,
@@ -293,187 +175,100 @@ ydl_opts = {
     'subtitlesformat': 'vtt',
     'subtitleslangs': ['en'],
     'writeautomaticsub': True,
-    'writedescription': True,
+    # 'writedescription': True,
     'ignoreerrors': True,
     'continuedl': False,
     'addmetadata': True,
     'writeinfojson': True,
-    'verbose': args.verbose,
     'postprocessors': [
         {'key': 'FFmpegEmbedSubtitle'},
         {'key': 'FFmpegMetadata', 'add_metadata': True},
         {'key': 'EmbedThumbnail', 'already_have_thumbnail': True},
-        {'key': 'FFmpegThumbnailsConvertor', 'format': 'jpg', 'when': 'before_dl'},
         # {'key': 'FFmpegSubtitlesConvertor', 'format': 'srt'}
     ],
-    # 'external_downloader': 'aria2c',
-    # 'external_downloader_args': ['-j 32', '-s 32', '-x 16', '--file-allocation=none', '--optimize-concurrent-downloads=true', '--http-accept-gzip=true', '--continue=true'],
 }
```
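The long `format` selector above is a fallback ladder: AV1, then VP9.2, then VP9, first at ≥1080p with high frame rate, then ≥1080p, then the same sequence at ≥720p, all capped by `--max-size`. It is easier to audit when generated programmatically; this sketch (not part of the repo) reconstructs that ladder:

```python
def build_format_selector(max_size_mb):
    # Codec preference order used by both branches.
    codecs = ('vcodec^=av01', 'vcodec=vp9.2', 'vcodec=vp9')
    video = []
    for height in ('1080', '720'):
        for fps_filter in ('[fps>30]', ''):  # high-fps variants first
            for codec in codecs:
                video.append(f'bestvideo[filesize<{max_size_mb}M][{codec}][height>={height}]{fps_filter}')
        # Any codec at this height, still under the size cap.
        video.append(f'bestvideo[filesize<{max_size_mb}M][height>={height}]')
    # Last resort: any video under the size cap, then merge with the best (preferably Opus) audio.
    video.append(f'bestvideo[filesize<{max_size_mb}M]')
    return '(' + '/'.join(video) + ')+(bestaudio[acodec=opus]/bestaudio)/best'

print(build_format_selector(1100)[:80])
```

Note the generated order interleaves fps and codec slightly differently from the literal (the literal lists all three high-fps codecs before the non-fps ones per height, as this sketch does, but its height-only entry appears once per tier); treat the sketch as documentation of the structure, not a drop-in replacement.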
```diff
-yt_dlp = YDL(dict(ydl_opts, **{'logger': ytdl_logger()}))
+main_opts = dict(ydl_opts, **{'logger': ytdl_logger()})
+# thread_opts = dict(ydl_opts, **{'logger': ydl.ytdl_no_logger()})
+yt_dlp = ydl.YDL(main_opts)

-url_count = 0
-for k, v in url_list.items():
-    for item in v:
-        url_count += 1

 # Init bars
+playlist_bar = tqdm(position=1, desc='Playlist', disable=args.daemon)
 video_bars = manager.list()
 if not args.daemon:
     for i in range(args.threads):
-        video_bars.append([3 + i, manager.Lock()])
+        video_bars.append([
+            3 + i,
+            manager.Lock()
+        ])

-encountered_errors = 0
-errored_videos = 0

-# The video progress bars have an issue where when a bar is closed it
-# will shift its position back 1 then return to the correct position.
-# This thread will clear empty spots.
-if not args.daemon:
-    eraser_exit = manager.Value(bool, False)
-    Thread(target=bar_eraser, args=(video_bars, eraser_exit,)).start()

-already_erased_downloaded_tracker = False
```
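Both branches fan work out with `Pool.imap_unordered`, which yields each worker's result dict as soon as it finishes rather than in submission order. A toy version of that loop, with an invented stand-in for `download_video` (the real one lives in `process/threads.py`, which is not shown in this diff):

```python
from multiprocessing import Pool

def fake_download(video):
    # Stand-in for process.threads.download_video: pretend the download succeeded
    # and return a result dict shaped like the ones consumed in the main loop.
    return {'downloaded_video_id': video['id'], 'logger_msg': [f"finished {video['id']}"]}

def drain(queue, processes=2):
    archived = []
    with Pool(processes=processes) as pool:
        for result in pool.imap_unordered(fake_download, queue):
            if result['downloaded_video_id']:
                archived.append(result['downloaded_video_id'])  # what download_archive_logger records
    return archived

if __name__ == '__main__':
    print(sorted(drain([{'id': 'a'}, {'id': 'b'}, {'id': 'c'}])))  # → ['a', 'b', 'c']
```

Because completion order is nondeterministic, anything order-sensitive (like the archive log) must key off the result contents, not the iteration index.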
```diff
 while True:
-    # do_update()  # this doesn't work very well. freezes
-    progress_bar = tqdm(total=url_count, position=0, desc='Inputs', disable=args.daemon,
-                        bar_format='{l_bar}{bar}| {n_fmt}/{total_fmt}')
-    for output_path, urls in url_list.items():
-        for target_url in urls:
-            logger.info('Fetching playlist...')
-            playlist = yt_dlp.playlist_contents(str(target_url))
-            if not playlist:
-                progress_bar.update()
-                continue
-            url_list = load_input_file()
-            download_archive_file = args.download_cache_file_directory / (str(playlist['id']) + '.log')
-            if args.erase_downloaded_tracker and not already_erased_downloaded_tracker:
-                if download_archive_file.exists():
-                    os.remove(download_archive_file)
-                already_erased_downloaded_tracker = True
-            downloaded_videos = load_existing_videos()
-            msg = f'Found {len(downloaded_videos)} downloaded videos for playlist "{playlist["title"]}" ({playlist["id"]}). {"Ignoring." if args.ignore_downloaded else ""}'
-            if args.daemon:
-                logger.info(msg)
-            else:
-                progress_bar.write(msg)
-            download_archive_logger = setup_file_logger('download_archive', download_archive_file,
-                                                        format_str='%(message)s')
-            playlist['entries'] = remove_duplicates_from_playlist(playlist['entries'])
-            log_info_twice(f'Downloading item: "{playlist["title"]}" ({playlist["id"]}) {target_url}')
-            # Remove already downloaded files from the to-do list.
-            download_queue = []
-            for p, video in enumerate(playlist['entries']):
-                if video['id'] not in download_queue:
-                    if not args.ignore_downloaded and video['id'] not in downloaded_videos:
-                        download_queue.append(video)
-                        # downloaded_videos.add(video['id'])
-                    elif args.ignore_downloaded:
-                        download_queue.append(video)
-            playlist_bar = tqdm(total=len(playlist['entries']), position=1,
-                                desc=f'"{playlist["title"]}" ({playlist["id"]})', disable=args.daemon, leave=False)
-            if not args.ignore_downloaded:
-                playlist_bar.update(len(downloaded_videos))
-            playlist_ydl_opts = ydl_opts.copy()
-            # playlist_ydl_opts['outtmpl'] = f'{output_path}/{get_output_templ()}'
-            if len(download_queue):  # Don't mess with multiprocessing if all videos are already downloaded
-                with Pool(processes=args.threads) as pool:
-                    if sys.stdout.isatty():
-                        # Doesn't work if not connected to a terminal:
-                        # OSError: [Errno 25] Inappropriate ioctl for device
-                        status_bar.set_description_str('=' * os.get_terminal_size()[0])
-                    logger.info('Starting downloads...')
-                    for result in pool.imap_unordered(download_video,
-                                                      ((video, {
-                                                          'bars': video_bars,
-                                                          'ydl_opts': playlist_ydl_opts,
-                                                          'output_dir': Path(output_path),
-                                                          'ignore_downloaded': args.ignore_downloaded,
-                                                          'verify': args.verify
-                                                      }) for video in download_queue)):
-                        # Save the video ID to the file
-                        if result['downloaded_video_id']:
-                            download_archive_logger.info(result['downloaded_video_id'])
-                        # Print short error messages.
-                        # An error should never be added to both video_critical_err_msg_short and video_critical_err_msg.
-                        for line in result['video_critical_err_msg_short']:
-                            msg = f"{result['video_id']} - {line}"
-                            video_error_logger.error(msg)
-                            file_logger.error(msg)
-                            encountered_errors += 1
-                            if args.daemon:
-                                logger.error(msg)
-                            else:
-                                status_bar.write(msg)
-                        # Print longer error messages.
-                        # Won't print anything to console if the silence_errors arg is set.
-                        for line in result['video_critical_err_msg']:
-                            msg = f"{result['video_id']} - {line}"
-                            video_error_logger.error(msg)
-                            file_logger.error(msg)
-                            encountered_errors += 1
-                            if not args.silence_errors:
-                                if args.daemon:
-                                    logger.error(msg)
-                                else:
-                                    status_bar.write(msg)
-                        # if len(result['video_critical_err_msg']):
-                        #     errored_videos += 1
-                        #     if args.silence_errors and args.daemon:
-                        #         logger.error(f"{result['video_id']} - Failed due to error.")
-                        for line in result['logger_msg']:
-                            log_info_twice(f"{result['video_id']} - {line}")
-                        # TODO: if no error launch a verify multiprocess
-                        # if kwargs['verify']:
-                        #     try:
-                        #         info = yt_dlp.extract_info(video['url'])
-                        #     except Exception as e:
-                        #         output_dict['video_critical_err_msg'].append(f'Failed to verify video, extract_info failed: {e}')
-                        #     file_path = base_path + info['ext']
-                        #     result = ffprobe(file_path)
-                        #     if not result[0]:
-                        #         output_dict['video_critical_err_msg'].append(f'Failed to verify video: {result[4]}')
-                        playlist_bar.update()
-            else:
-                msg = f"All videos already downloaded for \"{playlist['title']}\"."
-                if args.daemon:
-                    logger.info(msg)
-                else:
-                    status_bar.write(msg)
-            log_info_twice(f"Finished item: '{playlist['title']}' {target_url}")
-            # Sleep a bit to prevent rate-limiting
-            if progress_bar.n < len(url_list.keys()) - 1:
-                status_bar.set_description_str(f'Sleeping {args.ratelimit_sleep}s...')
-                time.sleep(args.ratelimit_sleep)
-            progress_bar.update()
-    error_msg = f'Encountered {encountered_errors} errors on {errored_videos} videos.'
-    if args.daemon:
-        logger.info(error_msg)
-    else:
+    for i, target_url in tqdm(enumerate(url_list), total=len(url_list), position=0, desc='Inputs', disable=args.daemon):
+        logger.info('Fetching playlist...')
+        playlist = yt_dlp.playlist_contents(target_url)
+        playlist['entries'] = remove_duplicates_from_playlist(playlist['entries'])
+        encountered_errors = 0
+        errored_videos = 0
+
+        log_info_twice(f"Downloading item: '{playlist['title']}' {target_url}")
+
+        playlist_bar.total = len(playlist['entries'])
+        playlist_bar.set_description(playlist['title'])
+        # print(playlist['entries'][0])
+        # sys.exit()
+
+        # Remove already downloaded files from the to-do list.
+        download_queue = []
+        s = set()
+        for p, video in enumerate(playlist['entries']):
+            if video['id'] not in downloaded_videos and video['id'] not in s:
+                download_queue.append(video)
+                s.add(video['id'])
+        playlist_bar.update(len(downloaded_videos))
+
+        if len(download_queue):  # Don't mess with multiprocessing if all videos are already downloaded
+            with Pool(processes=args.threads) as pool:
+                status_bar.set_description_str('=' * os.get_terminal_size()[0])
+                logger.info('Starting downloads...')
+                for result in pool.imap_unordered(download_video,
+                                                  ((video, {
+                                                      'bars': video_bars,
+                                                      'ydl_opts': ydl_opts,
+                                                      'output_dir': args.output,
+                                                  }) for video in download_queue)):
+                    # Save the video ID to the file
+                    if result['downloaded_video_id']:
+                        download_archive_logger.info(result['downloaded_video_id'])
+
+                    # Print stuff
+                    for line in result['video_error_logger_msg']:
+                        video_error_logger.info(line)
+                        file_logger.error(line)
+                        encountered_errors += 1
+                        if not args.silence_errors:
+                            if args.daemon:
+                                logger.error(line)
+                            else:
+                                playlist_bar.write(line)
+                    if len(result['video_error_logger_msg']):
+                        errored_videos += 1
+
+                    # for line in result['status_msg']:
+                    #     playlist_bar.write(line)
+                    for line in result['logger_msg']:
+                        log_info_twice(line)
+                    playlist_bar.update()
+        else:
+            playlist_bar.write(f"All videos already downloaded for '{playlist['title']}'.")
+
+        error_msg = f'Encountered {encountered_errors} errors on {errored_videos} videos.'
+        if args.daemon:
+            logger.info(error_msg)
+        else:
+            playlist_bar.write(error_msg)
+
+        log_info_twice(f"Finished item: '{playlist['title']}' {target_url}")
```
status_bar.write(error_msg)
log_info_twice(f"Finished process in {round(math.ceil(time.time() - start_time) / 60, 2)} min.") log_info_twice(f"Finished process in {round(math.ceil(time.time() - start_time) / 60, 2)} min.")
if not args.daemon: if not args.daemon:
break break
@@ -482,27 +277,13 @@ while True:
         try:
             time.sleep(args.sleep * 60)
         except KeyboardInterrupt:
-            sys.exit(0)
+            sys.exit()
-        # downloaded_videos = load_existing_videos()  # reload the videos that have already been downloaded
+        downloaded_videos = load_existing_videos()  # reload the videos that have already been downloaded
-# Erase the status bar.
-status_bar.set_description_str('\x1b[2KDone!')
-status_bar.refresh()
 # Clean up the remaining bars. Have to close them in order.
-# These variables may be undefined so we will just ignore any errors.
-# Not in one try/catch because we don't want to skip anything.
-try:
-    eraser_exit.value = True
-except NameError:
-    pass
-except AttributeError:
-    pass
-try:
-    playlist_bar.close()
-except NameError:
-    pass
-except AttributeError:
-    pass
-try:
-    status_bar.close()
-except NameError:
-    pass
-except AttributeError:
-    pass
+playlist_bar.close()
+status_bar.close()

View File

@@ -1,9 +1,7 @@
 import logging
 import os
-import re
 import sys
-import ffmpeg
 import psutil
@@ -24,7 +22,7 @@ def restart_program():
     os.execl(python, python, *sys.argv)
-def setup_file_logger(name, log_file, level=logging.INFO, format_str: str = '%(asctime)s - %(name)s - %(levelname)s - %(message)s', filemode='a'):
+def setup_file_logger(name, log_file, level=logging.INFO, format_str: str = '%(asctime)s - %(name)s - %(levelname)s - %(message)s', filemode='a', no_console: bool = True):
     formatter = logging.Formatter(format_str)
     logger = logging.getLogger(name)
@@ -42,21 +40,6 @@ def setup_file_logger(name, log_file, level=logging.INFO, format_str: str = '%(a
     return logger
-def ffprobe(filename):
-    try:
-        # stream = stream.output('pipe:', format="null")
-        # stream.run(capture_stdout=True, capture_stderr=True)
-        test = ffmpeg.probe(filename)
-    except Exception as e:
-        err = []
-        for x in e.stderr.decode().split('\n'):
-            if x.strip(' ') != '':
-                err.append(x)
-        err_msg = err[-1].split(': ')[-1]
-        return False, filename, str(e), None, err_msg
-    return True, filename, None, test, None
 def get_silent_logger(name, level=logging.INFO, format_str: str = '%(asctime)s - %(name)s - %(levelname)s - %(message)s', silent: bool = True):
     logger = logging.getLogger(name)
     console = logging.StreamHandler()
@@ -77,11 +60,3 @@ def remove_duplicates_from_playlist(entries):
         videos.append(video)
         s.add(video['id'])
     return videos
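The hunk above shows only the tail of `remove_duplicates_from_playlist`. Assuming it dedupes flat-playlist entries by video `id` (consistent with the `videos` and `s` names visible here), a stand-alone sketch would look like this:

```python
def remove_duplicates_from_playlist(entries):
    # Keep the first occurrence of each video id; drop later repeats.
    # The loop body matches the surviving lines of the hunk above.
    videos = []
    s = set()
    for video in entries:
        if video['id'] not in s:
            videos.append(video)
            s.add(video['id'])
    return videos

print(remove_duplicates_from_playlist([{'id': 'a'}, {'id': 'b'}, {'id': 'a'}]))
# [{'id': 'a'}, {'id': 'b'}]
```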
-def remove_special_chars_linux(string, special_chars: list = None):
-    if special_chars is None:
-        special_chars = ['\\', '`', '*', '_', '{', '}', '[', ']', '(', ')', '>', '#', '+', '-', '.', '!', '$', '\'']
-    for char in special_chars:
-        string = re.sub(re.escape(char), '', string)
-    return string
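`remove_special_chars_linux` is deleted on dev but master's `threads.py` still calls it to build safe filenames; running the removed helper as-is shows what it strips:

```python
import re

def remove_special_chars_linux(string, special_chars: list = None):
    # Delete every character in the blocklist; the default list targets
    # Markdown- and shell-sensitive characters.
    if special_chars is None:
        special_chars = ['\\', '`', '*', '_', '{', '}', '[', ']', '(', ')', '>',
                         '#', '+', '-', '.', '!', '$', '\'']
    for char in special_chars:
        string = re.sub(re.escape(char), '', string)
    return string

print(remove_special_chars_linux("Hello. (World)!"))             # Hello World
print(remove_special_chars_linux("a/b c", special_chars=['/']))  # ab c
```

The second call mirrors how master invokes it with `special_chars=['/']`, stripping only path separators from the generated filename.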

View File

@@ -1,46 +1,32 @@
 import math
 import os
-import random
-import subprocess
 import time
-import traceback
-from pathlib import Path
 import numpy as np
-import yt_dlp as ydl_ydl
-from hurry.filesize import size
 from tqdm.auto import tqdm
-from unidecode import unidecode
 import ydl.yt_dlp as ydl
-from process.funcs import remove_special_chars_linux, setup_file_logger
+from process.funcs import setup_file_logger
 class ytdl_logger(object):
     errors = []
-    def __init__(self, logger=None):
+    def __init__(self, logger):
         self.logger = logger
-        # logging.basicConfig(level=logging.DEBUG)
-        # self.logger = logging
-        # self.logger.info('testlog')
     def debug(self, msg):
-        if self.logger:
-            self.logger.info(msg)
+        self.logger.info(msg)
     def info(self, msg):
-        if self.logger:
-            self.logger.info(msg)
+        self.logger.info(msg)
     def warning(self, msg):
-        if self.logger:
-            self.logger.warning(msg)
+        self.logger.warning(msg)
     def error(self, msg):
-        if self.logger:
-            self.logger.error(msg)
-        self.errors.append(msg)
+        self.logger.error(msg)
+        self.errors.append(msg)
 def is_manager_lock_locked(lock) -> bool:
@@ -55,225 +41,62 @@ def is_manager_lock_locked(lock) -> bool:
     return False
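The body of `is_manager_lock_locked` is elided by the hunk above (only its trailing `return False` survives). A common way to probe a lock without blocking — shown here as an assumption about the technique, not the repository's exact code — is a non-blocking acquire followed by an immediate release:

```python
import threading

def is_lock_locked(lock) -> bool:
    # Hypothetical stand-in for is_manager_lock_locked: a successful
    # non-blocking acquire means the lock was free, so release it again
    # and report it as unlocked.
    if lock.acquire(blocking=False):
        lock.release()
        return False
    return True

lock = threading.Lock()
print(is_lock_locked(lock))  # False
lock.acquire()
print(is_lock_locked(lock))  # True
lock.release()
```

`multiprocessing.Manager().Lock()` proxies expose the same `acquire(blocking=...)` signature, so the pattern carries over to the manager locks used for the progress bars.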
-name_max = int(subprocess.check_output("getconf NAME_MAX /", shell=True).decode()) - 30
 def download_video(args) -> dict:
     # Sleep for a little bit to space out the rush of workers flooding the bar locks.
     # time.sleep(random.randint(1, 20) / 1000)
     def progress_hook(d):
-        # Variables can be None if the download hasn't started yet.
-        if d['status'] == 'downloading':
-            total = None
-            if d.get('downloaded_bytes'):
-                # We want total_bytes but it may not exist so total_bytes_estimate is good too
-                if d.get('total_bytes'):
-                    total = d.get('total_bytes')
-                elif d.get('total_bytes_estimate'):
-                    total = d.get('total_bytes_estimate')
-            if total:
+        # downloaded_bytes and total_bytes can be None if the download hasn't started yet.
+        if d['status'] == 'downloading':
+            if d.get('downloaded_bytes') and d.get('total_bytes'):
                 downloaded_bytes = int(d['downloaded_bytes'])
-                if total > 0:
-                    percent = (downloaded_bytes / total) * 100
+                total_bytes = int(d['total_bytes'])
+                if total_bytes > 0:
+                    percent = (downloaded_bytes / total_bytes) * 100
                     bar.update(int(np.round(percent - bar.n)))  # If the progress bar doesn't end at 100% then round to 1 decimal place
                     bar.set_postfix({
                         'speed': d['_speed_str'],
-                        'size': f"{size(d.get('downloaded_bytes'))}/{size(total)}",
+                        'size': f"{d['_downloaded_bytes_str'].strip()}/{d['_total_bytes_str'].strip()}",
                     })
-                else:
-                    bar.set_postfix({
-                        'speed': d['_speed_str'],
-                        'size': f"{d['_downloaded_bytes_str'].strip()}/{d['_total_bytes_str'].strip()}",
-                    })
     video = args[0]
     kwargs = args[1]
-    output_dict = {'downloaded_video_id': None, 'video_id': video['id'], 'video_critical_err_msg': [], 'video_critical_err_msg_short': [], 'status_msg': [], 'logger_msg': []}  # empty object
-    if not kwargs['ignore_downloaded'] and not video['channel_id'] or not video['channel'] or not video['channel_url']:
-        if video['duration'] or isinstance(video['view_count'], int):
-            # Sometimes videos don't have channel_id, channel, or channel_url but are actually valid. Like shorts.
-            pass
-        else:
-            output_dict['video_critical_err_msg_short'].append('unavailable.')
-            return output_dict
-    # Clean of forign languages
-    video['title'] = unidecode(video['title'])
-    if len(kwargs['bars']):
-        bar_enabled = True
-        got_lock = False
-        while not got_lock:
-            for item in kwargs['bars']:
-                if item[1].acquire(timeout=0.01):
-                    got_lock = True
-                    bar_offset = item[0]
-                    bar_lock = item[1]
-                    break
-            else:
-                time.sleep(random.uniform(0.1, 0.5))
+    # Get a bar
+    locked = False
+    # We're going to wait until a bar is available for us to use.
+    while not locked:
+        # Get a bar
+        for item in kwargs['bars']:
+            if not is_manager_lock_locked(item[1]):
+                locked = item[1].acquire(timeout=0.1)  # get the lock ASAP and don't wait if we didn't get it.
+                offset = item[0]
+                bar_lock = item[1]
+                break
     kwargs['ydl_opts']['progress_hooks'] = [progress_hook]
     desc_with = int(np.round(os.get_terminal_size()[0] * (1 / 4)))
-    bar = tqdm(total=100, position=bar_offset, desc=f"{video['id']} - {video['title']}".ljust(desc_with)[:desc_with], bar_format='{l_bar}{bar}| {elapsed}<{remaining}{postfix}', leave=False)
+    bar = tqdm(total=100, position=(offset if locked else None), desc=f"{video['id']} - {video['title']}".ljust(desc_with)[:desc_with], bar_format='{l_bar}{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}{postfix}]', leave=False)
-    else:
-        bar_enabled = False
-    # got_lock = False
-    # # if len(kwargs['bars']):
-    # while not got_lock:  # We're going to wait until a bar is available for us to use.
-    #     for item in kwargs['bars']:
-    #         # if not is_manager_lock_locked(item[1]):
-    #         got_lock = item[1].acquire(timeout=0.01)  # get the lock ASAP and don't wait if we didn't get it.
-    #
-    #         if got_lock:
-    #             print('GOT LOCK:', video['id'])
-    #             # Now that we've gotten the lock, set some variables related to the bar
-    #             offset = item[0]
-    #             bar_lock = item[1]
-    #             break
-    #         else:
-    #             print('WAITING FOR LOCK:', video['id'])
-    #             time.sleep(uniform(0.1, 0.9))
+    ylogger = ytdl_logger(setup_file_logger(video['id'], kwargs['output_dir'] / f"[{video['id']}].log"))
+    kwargs['ydl_opts']['logger'] = ylogger
+    yt_dlp = ydl.YDL(kwargs['ydl_opts'])
+    output_dict = {'downloaded_video_id': None, 'blacklist_video_id': None, 'video_error_logger_msg': [], 'status_msg': [], 'logger_msg': []}  # empty object
     start_time = time.time()
     try:
-        kwargs['ydl_opts']['logger'] = ytdl_logger()  # dummy silent logger
-        yt_dlp = ydl.YDL(kwargs['ydl_opts'])
-        video_n = yt_dlp.get_info(video['url'])
-        if not video_n:
-            output_dict['video_critical_err_msg_short'].append('failed to get info. Unavailable?')
-            if bar_enabled:
-                bar.close()
-                bar_lock.release()
-            return output_dict
-        video_n['url'] = video['url']
-        video = video_n
-        del video_n
-        # We created a new dict
-        video['title'] = unidecode(video['title'])
-        video['uploader'] = unidecode(video['uploader'])  # now this info is present since we fetched it
-        # TODO: do we also need to remove the @ char?
-        video_filename = remove_special_chars_linux(
-            ydl.get_output_templ(video_id=video['id'], title=video['title'], uploader=video['uploader'], uploader_id=video['uploader_id'], include_ext=False), special_chars=['/']
-        )
-        # Make sure the video title isn't too long
-        while len(video_filename) >= name_max - 3:  # -3 so that I can add ...
-            video['title'] = video['title'][:-1]
-            video_filename = remove_special_chars_linux(
-                ydl.get_output_templ(
-                    video_id=video['id'],
-                    title=video['title'] + '...',
-                    uploader=video['uploader'],
-                    uploader_id=video['uploader_id'],
-                    include_ext=False
-                ), special_chars=['/'])
-        base_path = str(Path(kwargs['output_dir'], video_filename))
-        kwargs['ydl_opts']['outtmpl'] = f"{base_path}.%(ext)s"
-        # try:
-        #     base_path = os.path.splitext(Path(kwargs['output_dir'], yt_dlp.prepare_filename(video)))[0]
-        # except AttributeError:
-        #     # Sometimes we won't be able to pull the video info so just use the video's ID.
-        #     base_path = kwargs['output_dir'] / video['id']
-        ylogger = ytdl_logger(setup_file_logger(video['id'], base_path + '.log'))
-        kwargs['ydl_opts']['logger'] = ylogger
-        with ydl_ydl.YoutubeDL(kwargs['ydl_opts']) as y:
-            error_code = y.download(video['url'])
-        # yt_dlp = ydl.YDL(kwargs['ydl_opts'])  # recreate the object with the correct logging path
-        # error_code = yt_dlp(video['url'])  # Do the download
+        error_code = yt_dlp(video['url'])  # Do the download
         if not error_code:
             elapsed = round(math.ceil(time.time() - start_time) / 60, 2)
-            output_dict['logger_msg'].append(f"'{video['title']}' - Downloaded in {elapsed} min.")
+            output_dict['logger_msg'].append(f"{video['id']} '{video['title']}' downloaded in {elapsed} min.")
             output_dict['downloaded_video_id'] = video['id']
         else:
-            output_dict['video_critical_err_msg'] = output_dict['video_critical_err_msg'] + ylogger.errors
-    except Exception:
-        output_dict['video_critical_err_msg'].append(f"EXCEPTION -> {traceback.format_exc()}")
-    if bar_enabled:
-        bar.update(100 - bar.n)
-    if bar_enabled:
+            # m = f'{video["id"]} {video["title"]} -> Failed to download, error code: {error_code}'
+            # output_dict['status_msg'].append(m)
+            # output_dict['video_error_logger_msg'].append(m)
+            output_dict['video_error_logger_msg'] = output_dict['video_error_logger_msg'] + ylogger.errors
+    except Exception as e:
+        output_dict['video_error_logger_msg'].append(f"EXCEPTION -> {e}")
+    if locked:
         bar.close()
         bar_lock.release()
     return output_dict
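Both branches feed tqdm a delta rather than an absolute value (`bar.update(percent - bar.n)`), so the bar's counter lands on the new percentage no matter where it currently sits. That arithmetic can be checked in isolation with a stand-in bar (plain `round` substituted for `np.round`):

```python
class FakeBar:
    # Minimal tqdm stand-in: only tracks the cumulative counter `n`.
    def __init__(self):
        self.n = 0

    def update(self, delta):
        self.n += delta

def report(bar, downloaded_bytes, total_bytes):
    # Same math as progress_hook: convert absolute progress into a delta
    # so the bar's counter ends up at the new percent.
    if total_bytes > 0:
        percent = (downloaded_bytes / total_bytes) * 100
        bar.update(int(round(percent - bar.n)))

bar = FakeBar()
report(bar, 50, 200)
print(bar.n)  # 25
report(bar, 150, 200)
print(bar.n)  # 75
```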
-def bar_eraser(video_bars, eraser_exit):
-    while not eraser_exit.value:
-        for i, item in enumerate(video_bars):
-            if eraser_exit.value:
-                return
-            i = int(i)
-            bar_lock = video_bars[i][1]
-            if video_bars[i][1].acquire(timeout=0.1):
-                bar = tqdm(position=video_bars[i][0], leave=False, bar_format='\x1b[2K')
-                bar.close()
-                bar_lock.release()
-# Old queue and queue processor threads
-# manager = Manager()
-# queue = manager.dict()
-# queue_lock = manager.Lock()
-# def eraser():
-#     nonlocal queue
-#     try:
-#         while not eraser_exit.value:
-#             for i in queue.keys():
-#                 if eraser_exit.value:
-#                     return
-#                 i = int(i)
-#                 lock = video_bars[i][1].acquire(timeout=0.1)
-#                 bar_lock = video_bars[i][1]
-#                 if lock:
-#                     bar = tqdm(position=video_bars[i][0], leave=False, bar_format='\x1b[2K')
-#                     bar.close()
-#                     with queue_lock:
-#                         del queue_dict[i]
-#                         queue = queue_dict
-#                     bar_lock.release()
-#     except KeyboardInterrupt:
-#         sys.exit(0)
-#     except multiprocessing.managers.RemoteError:
-#         sys.exit(0)
-#     except SystemExit:
-#         sys.exit(0)
-#
-# try:
-#     Thread(target=eraser).start()
-#     while not eraser_exit.value:
-#         for i, item in enumerate(video_bars):
-#             if eraser_exit.value:
-#                 return
-#             # Add bars to the queue
-#             if is_manager_lock_locked(item[1]):
-#                 with queue_lock:
-#                     queue_dict = queue
-#                     queue_dict[i] = True
-#                     queue = queue_dict
-# except KeyboardInterrupt:
-#     sys.exit(0)
-# except multiprocessing.managers.RemoteError:
-#     sys.exit(0)
-# except SystemExit:
-#     sys.exit(0)
-class ServiceExit(Exception):
-    """
-    Custom exception which is used to trigger the clean exit
-    of all running threads and the main program.
-    """
-    pass

View File

@@ -2,10 +2,4 @@ yt-dlp
 psutil
 tqdm
 mergedeep
 numpy
-pyyaml
-appdirs
-phantomjs
-unidecode
-ffmpeg-python
-hurry.filesize

View File

@@ -1 +0,0 @@
-https://www.youtube.com/playlist?list=example1234

View File

@@ -1,5 +0,0 @@
-/path/to/storage/Example Playlist:
-  - https://www.youtube.com/playlist?list=ExamplePlaylist1234
-/path/to/storage/Music:
-  - https://www.youtube.com/MyMusicPlaylist1234

View File

@@ -7,11 +7,8 @@ from mergedeep import merge
 class YDL:
-    def __init__(self, ydl_opts: dict = None, extra_ydlp_opts: dict = None):
-        self.ydl_opts = ydl_opts if ydl_opts else {}
-        extra_ydlp_opts = extra_ydlp_opts if extra_ydlp_opts else {}
-        self.ydl_opts = merge(ydl_opts, extra_ydlp_opts)
-        self.ydl_opts['logger'] = self.ydl_opts.get('logger')
+    def __init__(self, ydl_opts):
+        self.ydl_opts = ydl_opts
         self.yt_dlp = yt_dlp.YoutubeDL(ydl_opts)
     def get_formats(self, url: Union[str, Path]) -> tuple:
@@ -32,30 +29,17 @@
         sizes.append(d)
         return tuple(sizes)
-    def playlist_contents(self, url: str) -> Union[dict, bool]:
-        ydl_opts = {
+    def playlist_contents(self, url: str) -> dict:
+        ydl_opts = merge({
             'extract_flat': True,
-            'skip_download': True,
-            'ignoreerrors': True,
-            'logger': self.ydl_opts['logger'],
-        }
+            'skip_download': True
+        }, self.ydl_opts)
         with yt_dlp.YoutubeDL(ydl_opts) as ydl:
-            info = self.get_info(url)
-            if not info:
-                return False
+            info = ydl.sanitize_info(ydl.extract_info(url, download=False))
             entries = []
             if info['_type'] == 'playlist':
                 if 'entries' in info.keys():
-                    # When downloading a channel youtube-dl returns a playlist for videos and another for shorts.
-                    # We need to combine all the videos into one list.
-                    for item in info['entries']:
-                        if item['_type'] in ('video', 'url'):
-                            entries.append(item)
-                        elif item['_type'] == 'playlist':
-                            for video in self.get_info(item['webpage_url'])['entries']:
-                                entries.append(video)
-                        else:
-                            raise ValueError(f"Unknown sub-media type: {item['_type']}")
+                    entries = [x for x in info['entries']]
             elif info['_type'] == 'video':
                 # `info` doesn't seem to contain the `url` key so we'll add it manually.
                 # If any issues arise in the future make sure to double check there isn't any weirdness going on here.
@@ -69,55 +53,20 @@ class YDL:
             'entries': entries,
         }
+    def __call__(self, *args, **kwargs):
+        return self.yt_dlp.download(*args, **kwargs)
     # def filter_filesize(self, info, *, incomplete):
     #     duration = info.get('duration')
     #     if duration and duration < 60:
     #         return 'The video is too short'
-    def extract_info(self, *args, **kwargs):
-        return self.yt_dlp.extract_info(*args, **kwargs)
-    def prepare_filename(self, *args, **kwargs):
-        return self.yt_dlp.prepare_filename(*args, **kwargs)
-    def process_info(self, *args, **kwargs):
-        return self.yt_dlp.process_info(*args, **kwargs)
-    def get_info(self, url):
-        ydl_opts = {
-            'extract_flat': True,
-            'skip_download': True,
-            'ignoreerrors': True,
-            'logger': self.ydl_opts['logger'],
-        }
-        ydl = yt_dlp.YoutubeDL(ydl_opts)
-        return ydl.sanitize_info(ydl.extract_info(url, download=False))
-    def __call__(self, *args, **kwargs):
-        return self.yt_dlp.download(*args, **kwargs)
 def update_ytdlp():
-    package_name = 'yt-dlp'
-    try:
-        result = subprocess.run(
-            ["pip", "install", "--disable-pip-version-check", "--upgrade", package_name],
-            capture_output=True,
-            text=True,
-            check=True
-        )
-        if f"Successfully installed {package_name}" in result.stdout:
-            # print(f"{package_name} was updated.")
-            return True
-        else:
-            # print(f"{package_name} was not updated.")
-            return False
-    except subprocess.CalledProcessError as e:
-        print(f"An error occurred while updating {package_name}:")
-        print(e.output)
-        return False
+    old = subprocess.check_output('pip freeze | grep yt-dlp', shell=True).decode().strip('\n')
+    subprocess.run('if pip list --outdated | grep -q yt-dlp; then pip install --upgrade yt-dlp; fi', shell=True)
+    new = subprocess.check_output('pip freeze | grep yt-dlp', shell=True).decode().strip('\n')
+    return old != new
 class ytdl_no_logger(object):
@@ -132,7 +81,3 @@ class ytdl_no_logger(object):
     def error(self, msg):
         return
-def get_output_templ(video_id: str = None, title: str = None, uploader: str = None, uploader_id: str = None, include_ext: bool = True):
-    return f'[{video_id if video_id else "%(id)s"}] [{title if title else "%(title)s"}] [{uploader if uploader else "%(uploader)s"}] [{uploader_id if uploader_id else "%(uploader_id)s"}]{".%(ext)s" if include_ext else ""}'  # leading dash can cause issues due to bash args so we surround the variables in brackets
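The removed `get_output_templ` doubles as both a yt-dlp output template and a concrete filename builder: each field falls back to a `%(...)s` placeholder when no value is supplied. Running the removed helper as-is shows both modes:

```python
def get_output_templ(video_id=None, title=None, uploader=None, uploader_id=None, include_ext=True):
    # Same body as the removed function: fields fall back to yt-dlp
    # %(...)s placeholders, and brackets keep a leading dash out of the filename.
    return f'[{video_id if video_id else "%(id)s"}] [{title if title else "%(title)s"}] [{uploader if uploader else "%(uploader)s"}] [{uploader_id if uploader_id else "%(uploader_id)s"}]{".%(ext)s" if include_ext else ""}'

print(get_output_templ())
# [%(id)s] [%(title)s] [%(uploader)s] [%(uploader_id)s].%(ext)s
print(get_output_templ(video_id='abc123', title='My Video', uploader='Chan', uploader_id='chan', include_ext=False))
# [abc123] [My Video] [Chan] [chan]
```

The second form is how master's `threads.py` builds `video_filename` before truncating it against `NAME_MAX`.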