Compare commits
No commits in common. "master" and "dev" have entirely different histories.

@ -1,6 +1,4 @@
.idea
targets.*
!targets.sample.*

# ---> Python
# Byte-compiled / optimized / DLL files
@ -1,14 +1,21 @@

# Example systemd Service

`/etc/systemd/system/youtube-dl.service`

`/home/user/youtubedl-daemon.sh`

```bash
#!/bin/bash
/usr/bin/python3 /home/user/automated-youtube-dl/downloader.py --daemon --sleep 60 "https://www.youtube.com/playlist?list=example12345" "/mnt/nfs/archive/YouTube/Example Playlist/"
```

`/lib/systemd/system/youtubedl.service`

```systemd
[Unit]
Description=Youtube-DL Daemon
After=network-online.target

[Service]
ExecStart=/usr/bin/python3 /home/user/automated-youtube-dl/downloader.py --daemon --silence-errors --sleep 60 "https://www.youtube.com/playlist?list=example12345" "/mnt/nfs/archive/YouTube/Example Playlist/"
ExecStart=/home/user/youtubedl-daemon.sh
User=user
Group=user

@ -16,17 +23,9 @@ Group=user
WantedBy=multi-user.target
```

Now start the service:

```bash
chmod +x /home/user/youtubedl-daemon.sh
sudo systemctl daemon-reload
sudo systemctl enable --now youtube-dl
sudo systemctl enable --now youtubedl
```

You can watch the process with:

```bash
sudo journalctl -b -u youtube-dl.service
```

179
README.md
@ -1,111 +1,68 @@

# automated-youtube-dl

_Automated YouTube Archival._

A wrapper for youtube-dl used to keep very large amounts of data from YouTube in sync. It's designed to be simple and easy to use.

I have a single, very large playlist that I add any videos I like to. This program runs as a service on my NAS (see [Example systemd Service.md]).

---

## Project Status

This project is archived. I was working on a web interface for this project but decided to just use [tubearchivist](https://github.com/tubearchivist/tubearchivist) rather than write my own. If tubearchivist does not meet my needs, I will restart work on this project.

---

### Features

- Uses yt-dlp instead of youtube-dl.
- Skips videos that are already downloaded.
- Automatically updates yt-dlp on launch.
- Downloads videos in a format suitable for archiving:
  - A complex `format` selector that balances video quality and file size.
  - Embedded metadata: chapters, thumbnail, English subtitles (automatic ones too), and YouTube metadata.
- Logs progress to a file.
- Simple display using `tqdm`.
- Limits the size of the downloaded videos.
- Parallel downloads.
- Daemon mode for running as a system service.

### Installation

```bash
sudo apt update && sudo apt install ffmpeg atomicparsley phantomjs
pip install -r requirements.txt
```

### Usage

This program has 3 modes:

<br>

**Direct-Download Mode:**

In this mode, you give the downloader a URL to the media you want to download.

`./downloader.py <video URL to download> --output <output directory>`

<br>

**Config-File Mode:**

In this mode, you give the downloader the path to a config file that contains the URLs of the media and where to download them to.

The config file can be a YAML file or a TXT file with one URL to download on each line.

When using a YAML file (see [targets.sample.yml]): `./downloader.py <path to the config file>`

When using a TXT file: `./downloader.py <path to the config file> --output <output directory>`
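
The YAML layout mirrors how the downloader reads it: a mapping from each output directory to the list of URLs that should be synced there. A minimal sketch (these paths and playlist IDs are illustrative, not from the repository):

```yaml
# Hypothetical targets.yml: each key is an output directory,
# each value is the list of URLs to keep in sync there.
/mnt/nfs/archive/YouTube/Example Playlist/:
  - "https://www.youtube.com/playlist?list=example12345"
/mnt/nfs/archive/YouTube/Music/:
  - "https://www.youtube.com/playlist?list=example67890"
```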

<br>

**Daemon Mode:**

In this mode, the downloader loops over the media you give it, sleeping for a set number of minutes between passes.

To run as a daemon, do:

`/usr/bin/python3 /home/user/automated-youtube-dl/downloader.py --daemon --sleep 60 <video URL or config file path>`

`--sleep` is how many minutes to sleep after completing all downloads.

Daemon mode can take a URL (like direct-download mode) or a path to a config file (like config-file mode).

<br>

#### Folder Structure

```
Output Directory/
├─ logs/
│ ├─ youtube_dl-<UNIX timestamp>.log
│ ├─ youtube_dl-errors-<UNIX timestamp>.log
├─ download-archive.log
├─ Example Video.mkv
```

`download-archive.log` contains the IDs of the videos that have already been downloaded. You can import videos you've already downloaded by adding their IDs to this file.
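
Since the archive is just one video ID per line, importing existing downloads can be scripted. A minimal sketch (the helper name and the video IDs below are made up for illustration), assuming the one-ID-per-line format described above:

```python
import tempfile
from pathlib import Path

def import_video_ids(archive: Path, ids: list[str]) -> None:
    """Append video IDs to a download archive, skipping ones already present."""
    archive.touch(exist_ok=True)
    existing = set(archive.read_text().split())
    with archive.open('a') as f:
        for video_id in ids:
            if video_id not in existing:
                f.write(video_id + '\n')
                existing.add(video_id)

# Demo in a temporary directory; the IDs are invented examples.
workdir = Path(tempfile.mkdtemp())
archive = workdir / 'download-archive.log'
import_video_ids(archive, ['dQw4w9WgXcQ', 'abc123def45'])
import_video_ids(archive, ['dQw4w9WgXcQ'])  # already present, not re-added
```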

Videos will be saved using this name format:

```
[%(id)s] [%(title)s] [%(uploader)s] [%(uploader_id)s]
```
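
yt-dlp output templates use Python percent-style string formatting, so the naming scheme can be previewed with plain string interpolation (the metadata values below are invented for illustration):

```python
# The downloader's output template (minus the directory and extension parts).
template = '[%(id)s] [%(title)s] [%(uploader)s] [%(uploader_id)s]'

# Hypothetical video metadata, shaped like the fields yt-dlp provides.
video = {
    'id': 'abc123def45',
    'title': 'Example Video',
    'uploader': 'Example Channel',
    'uploader_id': '@examplechannel',
}

filename = template % video
```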

<br>

#### Arguments

| Argument              | Flag | Help |
| --------------------- | ---- | ------------------------------------------------------------ |
| `--no-update`         | `-n` | Don't update yt-dlp at launch. |
| `--max-size`          |      | Max allowed size of a video in MB. Default: 1100. |
| `--rm-cache`          | `-r` | Delete the yt-dlp cache on start. |
| `--threads`           |      | How many download processes to use (threads). Default is how many CPU cores you have. You will want to find a good value that doesn't overload your connection. |
| `--daemon`            | `-d` | Run in daemon mode. Disables progress bars and sleeps for the amount of time specified in `--sleep`. |
| `--sleep`             |      | How many minutes to sleep when in daemon mode. |
| `--silent`            | `-s` | Don't print any error messages to the console. Errors will still be logged in the log files. |
| `--ignore-downloaded` | `-i` | Ignore videos that have already been downloaded and let youtube-dl handle everything. Videos will not be re-downloaded, but metadata will be updated. |

# automated-youtube-dl

_Automated YouTube Archival._

A wrapper for youtube-dl used to keep very large amounts of data from YouTube in sync. It's designed to be simple and easy to use.

I have a single, very large playlist that I add any videos I like to. On my NAS, a service uses this program to download new videos (see [Example systemd Service.md]).

### Features

- Uses yt-dlp instead of youtube-dl.
- Skips videos that are already downloaded, which makes checking a playlist for new videos quick because youtube-dl doesn't have to fetch the entire playlist.
- Automatically updates yt-dlp on launch.
- Downloads videos in a format suitable for archiving:
  - A complex `format` selector that balances video quality and file size.
  - Embedded metadata: chapters, thumbnail, English subtitles (automatic ones too), and YouTube metadata.
- Logs progress to a file.
- Simple display using `tqdm`.
- Limits the size of the downloaded videos.
- Parallel downloads.
- Daemon mode.

### Installation

```bash
sudo apt update && sudo apt install ffmpeg atomicparsley
pip install -r requirements.txt
```

### Usage

`./downloader.py <URL to download or path of a file containing the URLs of the videos to download> <output directory>`

To run as a daemon, do:

`/usr/bin/python3 /home/user/automated-youtube-dl/downloader.py --daemon --sleep 60 <url> <output folder>`

`--sleep` is how many minutes to sleep after completing all downloads.
#### Folder Structure

```
Output Directory/
├─ logs/
│ ├─ youtube_dl-<UNIX timestamp>.log
│ ├─ youtube_dl-errors-<UNIX timestamp>.log
├─ download-archive.log
├─ Example Video.mkv
```

`download-archive.log` contains the IDs of the videos that have already been downloaded. You can import videos you've already downloaded by adding their IDs to this file.

Videos will be saved using this name format:

```
%(title)s --- %(uploader)s --- %(uploader_id)s --- %(id)s
```

#### Arguments

| Argument      | Flag | Help |
|---------------|------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `--no-update` | `-n` | Don't update yt-dlp at launch. |
| `--max-size`  |      | Max allowed size of a video in MB. Default: 1100. |
| `--rm-cache`  | `-r` | Delete the yt-dlp cache on start. |
| `--threads`   |      | How many download processes to use (threads). Default is how many CPU cores you have. You will want to find a good value that doesn't overload your connection. |
| `--daemon`    | `-d` | Run in daemon mode. Disables progress bars and sleeps for the amount of time specified in --sleep. |
| `--sleep`     |      | How many minutes to sleep when in daemon mode. |
487
downloader.py
@ -4,181 +4,87 @@ import logging.config
import math
import os
import re
import shutil
import subprocess
import sys
import tempfile
import time
from multiprocessing import Manager, Pool, cpu_count
from pathlib import Path
from threading import Thread

import yaml
from appdirs import user_data_dir
from tqdm.auto import tqdm

import ydl.yt_dlp as ydl
from process.funcs import get_silent_logger, remove_duplicates_from_playlist, restart_program, setup_file_logger
from process.threads import bar_eraser, download_video
from process.threads import download_video
from ydl.files import create_directories, resolve_path
from ydl.yt_dlp import YDL, update_ytdlp

# logging.basicConfig(level=1000)
# logging.getLogger().setLevel(1000)


def signal_handler(sig, frame):
    # TODO: https://www.g-loaded.eu/2016/11/24/how-to-terminate-running-python-threads-using-signals/
    # raise ServiceExit
    sys.exit(0)


# signal.signal(signal.SIGTERM, signal_handler)
# signal.signal(signal.SIGINT, signal_handler)

url_regex = re.compile(r'^(?:http|ftp)s?://'  # http:// or https://
                       r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
                       r'localhost|'  # localhost...
                       r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
                       r'(?::\d+)?'  # optional port
                       r'(?:/?|[/?]\S+)$', re.IGNORECASE)
ansi_escape_regex = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
urlRegex = re.compile(
    r'^(?:http|ftp)s?://'  # http:// or https://
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)

parser = argparse.ArgumentParser()
parser.add_argument('file', help='URL to download or path of a file containing the URLs of the videos to download.')
parser.add_argument('--output', required=False, help='Output directory. Ignored when paths are specified in a YAML file.')
parser.add_argument('output', help='Output directory.')
parser.add_argument('--no-update', '-n', action='store_true', help="Don't update yt-dlp at launch.")
parser.add_argument('--max-size', type=int, default=1100, help='Max allowed size of a video in MB.')
parser.add_argument('--rm-cache', '-r', action='store_true', help='Delete the yt-dlp cache on start.')
parser.add_argument('--threads', type=int, default=(cpu_count() - 1),
                    help=f'How many download processes to use. Default: number of CPU cores (for your machine: {cpu_count()}) - 1 = {cpu_count() - 1}')
parser.add_argument('--daemon', '-d', action='store_true',
                    help="Run in daemon mode. Disables progress bars and sleeps for the amount of time specified in --sleep.")
parser.add_argument('--threads', type=int, default=cpu_count(), help='How many download processes to use.')
parser.add_argument('--daemon', '-d', action='store_true', help="Run in daemon mode. Disables progress bars and sleeps for the amount of time specified in --sleep.")
parser.add_argument('--sleep', type=float, default=60, help='How many minutes to sleep when in daemon mode.')
parser.add_argument('--download-cache-file-directory', default=user_data_dir('automated-youtube-dl', 'cyberes'),
                    help='The path to the directory to track downloaded videos. Defaults to your appdata path.')
parser.add_argument('--silence-errors', '-s', action='store_true',
                    help="Don't print any error messages to the console.")
parser.add_argument('--ignore-downloaded', '-i', action='store_true',
                    help='Ignore videos that have already been downloaded and disable checks. Let youtube-dl handle everything.')
parser.add_argument('--erase-downloaded-tracker', '-e', action='store_true', help='Erase the tracked video file.')
parser.add_argument('--ratelimit-sleep', type=int, default=5,
                    help='How many seconds to sleep between items to prevent rate-limiting. Does not affect time between videos; you should be fine since it takes a few seconds to merge everything and clean up.')
parser.add_argument('--input-datatype', choices=['auto', 'txt', 'yaml'], default='auto',
                    help='The datatype of the input file. If set to auto, the file will be scanned for a URL on the first line. '
                         'If it is a URL, the filetype will be set to txt. If it is a key: value pair then the filetype will be set to yaml.')
parser.add_argument('--log-dir', default=None, help='Where to store the logs. Must be set when --output is not.')
parser.add_argument('--verbose', '-v', action='store_true')
parser.add_argument('--verify', '-z', action='store_true', help='Run ffprobe on the downloaded files.')
parser.add_argument('--silence-errors', '-s', action='store_true', help="Don't print any error messages to the console.")
args = parser.parse_args()

if args.threads <= 0:
    print("Can't have 0 threads!")
    sys.exit(1)

if args.output:
    args.output = resolve_path(args.output)
    if args.log_dir:
        args.log_dir = resolve_path(args.log_dir)
elif not args.output and not args.log_dir:
    args.log_dir = resolve_path(Path(os.getcwd(), 'automated-youtube-dl_logs'))
    # print('Must set --log-dir when --output is not.')
    # sys.exit(1)
else:
    args.log_dir = args.output / 'logs'

args.download_cache_file_directory = resolve_path(args.download_cache_file_directory)

# TODO: use logging for this
if args.verbose:
    print('Cache directory:', args.download_cache_file_directory)

args.output = resolve_path(args.output)
log_time = time.time()


def load_input_file():
    """
    Get the URLs of the videos to download. Is the input a URL or file?
    """
    url_list = {}
    if not re.match(url_regex, str(args.file)) or args.input_datatype in ('txt', 'yaml'):
        args.file = resolve_path(args.file)
        if not args.file.exists():
            print('Input file does not exist:', args.file)
    # Get the URLs of the videos to download. Is the input a URL or file?
    if not re.match(urlRegex, str(args.file)):
        args.file = resolve_path(args.file)
        if not args.file.exists():
            print('Input file does not exist:', args.file)
            sys.exit(1)
        url_list = [x.strip().strip('\n') for x in list(args.file.open())]
        # Verify each line in the file is a valid URL.
        for i, line in enumerate(url_list):
            if not re.match(urlRegex, line):
                print(f'Line {i} is not a url:', line)
                sys.exit(1)
        input_file = [x.strip().strip('\n') for x in list(args.file.open())]
        if args.input_datatype == 'yaml' or (re.match(r'^.*?:\w*', input_file[0]) and args.input_datatype == 'auto'):
            with open(args.file, 'r') as file:
                try:
                    url_list = yaml.safe_load(file)
                except yaml.YAMLError as e:
                    print('Failed to load config file, error:', e)
                    sys.exit(1)
        elif args.input_datatype == 'txt' or (re.match(url_regex, input_file[0]) and args.input_datatype == 'auto'):
            if not args.output:
                args.output = resolve_path(Path(os.getcwd(), 'automated-youtube-dl_output'))
                # print('You must specify an output path with --output when the input datatype is a text file.')
                # sys.exit(1)
            url_list[str(args.output)] = input_file
        else:
            print('Unknown file type:', args.input_datatype)
            print(input_file)
            sys.exit(1)
        del input_file  # release file object
        # Verify each line in the file is a valid URL.
        # Also resolve the paths.
        resolved_paths = {}
        for directory, urls in url_list.items():
            for item in urls:
                if not re.match(url_regex, str(item)):
                    print('Not a url:', item)
                    sys.exit(1)
            resolved_paths[resolve_path(directory)] = urls
        url_list = resolved_paths
    else:
        # They gave us just a URL.
        if not args.output:
            # Set a default path.
            args.output = resolve_path(Path(os.getcwd(), 'automated-youtube-dl_output'))
            # print('You must specify an output path with --output when the input is a URL.')
            # sys.exit(1)
        url_list[str(args.output)] = [args.file]
    return url_list


url_list = load_input_file()

# Create directories AFTER loading the file.
create_directories(*url_list.keys(), args.download_cache_file_directory)


def do_update():
    if not args.no_update:
        print('Updating yt-dlp...')
        updated = update_ytdlp()
        if updated:
            print('Restarting program...')
            restart_program()
        else:
            print('Up to date.')
else:
    url_list = [args.file]

if not args.no_update:
    print('Checking if yt-dlp needs to be updated...')
    updated = ydl.update_ytdlp()
    if updated:
        print('Restarting program...')
        restart_program()

if args.rm_cache:
    subprocess.run('yt-dlp --rm-cache-dir', shell=True)

# TODO: compress old log files

if args.daemon:
    print('Running in daemon mode.')

create_directories(args.log_dir)
log_dir = args.output / 'logs'
create_directories(args.output, log_dir)

# TODO: log file rotation https://www.blog.pythonlibrary.org/2014/02/11/python-how-to-create-rotating-logs/
# TODO: log to one file instead of one for each run
file_logger = setup_file_logger('youtube_dl', args.log_dir / f'{str(int(log_time))}.log', level=logging.INFO)
video_error_logger = setup_file_logger('video_errors', args.log_dir / f'{int(log_time)}-errors.log', level=logging.INFO)
file_logger = setup_file_logger('youtube_dl', log_dir / f'youtube_dl-{str(int(log_time))}.log', level=logging.INFO)
video_error_logger = setup_file_logger('youtube_dl_video_errors', log_dir / f'youtube_dl-errors-{int(log_time)}.log', level=logging.INFO)
logger = get_silent_logger('yt-dl', silent=not args.daemon)


def log_info_twice(msg):
    logger.info(msg)
    file_logger.info(ansi_escape_regex.sub('', msg))
    file_logger.info(msg)


log_info_twice('Starting process.')
@ -186,6 +92,8 @@ start_time = time.time()
manager = Manager()

download_archive_file = args.output / 'download-archive.log'


def load_existing_videos():
    # Find existing videos.
@ -194,39 +102,26 @@ def load_existing_videos():
    download_archive_file.touch()
    with open(download_archive_file, 'r') as file:
        output.update(([line.rstrip() for line in file]))

    # Remove duplicate lines.
    # Something may have gone wrong in the past, so we want to make sure everything is cleaned up.
    with open(download_archive_file) as file:
        uniqlines = set(file.readlines())
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'w') as tmp:
        tmp.writelines(uniqlines)
    shutil.move(path, download_archive_file)
    return output


status_bar = tqdm(position=2, bar_format='{desc}', disable=args.daemon, leave=False)
downloaded_videos = load_existing_videos()
print('Found', len(downloaded_videos), 'downloaded videos.')

# Create this object AFTER reading in the download_archive.
download_archive_logger = setup_file_logger('download_archive', download_archive_file, format_str='%(message)s')

status_bar = tqdm(position=2, bar_format='{desc}', disable=args.daemon)


def log_bar(log_msg, level):
    status_bar.write(f'[{level}] {log_msg}')
def log_bar(msg, level):
    status_bar.write(f'[{level}] {msg}')
    if level == 'warning':
        logger.warning(log_msg)
        logger.warning(msg)
    elif level == 'error':
        logger.error(log_msg)
        logger.error(msg)
    else:
        logger.info(log_msg)


# def log_with_video_id(log_msg, video_id, level, logger_obj):
#     log_msg = f'{video_id} - {log_msg}'
#     if level == 'warning':
#         logger_obj.warning(log_msg)
#     elif level == 'error':
#         logger_obj.error(log_msg)
#     else:
#         logger_obj.info(log_msg)
        logger.info(msg)


def print_without_paths(msg):
@ -246,46 +141,33 @@ def print_without_paths(msg):
class ytdl_logger(object):
    def debug(self, msg):
        file_logger.debug(self.__clean_msg(msg))
        file_logger.debug(msg)
        # if msg.startswith('[debug] '):
        #     pass
        if '[download]' not in msg:
            print_without_paths(msg)

    def info(self, msg):
        file_logger.info(self.__clean_msg(msg))
        file_logger.info(msg)
        print_without_paths(msg)

    def warning(self, msg):
        file_logger.warning(self.__clean_msg(msg))
        if args.daemon:
            logger.warning(msg)
        else:
            status_bar.write(msg)
        file_logger.warning(msg)
        log_bar(msg, 'warning')

    def error(self, msg):
        file_logger.error(self.__clean_msg(msg))
        if args.daemon:
            logger.error(msg)
        else:
            status_bar.write(msg)
        file_logger.error(msg)
        log_bar(msg, 'error')

    def __clean_msg(self, msg):
        return ansi_escape_regex.sub('', msg)


# TODO: https://github.com/TheFrenchGhosty/TheFrenchGhostys-Ultimate-YouTube-DL-Scripts-Collection/blob/master/docs/Scripts-Type.md#archivist-scripts

# https://github.com/yt-dlp/yt-dlp#embedding-examples
ydl_opts = {
    # TODO: https://github.com/TheFrenchGhosty/TheFrenchGhostys-Ultimate-YouTube-DL-Scripts-Collection/blob/master/docs/Details.md
    # https://old.reddit.com/r/DataHoarder/comments/c6fh4x/after_hoarding_over_50k_youtube_videos_here_is/
    'format': f'(bestvideo[filesize<{args.max_size}M][vcodec^=av01][height>=1080][fps>30]/bestvideo[filesize<{args.max_size}M][vcodec=vp9.2][height>=1080][fps>30]/bestvideo[filesize<{args.max_size}M][vcodec=vp9][height>=1080][fps>30]/bestvideo[filesize<{args.max_size}M][vcodec^=av01][height>=1080]/bestvideo[filesize<{args.max_size}M][vcodec=vp9.2][height>=1080]/bestvideo[filesize<{args.max_size}M][vcodec=vp9][height>=1080]/bestvideo[filesize<{args.max_size}M][height>=1080]/bestvideo[filesize<{args.max_size}M][vcodec^=av01][height>=720][fps>30]/bestvideo[filesize<{args.max_size}M][vcodec=vp9.2][height>=720][fps>30]/bestvideo[filesize<{args.max_size}M][vcodec=vp9][height>=720][fps>30]/bestvideo[filesize<{args.max_size}M][vcodec^=av01][height>=720]/bestvideo[filesize<{args.max_size}M][vcodec=vp9.2][height>=720]/bestvideo[filesize<{args.max_size}M][vcodec=vp9][height>=720]/bestvideo[filesize<{args.max_size}M][height>=720]/bestvideo[filesize<{args.max_size}M])+(bestaudio[acodec=opus]/bestaudio)/best',
    'outtmpl': f'{args.output}/[%(id)s] [%(title)s] [%(uploader)s] [%(uploader_id)s].%(ext)s',  # leading dash can cause issues due to bash args so we surround the variables in brackets
    'merge_output_format': 'mkv',
    'logtostderr': True,
    'embedchapters': True,
    'writethumbnail': True,
    # Save the thumbnail to a file. Embedding seems to be broken right now so this is an alternative.
    'writethumbnail': True,  # Save the thumbnail to a file. Embedding seems to be broken right now so this is an alternative.
    'embedthumbnail': True,
    'embeddescription': True,
    'writesubtitles': True,
@ -293,187 +175,100 @@ ydl_opts = {
|
|||
'subtitlesformat': 'vtt',
|
||||
'subtitleslangs': ['en'],
|
||||
'writeautomaticsub': True,
|
||||
'writedescription': True,
|
||||
# 'writedescription': True,
|
||||
'ignoreerrors': True,
|
||||
'continuedl': False,
|
||||
'addmetadata': True,
|
||||
'writeinfojson': True,
|
||||
'verbose': args.verbose,
|
||||
'postprocessors': [
|
||||
{'key': 'FFmpegEmbedSubtitle'},
|
||||
{'key': 'FFmpegMetadata', 'add_metadata': True},
|
||||
{'key': 'EmbedThumbnail', 'already_have_thumbnail': True},
|
||||
{'key': 'FFmpegThumbnailsConvertor', 'format': 'jpg', 'when': 'before_dl'},
|
||||
# {'key': 'FFmpegSubtitlesConvertor', 'format': 'srt'}
|
||||
],
|
||||
# 'external_downloader': 'aria2c',
|
||||
# 'external_downloader_args': ['-j 32', '-s 32', '-x 16', '--file-allocation=none', '--optimize-concurrent-downloads=true', '--http-accept-gzip=true', '--continue=true'],
|
||||
}
|
||||
|
||||
yt_dlp = YDL(dict(ydl_opts, **{'logger': ytdl_logger()}))
|
||||
|
||||
url_count = 0
|
||||
for k, v in url_list.items():
|
||||
for item in v:
|
||||
url_count += 1
|
||||
main_opts = dict(ydl_opts, **{'logger': ytdl_logger()})
|
||||
# thread_opts = dict(ydl_opts, **{'logger': ydl.ytdl_no_logger()})
|
||||
yt_dlp = ydl.YDL(main_opts)
|
||||
|
||||
# Init bars
|
||||
playlist_bar = tqdm(position=1, desc='Playlist', disable=args.daemon)
|
||||
video_bars = manager.list()
|
||||
if not args.daemon:
|
||||
for i in range(args.threads):
|
||||
video_bars.append([3 + i, manager.Lock()])
|
||||
|
||||
encountered_errors = 0
|
||||
errored_videos = 0
|
||||
|
||||
# The video progress bars have an issue where when a bar is closed it
|
||||
# will shift its position back 1 then return to the correct position.
|
||||
# This thread will clear empty spots.
|
||||
if not args.daemon:
|
||||
eraser_exit = manager.Value(bool, False)
|
||||
Thread(target=bar_eraser, args=(video_bars, eraser_exit,)).start()
|
||||
|
||||
already_erased_downloaded_tracker = False
|
||||
video_bars.append([
|
||||
3 + i,
|
||||
manager.Lock()
|
||||
])
|
||||
|
||||
while True:
|
||||
# do_update() # this doesn't work very well. freezes
|
||||
progress_bar = tqdm(total=url_count, position=0, desc='Inputs', disable=args.daemon,
|
||||
bar_format='{l_bar}{bar}| {n_fmt}/{total_fmt}')
|
||||
for output_path, urls in url_list.items():
|
||||
for target_url in urls:
|
||||
logger.info('Fetching playlist...')
|
||||
playlist = yt_dlp.playlist_contents(str(target_url))
|
||||
for i, target_url in tqdm(enumerate(url_list), total=len(url_list), position=0, desc='Inputs', disable=args.daemon):
|
||||
logger.info('Fetching playlist...')
|
||||
playlist = yt_dlp.playlist_contents(target_url)
|
||||
playlist['entries'] = remove_duplicates_from_playlist(playlist['entries'])
|
||||
encountered_errors = 0
|
||||
errored_videos = 0
|
||||
|
||||
if not playlist:
|
||||
progress_bar.update()
|
||||
continue
|
||||
log_info_twice(f"Downloading item: '{playlist['title']}' {target_url}")
|
||||
|
||||
url_list = load_input_file()
|
||||
playlist_bar.total = len(playlist['entries'])
|
||||
playlist_bar.set_description(playlist['title'])
|
||||
|
||||
download_archive_file = args.download_cache_file_directory / (str(playlist['id']) + '.log')
|
||||
if args.erase_downloaded_tracker and not already_erased_downloaded_tracker:
|
||||
if download_archive_file.exists():
|
||||
os.remove(download_archive_file)
|
||||
already_erased_downloaded_tracker = True
|
||||
downloaded_videos = load_existing_videos()
|
||||
# print(playlist['entries'][0])
|
||||
# sys.exit()
|
||||
|
||||
msg = f'Found {len(downloaded_videos)} downloaded videos for playlist "{playlist["title"]}" ({playlist["id"]}). {"Ignoring." if args.ignore_downloaded else ""}'
|
||||
if args.daemon:
|
||||
logger.info(msg)
|
||||
else:
|
||||
progress_bar.write(msg)
|
||||
download_archive_logger = setup_file_logger('download_archive', download_archive_file,
|
||||
format_str='%(message)s')
|
||||
# Remove already downloaded files from the to-do list.
|
||||
download_queue = []
|
||||
s = set()
|
||||
for p, video in enumerate(playlist['entries']):
|
||||
if video['id'] not in downloaded_videos and video['id'] not in s:
|
||||
download_queue.append(video)
|
||||
s.add(video['id'])
|
||||
playlist_bar.update(len(downloaded_videos))
|
||||
|
||||
playlist['entries'] = remove_duplicates_from_playlist(playlist['entries'])
|
||||
if len(download_queue): # Don't mess with multiprocessing if all videos are already downloaded
|
||||
with Pool(processes=args.threads) as pool:
|
||||
status_bar.set_description_str('=' * os.get_terminal_size()[0])
|
||||
logger.info('Starting downloads...')
|
||||
for result in pool.imap_unordered(download_video,
|
||||
((video, {
|
||||
'bars': video_bars,
|
||||
'ydl_opts': ydl_opts,
|
||||
'output_dir': args.output,
|
||||
}) for video in download_queue)):
|
||||
# Save the video ID to the file
|
||||
if result['downloaded_video_id']:
|
||||
download_archive_logger.info(result['downloaded_video_id'])
|
||||
|
||||
log_info_twice(f'Downloading item: "{playlist["title"]}" ({playlist["id"]}) {target_url}')
|
||||
|
||||
# Remove already downloaded files from the to-do list.
|
||||
download_queue = []
|
||||
for p, video in enumerate(playlist['entries']):
|
||||
if video['id'] not in download_queue:
|
||||
if not args.ignore_downloaded and video['id'] not in downloaded_videos:
|
||||
download_queue.append(video)
|
||||
# downloaded_videos.add(video['id'])
|
||||
elif args.ignore_downloaded:
|
||||
download_queue.append(video)

playlist_bar = tqdm(total=len(playlist['entries']), position=1,
                    desc=f'"{playlist["title"]}" ({playlist["id"]})', disable=args.daemon, leave=False)
if not args.ignore_downloaded:
    playlist_bar.update(len(downloaded_videos))

playlist_ydl_opts = ydl_opts.copy()
# playlist_ydl_opts['outtmpl'] = f'{output_path}/{get_output_templ()}'

if len(download_queue):  # Don't mess with multiprocessing if all videos are already downloaded
    with Pool(processes=args.threads) as pool:
        if sys.stdout.isatty():
            # Doesn't work if not connected to a terminal:
            # OSError: [Errno 25] Inappropriate ioctl for device
            status_bar.set_description_str('=' * os.get_terminal_size()[0])
        logger.info('Starting downloads...')
        for result in pool.imap_unordered(download_video,
                                          ((video, {
                                              'bars': video_bars,
                                              'ydl_opts': playlist_ydl_opts,
                                              'output_dir': Path(output_path),
                                              'ignore_downloaded': args.ignore_downloaded,
                                              'verify': args.verify
                                          }) for video in download_queue)):
            # Save the video ID to the file
            if result['downloaded_video_id']:
                download_archive_logger.info(result['downloaded_video_id'])

            # Print short error messages.
            # An error should never be added to both video_critical_err_msg_short and video_critical_err_msg.
            for line in result['video_critical_err_msg_short']:
                msg = f"{result['video_id']} - {line}"
                video_error_logger.error(msg)
                file_logger.error(msg)
                encountered_errors += 1
            # Print stuff
            for line in result['video_error_logger_msg']:
                video_error_logger.info(line)
                file_logger.error(line)
                encountered_errors += 1
                if not args.silence_errors:
                    if args.daemon:
                        logger.error(msg)
                        logger.error(line)
                    else:
                        status_bar.write(msg)
                        playlist_bar.write(line)
            if len(result['video_error_logger_msg']):
                errored_videos += 1

            # Print longer error messages.
            # Won't print anything to console if the silence_errors arg is set.
            for line in result['video_critical_err_msg']:
                msg = f"{result['video_id']} - {line}"
                video_error_logger.error(msg)
                file_logger.error(msg)
                encountered_errors += 1
                if not args.silence_errors:
                    if args.daemon:
                        logger.error(msg)
                    else:
                        status_bar.write(msg)
            # for line in result['status_msg']:
            #     playlist_bar.write(line)
            for line in result['logger_msg']:
                log_info_twice(line)
            playlist_bar.update()
else:
    playlist_bar.write(f"All videos already downloaded for '{playlist['title']}'.")

# if len(result['video_critical_err_msg']):
#     errored_videos += 1
# if args.silence_errors and args.daemon:
#     logger.error(f"{result['video_id']} - Failed due to error.")
error_msg = f'Encountered {encountered_errors} errors on {errored_videos} videos.'
if args.daemon:
    logger.info(error_msg)
else:
    playlist_bar.write(error_msg)

for line in result['logger_msg']:
    log_info_twice(f"{result['video_id']} - {line}")

# TODO: if no error launch a verify multiprocess
# if kwargs['verify']:
#     try:
#         info = yt_dlp.extract_info(video['url'])
#     except Exception as e:
#         output_dict['video_critical_err_msg'].append(f'Failed to verify video, extract_info failed: {e}')
#     file_path = base_path + info['ext']
#     result = ffprobe(file_path)
#     if not result[0]:
#         output_dict['video_critical_err_msg'].append(f'Failed to verify video: {result[4]}')

playlist_bar.update()
else:
    msg = f"All videos already downloaded for \"{playlist['title']}\"."
    if args.daemon:
        logger.info(msg)
    else:
        status_bar.write(msg)
    log_info_twice(f"Finished item: '{playlist['title']}' {target_url}")

# Sleep a bit to prevent rate-limiting
if progress_bar.n < len(url_list.keys()) - 1:
    status_bar.set_description_str(f'Sleeping {args.ratelimit_sleep}s...')
    time.sleep(args.ratelimit_sleep)

progress_bar.update()
error_msg = f'Encountered {encountered_errors} errors on {errored_videos} videos.'
if args.daemon:
    logger.info(error_msg)
else:
    status_bar.write(error_msg)
log_info_twice(f"Finished item: '{playlist['title']}' {target_url}")
log_info_twice(f"Finished process in {round(math.ceil(time.time() - start_time) / 60, 2)} min.")
if not args.daemon:
    break

@@ -482,27 +277,13 @@ while True:
try:
    time.sleep(args.sleep * 60)
except KeyboardInterrupt:
    sys.exit(0)
    # downloaded_videos = load_existing_videos()  # reload the videos that have already been downloaded
    sys.exit()
downloaded_videos = load_existing_videos()  # reload the videos that have already been downloaded

# Erase the status bar.
status_bar.set_description_str('\x1b[2KDone!')
status_bar.refresh()

# Clean up the remaining bars. Have to close them in order.
# These variables may be undefined so we will just ignore any errors.
# Not in one try/catch because we don't want to skip anything.
try:
    eraser_exit.value = True
except NameError:
    pass
except AttributeError:
    pass
try:
    playlist_bar.close()
except NameError:
    pass
except AttributeError:
    pass
try:
    status_bar.close()
except NameError:
    pass
except AttributeError:
    pass
playlist_bar.close()
status_bar.close()

@@ -1,9 +1,7 @@
import logging
import os
import re
import sys

import ffmpeg
import psutil

@@ -24,7 +22,7 @@ def restart_program():
    os.execl(python, python, *sys.argv)


def setup_file_logger(name, log_file, level=logging.INFO, format_str: str = '%(asctime)s - %(name)s - %(levelname)s - %(message)s', filemode='a'):
def setup_file_logger(name, log_file, level=logging.INFO, format_str: str = '%(asctime)s - %(name)s - %(levelname)s - %(message)s', filemode='a', no_console: bool = True):
    formatter = logging.Formatter(format_str)

    logger = logging.getLogger(name)

@@ -42,21 +40,6 @@ def setup_file_logger(name, log_file, level=logging.INFO, format_str: str = '%(a
    return logger
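The diff only shows `setup_file_logger`'s signature, not its body. A minimal compatible sketch looks like the following; the project's real version may differ (e.g. the dev branch's `no_console` flag suggests it can also attach a console handler), and the temp-file usage here is purely for the demo.

```python
import logging
import tempfile
from pathlib import Path

def setup_file_logger(name, log_file, level=logging.INFO,
                      format_str='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                      filemode='a'):
    # Illustrative re-implementation: one file handler with the given format.
    formatter = logging.Formatter(format_str)
    handler = logging.FileHandler(log_file, mode=filemode)
    handler.setFormatter(formatter)
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger

log_path = Path(tempfile.mkdtemp()) / 'demo.log'
logger = setup_file_logger('demo', log_path)
logger.info('hello')
print('hello' in log_path.read_text())  # → True
```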


def ffprobe(filename):
    try:
        # stream = stream.output('pipe:', format="null")
        # stream.run(capture_stdout=True, capture_stderr=True)
        test = ffmpeg.probe(filename)
    except Exception as e:
        err = []
        for x in e.stderr.decode().split('\n'):
            if x.strip(' ') != '':
                err.append(x)
        err_msg = err[-1].split(': ')[-1]
        return False, filename, str(e), None, err_msg
    return True, filename, None, test, None
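The stderr parsing in `ffprobe` above can be isolated: take the last non-empty line of ffmpeg's stderr output and keep the text after the final `': '`. The sample stderr string below is fabricated for illustration.

```python
def last_error_message(stderr_text: str) -> str:
    # Keep non-empty lines, then extract the message portion of the last one.
    lines = [x for x in stderr_text.split('\n') if x.strip(' ') != '']
    return lines[-1].split(': ')[-1]

stderr = "ffprobe version 4.4\nbroken.mp4: Invalid data found when processing input\n"
print(last_error_message(stderr))  # → Invalid data found when processing input
```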


def get_silent_logger(name, level=logging.INFO, format_str: str = '%(asctime)s - %(name)s - %(levelname)s - %(message)s', silent: bool = True):
    logger = logging.getLogger(name)
    console = logging.StreamHandler()

@@ -77,11 +60,3 @@ def remove_duplicates_from_playlist(entries):
            videos.append(video)
            s.add(video['id'])
    return videos


def remove_special_chars_linux(string, special_chars: list = None):
    if special_chars is None:
        special_chars = ['\\', '`', '*', '_', '{', '}', '[', ']', '(', ')', '>', '#', '+', '-', '.', '!', '$', '\'']
    for char in special_chars:
        string = re.sub(re.escape(char), '', string)
    return string
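`remove_special_chars_linux` is shown in full above, so its behavior can be demonstrated directly: each character in the default list is stripped from the string before it is used as a filename.

```python
import re

def remove_special_chars_linux(string, special_chars=None):
    if special_chars is None:
        special_chars = ['\\', '`', '*', '_', '{', '}', '[', ']', '(', ')', '>', '#', '+', '-', '.', '!', '$', '\'']
    for char in special_chars:
        # re.escape lets characters like '*' and '[' be matched literally.
        string = re.sub(re.escape(char), '', string)
    return string

print(remove_special_chars_linux('Mix #1 (Official!)'))  # → Mix 1 Official
```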

@@ -1,46 +1,32 @@
import math
import os
import random
import subprocess
import time
import traceback
from pathlib import Path

import numpy as np
import yt_dlp as ydl_ydl
from hurry.filesize import size
from tqdm.auto import tqdm
from unidecode import unidecode

import ydl.yt_dlp as ydl
from process.funcs import remove_special_chars_linux, setup_file_logger
from process.funcs import setup_file_logger


class ytdl_logger(object):
    errors = []

    def __init__(self, logger=None):
    def __init__(self, logger):
        self.logger = logger
        # logging.basicConfig(level=logging.DEBUG)
        # self.logger = logging
        # self.logger.info('testlog')

    def debug(self, msg):
        if self.logger:
            self.logger.info(msg)
        self.logger.info(msg)

    def info(self, msg):
        if self.logger:
            self.logger.info(msg)
        self.logger.info(msg)

    def warning(self, msg):
        if self.logger:
            self.logger.warning(msg)
        self.logger.warning(msg)

    def error(self, msg):
        if self.logger:
            self.logger.error(msg)
            self.errors.append(msg)
        self.logger.error(msg)
        self.errors.append(msg)
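The `ytdl_logger` class above is a logger adapter in the shape yt-dlp expects: an object with `debug`/`info`/`warning`/`error` methods. A standalone sketch of the same idea, with one deliberate difference: `errors` is an instance attribute rather than a class attribute, so separate instances don't share one error list.

```python
import logging

class CollectingLogger:
    """Forwards yt-dlp's log callbacks to a standard logger and collects errors."""

    def __init__(self, logger=None):
        self.logger = logger
        self.errors = []  # per-instance, unlike the class-level list above

    def debug(self, msg):
        if self.logger:
            self.logger.info(msg)

    def info(self, msg):
        if self.logger:
            self.logger.info(msg)

    def warning(self, msg):
        if self.logger:
            self.logger.warning(msg)

    def error(self, msg):
        if self.logger:
            self.logger.error(msg)
        self.errors.append(msg)

ylogger = CollectingLogger(logging.getLogger('demo'))
ylogger.error('ERROR: Video unavailable')
print(ylogger.errors)  # → ['ERROR: Video unavailable']
```

Collecting the errors lets the caller attach them to the per-video result dict after the download finishes, which is exactly how `download_video` uses `ylogger.errors` further down.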

@@ -55,225 +41,62 @@ def is_manager_lock_locked(lock) -> bool:
    return False


name_max = int(subprocess.check_output("getconf NAME_MAX /", shell=True).decode()) - 30


def download_video(args) -> dict:
    # Sleep for a little bit to space out the rush of workers flooding the bar locks.
    # time.sleep(random.randint(1, 20) / 1000)

    def progress_hook(d):
        # Variables can be None if the download hasn't started yet.
        # downloaded_bytes and total_bytes can be None if the download hasn't started yet.
        if d['status'] == 'downloading':
            total = None
            if d.get('downloaded_bytes'):
                # We want total_bytes but it may not exist so total_bytes_estimate is good too
                if d.get('total_bytes'):
                    total = d.get('total_bytes')
                elif d.get('total_bytes_estimate'):
                    total = d.get('total_bytes_estimate')

                if total:
                if d.get('downloaded_bytes') and d.get('total_bytes'):
                    downloaded_bytes = int(d['downloaded_bytes'])
                    if total > 0:
                        percent = (downloaded_bytes / total) * 100
                    total_bytes = int(d['total_bytes'])
                    if total_bytes > 0:
                        percent = (downloaded_bytes / total_bytes) * 100
                        bar.update(int(np.round(percent - bar.n)))  # If the progress bar doesn't end at 100% then round to 1 decimal place
                        bar.set_postfix({
                            'speed': d['_speed_str'],
                            'size': f"{size(d.get('downloaded_bytes'))}/{size(total)}",
                        })
                else:
                    bar.set_postfix({
                        'speed': d['_speed_str'],
                        'size': f"{d['_downloaded_bytes_str'].strip()}/{d['_total_bytes_str'].strip()}",
                    })
                bar.set_postfix({
                    'speed': d['_speed_str'],
                    'size': f"{d['_downloaded_bytes_str'].strip()}/{d['_total_bytes_str'].strip()}",
                })

    video = args[0]
    kwargs = args[1]

    output_dict = {'downloaded_video_id': None, 'video_id': video['id'], 'video_critical_err_msg': [], 'video_critical_err_msg_short': [], 'status_msg': [], 'logger_msg': []}  # empty object

    if not kwargs['ignore_downloaded'] and not video['channel_id'] or not video['channel'] or not video['channel_url']:
        if video['duration'] or isinstance(video['view_count'], int):
            # Sometimes videos don't have channel_id, channel, or channel_url but are actually valid. Like shorts.
            pass
        else:
            output_dict['video_critical_err_msg_short'].append('unavailable.')
            return output_dict

    # Clean of foreign languages
    video['title'] = unidecode(video['title'])

    # Get a bar
    locked = False
    if len(kwargs['bars']):
        bar_enabled = True
        got_lock = False
        while not got_lock:  # Get a bar
            # We're going to wait until a bar is available for us to use.
            while not locked:
            for item in kwargs['bars']:
                if item[1].acquire(timeout=0.01):
                    got_lock = True
                    bar_offset = item[0]
                if not is_manager_lock_locked(item[1]):
                    locked = item[1].acquire(timeout=0.1)  # get the lock ASAP and don't wait if we didn't get it.
                    offset = item[0]
                    bar_lock = item[1]
                    break
            else:
                time.sleep(random.uniform(0.1, 0.5))
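The bar-allocation pattern above boils down to: workers compete for a fixed pool of `(slot, lock)` pairs, retrying with a short random sleep until a slot comes free. A minimal sketch using `threading.Lock` as a stand-in for the multiprocessing manager locks:

```python
import random
import threading
import time

def acquire_slot(bars):
    # Keep sweeping the pool until one lock can be taken without blocking long.
    while True:
        for offset, lock in bars:
            if lock.acquire(timeout=0.01):
                return offset, lock
        time.sleep(random.uniform(0.01, 0.05))  # back off before the next sweep

bars = [(i, threading.Lock()) for i in range(2)]
bars[0][1].acquire()               # slot 0 is busy
offset, lock = acquire_slot(bars)  # so the next worker gets slot 1
print(offset)  # → 1
lock.release()
```

The short `acquire` timeout keeps a worker from blocking on one particular slot when another slot may already be free.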
        kwargs['ydl_opts']['progress_hooks'] = [progress_hook]
        desc_with = int(np.round(os.get_terminal_size()[0] * (1 / 4)))
        bar = tqdm(total=100, position=bar_offset, desc=f"{video['id']} - {video['title']}".ljust(desc_with)[:desc_with], bar_format='{l_bar}{bar}| {elapsed}<{remaining}{postfix}', leave=False)
    else:
        bar_enabled = False

    # got_lock = False
    # # if len(kwargs['bars']):
    # while not got_lock:  # We're going to wait until a bar is available for us to use.
    #     for item in kwargs['bars']:
    #         # if not is_manager_lock_locked(item[1]):
    #         got_lock = item[1].acquire(timeout=0.01)  # get the lock ASAP and don't wait if we didn't get it.
    #
    #         if got_lock:
    #             print('GOT LOCK:', video['id'])
    #             # Now that we've gotten the lock, set some variables related to the bar
    #             offset = item[0]
    #             bar_lock = item[1]
    #             break
    #         else:
    #             print('WAITING FOR LOCK:', video['id'])
    #             time.sleep(uniform(0.1, 0.9))
    bar = tqdm(total=100, position=(offset if locked else None), desc=f"{video['id']} - {video['title']}".ljust(desc_with)[:desc_with], bar_format='{l_bar}{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}{postfix}]', leave=False)

    ylogger = ytdl_logger(setup_file_logger(video['id'], kwargs['output_dir'] / f"[{video['id']}].log"))
    kwargs['ydl_opts']['logger'] = ylogger
    yt_dlp = ydl.YDL(kwargs['ydl_opts'])
    output_dict = {'downloaded_video_id': None, 'blacklist_video_id': None, 'video_error_logger_msg': [], 'status_msg': [], 'logger_msg': []}  # empty object
    start_time = time.time()

    try:
        kwargs['ydl_opts']['logger'] = ytdl_logger()  # dummy silent logger
        yt_dlp = ydl.YDL(kwargs['ydl_opts'])
        video_n = yt_dlp.get_info(video['url'])

        if not video_n:
            output_dict['video_critical_err_msg_short'].append('failed to get info. Unavailable?')
            if bar_enabled:
                bar.close()
                bar_lock.release()
            return output_dict

        video_n['url'] = video['url']
        video = video_n
        del video_n

        # We created a new dict
        video['title'] = unidecode(video['title'])
        video['uploader'] = unidecode(video['uploader'])  # now this info is present since we fetched it

        # TODO: do we also need to remove the @ char?
        video_filename = remove_special_chars_linux(
            ydl.get_output_templ(video_id=video['id'], title=video['title'], uploader=video['uploader'], uploader_id=video['uploader_id'], include_ext=False), special_chars=['/']
        )

        # Make sure the video title isn't too long
        while len(video_filename) >= name_max - 3:  # -3 so that I can add ...
            video['title'] = video['title'][:-1]
            video_filename = remove_special_chars_linux(
                ydl.get_output_templ(
                    video_id=video['id'],
                    title=video['title'] + '...',
                    uploader=video['uploader'],
                    uploader_id=video['uploader_id'],
                    include_ext=False
                ), special_chars=['/'])
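The title-truncation loop above can be shown in isolation: shave characters off the title until the rendered filename fits under the filesystem's name limit, appending `...` to mark the cut. The template and `-3` margin mirror the diff; the `NAME_MAX` value here is hardcoded for the demo instead of coming from `getconf`.

```python
NAME_MAX = 64  # stand-in for `getconf NAME_MAX /` minus the 30-char safety margin

def templ(video_id, title, uploader):
    # Simplified version of get_output_templ() with the uploader_id field dropped.
    return f'[{video_id}] [{title}] [{uploader}]'

def fit_filename(video_id, title, uploader, limit=NAME_MAX):
    name = templ(video_id, title, uploader)
    while len(name) >= limit - 3:  # -3 leaves room for the '...'
        title = title[:-1]
        name = templ(video_id, title + '...', uploader)
    return name

name = fit_filename('dQw4w9WgXcQ', 'A' * 100, 'SomeChannel')
print(len(name) < NAME_MAX)  # → True
```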

        base_path = str(Path(kwargs['output_dir'], video_filename))

        kwargs['ydl_opts']['outtmpl'] = f"{base_path}.%(ext)s"

        # try:
        #     base_path = os.path.splitext(Path(kwargs['output_dir'], yt_dlp.prepare_filename(video)))[0]
        # except AttributeError:
        #     # Sometimes we won't be able to pull the video info so just use the video's ID.
        #     base_path = kwargs['output_dir'] / video['id']
        ylogger = ytdl_logger(setup_file_logger(video['id'], base_path + '.log'))
        kwargs['ydl_opts']['logger'] = ylogger
        with ydl_ydl.YoutubeDL(kwargs['ydl_opts']) as y:
            error_code = y.download(video['url'])
        # yt_dlp = ydl.YDL(kwargs['ydl_opts'])  # recreate the object with the correct logging path
        # error_code = yt_dlp(video['url'])  # Do the download

        error_code = yt_dlp(video['url'])  # Do the download
        if not error_code:
            elapsed = round(math.ceil(time.time() - start_time) / 60, 2)
            output_dict['logger_msg'].append(f"'{video['title']}' - Downloaded in {elapsed} min.")
            output_dict['logger_msg'].append(f"{video['id']} '{video['title']}' downloaded in {elapsed} min.")
            output_dict['downloaded_video_id'] = video['id']
        else:
            output_dict['video_critical_err_msg'] = output_dict['video_critical_err_msg'] + ylogger.errors
    except Exception:
        output_dict['video_critical_err_msg'].append(f"EXCEPTION -> {traceback.format_exc()}")
        if bar_enabled:
            bar.update(100 - bar.n)

    if bar_enabled:
            # m = f'{video["id"]} {video["title"]} -> Failed to download, error code: {error_code}'
            # output_dict['status_msg'].append(m)
            # output_dict['video_error_logger_msg'].append(m)
            output_dict['video_error_logger_msg'] = output_dict['video_error_logger_msg'] + ylogger.errors
    except Exception as e:
        output_dict['video_error_logger_msg'].append(f"EXCEPTION -> {e}")
    if locked:
        bar.close()
        bar_lock.release()
    return output_dict


def bar_eraser(video_bars, eraser_exit):
    while not eraser_exit.value:
        for i, item in enumerate(video_bars):
            if eraser_exit.value:
                return
            i = int(i)
            bar_lock = video_bars[i][1]
            if video_bars[i][1].acquire(timeout=0.1):
                bar = tqdm(position=video_bars[i][0], leave=False, bar_format='\x1b[2K')
                bar.close()
                bar_lock.release()

# Old queue and queue processor threads
# manager = Manager()
# queue = manager.dict()
# queue_lock = manager.Lock()
# def eraser():
#     nonlocal queue
#     try:
#         while not eraser_exit.value:
#             for i in queue.keys():
#                 if eraser_exit.value:
#                     return
#                 i = int(i)
#                 lock = video_bars[i][1].acquire(timeout=0.1)
#                 bar_lock = video_bars[i][1]
#                 if lock:
#                     bar = tqdm(position=video_bars[i][0], leave=False, bar_format='\x1b[2K')
#                     bar.close()
#                     with queue_lock:
#                         del queue_dict[i]
#                         queue = queue_dict
#                     bar_lock.release()
#     except KeyboardInterrupt:
#         sys.exit(0)
#     except multiprocessing.managers.RemoteError:
#         sys.exit(0)
#     except SystemExit:
#         sys.exit(0)
#
# try:
#     Thread(target=eraser).start()
#     while not eraser_exit.value:
#         for i, item in enumerate(video_bars):
#             if eraser_exit.value:
#                 return
#             # Add bars to the queue
#             if is_manager_lock_locked(item[1]):
#                 with queue_lock:
#                     queue_dict = queue
#                     queue_dict[i] = True
#                     queue = queue_dict
# except KeyboardInterrupt:
#     sys.exit(0)
# except multiprocessing.managers.RemoteError:
#     sys.exit(0)
# except SystemExit:
#     sys.exit(0)


class ServiceExit(Exception):
    """
    Custom exception which is used to trigger the clean exit
    of all running threads and the main program.
    """
    pass

@@ -2,10 +2,4 @@ yt-dlp
psutil
tqdm
mergedeep
numpy
pyyaml
appdirs
phantomjs
unidecode
ffmpeg-python
hurry.filesize
numpy

@@ -1 +0,0 @@
https://www.youtube.com/playlist?list=example1234

@@ -1,5 +0,0 @@
/path/to/storage/Example Playlist:
  - https://www.youtube.com/playlist?list=ExamplePlaylist1234

/path/to/storage/Music:
  - https://www.youtube.com/MyMusicPlaylist1234

@@ -7,11 +7,8 @@ from mergedeep import merge


class YDL:
    def __init__(self, ydl_opts: dict = None, extra_ydlp_opts: dict = None):
        self.ydl_opts = ydl_opts if ydl_opts else {}
        extra_ydlp_opts = extra_ydlp_opts if extra_ydlp_opts else {}
        self.ydl_opts = merge(ydl_opts, extra_ydlp_opts)
        self.ydl_opts['logger'] = self.ydl_opts.get('logger')
    def __init__(self, ydl_opts):
        self.ydl_opts = ydl_opts
        self.yt_dlp = yt_dlp.YoutubeDL(ydl_opts)

    def get_formats(self, url: Union[str, Path]) -> tuple:

@@ -32,30 +29,17 @@ class YDL:
            sizes.append(d)
        return tuple(sizes)

    def playlist_contents(self, url: str) -> Union[dict, bool]:
        ydl_opts = {
    def playlist_contents(self, url: str) -> dict:
        ydl_opts = merge({
            'extract_flat': True,
            'skip_download': True,
            'ignoreerrors': True,
            'logger': self.ydl_opts['logger'],
        }
            'skip_download': True
        }, self.ydl_opts)
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = self.get_info(url)
        if not info:
            return False
            info = ydl.sanitize_info(ydl.extract_info(url, download=False))
        entries = []
        if info['_type'] == 'playlist':
            if 'entries' in info.keys():
                # When downloading a channel youtube-dl returns a playlist for videos and another for shorts.
                # We need to combine all the videos into one list.
                for item in info['entries']:
                    if item['_type'] in ('video', 'url'):
                        entries.append(item)
                    elif item['_type'] == 'playlist':
                        for video in self.get_info(item['webpage_url'])['entries']:
                            entries.append(video)
                    else:
                        raise ValueError(f"Unknown sub-media type: {item['_type']}")
            entries = [x for x in info['entries']]
        elif info['_type'] == 'video':
            # `info` doesn't seem to contain the `url` key so we'll add it manually.
            # If any issues arise in the future make sure to double check there isn't any weirdness going on here.
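The channel-flattening logic above can be sketched on its own: a channel extraction can return a playlist whose entries are themselves playlists (e.g. a videos tab and a shorts tab), so nested playlists get expanded into one flat list of videos. The `info` dict below is fabricated for the demo, and `resolve_playlist` stands in for the `self.get_info(item['webpage_url'])` call.

```python
def flatten_entries(info, resolve_playlist):
    entries = []
    for item in info['entries']:
        if item['_type'] in ('video', 'url'):
            entries.append(item)
        elif item['_type'] == 'playlist':
            # A nested playlist (e.g. a channel tab): pull its entries too.
            entries.extend(resolve_playlist(item)['entries'])
        else:
            raise ValueError(f"Unknown sub-media type: {item['_type']}")
    return entries

info = {'_type': 'playlist', 'entries': [
    {'_type': 'url', 'id': 'a'},
    {'_type': 'playlist', 'entries': [{'_type': 'url', 'id': 'b'},
                                      {'_type': 'url', 'id': 'c'}]},
]}
flat = flatten_entries(info, resolve_playlist=lambda item: item)
print([v['id'] for v in flat])  # → ['a', 'b', 'c']
```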

@@ -69,55 +53,20 @@ class YDL:
            'entries': entries,
        }

    def __call__(self, *args, **kwargs):
        return self.yt_dlp.download(*args, **kwargs)

    # def filter_filesize(self, info, *, incomplete):
    #     duration = info.get('duration')
    #     if duration and duration < 60:
    #         return 'The video is too short'

    def extract_info(self, *args, **kwargs):
        return self.yt_dlp.extract_info(*args, **kwargs)

    def prepare_filename(self, *args, **kwargs):
        return self.yt_dlp.prepare_filename(*args, **kwargs)

    def process_info(self, *args, **kwargs):
        return self.yt_dlp.process_info(*args, **kwargs)

    def get_info(self, url):
        ydl_opts = {
            'extract_flat': True,
            'skip_download': True,
            'ignoreerrors': True,
            'logger': self.ydl_opts['logger'],
        }
        ydl = yt_dlp.YoutubeDL(ydl_opts)
        return ydl.sanitize_info(ydl.extract_info(url, download=False))

    def __call__(self, *args, **kwargs):
        return self.yt_dlp.download(*args, **kwargs)


def update_ytdlp():
    package_name = 'yt-dlp'
    try:
        result = subprocess.run(
            ["pip", "install", "--disable-pip-version-check", "--upgrade", package_name],
            capture_output=True,
            text=True,
            check=True
        )

        if f"Successfully installed {package_name}" in result.stdout:
            # print(f"{package_name} was updated.")
            return True
        else:
            # print(f"{package_name} was not updated.")
            return False

    except subprocess.CalledProcessError as e:
        print(f"An error occurred while updating {package_name}:")
        print(e.output)
        return False
    old = subprocess.check_output('pip freeze | grep yt-dlp', shell=True).decode().strip('\n')
    subprocess.run('if pip list --outdated | grep -q yt-dlp; then pip install --upgrade yt-dlp; fi', shell=True)
    new = subprocess.check_output('pip freeze | grep yt-dlp', shell=True).decode().strip('\n')
    return old != new
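The dev-branch version of `update_ytdlp` decides "was there an update?" by capturing the `pip freeze` line for yt-dlp before and after the upgrade and comparing them. That comparison step is isolated below with fabricated version strings; the actual subprocess calls are omitted so the logic can be checked without touching pip.

```python
def was_updated(freeze_before: str, freeze_after: str) -> bool:
    # Mirrors the diff: strip the trailing newline, then compare the lines.
    return freeze_before.strip('\n') != freeze_after.strip('\n')

print(was_updated('yt-dlp==2023.3.4\n', 'yt-dlp==2023.7.6\n'))  # → True
print(was_updated('yt-dlp==2023.7.6\n', 'yt-dlp==2023.7.6\n'))  # → False
```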


class ytdl_no_logger(object):

@@ -132,7 +81,3 @@ class ytdl_no_logger(object):

    def error(self, msg):
        return


def get_output_templ(video_id: str = None, title: str = None, uploader: str = None, uploader_id: str = None, include_ext: bool = True):
    return f'[{video_id if video_id else "%(id)s"}] [{title if title else "%(title)s"}] [{uploader if uploader else "%(uploader)s"}] [{uploader_id if uploader_id else "%(uploader_id)s"}]{".%(ext)s" if include_ext else ""}'  # leading dash can cause issues due to bash args so we surround the variables in brackets
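`get_output_templ` appears in full above, so its two modes can be demonstrated directly: with no arguments it emits a yt-dlp output template with `%(...)s` placeholders, and with explicit values it renders a concrete filename (the sample values are made up).

```python
def get_output_templ(video_id=None, title=None, uploader=None, uploader_id=None, include_ext=True):
    # Brackets guard against titles starting with '-', which bash would
    # otherwise parse as a command-line option.
    return f'[{video_id if video_id else "%(id)s"}] [{title if title else "%(title)s"}] [{uploader if uploader else "%(uploader)s"}] [{uploader_id if uploader_id else "%(uploader_id)s"}]{".%(ext)s" if include_ext else ""}'

print(get_output_templ())
# → [%(id)s] [%(title)s] [%(uploader)s] [%(uploader_id)s].%(ext)s
print(get_output_templ(video_id='abc123', title='Demo', uploader='Chan', uploader_id='chan1', include_ext=False))
# → [abc123] [Demo] [Chan] [chan1]
```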