Merge pull request #105 from Ljzd-PRO/devel
Bump to v0.6.0
Ljzd-PRO authored May 5, 2024
2 parents bf8e4e9 + 69f7343 commit cc55ad6
Showing 12 changed files with 254 additions and 169 deletions.
4 changes: 2 additions & 2 deletions .github/actions/setup-python/action.yml
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@ inputs:
python-version:
description: Python version
required: false
default: "3.10"
default: "3.11"

runs:
using: "composite"
@@ -14,7 +14,7 @@ runs:
run: pipx install poetry
shell: bash

- uses: actions/setup-python@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ inputs.python-version }}
architecture: "x64"
42 changes: 31 additions & 11 deletions CHANGELOG.md
@@ -1,23 +1,43 @@
## Changes

[//]: # (### 💡 Feature)
### 💡 Feature

- Add support for filename allow-list/block-list to filter downloaded files.
- Use Unix shell-style wildcards
- Edit `KTOOLBOX_JOB__ALLOW_LIST`, `KTOOLBOX_JOB__BLOCK_LIST` in `prod.env` or environment variables to set this option
- 📖More information: [Configuration-Reference-JobConfiguration](https://ktoolbox.readthedocs.io/latest/configuration/reference/#ktoolbox.configuration.JobConfiguration)
```dotenv
# Only download files that match these patterns
KTOOLBOX_JOB__ALLOW_LIST=["*.jpg","*.jpeg","*.png"]
# Do not download files that match these patterns
KTOOLBOX_JOB__BLOCK_LIST=["*.psd"]
```
- `creator-indices.ktoolbox` is no longer saved by default (because it's useless now :(
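
The allow-list/block-list behaviour can be sketched roughly as follows. This is a minimal illustration modelled on the `fnmatch` checks visible in this commit's `ktoolbox/action/job.py` changes; the helper name `should_download` is hypothetical and not part of KToolBox.

```python
from fnmatch import fnmatch
from typing import Iterable


def should_download(filename: str, allow_list: Iterable[str], block_list: Iterable[str]) -> bool:
    """Hypothetical helper: decide whether a file passes both lists."""
    allow = list(allow_list)
    # An empty allow-list means "allow everything".
    allowed = not allow or any(fnmatch(filename, pattern) for pattern in allow)
    # Any block-list match rejects the file, even if it was allowed.
    blocked = any(fnmatch(filename, pattern) for pattern in block_list)
    return allowed and not blocked


if __name__ == "__main__":
    for name in ("cover.jpg", "source.psd", "notes.txt"):
        print(name, should_download(name, ["*.jpg", "*.jpeg", "*.png"], ["*.psd"]))
```

Note that `fnmatch` normalizes case according to the operating system, so a pattern such as `*.jpg` may or may not match `A.JPG` depending on the platform; `fnmatch.fnmatchcase` is the case-sensitive variant.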
### 🪲 Fix
- Fix `FileNotFoundError` raised when a filename contains special characters (#94)
- Fix `TypeError` raised when using the `--start-time`, `--end-time` options and posts have no `published` property (#93)
- Fix incorrect argument order when using bucket storage (#89 - @Nacosia)
- Check for duplicate files after the HTTP connection has started (#88)
- Fix a missing `Post.file.name` causing the downloaded file (`Post.file`) to be named `None`
- - -
[//]: # (### 💡 新特性)
### 💡 新特性
- 增加文件名白名单/黑名单支持以进行下载文件的过滤
- 使用 Unix 风格通配符
- 在 `prod.env` 或环境变量中编辑 `KTOOLBOX_JOB__ALLOW_LIST`, `KTOOLBOX_JOB__BLOCK_LIST` 以设置该选项
- 📖更多信息: [Configuration-Reference-JobConfiguration](https://ktoolbox.readthedocs.io/latest/configuration/reference/#ktoolbox.configuration.JobConfiguration)
```dotenv
# 只下载匹配这些模式的文件
KTOOLBOX_JOB__ALLOW_LIST=["*.jpg","*.jpeg","*.png"]
# 不下载匹配这些模式的文件
KTOOLBOX_JOB__BLOCK_LIST=["*.psd"]
```
- 默认不保存 `creator-indices.ktoolbox` (因为它现在已经没什么用了 :(
### 🪲 修复
- 修复当文件名包含特殊字符时会出现 `FileNotFoundError` 错误的问题 (#94)
- 修复当使用 `--start-time`, `--end-time` 参数且作品 `published` 属性不存在的情况下会出现 `TypeError` 错误的问题 (#93)
- 修复当使用桶储存时参数顺序不正确的问题 (#89 - @Nacosia)
- 在建立 HTTP 连接后进行重复文件检查 (#88)
- 修复缺失 `Post.file.name` 可能导致下载文件(`Post.file`)被命名为 `None`
**Full Changelog**: https://github.com/Ljzd-PRO/KToolBox/compare/v0.5.1...v0.5.2
**Full Changelog**: https://github.com/Ljzd-PRO/KToolBox/compare/v0.5.2...v0.6.0
16 changes: 15 additions & 1 deletion docs/en/faq.md
@@ -31,4 +31,18 @@ KTOOLBOX_JOB__POST_STRUCTURE__ATTACHMENTS=./

## Should commands and flags use `-` or `_` as the separator?

Both are supported; `-` is recommended.
Both are supported; `-` is recommended.

## Filename too long

In some cases, the filename or the post directory name can be too long and cause the download to fail.
To solve this issue, you can enable **sequential filenames** or use a **custom post directory name**.

Set the configuration via the `prod.env` dotenv file or system environment variables:
```dotenv
# Rename attachments in numerical order, e.g. `1.png`, `2.png`, ...
KTOOLBOX_JOB__SEQUENTIAL_FILENAME=True
# Set the post directory name to its release/publish date and ID, e.g. `[2024-1-1]11223344`
KTOOLBOX_JOB__POST_DIRNAME_FORMAT=[{published}]{id}
```
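
As a rough illustration of what these two options produce, here is a small sketch. The sequential naming mirrors the `f"{i + 1}{suffix}"` expression in this commit's `job.py` changes; filling the directory template with `str.format` is an assumption about how `post_dirname_format` is applied, and both helper names are hypothetical.

```python
from pathlib import Path
from typing import List


def sequential_names(original_names: List[str]) -> List[str]:
    """Hypothetical helper: rename files to 1.<ext>, 2.<ext>, ... keeping each suffix."""
    return [f"{i + 1}{Path(name).suffix}" for i, name in enumerate(original_names)]


def post_dirname(template: str, **fields: str) -> str:
    """Hypothetical helper: fill a template such as "[{published}]{id}" (assumed str.format style)."""
    return template.format(**fields)


if __name__ == "__main__":
    print(sequential_names(["a very long attachment name.png", "another long name.jpg"]))
    print(post_dirname("[{published}]{id}", published="2024-1-1", id="11223344"))
```

Both options shorten the final path: the first replaces long attachment names with short numeric ones, and the second avoids using a long post title as the directory name.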
3 changes: 0 additions & 3 deletions docs/en/index.md
@@ -102,6 +102,3 @@ ktoolbox sync-creator https://kemono.su/fanbox/user/9016 --offset=10 --length=5
# Download posts from the creator/artist from 2024-1-1 to 2024-3-1
ktoolbox sync-creator https://kemono.su/fanbox/user/9016 --start-time=2024-1-1 --end-time=2024-3-1
```
??? info "About `creator-indices.ktoolbox` file"
By default, you will get a `creator-indices.ktoolbox` file in the creator directory,
it contains the information and filepath of posts inside the directory.
17 changes: 15 additions & 2 deletions docs/zh/faq.md
@@ -17,7 +17,7 @@ uvloop 在 **Windows** 上 **不受支持**。如果你在 Linux 或 macOS 安

你可以设置配置选项 `job.post_structure.attachments` 为 `./`

通过 dotenv文件 `prod.env` 或系统环境变量来设置配置:
通过 dotenv 文件 `prod.env` 或系统环境变量来设置配置:
```dotenv
KTOOLBOX_JOB__POST_STRUCTURE__ATTACHMENTS=./
```
@@ -29,4 +29,17 @@ KTOOLBOX_JOB__POST_STRUCTURE__ATTACHMENTS=./

## 命令和标志(选项)应当使用 `-` 还是 `_` 作为分隔符?

两者都支持,推荐使用 `-`
两者都支持,推荐使用 `-`

## 文件名过长

在一些情况下,文件名或作品目录名过长而导致下载失败。为了解决这个问题,你可以设置 **序列化文件名** 或使用 **自定义作品目录名**

通过 dotenv 文件 `prod.env` 或系统环境变量来设置配置:
```dotenv
# 按照数字顺序重命名附件, 例如 `1.png`, `2.png`, ...
KTOOLBOX_JOB__SEQUENTIAL_FILENAME=True
# 设置作品目录名为其发布日期和ID,例如 `[2024-1-1]11223344`
KTOOLBOX_JOB__POST_DIRNAME_FORMAT=[{published}]{id}
```
2 changes: 0 additions & 2 deletions docs/zh/index.md
@@ -101,5 +101,3 @@ ktoolbox sync-creator https://kemono.su/fanbox/user/9016 --offset=10 --length=5
# 下载作者/画师从 2024-1-1 到 2024-3-1 的作品
ktoolbox sync-creator https://kemono.su/fanbox/user/9016 --start-time=2024-1-1 --end-time=2024-3-1
```
??? info "关于 `creator-indices.ktoolbox` 文件"
默认情况下你会在作者目录下得到一个 `creator-indices.ktoolbox` 文件,它包含目录下的所有作品的信息和路径。
2 changes: 1 addition & 1 deletion ktoolbox/__init__.py
@@ -1,4 +1,4 @@
__title__ = "KToolBox"
# noinspection SpellCheckingInspection
__description__ = "A useful CLI tool for downloading posts in Kemono.party / .su"
__version__ = "0.5.2"
__version__ = "0.6.0"
79 changes: 51 additions & 28 deletions ktoolbox/action/job.py
@@ -1,4 +1,5 @@
from datetime import datetime
from fnmatch import fnmatch
from itertools import count
from pathlib import Path
from typing import List, Union, Optional
@@ -48,35 +49,57 @@ async def create_job_from_post(
attachments_path = post_path
content_path = None

# Create jobs
# Filter and create jobs for ``Post.attachment``
jobs: List[Job] = []
for i, attachment in enumerate(post.attachments or []): # type: int, Attachment
for i, attachment in enumerate(post.attachments): # type: int, Attachment
if not attachment.path:
continue
if is_valid_filename(attachment.name):
alt_filename = f"{i + 1}{Path(attachment.name).suffix}" if config.job.sequential_filename \
else attachment.name
else:
attachment_path_without_params = urlparse(attachment.path).path
alt_filename = f"{i + 1}{Path(attachment_path_without_params).suffix}" if config.job.sequential_filename \
else None
jobs.append(
Job(
path=attachments_path,
alt_filename=alt_filename,
server_path=attachment.path,
type=PostFileTypeEnum.Attachment
)
file_path_obj = Path(attachment.name) if is_valid_filename(attachment.name) else Path(
urlparse(attachment.path).path
)
if post.file and post.file.path: # file
jobs.append(
Job(
path=post_path,
alt_filename=f"{post.id}_{post.file.name}",
server_path=post.file.path,
type=PostFileTypeEnum.File
if (not config.job.allow_list or any(
map(
lambda x: fnmatch(file_path_obj.name, x),
config.job.allow_list
)
)) and not any(
map(
lambda x: fnmatch(file_path_obj.name, x),
config.job.block_list
)
):
alt_filename = f"{i + 1}{file_path_obj.suffix}" if config.job.sequential_filename else file_path_obj.name
jobs.append(
Job(
path=attachments_path,
alt_filename=alt_filename,
server_path=attachment.path,
type=PostFileTypeEnum.Attachment
)
)

# Filter and create jobs for ``Post.file``
if post.file and post.file.path:
post_file_name = post.file.name or Path(post.file.path).name
if (not config.job.allow_list or any(
map(
lambda x: fnmatch(post_file_name, x),
config.job.allow_list
)
)) and not any(
map(
lambda x: fnmatch(post_file_name, x),
config.job.block_list
)
):
jobs.append(
Job(
path=post_path,
alt_filename=f"{post.id}_{post_file_name}",
server_path=post.file.path,
type=PostFileTypeEnum.File
)
)
)

# Write content file
if content_path and post.content:
@@ -99,7 +122,7 @@ async def create_job_from_creator(
all_pages: bool = False,
offset: int = 0,
length: Optional[int] = 50,
save_creator_indices: bool = True,
save_creator_indices: bool = False,
mix_posts: bool = None,
start_time: Optional[datetime],
end_time: Optional[datetime]
@@ -113,9 +136,9 @@ async def create_job_from_creator(
:param all_pages: Fetch all posts, ``offset`` and ``length`` will be ignored if enabled
:param offset: Result offset (or start offset)
:param length: The number of posts to fetch
:param save_creator_indices: Record ``CreatorIndices`` data for update posts from current creator directory
:param save_creator_indices: Record ``CreatorIndices`` data.
:param mix_posts: Save all files from different posts at same path, \
``update_from``, ``save_creator_indices`` will be ignored if enabled
``save_creator_indices`` will be ignored if enabled
:param start_time: Start time of the time range
:param end_time: End time of the time range
"""
@@ -152,7 +175,7 @@ async def create_job_from_creator(

# Filter posts and generate ``CreatorIndices``
if not mix_posts:
if save_creator_indices: # It's unnecessary to create indices again when ``update_from`` was provided
if save_creator_indices:
indices = CreatorIndices(
creator_id=creator_id,
service=service,
7 changes: 3 additions & 4 deletions ktoolbox/cli.py
@@ -217,7 +217,7 @@ async def sync_creator(
creator_id: str = None,
path: Union[Path, str] = Path("."),
*,
save_creator_indices: bool = True,
save_creator_indices: bool = False,
mix_posts: bool = None,
start_time: str = None,
end_time: str = None,
@@ -230,16 +230,15 @@ async def sync_creator(
You can update the directory anytime after download finished, \
such as to update after creator published new posts.
* If ``update_from`` was provided, the file should be located **inside the creator directory**.
* ``start_time`` & ``end_time`` example: ``2023-12-7``, ``2023-12-07``
:param url: The post URL
:param service: The service where the post is located
:param creator_id: The ID of the creator
:param path: Download path, default is current directory
:param save_creator_indices: Record ``CreatorIndices`` data for update posts from current creator directory
:param save_creator_indices: Record ``CreatorIndices`` data
:param mix_posts: Save all files from different posts at same path, \
``update_from``, ``save_creator_indices`` will be ignored if enabled
``save_creator_indices`` will be ignored if enabled
:param start_time: Start time of the published time range for posts downloading. \
Set to ``0`` if ``None`` was given. \
Time format: ``%Y-%m-%d``
8 changes: 6 additions & 2 deletions ktoolbox/configuration.py
@@ -4,7 +4,7 @@
import tempfile
import warnings
from pathlib import Path
from typing import Literal, Union, Optional
from typing import Literal, Union, Optional, Set

from loguru import logger
from pydantic import BaseModel, model_validator, field_validator
@@ -102,7 +102,7 @@ class PostStructureConfiguration(BaseModel):
..
├─ content.txt
├─ <Post file>
├─ <Post data (post.ktoolbox.json)>
├─ <Post data (post.json)>
└─ attachments
├─ 1.png
└─ 2.png
@@ -140,13 +140,17 @@ class JobConfiguration(BaseModel):
:ivar mix_posts: Save all files from different posts at same path in creator directory. \
It would not create any post directory, and ``CreatorIndices`` would not be recorded.
:ivar sequential_filename: Rename attachments in numerical order, e.g. ``1.png``, ``2.png``, ...
:ivar allow_list: Download files which match these patterns (Unix shell-style), e.g. ``["*.png"]``
:ivar block_list: Do not download files which match these patterns (Unix shell-style), e.g. ``["*.psd","*.zip"]``
"""
count: int = 4
post_id_as_path: bool = False
post_dirname_format: str = "{title}"
post_structure: PostStructureConfiguration = PostStructureConfiguration()
mix_posts: bool = False
sequential_filename: bool = False
allow_list: Set[str] = set()
block_list: Set[str] = set()

# job_list_filepath: Optional[Path] = None
# """Filepath for job list data saving, ``None`` for disable job list saving"""
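
The new `allow_list` and `block_list` fields are typed as `Set[str]`, while the dotenv examples in this commit write them as JSON-style arrays. A minimal stdlib sketch of that conversion is shown below; it only illustrates the shape of the data, assumes the value is a JSON array of strings, and uses a hypothetical helper name. KToolBox itself loads settings through its pydantic-based configuration models, not through this function.

```python
import json
from typing import Set


def parse_pattern_set(raw: str) -> Set[str]:
    """Hypothetical helper: turn '["*.jpg","*.png"]' into {"*.jpg", "*.png"}."""
    value = json.loads(raw)
    if not isinstance(value, list) or not all(isinstance(p, str) for p in value):
        raise ValueError("expected a JSON array of strings")
    # A set drops duplicate patterns; pattern order is irrelevant for matching.
    return set(value)


if __name__ == "__main__":
    print(parse_pattern_set('["*.jpg","*.jpeg","*.png"]'))
```

Using a set fits the filter semantics here, since a file is accepted or rejected as soon as any one pattern matches.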