File-Related Utility Functions

igbpyutils.file.Filename

A type to represent filenames.

alias of str | PathLike

igbpyutils.file.BinaryStream

A type to represent binary file handles.

alias of IO[bytes] | RawIOBase | BufferedIOBase | GzipFile

igbpyutils.file.AnyPaths

A type to represent any path or iterable of paths.

Can be converted to Path objects with to_Paths().

alias of str | PathLike | bytes | Iterable[str | PathLike | bytes]

igbpyutils.file.to_Paths(paths: str | PathLike | bytes | Iterable[str | PathLike | bytes]) -> Generator[Path, None, None]
igbpyutils.file.to_Paths(paths: bytes) -> Generator[Path, None, None]
igbpyutils.file.to_Paths(paths: str) -> Generator[Path, None, None]
igbpyutils.file.to_Paths(paths: PathLike) -> Generator[Path, None, None]

Convert various inputs (AnyPaths) to Path objects.
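The conversion can be pictured with the following minimal sketch. It is not the library's implementation: the name to_paths_sketch is hypothetical, and the sketch assumes that bytes inputs are decoded with the filesystem encoding.

```python
import os
from collections.abc import Iterator
from pathlib import Path


def to_paths_sketch(paths) -> Iterator[Path]:
    """Hypothetical stand-in for to_Paths(); an illustration only."""
    # str and bytes are themselves iterable, so single items must be
    # recognized before falling back to iteration.
    if isinstance(paths, (str, bytes, os.PathLike)):
        # os.fsdecode() accepts str, bytes, and path-like objects
        yield Path(os.fsdecode(paths))
    else:
        for item in paths:
            yield Path(os.fsdecode(item))
```

For instance, list(to_paths_sketch(["x", b"y", Path("z")])) yields three Path objects, one per input item.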

igbpyutils.file.autoglob(files: Iterable[str], *, force: bool = False) -> Generator[str, None, None]

In Windows cmd.exe, automatically apply glob() and expanduser(), otherwise don’t change the input.

For example, take the following script:

>>> import argparse
>>> from igbpyutils.file import autoglob
>>> parser = argparse.ArgumentParser(description='Example')
>>> parser.add_argument('files', metavar="FILE", help="Files", nargs="+")
>>> args = parser.parse_args()
>>> paths = autoglob(args.files)

On a normal *NIX shell, calling this script as python script.py ~/*.py would result in args.files being a list of "/home/username/filename.py" strings if such files exist, or otherwise a single element of "/home/username/*.py". However, in a Windows cmd.exe shell, the aforementioned command always results in args.files being ['~/*.py']. This function fixes that, such that the behavior on Windows is the same as on Linux.

Note

This function now uses a heuristic check of the environment variables COMSPEC and SHELL to detect the current shell. Uncommon values in these variables may cause mis-detection; please feel free to submit patches if the detection does not work on your system.
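The expansion step itself can be sketched as follows. This is an illustration under assumptions, not the library's code: the name expand_args_sketch is hypothetical, the shell detection is omitted, and the expansion is applied unconditionally.

```python
from collections.abc import Iterable, Iterator
from glob import glob
from os.path import expanduser


def expand_args_sketch(files: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: expand ~ and glob patterns like a *NIX shell."""
    for pattern in files:
        expanded = expanduser(pattern)
        matches = glob(expanded)
        if matches:
            yield from matches
        else:
            # Assumption: like a typical *NIX shell, pass an unmatched
            # pattern through (with the tilde already expanded).
            yield expanded
```

A real implementation would first decide whether the shell has already expanded the arguments, which is what the COMSPEC and SHELL heuristic mentioned above is for.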

igbpyutils.file.cmdline_rglob(paths: str | PathLike | bytes | Iterable[str | PathLike | bytes]) -> Generator[Path, None, None]

Given a list of filenames and directories, such as might be given on the command line, return each input item, and also return the result of Path.rglob('*') for each item that is a directory.

If the given list is empty, Path() (the current directory) is used instead. In that case only the directory's contents are included in the output, not the directory itself; to include it, pass the directory explicitly as an input.

pathlib.Path.absolute() is used to remove duplicates from the output to the best of its ability. This is used instead of pathlib.Path.resolve() because that resolves symlinks and therefore would cause unexpected results for programs that need to see symlinks.

See also:

autoglob() can be used on the list of paths before passing it to this function.
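The behavior described for cmdline_rglob() can be approximated by the sketch below. The name rglob_sketch is hypothetical and the code is an illustration of the description, not the library's implementation.

```python
from collections.abc import Iterable, Iterator
from pathlib import Path


def rglob_sketch(paths: Iterable) -> Iterator[Path]:
    """Hypothetical helper: yield each input, plus directory contents."""
    items = [Path(p) for p in paths]

    def walk() -> Iterator[Path]:
        if not items:
            # Empty input: only the contents of the current directory
            yield from Path().rglob("*")
        for item in items:
            yield item
            if item.is_dir():
                yield from item.rglob("*")

    seen = set()
    for path in walk():
        # absolute() does not resolve symlinks, so links stay visible
        key = path.absolute()
        if key not in seen:
            seen.add(key)
            yield path
```

Passing both a directory and one of its subdirectories produces each entry only once, because the absolute form of every path is remembered.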

class igbpyutils.file.Pushd(newdir: str | PathLike)

A context manager that temporarily changes the current working directory.

On Python >=3.11, this is simply an alias for contextlib.chdir().
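On older Python versions, such a context manager could look roughly like this sketch. The name pushd_sketch is hypothetical, and this is not necessarily how the library implements Pushd.

```python
import os
from contextlib import contextmanager


@contextmanager
def pushd_sketch(newdir):
    """Hypothetical helper: chdir into newdir, restore the old cwd on exit."""
    previous = os.getcwd()
    os.chdir(newdir)
    try:
        yield
    finally:
        # Restore the previous directory even if the body raised
        os.chdir(previous)
```

Because the working directory is process-wide state, a context manager like this is not safe to use from several threads at once.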

igbpyutils.file.filetypestr(st: stat_result) -> str

Return a string naming the file type reported by os.stat().
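A function of this kind can be built on the standard stat module, as in the sketch below. The name filetype_sketch and the label strings are assumptions; the strings returned by the real function may differ.

```python
import os
import stat

# Hypothetical labels; the real function's strings may differ.
_CHECKS = (
    (stat.S_ISDIR, "directory"),
    (stat.S_ISREG, "regular file"),
    (stat.S_ISLNK, "symbolic link"),
    (stat.S_ISCHR, "character device"),
    (stat.S_ISBLK, "block device"),
    (stat.S_ISFIFO, "FIFO"),
    (stat.S_ISSOCK, "socket"),
)


def filetype_sketch(st: os.stat_result) -> str:
    """Hypothetical helper: name the file type encoded in st.st_mode."""
    for check, label in _CHECKS:
        if check(st.st_mode):
            return label
    return "unknown"
```

Note that os.stat() follows symbolic links, so a link is only reported as such when the result comes from os.lstat().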

igbpyutils.file.is_windows_filename_bad(fn: str) -> bool

Check whether a Windows filename is invalid.

Tests whether a filename contains invalid characters or has an invalid name, but does not check whether there are name collisions between filenames of differing case.

Reference: https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file
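The kinds of checks involved can be sketched as follows, based on the rules in the Microsoft reference (reserved characters, reserved device names, and trailing spaces or periods). The name windows_name_bad_sketch is hypothetical, and the sketch may not cover every rule the real function applies.

```python
import re

# Reserved device names; they are also invalid with an extension ("NUL.txt")
_RESERVED_NAMES = {
    "CON", "PRN", "AUX", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}
# Reserved punctuation plus the control characters 0 through 31
_BAD_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def windows_name_bad_sketch(fn: str) -> bool:
    """Hypothetical helper: True if fn is not a valid Windows filename."""
    if not fn or _BAD_CHARS.search(fn):
        return True
    if fn.endswith((" ", ".")):
        return True
    # Compare the part before the first period, case-insensitively
    stem = fn.split(".", 1)[0].upper()
    return stem in _RESERVED_NAMES
```

The sketch takes a single filename, not a path, so a name containing a slash or backslash is reported as bad.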

igbpyutils.file.open_out(filename: str | PathLike | None = None, mode='w', *, encoding='UTF-8', errors=None, newline=None)

This context manager either opens the file specified and provides its file object, or, if the filename is not specified or it is the string "-", sys.stdout is provided.

Important

When sys.stdout is returned, the mode, encoding, errors, and newline arguments are ignored. Otherwise, when a file is opened, the default encoding is UTF-8, unless you change it.
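The file-or-stdout pattern can be sketched as below. The name open_out_sketch is hypothetical and the code only mirrors the documented behavior; it is not the library's implementation.

```python
import sys
from contextlib import contextmanager


@contextmanager
def open_out_sketch(filename=None, mode="w", *, encoding="UTF-8",
                    errors=None, newline=None):
    """Hypothetical helper: yield an opened file, or sys.stdout for None/"-"."""
    if filename is None or filename == "-":
        # The other arguments are ignored, and stdout is left open on exit
        yield sys.stdout
    else:
        with open(filename, mode, encoding=encoding, errors=errors,
                  newline=newline) as handle:
            yield handle
```

This lets a command-line tool treat an optional output argument uniformly: the body of the with block writes to the handle without caring where it points.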

igbpyutils.file.replacer(file: str | PathLike, *, binary: bool = False, encoding=None, errors=None, newline=None)

Replace a file by renaming a temporary file over the original.

With this context manager, a temporary file is created in the same directory as the original file. The context manager gives you two file handles: the input file, and the output file, the latter being the temporary file. You can then read from the input file and write to the output file. When the context manager is exited, it will replace the input file with the temporary file. If an error occurs in the context manager, the temporary file is unlinked and the original file left unchanged.

Depending on the OS and file system, the os.replace() used here may be an atomic operation. However, this function doesn’t provide protection against multiple writers and is therefore intended for files with a single writer and multiple readers. Multiple writers will need to be coordinated with external locking mechanisms.
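The write-to-temporary-then-rename technique can be sketched as follows. The name replacer_sketch is hypothetical; the sketch handles text mode only and, unlike what a complete implementation would probably do, does not copy the original file's permissions to the new file.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def replacer_sketch(file, *, encoding="UTF-8"):
    """Hypothetical helper: yield (infile, outfile), then swap on success."""
    directory = os.path.dirname(os.path.abspath(file))
    with open(file, "r", encoding=encoding) as infile:
        # Same directory, so the final rename stays on one file system
        fd, tmpname = tempfile.mkstemp(dir=directory, text=True)
        try:
            with os.fdopen(fd, "w", encoding=encoding) as outfile:
                yield infile, outfile
        except BaseException:
            # Error in the body: discard the temporary file, keep the original
            os.unlink(tmpname)
            raise
    # Both handles are closed here; rename the temporary file over the original
    os.replace(tmpname, file)
```

A typical use reads lines from the first handle and writes transformed lines to the second; readers of the original file see either the old or the new content, depending on when they opened it.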

Attempt to atomically replace (or create) a symbolic link pointing to src named dst.

This function works by trying to choose a temporary filename for the link in the destination directory, and then replacing the target with that temporary link.

Depending on the OS and file system, the os.replace() used here may be an atomic operation. However, the surrounding operations (e.g. checking if dst exists etc.) present a small chance for race conditions, so this function is primarily suited for situations with a single writer and multiple readers. Multiple writers will need to be coordinated with external locking mechanisms.

See also:

replace_link() can do the same, but using a temporary directory instead of a temporary file in the same directory as the target file.
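The temporary-link approach described above for symbolic links can be sketched like this. The name replace_symlink_sketch and the temporary naming scheme are assumptions, and the sketch needs a platform and privileges that allow creating symbolic links.

```python
import os
import uuid


def replace_symlink_sketch(src, dst):
    """Hypothetical helper: point the symlink dst at src via a temp link."""
    dst = os.path.abspath(dst)
    # Assumption: a random name in the destination directory is free
    tmp = os.path.join(os.path.dirname(dst),
                       f".{os.path.basename(dst)}.{uuid.uuid4().hex}.tmp")
    os.symlink(src, tmp)
    try:
        # Renaming acts on the link itself, not on what it points to
        os.replace(tmp, dst)
    except BaseException:
        os.unlink(tmp)
        raise
```

Calling the function twice with different targets leaves a single link pointing at the most recent target, with no leftover temporary link.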

Attempt to atomically create or replace a hard or symbolic link pointing to src named dst.

This function works by creating the link in a new temporary directory first, thus offloading the responsibility for finding a fitting temporary name and cleanup to TemporaryDirectory.

Depending on the OS and file system, the os.replace() used here may be an atomic operation. However, this function doesn’t provide protection against multiple writers and is therefore intended for files with a single writer and multiple readers. Multiple writers will need to be coordinated with external locking mechanisms.

igbpyutils.file.NamedTempFileDeleteLater(*args, **kwargs) -> Generator

A NamedTemporaryFile() that is unlinked on context manager exit, not on close.

On Python >=3.12, this simply calls tempfile.NamedTemporaryFile() with delete=True and the new delete_on_close=False.
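On Python versions before 3.12, similar behavior can be obtained roughly as in this sketch. The name temp_file_delete_later_sketch is hypothetical, and the code is not necessarily what the library does.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_file_delete_later_sketch(*args, **kwargs):
    """Hypothetical helper: temp file removed on exit instead of on close."""
    # delete=False stops NamedTemporaryFile from unlinking on close()
    handle = tempfile.NamedTemporaryFile(*args, delete=False, **kwargs)
    try:
        yield handle
    finally:
        handle.close()  # harmless if the caller already closed it
        os.unlink(handle.name)
```

This allows closing the file inside the with block and reopening it by name, for example to hand the name to another program, before it is finally removed.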

igbpyutils.file.simple_perms(st_mode: int, *, group_write: bool = False) -> tuple[int, int]

This function tests a file’s permission bits to see if they are in a small set of “simple” permissions and suggests new permission bits if they are not.

Deprecated since version 0.5.0: Use https://pypi.org/project/simple-perms/ instead.

The set of “simple” permissions is (0o444, 0o555, 0o644, 0o755) or, when group_write is True, (0o444, 0o555, 0o664, 0o775).

Returns:

A tuple consisting of the file’s current permission and a suggested permission to use instead, based on the user’s permission bits and whether the file is a directory or not. The two values may be equal indicating that no change is suggested. No changes are suggested for symbolic links.
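One way such a suggestion could be derived is sketched below. The decision rule is an assumption inferred from the description (the user's write and execute bits plus the directory flag select one of the simple modes); the name simple_perms_sketch is hypothetical and the real function may decide differently.

```python
import stat


def simple_perms_sketch(st_mode: int, *, group_write: bool = False):
    """Hypothetical helper: return (current, suggested) permission bits."""
    current = stat.S_IMODE(st_mode)
    if stat.S_ISLNK(st_mode):
        # No change is suggested for symbolic links
        return current, current
    writable = bool(current & stat.S_IWUSR)
    # Assumption: directories always get the execute (search) bits
    executable = bool(current & stat.S_IXUSR) or stat.S_ISDIR(st_mode)
    if writable:
        suggested = 0o664 if group_write else 0o644
        if executable:
            suggested |= 0o111
    else:
        suggested = 0o555 if executable else 0o444
    return current, suggested
```

Under this rule, a regular file with mode 0o600 maps to 0o644, and a directory with mode 0o750 maps to 0o775 when group_write is True.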

igbpyutils.file.simple_cache(cache_file: str | PathLike, *, verbose: bool = False) -> Callable[[Callable[[], _T]], Callable[[], _T]]

A very basic caching decorator for functions that take no arguments, intended for caching data that is expensive to generate.

On the first call of the function, its return value is saved to the specified file on disk via pickle, and on subsequent calls that file is loaded instead of calling the wrapped function. The original function can be called via the __wrapped__ attribute on the outer function. Currently, the only way to clear the cache is by deleting the file.

No file locking or other synchronization is performed, so this is likely not safe for threading or multiple processes.

No type checking is performed on the data loaded from the file.

Warning

Please see the security warnings in the pickle documentation!

For much more powerful caching and memoization, look at something like diskcache or similar modules.
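The caching scheme described for simple_cache() can be sketched as follows. The name simple_cache_sketch is hypothetical, the verbose option is omitted, and the code illustrates the idea only; it is not the library's implementation. The pickle security warning applies equally to this sketch.

```python
import pickle
from functools import wraps
from pathlib import Path


def simple_cache_sketch(cache_file):
    """Hypothetical decorator: cache a no-argument function's result on disk."""
    cache_path = Path(cache_file)

    def decorator(func):
        @wraps(func)  # also sets wrapper.__wrapped__ to the original function
        def wrapper():
            if cache_path.exists():
                # Only unpickle files you trust; no type checking is done here
                with cache_path.open("rb") as handle:
                    return pickle.load(handle)
            value = func()
            with cache_path.open("wb") as handle:
                pickle.dump(value, handle)
            return value
        return wrapper
    return decorator
```

After the first call, the decorated function returns the unpickled value without running the original function again; deleting the cache file forces a fresh computation.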