Failed to upload artifacts to Test PyPI or PyPI - Invalid Distribution Metadata: unrecognized or malformed field: 'license-file' #1216

Closed
distortedsignal opened this issue Jan 22, 2025 · 10 comments · May be fixed by #1217
Labels
support Users asking for help using twine

Comments

@distortedsignal

distortedsignal commented Jan 22, 2025

Is there an existing issue for this?

  • I have searched the existing issues (open and closed), and could not find an existing issue

What keywords did you use to search existing issues?

InvalidDistribution
Error
license-file

What operating system(s) are you using?

Linux

If you selected 'Other', describe your Operating System here

No response

What version of Python are you running?

$ python --version
Python 3.13.1

How did you install twine? Did you use your operating system's package manager or pip or something else?

$ python -m pip install twine

What version of twine do you have installed (include the complete output)

twine version 6.1.0 (keyring: 25.6.0, packaging: 24.2, requests: 2.32.3, requests-toolbelt: 1.0.0, urllib3: 2.3.0, id: 1.5.0)

Which package repository are you using?

upload.testpypi.org

Please describe the issue that you are experiencing

When I run

$ python -m twine upload --verbose --repository test-jh-sa dist/*

I get the output

INFO Using configuration from ~/.pypirc
ERROR InvalidDistribution: Invalid distribution Invalid distribution metadata: unrecognized or malformed field 'license-file'

The referenced ~/.pypirc file looks like this:

[distutils]
  index-servers =
    ...
    test-jh-sa

...
[test-jh-sa]
  repository = https://upload.testpypi.org/legacy
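For readers unfamiliar with ~/.pypirc: the --repository test-jh-sa flag names a section in that file, and the upload URL comes from that section's repository key. Below is a minimal sketch of that lookup using Python's standard configparser. It only approximates what twine does (twine also supplies its own defaults for pypi and testpypi), it assumes the alias must be listed under index-servers as it is in the file above, and repository_url and SAMPLE_PYPIRC are hypothetical names for illustration.

```python
import configparser

# Hypothetical trimmed copy of the ~/.pypirc above (the "..." entries removed).
SAMPLE_PYPIRC = """\
[distutils]
index-servers =
    test-jh-sa

[test-jh-sa]
repository = https://upload.testpypi.org/legacy
"""


def repository_url(pypirc_text: str, alias: str) -> str:
    """Return the repository URL configured for a ~/.pypirc alias."""
    parser = configparser.RawConfigParser()
    parser.read_string(pypirc_text)
    # index-servers is a whitespace-separated list spread over continuation lines.
    servers = parser.get("distutils", "index-servers", fallback="").split()
    if alias not in servers:
        raise KeyError(f"{alias!r} is not listed in index-servers")
    return parser.get(alias, "repository")


if __name__ == "__main__":
    print(repository_url(SAMPLE_PYPIRC, "test-jh-sa"))
```

Note that the error in this issue is raised while twine validates the distribution metadata, so the repository URL is not involved in the failure itself.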

Please list the steps required to reproduce this behaviour

  1. Build the package at https://github.com/HewlettPackard/jupyterhub-samlauthenticator/ with python -m build
  2. Attempt to upload to PyPI with twine using python -m twine upload ...

Please include the PKG-INFO file contents from the artifact you're attempting to upload

Metadata-Version: 2.2
Name: jupyterhub-samlauthenticator
Version: 0.0.10
Summary: SAML Authenticator for JupyterHub
Home-page: https://github.com/bluedatainc/jupyterhub-samlauthenticator
Author: Tom Kelley
Author-email: Tom Kelley <[email protected]>
License: MIT
Project-URL: Homepage, https://github.com/Hewlett-Packard/jupyterhub-samlauthenticator
Project-URL: Issues, https://github.com/Hewlett-Packard/jupyterhub-samlauthenticator/issues
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE.txt
Dynamic: author
Dynamic: home-page

<!---
(C) Copyright 2019 Hewlett Packard Enterprise Development LP

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
--->
# SAMLAuthenticator for JupyterHub

[![Build Status](https://travis-ci.com/bluedatainc/jupyterhub-samlauthenticator.svg?branch=master)](https://travis-ci.com/bluedatainc/jupyterhub-samlauthenticator)
[![codecov](https://codecov.io/gh/bluedatainc/jupyterhub-samlauthenticator/branch/master/graph/badge.svg)](https://codecov.io/gh/bluedatainc/jupyterhub-samlauthenticator)
[![PyPI](https://img.shields.io/pypi/v/jupyterhub-samlauthenticator.svg)](https://pypi.python.org/pypi/jupyterhub-samlauthenticator)

This is a SAML Authenticator for JupyterHub. With this code (and a little elbow grease), you can integrate your JupyterHub instance with a previously set up SAML Single Sign-on system!

## Set Up

This set up section assumes that python 3.6+, pip, and JupyterHub are already set up on the target machine.

If the `jupyterhub_config.py` file has not been generated, this would be a good time to generate it. For a primer on generating the config file, read [here](https://jupyterhub.readthedocs.io/en/stable/getting-started/config-basics.html).

Currently, this Authenticator relies on the IdP being set up beforehand. This Authenticator ONLY supports HTTP-POST based authentication, and ONLY receives SAML Responses at the `/login` and `/hub/login` urls. There are currently no plans to support HTTP-Redirect based authentication or SOAP-based services.

### Installation

In the context in which JupyterHub will be run, install the SAML Authenticator.


pip install jupyterhub-samlauthenticator


### Configuration

Open the `jupyterhub_config.py` file in an available text editor.

Change the configured value of the `authenticator_class` to be `samlauthenticator.SAMLAuthenticator`.

Configure one of the accepted metadata sources. The SAMLAuthenticator can get metadata from three sources:
1. The most preferable option is to configure the SAMLAuthenticator to use a metadata file. This can be done by setting the `metadata_filepath` field of the `SAMLAuthenticator` class to the *_fully justified filepath_* of the metadata file.
1. Another option is to dump the full metadata xml into the JupyterHub configuration file. This is not great because it clutters up the configuration file with a lot of extraneous data. This can be done by setting the `metadata_content` field of the SAMLAuthenticator class.
1. Finally, the least preferable option of the three is to get the metadata from a web request each time a user attempts to log into the server. This is _not recommended_ because DNS poisoning attacks could let a malicious actor impersonate the IdP and gain access to any user private files on the server. However, if this is the configuration that is required, set the `metadata_url` field and the metadata will be refreshed every time a user attempts to log in to the JupyterHub server.

This is all the configuration the Authenticator _usually_ requires, but there are more configuration options to go through.

#### Optional Configuration

If the user that should be created and logged in from a given SAML Response is _not_ specified by the NameID element in the SAML Assertion, an alternate field can be specified. Replace the `xpath_username_location` field in the `SAMLAuthenticator` with an XPath that points to the desired field in the SAML Assertion. Note that this value must be able to be compiled to an XPath by Python's `lxml` module. The namespaces that will be present for this XPath are as follows:


{
    'ds'   : 'http://www.w3.org/2000/09/xmldsig#',
    'saml' : 'urn:oasis:names:tc:SAML:2.0:assertion',
    'samlp': 'urn:oasis:names:tc:SAML:2.0:protocol'
}


The SAMLAuthenticator expects the SAML Response to be in the `SAMLResponse` field of the POST request that the user makes to authenticate themselves. If this expectation does not hold for a given environment, then the `login_post_field` property of the SAMLAuthenticator should be set to the correct field.

A SAML Audience and Recipient can be defined on the IdP to prevent a malicious service from using a SAML Response to inappropriately authenticate to non-malicious services. If either of these values is set by the IdP, they can be checked by setting the `audience` and `recipient` fields on the SAMLAuthenticator.

By default, the SAMLAuthenticator expects the `NotOnOrAfter` and `NotBefore` fields to be of the format `{four-digit-year}-{two-digit-month}-{two-digit-day}T{two-digit-24-hour-hour-value}:{two-digit-minute}:{two-digit-second}Z` where T and Z are character literals. If this is not a good assumption, an alternate time string can be provided by setting the `time_format_string` value of the SAMLAuthenticator. This string will be consumed by Python's [`datetime.strptime()`](https://docs.python.org/3.6/library/datetime.html#datetime.datetime.strptime), so it might be helpful to read up on [the `strftime()` and `strptime()` behavior](https://docs.python.org/3.6/library/datetime.html#strftime-strptime-behavior).

If the timezone being passed in by the `NotOnOrAfter` and `NotBefore` fields cannot be read by `strptime()`, don't fear! So long as the timezone that the IdP resides in is known, it's possible to set the IdP's timezone. Set the `idp_timezone` field to a string that uniquely designates a timezone that can be looked up by [`pytz`](https://pypi.org/project/pytz/), and login should be able to continue.

If an IdP MUST be configured to use a SAML entity id other than the protocol, url, and port number of the JupyterHub install, the `entity_id` field of the SAML Authenticator should be set. This should be a unique string that uniquely identifies the Service Provider in the SAML Architecture.

If the JupyterHub instance is sitting behind a proxy or if the `entity_id` provided above is not a url that refers to where the JupyterHub instance is listening, the `acs_endpoint_url` MUST be set. This is where a user should POST data to complete a SAML Login procedure.

The `organization_name`, `organization_display_name`, and `organization_url` are populated directly from the SAML Authenticator into the SAML SP metadata. If ANY of these values are present, there WILL BE an organization subsection in the SP metadata, and the organization subsection will have an element for each value that is populated. The organization will not have an element for any of the values that are not populated.

The following two configurations are _usually_ on logout handlers, but because SAML is a special login method, we put these on the Authenticator.

If the user's servers should be shut down when they log out, set `shutdown_on_logout` to `True`. This stops all servers that the user was running as part of their session. It is somewhat dangerous to set this option to `True` because a user may not be done with computations that they are running on those servers.

The SAMLAuthenticator _usually_ attempts to forward users to the SLO URI set in the SAML Metadata. If this is not the desired behavior for whatever reason, set `slo_forward_on_logout` to `False`. This will change the page the user is forwarded to on logout from the page specified in the xml metadata to the standard jupyterhub logout page.

The SAMLAuthenticator creates system users by default on successful authentication. If you are running JupyterHub as a non-root user, you may need to turn off this functionality by setting `create_system_users` to `False`.

The default nameid format that the SAMLAuthenticator expects is defined by the SAML Spec as `urn:oasis:names:tc:SAML:2.0:nameid-format:transient`. This can be changed by setting the `nameid_format` field on the SAMLAuthenticator in the JupyterHub Config file.

If the server administrator wants to create local users for each JupyterHub user but doesn't want to use the `useradd` utility, a user can be added with any binary on the host system. Set the `create_system_user_binary` field to either a) a full path to the binary or b) the name of a binary on the host's path. Please note, if the binary exits with code 0, the Authenticator will assume that the user add succeeded, and if the binary exits with any code _other than 0_, it will be assumed that creating the user failed.

Access is given to all users who successfully authenticate regardless of their role or group membership by default. Set the `allowed_roles` field to restrict access to JupyterHub to specific roles. Users with any of the specified roles will be authorized to access JupyterHub. The `xpath_role_location` field can be configured to set the location of the users' roles in the SAML response.

#### Example Configurations


# A simple example configuration.
## Class for authenticating users.
c.JupyterHub.authenticator_class = 'samlauthenticator.SAMLAuthenticator'

# Where the SAML IdP's metadata is stored.
c.SAMLAuthenticator.metadata_filepath = '/etc/jupyterhub/metadata.xml'



# A complex example configuration.
## Class for authenticating users.
c.JupyterHub.authenticator_class = 'samlauthenticator.SAMLAuthenticator'

# Where the SAML IdP's metadata is stored.
c.SAMLAuthenticator.metadata_filepath = '/etc/jupyterhub/metadata.xml'

# A field was placed in the SAML Response that contains the user's first name and last name separated by a period.
# Let's use that for the username.
c.SAMLAuthenticator.xpath_username_location = '//saml:Attribute[@Name="DottedName"]/saml:AttributeValue/text()'

# Path to the group/role membership in the SAML response.
c.SAMLAuthenticator.xpath_role_location = '//saml:Attribute[@Name="Roles"]/saml:AttributeValue/text()'

# Comma-separated list of authorized roles. Allows all if not specified.
c.SAMLAuthenticator.allowed_roles = 'group1,group2'

# The IdP is sending the SAML Response in a field named 'R'
c.SAMLAuthenticator.login_post_field = 'R'

# We want to make sure that we're the only one receiving this SAML Response
c.SAMLAuthenticator.audience = 'jupyterhub.myorg.com'
c.SAMLAuthenticator.recipient = 'https://jupyterhub.myorg.com/hub/login'

# The IdP is sending dates in the form 'Tue July 20, 2020 18:30:21'
c.SAMLAuthenticator.time_format_string = '%a %B %d, %Y %H:%M:%S'

# Looks like we can't get the timezone from the previous string - we need to set it
c.SAMLAuthenticator.idp_timezone = 'US/Eastern'

# Shutdown all servers when the user logs out
c.SAMLAuthenticator.shutdown_on_logout = True

# Don't send the user to the SLO address on logout
c.SAMLAuthenticator.slo_forward_on_logout = False

# A corporate entity has specified a new entity id for this JupyterHub instance
c.SAMLAuthenticator.entity_id = '6d112afe-0544-4e8e-8b7e-21e6f57763f9'

# Because the entity id isn't a url, we need to set the acs endpoint url
c.SAMLAuthenticator.acs_endpoint_url = 'https://10.0.31.2:8000/hub/login'

# We need these organization values too.
c.SAMLAuthenticator.organization_name = 'My Org'
c.SAMLAuthenticator.organization_display_name = '''My Org's Display Name'''
c.SAMLAuthenticator.organization_url = 'https://myorg.com'

# Turn off system user creation on authentication
# This feature added by GitHub user @mwilbz
c.SAMLAuthenticator.create_system_users = False

# Change nameid format to something else
# This feature added by GitHub user @killerwhile
c.SAMLAuthenticator.nameid_format = 'urn:oasis:names:tc:SAML:2.0:nameid-format:persistent'

# Change the binary called to create users
# This feature added by GitHub user @killerwhile
# If the new_useradd binary isn't on the path, a full path can be provided
c.SAMLAuthenticator.create_system_user_binary = '/full/path/to/new_useradd'
# If the new_useradd binary is on the path, we can use the first-found instance
c.SAMLAuthenticator.create_system_user_binary = 'new_useradd'


## Developing and Contributing

Get the code and create a virtual environment.


git clone {git@git-source}
cd samlauthenticator
virtualenv --python=python3.6 venv


Start the virtual environment and install dependencies


source venv/bin/activate
pip install -r requirements.txt
pip install -r test_requirements.txt


Make sure that unit tests run on your system and complete successfully.


pytest --cov=samlauthenticator --cov-report term-missing

The output should be something like this:

============================= test session starts ==============================
collected 59 items

tests/test_authenticator.py ............................................ [ 97%]
.                                                                        [100%]

Name                                     Stmts   Miss  Cover   Missing
----------------------------------------------------------------------
samlauthenticator/__init__.py                1      0   100%
samlauthenticator/samlauthenticator.py     241      2    99%   332, 440
----------------------------------------------------------------------
TOTAL                                      242      2    99%
========================== 59 passed in 1.13 seconds ===========================


Make your change, write your unit tests, then send a pull request. The Pull Request text MUST contain the Developer Certificate of Origin, which _should be_ prepopulated in the pull request text. Please note that the developer MUST sign off on the Pull Request and the developer MUST provide their full legal name and email address.

A redacted version of your .pypirc file

[distutils]
  index-servers =
    ...
    test-jh-sa

...
[test-jh-sa]
  repository = https://upload.testpypi.org/legacy

Anything else you'd like to mention?

No response

@distortedsignal distortedsignal added the support Users asking for help using twine label Jan 22, 2025
@dnicolodi
Contributor

This is caused by the combination of the newest twine, an older packaging release, and a build backend that produces invalid metadata. Twine has a workaround for the invalid metadata, but it kicks in only with a newer packaging. I'll fix this. In the meantime, you can upgrade packaging: pip install -U packaging.

dnicolodi added a commit to dnicolodi/twine that referenced this issue Jan 22, 2025
@dnicolodi
Contributor

By the way, how do these two fields end up being declared as dynamic?

> Dynamic: author
> Dynamic: home-page

I don't think this is right.

@dnicolodi
Copy link
Contributor

This package is built with setuptools, thus another, maybe better, work-around (unless you need to upload artifacts of an existing release that are already built) is to configure setuptools not to emit the incorrect License-File metadata field; see pypa/setuptools#4759. Adding the following to pyproject.toml

[tool.setuptools]
license-files = []

should fix the generated metadata and make it compliant with the metadata standard, version 2.2.
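To check whether a rebuilt artifact still carries the problematic field, the PKG-INFO header block can be parsed with Python's standard email parser. This is a minimal sketch that assumes, per the Core Metadata specification, that License-File is only a recognized field from Metadata-Version 2.4 onward; has_invalid_license_file is a hypothetical helper, not part of twine or setuptools.

```python
from email.parser import HeaderParser

# Header block of the PKG-INFO from this issue, trimmed for brevity.
SAMPLE_PKG_INFO = """\
Metadata-Version: 2.2
Name: jupyterhub-samlauthenticator
Version: 0.0.10
License: MIT
License-File: LICENSE.txt
Dynamic: author
Dynamic: home-page

# SAMLAuthenticator for JupyterHub
"""


def has_invalid_license_file(pkg_info: str) -> bool:
    """Return True if License-File appears in metadata older than version 2.4."""
    # HeaderParser reads only the header block; the long description after
    # the first blank line is left unparsed.
    headers = HeaderParser().parsestr(pkg_info)
    parts = headers.get("Metadata-Version", "0").split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    # License-File is only a standard field from Metadata-Version 2.4 onward.
    return (major, minor) < (2, 4) and bool(headers.get_all("License-File"))


if __name__ == "__main__":
    print(has_invalid_license_file(SAMPLE_PKG_INFO))
```

After rebuilding with license-files = [], the PKG-INFO inside the new sdist should make this check return False.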

@ricardogaspar2

@dnicolodi I am also facing this issue. Is the setuptools config the best workaround so far?
Or is it better to roll back the version of twine?
Do you know when there will be a fixed version? (Since this seems to be a breaking change, a minor release doesn't make much sense at the moment.)

@dnicolodi
Contributor

> I am also facing this issue. Is the setuptools config the best workaround so far?

setuptools is generating invalid metadata, so changing its configuration to avoid that is a good idea regardless of this issue with twine.

> Do you know when there will be a fixed version?

No, but I have prepared a fix, so most likely very soon.

> (Since this seems to be a breaking change, a minor release doesn't make much sense at the moment.)

I would tend to agree, but I am not the one to decide.

@dnicolodi
Contributor

@ricardogaspar2 That said, the easiest fix is to upgrade packaging.

@distortedsignal
Author

WOW it looks like there was a lot of action on this in the last couple hours - let me try to collect my thoughts on this into one message to (maybe) make it easier to consume for y'all.

> By the way, how do these two fields end up being declared as dynamic?

> Dynamic: author
> Dynamic: home-page

> I don't think this is right.

I'm, like, super-bonus-new to Python packaging, and I was following this guide and mostly just guessing what should go where. I don't think that guide is very accurate, but this looked like the best place to ask the question. My pyproject.toml looks like this:

pyproject.toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
[project]
name = "jupyterhub-samlauthenticator"
version = "0.0.10"
authors = [
  { name="Tom Kelley", email="[email protected]" },
]
description = "SAML Authenticator for JupyterHub"
readme = "README.md"
requires-python = ">=3.6"
classifiers = [
    "Programming Language :: Python :: 3",
    "Operating System :: OS Independent",
]
license = { text = "MIT" }

[project.urls]
Homepage = "https://github.com/Hewlett-Packard/jupyterhub-samlauthenticator"
Issues = "https://github.com/Hewlett-Packard/jupyterhub-samlauthenticator/issues"

I assume there's something in there that's mucking up the fields.
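One way to see exactly which fields setuptools emitted, including the Dynamic entries asked about above, is to read the METADATA file straight out of the built wheel. Below is a minimal standard-library sketch (zipfile plus email.parser). wheel_metadata and build_demo_wheel are hypothetical names, and the demo archive only contains a METADATA file; it is not a real, installable wheel.

```python
import io
import zipfile
from email.parser import HeaderParser


def wheel_metadata(wheel) -> dict[str, list[str]]:
    """Map each header in a wheel's METADATA file to all of its values."""
    with zipfile.ZipFile(wheel) as archive:
        names = [n for n in archive.namelist() if n.endswith(".dist-info/METADATA")]
        if not names:
            raise ValueError("no .dist-info/METADATA file found in wheel")
        text = archive.read(names[0]).decode("utf-8")
    fields: dict[str, list[str]] = {}
    # Repeated headers such as Dynamic or Classifier keep every value in order.
    for key, value in HeaderParser().parsestr(text).items():
        fields.setdefault(key, []).append(value)
    return fields


def build_demo_wheel(metadata_text: str) -> io.BytesIO:
    """Hypothetical helper: an in-memory zip holding only a METADATA file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("demo-0.0.0.dist-info/METADATA", metadata_text)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        source = sys.argv[1]  # path to a wheel from dist/
    else:
        source = build_demo_wheel("Metadata-Version: 2.2\nName: demo\nDynamic: author\n")
    for key, values in wheel_metadata(source).items():
        print(key, values)
```

Pass the path of a wheel from dist/ as the first argument; with no argument, the script inspects the in-memory demo.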

> This package is built with setuptools, thus another, maybe better, work-around...

Yeah - let me try a couple things here. I'll keep you posted on if something works.

@distortedsignal
Author

Ok, so after adding the tool.setuptools section to the pyproject.toml, this seems to work. Here's a link to the current pyproject.toml file. I'll mark this as closed since I can now upload to PyPI with my setup.

Thanks for the help!

@ricardogaspar2

Hi there @distortedsignal.
With version 6.1.0 I was still hitting the issue despite having the license file set. Is there a problem with pointing to a file?
See part of my config under the [project] section (I have a file named LICENSE):

readme = "README.md"
license = { file="LICENSE" }
requires-python = ">=3.8"
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
]

@dnicolodi
Contributor

I'll summarize the content above because the last comments in the issue seem to have missed the point.

The core issue is that setuptools (and other build backends) generate invalid metadata. twine 6.1.0 carries a mitigation for that; however, a bug in the mitigation means it works only when twine is using packaging 24.2 or later (there is no later version at the time of writing).

The workaround is to install twine 6.1.0 with packaging 24.2. A simple python -m pip install -U packaging should be enough.
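Because this workaround depends on which packaging release twine imports, it helps to check the version in the same environment that runs twine (twine's own version output also lists it, as in the output at the top of this issue). Below is a minimal sketch using the standard importlib.metadata module. MIN_PACKAGING reflects the 24.2 threshold stated in the comment above, and release_tuple is a simplified, hypothetical parser that compares only the leading numeric release segment rather than full PEP 440 ordering (for example, it treats 24.2rc1 as 24.2).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum packaging release for twine 6.1.0's License-File mitigation,
# as stated in the comment above (not checked against twine's source).
MIN_PACKAGING = (24, 2)


def release_tuple(ver: str) -> tuple[int, ...]:
    """Leading numeric release segment, e.g. '24.2.post1' -> (24, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def packaging_is_new_enough() -> bool:
    """True if the installed packaging distribution is at least MIN_PACKAGING."""
    try:
        installed = version("packaging")
    except PackageNotFoundError:
        return False
    return release_tuple(installed) >= MIN_PACKAGING


if __name__ == "__main__":
    print("packaging new enough for twine's workaround:", packaging_is_new_enough())
```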

It is also possible to instruct setuptools not to generate the invalid metadata field; in that case, the fact that twine's mitigation for the invalid field does not work becomes irrelevant. This can be accomplished with this setting in pyproject.toml:

[tool.setuptools]
license-files = []

No other setting affects the generated metadata in any way relevant to this issue.

Both of the solutions above work; I have tested them.
