Any reason valid links to pdf files might raise false alarms #105

markcmiller86 · 2024-02-01T00:44:48Z

I am getting two false positives, both of which involve .pdf files: https://github.com/betterscientificsoftware/bssw.io/actions/runs/7734371239/job/21088244489?pr=1633

Any reason to suspect the checker has trouble with .pdf files?

vsoch · 2024-02-01T01:19:02Z

A PDF file is not a machine-readable text file, so you should not ask the checker to parse it (or you should add it to the ignore list).

markcmiller86 · 2024-02-01T02:04:30Z

Hmm...did you follow the link to the failed tests? I am not using it to check links in pdf files. It is failing on links to PDF files which I can browse to fine.

vsoch · 2024-02-01T03:02:11Z

My apologies - I did not! It looks like it has nothing to do with the PDF files; those servers have bad certificates:

HTTPSConnectionPool(host='iitj.ac.in', port=443): Max retries exceeded with url: /uploaded_docs/cc/HPC_training/mcmuserguide.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))

You can reproduce with two lines of python:

import requests
requests.get('https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf')

You can ask the webmasters to update their certs, and if you don't have that control, you'll have to add them to the skiplist.
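
To tell a certificate failure apart from a timeout or some other network error, a minimal standard-library sketch along these lines may help. It uses `urllib` instead of `requests` so it has no third-party dependencies. The helper names `classify_error` and `probe` are hypothetical and are not part of urlchecker.

```python
import socket
import ssl
import urllib.error
import urllib.request


def classify_error(exc):
    """Map an exception raised by urlopen to a coarse failure category."""
    # urllib wraps lower-level failures in URLError; unwrap to the real cause.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "bad-certificate"
    if isinstance(exc, socket.timeout):
        return "timeout"
    return "other"


def probe(url, timeout=5):
    """Return 'ok' or a failure category for one URL (performs a real request)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except Exception as exc:
        return classify_error(exc)
```

Calling `probe(...)` on the iitj.ac.in link would be expected to report a certificate failure for as long as that server's certificate chain cannot be verified, while a slow server would show up as a timeout instead.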

markcmiller86 · 2024-02-01T03:18:04Z

> You can ask the webmasters to update their certs, and if you don't have that control, you'll have to add them to the skiplist.

Any chance you'd be willing to add a feature to ignore bad certs (maybe even make it the default)? It's a common scenario, and asking people to populate skip lists for such a common scenario seems onerous. It's also confusing that my browser can follow links fine that the urlchecker action deems "broken".

vsoch · 2024-02-01T03:22:15Z

Browsers, to varying degrees depending on the browser, do a lot of wonky things to "just make the page load." If you use command-line or core tools that enforce best practices when checking certificates, you tend to see the truth. We actually go to some lengths to try to emulate a web driver, but it's not perfect.

We can definitely consider that feature. You'll still have the timeout issue on the second PDF, however.
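
As a rough illustration of what "ignoring bad certs" means at the TLS layer, here is a standard-library sketch contrasting a verifying context with a non-verifying one. It shows the general technique only and is an assumption, not a description of how urlchecker implements any such option; `make_context` is a hypothetical name.

```python
import ssl


def make_context(check_certs=True):
    """Build an SSL context that either verifies or skips certificate checks."""
    context = ssl.create_default_context()
    if not check_certs:
        # Hostname checking must be disabled before verification can be turned off.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


strict = make_context()                      # verifies the certificate chain and hostname
relaxed = make_context(check_certs=False)    # accepts any certificate
```

A context like `relaxed` can be passed to `urllib.request.urlopen(url, context=relaxed)`. Because it accepts any certificate, it trades the safety of verification for fewer failed checks.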

vsoch · 2024-02-01T03:39:09Z

Okay, I have a branch for you to test and will post it shortly. Note that this is an action for urlchecker-python, so you can run the tool manually on your directory to check.

vsoch · 2024-02-01T03:42:20Z

Here you go! Please test this out locally, and let me know if the new option works.

urlstechie/urlchecker-python#89

urlchecker check --branch master --no-check-certs --no-print --verbose --file-types .md --exclude-patterns http://localhost:4000,https://preview.bssw.io,https://github.com/<your-github-handle> --retry-count 3 --timeout 10 --files .github/workflows/check-urls.yml,.github/workflows/README.md,Articles/Blog/2020-01-usrse.md,Articles/Blog/2020-11-PSIP4HDF5.md,Articles/Blog/2021-09-CollegevilleReportDay1.md,Articles/Blog/2021-12-sc21-swe-cse-bof.md,Articles/Blog/ConnectingSoftwareDevelopers.md,Articles/Blog/Covid19WorkstationCleanliness.md,Articles/Blog/HowToEnablePerformancePortability.md,Articles/Blog/HowToWriteGoodDocumentation.md,Articles/Blog/URSSI.md,CuratedContent/GoodEnoughPracticesInScientificComputing.md,CuratedContent/LanguageReferenceOnLine.md,CuratedContent/TeamOfTeamsUNPUB.md,CuratedContent/kitchen-sink-TEST.md,Site/BSSwFellowshipProgram/People/2020-F-Eisty.md .
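
For readers unfamiliar with options such as `--retry-count` and `--timeout`, the general pattern they suggest can be sketched as a bounded retry loop. This is an illustrative assumption and not urlchecker's actual implementation: `check_with_retries` and the `fetch` callable are hypothetical, and the sketch treats the retry count as the total number of attempts.

```python
import time


def check_with_retries(fetch, url, retry_count=3, timeout=10, delay=0.0):
    """Call fetch(url, timeout) up to retry_count times; return True on first success."""
    for attempt in range(1, retry_count + 1):
        try:
            fetch(url, timeout)
            return True
        except Exception:
            # Give up after the last attempt; otherwise wait briefly and retry.
            if attempt < retry_count:
                time.sleep(delay)
    return False
```

A URL that keeps timing out fails after the final attempt no matter how certificates are handled, so timeouts and certificate errors are best diagnosed separately.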

markcmiller86 · 2024-02-01T03:53:47Z

@vsoch thanks so much!

Lemme give this a try.

vsoch · 2024-02-01T03:56:26Z

Thank you!!

Heads up I'm breaking for dinner, but will be back later.

markcmiller86 · 2024-02-01T04:18:27Z

Am running into ssl version issues...

(myenv) sh-3.2$ urlchecker check ../bssw.io/CuratedContent/LanguageReferenceOnLine.md 
/Users/miller86/ideas-ecp/urlchecker-python/myenv/lib/python3.8/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020

vsoch · 2024-02-01T04:52:02Z

You need to add the --no-check-certs flag as I showed above. I'm not going to take off checking by default, it's a "use at your own risk" feature.

markcmiller86 · 2024-02-01T05:16:54Z

I am using that flag, though the command and error I pasted didn't include it. It's an issue with macOS ssl, Python, and the virtual env.

vsoch · 2024-02-01T05:25:04Z

I don't have a Mac that I use for programming, but I'd follow that GitHub link and see if you can track down the issue. This is unrelated to urlchecker and the PR - it seems like it's an issue with the Python/ssl versions on your system.
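
One quick way to confirm which SSL library a given interpreter is linked against is to inspect `ssl.OPENSSL_VERSION`. A small sketch follows; `is_libressl` is a hypothetical helper, and the check assumes the version string begins with the library name, as it does in the warning above.

```python
import ssl


def is_libressl(version_string):
    """Return True if the version string reports LibreSSL rather than OpenSSL."""
    return version_string.startswith("LibreSSL")


# ssl.OPENSSL_VERSION names the library the interpreter's ssl module was built against.
print(ssl.OPENSSL_VERSION)
print("LibreSSL build:", is_libressl(ssl.OPENSSL_VERSION))
```

Running this inside the virtual environment shows whether that environment's Python is the one triggering the urllib3 warning.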

markcmiller86 · 2024-02-01T05:44:23Z

Ok, well it might help if I was in the correct branch of the clone. I've done that now. And I built a Docker Ubuntu container, but strangely, I am getting cert errors...

# urlchecker check  --branch master --no-check-certs --no-print --verbose --file-types .md /tmp
           original path: /tmp
              final path: /tmp
               subfolder: None
                  branch: master
                 cleanup: False
                  serial: False
              file types: ['.md']
                   files: []
               print all: False
                 verbose: True
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
https://yaml.org/spec/1.2.2/
https://docs.python.org/dev/reference/
https://en.wikipedia.org/wiki/POSIX
HTTPSConnectionPool(host='iitj.ac.in', port=443): Max retries exceeded with url: /uploaded_docs/cc/HPC_training/mcmuserguide.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))
https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf
HTTPSConnectionPool(host='iitj.ac.in', port=443): Max retries exceeded with url: /uploaded_docs/cc/HPC_training/mcmuserguide.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))
https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf
https://docs.python.org/3/reference/
https://docs.microsoft.com/en-us/cpp/standard-library/cpp-standard-library-reference?view=msvc-170
https://www.open-mpi.org/doc/v4.0/
https://wg5-fortran.org/N1151-N1200/N1191.pdf
https://chapel-lang.org/docs/language/spec/index.html

markcmiller86 · 2024-02-01T05:53:36Z

Ok, so I think --branch needs to be set to add-skip-check-certs, right? Well, that still isn't working though...

# urlchecker check --branch add-skip-check-certs --no-check-certs --no-print --verbose --file-types .md /tmp
           original path: /tmp
              final path: /tmp
               subfolder: None
                  branch: add-skip-check-certs
                 cleanup: False
                  serial: False
              file types: ['.md']
                   files: []
               print all: False
                 verbose: True
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
https://zsh.sourceforge.io/Guide/zshguide.html
https://parallel-netcdf.github.io/wiki/Documentation.html
https://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf
https://docs.python.org/3.10/extending/extending.html
https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf
https://julialang.org/blog/2019/07/multithreading/
https://en.wikipedia.org/wiki/C_standard_library#Implementations
https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2020/n4849.pdf
https://access.redhat.com/articles/5594481
https://en.wikipedia.org/wiki/Data_parallelism
https://cplusplus.com/reference/multithreading/
https://docs.oracle.com/cd/E19048-01/chorus4/806-3328/6jcg1bm05/index.html
https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top.html
https://hpx-docs.stellar-group.org/latest/html/index.html
https://kokkos.org/kokkos-core-wiki/
https://gcc.gnu.org/onlinedocs/cpp/
https://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
https://www.lrde.epita.fr/~adl/autotools.html
https://web.archive.org/web/20070205092427/http://www.fortran.com/fortran/F77_std/rjcnf0001.html
https://support.hpe.com/hpesc/public/docDisplay?docId=a00115116en_us&docLocale=en_US&page=The_Cray_Compiling_Environment.html
https://www.mpi-forum.org/docs/mpi-1.3/mpi-report-1.3-2008-05-30.pdf
https://clang.llvm.org
https://www.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/language-reference.html
https://www.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/language-reference.html
https://svnbook.red-bean.com
http://port70.net/~nsz/c/c89/c89-draft.html
https://docs.globus.org/cli/
https://support.nag.com/nagware/np/r71_doc/compiler.pdf
https://legion.stanford.edu/pdfs/legion-manual.pdf
https://gcc.gnu.org/onlinedocs/libc/
https://developer.download.nvidia.com/compute/DevZone/docs/html/OpenCL/doc/OpenCL_Programming_Guide.pdf
https://github.com/markcmiller86
https://docs.oracle.com/cd/E36784_01/html/E36870/ksh-1.html
https://www.openacc.org/sites/default/files/inline-files/openacc-guide.pdf
https://www.gnu.org/software/libc/manual/html_mono/libc.html#I_002fO-Overview
https://www.mpi-forum.org/
https://docs.microsoft.com/en-us/cpp/preprocessor/c-cpp-preprocessor-reference?view=msvc-170
https://www.latex-project.org/help/documentation/
https://libc.llvm.org/
https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf
https://docs.python.org/2/reference/
https://docs.python.org/2.7/extending/extending.html
https://www.mpich.org/static/docs/v1.5.x/
https://spack.readthedocs.io/en/latest/
https://www.open-mpi.org/doc/v3.1/
https://www.extremetech.com/extreme/289423-it-took-half-a-ton-of-hard-drives-to-store-eht-black-hole-image-data
https://www.khronos.org/sycl/resources
https://en.wikipedia.org/wiki/Distributed_memory
https://docs.readthedocs.io/en/stable/tutorial/
https://www.json.org/json-en.html
https://hpc.pnl.gov/globalarrays/documentation.shtml
https://docs.microsoft.com/en-us/cpp/standard-library/cpp-standard-library-reference?view=msvc-170
https://yaml.org/spec/1.2.2/
https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_API.html
https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf?#page=683
https://en.wikipedia.org/wiki/Virtual_private_network
https://docs.python.org/3/reference/
https://github.com/KhronosGroup/OpenCL-Guide
http://www.lahey.com/docs/LangRefEXP73_revG05.pdf
https://cgns.github.io/CGNS_docs_current/user/index.html
https://docs.hdfgroup.org/hdf5/v1_12/index.html
https://docs.hdfgroup.org/hdf5/v1_12/_r_m.html
https://docs.daos.io/v2.2/user/workflow/
https://cplusplus.com/reference/clibrary/
https://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf
https://support.hpe.com/hpesc/public/docDisplay?docId=a00115296en_us&page=About_the_Cray_Fortran_Reference_Manual.html
https://thrust.github.io/doc/modules.html
https://wg5-fortran.org/N1601-N1650/N1601.pdf
https://learn.microsoft.com/en-us/cpp/c-runtime-library/c-run-time-library-reference?view=msvc-170
https://www.mpich.org/static/docs/v3.4.x/
https://www.mpich.org/
https://www.gnu.org/software/make/manual/make.html
https://linux.die.net/man/1/tcsh
https://www.ibm.com/support/pages/system/files/support/swg/swgdocs.nsf/0/7e46ea600b6646d0852579dc00331978/$FILE/langref.pdf
https://j3-fortran.org/doc/year/18/18-007r1.pdf
https://hpc-tutorials.llnl.gov/posix/AppendixA/
https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
https://visit-dav.github.io/visit-website/
https://github.com/RadeonOpenCompute/ROCm/raw/rocm-4.5.2/AMD_HIP_Programming_Guide.pdf
https://llnl-conduit.readthedocs.io/en/latest/blueprint.html
https://clang.llvm.org/cxx_status.html
https://learn.microsoft.com/en-us/cpp/c-runtime-library/run-time-routines-by-category?view=msvc-170
https://www.w3.org/TR/xml/
https://docs.python.org/dev/reference/
https://gcc.gnu.org/onlinedocs/libstdc++/
https://www.intel.com/content/www/us/en/develop/documentation/iocl_rt_ref/top.html
https://www.intel.com/content/www/us/en/develop/documentation/iocl_rt_ref/top.html
https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf
https://www.doxygen.nl/manual/
https://en.wikipedia.org/wiki/POSIX
https://www.pgroup.com/resources/docs/17.10/x86/fortran-ref-guide/index.htm
https://www.ibm.com/docs/en/STXKQY_5.1.5/pdf/scale_cpr.pdf
https://github.com/python/cpython
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html
https://wg5-fortran.org/N1151-N1200/N1191.pdf
https://doc.lustre.org/lustre_manual.xhtml#file_striping.lfs_setstripe
https://www.computerhope.com/unix/scp.htm
https://docs.unidata.ucar.edu/nug/current/
https://en.wikipedia.org/wiki/Reference_implementation
https://man7.org/linux/man-pages/man1/make.1p.html
https://www.ibm.com/docs/en/i/7.3?topic=c-ile-cc-runtime-library-functions
https://www.gnu.org/software/bash/manual/bash.html
https://www.ibm.com/docs/en/ssw_ibm_i_71/rzarg/sc097852.pdf
https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html
https://en.cppreference.com/w/cpp/experimental/parallelism
https://open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
https://docs.microsoft.com/en-us/cpp/cpp/cpp-language-reference?view=msvc-170
https://www.hdfgroup.org/2017/03/mif-parallel-io-with-hdf5/
https://docs.github.com/en/pages/getting-started-with-github-pages/about-github-pages
https://google.github.io/googletest/
https://www.amd.com/content/dam/amd/en/documents/developer/version-4-1-documents/aocc/aocc-4.1-user-guide.pdf
https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2013/n3797.pdf
https://libcxx.llvm.org/
https://raja.readthedocs.io/en/develop/sphinx/user_guide/index.html
https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2001/n1316/
https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2001/n1316/
https://docs.microsoft.com/en-us/cpp/c-language/c-language-reference?view=msvc-170
https://www.markdownguide.org/tools/github-pages/
HTTPSConnectionPool(host='iitj.ac.in', port=443): Max retries exceeded with url: /uploaded_docs/cc/HPC_training/mcmuserguide.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))
https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf
HTTPSConnectionPool(host='iitj.ac.in', port=443): Max retries exceeded with url: /uploaded_docs/cc/HPC_training/mcmuserguide.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))
https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf
https://gcc.gnu.org/onlinedocs/cpp/Pragmas.html
https://j3-fortran.org/doc/year/10/10-007r1.pdf
https://chapel-lang.org/docs/language/spec/index.html
https://www.open-mpi.org/doc/v2.1/
https://numpy.org/doc/stable/reference/index.html#reference
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.pdf
https://man.openbsd.org/ssh
https://docutils.sourceforge.io/rst.html
https://www.cplusplus.com/reference/
https://www.open-mpi.org/doc/v4.0/
https://cmake.org/cmake/help/latest/
https://adios2.readthedocs.io/en/latest/
https://docs.python.org/3.8/library/
https://git-scm.com/docs/user-manual
https://www.open-mpi.org/doc/v4.1/
https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2017/n4659.pdf
https://docs.gitlab.com
https://wg5-fortran.org/N001-N1100/N692.pdf
https://charm.readthedocs.io/en/latest/charm++/manual.html
https://spec.oneapi.io/versions/latest/elements/oneTBB/source/nested-index.html
https://docs.python.org/2.7/library/
https://en.wikipedia.org/wiki/C%2B%2B_Standard_Library#Implementations
https://man7.org/linux/man-pages/man2/syscalls.2.html
https://devdocs.io/gnu_fortran/
https://support.google.com/a/users/answer/9282958?hl=en
https://ftp.mcs.anl.gov/pub/cobalt/archive/cobalt-0.95.2-manual.pdf
https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2011/n3242.pdf
https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf
https://www.mpich.org/static/docs/v4.0.3/
https://hpss-collaboration.org/wp-content/uploads/2023/09/hpss_10.3_users_guide.pdf?#page=9
https://docs.github.com/en
https://cmake.org/cmake/help/latest/manual/ctest.1.html
https://www.openmp.org/wp-content/uploads/OpenMP3.1.pdf
https://en.wikipedia.org/wiki/CPython
https://github.com/fortran-lang/stdlib
https://slurm.schedmd.com
https://www.opengl.org/
https://rocmdocs.amd.com/_/downloads/en/latest/pdf/
https://llnl-conduit.readthedocs.io/en/latest/index.html
https://en.wikipedia.org/wiki/List_of_compilers
https://docs.nvidia.com/cuda/cuda-runtime-api/index.html

🤔 Uh oh... The following urls did not pass:
/tmp/LanguageReferenceOnLine.md:
     ❌️ https://www.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/language-reference.html
     ❌️ https://www.intel.com/content/www/us/en/develop/documentation/iocl_rt_ref/top.html
     ❌️ https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2001/n1316/
     ❌️ https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf

vsoch · 2024-02-01T05:58:28Z

I would leave out --branch and just run with --no-check-certs, and ensure the urlchecker "executable" is installed from the branch you cloned. The branch option only applies when you are cloning something, not when you have files locally. Run urlchecker --help | grep no-check-certs to ensure you are hitting the right one.
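
The same check can be scripted. The sketch below assumes the option is listed in the help text of the `check` subcommand, since a top-level help message may list only the available actions. `help_mentions_flag` and `installed_supports_flag` are hypothetical helpers, not part of urlchecker.

```python
import shutil
import subprocess


def help_mentions_flag(help_text, flag="--no-check-certs"):
    """Return True if the given help text lists the flag."""
    return flag in help_text


def installed_supports_flag(executable="urlchecker", flag="--no-check-certs"):
    """Locate the executable on PATH and search its 'check' help output for the flag."""
    path = shutil.which(executable)
    if path is None:
        return None, False
    # Assumption: the flag is documented under the 'check' subcommand's help.
    result = subprocess.run([path, "check", "--help"], capture_output=True, text=True)
    return path, help_mentions_flag(result.stdout + result.stderr, flag)
```

`installed_supports_flag()` returns the resolved path together with a boolean, so it also reveals which installation on PATH is actually being run.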

markcmiller86 · 2024-02-01T06:46:58Z

I don't have confidence I am running the correct version in my docker container. I am still messing with it.

vsoch · 2024-02-01T07:34:35Z

Let me know if you want some help to write a Dockerfile for it.

markcmiller86 · 2024-02-01T20:57:35Z

Ok, I am quite confident I've got it installed and am using the correct branch/version of urlchecker, but I am not able to get it to work...

# which urlchecker
/usr/local/bin/urlchecker
# urlchecker --help
usage: urlchecker [-h] [--version] {version,check} ...

urlchecker python

options:
  -h, --help       show this help message and exit
  --version        suppress additional output.

actions:
  actions for urlchecker

  {version,check}  urlchecker python actions
    version        show software version
    check          check urls in static files (documentation or code)
# urlchecker --version
0.0.35
# urlchecker check /tmp
           original path: /tmp
              final path: /tmp
               subfolder: None
                  branch: main
                 cleanup: False
                  serial: False
              file types: ['.md', '.py']
                   files: []
               print all: True
                 verbose: False
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
          no check certs: False
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
2024-02-01 20:51:39,908 - urlchecker - ERROR - Error running task
🤔 There were no URLs to check.


🤷. No urls were collected.
# ls /tmp
LanguageReferenceOnLine.md
# cat /tmp/LanguageReferenceOnLine.md | grep http
#### Contributed by [Mark C. Miller](https://github.com/markcmiller86 "Mark C. Miller GitHub Profile")
Instead, they rely solely on a [*reference implementation*](https://en.wikipedia.org/wiki/Reference_implementation).
Python's reference implementation is [CPython](https://en.wikipedia.org/wiki/CPython).
The *implementation* of a programming language is typically embodied in a [compiler](https://en.wikipedia.org/wiki/List_of_compilers) or, for interpretive languages like Python (or Basic), an *interpreter*.
[POSIX](https://en.wikipedia.org/wiki/POSIX) compliance was introduced in the 1990's to address this not only for the C standard library but also for many other aspects of how programs and humans (e.g. command-line *shells*) interact with an operating system.
For example, the GNU compiler collection (GCC) often supports a number of [language *extensions*](https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html) some of which eventually make their way into the formal language standard.
The [VisIt](https://visit-dav.github.io/visit-website/) project decided to permit C++11 constructs (specific to the 2011 C++ standard) into the code base only in 2018, a full 7 years after the language standard had been released.
Nonetheless, one critical differentiator is [shared memory vs. distributed memory](https://en.wikipedia.org/wiki/Distributed_memory) parallelism.
Another critical differentiator is whether parallelism manifests as the same computational task running simultaneously everywhere except on different data (e.g. [Data parallelism](https://en.wikipedia.org/wiki/Data_parallelism)) or something more generalized than this where computational tasks which can be wholly disparate are queued and divvied out to resources as they become available (e.g. Task parallelism).
The canonical example of an API that is managed in this way is the [Message Passing Interface (MPI)](https://www.mpi-forum.org/).
Another example is [OpenGL](https://www.opengl.org/), a graphics programming API (the *L* in OpenGL stands for *Library* but many often treat it as thought it stands for *Language*).
[MPICH](https://www.mpich.org/) serves as a *reference* implementation of MPI.
[3]: #a3 "The most formal resource for Python is the [language reference](https://docs.python.org/dev/reference/) and the *reference* implementation, [CPython](https://github.com/python/cpython)"
<a name="a3"></a><sup>3</sup>The most formal resource for Python is the *reference* implementation, [CPython](https://en.wikipedia.org/wiki/CPython)<br>
<a name="a4"></a><sup>4</sup>CPP is sometimes used to process other kinds of text files including those of other languages. CPP [`#pragma`](https://gcc.gnu.org/onlinedocs/cpp/Pragmas.html) directives are a common way for compiler vendors to extend the language.<br>
<a name="a7"></a><sup>7</sup>*USPSnet* is wordplay for sending physical storage media through the US Mail. Another name is *FootNet*. Sometimes, its the [best way](https://www.extremetech.com/extreme/289423-it-took-half-a-ton-of-hard-drives-to-store-eht-black-hole-image-data) to move a lot of data.
[c89-spec]: http://port70.net/~nsz/c/c89/c89-draft.html
[c99-spec]: https://open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
[c11-spec]: https://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf
[c18-spec]: https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf
[c++03-spec]: https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2001/n1316/
[c++11-spec]: https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2011/n3242.pdf
[c++14-spec]: https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2013/n3797.pdf
[c++17-spec]: https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2017/n4659.pdf
[c++20-spec]: https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2020/n4849.pdf
[f77-spec]: https://web.archive.org/web/20070205092427/http://www.fortran.com/fortran/F77_std/rjcnf0001.html
[f90-spec]: https://wg5-fortran.org/N001-N1100/N692.pdf
[f95-spec]: https://wg5-fortran.org/N1151-N1200/N1191.pdf
[f03-spec]: https://wg5-fortran.org/N1601-N1650/N1601.pdf
[f08-spec]: https://j3-fortran.org/doc/year/10/10-007r1.pdf
[f18-spec]: https://j3-fortran.org/doc/year/18/18-007r1.pdf
[ocl1.2-spec]: https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf
[ocl2.2-spec]: https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_API.html
[ocl3.0-spec]: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html
[py2-spec]: https://docs.python.org/2/reference/
[py3-spec]: https://docs.python.org/3/reference/
[cpp-gnu]: https://gcc.gnu.org/onlinedocs/cpp/
[cpp-ms]: https://docs.microsoft.com/en-us/cpp/preprocessor/c-cpp-preprocessor-reference?view=msvc-170
[c-gnu]: https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.pdf
[c-cray]: https://support.hpe.com/hpesc/public/docDisplay?docId=a00115116en_us&docLocale=en_US&page=The_Cray_Compiling_Environment.html
[c-ibm]: https://www.ibm.com/docs/en/ssw_ibm_i_71/rzarg/sc097852.pdf
[c-ms]: https://docs.microsoft.com/en-us/cpp/c-language/c-language-reference?view=msvc-170
[c-clang]: https://clang.llvm.org
[c-amd]: https://www.amd.com/content/dam/amd/en/documents/developer/version-4-1-documents/aocc/aocc-4.1-user-guide.pdf
[c++-intel]: https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top.html
[c++-cray]: https://support.hpe.com/hpesc/public/docDisplay?docId=a00115116en_us&docLocale=en_US&page=The_Cray_Compiling_Environment.html
[c++-ibm]: https://www.ibm.com/docs/en/ssw_ibm_i_71/rzarg/sc097852.pdf
[c++-ms]: https://docs.microsoft.com/en-us/cpp/cpp/cpp-language-reference?view=msvc-170
[c++-amd]: https://www.amd.com/content/dam/amd/en/documents/developer/version-4-1-documents/aocc/aocc-4.1-user-guide.pdf 
[c++-clang]: https://clang.llvm.org/cxx_status.html
[f-pg]: https://www.pgroup.com/resources/docs/17.10/x86/fortran-ref-guide/index.htm "Portland Group Compilers"
[f-lf]: http://www.lahey.com/docs/LangRefEXP73_revG05.pdf "Lahey/Fujitsu Fortran 95"
[f-intel]: https://www.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/language-reference.html "All Fortran standards 90-18"
[f-cray]: https://support.hpe.com/hpesc/public/docDisplay?docId=a00115296en_us&page=About_the_Cray_Fortran_Reference_Manual.html
[f-ibm]: https://www.ibm.com/support/pages/system/files/support/swg/swgdocs.nsf/0/7e46ea600b6646d0852579dc00331978/$FILE/langref.pdf
[f-nag]: https://support.nag.com/nagware/np/r71_doc/compiler.pdf
[f-gnu]: https://devdocs.io/gnu_fortran/
[opencl-amd]: https://github.com/KhronosGroup/OpenCL-Guide
[opencl-intel]: https://www.intel.com/content/www/us/en/develop/documentation/iocl_rt_ref/top.html
[opencl-nvidia]: https://developer.download.nvidia.com/compute/DevZone/docs/html/OpenCL/doc/OpenCL_Programming_Guide.pdf
[py2]: https://docs.python.org/2/reference/
[py3]: https://docs.python.org/3/reference/
[c-stdlib-0]: https://cplusplus.com/reference/clibrary/
[c++-stdlib-0]: https://www.cplusplus.com/reference/
[c-stdlib-gnu]: https://gcc.gnu.org/onlinedocs/libc/
[c++-stdlib-gnu]: https://gcc.gnu.org/onlinedocs/libstdc++/
[c-stdlib-llvm]: https://libc.llvm.org/
[c++-stdlib-llvm]: https://libcxx.llvm.org/
[c-stdlib-ms]: https://learn.microsoft.com/en-us/cpp/c-runtime-library/c-run-time-library-reference?view=msvc-170
[c++-stdlib-ms]: https://docs.microsoft.com/en-us/cpp/standard-library/cpp-standard-library-reference?view=msvc-170
[c-stdlib-ibm]: https://www.ibm.com/docs/en/i/7.3?topic=c-ile-cc-runtime-library-functions
[c++-stdlib-ibm]: https://www.ibm.com/docs/en/i/7.3?topic=c-ile-cc-runtime-library-functions
[py-stdlib-2]: https://docs.python.org/2.7/library/
[py-stdlib-3]: https://docs.python.org/3.8/library/
[f-stdlib-0.2.1]: https://github.com/fortran-lang/stdlib
[imp-stdlib-c]: https://en.wikipedia.org/wiki/C_standard_library#Implementations
[imp-stdlib-c++]: https://en.wikipedia.org/wiki/C%2B%2B_Standard_Library#Implementations
[smpar-pthreads]: https://hpc-tutorials.llnl.gov/posix/AppendixA/
[smpar-tbb]: https://spec.oneapi.io/versions/latest/elements/oneTBB/source/nested-index.html
[smpar-c++mt]: https://cplusplus.com/reference/multithreading/
[smpar-cuda]: https://docs.nvidia.com/cuda/cuda-runtime-api/index.html
[smpar-hip]: https://github.com/RadeonOpenCompute/ROCm/raw/rocm-4.5.2/AMD_HIP_Programming_Guide.pdf
[smpar-omp-3.1]: https://www.openmp.org/wp-content/uploads/OpenMP3.1.pdf
[smpar-omp-4.5]: https://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
[smpar-omp-5.2]: https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf
[smpar-openacc]: https://www.openacc.org/sites/default/files/inline-files/openacc-guide.pdf
[dmpar-mpi-1.3]: https://www.mpi-forum.org/docs/mpi-1.3/mpi-report-1.3-2008-05-30.pdf
[dmpar-mpi-2.2]: https://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf
[dmpar-mpi-3.1]: https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
[dmpar-mpi-4.0]: https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf
[dmpar-mpich-1.5]: https://www.mpich.org/static/docs/v1.5.x/
[dmpar-mpich-3.4]: https://www.mpich.org/static/docs/v3.4.x/
[dmpar-mpich-4.0.3]: https://www.mpich.org/static/docs/v4.0.3/
[dmpar-ompi-4.1]: https://www.open-mpi.org/doc/v4.1/
[dmpar-ompi-4.0]: https://www.open-mpi.org/doc/v4.0/
[dmpar-ompi-3.1]: https://www.open-mpi.org/doc/v3.1/
[dmpar-ompi-2.1]: https://www.open-mpi.org/doc/v2.1/
[pparc-stl]: https://en.cppreference.com/w/cpp/experimental/parallelism
[pparc-hpx]: https://hpx-docs.stellar-group.org/latest/html/index.html
[pparc-thrust]: https://thrust.github.io/doc/modules.html
[pparc-raja]: https://raja.readthedocs.io/en/develop/sphinx/user_guide/index.html
[pparc-sycl]: https://www.khronos.org/sycl/resources
[pparc-rocm]: https://rocmdocs.amd.com/_/downloads/en/latest/pdf/
[ppard-kokkos]: https://kokkos.org/kokkos-core-wiki/
[ppard-ga]: https://hpc.pnl.gov/globalarrays/documentation.shtml
[ppard-legion]: https://legion.stanford.edu/pdfs/legion-manual.pdf
[ppard-charm++]: https://charm.readthedocs.io/en/latest/charm++/manual.html
[ppard-chapel]: https://chapel-lang.org/docs/language/spec/index.html
[ppard-julia]: https://julialang.org/blog/2019/07/multithreading/
[api-pyc-2]: https://docs.python.org/2.7/extending/extending.html 
[api-pyc-3]: https://docs.python.org/3.10/extending/extending.html
[api-py-numpy]: https://numpy.org/doc/stable/reference/index.html#reference
[api-sys-linux]: https://man7.org/linux/man-pages/man2/syscalls.2.html
[api-sys-posix]: https://docs.oracle.com/cd/E19048-01/chorus4/806-3328/6jcg1bm05/index.html
[api-sys-windows]: https://learn.microsoft.com/en-us/cpp/c-runtime-library/run-time-routines-by-category?view=msvc-170
[api-mifio]: https://www.hdfgroup.org/2017/03/mif-parallel-io-with-hdf5/
[api-posixio]: https://www.gnu.org/software/libc/manual/html_mono/libc.html#I_002fO-Overview
[api-hdf5-1.12]: https://docs.hdfgroup.org/hdf5/v1_12/index.html
[api-lustre]: https://doc.lustre.org/lustre_manual.xhtml#file_striping.lfs_setstripe
[api-gpfs]: https://www.ibm.com/docs/en/STXKQY_5.1.5/pdf/scale_cpr.pdf
[api-daos]: https://docs.daos.io/v2.2/user/workflow/
[api-adios]: https://adios2.readthedocs.io/en/latest/
[api-pnetcdf]: https://parallel-netcdf.github.io/wiki/Documentation.html
[api-mpiio]: https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf?#page=683
[api-sftp]: https://access.redhat.com/articles/5594481
[api-scp]: https://www.computerhope.com/unix/scp.htm
[api-hpss]: https://hpss-collaboration.org/wp-content/uploads/2023/09/hpss_10.3_users_guide.pdf?#page=9
[api-gdrive]: https://support.google.com/a/users/answer/9282958?hl=en
[api-globus]: https://docs.globus.org/cli/
[api-zsh]: https://zsh.sourceforge.io/Guide/zshguide.html
[api-bash]: https://www.gnu.org/software/bash/manual/bash.html
[api-ksh]: https://docs.oracle.com/cd/E36784_01/html/E36870/ksh-1.html
[api-tcsh]: https://linux.die.net/man/1/tcsh
[api-ssh]: https://man.openbsd.org/ssh
[api-vpn]: https://en.wikipedia.org/wiki/Virtual_private_network
[api-make]: https://man7.org/linux/man-pages/man1/make.1p.html
[api-gmake]: https://www.gnu.org/software/make/manual/make.html
[api-cmake]: https://cmake.org/cmake/help/latest/
[api-spack]: https://spack.readthedocs.io/en/latest/
[api-autotools]: https://www.lrde.epita.fr/~adl/autotools.html
[api-ctest]: https://cmake.org/cmake/help/latest/manual/ctest.1.html
[api-gtest]: https://google.github.io/googletest/
[api-yaml]: https://yaml.org/spec/1.2.2/
[api-json]: https://www.json.org/json-en.html
[api-xml]: https://www.w3.org/TR/xml/
[api-conduit]: https://llnl-conduit.readthedocs.io/en/latest/index.html
[api-hdf5]: https://docs.hdfgroup.org/hdf5/v1_12/_r_m.html
[api-netcdf]: https://docs.unidata.ucar.edu/nug/current/
[api-cgns]: https://cgns.github.io/CGNS_docs_current/user/index.html
[api-blueprint]: https://llnl-conduit.readthedocs.io/en/latest/blueprint.html
[api-latex]: https://www.latex-project.org/help/documentation/
[api-gfm]: https://www.markdownguide.org/tools/github-pages/
[api-rest]: https://docutils.sourceforge.io/rst.html
[api-doxygen]: https://www.doxygen.nl/manual/
[api-rtd]: https://docs.readthedocs.io/en/stable/tutorial/
[api-ghpages]: https://docs.github.com/en/pages/getting-started-with-github-pages/about-github-pages
[api-git]: https://git-scm.com/docs/user-manual
[api-svn]: https://svnbook.red-bean.com
[api-gitlab]: https://docs.gitlab.com
[api-github]: https://docs.github.com/en
[api-slurm]: https://slurm.schedmd.com
[api-cobalt]: https://ftp.mcs.anl.gov/pub/cobalt/archive/cobalt-0.95.2-manual.pdf
[api-moab]: https://iitj.ac.in/uploaded_docs/cc/HPC_training/mcmuserguide.pdf
# urlchecker check  --branch master --no-check-certs --no-print --verbose --file-types .md /tmp
           original path: /tmp
              final path: /tmp
               subfolder: None
                  branch: master
                 cleanup: False
                  serial: False
              file types: ['.md']
                   files: []
               print all: False
                 verbose: True
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
          no check certs: True
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
2024-02-01 20:54:46,353 - urlchecker - ERROR - Error running task
🤔 There were no URLs to check.
# urlchecker check --branch master --no-check-certs --no-print --verbose --file-types .md /tmp/LanguageReferenceOnLine.md
           original path: /tmp/LanguageReferenceOnLine.md
              final path: /tmp/LanguageReferenceOnLine.md
               subfolder: None
                  branch: master
                 cleanup: False
                  serial: False
              file types: ['.md']
                   files: []
               print all: False
                 verbose: True
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
          no check certs: True
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
🤔 There were no URLs to check.


🤷. No urls were collected.

vsoch · 2024-02-01T21:22:42Z

A few things:

- no check certs: False should be True (as it is in your last run)
- --branch should not be set for a local check
- You can't target an individual file, only the directory that contains the files (this is a bug, but it's the current reality)
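Until single-file targets work, one workaround is to copy the file into a directory of its own and point the checker at that directory. Below is a minimal sketch, assuming the `urlchecker` CLI is on the PATH; the helper name `stage_single_file` is hypothetical, and only flags already shown in this thread are used.

```python
import shutil
import tempfile
from pathlib import Path


def stage_single_file(path, check_certs=True):
    """Copy one file into a fresh directory and build a urlchecker command for it.

    Hypothetical helper: it only prepares the directory and the argument list,
    it does not run the checker itself.
    """
    src = Path(path)
    workdir = Path(tempfile.mkdtemp(prefix="urlcheck-"))
    shutil.copy(src, workdir / src.name)

    # No --branch here, since this is a local check.
    cmd = ["urlchecker", "check", "--file-types", src.suffix, str(workdir)]
    if not check_certs:
        cmd.insert(2, "--no-check-certs")
    return workdir, cmd
```

The returned list can be passed to `subprocess.run(cmd)`; the staging directory is left in place so the results can be inspected afterwards.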

markcmiller86 · 2024-02-01T21:26:26Z

Right, I gave all scenarios I tried which included with and without --branch. I didn't worry about certs (other than confirming the version I was running is handling that CL arg) because I was never getting any checks to begin with.

All that being said, still not working (could be my container setup). I don't think it's doing the task-launch in the loop over files.

# urlchecker check --no-check-certs --no-print --verbose --file-types .md /tmp
           original path: /tmp
              final path: /tmp
               subfolder: None
                  branch: main
                 cleanup: False
                  serial: False
              file types: ['.md']
                   files: []
               print all: False
                 verbose: True
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
          no check certs: True
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
2024-02-01 21:24:39,081 - urlchecker - ERROR - Error running task
🤔 There were no URLs to check.


🤷. No urls were collected.

vsoch · 2024-02-01T21:32:11Z

Try removing file types? I did test it on a directory in tmp with one markdown file and a link (in markdown too, that's important) and it worked, but I have since added a raw string and that might have broken it. We also have a bug where the regex is not working as it did before - pinging @SuperKogito, he was going to look into that today.
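To sanity-check that a file like the one above contains anything collectable at all, the reference-style definitions (the `[api-moab]: https://...` lines) can be pulled out independently of the checker. This is a rough sketch with a hypothetical pattern; it is not urlchecker's own regex and makes no claim about how the tool collects URLs.

```python
import re

# Hypothetical pattern for Markdown reference definitions such as
#   [api-svn]: https://svnbook.red-bean.com
# Only spaces/tabs are allowed around the colon so a match never spans lines.
REF_DEF = re.compile(
    r"^[ ]{0,3}\[([^\]]+)\]:[ \t]*<?(https?://[^\s>]+)>?",
    re.MULTILINE,
)


def collect_reference_urls(text):
    """Return a {label: url} dict for every reference definition in text."""
    return {label: url for label, url in REF_DEF.findall(text)}
```

If this returns an empty dict for a file, the file itself has nothing to check; if it returns entries while the checker reports no URLs, the problem is more likely in the collection step.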

markcmiller86 · 2024-02-01T21:37:30Z

no change...

# ls /tmp
LanguageReferenceOnLine.md
# urlchecker check --no-check-certs --no-print --verbose /tmp
           original path: /tmp
              final path: /tmp
               subfolder: None
                  branch: main
                 cleanup: False
                  serial: False
              file types: ['.md', '.py']
                   files: []
               print all: False
                 verbose: True
           urls excluded: []
   url patterns excluded: []
  file patterns excluded: []
          no check certs: True
              force pass: False
             retry count: 2
                    save: None
                 timeout: 5
2024-02-01 21:36:24,363 - urlchecker - ERROR - Error running task
🤔 There were no URLs to check.


🤷. No urls were collected.

vsoch · 2024-02-01T21:39:45Z

Let me try removing the raw string I added and I'll let you know, repull install and try again.

vsoch · 2024-02-01T21:40:43Z

okay pushed.

markcmiller86 · 2024-02-01T21:45:40Z

Ok, it's going now. Getting a ton of error messages...

/usr/local/lib/python3.10/dist-packages/urllib3-2.2.0-py3.10.egg/urllib3/connectionpool.py:1103: InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.khronos.org'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings

Any way to silence that? I mean, maybe one at the beginning or end would be good...but it's echoing on every link. Not urgent. Put it on the todo list.
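In the meantime, when the checker is driven from Python rather than from the shell, the warning can be filtered with the standard `warnings` module. A minimal sketch follows, assuming the warning class is `urllib3.exceptions.InsecureRequestWarning` as in the message above; the function name `quiet_insecure_warnings` and the stand-in class used when urllib3 is absent are illustrative only.

```python
import warnings

try:
    from urllib3.exceptions import InsecureRequestWarning
except ImportError:
    # Stand-in so the sketch still runs where urllib3 is not installed.
    class InsecureRequestWarning(Warning):
        pass


def quiet_insecure_warnings(mode="ignore"):
    """Install a filter for urllib3's unverified-HTTPS warning.

    mode="ignore" drops the warning entirely. mode="once" prints each
    distinct message once; the message names the host, so that still
    means one line per host rather than one line overall.
    """
    warnings.filterwarnings(mode, category=InsecureRequestWarning)
```

For the command line, setting the environment variable `PYTHONWARNINGS="ignore:Unverified HTTPS request"` before running the tool should have a similar effect, though I have not verified that against this particular install.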

markcmiller86 · 2024-02-01T21:50:08Z

Ok, that worked...now trying with certs enabled to confirm a difference in behavior.

vsoch · 2024-02-01T21:51:24Z

Yeah no worries about that - this is a non-work, for fun open source project, so I'm good to prioritize based on that! I usually can add comments like this during the day and then actual work during non work hours.

markcmiller86 · 2024-02-02T00:13:02Z

Ok, what I am seeing withOUT --no-check-certs doesn't make sense. Almost all links are failing due to certs. Here are the last bits of output from the run...

.
.
.
https://clang.llvm.org
HTTPSConnectionPool(host='svnbook.red-bean.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://svnbook.red-bean.com
HTTPSConnectionPool(host='svnbook.red-bean.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://svnbook.red-bean.com
https://docs.gitlab.com
https://ftp.mcs.anl.gov/pub/cobalt/archive/cobalt-0.95.2-manual.pdf
HTTPSConnectionPool(host='www.khronos.org', port=443): Max retries exceeded with url: /registry/OpenCL/specs/opencl-1.2.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf
HTTPSConnectionPool(host='www.khronos.org', port=443): Max retries exceeded with url: /registry/OpenCL/specs/opencl-1.2.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf
https://docs.python.org/dev/reference/
HTTPSConnectionPool(host='www.openmp.org', port=443): Max retries exceeded with url: /wp-content/uploads/OpenMP-API-Specification-5-2.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf
HTTPSConnectionPool(host='www.openmp.org', port=443): Max retries exceeded with url: /wp-content/uploads/OpenMP-API-Specification-5-2.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5-2.pdf
HTTPSConnectionPool(host='j3-fortran.org', port=443): Max retries exceeded with url: /doc/year/18/18-007r1.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://j3-fortran.org/doc/year/18/18-007r1.pdf
HTTPSConnectionPool(host='j3-fortran.org', port=443): Max retries exceeded with url: /doc/year/18/18-007r1.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://j3-fortran.org/doc/year/18/18-007r1.pdf
HTTPSConnectionPool(host='web.archive.org', port=443): Max retries exceeded with url: /web/20070205092427/http://www.fortran.com/fortran/F77_std/rjcnf0001.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://web.archive.org/web/20070205092427/http://www.fortran.com/fortran/F77_std/rjcnf0001.html
HTTPSConnectionPool(host='web.archive.org', port=443): Max retries exceeded with url: /web/20070205092427/http://www.fortran.com/fortran/F77_std/rjcnf0001.html (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://web.archive.org/web/20070205092427/http://www.fortran.com/fortran/F77_std/rjcnf0001.html
https://github.com/fortran-lang/stdlib
HTTPSConnectionPool(host='www.gnu.org', port=443): Max retries exceeded with url: /software/gnu-c-manual/gnu-c-manual.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.pdf
HTTPSConnectionPool(host='www.gnu.org', port=443): Max retries exceeded with url: /software/gnu-c-manual/gnu-c-manual.pdf (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1007)')))
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.pdf

🤔 Uh oh... The following urls did not pass:
/tmp/LanguageReferenceOnLine.md:
     ❌️ https://www.openmp.org/wp-content/uploads/OpenMP3.1.pdf
     ❌️ https://adios2.readthedocs.io/en/latest/
     ❌️ https://support.hpe.com/hpesc/public/docDisplay?docId=a00115116en_us&docLocale=en_US&page=The_Cray_Compiling_Environment.html
     ❌️ https://www.gnu.org/software/libc/manual/html_mono/libc.html#I_002fO-Overview
     ❌️ https://www.openmp.org/wp-content/uploads/openmp-4.5.pdf
     ❌️ https://www.json.org/json-en.html
     ❌️ https://www.khronos.org/sycl/resources
     ❌️ https://gcc.gnu.org/onlinedocs/cpp/
     ❌️ https://www.open-mpi.org/doc/v3.1/
     ❌️ https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf
     ❌️ https://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf
     ❌️ https://www.open-mpi.org/doc/v4.0/
     ❌️ https://hpx-docs.stellar-group.org/latest/html/index.html
     ❌️ https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html
     ❌️ https://zsh.sourceforge.io/Guide/zshguide.html
     ❌️ https://slurm.schedmd.com
     ❌️ https://www.computerhope.com/unix/scp.htm
     ❌️ https://www.open-std.org/Jtc1/sc22/WG21/docs/papers/2011/n3242.pdf
     ❌️ https://docs.readthedocs.io/en/stable/tutorial/
     ❌️ https://man7.org/linux/man-pages/man2/syscalls.2.html
     ❌️ https://wg5-fortran.org/N1151-N1200/N1191.pdf
     ❌️ https://libc.llvm.org/
     ❌️ https://www.open-mpi.org/doc/v2.1/
     ❌️ https://legion.stanford.edu/pdfs/legion-manual.pdf
     ❌️ https://en.wikipedia.org/wiki/Reference_implementation
     ❌️ https://en.wikipedia.org/wiki/Virtual_private_network
.
.
.
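With this much output it helps to group the failures by the reason OpenSSL reports. The sketch below parses error lines in the `HTTPSConnectionPool(host=...)` shape shown above; the function name and the regular expressions are my own and assume that exact message format. In the excerpt above every failure carries the same reason ("self-signed certificate in certificate chain") across unrelated hosts, which may point at the local environment rather than at the individual servers.

```python
import re

# Patterns assume the requests/urllib3 error text shown in the log above.
HOST = re.compile(r"host='([^']+)'")
REASON = re.compile(r"certificate verify failed: (.+?) \(_ssl\.c:\d+\)")


def summarize_cert_failures(lines):
    """Group hosts by certificate-verification failure reason.

    Lines without both a host and a reason (plain URLs, for example)
    are skipped.
    """
    hosts_by_reason = {}
    for line in lines:
        host = HOST.search(line)
        reason = REASON.search(line)
        if host and reason:
            hosts_by_reason.setdefault(reason.group(1), set()).add(host.group(1))
    return {reason: sorted(hosts) for reason, hosts in hosts_by_reason.items()}
```

A single reason covering every host suggests looking at the machine or container first; a mix of reasons (expired, hostname mismatch, and so on) suggests genuinely misconfigured servers.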

markcmiller86 · 2024-02-02T00:14:24Z

@vsoch by the way...if you need a proj/task to charge for some time on this, I think I can accommodate. Lemme know.

vsoch · 2024-02-02T00:19:05Z

@markcmiller86 that might be reflecting the setup on your Mac?

I appreciate that, but this project has a FUNDING.yml meaning folks can find it with GitHub sponsors, and is clearly scoped outside of lab work. I have this registered as an outside business agreement and I set a pretty clear line between lab work and these projects, so I don't think that would work.

I'm pretty good at getting stuff done, so I can say I will be able to work on the underlying issues sooner than later, but absolutely not on lab time (I'm taking a quick break and drinking hot chocolate right now). ☕

vsoch · 2024-02-02T00:20:02Z

Also double check you installed ca-certificates in the container, and try using --network=host too. Likely that won't fix it (I am terrible with Macs and know they are terrible with docker) but just a suggestion!
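It can also help to see which CA bundle Python is actually consulting, because a pip-installed `requests` normally verifies against the bundle shipped with `certifi` rather than the system store that the `ca-certificates` package maintains. Here is a small diagnostic sketch; the function name `describe_ca_sources` is made up, and the `certifi` import is optional.

```python
import os
import ssl


def describe_ca_sources():
    """Report where certificate authorities may be loaded from."""
    paths = ssl.get_default_verify_paths()
    report = {
        # Locations OpenSSL (and so the ssl module) uses by default.
        "openssl_cafile": paths.cafile,
        "openssl_capath": paths.capath,
        # Environment overrides honored by requests and OpenSSL respectively.
        "REQUESTS_CA_BUNDLE": os.environ.get("REQUESTS_CA_BUNDLE"),
        "SSL_CERT_FILE": os.environ.get("SSL_CERT_FILE"),
    }
    try:
        import certifi  # bundle that requests falls back to by default
        report["certifi_bundle"] = certifi.where()
    except ImportError:
        report["certifi_bundle"] = None
    return report
```

Printing the result inside the container shows whether the system bundle and the one requests uses are the same file; if they differ, pointing `REQUESTS_CA_BUNDLE` at the system bundle is one thing to try.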

markcmiller86 · 2024-02-02T00:27:50Z

I checked ca-certificates,

# apt-get install ca-certificates
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ca-certificates is already the newest version (20230311ubuntu0.22.04.1).
ca-certificates set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 19 not upgraded.

I installed links browser and went to one of the URLs urlchecker deemed invalid. It works but claims the cert is invalid...

But, I get your point. Maybe the container is misconfigured. I certainly don't have much experience with them and I didn't launch it to use --network=host.

markcmiller86 · 2024-02-02T01:32:21Z

Ok, I gave up on docker. Installed on pascal. Asked ChatGPT for known sites with bad certs...

expired.badssl.com is set up with an expired SSL certificate.
self-signed.badssl.com uses a self-signed certificate.
wrong.host.badssl.com has a certificate that does not match the domain name.

Created this file

https://www.cultureco-op.com
https://expired.badssl.com
https://self-signed.badssl.com
https://wrong.host.badssl.com
https://www.sandia.gov

With --no-check-certs all pass. Without it, the wrong-host and expired cases are flagged. So, I think this is working. Thanks for adding the feature!
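For anyone who wants to reproduce the same comparison without the checker, here is a standard-library sketch of the two modes. It is an analogue of what the flag does, not urlchecker's implementation; `make_context` and `probe` are hypothetical names, and the 5 second timeout simply mirrors the default shown in the output earlier in this thread.

```python
import ssl
import urllib.error
import urllib.request


def make_context(check_certs=True):
    """Build a TLS context with certificate verification on or off."""
    ctx = ssl.create_default_context()
    if not check_certs:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def probe(url, check_certs=True, timeout=5):
    """Return 'ok' or a short description of why the request failed."""
    try:
        ctx = make_context(check_certs)
        with urllib.request.urlopen(url, timeout=timeout, context=ctx):
            return "ok"
    except urllib.error.HTTPError as err:
        return f"http {err.code}"
    except urllib.error.URLError as err:
        return f"failed: {err.reason}"
    except OSError as err:
        return f"failed: {err}"
```

Calling `probe("https://expired.badssl.com")` and then `probe("https://expired.badssl.com", check_certs=False)` should show the difference between the two modes, network permitting.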

markcmiller86 · 2024-02-02T01:33:37Z

Also, not sure what you are doing on the back end as far as testing urlchecker, but I asked ChatGPT about useful URLs to use for testing....

Yes, for testing tools that check the validity and functionality of URLs in text files, it's helpful to use a variety of test websites and addresses that simulate different scenarios. Here are several categories and examples:

- HTTP Status Codes: Websites that return various HTTP status codes can help you test how your tool handles success, redirection, client errors, and server errors.
  - httpstat.us provides specific status codes (e.g., http://httpstat.us/200 for OK, http://httpstat.us/404 for Not Found, http://httpstat.us/500 for Internal Server Error).
- Invalid URLs: To test the handling of invalid URLs, you can construct URLs that are clearly malformed or unlikely to exist.
  - Example: http://thisisnotarealwebsite.invalid, https://123.456.789.012, or ftp://invalid.url.example.
- Timeout and Delay: To check how your tool handles timeouts and slow responses.
  - http://httpbin.org/delay/5 delays the response by 5 seconds, which can be used to simulate a slow server.
- DNS Errors: URLs that simulate DNS resolution errors can test how your tool handles domain names that cannot be resolved.
  - Example: http://domain.notfound.example, assuming example is a valid TLD but the subdomain does not exist.
- Redirects: Testing how your tool handles HTTP redirects is crucial for ensuring it follows or respects redirects correctly.
  - http://httpbin.org/redirect/1 redirects to another page, which can be used to test redirect handling.
- SSL/TLS Issues: As previously mentioned, badssl.com hosts various subdomains with specific SSL/TLS issues, which is useful for testing secure connection errors.
- Large Response Bodies: To test how your tool handles large data transfers.
  - http://httpbin.org/stream/20 streams 20 lines of JSON, which can be useful for testing how your tool handles streaming data or large responses.
- WebSockets: Testing WebSocket connections can be important for tools that need to verify real-time communication protocols.
  - wss://echo.websocket.org provides a WebSocket server that echoes messages sent to it, useful for testing WebSocket connections.

When using these resources, it's important to consider the impact of your testing on third-party services. Ensure that your testing complies with any usage policies or terms of service to avoid causing undue load or other issues.
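Given that caveat about third-party load, the status-code cases at least can be exercised against a throwaway local server instead. The sketch below uses only the standard library; the handler imitates the httpstat.us convention of putting the status in the path, and all names in it are hypothetical.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StatusHandler(BaseHTTPRequestHandler):
    """Reply with the status code named in the path, e.g. /404 -> 404."""

    def do_GET(self):
        try:
            code = int(self.path.strip("/"))
        except ValueError:
            code = 400
        if not 200 <= code <= 599:
            code = 400
        self.send_response(code)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep test output quiet


def start_status_server():
    """Start the server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_status(url, timeout=5):
    """Return the HTTP status for url, bypassing any configured proxy."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

A test file for the checker could then list URLs such as `http://127.0.0.1:<port>/404`, with the port taken from `server.server_address[1]`; call `server.shutdown()` when done.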

vsoch mentioned this issue Feb 1, 2024

add option to --no-check-certs use at own risk urlstechie/urlchecker-python#89

Merged

markcmiller86 mentioned this issue Feb 2, 2024

URL testing in wikize_refs betterscientificsoftware/bssw.io#1990

Open

Garanas added a commit to FAForever/fa that referenced this issue Jul 15, 2024

Remove invalid URL and exclude some URLs

2fa7f76

Apparently the action does not work with URls pointing at content. See also: - urlstechie/urlchecker-action#105
