-
-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: baseline violations #137
Comments
Thank you for the proposal. I've missed a baseline feature for a very long time.
Exactly this. PHPStan and Psalm come with a baseline and it makes working with them on legacy code so much easier. We can look at how they've built a baseline if we need inspiration. From the options above, I think I'd pick "violation count" because it feels like a reasonable compromise. I believe we should use the same configuration format for the baseline as for the main configuration, so XML. |
Thanks @frankdekker for taking the time to write this up. I'm missing two things though in this write-up. 1. Discussion of other approachesThis is basically about making it easier to introduce PHPCS to older codebases. A baseline is one of the most detailed options - which also makes it the most complex and creates the most work for implementing and maintaining the feature -, but there have been various proposals in the past suggesting other possibilities. Here's a list of some tickets I found via a quick search in the Squizlabs repo. Not claiming completeness. A more thorough search might a good idea.
I also know there are various external projects which have already created either threshold based or baseline based solutions for their own purposes, some re-usable by other projects, some custom to one project. It may be worth looking at those for inspiration and talking through the pros and cons of each approach ?
On that note: is it a must for a baseline/threshold feature to be part of PHPCS itself or is there a conceivable way to build this as an external dependency which could be maintained separately ? (with possibly some tweaks in PHPCS to accommodate) Without an opinion on the above, know that I'd be happy to create a new repo in this organisation and give someone maintainer rights to it, if the discussion would lean that way. Could also possibly be a way to work on a proof of concept, work out the kinks and stabilize it before adding it into PHPCS itself. 2. Impact on PHPCBFWhen running If we take the example of a baseline with code signature - it is likely that by the second loop the signature will no longer match, so the violation will no longer be ignored. If we take the example of baseline with a count per error code - if the count is 2 and the number of violations found is 5, how is PHPCBF to know which one to fix ? ...etc... It's easy to glance over this when comparing with PHPStan or Psalm, but as far as I know, those don't run fixers. |
Hi, thanks for the reply. I've looked some through older tickets and different baseline strategies. I took the strategies I would find useful. 1. Discussion of other approachessquizlabs/PHP_CodeSniffer#2094 I skimmed through the remaining approaches and created the 3 strategies I listed in the first post. Extensions
Most/some of these projects were created by the lack of the baseline feature in PHPCS itself. In my personal opinion I would expect a project like phpcs would contain its own baseline mechanism. PHPStan, PHPMD provide this mechanism out of the box, and creating the baseline in these tools is usually only adding a I can understand that you don't want the maintainer responsibility for this. Adding support for extension would be nice second option then. Although that approach is out of scope for this issue. If that option is desired, this issue can be closed. 2. Impact on PHPCBF
By default if a violation is in the baseline, phpcbf shouldn't pick it up. That's currently the behaviour with https://github.com/123inkt/php-codesniffer-baseline extension I wrote. The code signature mechanism, takes about 3 lines above and below, and generates a signature. So only changes within these lines will invalidate the baseline. For phpcbf to make changes near that line, somebody already made modifications near these lines because it's not in the baseline and phpcbf detected it. By default phpcbf will not change the baselined code and therefore will not accidentally invalidate violations. Only when you edit near these violations, but you're already doing that as developer then.
This one is more interesting indeed. I would expect it to ignore violation 1 and 2, and only fix violation 3-5. But obviously, violation 1 and 2 might be the new violations. This would happen less often with the code signature approach. There's no clean solution for this, but I look at it differently. Does the baseline have to be perfect? The goal here is that devs do not add more violations to an existing codebase, and that violations count is only going down. The above problems only arises when a dev makes modifications to that specific file. Shouldn't Secondly, when you hit a PHPCS violation issue as dev, those are usually very low impact fixes. Most of the time it's removing or adding some whitespaces without running the risk of breaking something in the codebase. The problem is that you can't blindly update a codebase with 20000 violations. Hence in my opinion, a near perfect baseline feature would also suffice. |
Speaking for myself, it would be acceptable if PHPCBF would simply ignore the baseline completely. For me, the baseline is mostly interesting for non-fixable sniffs. If I run PHPCBF, I want stuff to be fixed. If the fixer is safe, I'd rather run it on the whole codebase once than baseline all violations. And if the fixer is unsafe, I'd disable the sniff for PHPCBF until all violations have been addressed. |
For formats, I suggest you also include php. For large baseline files it can be a big performance improvement not to have to parse a config format. |
We need this feature. Is there any way to use/test it? |
@jrfnl any chance we can get baseline support added? I'll put up $100 for it :) |
@LukeTowers If someone can figure out a way to make a baseline work with a system containing auto-fixing ability, I'm willing to consider it. |
@jrfnl @LukeTowers Is there more discussion needed for the other points? For example regarding the baseline strategy. Violation count vs violation message vs code signature? |
@frankdekker That alone is IMO a way too crude solution and will still lead to tons of support requests and complaints about things not working as expected, so yes, that would still need a lot more discussion and more than anything, please think through the different scenarios. |
@jrfnl what are you looking for specifically? Can @frankdekker submit an initial draft PR and then we can continue the discussion about any concerns you might have there so that we can have actionable steps to take in order to get this feature added? |
@LukeTowers Been there, done that. Not helpful. |
@jrfnl can you list out what specifically you need answers for before you'll accept a PR? Right now I'm not really sure how the discussion on this progresses as all you've said is that it "needs a lot more discussion" and to "think through the different scenarios". If you could list off your specific concerns and the scenarios you'd like considered then that would be very helpful to help guide the discussion so that we can come up with a solution. |
I'm looking for a well thought out architectural design for this. Not a "works for us"-implementation which ignores the full capabilities of PHPCS. |
In a large existing codebase, it usually not possible to fix all violations at once. Or when new violations appear when new rules are added. Think for example the Slevomat Coding Standard. This feature will allow to create a baseline of the current violations, and only report new violations. This is a proposal to add a baseline feature:
Commandline
Create a baseline, and save it to the
--baseline-file <path>
or default tophpcs-baseline.xml
Update the baseline, but only remove, not add new violations
Using the baseline. If
--baseline-file
argument is absent will look forphpcs-baseline.xml
in the root of the project.To ignore the current baseline:
Options for baseline file format
Which baseline format would we prefer?
xml
: No additional dependency, due to theext-simplexml
dependency.json
: can be read with if theext-json
dependency is added.neon
: requires additional dependency, for examplenette/neon
.Proposal for baseline file name
phpcs-baseline._format
. Similar to PHPStan's phpstan-baseline.neon./vendor/
directory lives.Proposal for strict mode
result in a violation. Commandline option:
--baseline-strict
( ? )Violation tracking strategies
To be able to track violations, we need to be able to identify them. We minimally need:
file path
: This is the file path the violation occurred in. The file path will be normalized to be relative to the project root and only contain forward slashes.Further the following strategies can be chosen:
Code signature:
the name of the sniff
: For examplePSR12.Files.FileHeader.SpacingAfterBlock
.hash of surrounding code
: The code surrounding the violation is hashed. This is a very strict strategy, butalso very sensitive to changes in the code. Downside there's a performance penalty for hashing the code.
Example:
Violation count:
the name of the sniff
: For examplePSR12.Files.FileHeader.SpacingAfterBlock
.count the number of violations
: Keep track of thatPSR12.Files.FileHeader.SpacingAfterBlock
occurs at most 2 timesin the file. This is a very loose strategy, but also very insensitive to changes in the code. Downside is that
it's not very strict, and it's not possible to track the specific violation messages. It will however prevent
additional violations from being added.
Example:
Violation message:
the name of the sniff
: For examplePSR12.Files.FileHeader.SpacingAfterBlock
.the violation message
: Keep track of the specific violation message. Token index andline number should be excluded here as adding code above the violation with change these values.
The difference with the strategy above is that different messages within the same Sniff can be
baselined. This will improve accuracy.
count the number of violations
: In this strategy also keep count of the violations.Example:
Final words
In short:
Choose file format:
xml
,json
, orneon
Choose track strategy:
code signature
,violation count
ormessage + violation count
I intend to create a pull request to implement this feature. See Add phpcs baseline support #115
The text was updated successfully, but these errors were encountered: