We should be able to use <example_url> for example URLs that should not be selected #11
Comments
dont_select would be a better attribute name.
http://www.rottentomatoes.com/m/django_unchained_2012/trailers/ gets selected by some other nested metadata, right?
Also: maybe we make example URLs go in a structure instead of adding an attribute?
<selects>
  <example_url url=" .... " />
  <example_url url=" .... " />
</selects>
<never_selects>
  <example_url url="...." />
</never_selects>
No, it is not selected by any metadata, at least for now. We don't have a ... (email reply to Tom White's comment of Mon, Mar 4, 2013, 9:03 PM)
This looks more readable. The only issue is that it will not be compatible ... (email reply to Tom White's comment of Mon, Mar 4, 2013, 9:04 PM)
Do example URLs reasonably belong in the selector? :\
The selector is a good place to put them for authoring. I think we already have at least one program for collecting those. It seems sensible to me as a structure. andruid (email reply to Tom White's comment of Tue, Mar 5, 2013, 9:15 PM)
Was just thinking, the names <selects> and <never_selects> seem misleading if they're in the selector.
<selector url_regex="http://www.rottentomatoes.com/m/[^/]*/" domain="rottentomatoes.com"/>
<example_url url="http://www.rottentomatoes.com/m/inglourious_basterds/" />
Maybe we could prefer this?
<selector url_regex="http://www.rottentomatoes.com/m/[^/]*/" domain="rottentomatoes.com">
  <should_select>
    <example_url url="http://www.rottentomatoes.com/m/inglourious_basterds/" />
  </should_select>
  <should_never_select>
    <example_url url="...." />
  </should_never_select>
</selector>
Since "should" conveys a heuristic, rather than tempting someone into thinking "If I have a bunch of <selects> it'll just select the things I give it."
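To make the proposed nesting concrete, here is a minimal Python sketch of how a tool might read the positive and negative examples out of one selector. It is only an illustration: the function name read_examples is hypothetical, the project's own loader may work quite differently, and the negative URL is borrowed from the use case in this issue rather than from the snippet above, which leaves it elided.

```python
import xml.etree.ElementTree as ET

# Hypothetical selector using the nested structure proposed above.
# The negative example is an assumption taken from this issue's use case.
SELECTOR_XML = """
<selector url_regex="http://www.rottentomatoes.com/m/[^/]*/" domain="rottentomatoes.com">
  <should_select>
    <example_url url="http://www.rottentomatoes.com/m/inglourious_basterds/" />
  </should_select>
  <should_never_select>
    <example_url url="http://www.rottentomatoes.com/m/django_unchained_2012/trailers/" />
  </should_never_select>
</selector>
"""


def read_examples(selector_xml):
    """Return (url_regex, positive_urls, negative_urls) for one <selector>."""
    selector = ET.fromstring(selector_xml)
    # Collect the url attribute of every <example_url> under each wrapper.
    positives = [e.get("url") for e in selector.findall("should_select/example_url")]
    negatives = [e.get("url") for e in selector.findall("should_never_select/example_url")]
    return selector.get("url_regex"), positives, negatives


url_regex, positives, negatives = read_examples(SELECTOR_XML)
```

Keeping both lists under the selector means an authoring tool can find the regex and its examples in a single element, which is the convenience argued for earlier in the thread.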
This needs some discussion before proceeding.
A future step will be to automatically correct extraction rules using cached source pages and metadata.
A new tag will be used for bad example URLs.
This can be done by something like:
Also, the unit test that verifies that every example_url is selected by its corresponding selector should be updated to account for this new attribute.
One use case is: the <imdb_movie> selector should not select this URL (it used to): http://www.rottentomatoes.com/m/django_unchained_2012/trailers/
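The updated unit test described above could look roughly like the following Python sketch. It is not the project's actual test: failing_examples is a hypothetical helper, and it assumes that a selector's url_regex must match the whole URL (a full match), which may not be how the real selector logic works.

```python
import re


def failing_examples(url_regex, should_select, should_never_select):
    """Return (reason, url) pairs for every example that contradicts the regex.

    Assumption: a selector selects a URL only when url_regex matches the
    entire URL. The real matching rule may differ.
    """
    pattern = re.compile(url_regex)
    failures = []
    for url in should_select:
        if pattern.fullmatch(url) is None:
            failures.append(("not selected", url))
    for url in should_never_select:
        if pattern.fullmatch(url) is not None:
            failures.append(("wrongly selected", url))
    return failures


REGEX = "http://www.rottentomatoes.com/m/[^/]*/"
MOVIE = "http://www.rottentomatoes.com/m/inglourious_basterds/"
TRAILERS = "http://www.rottentomatoes.com/m/django_unchained_2012/trailers/"

# An empty list means every positive and negative example behaves as declared.
failures = failing_examples(REGEX, [MOVIE], [TRAILERS])
```

Under the full-match assumption the trailers URL is rejected, because [^/]* cannot cross the extra path segment. A prefix-only match such as re.match would accept it, so a negative example is a cheap way to catch that kind of difference.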