[Bug] Structured Extract with French Language #870

Closed
KuriaMaingi opened this issue Nov 5, 2024 · 4 comments
Labels: blocked, bug, question

Comments

@KuriaMaingi

Describe the Bug
I am trying to use structured extract on a French-language site with the following schema:

from pydantic import BaseModel

class ExtractSchema(BaseModel):
    image: str
    product_title: str
    product_description: str
    price: float
    age: str
    ean_or_productcode: str
    brand: str
    format: str
    number_of_players: str
    length_or_width: str
    height: str
    depth: str
    playing_time: str
    mechanisms: str
    price_currency: str

1st Link Fails:
Link 1
Results: 'extract': 'ogLocaleAlternate:|google:notranslate'

2nd Link Successful:

Link 2
Results: 'extract': "ogTitle:Acheter Nexcube 3x3 Classic - MoYu - Casse-têtes|ogDescription:'Avec Nexcube 3x3 Classic, faites tourner les cases de ce Cube jusqu''à ce que chaque côté du cube ait une couleur uniforme. Un casse-tête ergonomique conçu pour la compétition.'|ogImage:https://cdn1.philibertnet.com/517165-large_default/nexcube-3x3-classic.jpg|ogLocaleAlternate:|ogSiteName:Philibert|og:title:Acheter Nexcube 3x3 Classic - MoYu - Casse-têtes|og:site_name:Philibert|og:description:'Avec Nexcube 3x3 Classic, faites tourner les cases de ce Cube jusqu''à ce que chaque côté du cube ait une couleur uniforme. Un casse-tête ergonomique conçu pour la compétition.'|og:type:product|og:image:https://cdn1.philibertnet.com/517165-large_default/nexcube-3x3-classic.jpg|google-site-verification:eOyJ7NyAZOoDK45PX0O9qnGLhUd3ebBikLzZOD7D-Ic"},
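
Both results above are strings of page metadata (`key:value` pairs joined by `|`) rather than values for the schema fields. Below is a minimal sketch for flagging that case. It assumes a populated extract would arrive as a dict keyed by the schema's field names; that shape is an assumption and is not confirmed in this thread. `missing_schema_fields` is a hypothetical helper, not part of the Firecrawl SDK.

```python
from typing import Any

# Field names copied from ExtractSchema above.
SCHEMA_FIELDS = [
    "image", "product_title", "product_description", "price", "age",
    "ean_or_productcode", "brand", "format", "number_of_players",
    "length_or_width", "height", "depth", "playing_time", "mechanisms",
    "price_currency",
]


def missing_schema_fields(extract: Any, fields: list[str]) -> list[str]:
    """Return the schema fields that are absent or empty in an extract result.

    Assumption: a populated extract is a dict keyed by field name. Anything
    else (for example a metadata string) counts as having no fields at all.
    """
    if not isinstance(extract, dict):
        return list(fields)
    return [name for name in fields if extract.get(name) in (None, "")]


# The result reported for the first link: a metadata string, not a dict.
first_result = "ogLocaleAlternate:|google:notranslate"
print(missing_schema_fields(first_result, SCHEMA_FIELDS))
```

By this check the second link's result would also be reported as missing every field, since it is a metadata string too, just a longer one.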

To Reproduce
Steps to reproduce the issue:
firecrawl_client.scrape_url(
    url,
    params={
        'formats': ['extract'],
        'extract': {'schema': extract_schema},
        'location': {'country': 'FR'},
    },
)
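
To compare runs across countries, the request parameters can be built in one place. This is a rough sketch that uses only the keys shown in this thread (`formats`, `extract`, `location`, and the optional `languages` list from the later comment). `build_scrape_params` and the tiny example schema are hypothetical, and whether `location` influences the language of the extracted values is exactly what this issue leaves open.

```python
from typing import Any, Optional


def build_scrape_params(
    schema: dict[str, Any],
    country: str,
    languages: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Assemble the params dict passed to scrape_url in the reproduction step."""
    location: dict[str, Any] = {"country": country}
    if languages:
        # 'languages' appears in a later comment in this thread, e.g. ['ja-JP'].
        location["languages"] = list(languages)
    return {
        "formats": ["extract"],
        "extract": {"schema": schema},
        "location": location,
    }


# Hypothetical two-field JSON schema, only for illustration.
example_schema = {
    "type": "object",
    "properties": {
        "product_title": {"type": "string"},
        "price": {"type": "number"},
    },
}

params_fr = build_scrape_params(example_schema, "FR")
params_jp = build_scrape_params(example_schema, "JP", ["ja-JP"])
print(params_fr["location"], params_jp["location"])
```

Only the `location` entry differs between the two dicts, which keeps a side-by-side comparison of the results straightforward.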

Expected Behavior
I would expect the LLM to be able to translate between the two languages given the location param.

If the issue is not the language but rather a mismatch between the site and the schema, that would be good to know as well.

Environment:

  • OS: [Windows]
  • Firecrawl Version: [e.g. 1.4.0]
@KuriaMaingi added the bug label Nov 5, 2024
@nickscamara
Member

Hey @KuriaMaingi, we are taking a look. Are you self hosting or using the cloud service?

@nickscamara added the question label Dec 20, 2024
@linear bot added the blocked label Dec 20, 2024
@linear bot closed this as not planned Dec 30, 2024
@KuriaMaingi
Author

Hey @nickscamara, sorry, I didn't see the notification. It was the cloud service.

@nickscamara
Member

Oh okay, thanks @KuriaMaingi! CCing @tomkosm here and re-opening.

@nickscamara reopened this Dec 31, 2024
@yodakaEngineer

I am facing the same problem.

data = app.scrape_url(
    url,
    params={
        'location': {
            'country': 'JP',
            'languages': ['ja-JP'],
        },
        'formats': ['json'],
        'jsonOptions': {
            'schema': ExtractSchema.model_json_schema(),
        },
    },
)

The scrape succeeds, but it returns JSON of the English content.
Neither ja-JP nor JP has any effect.
I am using the cloud service with an API key.

@linear bot closed this as not planned Feb 17, 2025