/start #19

Open
tolcipularang02 opened this issue Jan 5, 2025 · 14 comments

Comments

@tolcipularang02

tolcipularang02 commented Jan 5, 2025

#NIK

@tolcipularang02
Author

#NIK

@tolcipularang02
Author

#NIK

@An0n-xen

An0n-xen commented Jan 7, 2025

Hello, I'm trying to run the code, but I keep running into this error:
TypeError: Cannot read properties of undefined (reading 'stories')
at C:\Users\Hawis\Documents\Projects\Personal\trendFinder\src\services\scrapeSources.ts:128:33
at Generator.next ()
at fulfilled (C:\Users\Hawis\Documents\Projects\Personal\trendFinder\src\services\scrapeSources.ts:5:58)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
[nodemon] clean exit - waiting for changes before restart

@An0n-xen

An0n-xen commented Jan 7, 2025

Is this due to the Twitter API key? I'm using a free X API key.

@An0n-xen

An0n-xen commented Jan 7, 2025

Never mind, I've fixed it.

@hsmnzaydn

> Never mind I've fixed it

How did you resolve it?

@An0n-xen

An0n-xen commented Jan 7, 2025

OK, todayStories in scrapeSources.ts was coming back empty (undefined), which caused that error. Initially I thought I had set the wrong Firecrawl API key, but it was the right one.

So I just added an if block to check whether I received any data.

I will send a code sample.

@An0n-xen

An0n-xen commented Jan 7, 2025

if (todayStories && todayStories.stories) {
  console.log(
    `Found ${todayStories.stories.length} stories from ${source}`
  );
  combinedText.stories.push(...todayStories.stories);
} else {
  console.log(`No valid stories data found from ${source}`);
}
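
The same guard can be pulled into a small helper so that every source goes through one check. This is only a minimal sketch: the `Story` and `StoriesResult` interfaces and the `mergeStories` name are hypothetical, introduced here for illustration, and the real Firecrawl response type may look different.

```typescript
// Hypothetical shapes for illustration; not the actual Firecrawl types.
interface Story {
  headline: string;
  link: string;
  date_posted: string;
}

interface StoriesResult {
  stories?: Story[];
}

// Appends the stories in `result` to `target` and returns how many were added.
// Returns 0 when `result` is null/undefined or has no `stories` array,
// instead of throwing "Cannot read properties of undefined".
function mergeStories(
  target: Story[],
  result: StoriesResult | null | undefined
): number {
  const stories = result?.stories;
  if (!Array.isArray(stories)) {
    return 0;
  }
  target.push(...stories);
  return stories.length;
}

const combined: Story[] = [];
const addedFromEmpty = mergeStories(combined, undefined);
const addedFromValid = mergeStories(combined, {
  stories: [
    {
      headline: "Example",
      link: "https://example.com/a",
      date_posted: "2025-01-07",
    },
  ],
});
console.log(addedFromEmpty, addedFromValid, combined.length); // 0 1 1
```

The optional chaining on `result?.stories` covers both a missing result and a missing `stories` field, which are the two cases the if block above protects against.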

@hsmnzaydn

@An0n-xen thanks so much you saved my day <3

@An0n-xen

An0n-xen commented Jan 7, 2025

But Firecrawl shows that it made those requests, yet I still receive an empty response.
[Image: crawl]

@hsmnzaydn

> But Firecrawl shows that it made those requests, yet I still receive an empty response.

I have the same problem :( If I fix it, I'll share my solution.

@An0n-xen

An0n-xen commented Jan 7, 2025

After adding the if check to fix the undefined issue, this is what I get:
[Image: check]

@An0n-xen

An0n-xen commented Jan 7, 2025

> > But Firecrawl shows that it made those requests, yet I still receive an empty response.
>
> I have the same problem :( If I fix it, I'll share my solution.

Sure, I would really appreciate that.

@i-am-henri

I'm facing the same problem, @An0n-xen and @hsmnzaydn. There is a problem with the extract method, but you can use scrapeUrl with the "extract" format as an alternative. Here is the new solution:

import FirecrawlApp from '@mendable/firecrawl-js';
import dotenv from 'dotenv';
// Removed Together import
import { z } from 'zod';
// Removed zodToJsonSchema import since we no longer enforce JSON output via Together

dotenv.config();

// Initialize Firecrawl
const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

// 1. Define the schema for our expected JSON
const StorySchema = z.object({
  headline: z.string().describe("Story or post headline"),
  link: z.string().describe("A link to the post or story"),
  date_posted: z.string().describe("The date the story or post was published"),
});

const StoriesSchema = z.object({
  stories: z.array(StorySchema).describe(
    "A list of today's AI or LLM-related stories"
  ),
});

export async function scrapeSources(sources: string[]) {
  const num_sources = sources.length;
  console.log(`Scraping ${num_sources} sources...`);

  let combinedText: { stories: any[] } = { stories: [] };

  // Configure these if you want to toggle behavior
  const useTwitter = false;
  const useScrape = true;

  for (const source of sources) {
    // --- 1) Handle x.com (Twitter) sources ---
    if (source.includes("x.com")) {
      if (useTwitter) {
        const usernameMatch = source.match(/x\.com\/([^\/]+)/);
        if (usernameMatch) {
          const username = usernameMatch[1];

          // Build the search query for tweets
          const query = `from:${username} has:media -is:retweet -is:reply`;
          const encodedQuery = encodeURIComponent(query);

          // Get tweets from the last 24 hours
          const startTime = new Date(
            Date.now() - 24 * 60 * 60 * 1000
          ).toISOString();
          const encodedStartTime = encodeURIComponent(startTime);

          // x.com API URL
          const apiUrl = `https://api.x.com/2/tweets/search/recent?query=${encodedQuery}&max_results=10&start_time=${encodedStartTime}`;

          // Fetch recent tweets from the Twitter API
          const response = await fetch(apiUrl, {
            headers: {
              Authorization: `Bearer ${process.env.X_API_BEARER_TOKEN}`,
            },
          });

          if (!response.ok) {
            throw new Error(`Failed to fetch tweets for ${username}: ${response.statusText}`);
          }

          const tweets = await response.json();

          if (tweets.meta?.result_count === 0) {
            console.log(`No tweets found for username ${username}.`);
          } else if (Array.isArray(tweets.data)) {
            console.log(`Tweets found from username ${username}`);
            const stories = tweets.data.map((tweet: any) => {
              return {
                headline: tweet.text,
                link: `https://x.com/i/status/${tweet.id}`,
                date_posted: startTime,
              };
            });
            combinedText.stories.push(...stories);
          } else {
            console.error(
              "Expected tweets.data to be an array:",
              tweets.data
            );
          }
        }
      }
    }
    // --- 2) Handle all other sources with Firecrawl extract ---
    else {
      if (useScrape) {
        // Firecrawl will both scrape and extract for you
        // Provide a prompt that instructs Firecrawl what to extract
        const currentDate = new Date().toLocaleDateString();
        const promptForFirecrawl = `
        Return only today's AI or LLM related story or post headlines and links in JSON format from the page content.
        They must be posted today, ${currentDate}. The format should be:
        {
          "stories": [
            {
              "headline": "headline1",
              "link": "link1",
              "date_posted": "YYYY-MM-DD"
            },
            ...
          ]
        }
        If there are no AI or LLM stories from today, return {"stories": []}.
        
        The source link is ${source}. 
        If a story link is not absolute, prepend ${source} to make it absolute. 
        Return only pure JSON in the specified format (no extra text, no markdown, no \`\`\`). 
        `;
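        console.log(`Scraping and extracting stories from ${source}...`);
        // New method: scrapeUrl with the "extract" format instead of the extract method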
        const scrapeResult = await app.scrapeUrl(source, {
          formats: ["extract"],
          extract: {
            prompt: promptForFirecrawl,
            schema: StoriesSchema
          }
        });

        // Only a failed scrape is an error; missing stories are handled by the check below
        if (!scrapeResult.success) {
          throw new Error(`Failed to scrape ${source}: ${scrapeResult.error}`);
        }

        // The structured data
        const todayStories = scrapeResult.extract;
        console.log(todayStories)
        if (todayStories && todayStories.stories) {
          console.log(
            `Found ${todayStories.stories.length} stories from ${source}`
          );
          combinedText.stories.push(...todayStories.stories);
        } else {
          console.log(`No valid stories data found from ${source}`);
        }
      }
    }
  }

  // Return the combined stories from all sources
  const rawStories = combinedText.stories;
  console.log(rawStories);
  return rawStories;
}
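
The prompt above asks the model to prepend the source when a story link is not absolute, but the model may not always follow that. As a fallback, the links could also be normalized in code after extraction. This is a minimal sketch under that assumption: `toAbsoluteLink` is a hypothetical helper that is not part of the code above, and it relies only on the standard `URL` constructor.

```typescript
// Resolves a possibly relative story link against the page it was scraped from.
// Returns null when the link or the source cannot be parsed as a URL.
function toAbsoluteLink(link: string, source: string): string | null {
  try {
    // Absolute links are returned unchanged; relative ones are resolved against source.
    return new URL(link, source).href;
  } catch {
    return null;
  }
}

console.log(toAbsoluteLink("/blog/post-1", "https://example.com/news"));
// https://example.com/blog/post-1
console.log(toAbsoluteLink("https://other.example/a", "https://example.com/news"));
// https://other.example/a
```

Resolving with `new URL(link, source)` differs from plain string concatenation: a link that starts with `/` is resolved against the site root rather than appended to the full source path.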


4 participants