-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
/start #19
Comments
#NIK |
|
Hello I'm trying to run the code but I keep running into the error |
Is this due to the twitter api key, I'm using a free X api key |
Never mind I've fixed it |
How did you resolved it? |
Ok, the todayStories in scrapeSources.ts was returning none giving me that error, Initially I thought I set the wrong fire crawl api key but it was the right I api key so I just added and if block to check if I receive any data I will send a code sample |
if (todayStories && todayStories.stories) { |
@An0n-xen thanks so much you saved my day <3 |
I'm facing the same problem @An0n-xen and @hsmnzaydn. There is an problem with the extract methode, but you can use another alternative to this, so this is the new solution: import FirecrawlApp from '@mendable/firecrawl-js';
import dotenv from 'dotenv';
// Removed Together import
import { z } from 'zod';
// Removed zodToJsonSchema import since we no longer enforce JSON output via Together
dotenv.config();
// Initialize Firecrawl
const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });
// 1. Define the schema for our expected JSON
const StorySchema = z.object({
headline: z.string().describe("Story or post headline"),
link: z.string().describe("A link to the post or story"),
date_posted: z.string().describe("The date the story or post was published"),
});
const StoriesSchema = z.object({
stories: z.array(StorySchema).describe(
"A list of today's AI or LLM-related stories"
),
});
export async function scrapeSources(sources: string[]) {
const num_sources = sources.length;
console.log(`Scraping ${num_sources} sources...`);
let combinedText: { stories: any[] } = { stories: [] };
// Configure these if you want to toggle behavior
const useTwitter = false;
const useScrape = true;
for (const source of sources) {
// --- 1) Handle x.com (Twitter) sources ---
if (source.includes("x.com")) {
if (useTwitter) {
const usernameMatch = source.match(/x\.com\/([^\/]+)/);
if (usernameMatch) {
const username = usernameMatch[1];
// Build the search query for tweets
const query = `from:${username} has:media -is:retweet -is:reply`;
const encodedQuery = encodeURIComponent(query);
// Get tweets from the last 24 hours
const startTime = new Date(
Date.now() - 24 * 60 * 60 * 1000
).toISOString();
const encodedStartTime = encodeURIComponent(startTime);
// x.com API URL
const apiUrl = `https://api.x.com/2/tweets/search/recent?query=${encodedQuery}&max_results=10&start_time=${encodedStartTime}`;
// Fetch recent tweets from the Twitter API
const response = await fetch(apiUrl, {
headers: {
Authorization: `Bearer ${process.env.X_API_BEARER_TOKEN}`,
},
});
if (!response.ok) {
throw new Error(`Failed to fetch tweets for ${username}: ${response.statusText}`);
}
const tweets = await response.json();
if (tweets.meta?.result_count === 0) {
console.log(`No tweets found for username ${username}.`);
} else if (Array.isArray(tweets.data)) {
console.log(`Tweets found from username ${username}`);
const stories = tweets.data.map((tweet: any) => {
return {
headline: tweet.text,
link: `https://x.com/i/status/${tweet.id}`,
date_posted: startTime,
};
});
combinedText.stories.push(...stories);
} else {
console.error(
"Expected tweets.data to be an array:",
tweets.data
);
}
}
}
}
// --- 2) Handle all other sources with Firecrawl extract ---
else {
if (useScrape) {
// Firecrawl will both scrape and extract for you
// Provide a prompt that instructs Firecrawl what to extract
const currentDate = new Date().toLocaleDateString();
const promptForFirecrawl = `
Return only today's AI or LLM related story or post headlines and links in JSON format from the page content.
They must be posted today, ${currentDate}. The format should be:
{
"stories": [
{
"headline": "headline1",
"link": "link1",
"date_posted": "YYYY-MM-DD"
},
...
]
}
If there are no AI or LLM stories from today, return {"stories": []}.
The source link is ${source}.
If a story link is not absolute, prepend ${source} to make it absolute.
Return only pure JSON in the specified format (no extra text, no markdown, no \`\`\`).
`;
console.log("get the post")
// !! new method
const scrapeResult = await app.scrapeUrl(source, {
formats: ["extract"],
extract: {
prompt: promptForFirecrawl,
schema: StoriesSchema
}
});
if (!scrapeResult.success || !scrapeResult.extract?.stories) {
throw new Error(`Failed to scrape: ${scrapeResult.error}`);
}
// The structured data
const todayStories = scrapeResult.extract;
console.log(todayStories)
if (todayStories && todayStories.stories) {
console.log(
`Found ${todayStories.stories.length} stories from ${source}`
);
combinedText.stories.push(...todayStories.stories);
} else {
console.log(`No valid stories data found from ${source}`);
}
}
}
}
// Return the combined stories from all sources
const rawStories = combinedText.stories;
console.log(rawStories);
return rawStories;
} |
#NIK
The text was updated successfully, but these errors were encountered: