warc-crawler

Process web archives (WARC format) with StormCrawler and index content into OpenSearch

FLUX · Emerging · apache-storm · opensearch · stormcrawler · warc
GitHub

  • Stars: 9
  • Forks: 1
  • Contributors: 1
  • Last push: 17d ago

Recent commits

Latest commits.

  • Upgrade to StormCrawler 3.6.0
    a1f08cd · Sebastian Nagel · 25d ago
  • Add GitHub workflow to build the project
    170f45a · Sebastian Nagel · 3mo ago
  • Remove Solr index topology
    576b128 · Sebastian Nagel · 3mo ago
  • Upgrade to StormCrawler 3.5.1
    48dc6a1 · Sebastian Nagel · 3mo ago
  • Extend parsefilters to match Tika 2.x metadata
    1790689 · Sebastian Nagel · 56mo ago
  • Add Tika parser bolt to topologies
    c80abdc · Sebastian Nagel · 56mo ago
  • Upgrade to Stormcrawler 2.2 (snapshot) and Storm 2.3.0
    ea374b8 · Sebastian Nagel · 56mo ago
  • Upgrade to Stormcrawler 1.18
    b2aa929 · Sebastian Nagel · 56mo ago

Top contributors

Builders behind this project.

  • sebastian-nagel: 12 commits