web-languages

Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ for the code.

Other · Emerging
Stars: 1
Forks: n/a
Contributors: 8
Last push: 9d ago

Recent commits

  • add region details
    9d53514 · Greg Lindahl · 12d ago
  • Improve Somali (som) URL list
    7e1738c · khaledyusuf44 · 13d ago
  • Include link to Romanian language dictionaries
    95a321a · Andrei GUDIU · 13d ago
  • fix: links to Basque, Catalan, and Galician pages (#203)
    cbd557a · Michael Paris · 15d ago
  • fix: too many slashes in URL for French Ministry of Culture (#202)
    f0b0df6 · Michael Paris · 15d ago
  • Add universities
    fc925d7 · Jordi Mas · 19d ago
  • Add more Catalan language resources across the different categories plus Andorra
    2a791eb · Jordi Mas · 19d ago
  • docs: added walloon linguistic resources (#199)
    f6f40c4 · shibu · 20d ago

Top contributors

Builders behind this project.

  • e-Winnie: 66 commits
  • thunderpoot: 33 commits
  • evanpacini: 22 commits
  • wumpus: 10 commits
  • Nativeatom: 8 commits
  • twagoo: 7 commits
  • swaptr: 6 commits
  • BitsandGits: 5 commits