One of the problems is that even on Gutenberg, we don't have all the most important books of French literature.

Simply start Kiwix-Serve on your machine, and your content will be available to anybody through their web browser. Kiwix-Serve is provided as a self-contained binary in the Kiwix-Tools suite, which is available for download. We don't want kiwix-serve to get stuck serving the download of a ZIM file, so use external tools for that: if you want to share files, use a file-sharing tool, which will provide better features (listing of files, download resume, etc.).

For the build dependencies, the best Goobuntu packaged option seems to be:
sudo apt-get install libzim-dev liblzma-dev libmagic-dev autoconf automake
git clone git://.net/p/kiwix/other kiwix-other

Gutenberg supports rsync. For the source:
rsync -av --del /var/…
That was the source; for the generated data:
rsync -av --del /var/www/gutenberg-generated
If I cd into gutenberg-generated, there is stuff like: …
To get epub+text+html, you'll need both rsync trees, which seems quite inconvenient.

Downloading with wget works: the result contains 30k directories, each holding one file with the RDF description of one book. So a caching fetch-by-URL seems more convenient: the RDF file contains a timestamp, which could be compared so that updates to a book are caught. An on-disk-caching, robots-obeying URL retriever therefore needs to be written or reused. If you can somehow filter which books to fetch (language only, book range), that will be convenient.

Emmanuel suggests the scraper should download everything into one directory, then convert the data into an output directory, then ZIM-ify that directory.

The planned scraper steps:
1. Loop through the folders/files and parse the RDF.
2. Query the database to apply the filters and get the list of books.
3. Download the books based on the filters (formats, languages).
4. Generate a static folder repository of all ePUB files.
5. Generate a zimwriterfs-friendly folder of static HTML files based on templates and the list of books.
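Step 1 (loop through the folders/files and parse the RDF) could look like the minimal Python sketch below. It assumes the per-book RDF layout and namespace URIs of Project Gutenberg's catalog as we understand them (pgterms:ebook, dcterms:title, dcterms:creator/pgterms:agent/pgterms:name, dcterms:hasFormat/pgterms:file with a dcterms:modified timestamp); check these paths against real catalog files before relying on them. The SAMPLE record is trimmed and made up, and `Book`, `parse_rdf` and `walk_catalog` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from pathlib import Path

# Namespace URIs ASSUMED to match Project Gutenberg's per-book RDF files.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}
RDF_ABOUT = f"{{{NS['rdf']}}}about"  # Clark notation for the rdf:about attribute


@dataclass
class Book:
    book_id: int
    title: str
    authors: list
    languages: list
    formats: dict  # MIME type -> download URL (first URL seen per type)
    modified: str  # newest dcterms:modified among the files, "" if none


def parse_rdf(xml_text):
    """Extract the fields the scraper needs from one book's RDF record."""
    ebook = ET.fromstring(xml_text).find("pgterms:ebook", NS)
    book_id = int(ebook.get(RDF_ABOUT).rsplit("/", 1)[-1])  # "ebooks/12345" -> 12345
    title = ebook.findtext("dcterms:title", default="", namespaces=NS)
    authors = [n.text for n in ebook.findall("dcterms:creator/pgterms:agent/pgterms:name", NS)]
    languages = [v.text for v in ebook.findall("dcterms:language/rdf:Description/rdf:value", NS)]
    formats, modified = {}, ""
    for f in ebook.findall("dcterms:hasFormat/pgterms:file", NS):
        for mime in f.findall("dcterms:format/rdf:Description/rdf:value", NS):
            formats.setdefault(mime.text, f.get(RDF_ABOUT))
        # ISO 8601 strings in one format compare correctly as plain strings.
        modified = max(modified, f.findtext("dcterms:modified", default="", namespaces=NS))
    return Book(book_id, title, authors, languages, formats, modified)


def walk_catalog(root_dir):
    """Yield a Book for every *.rdf file below root_dir (one directory per book)."""
    for path in sorted(Path(root_dir).rglob("*.rdf")):
        yield parse_rdf(path.read_text(encoding="utf-8"))


# Made-up, trimmed record for illustration only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/12345">
    <dcterms:title>Candide</dcterms:title>
    <dcterms:creator>
      <pgterms:agent><pgterms:name>Voltaire</pgterms:name></pgterms:agent>
    </dcterms:creator>
    <dcterms:language>
      <rdf:Description><rdf:value>fr</rdf:value></rdf:Description>
    </dcterms:language>
    <dcterms:hasFormat>
      <pgterms:file rdf:about="https://example.org/ebooks/12345.epub">
        <dcterms:format>
          <rdf:Description><rdf:value>application/epub+zip</rdf:value></rdf:Description>
        </dcterms:format>
        <dcterms:modified>2019-06-01T10:00:00</dcterms:modified>
      </pgterms:file>
    </dcterms:hasFormat>
  </pgterms:ebook>
</rdf:RDF>"""

book = parse_rdf(SAMPLE)
print(book.book_id, book.title, book.languages, book.modified)
```

The `modified` field is what the caching retriever can compare later to decide whether a book needs to be fetched again.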
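Once the RDF records are parsed, storing them in a local database turns step 2 (query the database to apply the filters), including the language-only and book-range filtering wished for above, into a single query. A minimal sketch using Python's built-in sqlite3 module; the three-table schema, the function names and the sample rows (ids, URLs) are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per book, plus one row per language and per format.
SCHEMA = """
CREATE TABLE IF NOT EXISTS book (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
CREATE TABLE IF NOT EXISTS book_language (book_id INTEGER, lang TEXT);
CREATE TABLE IF NOT EXISTS book_format (book_id INTEGER, mime TEXT, url TEXT);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_book(conn, book_id, title, author, languages, formats):
    """Store (or replace) one parsed book; formats maps MIME type -> URL."""
    conn.execute("INSERT OR REPLACE INTO book VALUES (?, ?, ?)", (book_id, title, author))
    conn.execute("DELETE FROM book_language WHERE book_id = ?", (book_id,))
    conn.execute("DELETE FROM book_format WHERE book_id = ?", (book_id,))
    conn.executemany("INSERT INTO book_language VALUES (?, ?)",
                     [(book_id, lang) for lang in languages])
    conn.executemany("INSERT INTO book_format VALUES (?, ?, ?)",
                     [(book_id, mime, url) for mime, url in formats.items()])
    conn.commit()


def select_books(conn, languages=(), formats=(), id_range=None):
    """Return the ids of books matching every given filter, in id order."""
    clauses, params = [], []
    if languages:
        marks = ",".join("?" * len(languages))  # one placeholder per value
        clauses.append(f"id IN (SELECT book_id FROM book_language WHERE lang IN ({marks}))")
        params.extend(languages)
    if formats:
        marks = ",".join("?" * len(formats))
        clauses.append(f"id IN (SELECT book_id FROM book_format WHERE mime IN ({marks}))")
        params.extend(formats)
    if id_range is not None:
        clauses.append("id BETWEEN ? AND ?")
        params.extend(id_range)
    sql = "SELECT id FROM book"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


conn = open_db()
add_book(conn, 10, "Emma", "Austen, Jane", ["en"],
         {"application/epub+zip": "https://example.org/10.epub"})
add_book(conn, 20, "Candide", "Voltaire", ["fr"],
         {"application/epub+zip": "https://example.org/20.epub",
          "text/html": "https://example.org/20.html"})
add_book(conn, 30, "Les Misérables", "Hugo, Victor", ["fr"],
         {"text/plain": "https://example.org/30.txt"})

print(select_books(conn, languages=["fr"]))                                   # [20, 30]
print(select_books(conn, languages=["fr"], formats=["application/epub+zip"]))  # [20]
print(select_books(conn, id_range=(1, 25)))                                   # [10, 20]
```

Only the placeholder count is built from the filter lists; the values themselves are always passed as query parameters.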
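The on-disk-caching, robots-obeying URL retriever could start as small as the sketch below. It is an illustration under assumptions, not an existing Kiwix tool: cache files are named by the SHA-256 of the URL, the RDF timestamp is stored in a `.stamp` file next to each cached body and compared as a string (which works for ISO 8601 timestamps in the same format), and robots.txt rules come from the standard library's `urllib.robotparser`. `CachingFetcher`, `http_get` and the user-agent string are hypothetical; the getter is injectable so the demo runs without network access.

```python
import hashlib
import tempfile
import time
import urllib.request
from pathlib import Path
from urllib.robotparser import RobotFileParser


def http_get(url):
    """Plain HTTP GET; any callable with the same signature can replace it."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


class CachingFetcher:
    """On-disk cache in front of a robots.txt-aware fetch function."""

    def __init__(self, cache_dir, robots, user_agent="gutenberg-zim-sketch", get=http_get):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.robots = robots
        self.user_agent = user_agent
        self.get = get

    def _paths(self, url):
        key = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / key, self.cache_dir / (key + ".stamp")

    def fetch(self, url, modified=""):
        """Return the body of url, going to the network only when needed.

        modified is the timestamp taken from the book's RDF; if it is newer
        than the one stored next to the cached copy, the file is refetched.
        """
        body_path, stamp_path = self._paths(url)
        if body_path.exists():
            cached = stamp_path.read_text() if stamp_path.exists() else ""
            if modified <= cached:
                return body_path.read_bytes()  # cache hit, no request
        if not self.robots.can_fetch(self.user_agent, url):
            raise PermissionError(f"robots.txt disallows {url}")
        delay = self.robots.crawl_delay(self.user_agent)
        if delay:
            time.sleep(float(delay))  # honour Crawl-delay before each request
        body = self.get(url)
        body_path.write_bytes(body)
        stamp_path.write_text(modified)
        return body


# Demo with a fake getter that records calls instead of using the network.
calls = []


def fake_get(url):
    calls.append(url)
    return b"<rdf/>"


robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])
fetcher = CachingFetcher(tempfile.mkdtemp(), robots, get=fake_get)
fetcher.fetch("https://example.org/ebooks/1.rdf", modified="2020-01-01")
fetcher.fetch("https://example.org/ebooks/1.rdf", modified="2020-01-01")  # cache hit
fetcher.fetch("https://example.org/ebooks/1.rdf", modified="2021-06-01")  # newer: refetch
print(len(calls))  # 2
```

In a real run the parser would be pointed at the site's robots.txt with `set_url()` and `read()` instead of being fed lines through `parse()`.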
Work done by didier chez and cniekel chez.

Goal: a script (python/perl/nodejs) able to quickly create a ZIM file with all books in all languages. The texts should be available in HTML and EPUB. The ZIM should provide a simple filtering/search solution to find content (by author, language, title, etc.).

The solution is simple:
1. Retrieve the list of books published by the Gutenberg project in XML/RDF format.
2. Parse the XML/RDF and put the data in a structured manner (memory or local DB).
3. Download the necessary HTML+EPUB data, based on the XML/RDF catalog, into a target directory.
4. Create the necessary templates for the index web pages (for the search/filter feature, a JavaScript client-side solution should be tried).
5. Fill the HTML templates with the data from the XML/RDF and write the index pages into a target directory.
6. Run zimwriterfs to create the corresponding ZIM file from the target directory.

Download Kiwix and install it on your device; a full list of programs grouped by operating system is also available.
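Steps 4 and 5 (create the templates, then fill them with the catalog data) can be sketched with the standard library's `string.Template`. Every value goes through `html.escape` because titles and author names may contain characters such as `&` or `<`. The page markup, the `epub/<id>.epub` link layout, the function names and the sample book are hypothetical.

```python
import html
from pathlib import Path
from string import Template

# Hypothetical templates; real ones would live in their own files.
PAGE = Template("""<!DOCTYPE html>
<html lang="$lang">
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<h1>$title</h1>
<ul>
$items
</ul>
</body>
</html>
""")
ITEM = Template('<li><a href="$href">$title</a> by $author</li>')


def render_index(books, lang="en", title="Project Gutenberg"):
    """Fill the page template with one escaped <li> entry per book."""
    items = "\n".join(
        ITEM.substitute(
            href=html.escape(f"epub/{b['id']}.epub"),  # assumed link layout
            title=html.escape(b["title"]),
            author=html.escape(b["author"]),
        )
        for b in books
    )
    return PAGE.substitute(lang=html.escape(lang), title=html.escape(title), items=items)


def write_index(books, target_dir):
    """Write index.html into the directory later handed to zimwriterfs."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    out = target / "index.html"
    out.write_text(render_index(books), encoding="utf-8")
    return out


sample = [{"id": 10, "title": "Fables & Tales", "author": "Anonymous"}]
page = render_index(sample)
print(page)
```

`Template.substitute` raises `KeyError` if a placeholder is left unfilled, which catches template/data mismatches early.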
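For the client-side search/filter suggested in step 4, one option is to ship the catalog as a small JavaScript data file that the index page filters in the browser. The sketch below generates such a file (the `BOOKS` variable name and the one-letter field keys are assumptions) and mirrors the intended matching rule in Python so it can be tested: case-insensitive substring match on author and title, exact match on language. The function names and sample books are hypothetical.

```python
import json


def build_search_index(books):
    """Return the contents of a JS file defining a global BOOKS array."""
    records = [
        {"id": b["id"], "t": b["title"], "a": b["author"], "l": b["lang"]}
        for b in books
    ]
    payload = json.dumps(records, ensure_ascii=False, separators=(",", ":"))
    return f"var BOOKS = {payload};\n"


def filter_books(records, author="", title="", lang=""):
    """Reference version of the browser-side filter.

    Empty criteria match everything; author and title use case-insensitive
    substring matching, language must match exactly.
    """
    def contains(value, needle):
        return needle.lower() in value.lower()

    return [
        r for r in records
        if contains(r["a"], author) and contains(r["t"], title)
        and (not lang or r["l"] == lang)
    ]


books = [
    {"id": 10, "title": "Candide", "author": "Voltaire", "lang": "fr"},
    {"id": 20, "title": "Les Misérables", "author": "Hugo, Victor", "lang": "fr"},
    {"id": 30, "title": "Emma", "author": "Austen, Jane", "lang": "en"},
]
js = build_search_index(books)
records = json.loads(js[len("var BOOKS = "):-2])  # strip the JS wrapper again
print([r["id"] for r in filter_books(records, lang="fr")])     # [10, 20]
print([r["id"] for r in filter_books(records, author="hugo")])  # [20]
```

Short keys keep the data file small, which matters once it lists every book in the catalog.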
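Step 6 can be wrapped in a helper that builds the zimwriterfs command line and runs it on the output directory (the last stage of Emmanuel's download dir, output dir, ZIM layout). The option names below follow the zimwriterfs usage example of that period (`--welcome`, `--favicon`, `--language`, `--title`, `--description`, `--creator`, `--publisher`, then the HTML directory and the ZIM file); options have changed between releases, so check `zimwriterfs --help` for your build. The helper names and all metadata defaults are placeholders, and the welcome page and favicon paths are relative to the HTML directory.

```python
import shutil
import subprocess


def zimwriterfs_command(html_dir, zim_path, *, welcome="index.html", favicon="favicon.png",
                        language="eng", title="Project Gutenberg",
                        description="Books from Project Gutenberg",
                        creator="Project Gutenberg", publisher="Kiwix"):
    """Build the argument list; option names ASSUMED from the zimwriterfs usage example."""
    return [
        "zimwriterfs",
        f"--welcome={welcome}",      # main page, relative to html_dir
        f"--favicon={favicon}",      # must exist inside html_dir
        f"--language={language}",
        f"--title={title}",
        f"--description={description}",
        f"--creator={creator}",
        f"--publisher={publisher}",
        str(html_dir),
        str(zim_path),
    ]


def build_zim(html_dir, zim_path, **metadata):
    """Run zimwriterfs on the output directory; raises if the tool fails."""
    if shutil.which("zimwriterfs") is None:
        raise FileNotFoundError("zimwriterfs is not on PATH")
    subprocess.run(zimwriterfs_command(html_dir, zim_path, **metadata), check=True)


print(" ".join(zimwriterfs_command("output", "gutenberg_fr.zim", language="fra")))
```

Passing a list to `subprocess.run` avoids shell quoting problems with titles or descriptions that contain spaces.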