Bay 12 Games Forum

Dwarf Fortress => DF Gameplay Questions => DF Wiki Discussion => Topic started by: telamon on February 18, 2012, 01:35:42 am

Title: Obtaining a dump of the wiki
Post by: telamon on February 18, 2012, 01:35:42 am
I spend a lot of time without internet access, and DF is such a hardcore game that I find myself severely hobbled without access to the glorious wiki to provide me with the raw information I need. I'm almost unable to play DF without either immediate access to the wiki, or printouts of the pages I require. Since I'm getting back into the game right after a new version's release (and I stopped playing around 31.18, so there are a lot of features I have yet to learn), I'm in constant need of the wiki to familiarize myself, so I basically can't play DF offline.

Wikipedia has the facility to dump (http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia) a massive number of pages, minus talks, users and old revisions, to a reader's hard drive, and the WikiTaxi (http://www.wikitaxi.org/delphi/doku.php/products/wikitaxi/index) tool is capable of interpreting those dumps (i.e. the MediaWiki database format) and displaying them in a convenient reader for offline use. This would totally solve my problem, and we can assume that I have enough hard drive space to handle the volume. Is it at all possible to get this dump service for an entire namespace of the DF wiki, or is that too big a task for the admin team to worry about right now?
Title: Re: Obtaining a dump of the wiki
Post by: Locriani on February 18, 2012, 01:50:28 pm
We recently added a toolserver to the mix.  I can't promise when it will get done, but we will look into providing automated downloadable dumps of the wiki as soon as we get image uploads working again.
Title: Re: Obtaining a dump of the wiki
Post by: umaxtu on February 24, 2012, 11:12:53 pm
A dump of the wiki would be nice.  8) But it's not urgent, so no rush :D

Edit: What is a toolserver? (Just curious)
Title: Re: Obtaining a dump of the wiki
Post by: GalenEvil on March 15, 2012, 02:35:26 am
I think this is a pretty good idea as well; it would save me from having to go to each individual page and save it via Ctrl+S as a complete webpage... I wonder, though: would links between pages still work within the dumped wiki?
Title: Re: Obtaining a dump of the wiki
Post by: Kogut on March 15, 2012, 02:37:57 am
Quote
Edit: What is a toolserver? (Just curious)
A server hosting bots and various useful tools. Description of Wikipedia's toolserver: http://en.wikipedia.org/wiki/Wikipedia:Toolserver
Title: Re: Obtaining a dump of the wiki
Post by: umaxtu on March 17, 2012, 01:05:51 pm
Quote
Server hosting bots and various useful tools. Description of Wikipedia toolserver: http://en.wikipedia.org/wiki/Wikipedia:Toolserver

Thank you. :D
Title: Re: Obtaining a dump of the wiki
Post by: badwin on March 30, 2012, 12:06:35 am
Is this a possible thing to do now? I don't really know much about wikis, so sorry if it's still not an implemented feature.
Title: Re: Obtaining a dump of the wiki
Post by: telamon on March 30, 2012, 12:18:39 am
AFAIK, it's very poor etiquette to spider a website and download all the pages manually, since you take up a lot of the site's bandwidth if it's big (tools exist that can do that; check out HTTrack or the classic wget). Since the wiki, to the best of my knowledge, has not yet added a page to access this feature, I can only assume that it's not yet available.
Title: Re: Obtaining a dump of the wiki
Post by: Locriani on May 03, 2012, 09:50:21 am
Your wish is my command.
http://dwarffortresswiki.org/images/dump.xml.gz (http://dwarffortresswiki.org/images/dump.xml.gz)

http://dwarffortresswiki.org/images/dump.xml.bz2 (http://dwarffortresswiki.org/images/dump.xml.bz2)
This dump is automagically updated daily.
Title: Re: Obtaining a dump of the wiki
Post by: telamon on May 03, 2012, 04:29:52 pm
Thank you so much! Works perfectly with WikiTaxi.

Note for other people trying to use the dump: because it's gzipped, WikiTaxi cannot interpret it. (I was surprised that it couldn't; really, if you can support one archive format, another shouldn't be far away. But whatever, it works.) On Unix systems you probably know of a compression utility to convert to bz2; on Windows you'll need 7-Zip.

Download the dump and open it in 7-Zip, then extract the XML file inside. Create a new archive in 7-Zip and make sure the archive format is bzip2. Add the dump XML to this archive (not the .gz file you downloaded, but the XML file itself).
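If you'd rather script the recompression than click through 7-Zip, here's a minimal Python sketch (it assumes the downloaded file is named dump.xml.gz and sits in the working directory):
Code: [Select]
import bz2
import gzip
import shutil

# Stream-recompress the gzipped dump as bzip2 so the WikiTaxi importer accepts it.
with gzip.open("dump.xml.gz", "rb") as src, bz2.open("dump.xml.bz2", "wb") as dst:
    shutil.copyfileobj(src, dst)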

You need WikiTaxi (http://www.wikitaxi.org/delphi/doku.php/products/wikitaxi/index#import_the_xml_dump_into_a_wikitaxi_database) to read the dump. WikiTaxi leaves some things to be desired and it's a bit rudimentary, but it's basically the only program available that's capable of doing this job. Extract WikiTaxi into any folder of your choice (it's a self-contained program and needs no installation), then run the WikiTaxi importer and select the bz2 archive you just made with 7-Zip. The importer converts the bz2 archive into a .taxi file that WikiTaxi uses to read the wiki database, so you also need to tell the importer where to output the .taxi file. Then run the import to create the .taxi file itself.

Finally, open up WikiTaxi, go to Options, select "Open .taxi file", and pick the .taxi file you just created. You will now have access to the DF wiki offline! You can search the wiki using the bar at the top of the WikiTaxi window. The DF wiki relies heavily on redirecting general links to a particular namespace (for example, wiki/Noble actually redirects to wiki/DF2012:Noble), and those redirects don't play perfectly with WikiTaxi, so you'll have to rely on the search bar for most of your reading. For example, if you're looking for the nobles page, your best bet is to search for DF2012:Noble. Nevertheless, I can confirm that it works!

EDIT: some alternatives to WikiTaxi can be found here (http://en.wikipedia.org/wiki/Wikipedia:Database_download#Dynamic_HTML_generation_from_a_local_XML_database_dump), but I have yet to try any of them. In general, most of the programs listed are server frontends for the MediaWiki software, so they're not exactly trivial to install. I have a bit of experience with XAMPP, so I could try most of them (I bet I could even load up MediaWiki on a local server and just run the dump out of that somehow), but WikiTaxi is still probably the easiest solution, since it wraps the code in a pretty little reader interface.
Title: Re: Obtaining a dump of the wiki
Post by: Quietust on May 03, 2012, 07:51:05 pm
If wikitaxi needs the file to be bzip2-compressed, I'm sure Locriani can switch the daily process to use bzip2 instead of gzip - it'll probably be smaller, too, saving a bit of bandwidth.
Title: Re: Obtaining a dump of the wiki
Post by: Locriani on May 04, 2012, 09:17:02 am
It turns out we have a problem with our toolserver; I reverted the dump back to a version from late March.  It will probably be a week before I can get it corrected.

Dump is back up at http://dwarffortresswiki.org/images/dump.xml.bz2 (http://dwarffortresswiki.org/images/dump.xml.bz2)
Title: Re: Obtaining a dump of the wiki
Post by: umaxtu on May 05, 2012, 03:40:10 pm
I found another program like WikiTaxi, called Kiwix, that looks better. The problem is that it only accepts .zim dumps. I'm currently setting up a local MediaWiki with the dump of this wiki and will use the Collection extension to convert it into a .zim dump. I will report back here with results.
Title: Re: Obtaining a dump of the wiki
Post by: telamon on May 05, 2012, 03:50:15 pm
It looks really good, but I'm personally too lazy to change my dump format. Format implementation details are here (http://www.openzim.org/Main_Page) for anyone who is interested.
Title: Re: Obtaining a dump of the wiki
Post by: NYDwarf on July 10, 2012, 12:18:30 pm
Did you ever make the .zim file?

Thanks!
Title: Re: Obtaining a dump of the wiki
Post by: xaldin on August 12, 2012, 02:00:17 pm
So far it looks like there's no ZIM file. That's unfortunate, given that the only tools I can find to read the current format cannot follow a link/redirect. With the wiki's heavy reliance on redirects across all the version namespaces, that makes it very frustrating to use.
Title: Re: Obtaining a dump of the wiki
Post by: Dr. Hellno on September 13, 2012, 02:14:46 am
If WikiTaxi is giving you redirect problems, I have an imperfect suggestion. Unzip the bz2 file and open the XML in Notepad or whatever. Then find-replace all instances of
"#REDIRECT [[cv"
with
"#REDIRECT [[DF2012"
(don't include the quotation marks)
then zip that sucker back up with 7-Zip or whatever you like.

WikiTaxi handles redirects fine, just not that "cv" part. I guess it's a variable that the DF wiki replaces with the current version namespace, DF2012 at the moment; WikiTaxi just reads it as a nonexistent namespace.
(I know nothing about wikis. What's a namespace, even?)

This should fix most (though not all) broken redirects, and makes browsing a hell of a lot more natural.
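If you'd rather not do the find-replace by hand, something like this Python sketch automates the whole round trip (the filenames are my own; adjust them to wherever your dump lives):
Code: [Select]
import bz2

# Decompress the dump, point the "cv" redirects at the DF2012 namespace,
# and write a new bzip2 archive for WikiTaxi's importer.
with bz2.open("dump.xml.bz2", "rt", encoding="utf-8") as f:
    text = f.read()

text = text.replace("#REDIRECT [[cv", "#REDIRECT [[DF2012")

with bz2.open("dump-fixed.xml.bz2", "wt", encoding="utf-8") as f:
    f.write(text)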
Tabbed browsing would be so nice though. But I guess that's just greedy.
Title: Re: Obtaining a dump of the wiki
Post by: Nosthula on October 10, 2012, 11:26:18 am
The current dump is for 34.07.  Is there a plan to update the dump files to the current release?
Title: Re: Obtaining a dump of the wiki
Post by: katwithk on October 12, 2012, 07:44:01 am
I used HTTrack before reading this thread because I couldn't figure out the whole dumps thing.

(I often have no intertron access, which is a recent thing, so I've been exploring offline computing. I've also begun packing DF onto a flash drive to play at work. It's been slow at work.)

The wiki is bigger than I expected, so it took forever (HTTrack downloads at a low bitrate, I guess).
It came to 1.97 GB after cutting out user pages and pages from the previous versions.

All the internal links seem to work great; however, the search does not work at all, which is a little frustrating.

Since I burned so much time and bandwidth on this download, I would like to be able to fix that if anyone knows how. Otherwise, I suppose the only option is this WikiTaxi thing, which unfortunately relies on waiting for the dump to be updated to stay current. (Edit: I think I read that it updates every day, all automatic-like.)
Title: Re: Obtaining a dump of the wiki
Post by: katwithk on October 19, 2012, 08:29:29 am
So WikiTaxi is a much less frustrating, much more practical, and much smaller solution for offline wiki-browsing than HTTrack.

However
(there is always a but, isn't there?)

The XML dump seems to be for the last version's wiki.

Which, for most purposes, is fine - but I would like to be operating on intel that I'm sure is as close to accurate as I can get.

So I guess my question here is thus: Who do I whine at incessantly to get an updated dump?
Title: Re: Obtaining a dump of the wiki
Post by: Locriani on October 27, 2012, 02:06:41 pm
Are you sure? The dump is autogenerated daily, and the one I just downloaded has all the main namespaces included.
Title: Re: Obtaining a dump of the wiki
Post by: katwithk on October 29, 2012, 07:38:02 am
I'll redump, but at the time of that posting the dump I received was for .31.25, not .34.xx

EDIT:

Not only does my new dump seem to STILL be .31.25 (as a search for 'minecart' or 'wheelbarrow' yields nothing), but I also have to fix all my redirects again. Thanks, Locriani.
Title: Re: Obtaining a dump of the wiki
Post by: Locriani on October 30, 2012, 09:39:57 am
There appears to be a problem with the process that moves completed dumps over to the download location.  I manually initiated the process, but I will have to find a better solution later this week when I have a chance.

I don't really appreciate the snark - the wiki has been and will remain a free resource that I devote significant amounts of my own time and money to maintaining.  I understand it's frustrating, but you could have explained what had occurred without pissing me off in the process.
Title: Re: Obtaining a dump of the wiki
Post by: Locriani on October 30, 2012, 09:44:43 am
To be clear - the dump link is current again as of today.  Automatic updates are on hold until I figure out why the update task is not atomically moving the files in question; it should be fixed when I have time and Emily's assistance in figuring out what borked.
Title: Re: Obtaining a dump of the wiki
Post by: katwithk on October 30, 2012, 09:48:26 am
*sigh* my apologies for any snarkiness.
Title: Re: Obtaining a dump of the wiki
Post by: Locriani on October 30, 2012, 09:51:05 am
It's ok.  For what it's worth, I am sorry you are having issues at all with this.  Also, I may attempt to include your find-replace suggestion as an automagic processing step on the dump here as soon as I can figure out what the file mv issue is >.>
Title: Re: Obtaining a dump of the wiki
Post by: katwithk on October 30, 2012, 09:56:29 am
You really are doing good work. I've just been trying to get a lot of things working, both DF-related and not, and have been hitting my head against a lot of walls. It comes out unintentionally sometimes :/
Title: Re: Obtaining a dump of the wiki
Post by: Babarix on October 30, 2012, 06:13:36 pm
I was looking for an offline version and found this thread.
And I got a simple idea:
a copy of the DF wiki database running on your own local wiki with XAMPP.

Ideally you'd just need to download a *.7z and run a *.bat, and you could access the wiki on your local machine.

I've got everything set up; I just need the database.
So is it possible to get a MySQL dump from the wiki?
Title: Re: Obtaining a dump of the wiki
Post by: Nosthula on October 31, 2012, 09:50:30 am
Quote
To be clear - the dump link is current again as of today.  Automatic updates are on hold until I figure out why the update task is not atomically moving the files in question; it should be fixed when I have time and Emily's assistance in figuring out what borked.

Thank you so much for updating the dump!  :D
Title: Re: Obtaining a dump of the wiki
Post by: Locriani on November 02, 2012, 04:11:54 pm
Quote
I was looking for an offline version and found this thread.
And I got a simple idea:
a copy of the DF wiki database running on your own local wiki with XAMPP.

Ideally you'd just need to download a *.7z and run a *.bat, and you could access the wiki on your local machine.

I've got everything set up; I just need the database.
So is it possible to get a MySQL dump from the wiki?
No, for several reasons.
1. We use Postgres.
2. There would have to be an extensive data sanitization step (email addresses, deleted pages, etc. are all stored in cleartext).
3. We use a number of custom mods, so setting up a MediaWiki clone is not trivial.
4. The size of the database dump is HUGE. The XML dump is small because it only nabs current revisions and skips some namespaces; the database dump for backups runs approximately 20 GB of working space.
5. The current backup system uses PITR (point-in-time recovery) backup, so the data is literally useless without the exact same machine architecture and compiled Postgres version, with all the same settings.
6. The number and size of images stored is excessive. I wouldn't be able to prune to just the images currently used on pages, which already come to about 1.5 GB; we currently have something like 8 GB of images stored.
Title: Re: Obtaining a dump of the wiki
Post by: MoonSheep on February 11, 2013, 10:38:39 pm
Any way to read a dump on an iPad?
Title: Re: Obtaining a dump of the wiki
Post by: rmblr on November 04, 2013, 05:28:30 am
Is it possible to get a dump of just a subset of the wiki?

Basically I'd like to get JUST the DF2012 reference pages+media without Talk pages, etc.
Title: Re: Obtaining a dump of the wiki
Post by: expwnent on July 03, 2014, 09:10:55 pm
I downloaded the link you put on the previous page, but when I import it into WikiTaxi, almost everything seems to be missing. Am I doing something wrong? dfwiki.taxi is 20612 KB and dump.xml.bz2 is 11155 KB. Is that right, or am I missing some stuff? Or does it just not work with WikiTaxi?

edit: I can get individual pages fine, but almost every link is broken and I have to search manually for each page.
Title: Re: Obtaining a dump of the wiki
Post by: lethosor on July 03, 2014, 09:38:08 pm
The broken links are most likely a result of a couple of extensions we use on the wiki, which WikiTaxi doesn't know how to handle, namely:
* Links in versioned namespaces link to pages in that namespace - e.g. a link to "iron" on any v0.34 page links to "v0.34:iron"
* Pages in the main namespace that don't exist, e.g. iron (http://dwarffortresswiki.org/index.php?title=Iron&redirect=no), are automatically redirected to the current version's page
I'm assuming WikiTaxi, being unaware of the first behavior, tries to link to "iron" instead of "v0.34:iron", for example, and fails because the page doesn't exist. I've never used WikiTaxi, but this should be fairly simple to fix if it's a browser-like application that supports JavaScript. If not, someone else posted a Python-based implementation of the wiki, which we can try to adapt to use the XML dump and handle links correctly.
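To illustrate, here's a rough Python sketch of what post-processing the dump to expand bare links might look like (the namespace is hard-coded and real wikitext has more edge cases than this regex handles, so treat it as an illustration only):
Code: [Select]
import re

NAMESPACE = "v0.34"  # the versioned namespace whose pages we're fixing

def expand_links(wikitext):
    # Prefix bare [[links]] with the versioned namespace, mimicking the
    # on-wiki link behavior that WikiTaxi doesn't know about.
    def fix(match):
        target, label = match.group(1), match.group(2) or ""
        if ":" in target:  # already namespaced (v0.34:iron, File:..., etc.)
            return match.group(0)
        return "[[%s:%s%s]]" % (NAMESPACE, target, label)
    return re.sub(r"\[\[([^\]|]+)(\|[^\]]*)?\]\]", fix, wikitext)

print(expand_links("Smelt [[iron]] at a [[smelter|furnace]]."))
# -> Smelt [[v0.34:iron]] at a [[v0.34:smelter|furnace]].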

Edit: link (http://www.bay12forums.com/smf/index.php?topic=125494.0) (it's a couple months out of date, but I'll see if I can make it use the XML dump when I get a chance.)
Title: Re: Obtaining a dump of the wiki
Post by: expwnent on July 04, 2014, 05:10:06 am
The hyperlinks in that one don't work for me either in chrome or firefox.
Title: Re: Obtaining a dump of the wiki
Post by: lethosor on July 04, 2014, 08:44:11 am
Really? They work for me. It's a lot larger than the XML dump, however, so I'd like to find a better way (I thought someone had made an offline wiki viewer written in Python, but I can't seem to find it). 
Edit: here (http://sebsauvage.net/wiki/doku.php?id=df_portable_wiki).
Title: Re: Obtaining a dump of the wiki
Post by: expwnent on July 04, 2014, 05:46:09 pm
Code: [Select]
file:///C:/Users/myusername/Desktop/dfwiki/df_wiki_v01_DF2012/df_wiki_v01_DF2012/articles/a/b/o/Dwarf_Fortress:About.html
does not exist, but

Code: [Select]
file:///C:/Users/myusername/Desktop/dfwiki/df_wiki_v01_DF2012/df_wiki_v01_DF2012/articles/a/b/o/Dwarf_Fortress_About.html
does.
Title: Re: Obtaining a dump of the wiki
Post by: lethosor on July 04, 2014, 07:43:17 pm
Huh, it works for me with the colon. Might be a Windows-specific issue. I've had better luck with the Python-based one (it can use the most recent dump), although it doesn't support a lot of templates since it parses the wikitext itself. HTML dumps are probably more reliable, since the wiki uses a lot of unique extensions (AutoRedirect, DFRawFunctions, etc.) that confuse offline wiki programs, but they're harder to keep up-to-date due to their size and generation time.
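If the colon is indeed the problem, here's a rough Python sketch of a local workaround: walk the extracted HTML dump and rewrite link targets so the colon becomes an underscore, matching the on-disk filenames (the root folder name is taken from the paths above; the (?!//) lookahead deliberately leaves absolute URLs like http://... alone):
Code: [Select]
import os
import re

# Assumption: root of the extracted HTML dump, per the paths quoted above.
ROOT = "df_wiki_v01_DF2012"

for dirpath, _, filenames in os.walk(ROOT):
    for name in filenames:
        if not name.endswith(".html"):
            continue
        path = os.path.join(dirpath, name)
        with open(path, encoding="utf-8") as f:
            html = f.read()
        # Rewrite "Namespace:Page.html" hrefs to "Namespace_Page.html".
        fixed = re.sub(r'href="([^":]+):(?!//)([^"]*\.html)"', r'href="\1_\2"', html)
        if fixed != html:
            with open(path, "w", encoding="utf-8") as f:
                f.write(fixed)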
Title: Re: Obtaining a dump of the wiki
Post by: Locriani on July 04, 2014, 08:21:19 pm
We tried HTML dumps at one point; they took something like 80 hours to generate for the whole wiki. We can't afford to have a server constantly spinning those dumps, so we killed it in favor of the XML dump.
Title: Re: Obtaining a dump of the wiki
Post by: expwnent on July 05, 2014, 02:35:25 am
Oh, I completely understand. XML dumps are a better way of doing things. I just can't get it to work. The python version works well enough for my purposes. Thanks for your help.
Title: Re: Obtaining a dump of the wiki
Post by: utunnels on November 04, 2014, 10:02:49 pm
I tried XOWA today and it looks really good.
There seem to be some problems parsing templates, though.
I guess the dump is missing something?
--------------

Edit*

Never mind, I see the wiki is using some extensions, so that makes sense.
Title: Re: Obtaining a dump of the wiki
Post by: xaldin on April 04, 2015, 11:09:08 pm
Has anyone found/used an offline wiki reader on the iPad that works with the data from the DF wiki? I've been trying for ages to find a way to read the wiki while on planes, traveling, etc. from my iPad.

Title: Re: Obtaining a dump of the wiki
Post by: lethosor on April 05, 2015, 08:22:22 am
Ramblurr came up with a way to generate HTML dumps, so we're working on getting that set up. I'm not sure if it'll work on mobile devices at this point, but it's possible.
Title: Re: Obtaining a dump of the wiki
Post by: rokoeh on August 04, 2016, 11:07:39 pm
So I tried to download the dump and use it with WikiTaxi, but the links/search engine are broken (as reported before) and the dump version seems to be from DF 0.34.x.


I looked at this topic: http://www.bay12forums.com/smf/index.php?topic=125494.0, but the last post is from 2013...


Any news? Is there a way to get the DF wiki (I want the one for DF 0.43.x) for offline reading at a doable size (up to 5 GB in my case)?

What about http://www.httrack.com/?
Title: Re: Obtaining a dump of the wiki
Post by: Overspeculated on August 10, 2016, 11:43:42 am
Quote
Really? They work for me. It's a lot larger than the XML dump, however, so I'd like to find a better way (I thought someone had made an offline wiki viewer written in Python, but I can't seem to find it).
Edit: here (http://sebsauvage.net/wiki/doku.php?id=df_portable_wiki).
Thanks for the link; WikiTaxi does not work on Macintosh computers, but this does.

However, it is a bit broken. Are there any other options for me to open the XML dump? Or a way to convert it to a .zim file?