Bay 12 Games Forum
Dwarf Fortress => DF Gameplay Questions => DF Wiki Discussion => Topic started by: telamon on February 18, 2012, 01:35:42 am
-
I spend a lot of time without internet access, and DF is such a hardcore game that I find myself severely hobbled without access to the glorious wiki to provide me with the raw information I need. I'm almost unable to play DF without either immediate access to the wiki or printouts of the pages I require. Since I'm getting back into the game right after a new version's release (and I stopped playing around 31.18, so there are a lot of features I have yet to learn), I'm in constant need of the wiki to familiarize myself with everything, so I basically can't play DF offline.
Wikipedia has the facility to dump (http://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia) a massive number of pages, minus talk pages, user pages, and old revisions, to a reader's hard drive, and the WikiTaxi (http://www.wikitaxi.org/delphi/doku.php/products/wikitaxi/index) tool is capable of interpreting those dumps (i.e. the MediaWiki XML export format) and displaying them in a convenient reader for offline use. This would totally solve my problem, and we can assume that I have enough hard drive space to handle the volume. Is it at all possible to get this dump service for an entire namespace of the DF wiki, or is that too big a task for the admin team to worry about right now?
-
We recently added a toolserver to the mix. I can't promise when it will get done, but we will look into providing automated dumps for the wiki for download, as soon as we get image uploads working again.
-
A dump of the wiki would be nice. 8) But it's not urgent, so no rush :D
Edit: What is a toolserver? (Just curious)
-
I think this is a pretty good idea as well; it would save me from having to go to each individual page and save it via Ctrl+S as a complete webpage... I wonder, though: would links between pages be functional within the dumped wiki?
-
Edit: What is a toolserver? (Just curious)
Server hosting bots and various useful tools. Description of Wikipedia toolserver: http://en.wikipedia.org/wiki/Wikipedia:Toolserver
-
Server hosting bots and various useful tools. Description of Wikipedia toolserver: http://en.wikipedia.org/wiki/Wikipedia:Toolserver
Thank you. :D
-
Is this a possible thing to do now? I don't really know much about wikis, so sorry if it's still not an implemented feature.
-
AFAIK, it's very poor etiquette to spider a website and download all the pages yourself, since you take up a lot of the site's bandwidth if it's big (tools exist that can do that; check out HTTrack or the classic wget). Since the wiki, to the best of my knowledge, has not yet added a page to access this feature, I can only assume that it's not yet available.
-
Your wish is my command.
http://dwarffortresswiki.org/images/dump.xml.gz (http://dwarffortresswiki.org/images/dump.xml.gz)
http://dwarffortresswiki.org/images/dump.xml.bz2 (http://dwarffortresswiki.org/images/dump.xml.bz2)
This dump is automagically updated daily.
-
Thank you so much! Works perfectly with wikitaxi.
Note for other people trying to use the dump: because the .gz version is gzipped, WikiTaxi cannot interpret it (I was surprised that it couldn't... really, if you can support one archive format, another one shouldn't be too far away. But whatever, it works). On Unix systems you probably already know of a compression utility to convert it to bz2; on Windows you'll need 7zip.
Download the dump and open it in 7zip, then extract the XML file inside. Create a new archive in 7zip and make sure the archive format is bzip2. Add the dump XML to this archive (not the .gz file you downloaded, but the extracted .xml file itself).
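If you'd rather not click through 7zip, here's a minimal sketch of the same gz-to-bz2 conversion using only Python's standard library (the filenames are assumptions; point them at wherever you saved the dump):

# Minimal sketch: re-compress the downloaded gzip dump as bzip2 for WikiTaxi.
# "dump.xml.gz" and "dump.xml.bz2" are assumed filenames.
import bz2
import gzip
import shutil

with gzip.open("dump.xml.gz", "rb") as src, bz2.open("dump.xml.bz2", "wb") as dst:
    # Stream from one compressor to the other without holding the whole XML in memory.
    shutil.copyfileobj(src, dst)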
You need WikiTaxi (http://www.wikitaxi.org/delphi/doku.php/products/wikitaxi/index#import_the_xml_dump_into_a_wikitaxi_database) to read the dump. WikiTaxi leaves some things to be desired and is a bit rudimentary, but it's basically the only program available that can do this job. Extract WikiTaxi into any folder of your choice (it's a self-contained program and needs no installation), then run the WikiTaxi importer. Select the bz2 archive you just made with 7zip. The importer will convert the bz2 archive into a .taxi file that WikiTaxi uses to interpret the wiki database, so you also need to tell the importer where to output the .taxi file. Then run the importer to create the .taxi file itself.
Finally, open up WikiTaxi, go to Options, select "Open .taxi file", and pick the .taxi file you just created. You will now have access to the DF wiki offline! You can search the wiki using the bar at the top of the WikiTaxi window. Because the DF wiki relies heavily on redirecting general links to a particular namespace (for example, when you go to wiki/noble, it actually redirects you to wiki/DF2012:Noble), those redirects don't play perfectly with WikiTaxi, so you'll have to rely on the search bar for most of your reading. For example, if you're looking for the nobles page, your best bet is to search for DF2012:Noble. Nevertheless, I can confirm that it works!
EDIT: some alternatives to WikiTaxi can be found here (http://en.wikipedia.org/wiki/Wikipedia:Database_download#Dynamic_HTML_generation_from_a_local_XML_database_dump), but I have yet to try any of them. In general, most of the programs listed are server frontends for the MediaWiki software, so they're not exactly trivial to install. I have a bit of experience with XAMPP so I could try most of them (I bet I could even load up MediaWiki on a local server and just run the dump out of that somehow), but WikiTaxi is still probably the easiest solution, since it wraps the content in a pretty little reader interface.
-
If wikitaxi needs the file to be bzip2-compressed, I'm sure Locriani can switch the daily process to use bzip2 instead of gzip - it'll probably be smaller, too, saving a bit of bandwidth.
-
It turns out we have a problem with our toolserver, so I reverted the dump back to a version from late March. It will probably be a week before I can get it corrected again.
Dump is back up at http://dwarffortresswiki.org/images/dump.xml.bz2 (http://dwarffortresswiki.org/images/dump.xml.bz2)
-
I found another program like WikiTaxi, called Kiwix, that looks better. The problem is that it only accepts .zim dumps. I'm currently setting up a local MediaWiki with the dump of this wiki and will use the Collection extension to convert it into a .zim dump. I will report back here with results.
-
It looks really good, but I'm personally too lazy to change my dump format. Format implementation details here (http://www.openzim.org/Main_Page) for anyone who is interested
-
Did you ever make the .zim file?
Thanks!
-
So far it looks like there's no ZIM file. Kind of unfortunate, given that the only tools I can find to read the current format can't follow a link/redirect. Given how heavily the wiki relies on redirects across all the version namespaces, that makes it very frustrating to use.
-
If WikiTaxi is giving you redirect problems, I have an imperfect suggestion. Unzip the bz2 file, and open the xml in notepad or whatever. Then find-replace all instances of
"#REDIRECT [[cv"
with
"#REDIRECT [[DF2012"
(don't include the quotation marks)
Then zip that sucker back up with 7zip or whatever you like.
WikiTaxi handles redirects fine, just not that "cv" part. I guess it's a variable that the DF wiki replaces with the current version, DF2012 at the moment; WikiTaxi just reads it as a non-existent namespace.
(I know nothing about wikis. what's a namespace even?)
This should fix most (though not all) broken redirects, and makes browsing a hell of a lot more natural.
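If you'd rather not open a huge XML file in Notepad, here's a rough sketch of the same find-replace in Python. The filenames are assumptions, and "DF2012" should be whatever the current-version namespace happens to be when you download the dump:

# Rough sketch: fix the "cv" redirects so WikiTaxi can follow them.
# Filenames are assumptions; adjust "DF2012" to the current-version namespace.
import bz2

with bz2.open("dump.xml.bz2", "rt", encoding="utf-8") as f:
    text = f.read()  # the decompressed XML is large, but fits in RAM on most machines

text = text.replace("#REDIRECT [[cv", "#REDIRECT [[DF2012")

with bz2.open("dump_fixed.xml.bz2", "wt", encoding="utf-8") as f:
    f.write(text)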
Tabbed browsing would be so nice though. But I guess that's just greedy.
-
The current dump is for 34.07. Is there a plan to update the dump files to the current release?
-
I used HTTrack before reading this thread because I couldn't figure out the whole dumps thing.
(I often have no intertron access, which is a recent thing so I've been exploring offline computing. I've also begun packing DF onto a flashdrive to play at work. It's been slow at work.)
The wiki is bigger than I expected, so it took forever (HTTrack downloads at a low bitrate, I guess).
1.97 GB after cutting out user pages and pages from the previous versions.
All the internal links seem to work great, however, the search does not work at all - which is a little frustrating.
Being as I burned so much time and bandwidth on this download, I would like to be able to fix that if anyone knows how - otherwise I suppose the only option is this WikiTaxi thing, which unfortunately relies on waiting for the dump to be updated to stay current. (edit) I think I read it updates every day, all automatic-like. (/edit)
-
So WikiTaxi is a much less frustrating, much more practical, and much smaller solution to offline wiki-browsing than HTTrack.
However
(there is always a but, isn't there?)
The xml dump seems to be for last version's wiki.
Which, for most purposes, is fine - but I would like to be operating on intel that I'm sure is as close to accurate as I can get.
So I guess my question here is thus: Who do I whine at incessantly to get an updated dump?
-
Are you sure? The dump is autogenerated daily and the one I just downloaded has all the main namespaces included
-
I'll redump, but at the time of that posting the dump I received was for .31.25, not .34.xx
EDIT:
Not only does my new dump STILL seem to be .31.25 (as a search for 'minecart' or 'wheelbarrow' yields nothing), but I also have to fix all my redirects again. Thanks, Locriani.
-
There appears to be a problem with the process that moves completed dumps over to the download location. I manually initiated the process, but I will have to find a better solution later this week when I have a chance.
I don't really appreciate the snark - the wiki has and will be a free resource that I devote significant amounts of my own time and money to maintain. I understand it's frustrating, but you could have explained what had occurred without pissing me off in the process.
-
To be clear - the dump link is current again for today. Automatic updates are on hold until I figure out why the update task is not atomically moving the files in question, but should be fixed when I have time and Emily's assistance in figuring out what borked.
-
*sigh* my apologies for any snarkiness.
-
It's ok. For what it's worth, I am sorry you are having issues at all with this. Also, I may attempt to include your find-replace suggestion as an automagic processing step on the dump here as soon as I can figure out what the file mv issue is >.>
-
You really are doing good work. I've just been trying to get a lot of things working, both DF-related and not, and have been hitting my head against a lot of walls. It comes out unintentionally sometimes :/
-
I was looking for an offline version and found this thread.
And I got a simple idea:
A copy of the DF wiki database running on your own local wiki with XAMPP.
Ideally you'd just need to download a *.7zip and run a *.bat, and you can access the wiki on your local machine.
I've got everything set up; I just need the database.
So is it possible to get a MySQL dump from the wiki?
-
To be clear - the dump link is current again for today. Automatic updates are on hold until I figure out why the update task is not atomically moving the files in question, but should be fixed when I have time and Emily's assistance in figuring out what borked.
Thank you so much for updating the dump! :D
-
I was looking for an offline version and found this thread.
And I got a simple idea:
A copy of the DF wiki database running on your own local wiki with XAMPP.
Ideally you'd just need to download a *.7zip and run a *.bat, and you can access the wiki on your local machine.
I've got everything set up; I just need the database.
So is it possible to get a MySQL dump from the wiki?
No, for several reasons.
1. We use Postgres.
2. There would have to be an extensive data sanitization step (email addresses, deleted pages, etc. are all stored in cleartext).
3. We use a number of custom mods, so setting up a MediaWiki clone is not trivial.
4. The size of the database dump is HUGE. The XML dump is small because it only nabs current revisions and skips some namespaces. The database dump for backups runs approximately 20 GB of working space.
5. The current backup system uses PITR backups, so the data is literally useless without the exact same machine architecture and compiled Postgres version, with all the same settings.
6. The number and size of images stored is excessive. I wouldn't be able to prune down to just the images currently on a page, which is already about 1.5 GB of images. We currently have something like 8 GB of images stored.
-
Any way to read a dump on an iPad?
-
Is it possible to get a dump of just a subset of the wiki?
Basically I'd like to get JUST the DF2012 reference pages+media without Talk pages, etc.
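As far as I know there's no official subset dump, but one possible workaround is to filter the full XML dump yourself. Below is a rough, untested sketch: it assumes the dump uses the standard MediaWiki export schema (check the root tag of the dump for the exact schema version) and that everything you want lives in the DF2012: namespace, it only pulls page text rather than media, and the output may still need a proper <siteinfo> header before other tools will accept it.

# Untested sketch: keep only DF2012:* pages from the full XML dump.
# The schema version in the namespace URI is an assumption; check the dump's root tag.
import bz2
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.6/}"

with bz2.open("dump.xml.bz2", "rb") as src, open("df2012_only.xml", "wb") as out:
    # A real import target would also want <siteinfo> and the root element's
    # attributes; this bare wrapper is just enough to keep the file well-formed.
    out.write(b"<mediawiki>\n")
    for _, elem in ET.iterparse(src, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title", default="")
            if title.startswith("DF2012:"):
                out.write(ET.tostring(elem))
            elem.clear()  # free memory as we go; the dump is big
    out.write(b"</mediawiki>\n")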
-
I downloaded the link you put on the previous page, but when I import it into WikiTaxi almost everything seems to be missing. Am I doing something wrong? dfwiki.taxi is 20612 KB and dump.xml.bz2 is 11155 KB. Is that right or am I missing some stuff? Or does it just not work with WikiTaxi?
edit: I can get individual pages fine, but almost every link is broken and I have to search it manually for each page.
-
The broken links are most likely a result of a couple extensions we use on the wiki, which WikiTaxi doesn't know how to handle, namely:
* Links in versioned namespaces link to pages in that namespace - e.g. a link to "iron" on any v0.34 page links to "v0.34:iron"
* Pages in the main namespace that don't exist, e.g. iron (http://dwarffortresswiki.org/index.php?title=Iron&redirect=no), are automatically redirected to the current version page
I'm assuming WikiTaxi, being unaware of the first change, tries to link to "iron" instead of "v0.34:iron", for example, and fails because the former doesn't exist. I've never used WikiTaxi, but this should be fairly simple to fix if it's a browser-like application that supports JavaScript. If not, there was someone else who posted a Python-based implementation of the wiki, which we can try to adapt to use the XML dump and handle links correctly.
Edit: link (http://www.bay12forums.com/smf/index.php?topic=125494.0) (it's a couple months out of date, but I'll see if I can make it use the XML dump when I get a chance.)
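As a crude offline workaround for the first point, one could rewrite the bare links inside versioned pages so they point at versioned titles before importing the dump into WikiTaxi. The sketch below is naive and untested: it's a plain regex pass over the export XML, the "v0.34" prefix and the filenames are assumptions, and it won't catch links generated by templates or extensions.

# Naive, untested sketch: inside v0.34: pages, turn [[iron]] into [[v0.34:iron]]
# so WikiTaxi's links resolve. Links that already carry a namespace
# (File:, v0.34:, etc.) are left alone.
import bz2
import re

VERSION = "v0.34"
# "[[target" where the target has no namespace prefix of its own.
BARE_LINK = re.compile(r"\[\[(?![A-Za-z0-9 ._-]+:)([^\]|#\n]+)")

def fix_page(match):
    page = match.group(0)
    title = re.search(r"<title>(.*?)</title>", page)
    if title is None or not title.group(1).startswith(VERSION + ":"):
        return page  # only touch pages that live in the versioned namespace
    return BARE_LINK.sub(lambda m: "[[" + VERSION + ":" + m.group(1), page)

with bz2.open("dump.xml.bz2", "rt", encoding="utf-8") as f:
    dump = f.read()

# This is slow on a big dump, but it's a one-off preprocessing step.
dump = re.sub(r"<page>.*?</page>", fix_page, dump, flags=re.S)

with bz2.open("dump_relinked.xml.bz2", "wt", encoding="utf-8") as f:
    f.write(dump)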
-
The hyperlinks in that one don't work for me either in chrome or firefox.
-
Really? They work for me. It's a lot larger than the XML dump, however, so I'd like to find a better way (I thought someone had made an offline wiki viewer written in Python, but I can't seem to find it).
Edit: here (http://sebsauvage.net/wiki/doku.php?id=df_portable_wiki).
-
file:///C:/Users/myusername/Desktop/dfwiki/df_wiki_v01_DF2012/df_wiki_v01_DF2012/articles/a/b/o/Dwarf_Fortress:About.html
does not exist, but
file:///C:/Users/myusername/Desktop/dfwiki/df_wiki_v01_DF2012/df_wiki_v01_DF2012/articles/a/b/o/Dwarf_Fortress_About.html
does.
-
Huh, it works for me with the colon. Might be a Windows-specific issue. I've had better luck with the Python-based one (it can use the most recent dump), although it doesn't support a lot of templates since it parses the wikitext itself. HTML dumps are probably more reliable, since the wiki uses a lot of unique extensions (AutoRedirect, DFRawFunctions, etc.) that confuse offline wiki programs, but they're harder to keep up-to-date due to their size and generation time.
-
We tried HTML dumps at one point; they took something like 80 hours to generate for the whole wiki. We can't afford to have a server constantly spinning those dumps, so we killed it in favor of the XML dump.
-
Oh, I completely understand. XML dumps are a better way of doing things. I just can't get it to work. The python version works well enough for my purposes. Thanks for your help.
-
I tried xowa today and it looks really good.
Though there seem to be some problems parsing templates.
I guess the dump is missing something?
--------------
Edit:
Never mind, I see the wiki is using some extensions, so that makes sense.
-
Has anyone found/used an offline wiki reader on the iPad that works with the data from the DF wiki? I've been trying for ages to find a way to read the wiki from my iPad while on planes, traveling, etc.
-
Ramblurr came up with a way to generate HTML dumps, so we're working on getting that set up. I'm not sure if it'll work on mobile devices at this point, but it's possible.
-
So I tried to download the dump and use it with WikiTaxi, but the links/search engine are broken (as reported before) and the dump version seems to be from DF 0.34.x.
I looked at this topic: http://www.bay12forums.com/smf/index.php?topic=125494.0 , but the last post is from 2013...
Any news? Is there a way to get the DF wiki (I want the one for DF 0.43.x) for offline reading with a doable size (up to 5 GB in my case)?
What about http://www.httrack.com/?
-
Really? They work for me. It's a lot larger than the XML dump, however, so I'd like to find a better way (I thought someone had made an offline wiki viewer written in Python, but I can't seem to find it).
Edit: here (http://sebsauvage.net/wiki/doku.php?id=df_portable_wiki).
Thanks for the link. WikiTaxi does not work on Macintosh computers, but this does.
However, it is a bit broken. Are there any other options for me to open the XML dump? Or a way to convert it to a .zim file?