Bay 12 Games Forum

Finally... => Creative Projects => Topic started by: Parisbre56 on October 10, 2013, 11:45:43 am

Title: Bay 12 Thread Downloader and Filter: New version: 1/4/2014
Post by: Parisbre56 on October 10, 2013, 11:45:43 am
Here's a Java program that transcribes any thread of this forum into a single file for offline reading! It processes about 260 pages a minute. Time might vary depending on connection speed and site traffic, as well as your preferences (check the menu).

The program can also be ordered to only keep the posts of certain users, great for reading only the GM's posts or finding the Toad's posts in the Future of the Fortress thread.
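The user filter described above boils down to keeping a post only if its author is on a list. Below is a minimal sketch in Java, the program's own language. It assumes a hypothetical `Post` record with `author` and `body` fields; the real program's classes aren't shown in this thread. Exact, case-sensitive name matching is also an assumption.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class UserFilter {
    // Hypothetical minimal post model; not the program's actual class.
    record Post(String author, String body) {}

    // Keep only posts whose author is in the allowed set (exact, case-sensitive match).
    // Stream order is preserved, so the thread's original order survives.
    static List<Post> keepAuthors(List<Post> posts, Set<String> allowed) {
        return posts.stream()
                .filter(p -> allowed.contains(p.author()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Post> posts = List.of(
                new Post("TheGM", "Turn 1"),
                new Post("player", "PTW"),
                new Post("TheGM", "Turn 2"));
        List<Post> kept = keepAuthors(posts, Set.of("TheGM"));
        System.out.println(kept.size()); // prints 2
    }
}
```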

The program can combine multiple threads into one, arranging the posts chronologically.
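Each downloaded thread is already in posting order, so combining threads chronologically can be done as a k-way merge. The sketch below assumes a hypothetical `Post` record whose timestamp has already been parsed into a `LocalDateTime`. It shows one standard technique, not necessarily the one the program uses.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ThreadMerger {
    // Hypothetical post model: author plus an already-parsed timestamp.
    record Post(String author, LocalDateTime time) {}

    // Position inside one thread's (chronologically ordered) post list.
    record Cursor(List<Post> posts, int index) {
        Post current() { return posts.get(index); }
    }

    // k-way merge: repeatedly take the earliest head post among all threads.
    static List<Post> merge(List<List<Post>> threads) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>(Comparator.comparing((Cursor c) -> c.current().time()));
        for (List<Post> t : threads) {
            if (!t.isEmpty()) heap.add(new Cursor(t, 0));
        }
        List<Post> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            out.add(c.current());
            if (c.index() + 1 < c.posts().size()) {
                heap.add(new Cursor(c.posts(), c.index() + 1)); // advance within that thread
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Post> a = List.of(new Post("A", LocalDateTime.of(2014, 3, 1, 10, 0)),
                               new Post("A", LocalDateTime.of(2014, 3, 3, 10, 0)));
        List<Post> b = List.of(new Post("B", LocalDateTime.of(2014, 3, 2, 10, 0)));
        for (Post p : merge(List.of(a, b))) {
            System.out.println(p.author() + " " + p.time()); // A, then B, then A
        }
    }
}
```

Concatenating everything and sorting with `Comparator.comparing(Post::time)` gives the same order, except possibly for posts with identical timestamps. The merge just avoids re-sorting data that is already ordered.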

If the output file ends in .html, the program will split the output into multiple pages based on user preferences (check the menu).
If the output file ends in .txt, the program will keep the output as a single plain text file, great for reading in an e-book or for processing with regular expressions or programs like grep.
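Choosing the output mode from the file extension might look like the sketch below. The `Kind` enum, the case-insensitive match and the rejection of any other extension are all assumptions of this sketch.

```java
import java.util.Locale;

public class OutputKind {
    enum Kind { HTML, PLAIN_TEXT }

    // ".html" -> split into HTML pages, ".txt" -> one plain-text file.
    // Case-insensitive matching and throwing on anything else are assumptions.
    static Kind fromFileName(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".html")) return Kind.HTML;
        if (lower.endsWith(".txt")) return Kind.PLAIN_TEXT;
        throw new IllegalArgumentException("Unsupported output file: " + name);
    }

    public static void main(String[] args) {
        System.out.println(fromFileName("thread.html")); // HTML
        System.out.println(fromFileName("thread.TXT"));  // PLAIN_TEXT
    }
}
```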

Here's the program: B12_PostProcessor.jar (http://goo.gl/x7obSt) (10 MB)
Should work on most common OSs (Windows, Linux, Mac) and architectures (x86, x86_64) with an up to date Java VM installed.

Spoiler: To do list

Here (http://goo.gl/gZLMEK)'s the source code if anyone wants to mess with it. (30 MB, Eclipse project)

Changelog:
1/4/2014: New version can combine multiple files into one based on time. Unfortunately, it can only process times in the %a %d-%m-%Y, %H:%M:%S format, so make sure to log in and change your date format to that if you want to use it, until I get around to fixing it.
28/3/2014: Ability to split output to multiple files. New options menu. Various bugfixes.
2/11/2013: New version can save output as a "lightweight" .txt file.
1/11/2013: Fixed an out-of-memory error that occurred in 32-bit Windows JVMs when processing more than 1069 pages.
23/10/2013: New version can login and can download images and forum theme images. It also utilizes multiple downloader threads to reduce download time. Finally, it has a better GUI.
11/10/2013: New version should work on most common OSs (Windows, Linux, Mac) and architectures (x86, x86_64) with an up-to-date Java VM installed.
10/10/2013: Made the program create a window that acts as a terminal. This means that you can now just double click the file instead of having to launch it from the terminal. Bad thing is, it only works on 64-bit Linux now. I'll fix that tomorrow. Should be an easy fix (famous last words).
Title: Re: Bay 12 Thread Filter Thingy
Post by: Xantalos on October 10, 2013, 12:17:10 pm
Hells yeahs PTW.
Title: Re: Bay 12 Thread Filter Thingy
Post by: miauw62 on October 10, 2013, 12:46:04 pm
Does this work for all Simple Machines-powered forums? (Or could it be edited to do so?)

Either way, this is fucking awesome.
Title: Re: Bay 12 Thread Filter Thingy
Post by: Lectorog on October 10, 2013, 12:51:26 pm
Man oh man. This has so much impractical use. PTW. I'll actually be using it if optional theme formatting and image downloading are implemented.
Title: Re: Bay 12 Thread Filter Thingy
Post by: Parisbre56 on October 10, 2013, 04:06:53 pm
Quote from: miauw62
Does this work for all Simple Machines-powered forums? (Or could it be edited to do so?)
Theoretically, with a bit of modifying, yeah, but I haven't tried it yet.

Changed the first post.

EDIT: A new version is up!

EDIT2: Updated with some speed ups and bug fixing.
Title: Re: Bay 12 Thread Downloader and Filter
Post by: kisame12794 on October 10, 2013, 07:55:06 pm
You glorious bastard. PTW.
Title: Re: Bay 12 Thread Downloader and Filter
Post by: Parisbre56 on October 11, 2013, 07:19:39 am
New version is up. It should now work on most common OSs (Windows, Linux, Mac) and architectures (x86, x86_64) with an up-to-date Java VM installed.
Tested it on 64-bit Windows and Linux. Tell me if it doesn't work on your system and I'll see what I can do.

It took some messing around with Ant and the help of these two pages:
http://mchr3k.github.io/swtjar/
http://timeme.eclipselabs.org.codespot.com/git.wiki/SWT.wiki
Check them out if you're planning on creating a cross-platform Java app that uses SWT to generate its interface. Or just copy my source code and rewrite the Main class to suit your needs.
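The general idea behind the swtjar-style approach linked above is to ship one SWT jar per platform and pick the right one at startup from the JVM's `os.name` and `os.arch` system properties. The sketch below covers only that selection step. The `swt-<os>-<arch>.jar` naming is hypothetical (swtjar and the wiki page use their own layout), and only the x86/x86_64 targets mentioned in the thread are handled.

```java
import java.util.Locale;

public class SwtJarPicker {
    // Map os.name / os.arch to a jar name. The "swt-<os>-<arch>.jar" scheme is hypothetical.
    static String jarNameFor(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String osTag;
        if (os.contains("mac")) osTag = "osx";          // checked first: "darwin" contains "win"
        else if (os.contains("win")) osTag = "win32";
        else if (os.contains("linux")) osTag = "linux";
        else throw new IllegalStateException("Unsupported OS: " + osName);

        // os.arch reflects the JVM's bitness, not the OS's: a 32-bit JVM on 64-bit
        // Windows reports "x86" and needs the 32-bit SWT jar. Only x86/x86_64 handled.
        String archTag = osArch.contains("64") ? "x86_64" : "x86";
        return "swt-" + osTag + "-" + archTag + ".jar";
    }

    public static void main(String[] args) {
        System.out.println(jarNameFor(System.getProperty("os.name"), System.getProperty("os.arch")));
    }
}
```

A launcher would then add the chosen jar to a class loader and invoke the real main class; that part is left out here.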

Now to start rewriting the program to make it "smarter".
Title: Re: Bay 12 Thread Downloader and Filter
Post by: Person on October 11, 2013, 04:37:18 pm
I think I'll be having some forum games to read through once this thing does images.
Title: Re: Bay 12 Thread Downloader and Filter
Post by: Elephant Parade on October 13, 2013, 07:17:38 pm
This does look interesting.

Edit: Now to use this to read the Homestuck thread. Except not, because that would take far too long. I like how it preserves the formatting.
Title: Re: Bay 12 Thread Downloader and Filter
Post by: Parisbre56 on October 14, 2013, 12:16:39 pm
Version 2 is coming along nicely. At this rate, I should be ready for a release near the end of the week or sometime in the next week. Most of my work so far has gone to designing a simple GUI and a login system. Next up is upgrading the downloader itself to download images, downloading through multiple connections to increase download speed and making it smarter so that it can filter messages according to their content.

Speaking of filters, as of right now I'm working on two: one that removes out-of-character content by removing any text enclosed in (( )), and another that checks for types of text like italics, bold, underlined, coloured, etc. Any ideas for other filters you'd like to have?
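The out-of-character filter described above can be sketched with a single non-greedy regular expression. Assumptions: the post text is already plain text (no HTML), OOC asides aren't nested, and an unclosed `((` is left alone.

```java
import java.util.regex.Pattern;

public class OocFilter {
    // "((", then the shortest run of characters up to the next "))".
    // DOTALL lets an aside span several lines; nested (( (( )) )) is not handled.
    private static final Pattern OOC = Pattern.compile("\\(\\(.*?\\)\\)", Pattern.DOTALL);

    static String stripOoc(String text) {
        return OOC.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripOoc("I draw my sword. ((brb, dinner)) Then I charge!"));
        // prints "I draw my sword.  Then I charge!" (the surrounding spaces are kept)
    }
}
```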
Title: Re: Bay 12 Thread Downloader and Filter
Post by: sjm9876 on October 14, 2013, 12:19:44 pm
This looks very interesting. PTW
Title: Re: Bay 12 Thread Downloader and Filter
Post by: miauw62 on October 14, 2013, 12:20:32 pm
Removal of all the shit in between posts. Just a name and a link to the profile, maybe an avatar instead of a name, an image, contact details, the topic reply name, the date of the reply, the signature, the little "Report to moderator" and "IP logged" thingie in the bottom left, etc.
Title: Re: Bay 12 Thread Downloader and Filter
Post by: Armok on October 15, 2013, 01:04:29 am
This is great

If we're making wishlists, what I could really use is for it to strip away all the formatting and signatures and junk and output a plain text file viewable on a Kindle. Just name, post date, main text of post, repeat.
Title: Re: Bay 12 Thread Downloader and Filter
Post by: LordSlowpoke on October 15, 2013, 05:39:30 am
PTW
Title: Re: Bay 12 Thread Downloader and Filter
Post by: Parisbre56 on October 21, 2013, 04:36:01 pm
Bug squashing took more time than expected (mostly due to personal issues and my lack of experience with SWT), but lo and behold:
A speed of about 260 pages per minute (about 3800 posts per minute at 15 posts per page; speed should increase with higher posts-per-page settings) with 6 threads downloading and processing data. I'll probably add an option to increase the number of threads.
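Spreading the work over several downloader threads can be sketched with a fixed thread pool from `java.util.concurrent`. `fetchPage` is a hypothetical stub standing in for the real download-and-process step, and the 6 threads simply mirror the number in the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelDownloader {
    // Hypothetical stand-in for "download and process one forum page".
    // A real version would make an HTTP request here.
    static String fetchPage(int pageIndex) {
        return "page " + pageIndex;
    }

    // Process pages 0..pageCount-1 on a fixed pool of worker threads and return the
    // results in page order, regardless of which thread finished first.
    static List<String> downloadAll(int pageCount, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < pageCount; i++) {
                final int page = i;
                futures.add(pool.submit(() -> fetchPage(page)));
            }
            List<String> pages = new ArrayList<>();
            for (Future<String> f : futures) {
                pages.add(f.get()); // blocks until that particular page is done
            }
            return pages;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> pages = downloadAll(20, 6);
        System.out.println(pages.get(0) + " ... " + pages.get(19)); // page 0 ... page 19
    }
}
```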
Now all that's left is the ability to download images and it'll be ready for the next release.
Title: Re: Bay 12 Thread Downloader and Filter: New version: 23/10/2013
Post by: Parisbre56 on October 23, 2013, 11:42:46 am
New version is up. It downloads images (both those posted by users and the images the forum theme uses), it can login, it's about 5 times faster than the old one and it has a better GUI.

Please tell me if it doesn't work for you or if you have any questions/suggestions.

EDIT: Turns out it doesn't work in Windows again. Probably screwed something up in the file names. I'll fix it tomorrow.
Title: Re: Bay 12 Thread Downloader and Filter: New version: 23/10/2013
Post by: Parisbre56 on October 28, 2013, 07:36:56 pm
Found the source of my Windows problems. Turns out Java, for some strange reason, has decided that the 32-bit version of the JVM must have a default maximum heap size of 250MB. And guess which version of the JVM gets automatically installed on any Windows machine, both 32- and 64-bit? Yep, you guessed it, the 32-bit one.

This means that as soon as the program starts using more than 250 MB of memory, it crashes, unless you've downloaded and installed the 64-bit version for some reason. In practice, you're fine as long as the thread has fewer than 1069 pages (in 15 posts per page mode).

Java gives you no way to alter the heap size from inside the program itself. I could tell you "You can use the command line to increase your heap size", but that reduces the ease of use, which was the whole reason for making a GUI. I could create a different executable for each OS, but that defeats the purpose of using Java to make this cross-platform. So it seems like the only thing I can do now is rewrite the program to download the data into temp files.
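A program can at least read its own heap limit through `Runtime.maxMemory()`. The `relaunchCommand` helper below illustrates a workaround some Java apps use instead of temp files: starting a second JVM with a larger `-Xmx`. It is a hypothetical alternative, not what this program does, and it isn't run in `main`. A 32-bit JVM can't be given an arbitrarily large heap anyway.

```java
import java.io.File;
import java.util.List;

public class HeapCheck {
    // Maximum heap the JVM will attempt to use, in MB
    // (maxMemory() returns Long.MAX_VALUE if there is no inherent limit).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Hypothetical alternative to temp files: the command line for a fresh JVM
    // running mainClass with a larger heap, e.g. maxHeap = "1g".
    static List<String> relaunchCommand(String mainClass, String maxHeap) {
        String javaBin = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        return List.of(javaBin, "-Xmx" + maxHeap,
                "-cp", System.getProperty("java.class.path"), mainClass);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
        // HotSpot-specific property ("32" or "64"); other JVMs may not set it.
        System.out.println("JVM data model: " + System.getProperty("sun.arch.data.model", "unknown"));
        System.out.println(relaunchCommand("Main", "1g"));
    }
}
```

A launcher built this way would pass the list to `new ProcessBuilder(cmd).inheritIO().start()` and then exit.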

Yay, problems caused by arbitrary limits. At least I am learning stuff.
Title: Re: Bay 12 Thread Downloader and Filter: New version: 1/11/2013
Post by: Parisbre56 on October 31, 2013, 08:29:15 pm
Added the use of temp files to solve the memory problems. It should be working fine now.
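The temp-file fix amounts to keeping only file paths in memory and writing the page data to disk. Below is a minimal sketch using `java.nio.file.Files`. The class name, the `b12page-` prefix and the one-file-per-page layout are assumptions, not details from the thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class TempPageStore {
    // Only the paths stay in memory; the page contents live on disk.
    private final List<Path> files = new ArrayList<>();

    // Write one downloaded page to its own temp file.
    void store(String pageHtml) throws IOException {
        Path p = Files.createTempFile("b12page-", ".html");
        p.toFile().deleteOnExit(); // best-effort cleanup if deleteAll() is never called
        Files.writeString(p, pageHtml, StandardCharsets.UTF_8);
        files.add(p);
    }

    // Read a page back when the final output file is assembled.
    String load(int index) throws IOException {
        return Files.readString(files.get(index), StandardCharsets.UTF_8);
    }

    int size() { return files.size(); }

    void deleteAll() throws IOException {
        for (Path p : files) Files.deleteIfExists(p);
        files.clear();
    }

    public static void main(String[] args) throws IOException {
        TempPageStore store = new TempPageStore();
        store.store("<html>page 1</html>");
        store.store("<html>page 2</html>");
        System.out.println(store.load(1)); // <html>page 2</html>
        store.deleteAll();
    }
}
```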
Title: Re: Bay 12 Thread Downloader and Filter: New version: 1/11/2013
Post by: Xantalos on October 31, 2013, 08:43:57 pm
Eeeexcellent. Time to read through ER.
Title: Re: Bay 12 Thread Downloader and Filter: New version: 2/11/2013
Post by: Parisbre56 on November 01, 2013, 03:47:56 pm
New version is up. It can save output as a "lightweight" .txt file, perfect for devices with low memory or e-book readers.
Either click "Browse" and select the "Plain Text" option or change the output file's extension from ".html" to ".txt" before pressing start.

I had some problems with quotes and spoilers, but I think I've solved them now. Tell me if you notice anything wrong.
And tell me if you have any suggestions about the plain text output (more/less info, options you'd like to have, etc).

I've noticed that some output files are just too large, so I've added the ability to separate a file into multiple files to my long term goals.
Title: Re: Bay 12 Thread Downloader and Filter: New version: 2/11/2013
Post by: Person on December 17, 2013, 09:15:06 pm
Are you still working on this?
Title: Re: Bay 12 Thread Downloader and Filter: New version: 2/11/2013
Post by: Parisbre56 on December 18, 2013, 07:35:20 am
Quote from: Person
Are you still working on this?
Sometimes, although not as much as I'd like to. I got some work done on altering the page a bit (removing sigs and avatars and stuff), but I haven't finished it yet; that's why I haven't released an update.

Any specific feature you really want to see? If it's not something too complicated I could get it done in a week or two.
Title: Re: Bay 12 Thread Downloader and Filter: New version: 2/11/2013
Post by: Person on December 18, 2013, 05:05:51 pm
I don't have anything in mind other than what you're already working on, but there might be a few people that would prefer not having the images, most likely for bandwidth reasons. Having a toggle for that in new versions might be nice.

Edit: Thought of something. When downloading larger threads, having to scroll down forever is somewhat of a problem, especially when it comes to remembering where you left off. Having a way to split the threads back into pages would be great if possible, ideally in a customizable way.
Title: Re: Bay 12 Thread Downloader and Filter: New version: 2/11/2013
Post by: Fayrik on December 24, 2013, 10:00:58 pm
I remember reading this thread and pondering how useful this program could possibly be one day...

Quote from: Parisbre56
Here (https://docs.google.com/uc?export=download&id=0B_gVhTtpkIUUdEVBdDQ5bS1lNVE)'s the source code if anyone wants to mess with it. (10 MB, Eclipse project)
And now, I might just need this source code to make a branch program to spider search for rule violations on a forum I moderate.
If I succeed, I'll post up the branch here.
Title: Re: Bay 12 Thread Downloader and Filter: New version: 2/11/2013
Post by: Parisbre56 on December 25, 2013, 01:06:32 pm
Quote from: Person
I don't have anything in mind other than what you're already working on, but there might be a few people that would prefer not having the images, most likely for bandwidth reasons. Having a toggle for that in new versions might be nice.

Edit: Thought of something. When downloading larger threads, having to scroll down forever is somewhat of a problem, especially when it comes to remembering where you left off. Having a way to split the threads back into pages would be great if possible, ideally in a customizable way.
Yeah, the first one is easy, just changing one line of code into another. I'll add a button to do that in the next version.
The second one, I've been thinking about this myself. It's next on my list. I'll get working on it after the holidays.

Quote from: Fayrik
I remember reading this thread and pondering how useful this program could possibly be one day...

Here (https://docs.google.com/uc?export=download&id=0B_gVhTtpkIUUdEVBdDQ5bS1lNVE)'s the source code if anyone wants to mess with it. (10 MB, Eclipse project)
And now, I might just need this source code to make a branch program to spider search for rule violations on a forum I moderate.
If I succeed, I'll post up the branch here.
Just want to warn you that at best, what I have here could be used as an inspiration, if that. I don't think it will be much help to you in its current state. I've written this with the philosophy of "get it working and get it out there as soon as possible", not "make the code easy to understand and extend".

If you have access to the forum's database, I suggest using that to do your search. If it uses something like SQL, it should be much, much easier.
I can also give you a list of all of the tools and packages I used, if you're interested. If I'd had those tools from the start, I might have done a better job. I'll probably rewrite it completely if I find some time.
Title: Re: Bay 12 Thread Downloader and Filter: New version: 28/3/2014
Post by: Parisbre56 on March 28, 2014, 05:56:31 pm
New version can split the output into multiple files.

Simply click the "Stuff" menu and then the "Other Options" menu item. The posts per page is the option you want.
This option only works for HTML output; plain text output is still one huge file. I should probably add an option to enable or disable it.

You can also mess with the other options there if you want, although the checkboxes don't do anything yet.
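Splitting the posts into pages of N comes down to a ceiling division plus sub-lists. The names in this sketch are hypothetical. Rejecting a non-positive page size up front is one simple way to rule out divide-by-zero errors, though the thread doesn't say where the program's own one came from.

```java
import java.util.ArrayList;
import java.util.List;

public class PageSplitter {
    // Number of output pages: ceiling of totalPosts / postsPerPage.
    // Rejecting postsPerPage <= 0 up front avoids dividing by zero.
    static int pageCount(int totalPosts, int postsPerPage) {
        if (postsPerPage <= 0) throw new IllegalArgumentException("postsPerPage must be positive");
        return (totalPosts + postsPerPage - 1) / postsPerPage;
    }

    // Split posts into consecutive pages; the last page may be shorter.
    static <T> List<List<T>> split(List<T> posts, int postsPerPage) {
        int pages = pageCount(posts.size(), postsPerPage);
        List<List<T>> out = new ArrayList<>(pages);
        for (int i = 0; i < pages; i++) {
            int from = i * postsPerPage;
            int to = Math.min(from + postsPerPage, posts.size());
            out.add(posts.subList(from, to));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> posts = new ArrayList<>();
        for (int i = 0; i < 120; i++) posts.add(i);
        for (List<Integer> page : split(posts, 50)) {
            System.out.println(page.size()); // 50, then 50, then 20
        }
    }
}
```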


Stopped the program from failing when downloading image links with no file type (like this one: https://i.chzbgr.com/maxW500/7047413504/h45184DA5/).
Fixed a divide-by-zero error that happened in some rare cases.
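Some image links, like the one above, have no extension in their last path segment. One way to handle them is to fall back to a default extension, as in the sketch below. The fallback value is the caller's choice, and URLs that `URI.create` can't parse (e.g. ones containing spaces) would throw. Checking the HTTP `Content-Type` header would be a more accurate alternative, not shown here.

```java
import java.net.URI;

public class ImageNames {
    // Extension of the URL path's last segment, or `fallback` if it has none.
    // The query string is ignored because getPath() excludes it.
    static String extensionOf(String url, String fallback) {
        String path = URI.create(url).getPath();
        if (path == null) return fallback;
        // Drop trailing slashes so ".../name/" looks at "name".
        while (path.endsWith("/")) path = path.substring(0, path.length() - 1);
        String last = path.substring(path.lastIndexOf('/') + 1);
        int dot = last.lastIndexOf('.');
        if (dot <= 0 || dot == last.length() - 1) return fallback;
        return last.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("https://i.chzbgr.com/maxW500/7047413504/h45184DA5/", "img")); // img
        System.out.println(extensionOf("http://example.com/pics/cat.png?size=2", "img"));            // png
    }
}
```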



Title: Re: Bay 12 Thread Downloader and Filter: New version: 1/4/2014
Post by: Parisbre56 on March 31, 2014, 07:31:37 pm
New version can combine multiple files into one and order the posts based on time.
Unfortunately, it can only process times in the %a %d-%m-%Y, %H:%M:%S format.

So make sure to log in and change your date format to that if you want to use it until I get around to fixing it.
The time format setting is in Profile -> Look and Layout.
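The strftime-style `%a %d-%m-%Y, %H:%M:%S` format corresponds to the `java.time` pattern below. This sketch parses a single timestamp. `Locale.ENGLISH` is an assumption, so that weekday abbreviations like "Mon" parse on any system locale. Other forum date formats would need their own patterns.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class SmfDates {
    // "%a %d-%m-%Y, %H:%M:%S" expressed as a java.time pattern.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("EEE dd-MM-yyyy, HH:mm:ss", Locale.ENGLISH);

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(text, FORMAT);
    }

    public static void main(String[] args) {
        LocalDateTime t = parse("Mon 31-03-2014, 19:31:37");
        System.out.println(t); // 2014-03-31T19:31:37
    }
}
```

The weekday is cross-checked against the date, so a string like "Tue 31-03-2014, ..." makes `parse` throw a `DateTimeParseException` rather than silently accept it.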

Here (http://goo.gl/cukFqQ)'s the entire ER subforum (http://www.bay12forums.com/smf/index.php?board=30.0), if you wanna see what the output looks like.
It's about 100 MB and exactly 1000 pages long in 50-posts-per-page format.