I was thinking it would be best to keep the tile size the same, because walls and other items need to be continuous. We could take the outer pixels from the creature tile, but I have the impression that some artists use the entire tile. As an alternative, we could use only the corners of the tile: 4 corners x 4 channels (RGBA) x 256 values per channel (16 bytes in total) gives us plenty of room to encode information. What do you think?
I believe it is a bad idea because it becomes impossible to use those corners for actual graphics (how would you know what is supposed to be there?) - which is going to cause issues with some of the examples you've already shown (e.g. reindeer in Spacefox).
Even if you cheated and said you only reserved the alpha channel for this information... that's still going to make the graphics visibly different, which is going to make it difficult to properly imagine how it's going to look.
I imagine that very few people will want to manually add the metadata we're talking about - they'd much rather draw the graphics, generate some raws, modify those, then run the process in reverse. They have the *option*, sure, since it's all just pixels with defined colors, but it's a bit abstract and not something I imagine most people are up for (except perhaps if they're just modifying an existing sprite for a one-off, in which case they can copy the border as well).
For the case where things have to join up... those also have to be able to join up with *themselves*, and you can't really check that anyway in a grid format like this.
The idea is that if you have, say, a set of 16x16-pixel tiles, the generated image from all of this uses a grid of 17x17* cells - 1 pixel in each dimension is used for the border, and the remaining 16 pixels are the actual tile graphics.
If the 16x16 tile graphics are completely blank - i.e., all pixels transparent - then we ignore that tile, because it's not specified. Otherwise we cut out that 16x16 tile, and once all the tiles are cut out we assemble them into whatever image layout the game wants.
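To make the cutting step concrete, here's a rough sketch in plain Python. It assumes the image is already loaded as a list of rows of (r, g, b, a) tuples, and that the 1-pixel border sits on the top and left edge of each 17x17 cell; both of those are my assumptions for illustration, and the names (`cut_tile`, `is_blank`, `extract_tiles`) are made up.

```python
TILE = 16          # size of the actual tile graphics, in pixels
CELL = TILE + 1    # grid pitch: 1 border pixel + 16 tile pixels

def cut_tile(image, col, row):
    """Return the 16x16 tile at grid position (col, row), without its border."""
    # Assumption: the border occupies the first row and first column of each cell.
    x0 = col * CELL + 1
    y0 = row * CELL + 1
    return [image[y][x0:x0 + TILE] for y in range(y0, y0 + TILE)]

def is_blank(tile):
    """A tile counts as 'not specified' when every pixel is fully transparent."""
    return all(px[3] == 0 for line in tile for px in line)

def extract_tiles(image, cols, rows):
    """Collect every specified tile, keyed by its (col, row) grid position."""
    tiles = {}
    for row in range(rows):
        for col in range(cols):
            tile = cut_tile(image, col, row)
            if not is_blank(tile):
                tiles[(col, row)] = tile
    return tiles
```

The resulting dictionary can then be laid out in whatever arrangement the game expects; that part depends on the target format, so it's left out here.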
The border is useful because it makes the separation between each tile more clear -
More clear for whom? It's trivial for a computer to just count the 16 pixels, it doesn't need to 'see'. And eventually the whole process from start to finish would be automated, so nobody else would need to distinguish the tiles either.
For a person who is using the grid to figure out what they need to create. It's just as trivial for the computer to count 17 (or 18) pixels, but it's much easier for a human to make sure they're "staying within the lines" if they have lines to follow for those cases where you're doing one-off edits; otherwise it might be difficult to see you're accidentally going one pixel into another tile (which just happened to be a transparent area, so you couldn't tell easily).
It's also a tiny bit easier for the computer when it needs to figure out whether to skip a given tile (because you're looking for a tile where all pixels are transparent, as opposed to a tile where some are and some aren't).
metadata embedding
I don't see the reasoning behind that. Essentially, it seems that between the first input and the final output, you suggest storing all the information in the image - I'm wondering why not store all the information in a plain text database instead? That seems easier, more compatible and more transparent.
Taking the existing tilesets, separating them, adding a border with embedded data, then taking that and removing it again to use in DF seems a bit roundabout. Why not transfer the metadata directly in the final definition file?
EDIT: Just realized that the "datasets" I'm talking about are the creature definition files.
You can certainly use a secondary file for the metadata if you'd prefer, and it'd probably be more efficient since you'd potentially be able to avoid stitching the image together - but it's my impression that for graphics sets, the amount of customization needed is going to be quite limited post-standardization anyway. You have to analyze the image anyway to tell whether or not a given tile is present (and therefore, whether it should be pointed to in the generated raws)... so it's not much extra effort to analyze a couple extra pixels to get the metadata you need at the same time.
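As a sketch of what "a couple extra pixels" could look like: the function below checks whether a tile is present and reads one border pixel in the same pass. The encoding is purely hypothetical (I'm assuming the top-left corner of each 17x17 cell carries four byte-sized fields in its R, G, B and A channels), and the image is again a plain list of rows of (r, g, b, a) tuples.

```python
CELL = 17  # 1 border pixel + 16 tile pixels per grid cell

def read_cell(image, col, row):
    """Return (present, meta) for the grid cell at (col, row).

    present: True if any pixel of the 16x16 tile area is not fully transparent.
    meta:    the raw (r, g, b, a) value of the cell's top-left border pixel,
             under the hypothetical encoding described above.
    """
    x0, y0 = col * CELL, row * CELL
    meta = image[y0][x0]
    tile = [image[y][x0 + 1:x0 + CELL] for y in range(y0 + 1, y0 + CELL)]
    present = any(px[3] != 0 for line in tile for px in line)
    return present, meta
```

What the four values actually mean would still need to be agreed on; the point is only that the lookup costs one extra pixel read per cell.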
You could also just choose to leave all metadata in the RAWs and only use this to unify the positions for a given creature. Depends what you want to be able to do, of course... but if that's your entire goal, then it seems overkill to add all of the grid labels.