Monthly Archive: July 2007



beagle & soc07 26 Jul 2007 11:00 pm

Thunderbird backend is out!

So, here it is! I just committed the code and you can grab it with:

svn co http://svn.gnome.org/svn/beagle/branches/beagle-tbird-soc07

With this, beagle should be able to index your data and put it in the right place. There are still some issues, though:

  • Removing data in Thunderbird does not remove data from beagle (yet)
  • There are some encoding issues that I’m working on (you will most likely notice this)
  • beagle-search is not updated with the latest URI handlers, so you won’t be able to open your data (just search it)

Also, you need the mozilla-thunderbird-dev package to build beagle with the Thunderbird backend (the extension needs this). So make sure you install it before compiling. Check out this page for more information about compiling beagle (in case you have never done this before).

I just want to point out that you should not expect too much from this yet, as I need a couple of days to clear things up. But I encourage you to try it out and send feedback, of course :-)

Note: I’m not uploading any more .xpi files as you will find your very own .xpi file in the thunderbird-extension directory once you’ve built beagle.

beagle & soc07 25 Jul 2007 11:42 pm

Backend is nearly complete

I’ve been working quite intensely on the beagle backend the last two days and I have good news: I’ve almost finished it :-) As of today it indexes emails and RSS content (the indexed data is of course searchable). Extra data, i.e. mail body or RSS description, is indexed as well, so it’s possible to query that data too if it’s available.

There are still a few things to fix, but most things are there. I’ve made some updates to the extension as well: I found a few bugs and had to add a few new properties to the metafiles, for instance. Everything will be released tomorrow, as I’m a bit tired right now. But you should be able to search your Thunderbird data as of tomorrow :-)

beagle & soc07 18 Jul 2007 06:12 pm

beagle Thunderbird extension is out!

As promised yesterday: here it is :-) If you intend to try this extension out, keep reading this post, as it contains a lot of important information. How the extension works will be explained to the extent needed for testing.

How it’s done (just the very basics)

For starters, this extension works similarly to the Firefox extension. The extension itself is not aware of beagle and does not communicate with the beagle daemon. Instead, the extension produces small metafiles in a specially selected directory. This directory is monitored by beagle, and files created in it will be parsed and indexed by beagle at some point.

The “destination directory”, to which the extension will write files, is stored inside the beagle Thunderbird index directory and is called ToIndex. Most users will find this directory in ~/.beagle/Indexes/ThunderbirdIndex/ToIndex. When new data is indexed, this is where it will be stored until beagle does its magic.

Keeping track of indexed data

Once data has been indexed, it’s marked as indexed. This makes sure we don’t index the same data set twice and also speeds up the process a lot. Speeding things up is especially important when using an extension, since we can only index data while Thunderbird is running. The marking system allows the extension to start off where it last left off. Once everything has been indexed a first time, there are only immediate updates, which is nice. Another nice thing is that we can index data when beagle isn’t running.

But how does Thunderbird know that beagle has all of its data? It can’t be sure, but the problem can’t be totally ignored either (and it isn’t). I’ve put a small check into the extension that figures out whether the ToIndex directory exists. If it does, the extension assumes that everything is fine and continues with the indexing process normally. If it doesn’t, all data is marked as “not indexed” before indexing starts, and the directory is created as well. This solves two of the bigger issues: what happens when ~/.beagle or ~/.beagle/Indexes/ThunderbirdIndex is removed. Everything will be re-indexed if you decide to remove either of them.
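
To make the check above concrete, here is a minimal sketch in Python (the extension itself is not written in Python, so this is only an illustration). The function name prepare_to_index and the reset_index_marks callback are hypothetical, and creating missing parent directories is an assumption on my part:

```python
from pathlib import Path
from typing import Callable


def prepare_to_index(index_dir: Path, reset_index_marks: Callable[[], None]) -> bool:
    """Make sure ToIndex exists; return True if a full re-index was triggered.

    index_dir is the ThunderbirdIndex directory. reset_index_marks is a
    hypothetical callback that marks all data as "not indexed".
    """
    to_index = index_dir / "ToIndex"
    if to_index.is_dir():
        # Directory is there: assume everything is fine and index normally.
        return False
    # Directory is missing: forget what was indexed, then recreate it.
    reset_index_marks()
    # Assumption: missing parent directories are created as well.
    to_index.mkdir(parents=True, exist_ok=True)
    return True
```

Calling it a second time right after the first returns False, since ToIndex now exists and nothing needs to be reset.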

Now it’s time to add a note about the meaning of “indexed”. It is very important to understand what it means in this context. What happens when something is indexed in Thunderbird is that these small metafiles are produced (as explained above), and they will be processed by beagle when beagle has time to do so. This could be within a few seconds, but also a few minutes or even hours. The normal case will probably be within a few seconds or minutes once the initial indexing phase is over. The initial indexing phase ends once Thunderbird has created metafiles for all your data and beagle has indexed all of them. So expect that it might take some time for things to end up in beagle until this phase has ended.

Note: There’s no Thunderbird backend in beagle yet, so no data will end up in beagle as of this moment. Above is just theory. Creating this backend is the next step and I will begin working on this within the next couple of days.

Using the extension

The extension adds a couple of things to the table once installed. It is automatically enabled and will begin to index your data, and in most cases you don’t have to do anything. But there are a few built-in features worth knowing about, mainly privacy features.

In the right bottom corner you’ll see the famous beagle dog (if the installation was successful). It will indicate if the extension is enabled or disabled and you’ll clearly see what state you are currently in. Just click this icon if you want to enable or disable indexing.

You can right-click any folder and select Never index this folder if you never want that folder to be indexed. No data is removed from beagle when you do this, so you’ll have to manually remove anything already indexed. Just right-click the same folder and select Remove folder from index to do so. Be sure to answer No in the dialog window that pops up, because answering Yes removes the Never index this folder flag as well. This applies in general when removing content from the index. Options similar to these will be available for individual emails and other items once I figure out how to do this (I’m having some problems with this overlay in particular).

Lastly, there’s a small settings dialog that you can use to change various settings. Just go to Tools->Beagle indexing settings to show it. You have three pages: Indexing, Privacy and Status. Here’s a small explanation of each page:

  • Indexing - You can use this page to enable or disable the indexing process. Just check or uncheck the check box. But more importantly: you can change the indexing speed from here. Just change it to whatever suits your needs. Beta testers should play around here a bit (more information at the end of the post).
  • Privacy - In case you want to disable an entire source, e.g. if you don’t want to index any POP3 emails, you can do that from here. You’ll also find some potential rescue options here. In case you want to remove everything from beagle’s index, just press the Drop everything button. Note that the backend will immediately begin the indexing process after you’ve done this, so make sure you disable the indexing process before pressing the button if you don’t want anything to end up in beagle again. The Reset index status button is quite useful if you want to re-index everything without dropping things from beagle’s index.
  • Status - This page displays some information about the indexing process. The number of items added to and/or removed from beagle’s index will appear here, as well as how many items are currently queued up. You’ll also see if the extension is idle or if there are more things to index. It’s a great way to see if the initial indexing process has completed (Indexing status should say Idle).

Known issues

There are currently a few known issues that you guys don’t have to report as bugs:

  • The main loop is currently running all the time, looking for data to index. This isn’t expensive in any way, but it will make Thunderbird wake up a lot and adds to power consumption (laptop users).
  • The pages in the preference dialog might appear mixed up. I don’t know why this is happening, because it shouldn’t. They all show up correctly in the CVS version, so I’ll just hope everything works correctly in the next Thunderbird version.
  • There’s currently no way of excluding individual items from the indexing process, like with the folders. This is of course planned but I just can’t get the menu items to show up.
  • When removing or unindexing content, a small window will pop up and show the progress (since it can take a couple of seconds with a lot of content). Unfortunately this window isn’t threaded, so it will only show up after everything is done (you might see a small window flash by right after installing the extension, that’s this window) and won’t show any progress.
  • Thunderbird currently lacks implementations for notifications about when folders are renamed and messages are removed from the Trash folder. I cannot provide these features as of today, but David Bienvenu over at the mozilla project is looking into this and he’ll implement it some time (I don’t know when, maybe it’s already implemented?).
  • No about box…

My intention is of course to fix all these issues but I will give them lower priority until the end of summer. There are more important things to deal with right now (like creating the backend so that data ends up in beagle at all).

Notes to testers (important)

In order to get this extension to work, you’ll have to make sure that the ~/.beagle/Indexes/ThunderbirdIndex directory exists. It won’t start if it doesn’t. Either create it with your favourite file manager or type the following command in a terminal (beagle will create this directory eventually, so this is just a temporary solution):

mkdir -p ~/.beagle/Indexes/ThunderbirdIndex

The Error console is your friend, and it’s the first place you should check if you suspect something is wrong. You’ll find it in the Tools menu. You can also enable the dump function, which will print some things to the terminal. The easiest way to do that is to open the Error console, paste the following line into the text box and press Enter (this is one line):

Components.classes['@mozilla.org/preferences;1'].getService(Components.interfaces.nsIPref).SetBoolPref('browser.dom.window.dump.enabled', true);

Note that you won’t get any notification about success here. When you want to disable this, do the same thing but change true at the end of the line to false. In order to see the messages you must run Thunderbird from a terminal. Just open a terminal and run thunderbird or mozilla-thunderbird (which one depends on your distribution). If you have downloaded Thunderbird from mozilla.org and run it from a standalone directory, just cd into that directory and type ./thunderbird to get yourself going.

A request from me to all testers: I would like you to try out various indexing speed settings (found in the preference dialog). The values I’ve used are totally arbitrary and not good at all. I can for instance run the unrecommended setting Instant with no apparent CPU usage at all, but it’s not a very fast setting (not as fast as the title claims, at least). You can tinker with the settings by selecting Custom. These are the settings you’ll see when doing so:

  • Batch count - The number of objects to process each time the main loop is working. You can think of it as: “every Batch delay, process Batch count items”.
  • Queue count - The number of items that need to be in the queue before it automatically empties itself.
  • Batch delay - Described above. Measured in seconds.

Try out various settings and see what happens. You can use the Reset index status button in the Indexing page when you want to restart the entire indexing process (when trying settings out). The Status page will tell you when everything is done.
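
If it helps when picking values, here is a rough Python sketch of how I’d expect the three Custom settings to interact. It is only an illustration, not the extension’s actual loop: the function run_indexing, the flush callback and the injectable sleep are all hypothetical, and the exact queue behaviour is my reading of the descriptions above.

```python
import time
from typing import Callable, List


def run_indexing(pending: List[str], batch_count: int, queue_count: int,
                 batch_delay: float, flush: Callable[[List[str]], None],
                 sleep: Callable[[float], None] = time.sleep) -> None:
    """Process pending items: every batch_delay seconds, handle batch_count items.

    Processed items collect in a queue that is emptied (flushed) as soon as
    it holds queue_count items. Assumption: leftovers are flushed at the end.
    """
    if batch_count < 1 or queue_count < 1:
        raise ValueError("batch_count and queue_count must be at least 1")
    queue: List[str] = []
    while pending:
        # Take the next batch off the front of the pending list.
        batch = pending[:batch_count]
        del pending[:batch_count]
        for item in batch:
            queue.append(item)
            if len(queue) >= queue_count:
                flush(list(queue))
                queue.clear()
        if pending:
            # Wait before the next batch; no need to wait after the last one.
            sleep(batch_delay)
    if queue:
        flush(list(queue))
```

A larger Batch count with a short Batch delay finishes sooner but keeps Thunderbird busier, so that trade-off is what the speed settings are really tuning.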

Be sure to also check out the metafiles created to make sure they contain what they should (they are stored in the ToIndex directory mentioned earlier). An ordinary object looks something like this:

<MailMessage>
<Author>Some author (some.author@some.domain.com)</Author>
<Charset>ISO-8859-1</Charset>
<Date>1165271435</Date>
<Folder>Inbox</Folder>
<FolderURL>imap://user@server/INBOX</FolderURL>
<HasOffline>false</HasOffline>
<MessageId>some-id@some.domain.com</MessageId>
<MessageSize>4085</MessageSize>
<OfflineSize>0</OfflineSize>
<Recipients>Some user (some.user@some.domain.com)</Recipients>
<Subject>Some subject</Subject>
<Uri>imap-message://user@server/INBOX#1</Uri>
</MailMessage>

RSS feed entries look exactly the same, but FeedItem is used instead of MailMessage to tell them apart. When removing a folder, it may look something like this:

<DeleteFolder>
<FolderURL>imap://user@server/INBOX</FolderURL>
</DeleteFolder>

Deleting messages looks similar too, but DeleteHdr is used instead of DeleteFolder and Uri is used instead of FolderURL. I can add that when removing everything in a folder, only one file (like the one above) will be created instead of one file for each object in the folder, since this is much more efficient. You can manually remove the ToIndex directory to force a re-index, and you can also remove all files inside the ToIndex directory if you want to (nothing bad will happen if you do at this time, but you should not do this once the backend is implemented). It might be a good idea to clean it out every once in a while when trying out smaller things, like moving messages/folders around or removing something, just to see if the correct files are generated.
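
If you’d rather not eyeball every file, a small script can do the checking. This is a minimal Python sketch using only the element names mentioned in this post; the function name check_metafile and the choice of which field counts as required are my own assumptions for illustration:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# The one field each kind of metafile cannot do without (an assumption
# based on the examples in this post).
KEY_FIELD = {
    "MailMessage": "Uri",
    "FeedItem": "Uri",
    "DeleteHdr": "Uri",
    "DeleteFolder": "FolderURL",
}


def check_metafile(xml_text: str) -> Tuple[str, Dict[str, str]]:
    """Parse one metafile and return (kind, fields); raise ValueError if odd."""
    root = ET.fromstring(xml_text)
    if root.tag not in KEY_FIELD:
        raise ValueError(f"unexpected root element: {root.tag}")
    # Collect every child element as a simple name -> text mapping.
    fields = {child.tag: (child.text or "") for child in root}
    key = KEY_FIELD[root.tag]
    if not fields.get(key):
        raise ValueError(f"{root.tag} is missing {key}")
    return root.tag, fields
```

Reading each file in the ToIndex directory and passing its contents to check_metafile should quickly show whether moving or removing something produced the kind of file you expected.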

Download

The XPI that you need in order to try this extension out is available below. The source code is available in beagle’s SVN tree here. The minimum required version of Thunderbird is set to 2.0, so you’ll need that.

Thunderbird extension v0.1

A few words at the end

I’m leaving for a couple of days tomorrow morning and I will have limited Internet access during that time. Feel free to comment on this post with bugs, requests or whatever you feel is worth knowing about the extension, for me and for others. I’ll check in when I get the chance to do so.

Good luck and happy testing :-)

Update

Here’s the updated version of the extension. Hopefully it will fix some of the bugs you’ve run into so far:

Thunderbird extension v0.1 update

beagle & soc07 17 Jul 2007 09:26 pm

Extension release expected shortly

The extension has finally reached a usable state after a couple of hectic days. I still have a few things on my TODO list, but they are only minor issues compared to what I’ve been dealing with. Despite that, I still want to hold on to the extension for one more day (I’ll release it tomorrow no matter what) ’cause I want to figure out a small overlay issue that I’m having. I also need some time to write a small “usage” document explaining everything you might need to know.

So, if you are up for it: check in tomorrow, I’ll release the extension to the public. Everyone that wants to test should not miss this :-)

beagle & soc07 16 Jul 2007 12:37 am

There is more to it than I thought

I’m usually quite optimistic about time and it seems like I’ve done it again… :-( My “todo” list currently has seven tasks that I must complete before even thinking of making a release. These tasks have popped up today as I’ve been working (which I’ve done pretty much since I woke up this morning) and I didn’t realize how much there is left to do. I really want to make a release ASAP, but I don’t want to release anything until all vital parts are there. The GUI code and some of its “hard-to-find” features are the main reason for the delay. It took me a while to find some of the functions that I really need. But I’ll continue working all day tomorrow and make another post about my progress then. Time to get some rest.

beagle & soc07 11 Jul 2007 11:33 am

Things are coming together

Just wanted to give a tiny report on what’s currently going on. The “core”, if you will, of the extension is nearly complete. I’ve come across a small bug that makes Thunderbird segfault every now and then (not very often though). I’m building a debug build of Thunderbird as I’m writing this to track it down (Thunderbird should never crash). This is also a request from David, as he wants to help track the bug down. I’m also going to try out some code of his that I need to complete my extension (code that is not yet in trunk). Without this code I won’t be able to act upon renaming of folders or when something is thrown out of the trash bin. Once this is done I’ll start with the GUI part, which hopefully won’t take more than a day.

I’ll throw up an .xpi for easy testing as soon as I’ve completed the GUI part, since that’s when it’s appropriate to begin testing it. Performing a full index is totally possible at this stage too, though the data won’t end up in beagle until I’ve finished the backend.

beagle & soc07 03 Jul 2007 06:45 pm

Playing around with XPCOM

I think I’ve decided to go with an extension now. I’ve been playing around with Thunderbird all day, trying to figure out how extensions work. So far I’ve been able to create the extension base (so that I can build and install a working extension) and I think I understand the basics well enough now. I finally know what XPCOM is and how it works too, and that alone is mind-blowing ;-)

The biggest breakthrough so far is that I’ve actually managed to figure out how the account manager works. I can now handle when accounts are added, changed or removed. Very basic, but it’s something.

beagle & soc07 03 Jul 2007 11:16 am

Rethinking the whole idea (I’m a bit split)

I’m starting to question whether the current approach I’m taking is really worth all the fuss. We will always have the problems with parsing Mork files, and the backend itself is going to be incredibly complex all in all. I also got a response from David Bienvenu yesterday, and he suggests writing an extension instead of taking the path I’m currently walking. This is the way both Spotlight (which David has been involved with) for Mac OS X and Google Desktop do it. It seems very rational and would make things a lot easier. We would lose the capability to index when Thunderbird isn’t running, but we would actually be able to “index” things when beagle isn’t running (the design would be similar to the IndexingService backend), if I decide to implement it this way.

I know that some people read this by now and I would like your input in the matter. Should I go on as I intended from the beginning or take the alternative road and write an extension (which would require a less complex backend) instead?

Also, David is putting some effort in adding support for loading individual mails in Thunderbird from the command line, which is one of the things I’m going to need in the future. We’ll see how this evolves, but it seems promising.

beagle & soc07 02 Jul 2007 06:07 pm

Mork in numbers

So, I’m not totally done with the Mork implementation yet - but there’s not much left. It’s just some basic things that I’m still unsure about (I’ve actually mailed a guy over at Mozilla about that) and some clean ups. But I still thought I would post some heap-buddy numbers to show the progress. Here’s the heap-buddy output from the old implementation:

Allocated Bytes: 55,6M
Allocated Objects: 943118
GCs: 17
Resizes: 12
Final heap size: 25,3M

Distinct Types: 114
Backtraces: 2269

As you can see, these numbers are very bad. A final heap size of 25,3MB isn’t ideal (the file I’m parsing is 1,2MB in size). There are also a lot of allocated objects, many more than necessary. This is not strange in any way, since the old implementation used strings and regular expressions to do its work. The new implementation has ditched this concept totally and uses only byte arrays. Now, here’s the current heap-buddy output:

Allocated Bytes: 4,8M
Allocated Objects: 150837
GCs: 15
Resizes: 11
Final heap size: 3,0M

Distinct Types: 92
Backtraces: 976

The numbers above are more reasonable than the first ones, but I’m still not 100% satisfied with them. They could be lower. But I can’t whine too much about it; it’s decent given that the parser is fully implemented in managed code (which decreases the speed and adds some memory overhead compared to native code, and my code isn’t ideal either). And it’s also much, much better than the previous implementation. Maybe I’ll find a way to decrease the memory usage more later on, who knows. I also just want to point out that the numbers displayed here are preliminary and don’t reflect the finished implementation. There might be some variation, but I can’t know that yet.

I haven’t done any scientific speed measurements as of this time, but I do have a Beagle.Util.Stopwatch that gives me an idea of how fast the parser and database are. The old implementation currently takes about 0,9 seconds to parse the same file I took the heap-buddy numbers from above. The new implementation needs only about 0,37 seconds, so it’s more than twice as fast. Which is nice.
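
For anyone who wants the improvement as plain ratios, here is a small Python sketch that just divides the old figures by the new ones. The numbers are the preliminary ones quoted in this post (written with decimal points), and the names are made up for the example:

```python
def ratio(old: float, new: float) -> float:
    """How many times smaller (or faster) the new figure is."""
    return old / new


# (old implementation, new implementation), copied from the figures above.
measurements = {
    "allocated bytes (M)": (55.6, 4.8),
    "allocated objects": (943118, 150837),
    "final heap size (M)": (25.3, 3.0),
    "parse time (s)": (0.9, 0.37),
}

for name, (old, new) in measurements.items():
    print(f"{name}: {ratio(old, new):.1f}x")
```

So the parse time improves by roughly 2.4 times, while the final heap size shrinks by a bit more than 8 times.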