Default Argument Value Does Not Refresh Between Function Calls

Something struck me as unexpected today while working in Python. I had a function to take a datetime object and convert it into epoch milliseconds:

import datetime
import time

import pytz

this_tz = 'US/Eastern'

def get_epch_ms(dttm=datetime.datetime.now(pytz.timezone(this_tz))):
    # Returns milliseconds since epoch for datetime object passed.
    # If no argument is passed, uses *now* as time basis.
    # DOES NOT APPEAR TO REFRESH 'dttm' BETWEEN EXECUTIONS.

    return int(time.mktime(dttm.astimezone(pytz.timezone(this_tz)).timetuple()) * 1000.0 + round(dttm.microsecond / 1000.0))

This function appears to work fine: call it with get_epch_ms() and the epoch millisecond value for *now* is returned. However, I noticed during subsequent calls to the function within the same execution of the broader application that the value of dttm did not update each time. The reason is that Python evaluates a default argument expression once, when the def statement itself runs (typically when the module is loaded), not on each call. So dttm=datetime.datetime.now(pytz.timezone(this_tz)) was computed a single time, and that same value was reused for every later call that omitted the argument. It took me a bit to track this down; I’m not sure how I’ve never come up against it before.
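Here is a minimal sketch of that behavior, using only the standard library. The function names stamp_frozen and stamp_fresh are made up for illustration; they are not part of the application above.

```python
import datetime
import time


def stamp_frozen(now=datetime.datetime.now()):
    # The default expression already ran once, when this 'def' executed.
    return now


def stamp_fresh(now=None):
    # None is only a sentinel; the real value is computed on every call.
    if now is None:
        now = datetime.datetime.now()
    return now


first_frozen = stamp_frozen()
first_fresh = stamp_fresh()
time.sleep(0.05)  # let the clock advance between calls
second_frozen = stamp_frozen()
second_fresh = stamp_fresh()

print(first_frozen is second_frozen)  # same stored object both times
print(second_fresh > first_fresh)     # recomputed, so it moved forward
print(stamp_frozen.__defaults__)      # where Python keeps the stored default
```

The stored default lives on the function object itself (in __defaults__), which is why it survives for as long as the function does.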

The fix is simple enough, though it involved a couple of additional lines of code:

import datetime
import time

import pytz

this_tz = 'US/Eastern'

def get_epch_ms(dttm=None):
    # Returns milliseconds since epoch for datetime object passed.
    # If no argument is passed, uses *now* as time basis.
    # Refreshes 'dttm' between calls to this function.

    if dttm is None:
        dttm = datetime.datetime.now(pytz.timezone(this_tz))

    return int(time.mktime(dttm.astimezone(pytz.timezone(this_tz)).timetuple()) * 1000.0 + round(dttm.microsecond / 1000.0))

The updated function properly provides an updated timestamp at each invocation, when called as get_epch_ms().
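As an aside, time.mktime() interprets the time tuple in the machine’s local time zone, so the function above gives the right answer only when the machine is itself set to this_tz. A possible alternative, sketched here under the assumption that callers always pass timezone-aware datetimes, leans on datetime.timestamp() instead. The name epoch_ms is hypothetical and this is not the function I actually shipped.

```python
import datetime


def epoch_ms(dttm=None):
    # Hypothetical stdlib-only variant of get_epch_ms (an assumption,
    # not the original code). Uses the same None-sentinel pattern.
    if dttm is None:
        dttm = datetime.datetime.now(datetime.timezone.utc)
    if dttm.tzinfo is None:
        # A naive datetime would be read as local time; refuse it here.
        raise ValueError("expected a timezone-aware datetime")
    # timestamp() returns seconds since the Unix epoch for aware datetimes.
    return int(round(dttm.timestamp() * 1000))


one_second = datetime.datetime(1970, 1, 1, 0, 0, 1, tzinfo=datetime.timezone.utc)
print(epoch_ms(one_second))
print(epoch_ms())
```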

Web Browser Cookies Between Sessions (IE, Firefox, Chrome)

Was looking into this for a client, and I’ve come to the following conclusion based on various reading across the ‘tubes:

How cookies are handled between browser instances varies between web browsers. Why do we care? Well various web applications are going to get wonky if you try opening multiple instances of them when those instances share cookies. And by “wonky” I mean it’s just not going to work. So isolating browser instances allows us to have multiple sessions of that web application open simultaneously.

Internet Explorer

IE7 does *not* share cookies if you start another instance of it (e.g. double-clicking on the icon when an instance is already open) but will share them across tabs or if you use “New Window” to open a new window.

IE8 *does* share cookies between instances by default, but it can be made to not do this by either:

– Going to File–>New Session

– Starting IE8 with “iexplore -nomerge” (custom shortcut)

Mozilla Firefox

It would appear Firefox shares cookies between tabs and windows if those windows are created under the same Firefox profile. If you’re like me (and using Firefox), you probably only have one Firefox profile setup for yourself. You can force Firefox to use a different profile by creating a custom shortcut that looks like this:

firefox.exe -no-remote -p "myProfile2"

where myProfile2 is the name of the profile you want Firefox to use. If the profile does not exist, Firefox will bring up the profile management tool, which will let you create it. From then on you can open two instances of Firefox, running under two different profiles, which will *not* share cookies and, thus, will allow you to run two simultaneous sessions of your favorite web application (I know what mine is).
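If you would rather script this than keep a custom shortcut, the same command line can be assembled in Python. This is only a sketch: the helper names are made up, the flags are copied from the shortcut above, and it assumes firefox.exe is on your PATH.

```python
import subprocess


def firefox_profile_command(profile, executable="firefox.exe"):
    # Same flags as the shortcut above; 'executable' is assumed to be
    # resolvable on PATH (otherwise pass a full path).
    return [executable, "-no-remote", "-p", profile]


def launch_firefox(profile):
    # Starts a separate Firefox instance and returns without waiting.
    return subprocess.Popen(firefox_profile_command(profile))


# Only print the command here; call launch_firefox("myProfile2") to run it.
print(firefox_profile_command("myProfile2"))
```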

Chrome

Allegedly, Chrome shares cookies between instances unless you use its Incognito feature by clicking on the wrench and going to “New Incognito Window” (Ctrl-Shift-N).

PDF: Windows vs Linux File Size

I’ve recently switched to Linux (Ubuntu 8.10) as my main operating system. I find it’s a more effective workspace for most of my tasks. Check it out if you haven’t already; Linux really is growing up. I do keep Windows around for a couple tasks, mainly gaming, but Linux is closing the gap on that, too, through the latest implementations of Wine.

One thing I’ve noticed, though, that I haven’t been able to pin down a reason for, is that PDF file sizes in Linux seem high compared to those generated in Windows. I know, this is a somewhat generic statement given the fact that, Linux or Windows, the process is dependent on the software doing the compression. Yet there seems to be a consistent discrepancy between the two operating systems when it comes to PDF file sizes. Looking around online, my observations seem to be somewhat validated. A popular solution on forums is to use the DjVu compression scheme, but I’d prefer sticking with the fairly universal PDF file format. To its credit, DjVu seems to match or better PDF when it comes to black-and-white documents, but it falls behind in grayscale.

So I ran a little test, scanning the front page of my offer letter for my new job. It consists of a company logo at the top and a full page of text. It is somewhat indicative of what I archive. All scans were done in black-and-white or grayscale. Results (file size in bytes):

18474 150dpiLinuxDjVu-BW.djvu
241812 150dpiLinuxDjVu-Gray.djvu
55298 150dpiLinuxLZW-BW.pdf
813876 150dpiLinuxLZW-Gray.pdf
50213 150dpiWin-BW.pdf
29172 150dpiWinG4-BW.tif
34410 150dpiWinG4-Gray.tif
58947 150dpiWin-Gray.pdf
47280 150dpiWinLZW-BW.tif
1304736 150dpiWinLZW-Gray.tif
29229 300dpiLinuxDjVu-BW.djvu
688967 300dpiLinuxDjVu-Gray.djvu
113726 300dpiLinuxLZW-BW.pdf
2670089 300dpiLinuxLZW-Gray.pdf
81978 300dpiWin-BW.pdf
59188 300dpiWinG4-BW.tif
73842 300dpiWinG4-Gray.tif
114967 300dpiWin-Gray.pdf
5024631 300dpiWin-Gray-300dpiPDF.pdf
5024632 300dpiWin-Gray-600dpiPDF.pdf
5040863 300dpiWin-GrayThenPDF.pdf
8955576 300dpiWin-Gray.tif
132170 300dpiWinLZW-BW.tif
5577814 300dpiWinLZW-Gray.tif
759067 CNNLinux.pdf
237794 CNNWin600dpi.pdf

In order of size:

18474 150dpiLinuxDjVu-BW.djvu
29172 150dpiWinG4-BW.tif
29229 300dpiLinuxDjVu-BW.djvu
34410 150dpiWinG4-Gray.tif
47280 150dpiWinLZW-BW.tif
50213 150dpiWin-BW.pdf
55298 150dpiLinuxLZW-BW.pdf
58947 150dpiWin-Gray.pdf
59188 300dpiWinG4-BW.tif
73842 300dpiWinG4-Gray.tif
81978 300dpiWin-BW.pdf
113726 300dpiLinuxLZW-BW.pdf
114967 300dpiWin-Gray.pdf
132170 300dpiWinLZW-BW.tif
237794 CNNWin600dpi.pdf
241812 150dpiLinuxDjVu-Gray.djvu
688967 300dpiLinuxDjVu-Gray.djvu
759067 CNNLinux.pdf
813876 150dpiLinuxLZW-Gray.pdf
1304736 150dpiWinLZW-Gray.tif
2670089 300dpiLinuxLZW-Gray.pdf
5024631 300dpiWin-Gray-300dpiPDF.pdf
5024632 300dpiWin-Gray-600dpiPDF.pdf
5040863 300dpiWin-GrayThenPDF.pdf
5577814 300dpiWinLZW-Gray.tif
8955576 300dpiWin-Gray.tif

Make note of the file extensions; there are actually three different file types in those listings (PDF, TIFF, and DjVu). The file names lead with resolution, with the exception of the two starting with “CNN.” Those two were PDFs created by printing cnn.com’s front page to PDF in Linux and Windows (using PDF Creator). The page contained slightly different content each time, but not enough to explain the file size difference. After the resolution in the file name comes the operating system, followed by the compression algorithm where applicable. Immediately after the hyphen is the grayscale/black-and-white indicator, and in those cases where there is a second hyphen, it indicates the file was post-processed with a PDF printer at the stated resolution.
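For anyone who wants to slice the listing by those fields, the naming scheme can be picked apart with a regular expression. The pattern below is my own reading of the file names above, not a formal spec, and parse_scan_name is a made-up helper; names that do not follow the scheme (the CNN files, for instance) simply come back as None.

```python
import re

# Pattern inferred from the file names in the listing; an assumption.
NAME_RE = re.compile(
    r"^(?P<dpi>\d+)dpi"                # scan resolution
    r"(?P<os>Linux|Win)"               # operating system
    r"(?P<compression>DjVu|LZW|G4)?"   # compression, where listed
    r"-(?P<mode>BW|Gray)"              # black-and-white or grayscale
    r"(?:-(?P<pdf_dpi>\d+)dpiPDF)?"    # optional PDF-printer pass
    r"\.(?P<ext>\w+)$"                 # file type
)


def parse_scan_name(name):
    # Returns a dict of fields, or None if the name does not fit the scheme.
    match = NAME_RE.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["dpi"] = int(fields["dpi"])
    if fields["pdf_dpi"] is not None:
        fields["pdf_dpi"] = int(fields["pdf_dpi"])
    return fields


print(parse_scan_name("300dpiLinuxLZW-Gray.pdf"))
print(parse_scan_name("300dpiWin-Gray-600dpiPDF.pdf"))
print(parse_scan_name("CNNLinux.pdf"))
```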

For Windows, where a compression algorithm is not listed, I used the software included with my Canon LiDE 50 scanner, which saves directly to PDF. In Linux, I used the popular gscan2pdf GUI. Having OCR on or off did not seem to make much of a difference, as far as file size. For gscan2pdf, the file was also processed with Unpaper, which should optimize the file further (it also creates blockiness in the document’s whitespace that is undesirable to me, but it’s fine for archiving documents).

So there you go. The difference is significant. One would have to dig into the underpinnings of the software, I think, to expose the reason for this, but I’m definitely curious. Again, DjVu pulls close to and even surpasses PDF when it comes to black-and-white scanning, but even it falls short when using grayscale (which happens to be my method of choice). I’ll admit I don’t relish the idea of booting into Windows simply to archive documents.
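To put a number on “significant,” here is a quick sketch that divides each Linux PDF size by its Windows counterpart, using the byte counts from the listing above. The pairings are my own choice (Linux LZW PDFs against the PDFs from the Canon software, plus the two CNN prints), so treat the ratios as a rough guide.

```python
# (Linux bytes, Windows bytes), copied from the listing above.
# The pairing of files is an assumption made for this comparison.
sizes = {
    "150dpi BW": (55298, 50213),
    "150dpi Gray": (813876, 58947),
    "300dpi BW": (113726, 81978),
    "300dpi Gray": (2670089, 114967),
    "CNN print": (759067, 237794),
}

# Ratio above 1.0 means the Linux file is larger.
ratios = {label: linux / win for label, (linux, win) in sizes.items()}

for label, ratio in ratios.items():
    print(f"{label}: Linux PDF is {ratio:.1f}x the size of the Windows PDF")
```

The black-and-white pairs come out fairly close, while the grayscale pairs differ by more than a factor of ten, which matches what the sorted listing suggests.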

Windows Explorer Folder Shortcuts

I sometimes like to make shortcuts to various folders on my Windows machine. I am annoyed, though, that when executed, this shortcut brings up an explorer window without the folder tree on the left. I found the solution to this here:

http://pure-essence.net/2007/01/29/shortcut-with-folder-tree/

In short, the command line in your shortcut should read

%SystemRoot%\EXPLORER.EXE /n,/e,d:\

where “d:\” should be replaced by the path to the folder.

Hyperlink to Specific Page of PDF

I was recently posed with the question of whether or not you could hyperlink (yeah, I’m using the term as a verb) to a specific page of a PDF. In looking around, I am under the impression this can be done if the PDF resides on a web server by adding “#page=2” to the hyperlink (for Page 2, that is).

http://foo.com/file.pdf#page=2

nule mentioned this might be browser specific, but I have not run into that.
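If you are generating such links in code, the fragment can be tacked on with the standard library. A minimal sketch; pdf_page_link is a made-up helper, and whether the viewer honors the fragment is still up to the browser or PDF plugin.

```python
from urllib.parse import urldefrag


def pdf_page_link(url, page):
    # Page numbers in the #page= fragment start at 1.
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Drop any fragment already on the URL before adding the new one.
    base, _old_fragment = urldefrag(url)
    return f"{base}#page={page}"


print(pdf_page_link("http://foo.com/file.pdf", 2))
```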

The situation becomes more complicated if the hyperlink points to a PDF residing on a mapped network drive, etc. I have read of solutions involving VBA scripts for this particular case, though I did not delve into it (nor do I intend to).