SharePoint 403s and the PermissionMask Mystery
I'm posting this to help anyone else facing the same issue with 403s or PermissionMask messages in the ULS logs.
After some routine maintenance on our SP2010 SP2 farm (which involved rebooting the servers) we noticed an intermittent 403 error on the main intranet page. We quickly traced this to one of the web front end (WFE) servers, which was failing every time but being covered by the load balancer. It also only affected the root site collection on that server; other site collections were fine.
Some blog articles mentioned 403s being caused by inadequate permissions on the bin directory of the site, so we tried fixing those permissions, without success.
We even rolled back the VM image of the server to before the maintenance, but this also made no difference.
Looking in the ULS logs for the ailing WFE, there was a series of strange errors regarding PermissionMask (verbose logging had been enabled):
"Verbose PermissionMask check failed. asking for 0x00000015, have 0x00000000"
This had me thinking that perhaps the authentication or authorization process was being blocked somewhere.
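For anyone trawling through a lot of these lines, here is a minimal Python sketch that pulls the two masks out of a message shaped like the one above and works out which requested bits are missing. The regular expression is an assumption based on that single quoted line, and the function name is made up for illustration. It does not try to name the individual permission bits.

```python
import re

# Assumed pattern, based only on the single ULS message quoted above.
MASK_RE = re.compile(r"asking for (0x[0-9A-Fa-f]+), have (0x[0-9A-Fa-f]+)")


def parse_permission_mask(line):
    """Return (asked, have, missing) as ints, or None if the line does not match."""
    match = MASK_RE.search(line)
    if match is None:
        return None
    asked = int(match.group(1), 16)
    have = int(match.group(2), 16)
    # Bits that were requested but are not present in the mask we hold.
    missing = asked & ~have
    return asked, have, missing


line = "Verbose PermissionMask check failed. asking for 0x00000015, have 0x00000000"
asked, have, missing = parse_permission_mask(line)
print(f"asked={asked:#010x} have={have:#010x} missing={missing:#010x}")
```

In our case the "have" mask was all zeroes, so every requested bit was missing, which fits the idea that the account had no rights at all rather than one permission short.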
Next up, logging in with a normal user account turned up something interesting: a normal user could see much of the homepage, except that Navigation and the Content Query Web Part (CQWP) were missing. Site Collection Admins saw only the 403. This made us think of the Object Cache, as a faulty configuration there would strip out elements such as Navigation and the CQWP.
On this basis we turned up this article: http://blogs.technet.com/b/pfelatam/archive/2011/12/01/performance-sharepoint-2010-cache.aspx
Alas, it wasn't quite right. There was no claims-based authentication (CBA) on this site, but there was certainly something wrong with the credentials or authorization mechanism.
Going into the Cache settings we hit a clue. We got access denied for the Output Cache config page on the faulty server, even for the Farm Admin account. On the working WFEs the settings page could be accessed. Disabling Output Cache made no difference to the error.
We were getting a bit stumped at this point. However, as Output Cache was implicated, we began poking around in IIS for potential clues. Then we found the issue: some genius had configured the site's App Pool to run under a different account on this WFE compared to the others. Changing it to match the other WFEs solved all the issues. It had been working, and it seems the reboot had made it lose its authentication token or whatever. Anyway, legacy misconfiguration was to blame. :(
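With hindsight, comparing the App Pool identity across all the WFEs up front would have saved us a lot of time. Below is a rough Python sketch of that comparison. It assumes you have already noted down the account for each server by hand (for example from IIS Manager). The server names, account names and function name are all hypothetical.

```python
from collections import Counter


def find_identity_outliers(identities):
    """Given {server: app pool account}, return the servers whose account
    differs from the most common one (compared case-insensitively)."""
    if not identities:
        return {}
    normalized = {server: account.lower() for server, account in identities.items()}
    # The most common account is treated as the baseline; on a tie the
    # first one encountered wins, so check the result by eye.
    baseline, _ = Counter(normalized.values()).most_common(1)[0]
    return {
        server: identities[server]
        for server, account in normalized.items()
        if account != baseline
    }


# Hypothetical values, collected manually from each WFE.
app_pool_identities = {
    "WFE01": r"CONTOSO\svc_sp_intranet",
    "WFE02": r"CONTOSO\svc_sp_intranet",
    "WFE03": r"CONTOSO\svc_sp_legacy",
}

for server, account in find_identity_outliers(app_pool_identities).items():
    print(f"{server} runs the app pool as {account}")
```

Anything this flags is worth a look, even if the site appears to be working on that server.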
It's always puzzling until you find it!