At first I had thought it was likely to be a software problem - either with the driver or the firmware (and I'd hoped it'd be in the firmware that's uploaded to the card when the computer boots, rather than in non-rewritable memory).
I guessed this because if it were a hardware fault then you'd expect that dropping the connection and reconnecting wouldn't work - if the PHY were truly locked up then it might need a full power-on reset. However, after some thought, perhaps it is something like an interrupt or DMA lock-up in the embedded micro-controller, which cannot be overcome with a software fix and is thus a fundamental hardware design flaw. When the hardware locks up, perhaps the driver can force a reset of that hardware, which allows it to reconnect? But the driver couldn't do this routinely whenever it noticed a sudden drop in traffic, as the reset might take too long?
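To make the "driver notices a sudden drop in traffic" idea concrete, here is a rough user-space sketch in Python. It is only an illustration under my own assumptions, not what Intel's driver actually does: the function names and the `quiet_samples` threshold are made up, and the only real interface it relies on is the standard Linux sysfs counter at /sys/class/net/<iface>/statistics/rx_packets.

```python
from pathlib import Path
from typing import Sequence


def read_rx_packets(iface: str = "wlan0") -> int:
    """Read the cumulative RX packet counter for an interface from Linux sysfs."""
    return int(Path(f"/sys/class/net/{iface}/statistics/rx_packets").read_text())


def looks_stalled(samples: Sequence[int], quiet_samples: int = 3) -> bool:
    """Guess whether traffic has stopped after previously flowing.

    samples: cumulative RX counters taken at a fixed interval, oldest first.
    quiet_samples: hypothetical threshold, the number of consecutive
    intervals with no new packets before we call it a stall.
    """
    if len(samples) < quiet_samples + 2:
        return False  # not enough history to judge

    recent = samples[-(quiet_samples + 1):]   # spans the last quiet_samples intervals
    earlier = samples[:-quiet_samples]        # everything before the quiet window

    was_active = earlier[-1] > earlier[0]     # counter was moving before
    now_silent = recent[-1] == recent[0]      # counter has not moved since
    return was_active and now_silent
```

You would poll read_rx_packets() every few seconds, keep the last handful of values, and call looks_stalled() on them. Note the catch: a link that has simply gone idle looks exactly like a lock-up by this measure, and you have to wait several polling intervals before deciding, which is part of why I doubt a driver could rely on this routinely.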
It is odd that the connection can work for some indeterminate time, which appears unrelated to the volume of traffic. That points to some unusual condition occurring, perhaps a corrupted frame due to congestion? Mine just failed after receiving only 10883 packets (freshly booted laptop) and sending 384. No errors reported:
$ ifconfig wlan0
wlan0     Link encap:Ethernet  HWaddr 7c:7a:91:xx:xx:xx
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:10883 errors:0 dropped:0 overruns:0 frame:0
          TX packets:384 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1294045 (1.2 MB)  TX bytes:117637 (117.6 KB)
That was on 2.4 GHz, thus:
$ iwconfig wlan0
wlan0     IEEE 802.11abgn  ESSID:"xxxxxxxxxxxxxx"
          Mode:Managed  Frequency:2.412 GHz  Access Point: E0:46:9A:xx:xx:xx
          Bit Rate=216 Mb/s   Tx-Power=16 dBm
          Retry  long limit:7   RTS thr:off   Fragment thr:off
          Encryption key:off
          Power Management:on
          Link Quality=61/70  Signal level=-49 dBm
          Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
          Tx excessive retries:0  Invalid misc:32   Missed beacon:0
I've personally seen it now on four of these cards across two different makes of laptop, so this is not a rare, isolated problem; it is pervasive. What amazes me is that Intel say they cannot reproduce this problem, which I frankly don't believe. My current working assumption therefore is that they are stalling for time, hoping they can find a workaround that hides a problem which can't be fixed properly in software/firmware.
It's time for Intel to 'fess up and do the right thing. Produce a version 2 of the card which works properly and have a full recall/replacement program; it won't be cheap but it would be The Right Thing To Do. I'm expecting Intel to delete this message and maintain their stance of plausible deniability.
intel_joe, it's time to tell us what Intel have actually found and what they are going to do about it.