Proxy chaining with selenium

Proxy chaining with Selenium was the topic of a recent conversation with a colleague. He wanted to know more about proxy chaining and Selenium, which has led me to write this up. Again, this was an issue we ran into early on with FitNesse and Selenium, and it's something we are currently tackling with Twist.

Selenium uses a proxy.pac file to set up the browser's proxy configuration.

An example selenium proxy.pac:


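function FindProxyForURL(url, host) {
  // Requests for Selenium's own pages go to the Selenium server.
  if (shExpMatch(url, '*/selenium-server/*')) {
    return 'PROXY localhost:4444; DIRECT';
  }
  // Everything else goes straight to the requested host.
  return 'DIRECT';
}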

In this example, the browser automatically forwards any request whose URL contains “/selenium-server/” to the Selenium server; all other requests are unproxied and go DIRECT to the requested host.
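To see the matching rule in action outside a browser, here is a minimal sketch that runs the PAC logic under Node.js. Browsers supply `shExpMatch()` to PAC scripts; the version below is our own simplified stand-in that understands only `*` and `?`. The PAC function is the example above with `*` wildcards around `/selenium-server/` and an explicit `DIRECT` fallback, and the sample URLs are hypothetical.

```javascript
// Simplified stand-in for the shExpMatch() that browsers provide to PAC scripts:
// "*" matches any run of characters and "?" matches a single character.
function shExpMatch(str, pattern) {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const source = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp("^" + source + "$").test(str);
}

// The basic Selenium PAC logic, with wildcards and an explicit DIRECT fallback.
function FindProxyForURL(url, host) {
  if (shExpMatch(url, "*/selenium-server/*")) {
    return "PROXY localhost:4444; DIRECT";
  }
  return "DIRECT";
}

// Hypothetical sample requests: [url, host].
const samples = [
  ["http://www.example.com/selenium-server/core/page.html", "www.example.com"],
  ["http://www.example.com/index.html", "www.example.com"],
];
for (const [url, host] of samples) {
  console.log(url, "->", FindProxyForURL(url, host));
}
```

Note that a pattern of `'/selenium-server/'` without the surrounding wildcards only matches a URL that is literally `/selenium-server/`, which no full URL is, so the wildcards matter.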

We work behind a corporate proxy, so we need to be able to send requests via the proxy, but not for all hosts.

We are using Selenium-RC and we specify some proxy settings on the command line at start-up:


java -Dhttp.proxyHost=proxy.ourdomain.dom -Dhttp.proxyPort=8080 -Dhttp.nonProxyHosts=*dev.ourdomain.dom*^|*qa.ourdomain.dom*^|*staging.ourdomain.dom -jar selenium-server.jar

When you start Selenium-RC this way, it generates a proxy.pac file like this:


function FindProxyForURL(url, host) {
  return 'PROXY localhost:4444; PROXY proxy.ourdomain.dom:8080';
}

The problem my colleague faced was that no matter what he put on the command line, the browser wasn’t honouring his proxy configuration, or so he thought.

The Selenium documentation suggests that invoking Selenium this way creates a proxy chain, but that isn't the case. If you look at the proxy.pac file Selenium generated, it just creates a fail-over chain: the browser tries to use Selenium RC as its proxy and, only if that fails, tries the proxy you specified on the command line. Bugger.
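The difference between a fail-over list and a chain is easiest to see by modelling how a browser reads the PAC result. This is a simplified sketch, not any browser's actual implementation: `parsePacResult` and `pickRoute` are hypothetical helpers, and the `isReachable` callback stands in for a real connection attempt. The browser takes the first usable entry and sends the request there; it never forwards from one listed proxy to the next.

```javascript
// Parse a PAC return value such as "PROXY a:1; PROXY b:2; DIRECT"
// into an ordered list of candidate routes.
function parsePacResult(result) {
  return result
    .split(";")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      const [type, target] = entry.split(/\s+/);
      return { type: type.toUpperCase(), target: target || null };
    });
}

// Fail-over, not chaining: return the FIRST route that works.
// isReachable(target) is a hypothetical stand-in for trying a connection.
function pickRoute(result, isReachable) {
  for (const route of parsePacResult(result)) {
    if (route.type === "DIRECT" || isReachable(route.target)) {
      return route;
    }
  }
  return null; // every listed proxy failed
}

const generated = "PROXY localhost:4444; PROXY proxy.ourdomain.dom:8080";

// Selenium RC is up: the request goes to localhost:4444 only.
console.log(pickRoute(generated, () => true));
// Selenium RC is down: only then is the corporate proxy used.
console.log(pickRoute(generated, (target) => target !== "localhost:4444"));
```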

But fear not: there is an additional command-line parameter that can be invoked (like a magic incantation) when starting Selenium-RC. It's -avoidProxy:


java -Dhttp.proxyHost=proxy.ourdomain.dom -Dhttp.proxyPort=8080 -Dhttp.nonProxyHosts=*dev.ourdomain.dom*^|*qa.ourdomain.dom*^|*staging.ourdomain.dom -jar selenium-server.jar -avoidProxy

Adding the -avoidProxy flag causes Selenium-RC to generate a proxy.pac file like this:


function FindProxyForURL(url, host) {
  if (shExpMatch(url, '*/selenium-server/*')) {
    return 'PROXY localhost:4444; PROXY proxy.ourdomain.dom:8080';
  } else if (shExpMatch(host, '*dev.ourdomain.dom*')) {
    return 'DIRECT';
  } else if (shExpMatch(host, '*qa.ourdomain.dom*')) {
    return 'DIRECT';
  } else if (shExpMatch(host, '*staging.ourdomain.dom*')) {
    return 'DIRECT';
  } else {
    return 'PROXY proxy.ourdomain.dom:8080';
  }
}

This sends anything with /selenium-server/ in the URL to Selenium (falling back to the corporate proxy), sends requests for the listed hosts DIRECT, and sends everything else through the corporate proxy. Eureka!
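The routing can be checked with the same kind of sketch: the generated -avoidProxy PAC function run under Node.js with a simplified, hand-written `shExpMatch()` stand-in (browsers provide the real one). The sample URLs, including `ads.example.com`, are hypothetical.

```javascript
// Simplified stand-in for the browser's shExpMatch(): "*" and "?" wildcards only.
function shExpMatch(str, pattern) {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const source = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp("^" + source + "$").test(str);
}

// The PAC function Selenium-RC generated with -avoidProxy.
function FindProxyForURL(url, host) {
  if (shExpMatch(url, "*/selenium-server/*")) {
    return "PROXY localhost:4444; PROXY proxy.ourdomain.dom:8080";
  } else if (shExpMatch(host, "*dev.ourdomain.dom*")) {
    return "DIRECT";
  } else if (shExpMatch(host, "*qa.ourdomain.dom*")) {
    return "DIRECT";
  } else if (shExpMatch(host, "*staging.ourdomain.dom*")) {
    return "DIRECT";
  } else {
    return "PROXY proxy.ourdomain.dom:8080";
  }
}

// Hypothetical sample requests: [url, host].
const samples = [
  ["http://qa.ourdomain.dom/selenium-server/core/page.html", "qa.ourdomain.dom"],
  ["http://qa.ourdomain.dom/home", "qa.ourdomain.dom"],
  ["http://ads.example.com/banner.js", "ads.example.com"],
];
for (const [url, host] of samples) {
  console.log(url, "->", FindProxyForURL(url, host));
}
```

Because the /selenium-server/ test comes first, a Selenium URL on a QA host still goes to Selenium rather than DIRECT.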

Well, almost. Enter the same-origin policy.

Back to my colleague. He was using *chrome in his tests (note: this has nothing to do with Google Chrome; it's Firefox, but with more schwartz). If you use one of these “experimental browsers”, as Selenium calls them (*chrome, *iehta), then you need to set your browser's proxy settings manually and just specify the path to your browser as if it were an unsupported browser.

For example, you can launch Firefox with a custom configuration like this:


*custom c:\Program Files\Mozilla Firefox\firefox.exe

When the browser is started like this, you have to configure its proxy settings manually to use Selenium Server as a proxy. This just means opening the browser preferences and specifying “localhost:4444” as the HTTP proxy.

I have also had good results with the experimental browser *pifirefox, that is, proxy-injection Firefox.

Running Selenium with proxy exceptions

Our QA environment uses a DNS server, which allows us to mimic the live environment (the customer-facing server names are the same as in production). It sounds ideal, but it isn't without problems. We can't set up a banner server in our environment (because we use a third-party banner service), nor can we set up a WA server (we use multiple WA vendors). So we have to use proxy exceptions that keep everything that exists inside the QA environment from going through the proxy.

This obviously means we can end up with a pretty big proxy exceptions list in our browser. Firefox isn't a problem, as there are a couple of plug-ins available that can manage multiple proxy settings. IE, however, requires a hand-crafted batch file to update registry settings. That's not really a problem; we are comfortable with how it works, and it rarely catches us out.

Now, when we came to automate our tests with Selenium, we ran into a problem. We needed to tell Selenium to use our corporate proxy so it could proxy requests to the outside world, and when Selenium starts the browser, it tells the browser to use a particular PAC file that is generated at test run time. The PAC file is quite simple and tells the browser to use Selenium for anything that lies within the SUT, and otherwise go external. Sounds great, except that it doesn't work.

We read a few forum posts and blog posts and scratched our heads. Finally we opened the Selenium source code and found that we could in fact pass in a list of hosts to apply to the proxy exceptions list. We were feeling pretty jaded after spending hours on Google; why it's not documented clearly anywhere, I don't know.

Okay, so our startup command for Selenium looked like this:

java -Dhttp.proxyHost=proxy.ourdomain.co.uk -Dhttp.proxyPort=8080 -jar selenium-server.jar -avoidProxy

To add proxy exceptions, you list the hosts that should bypass the proxy, separated by the pipe character |. We then found we had to escape each pipe with a caret ^ (the escape character of the Windows command prompt):

-Dhttp.nonProxyHosts=*www.ourdomain.co.uk*^|*search.ourdomain.co.uk*
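Once the command prompt has consumed the carets, the JVM receives a plain pipe-separated list. The sketch below models both steps; `stripCmdCarets` is a simplification of the Windows command prompt's escaping, and `isNonProxyHost` is a hypothetical approximation of how the `http.nonProxyHosts` patterns are applied (treating `*` as any run of characters), not the JDK's own code. `banners.example.com` is an illustrative host.

```javascript
// Simplified model of the command prompt: a caret escapes the next
// character and is itself removed from the argument.
function stripCmdCarets(arg) {
  return arg.replace(/\^(.)/g, "$1");
}

// Approximation of http.nonProxyHosts matching: split on "|" and compare
// the host against each pattern, with "*" meaning any run of characters.
function isNonProxyHost(host, nonProxyHosts) {
  return nonProxyHosts
    .split("|")
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .some((pattern) => {
      const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i").test(host);
    });
}

const typed = "*www.ourdomain.co.uk*^|*search.ourdomain.co.uk*";
const seenByJava = stripCmdCarets(typed);
console.log(seenByJava);
for (const host of ["www.ourdomain.co.uk", "search.ourdomain.co.uk", "banners.example.com"]) {
  console.log(host, isNonProxyHost(host, seenByJava) ? "DIRECT" : "via proxy");
}
```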

So our new Selenium startup string looked like this:

java -Dhttp.proxyHost=proxy.ourdomain.co.uk -Dhttp.proxyPort=8080 -Dhttp.nonProxyHosts=*www.ourdomain.co.uk*^|*search.ourdomain.co.uk* -jar selenium-server.jar -avoidProxy

This means that any request for a host other than www. or search. on our domain goes through our corporate proxy, while requests for those two hosts go direct.
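To make the link between the startup flags and the generated file concrete, here is a hypothetical generator that turns these proxy settings into a PAC file shaped like the -avoidProxy one shown earlier. It is not Selenium's source code; `buildAvoidProxyPac` and its option names are our own, and it assumes the patterns contain no quote characters.

```javascript
// Hypothetical generator: builds a PAC file shaped like the one Selenium-RC
// produced with -avoidProxy. Not Selenium's actual implementation.
function buildAvoidProxyPac({ seleniumPort, proxyHost, proxyPort, nonProxyHosts }) {
  const upstream = `PROXY ${proxyHost}:${proxyPort}`;
  const lines = [
    "function FindProxyForURL(url, host) {",
    "  if (shExpMatch(url, '*/selenium-server/*')) {",
    `    return 'PROXY localhost:${seleniumPort}; ${upstream}';`,
  ];
  // One DIRECT branch per proxy exception, in the order given.
  for (const pattern of nonProxyHosts.split("|").filter((p) => p.length > 0)) {
    lines.push(`  } else if (shExpMatch(host, '${pattern}')) {`);
    lines.push("    return 'DIRECT';");
  }
  // Everything else goes to the corporate proxy.
  lines.push("  } else {", `    return '${upstream}';`, "  }", "}");
  return lines.join("\n");
}

const pac = buildAvoidProxyPac({
  seleniumPort: 4444,
  proxyHost: "proxy.ourdomain.co.uk",
  proxyPort: 8080,
  nonProxyHosts: "*www.ourdomain.co.uk*|*search.ourdomain.co.uk*",
});
console.log(pac);
```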
