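As mentioned earlier, HttpWebRequest has the advantage of being lightweight and needing no UI; its drawback is that it cannot execute JavaScript. Here are a few more issues, summarized.

1. Setting a proxy

1) HttpWebRequest does not support HTTPS proxies (proxies whose own address uses the https scheme), which means it cannot be used with certain VPNs, if you know what I mean.

2) The usual way to write it: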

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

request.Proxy = new WebProxy(proxyUrl, true); // proxyUrl is e.g. http://123.123.123.123:80; true = bypass the proxy for local addresses
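If the proxy also requires a username and password, the same pattern can be wrapped in a small helper. This is only a minimal sketch: the class name ProxyFactory, the method Create and its parameters are hypothetical and not part of the original post.

```csharp
using System;
using System.Net;

public static class ProxyFactory
{
    // Hypothetical helper: builds a WebProxy and attaches credentials only when a user name is given.
    public static WebProxy Create(string proxyUrl, string user = null, string password = null)
    {
        // The second argument is BypassOnLocal: local addresses skip the proxy.
        var proxy = new WebProxy(proxyUrl, true);
        if (!string.IsNullOrEmpty(user))
        {
            proxy.Credentials = new NetworkCredential(user, password);
        }
        return proxy;
    }
}
```

It would be used as request.Proxy = ProxyFactory.Create("http://123.123.123.123:80", "user", "secret"); the user name and password here are placeholders.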

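3) Using a PAC (proxy auto-configuration) script:

This one is more troublesome and needs the Win32 API. Below are a wrapper class and the method that calls it. The comments are detailed, so needless to say the code was copied from elsewhere: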

using System;
using System.Runtime.InteropServices;

public class Win32Api
{
#region AutoProxy Constants
/// <summary>
/// Applies only when setting proxy information
/// </summary>
public const int WINHTTP_ACCESS_TYPE_DEFAULT_PROXY = 0;
/// <summary>
/// Internet accessed through a direct connection
/// </summary>
public const int WINHTTP_ACCESS_TYPE_NO_PROXY = 1;
/// <summary>
/// Internet accessed using a proxy
/// </summary>
public const int WINHTTP_ACCESS_TYPE_NAMED_PROXY = 3;
/// <summary>
/// Attempt to automatically discover the URL of the
/// PAC file using both DHCP and DNS queries to the local network.
/// </summary>
public const int WINHTTP_AUTOPROXY_AUTO_DETECT = 0x00000001;
/// <summary>
/// Download the PAC file from the URL in the WINHTTP_AUTOPROXY_OPTIONS structure.
/// </summary>
public const int WINHTTP_AUTOPROXY_CONFIG_URL = 0x00000002;
/// <summary>
/// Executes the Web Proxy Auto-Discovery (WPAD) protocol in-process instead of
/// delegating to an out-of-process WinHTTP AutoProxy Service, if available.
/// This flag must be combined with one of the other flags
/// </summary>
public const int WINHTTP_AUTOPROXY_RUN_INPROCESS = 0x00010000;
/// <summary>
/// By default, WinHTTP is configured to fall back to auto-discover a proxy
/// in-process. If this fallback behavior is undesirable in the event that
/// an out-of-process discovery fails, it can be disabled using this flag.
/// </summary>
public const int WINHTTP_AUTOPROXY_RUN_OUTPROCESS_ONLY = 0x00020000;
/// <summary>
/// Use DHCP to locate the proxy auto-configuration file.
/// </summary>
public const int WINHTTP_AUTO_DETECT_TYPE_DHCP = 0x00000001;
/// <summary>
/// Use DNS to attempt to locate the proxy auto-configuration file at a
/// well-known location on the domain of the local computer
/// </summary>
public const int WINHTTP_AUTO_DETECT_TYPE_DNS_A = 0x00000002;
#endregion

#region Proxy Structures
/// <summary>
/// The structure is used to indicate to the WinHttpGetProxyForURL
/// function whether to specify the URL of the Proxy Auto-Configuration
/// (PAC) file or to automatically locate the URL with DHCP or DNS
/// queries to the network
/// </summary>
[StructLayout(LayoutKind.Sequential, CharSet=CharSet.Unicode)]
public struct WINHTTP_AUTOPROXY_OPTIONS {
/// <summary>
/// Mechanisms should be used to obtain the PAC file
/// </summary>
[MarshalAs(UnmanagedType.U4)]
public int dwFlags;
/// <summary>
/// If dwflags includes the WINHTTP_AUTOPROXY_AUTO_DETECT flag,
/// then dwAutoDetectFlags specifies what protocols are to be
/// used to locate the PAC file. If both the DHCP and DNS auto
/// detect flags are specified, then DHCP is used first;
/// if no PAC URL is discovered using DHCP, then DNS is used.
/// If dwflags does not include the WINHTTP_AUTOPROXY_AUTO_DETECT
/// flag, then dwAutoDetectFlags must be zero.
/// </summary>
[MarshalAs(UnmanagedType.U4)]
public int dwAutoDetectFlags;
/// <summary>
/// If dwflags includes the WINHTTP_AUTOPROXY_CONFIG_URL flag, the
/// lpszAutoConfigUrl must point to a null-terminated Unicode string
/// that contains the URL of the proxy auto-configuration (PAC) file.
/// If dwflags does not include the WINHTTP_AUTOPROXY_CONFIG_URL flag,
/// then lpszAutoConfigUrl must be NULL.
/// </summary>
public string lpszAutoConfigUrl;
/// <summary>
/// Reserved for future use; must be NULL.
/// </summary>
public IntPtr lpvReserved;
/// <summary>
/// Reserved for future use; must be zero.
/// </summary>
[MarshalAs(UnmanagedType.U4)]
public int dwReserved;
/// <summary>
/// Specifies whether the client's domain credentials should be automatically
/// sent in response to an NTLM or Negotiate Authentication challenge when
/// WinHTTP requests the PAC file.
/// If this flag is TRUE, credentials should automatically be sent in response
/// to an authentication challenge. If this flag is FALSE and authentication
/// is required to download the PAC file, the WinHttpGetProxyForUrl fails.
/// </summary>
public bool fAutoLoginIfChallenged;
}

/// <summary>
/// The structure contains the session or default proxy configuration.
/// </summary>
[StructLayout(LayoutKind.Sequential, CharSet=CharSet.Unicode)]
public struct WINHTTP_PROXY_INFO {
/// <summary>
/// Unsigned long integer value that contains the access type
/// </summary>
[MarshalAs(UnmanagedType.U4)]
public int dwAccessType;
/// <summary>
/// Pointer to a string value that contains the proxy server list
/// </summary>
public string lpszProxy;
/// <summary>
/// Pointer to a string value that contains the proxy bypass list
/// </summary>
public string lpszProxyBypass;
}
#endregion

#region WinHttp
/// <summary>
/// This function implements the Web Proxy Auto-Discovery (WPAD) protocol
/// for automatically configuring the proxy settings for an HTTP request.
/// The WPAD protocol downloads a Proxy Auto-Configuration (PAC) file,
/// which is a script that identifies the proxy server to use for a given
/// target URL. PAC files are typically deployed by the IT department within
/// a corporate network environment. The URL of the PAC file can either be
/// specified explicitly or WinHttpGetProxyForUrl can be instructed to
/// automatically discover the location of the PAC file on the local network.
/// </summary>
/// <param name="hSession">The WinHTTP session handle returned by the WinHttpOpen function</param>
/// <param name="lpcwszUrl">A pointer to a null-terminated Unicode string that contains the
/// URL of the HTTP request that the application is preparing to send.</param>
/// <param name="pAutoProxyOptions">A pointer to a WINHTTP_AUTOPROXY_OPTIONS structure that
/// specifies the auto-proxy options to use.</param>
/// <param name="pProxyInfo">A pointer to a WINHTTP_PROXY_INFO structure that receives the
/// proxy setting. This structure is then applied to the request handle using the
/// WINHTTP_OPTION_PROXY option.</param>
/// <returns></returns>
[DllImport("winhttp.dll", SetLastError=true, CharSet=CharSet.Unicode)]
public static extern bool WinHttpGetProxyForUrl(
IntPtr hSession,
string lpcwszUrl,
ref WINHTTP_AUTOPROXY_OPTIONS pAutoProxyOptions,
ref WINHTTP_PROXY_INFO pProxyInfo);

/// <summary>
/// The function initializes, for an application, the use of WinHTTP
/// functions and returns a WinHTTP-session handle
/// </summary>
/// <param name="pwszUserAgent">A pointer to a string variable that contains the name of the
/// application or entity calling the WinHTTP functions.</param>
/// <param name="dwAccessType">Type of access required. This can be one of the following values</param>
/// <param name="pwszProxyName"> A pointer to a string variable that contains the name of the
/// proxy server to use when proxy access is specified by setting dwAccessType to
/// WINHTTP_ACCESS_TYPE_NAMED_PROXY. The WinHTTP functions recognize only CERN type proxies for HTTP.
/// If dwAccessType is not set to WINHTTP_ACCESS_TYPE_NAMED_PROXY, this parameter must be set
/// to WINHTTP_NO_PROXY_NAME</param>
/// <param name="pwszProxyBypass">A pointer to a string variable that contains an optional list
/// of host names or IP addresses, or both, that should not be routed through the proxy when
/// dwAccessType is set to WINHTTP_ACCESS_TYPE_NAMED_PROXY. The list can contain wildcard characters.
/// Do not use an empty string, because the WinHttpOpen function uses it as the proxy bypass list.
/// If this parameter specifies the "<local>" macro as the only entry, this function bypasses
/// any host name that does not contain a period. If dwAccessType is not set to WINHTTP_ACCESS_TYPE_NAMED_PROXY,
/// this parameter must be set to WINHTTP_NO_PROXY_BYPASS.</param>
/// <param name="dwFlags">Unsigned long integer value that contains the flags that indicate various options
/// affecting the behavior of this function</param>
/// <returns>Returns a valid session handle if successful, or NULL otherwise</returns>
[DllImport("winhttp.dll", SetLastError=true, CharSet=CharSet.Unicode)]
public static extern IntPtr WinHttpOpen(
string pwszUserAgent,
int dwAccessType,
IntPtr pwszProxyName,
IntPtr pwszProxyBypass,
int dwFlags
);

/// <summary>
/// The function closes a single HINTERNET handle
/// </summary>
/// <param name="hInternet">Valid HINTERNET handle to be closed.</param>
/// <returns>Returns TRUE if the handle is successfully closed, or FALSE otherwise</returns>
[DllImport("winhttp.dll", SetLastError=true, CharSet=CharSet.Unicode)]
public static extern bool WinHttpCloseHandle(IntPtr hInternet);
#endregion
}

// Member of the calling class. Returns the proxy that the PAC script selects for
// DestinationUrl, or null if the lookup fails.
private string getProxyForUrlUsingPac(string DestinationUrl, string PacUri)
{
    IntPtr WinHttpSession = Win32Api.WinHttpOpen("User", Win32Api.WINHTTP_ACCESS_TYPE_DEFAULT_PROXY, IntPtr.Zero, IntPtr.Zero, 0);

    Win32Api.WINHTTP_AUTOPROXY_OPTIONS ProxyOptions = new Win32Api.WINHTTP_AUTOPROXY_OPTIONS();
    Win32Api.WINHTTP_PROXY_INFO ProxyInfo = new Win32Api.WINHTTP_PROXY_INFO();

    ProxyOptions.dwFlags = Win32Api.WINHTTP_AUTOPROXY_CONFIG_URL;
    // dwAutoDetectFlags must be zero unless dwFlags includes WINHTTP_AUTOPROXY_AUTO_DETECT.
    ProxyOptions.dwAutoDetectFlags = 0;
    ProxyOptions.lpszAutoConfigUrl = PacUri;

    // Get Proxy
    bool IsSuccess = Win32Api.WinHttpGetProxyForUrl(WinHttpSession, DestinationUrl, ref ProxyOptions, ref ProxyInfo);
    // With SetLastError=true, read the error code through Marshal right away,
    // before another P/Invoke call can overwrite it.
    int LastError = System.Runtime.InteropServices.Marshal.GetLastWin32Error();
    Win32Api.WinHttpCloseHandle(WinHttpSession);

    if (IsSuccess)
    {
        return ProxyInfo.lpszProxy;
    }
    else
    {
        Console.WriteLine("Error: {0}", LastError);
        return null;
    }
}

Usage: request.Proxy = new WebProxy(getProxyForUrlUsingPac(url, pac));

One thing to note here: after setting a proxy on HttpWebRequest, do not set too many HTTP headers, otherwise problems tend to occur.
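The string returned in lpszProxy can be null or can list several proxies separated by semicolons or whitespace, and such a list cannot be passed to the WebProxy constructor as-is. The sketch below shows one way to handle that by taking only the first entry. The class PacProxyHelper and its method ToWebProxy are hypothetical names introduced for illustration, and falling back to a direct connection for an empty result is an assumption rather than something the original post specifies.

```csharp
using System;
using System.Net;

public static class PacProxyHelper
{
    // Hypothetical helper: turns a WinHTTP proxy list such as "proxy1:8080;proxy2:8080"
    // into a WebProxy that points at the first entry.
    public static WebProxy ToWebProxy(string proxyList)
    {
        if (string.IsNullOrWhiteSpace(proxyList))
        {
            // Assumption: no proxy returned means "connect directly".
            // A WebProxy with a null Address bypasses the proxy for every request.
            return new WebProxy();
        }

        string[] entries = proxyList.Split(
            new[] { ';', ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        if (entries.Length == 0)
        {
            return new WebProxy();
        }

        string first = entries[0];

        // Entries may carry a "scheme=" prefix (for example "http=proxy1:8080"); drop it.
        int eq = first.IndexOf('=');
        if (eq >= 0)
        {
            first = first.Substring(eq + 1);
        }

        // WebProxy(string) prepends "http://" when the address has no scheme.
        return new WebProxy(first);
    }
}
```

With this helper the call becomes request.Proxy = PacProxyHelper.ToWebProxy(getProxyForUrlUsingPac(url, pac)); ignoring the remaining entries keeps the sketch short, at the cost of having no failover to the second proxy.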

3. Reading the cookies in a CookieContainer

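// Requires System, System.Collections, System.Net and System.Reflection.
// "cookie" is the CookieContainer. m_domainTable and m_list are private fields,
// so this code depends on the internal layout of CookieContainer.
Hashtable table = (Hashtable)cookie.GetType().InvokeMember("m_domainTable",
    BindingFlags.NonPublic |
    BindingFlags.GetField |
    BindingFlags.Instance,
    null,
    cookie,
    new object[] { });

foreach (var tableKey in table.Keys)
{
    String str_tableKey = (string)tableKey;
    // Strip the leading dot of domain keys such as ".example.com".
    if (str_tableKey[0] == '.')
    {
        str_tableKey = str_tableKey.Substring(1);
    }

    SortedList list = (SortedList)table[tableKey].GetType().InvokeMember("m_list",
        BindingFlags.NonPublic |
        BindingFlags.GetField |
        BindingFlags.Instance,
        null,
        table[tableKey],
        new object[] { });

    foreach (var listKey in list.Keys)
    {
        String uri = "https://" + str_tableKey + (string)listKey;
        foreach (Cookie c in cookie.GetCookies(new Uri(uri)))
        {
            // Read the cookie's Name, Value and other properties here. The URI above uses
            // the https scheme; it is not hard to adapt this to support http as well.
        }
    }
}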
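On .NET 6 and later, CookieContainer exposes a public GetAllCookies() method, so the reflection above may not be needed there. The following is a minimal sketch under that assumption; the class CookieDump, the method Describe and the sample cookies are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class CookieDump
{
    // Hypothetical helper: lists every cookie in the container as "domain+path name=value".
    // Assumes .NET 6 or later, where CookieContainer.GetAllCookies() is available.
    public static List<string> Describe(CookieContainer container)
    {
        var lines = new List<string>();
        foreach (Cookie c in container.GetAllCookies())
        {
            lines.Add(c.Domain + c.Path + " " + c.Name + "=" + c.Value);
        }
        return lines;
    }

    public static CookieContainer Sample()
    {
        // Made-up sample data, only for trying the helper out.
        var container = new CookieContainer();
        container.Add(new Cookie("session", "abc123", "/", "example.com"));
        container.Add(new Cookie("theme", "dark", "/app", "example.org"));
        return container;
    }
}
```

Unlike the reflection version, this does not depend on the private field names and returns cookies regardless of scheme, but it is not available on .NET Framework.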

  
