1. Overview

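Background: While syncing product information from JD.com, I ran into a problem: the synced product descriptions could not be edited in our rich-text editor, and forcing an edit broke the images. On inspection, the images in the description are not <img> tags at all; they are CSS background images (background-image) declared in the description's embedded <style> block.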

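Solution: After several attempts, I settled on a custom HTML tag template: extract the image URL from each background-image:url(...) in the CSS, together with the rule's height, and substitute them into the template.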

Stack: Java, regular expressions

2. Code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JdDescImageExtractor {

    public static void main(String[] args) {
        StringBuilder stringBuilder = new StringBuilder();
        // Product description (raw HTML + CSS as synced from JD)
        String goodsDesc = "<div cssurl='//sku-market-gw.jd.com/css/pc/100002519219.css?t=1581586700014'></div><div id='zbViewModulesH' value='4797'></div><input id='zbViewModulesHeight' type='hidden' value='4797'/><div skudesign=\\\"100010\\\"></div><div class=\\\"ssd-module-wrap\\\" >\\n <div id=\\\"ssd-vc-goods\\\" class=\\\"ssd-module ssd-module-goods M15541052686741\\\" data-id=\\\"M15541052686741\\\">\\n <ul class=\\\"ssd-goods-4\\\">\\n <li>\\n <a >\\n <div class=\\\"ssd-good-item\\\">\\n <div class=\\\"ssd-good-img\\\">\\n <img src=\\\"http://img30.360buyimg.com/n1/jfs/t26977/50/1495537803/456791/ca60d3de/5be4e374Nf8e94aa9.jpg\\\" alt=\\\"印尼进口 Tango威化饼干 进口威化饼干 零食礼盒 零食大礼包 潘多拉礼盒684g\\\"/>\\n </div>\\n <div class=\\\"ssd-good-info\\\">\\n <p class=\\\"ssd-good-name\\\">\\n 印尼进口 Tango威化饼干 进口威化饼干 零食礼盒 零食大礼包 潘多拉礼盒684g\\n </p>\\n </div>\\n </div>\\n </a>\\n </li>\\n <li>\\n <a >\\n <div class=\\\"ssd-good-item\\\">\\n <div class=\\\"ssd-good-img\\\">\\n <img src=\\\"http://img30.360buyimg.com/n1/jfs/t23938/203/1285847551/405421/27964aa9/5b57e55eN969c2d3f.jpg\\\" alt=\\\"印尼进口 Tango威化饼干 休闲零食 饼干 咔芙尔焦糖威化饼干73.5g\\\"/>\\n </div>\\n <div class=\\\"ssd-good-info\\\">\\n <p class=\\\"ssd-good-name\\\">\\n 印尼进口 Tango威化饼干 休闲零食 饼干 咔芙尔焦糖威化饼干73.5g\\n </p>\\n </div>\\n </div>\\n </a>\\n </li>\\n <li>\\n <a >\\n <div class=\\\"ssd-good-item\\\">\\n <div class=\\\"ssd-good-img\\\">\\n <img src=\\\"http://img30.360buyimg.com/n1/jfs/t28057/267/707899178/312718/1054f7be/5bfbda66Ne622ae83.jpg\\\" alt=\\\"印尼进口 Tango威化饼干 休闲零食 乳酪夹心威化饼干160g\\\"/>\\n </div>\\n <div class=\\\"ssd-good-info\\\">\\n <p class=\\\"ssd-good-name\\\">\\n 印尼进口 Tango威化饼干 休闲零食 乳酪夹心威化饼干160g\\n </p>\\n </div>\\n </div>\\n </a>\\n </li>\\n <li>\\n <a >\\n <div class=\\\"ssd-good-item\\\">\\n <div class=\\\"ssd-good-img\\\">\\n <img src=\\\"http://img30.360buyimg.com/n1/jfs/t24409/100/1278216587/342196/7f15ac48/5b580b36Nb9007958.jpg\\\" alt=\\\"印尼进口 Tango威化饼干 休闲零食 巧克力夹心威化饼干125g\\\"/>\\n </div>\\n <div "
                + "class=\\\"ssd-good-info\\\">\\n <p class=\\\"ssd-good-name\\\">\\n 印尼进口 Tango威化饼干 休闲零食 巧克力夹心威化饼干125g\\n </p>\\n </div>\\n </div>\\n </a>\\n </li>\\n <li>\\n <a >\\n <div class=\\\"ssd-good-item\\\">\\n <div class=\\\"ssd-good-img\\\">\\n <img src=\\\"http://img30.360buyimg.com/n1/jfs/t1/26596/26/9557/317836/5c7f4fedE8e6d5730/940a4d2112e62fc3.jpg\\\" alt=\\\"印尼进口 Tango威化饼干 休闲零食 巧克力咔咔脆组合装320g(160g*2盒)\\\"/>\\n </div>\\n <div class=\\\"ssd-good-info\\\">\\n <p class=\\\"ssd-good-name\\\">\\n 印尼进口 Tango威化饼干 休闲零食 巧克力咔咔脆组合装320g(160g*2盒)\\n </p>\\n </div>\\n </div>\\n </a>\\n </li>\\n <li>\\n <a >\\n <div class=\\\"ssd-good-item\\\">\\n <div class=\\\"ssd-good-img\\\">\\n <img src=\\\"http://img30.360buyimg.com/n1/jfs/t1/16950/37/10436/362577/5c8741b5E238f9c4a/ad91f31e0b26302c.jpg\\\" alt=\\\"印尼进口 Tango威化饼干 休闲零食 咔咔脆威化饼干 泡泡糖味80g/盒\\\"/>\\n </div>\\n <div class=\\\"ssd-good-info\\\">\\n <p class=\\\"ssd-good-name\\\">\\n 印尼进口 Tango威化饼干 休闲零食 咔咔脆威化饼干 泡泡糖味80g/盒\\n </p>\\n </div>\\n </div>\\n </a>\\n </li>\\n <li>\\n <a >\\n <div class=\\\"ssd-good-item\\\">\\n <div class=\\\"ssd-good-img\\\">\\n <img src=\\\"http://img30.360buyimg.com/n1/jfs/t20488/87/2361646474/244765/b67e1c77/5b503ba8N075a3501.jpg\\\" alt=\\\"印尼进口 Tango威化饼干 休闲零食 咔咔脆威化饼干 牛奶味160g\\\"/>\\n </div>\\n <div class=\\\"ssd-good-info\\\">\\n <p class=\\\"ssd-good-name\\\">\\n 印尼进口 Tango威化饼干 休闲零食 咔咔脆威化饼干 牛奶味160g\\n </p>\\n </div>\\n </div>\\n </a>\\n </li>\\n <li>\\n <a >\\n <div class=\\\"ssd-good-item\\\">\\n <div class=\\\"ssd-good-img\\\">\\n <img src=\\\"http://img30.360buyimg.com/n1/jfs/t1/12175/32/10619/337857/5c8741e3E45420cc9/b3dab30dd73a7d8a.jpg\\\" alt=\\\"印尼进口 Tango威化饼干 休闲零食 咔咔脆威化饼干 草莓味80g/盒\\\"/>\\n </div>\\n <div class=\\\"ssd-good-info\\\">\\n <p class=\\\"ssd-good-name\\\">\\n 印尼进口 Tango威化饼干 休闲零食 咔咔脆威化饼干 草莓味80g/盒\\n </p>\\n </div>\\n </div>\\n </a>\\n </li>\\n </ul>\\n</div><div class=\\\"ssd-module M15518471203811 animate-M15518471203811\\\" data-id=\\\"M15518471203811\\\">\\n \\n</div>\\n<div "
                + "class=\\\"ssd-module M15518471298134 animate-M15518471298134\\\" data-id=\\\"M15518471298134\\\">\\n \\n</div>\\n<div class=\\\"ssd-module M15518471291853 animate-M15518471291853\\\" data-id=\\\"M15518471291853\\\">\\n \\n</div>\\n<div class=\\\"ssd-module M15518471283932 animate-M15518471283932\\\" data-id=\\\"M15518471283932\\\">\\n \\n</div>\\n\\n</div>\\n<!-- 2019-07-01 10:02:50 --> \\n<style>.ssd-module-wrap{position:relative;margin:0 auto;width:750px;text-align:left;background-color:#fff}.ssd-module-wrap .ssd-module,.ssd-module-wrap .ssd-module-heading{width:750px;position:relative;overflow:hidden}.ssd-module-wrap .ssd-module{background-repeat:no-repeat;background-position:left top;background-size:100% 100%}.ssd-module-wrap .ssd-module-heading{background-repeat:no-repeat;background-position:left center;background-size:100% 100%}.ssd-module-wrap .ssd-module-heading .ssd-module-heading-layout{display:inline-block}.ssd-module-wrap .ssd-module-heading .ssd-widget-heading-ch{float:left;display:inline-block;margin:0 6px 0 15px;height:100%}.ssd-module-wrap .ssd-module-heading .ssd-widget-heading-en{float:left;display:inline-block;margin:0 15px 0 6px;height:100%}.ssd-module-wrap .ssd-widget-pic,.ssd-module-wrap .ssd-widget-text,.ssd-module-wrap .ssd-widget-line,.ssd-module-wrap .ssd-widget-rectangle,.ssd-module-wrap .ssd-widget-circle,.ssd-module-wrap .ssd-widget-triangle,.ssd-module-wrap .ssd-widget-table{position:absolute;overflow:hidden}.ssd-module-wrap .ssd-widget-rectangle{box-sizing:border-box;-moz-box-sizing:border-box;-webkit-box-sizing:border-box}.ssd-module-wrap .ssd-widget-table table{width:100%;height:100%}.ssd-module-wrap .ssd-widget-table td{position:relative;white-space:pre-line;word-break:break-all}.ssd-module-wrap .ssd-widget-pic img{display:block;width:100%;height:100%}.ssd-module-wrap .ssd-widget-text{line-height:1.5;word-break:break-all}.ssd-module-wrap .ssd-widget-text "
                + "span{display:block;overflow:hidden;width:100%;height:100%;padding:0;margin:0;word-break:break-all;word-wrap:break-word;white-space:normal}.ssd-module-wrap .ssd-widget-link{position:absolute;left:0;top:0;width:100%;height:100%;background:transparent;z-index:100}.ssd-module-wrap .ssd-cell-text{position:absolute;top:0;left:0;right:0;width:100%;height:100%;overflow:auto}.ssd-module-wrap .M15541052686741{width:750px; height:492px}\\n.ssd-module-wrap .M15541052686741 ul {\\n padding: 5px;\\n line-height: 1.15;\\n background: #F3F4F7;\\n overflow: hidden;\\n}\\n\\n.ssd-module-wrap .M15541052686741 li {\\n list-style-type: none;\\n padding: 5px;\\n float: left;\\n -moz-box-sizing: border-box;\\n box-sizing: border-box;\\n}\\n\\n.ssd-module-wrap .M15541052686741 .ssd-goods-1 li {\\n width: 100%;\\n}\\n\\n.ssd-module-wrap .M15541052686741 .ssd-goods-2 li {\\n width: 50%;\\n}\\n\\n.ssd-module-wrap .M15541052686741 .ssd-goods-3 li {\\n width: 33.33%;\\n}\\n\\n.ssd-module-wrap .M15541052686741 .ssd-goods-4 li {\\n width: 25%;\\n}\\n\\n.ssd-module-wrap .M15541052686741 a {\\n display: block;\\n overflow: hidden;\\n}\\n\\n.ssd-module-wrap .M15541052686741 .ssd-good-item {\\n background-color: #fff;\\n}\\n\\n.ssd-module-wrap .M15541052686741 .ssd-good-img {\\n position: relative;\\n padding-top: 100%;\\n}\\n\\n.ssd-module-wrap .M15541052686741 .ssd-good-img img {\\n position: absolute;\\n top: 0;\\n left: 0;\\n width: 100%;\\n height: 100%;\\n}\\n\\n.ssd-module-wrap .M15541052686741 .ssd-good-info {\\n padding: 10px;\\n margin: 0;\\n}\\n\\n.ssd-module-wrap .M15541052686741 .ssd-good-name {\\n margin: 0;\\n height: 36px;\\n line-height: 18px;\\n font-size: 14px;\\n color: #333333;\\n display: -webkit-box;\\n overflow: hidden;\\n text-overflow: ellipsis;\\n -webkit-line-clamp: 2;\\n -webkit-box-orient: vertical; \\n}.ssd-module-wrap .M15518471203811{width:750px; background-color:#e9e9e9; background-size:100% 100%; "
                + "background-image:url(http://img30.360buyimg.com/sku/jfs/t1/31717/2/4671/349535/5c7f4f07E899abe1e/9dd81eaf2aac0863.jpg); height:1083px}\\n.ssd-module-wrap .M15518471298134{width:750px; background-color:#e9e9e9; background-size:100% 100%; background-image:url(http://img30.360buyimg.com/sku/jfs/t1/14459/14/9500/215997/5c7f4f06E886e02de/9de0bdce8ff65b3c.jpg); height:786px}\\n.ssd-module-wrap .M15518471291853{width:750px; background-color:#e9e9e9; background-size:100% 100%; background-image:url(http://img30.360buyimg.com/sku/jfs/t1/25970/3/9647/494996/5c7f4f07E79829fc4/41a47699929ca408.jpg); height:1416px}\\n.ssd-module-wrap .M15518471283932{width:750px; background-color:#e9e9e9; background-size:100% 100%; background-image:url(http://img30.360buyimg.com/sku/jfs/t1/79600/29/3390/325766/5d196888Ee80899ac/6b260d5e4eab426d.jpg); height:1020px}\\n</style>";
        // Product description template: one <img> per background image; width is fixed at 750, the module width used in the CSS
        String goodsDescTemplate = "<p><img src=%s data-width=750 data-height=%s /></p>";
        // Regex that extracts the image URL and the height value; each (...) is a capture group
        Pattern pattern = Pattern.compile("background-image:url\\((https?://.*)\\).*height:(\\d+)");
        // After inspecting the raw string: split on "px}" so that each CSS rule ending in a pixel height becomes its own segment.
        // This is what keeps the greedy .* safe: every segment holds at most one background-image rule.
        String[] split = goodsDesc.split("px}");
        for (String s : split) {
            if (s.contains("background-image:url")) { // skip segments without a background image
                Matcher matcher = pattern.matcher(s); // matcher for this segment
                while (matcher.find()) { // find each match in the segment
                    System.out.println("Matched string: " + matcher.group());
                    System.out.println("Extracted image URL: " + matcher.group(1));
                    System.out.println("Extracted height: " + matcher.group(2));
                    stringBuilder.append(String.format(goodsDescTemplate, matcher.group(1), matcher.group(2)));
                }
            }
        }
        System.out.println("Concatenated HTML: " + stringBuilder);
    }
}
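The split on "px}" works for this sample because every background rule ends with a pixel height, but it depends on the greedy `.*` never spanning two rules. A minimal sketch of a variant that skips the pre-split: `[^}]*?` keeps each match inside the rule where it started, and `[^'")\s]+` stops the URL at the closing parenthesis, with optional quotes around the URL tolerated. The class and method names (`BackgroundImageExtractor`, `extract`) are hypothetical, and like the original it still assumes `background-image` comes before `height` inside a rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackgroundImageExtractor {
    // One match per CSS rule: [^}]*? cannot cross a '}', so URL and height come from the same rule.
    // The URL may be wrapped in single or double quotes, e.g. url('...') or url("...").
    private static final Pattern BG_RULE = Pattern.compile(
            "background-image:\\s*url\\(\\s*['\"]?([^'\")\\s]+)['\"]?\\s*\\)[^}]*?height:\\s*(\\d+)px");

    // Returns {url, height} pairs in document order.
    public static List<String[]> extract(String css) {
        List<String[]> result = new ArrayList<>();
        Matcher m = BG_RULE.matcher(css);
        while (m.find()) {
            result.add(new String[] { m.group(1), m.group(2) });
        }
        return result;
    }

    public static void main(String[] args) {
        String css = ".a{width:750px; background-image:url(http://example.com/1.jpg); height:1083px}"
                + ".b{width:750px; height:492px}"
                + ".c{width:750px; background-image:url('http://example.com/2.jpg'); height:786px}";
        for (String[] img : extract(css)) {
            System.out.println(img[0] + " -> " + img[1]);
        }
    }
}
```

Running `main` prints `http://example.com/1.jpg -> 1083` and `http://example.com/2.jpg -> 786`; rule `.b` is skipped because it has no background image. Applied to the `goodsDesc` string above, it should find the same four URL/height pairs as the split-based version.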

3. Console output

Matched string: background-image:url(http://img30.360buyimg.com/sku/jfs/t1/31717/2/4671/349535/5c7f4f07E899abe1e/9dd81eaf2aac0863.jpg); height:1083
Extracted image URL: http://img30.360buyimg.com/sku/jfs/t1/31717/2/4671/349535/5c7f4f07E899abe1e/9dd81eaf2aac0863.jpg
Extracted height: 1083
Matched string: background-image:url(http://img30.360buyimg.com/sku/jfs/t1/14459/14/9500/215997/5c7f4f06E886e02de/9de0bdce8ff65b3c.jpg); height:786
Extracted image URL: http://img30.360buyimg.com/sku/jfs/t1/14459/14/9500/215997/5c7f4f06E886e02de/9de0bdce8ff65b3c.jpg
Extracted height: 786
Matched string: background-image:url(http://img30.360buyimg.com/sku/jfs/t1/25970/3/9647/494996/5c7f4f07E79829fc4/41a47699929ca408.jpg); height:1416
Extracted image URL: http://img30.360buyimg.com/sku/jfs/t1/25970/3/9647/494996/5c7f4f07E79829fc4/41a47699929ca408.jpg
Extracted height: 1416
Matched string: background-image:url(http://img30.360buyimg.com/sku/jfs/t1/79600/29/3390/325766/5d196888Ee80899ac/6b260d5e4eab426d.jpg); height:1020
Extracted image URL: http://img30.360buyimg.com/sku/jfs/t1/79600/29/3390/325766/5d196888Ee80899ac/6b260d5e4eab426d.jpg
Extracted height: 1020
Concatenated HTML: <p><img src=http://img30.360buyimg.com/sku/jfs/t1/31717/2/4671/349535/5c7f4f07E899abe1e/9dd81eaf2aac0863.jpg data-width=750 data-height=1083 /></p><p><img src=http://img30.360buyimg.com/sku/jfs/t1/14459/14/9500/215997/5c7f4f06E886e02de/9de0bdce8ff65b3c.jpg data-width=750 data-height=786 /></p><p><img src=http://img30.360buyimg.com/sku/jfs/t1/25970/3/9647/494996/5c7f4f07E79829fc4/41a47699929ca408.jpg data-width=750 data-height=1416 /></p><p><img src=http://img30.360buyimg.com/sku/jfs/t1/79600/29/3390/325766/5d196888Ee80899ac/6b260d5e4eab426d.jpg data-width=750 data-height=1020 /></p>
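The approach above hardcodes data-width=750 and only works when background-image appears before height in a rule. A minimal sketch, under the assumption that the description CSS is flat (no @media or other nested blocks), that instead parses each rule's declarations into a map, so property order does not matter, reads the width from the rule when it is given in px, and quotes the attribute values. `CssRuleToImgTag`, `toImgTags` and the 750 fallback width are hypothetical choices for illustration, not part of the original code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CssRuleToImgTag {
    // Body of one CSS rule: everything between '{' and the next '}'. Assumes flat CSS (no nested blocks).
    private static final Pattern RULE_BODY = Pattern.compile("\\{([^}]*)\\}");
    // url(...) value, with or without single/double quotes.
    private static final Pattern URL_VALUE = Pattern.compile("url\\(\\s*['\"]?([^'\")\\s]+)['\"]?\\s*\\)");
    // A pixel length such as "1083px" (used with matches(), so the whole value must fit).
    private static final Pattern PX = Pattern.compile("(\\d+)px");
    // Same shape as goodsDescTemplate, but with quoted attribute values.
    private static final String TEMPLATE = "<p><img src=\"%s\" data-width=\"%s\" data-height=\"%s\" /></p>";

    public static String toImgTags(String css) {
        StringBuilder out = new StringBuilder();
        Matcher rule = RULE_BODY.matcher(css);
        while (rule.find()) {
            // Parse "name:value; name:value" into a map; split at the first ':' so URLs keep their own colons.
            Map<String, String> props = new HashMap<>();
            for (String decl : rule.group(1).split(";")) {
                int colon = decl.indexOf(':');
                if (colon > 0) {
                    String name = decl.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                    props.put(name, decl.substring(colon + 1).trim());
                }
            }
            Matcher url = URL_VALUE.matcher(props.getOrDefault("background-image", ""));
            Matcher height = PX.matcher(props.getOrDefault("height", ""));
            if (!url.find() || !height.matches()) {
                continue; // not a background-image rule with a pixel height
            }
            // Fall back to 750 (the module width in the JD sample) when the rule has no px width.
            Matcher width = PX.matcher(props.getOrDefault("width", ""));
            String w = width.matches() ? width.group(1) : "750";
            out.append(String.format(TEMPLATE, url.group(1), w, height.group(1)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String css = ".m1{height:1083px; background-image:url(http://example.com/1.jpg); width:750px}"
                + ".m2{width:750px; height:492px}";
        System.out.println(toImgTags(css));
    }
}
```

Here `main` prints `<p><img src="http://example.com/1.jpg" data-width="750" data-height="1083" /></p>` even though height is declared first; `.m2` has no background image and produces nothing.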
