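Requirements overview

1. Read a file. The file may contain English letters, common punctuation, spaces, and line breaks.

2. Count how many times each English word occurs in the file.

3. Sort the counts.

4. Display the sorted result.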

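Analysis

1. The file can be read line by line with the BufferedReader class.

2. Split each line into words on a set of delimiter characters, and record each word with its count in a Map from java.util. Either HashMap or TreeMap works; TreeMap is the natural choice if the result should be in alphabetical order. This example uses HashMap.

3. Copy the key-value pairs of the Map into a List whose elements are two-element String arrays (the count and the word). Sort the List with its sort method, using the count as the sort key; this requires implementing the compare method of the Comparator interface.

4. Loop over the String[] items in the List and print the result.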
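The choice between the two Map implementations in step 2 can be illustrated with a minimal sketch. The class name `MapChoiceDemo` and the helper `count` are hypothetical and are not part of the original project; the sketch splits on whitespace only, which is simpler than the delimiter set used later in this post.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class MapChoiceDemo {
    // Counts whitespace-separated words into the map supplied by the caller.
    // (Hypothetical helper, for illustration only.)
    static Map<String, Integer> count(String text, Map<String, Integer> counts) {
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "pear apple pear fig apple pear";
        // TreeMap iterates its keys in natural (alphabetical) order.
        System.out.println(count(text, new TreeMap<String, Integer>()));
        // HashMap makes no guarantee about iteration order.
        System.out.println(count(text, new HashMap<String, Integer>()));
    }
}
```

With the TreeMap the first line prints `{apple=2, fig=1, pear=3}`. Since the final result is sorted by count rather than by word, the alphabetical order of a TreeMap is not needed here, which is consistent with the choice of HashMap.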

Partial implementation

Printing the result

public void printSortedWordGroupCount(String filename) {
  List<String[]> result = getSortedWordGroupCount(filename);
  if (result == null) {
    System.out.println("no result");
    return;
  }
  // Each item holds { count, word }; print it as "word: count".
  for (String[] sa : result) {
    System.out.println(sa[1] + ": " + sa[0]);
  }
}

Counting

// Requires java.io.BufferedReader, FileReader, FileNotFoundException, IOException
// and java.util.HashMap, Map, StringTokenizer.
public Map<String, Integer> getWordGroupCount(String filename) {
  // try-with-resources closes the reader even if reading fails.
  try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
    Map<String, Integer> result = new HashMap<String, Integer>();
    String content;
    while ((content = br.readLine()) != null) {
      // '.' is not in the delimiter set, so a token such as "products." keeps its period.
      StringTokenizer st = new StringTokenizer(content, "!&(){}+-= ':;<> /\",");
      while (st.hasMoreTokens()) {
        // Add 1 to the existing count, or start at 1 for a new word.
        result.merge(st.nextToken(), 1, Integer::sum);
      }
    }
    return result;
  } catch (FileNotFoundException e) {
    System.out.println("failed to open file: " + filename);
    e.printStackTrace();
  } catch (IOException e) {
    System.out.println("an exception occurred while reading: " + filename);
    e.printStackTrace();
  }
  return null;
}

Sorting

// Requires java.util.ArrayList, Comparator, List, Map.
public List<String[]> getSortedWordGroupCount(String filename) {
  Map<String, Integer> result = getWordGroupCount(filename);
  if (result == null)
    return null;
  List<String[]> list = new ArrayList<String[]>();
  for (Map.Entry<String, Integer> entry : result.entrySet()) {
    // item[0] = count (as text), item[1] = word
    String[] item = new String[2];
    item[0] = String.valueOf(entry.getValue());
    item[1] = entry.getKey();
    list.add(item);
  }
  // Sort in descending order of count.
  list.sort(new Comparator<String[]>() {
    public int compare(String[] s1, String[] s2) {
      return Integer.compare(Integer.parseInt(s2[0]), Integer.parseInt(s1[0]));
    }
  });
  return list;
}

Automatically generating a simple test input for this program

Random numbers are used to generate random words separated by spaces, with line breaks inserted at random positions.

// Requires java.io.FileWriter, IOException.
public boolean gernerateWordList(String filePosition, long wordNum) {
  // try-with-resources closes the writer even if writing fails.
  try (FileWriter fw = new FileWriter(filePosition)) {
    for (long j = 0; j < wordNum; j++) {
      // A random lowercase word of 1 to 10 letters.
      int length = (int) (Math.random() * 10 + 1);
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < length; i++) {
        char ch = (char) ('a' + (int) (Math.random() * 26));
        sb.append(ch);
      }
      fw.write(sb.toString());
      fw.write(" ");
      // Break the line when the word index is divisible by a random value in 4..11.
      if ((j + 1) % (int) (Math.random() * 8 + 4) == 0)
        fw.write("\n");
    }
    return true;
  } catch (IOException e) {
    e.printStackTrace();
    System.out.println("failed to write file: " + filePosition);
    return false;
  }
}

Results on a real input

Excerpt from the Wikipedia article "Specification"

"Specification" redirects here. For other uses, see Specification (disambiguation).
There are different types of specifications, which generally are mostly types of documents, forms or orders or relates to information in databases. The word specification is defined as "to state explicitly or in detail" or "to be specific". A specification may refer to a type of technical standard (the main topic of this page). Using a word "specification" without additional information to what kind of specification you refer to is confusing and considered bad practice within systems engineering. A requirement specification is a set of documented requirements to be satisfied by a material, design, product, or service.[1] A functional specification is closely related to the requirement specification and may show functional block diagrams. A design or product specification describes the features of the solutions for the Requirement Specification, referring to the designed solution or final produced solution. Sometimes the term specification is here used in connection with a data sheet (or spec sheet). This may be confusing. A data sheet describes the technical characteristics of an item or product as designed and/or produced. It can be published by a manufacturer to help people choose products or to help use the products. A data sheet is not a technical specification as described in this article. A "in-service" or "maintained as" specification, specifies the conditions of a system or object after years of operation, including the effects of wear and maintenance (configuration changes). Specifications may also refer to technical standards, which may be developed by any of various kinds of organizations, both public and private. 
Example organization types include a corporation, a consortium (a small group of corporations), a trade association (an industry-wide group of corporations), a national government (including its military, regulatory agencies, and national laboratories and institutes), a professional association (society), a purpose-made standards organization such as ISO, or vendor-neutral developed generic requirements. It is common for one organization to refer to (reference, call out, cite) the standards of another. Voluntary standards may become mandatory if adopted by a government or business contract.

  

Result excerpt

a: 16
of: 16
or: 15
to: 14
the: 12
specification: 11
is: 7
A: 7
and: 7
may: 6
in: 5
.: 5
as: 5
be: 5
standards: 4
by: 4
technical: 4
sheet: 4
refer: 4
product: 3
Specification: 3
organization: 3
data: 3
types: 3
an: 2
word: 2
functional: 2
association: 2
It: 2
government: 2
are: 2
national: 2
describes: 2
help: 2
information: 2
developed: 2
group: 2
which: 2
including: 2
this: 2
corporations: 2
for: 2
design: 2
designed: 2
requirement: 2
mostly: 1
practice: 1
bad: 1
products.: 1
considered: 1
type: 1
without: 1
years: 1
professional: 1
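
The excerpt contains entries such as `.` and `products.` because the delimiter string passed to StringTokenizer does not include the period, and `A` and `a` are counted separately because the comparison is case-sensitive. One possible alternative, shown here only as a sketch and not as the original project's method, is to split on every run of non-letter characters and lower-case each token. The class name `LetterTokenizer` and the method `tokenize` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LetterTokenizer {
    // Splits a line on every run of characters that are not ASCII letters
    // and lower-cases each token. (Hypothetical helper, for illustration only.)
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split("[^A-Za-z]+")) {
            // split() can yield a leading empty string; skip it.
            if (!token.isEmpty()) {
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("A data sheet (or spec sheet). Choose products."));
    }
}
```

This prints `[a, data, sheet, or, spec, sheet, choose, products]`. The trade-off is that hyphenated terms such as "in-service" are split into two words and digits are dropped, so whether this is an improvement depends on how a "word" is defined for the task.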

  

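Possible extension

Add an overloaded method with an extra parameter so that the user can request the result in reverse order, that is, sorted from the least frequent word to the most frequent.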
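
A minimal sketch of such an overload follows. It is an assumption about how the extension could look, not code from the project: the class `SortOrderDemo`, the method `sortByCount`, and the `ascending` parameter are hypothetical names, and the sketch sorts Map.Entry objects rather than the String[] pairs used above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SortOrderDemo {
    // Default behaviour: most frequent word first.
    static List<Map.Entry<String, Integer>> sortByCount(Map<String, Integer> counts) {
        return sortByCount(counts, false);
    }

    // Overload: ascending == true puts the least frequent word first.
    static List<Map.Entry<String, Integer>> sortByCount(Map<String, Integer> counts,
                                                        boolean ascending) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        Comparator<Map.Entry<String, Integer>> byCount = Map.Entry.comparingByValue();
        entries.sort(ascending ? byCount : byCount.reversed());
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("the", 12);
        counts.put("a", 16);
        counts.put("is", 7);
        System.out.println(sortByCount(counts));       // [a=16, the=12, is=7]
        System.out.println(sortByCount(counts, true)); // [is=7, the=12, a=16]
    }
}
```

Keeping the one-argument version as a thin wrapper means existing callers keep the descending order without any change.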

Project source: https://coding.net/u/jx8zjs/p/wordCount/git

git@git.coding.net:jx8zjs/wordCount.git
