【alg4-查找】符号表
符号表定义:符号表是一种存储键值对的数据结构,支持两种操作:插入(put),即将一组新的键值对存入表中;查找(get),即根据给定的键得到相应的值。符号表最主要的目的就是将一个键和一个值联系起来。符号表是一种典型的抽象数据类型。符号表具体实现无序链表中的顺序查找APIpublic class ST<Key, Value>API功能ST()创建一张符号表void put(Key key,
符号表
定义:符号表是一种存储键值对的数据结构,支持两种操作:插入(put),即将一组新的键值对存入表中;查找(get),即根据给定的键得到相应的值。
符号表最主要的目的就是将一个键和一个值联系起来。
符号表是一种典型的抽象数据类型。
符号表具体实现
无序链表中的顺序查找
API
public class ST<Key, Value>
API | 功能 |
---|---|
ST() | 创建一张符号表 |
void put(Key key, Value val) | 将键值对存入表中(若值为空则将键key从表中删除) |
Value get(Key key) | 获取键key对应的值(若键key不存在则返回null) |
void delete(Key key) | 从表中删去键key(及其对应的值) |
boolean contains(Key key) | 键key在表中是否有对应的值 |
boolean isEmpty() | 表是否为空 |
int size() | 表中的键值对数量 |
Iterable<Key> keys() | 表中的所有键的集合 |
具体实现的设计决策
泛型:对于符号表,我们通过明确地指定查找时键和值的类型来区分它们的不同角色。
重复的键:我们的所有实现都遵循以下规则:
- 每个键只对应着一个值(表中不允许存在重复的键);
- 当用例代码向表中存入的键值对和表中已有的键(及关联的值)冲突时,新的值会替代旧的值。
空键:键不能空。和java中的许多其他机制一样,使用空键会产生一个运行时异常。
空值:
- 我们可以用get()方法是否返回空来测试给定的键是否存在于符号表中;
- 我们可以将空值null作为put()方法的第二个参数存入表中来实现删除。
迭代:为了方便用例处理表中的所有键值,我们有时会在API的第一行加上implements Iterable<Key>
这句话,强制所有实现都必须包含iterator()方法来返回一个实现了hasNext()和next()方法的迭代器。但是对于符号表我们采用了一个更简单的方法。我们定义了keys()方法来返回一个Iterable<Key>对象以方便用例遍历所有的键。这么做是为了和以后的有序符号表的所有方法保持一致,使得用例可以遍历表的键集的一个指定的部分。
符号表中使用的数据结构的一个简单选择是链表,每个结点存储一个键值对。
顺序查找:在查找中我们一个一个地顺序遍历符号表中地所有键并使用equals()方法来寻找与被查找的键匹配的键。
代码
package section3_1;
import java.util.ArrayList;
import java.util.List;
public class SequentialSearchST<Key, Value> {
private int n;
private Node first;
private class Node {
Key key;
Value val;
Node next;
public Node(Key key, Value val, Node next) {
this.key = key;
this.val = val;
this.next = next;
}
}
public SequentialSearchST() {
}
public int size() {
return n;
}
public boolean isEmpty() {
return size() == 0;
}
public boolean contains(Key key) {
if (key == null) throw new IllegalArgumentException("argument to contains() is null");
return get(key) != null;
}
public void delete(Key key) {
if (key == null) throw new IllegalArgumentException("argument to delete() is null");
first = delete(first,key);
}
private Node delete(Node x, Key key) {
if (x == null) return null;
if (key.equals(x.key)) {
n--;
return x.next;
}
x.next = delete(x.next,key);
return x;
}
public Value get(Key key) {
if (key == null) throw new IllegalArgumentException("argument to get() is null");
for (Node x = first;x != null;x = x.next) {
if (key.equals(x.key)) return x.val;
}
return null;
}
public void put(Key key, Value value) {
if (key == null) throw new IllegalArgumentException("first argument to put() is null");
if (value == null) {
delete(key);
return;
}
for (Node x = first;x != null;x = x.next) {
if (key.equals(x.key)) {
x.val = value;
return;
}
}
first = new Node(key,value,first);
n++;
}
public Iterable<Key> keys() {
List<Key> list = new ArrayList<>();
for (Node x = first;x != null;x=x.next) {
list.add(x.key);
}
return list;
}
public static void main(String[] args) {
int minlen = 1;
SequentialSearchST<String,Integer> st = new SequentialSearchST<>();
String[] words = new String[]{
"it","was","the","best","of","times","it","was","the","worst","of","times",
"it","was","the","age","of","wisdom","it","was","the","age","of","foolishness",
"it","was","the","epoch","of","belief","it","was","the","epoch","of","incredulity",
"it","was","the","season","of","light","it","was","the","season","of","darkness",
"it","was","the","spring","of","hope","it","was","the","winter","of","despair"
};
int idx = 0;
while (idx < words.length) {
String word = words[idx];
if (word.length() < minlen) {
idx++;
continue;
}
if (!st.contains(word)) {
st.put(word,1);
} else {
st.put(word,st.get(word)+1);
}
idx++;
}
System.out.println(st.size());
// String max = " ";
// st.put(max,0);
idx = 0;
for (String w:st.keys()) {
// if (st.get(w) > st.get(max)) {
// max = w;
// }
System.out.println(w);
idx++;
}
// System.out.println(max + " " + st.get(max));
System.out.println("idx="+idx);
}
}
在含有N对键值的基于(无序)链表的符号表中,未命中的查找和插入操作都需要N次比较。命中的查找在最坏情况下需要N次比较。特别地,向一个空表中插入N个不同的键需要~N^2/2次比较。
基于链表的实现以及顺序查找是非常低效的,无法满足处理庞大输入问题的需求。
有序数组中的二分查找
API
典型应用程序中,键都是Comparable的对象,因此可以使用a.compareTo(b)
来比较a和b两个键。许多符号表的实现都利用了Comparable接口带来的键的有序性来更好地实现put()和get()方法。更重要的是在这些实现中,我们可以认为符号表都会保持键的有序并大大扩展它的API,根据键的相对位置定义更多实用的操作。
public class ST<Key extends Comparable<Key>, Value>
API | 功能 |
---|---|
ST() | 创建一张符号表 |
void put(Key key, Value val) | 将键值对存入表中(若值为空则将键key从表中删除) |
Value get(Key key) | 获取键key对应的值(若键key不存在则返回null) |
void delete(Key key) | 从表中删去键key(及其对应的值) |
boolean contains(Key key) | 键key在表中是否有对应的值 |
boolean isEmpty() | 表是否为空 |
int size() | 表中的键值对数量 |
Key min() | 最小的键 |
Key max() | 最大的键 |
Key floor(Key key) | 小于等于key的最大键 |
Key ceiling(Key key) | 大于等于key的最小键 |
int rank(Key key) | 下于key的键的数量 |
Key select(int k) | 排名为k的键 |
void deleteMin() | 删除最小的键 |
void deleteMax() | 删除最大的键 |
int size(Key lo, Key hi) | [lo…hi]之间键的数量 |
Iterable<Key> keys(Key lo,Key hi) | [lo…hi]之间的所有键,已排序 |
Iterable<Key> keys() | 表中的所有键的集合,已排序 |
有序符号表API的实现,它使用的数据结构是一对平行的数组,一个存储键一个存储值。保证数组中Comparable类型的键有序,然后使用数组的索引来高效地实现get()和其他操作。
使用基于有序数组的符号表实现的索引用例的轨迹:
代码
package section3_1;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
public class BinarySearchST <Key extends Comparable<Key>,Value> {
private Key[] keys;
private Value[] values;
private int N;
public BinarySearchST(int capacity) {
keys = (Key[]) new Comparable[capacity];
values = (Value[]) new Object[capacity];
}
private void resize(int capacity) {
assert capacity >= N;
Key[] tmpk = (Key[]) new Comparable[capacity];
Value[] tmpv = (Value[]) new Object[capacity];
for (int i = 0;i < N;i++) {
tmpk[i] = keys[i];
tmpv[i] = values[i];
}
keys = tmpk;
values = tmpv;
}
public int size() {
return N;
}
public int size(Key lo, Key hi) {
if (lo == null) throw new IllegalArgumentException("first argument to size() is null");
if (hi == null) throw new IllegalArgumentException("second argument to size() is null");
if (lo.compareTo(hi) > 0) return 0;
if (contains(hi)) return rank(hi) - rank(lo) + 1;
else return rank(hi) - rank(lo);
}
public boolean isEmpty() {
return size() == 0;
}
public int rank(Key key) {
if (key == null) throw new IllegalArgumentException("argument to rank() is null");
int lo = 0;
int hi = N-1;
while (lo<=hi) {
int mid = lo + (hi - lo) / 2;
int cmp = key.compareTo(keys[mid]);
if (cmp < 0) {
hi = mid - 1;
} else if (cmp > 0) {
lo = mid + 1;
} else {
return mid;
}
}
return lo;
}
public Value get(Key key) {
if (key == null) throw new IllegalArgumentException("argument to get() is null");
if (isEmpty()) return null;
int i = rank(key);
if (i < N && key.compareTo(keys[i]) == 0) {
return values[i];
} else {
return null;
}
}
public void put(Key key, Value value) {
if (key == null) throw new IllegalArgumentException("first argument to put() is null");
if (value == null) {
delete(key);
return;
}
int i = rank(key);
if (i < N && key.compareTo(keys[i]) == 0) {
values[i] = value;
return;
}
if (N == keys.length) resize(keys.length*2);
for (int j = N;j>i;j--) {
keys[j] = keys[j-1];
values[j] = values[j-1];
}
keys[i] = key;
values[i] = value;
N++;
}
public void delete(Key key) {
if (key == null) throw new IllegalArgumentException("argument to delete() is null");
if (isEmpty()) return;
int i = rank(key);
if (i == N && key.compareTo(keys[i]) != 0) return;
for (int j = i;j<N-1;j++) {
keys[j] = keys[j+1];
values[j] = values[j+1];
}
N--;
keys[N-1] = null;
values[N-1] = null;
if (N > 0 && N == keys.length/4) resize(keys.length/2);
}
public void deleteMin() {
if (isEmpty()) throw new NoSuchElementException("Symbol table underflow error");
delete(min());
}
public void deleteMax() {
if (isEmpty()) throw new NoSuchElementException("Symbol table underflow error");
delete(max());
}
public Key min() {
if (isEmpty()) throw new NoSuchElementException("called min() with empty symbol table");
return keys[0];
}
public Key max() {
if (isEmpty()) throw new NoSuchElementException("called max() with empty symbol table");
return keys[N-1];
}
public Key select(int k) {
if (k < 0 || k >= size()) {
throw new IllegalArgumentException("called select() with invalid argument: " + k);
}
return keys[k];
}
public boolean contains(Key key) {
if (key == null) throw new IllegalArgumentException("argument to contains() is null");
return get(key) != null;
}
public Key ceiling(Key key) {
if (key == null) throw new IllegalArgumentException("argument to ceiling() is null");
int i = rank(key);
if (i == N) return null;
else return keys[i];
}
public Key floor(Key key) {
if (key == null) throw new IllegalArgumentException("argument to floor() is null");
int i = rank(key);
if (i<N && key.compareTo(keys[i])==0) return key;
else if (i == 0) return null;
else return keys[i-1];
}
public Iterable<Key> keys() {
return keys(min(),max());
}
public Iterable<Key> keys(Key lo, Key hi) {
if (lo == null) throw new IllegalArgumentException("first argument to keys() is null");
if (hi == null) throw new IllegalArgumentException("second argument to keys() is null");
List<Key> list = new ArrayList<>();
for (int i = rank(lo);i<rank(hi);i++) {
list.add(keys[i]);
}
if (contains(hi)) {
list.add(keys[rank(hi)]);
}
return list;
}
public static void main(String[] args) {
int minlen = 1;
BinarySearchST<String,Integer> bst = new BinarySearchST<>(100);
String[] words = new String[]{
"it","was","the","best","of","times","it","was","the","worst","of","times",
"it","was","the","age","of","wisdom","it","was","the","age","of","foolishness",
"it","was","the","epoch","of","belief","it","was","the","epoch","of","incredulity",
"it","was","the","season","of","light","it","was","the","season","of","darkness",
"it","was","the","spring","of","hope","it","was","the","winter","of","despair"
};
int idx = 0;
while (idx < words.length) {
String word = words[idx];
if (word.length() < minlen) {
idx++;
continue;
}
if (!bst.contains(word)) {
bst.put(word,1);
} else {
bst.put(word,bst.get(word)+1);
}
idx++;
}
String max = " ";
bst.put(max,0);
for (String w:bst.keys()) {
if (bst.get(w) > bst.get(max)) {
max = w;
}
}
System.out.println(max + " " + bst.get(max));
}
}
我们使用有序数组存储键的原因是,经典二分查找法能够根据数组的索引大大减少每次查找所需的比较次数。
对于二分查找的分析
在N个键的有序数组中进行二分查找最多需要lgN+1次比较(无论是否成功)。
尽管能够保证查找所需的时间是对数级别的,BinarySearchST仍然无法支持我们处理大型问题,因为put()方法还是太慢了。二分查找减少了比较的次数但无法减少运行所需时间,因为它无法改变以下事实:在键是随机排序的情况下,构造一个基于有序数组的符号表所需要访问数组的次数是数组长度的平方级别。
向大小为N的有序数组中插入一个新的元素在最坏情况下需要访问~2N次数组,因此向一个空符号表中插入N个元素在最坏情况下需要访问~N^2次数组。
总结
现代应用需要同时能够支持高效的查找和插入两种操作的符号表实现。也就是说,我们需要在构造庞大的符号表的同时能够任意插入(也许还有删除)键值对,同时也要能够完成查找操作。
更多推荐
所有评论(0)