SAS data update duplicates
Update: As Robert Matthews commented, you can alter this code to obtain only the duplicates. That would be more efficient if you have many single observations and few duplicate observations, and you don't care about the singles.
Here is the logic behind the code. We are producing 2 data sets: single for the observations that appear only once, and dup for the duplicate observations. If the first. and last. flags of the BY variable are both 1, the value occurs only once and the observation belongs to single. Otherwise, it's a duplicate, and this observation belongs to dup. Here is what you get as the output data set.
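The single/dup split described above can be sketched outside SAS as well. The following is a minimal Python illustration, not the original DATA step and not its output: the function name `split_single_dup`, the column names `id` and `x`, and the sample rows are all made up for the example. Sorting stands in for PROC SORT, and a BY group of size 1 corresponds to the case where the first. and last. flags are both 1.

```python
from itertools import groupby
from operator import itemgetter


def split_single_dup(rows, key):
    """Split rows into (single, dup) by BY-group size (hypothetical helper)."""
    single, dup = [], []
    # Sorting plays the role of PROC SORT; Python's sort is stable,
    # so rows with equal keys keep their original relative order.
    ordered = sorted(rows, key=itemgetter(key))
    for _, group in groupby(ordered, key=itemgetter(key)):
        group = list(group)
        # A group of one row is the "first. and last. both 1" case.
        if len(group) == 1:
            single.extend(group)
        else:
            dup.extend(group)
    return single, dup


# Made-up sample data, for illustration only.
rows = [
    {"id": 3, "x": "a"},
    {"id": 1, "x": "b"},
    {"id": 3, "x": "c"},
    {"id": 2, "x": "d"},
]
single, dup = split_single_dup(rows, "id")
print(single)
print(dup)
```

Every row of a repeated value lands in dup, so the number of duplicates per value is preserved.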
Notice that it has the correct number of duplicates for each value. Here's an alternative for when the original data is not sorted by the variable whose duplicates are wanted and is big enough to make using proc sort expensive in terms of computer resources.
This single DATA step outputs all duplicates without pre-sorting the data set, yet produces the duplicates in sorted order. Identifying a new record as a duplicate requires either some kind of merge or an index on the "master" table over the keys that define duplicates.
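The idea of finding duplicates in one pass over unsorted data, and still emitting them in key order, can be illustrated with a keyed lookup table. This is a rough Python sketch under stated assumptions, not the DATA step referred to above: a dictionary stands in for a keyed lookup structure, the helper name `duplicates_unsorted` and the sample values are hypothetical, and the sketch assumes the data fit in memory.

```python
from collections import defaultdict


def duplicates_unsorted(rows, key):
    """Return all rows whose key occurs more than once, in key order."""
    groups = defaultdict(list)  # keyed lookup table: key value -> rows seen
    for row in rows:            # a single pass over the unsorted input
        groups[row[key]].append(row)
    out = []
    # Only the distinct keys are sorted, not the full data set.
    for k in sorted(groups):
        if len(groups[k]) > 1:
            out.extend(groups[k])
    return out


# Made-up sample data, for illustration only.
unsorted_rows = [{"id": v} for v in (5, 2, 5, 9, 2, 2)]
dups = duplicates_unsorted(unsorted_rows, "id")
print([r["id"] for r in dups])
```

The trade-off is memory: the lookup table holds every row until the end of the pass, whereas a sort-based approach can work from disk.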
Proc Append is very fast precisely because it skips that duplicate check. See the examples in the online doc. Good luck! Kannan Deivasigamani.
The subsetting if statement merely tests whether the record in sum is found in the hash object.
Unlike the check method, which only confirms the existence of an appropriate match in hash object h, the find method actually retrieves all the data upon successful matching. The data from that file are read in the declare hash statement. Thanks so much, would really appreciate any further advice that you have! I believe the solution I proposed already accommodates actual updates. Have you taken a look at it? The keys all have to be coded in order to match updates, but only the data that changes need be coded.
The remaining columns can be missing. I actually ended up using your example since it accommodates updates. Thank you very much! Below is how I would code what I indicated in my original reply. My method would just be different.
Some people like to use SQL. Some people like to use Hash tables. Take your pick.
Each issue of the magazine contains a form for readers to fill out when they change their names or addresses. To simplify the maintenance job, the form requests that readers send only new information.
New subscribers can start a subscription by completing the entire form. When a form is received, a data entry operator enters the information on the form into a raw data file.
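The update rule implied by this form (existing subscribers send only the fields that changed, new subscribers send everything) can be sketched as follows. This is an illustrative Python sketch, not the SAS program from the original: the helper `apply_updates`, the field names `Name` and `City`, and the sample records are assumptions made for the example, and `None` stands in for a missing value.

```python
def apply_updates(master, transactions, key="SubscriberId"):
    """Apply transactions to master; missing (None) fields leave old values."""
    # Copy each master record so the input list is not modified.
    by_id = {rec[key]: dict(rec) for rec in master}
    for tx in transactions:
        # An unknown key starts a new record (a new subscriber).
        target = by_id.setdefault(tx[key], {key: tx[key]})
        for field, value in tx.items():
            if value is not None:  # only supplied fields overwrite
                target[field] = value
    return [by_id[k] for k in sorted(by_id)]


# Made-up records, for illustration only.
master = [{"SubscriberId": 1001, "Name": "Doe, Jane", "City": "Toledo"}]
transactions = [
    {"SubscriberId": 1001, "Name": None, "City": "Boston"},       # address change
    {"SubscriberId": 1002, "Name": "Roe, Sam", "City": "Austin"},  # new subscriber
]
updated = apply_updates(master, transactions)
print(updated)
```

Matching is done on the key alone, which is why the form can leave every unchanged field blank.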
The mailing list is updated once per month from the raw data file. A subscriber's SubscriberId never changes. The last name appears first, followed by a comma and the first name. This variable is missing for addresses outside the United States and Canada.
The following program creates and displays the first part of this data set. The raw data are already sorted by SubscriberId.