[+/KM] Maintaining CSV Files in the DB

빙응's Study Blog

빙응이 2025. 1. 21. 20:17

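📝Background

As a personal project, I planned a service that lets users search for the locations of hospitals and pharmacies and share their treatment records from those hospitals.

To get the hospital and pharmacy locations, I decided to use CSV files from the public data portal (공공데이터 포탈). The CSV data is kept in the DB, and the DB has to be refreshed correctly whenever newer data arrives.

The end goal is to store the CSV files in the DB and, when newer data arrives, handle duplicates and bring the existing rows up to date.

📝Storing the CSV Files

Before diving in, this is the RDB schema of my project.

The data stored from the CSV files is place information. Pharmacies and hospitals are stored in the same format and are distinguished by a type column.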

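// Imports assume Jakarta Persistence, Lombok, and JTS (org.locationtech.jts) for the spatial type.
import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.EnumType;
import jakarta.persistence.Enumerated;
import jakarta.persistence.GeneratedValue;
import jakarta.persistence.GenerationType;
import jakarta.persistence.Id;
import jakarta.persistence.Index;
import jakarta.persistence.Table;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Getter;
import lombok.NoArgsConstructor;
import org.locationtech.jts.geom.Point;

@Entity
@Builder
@Getter
@AllArgsConstructor
@NoArgsConstructor
@Table(name = "place", indexes = {
    @Index(name = "idx_place_coordinate", columnList = "coordinate", unique = false)
})
public class Place {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String name;

    // Distinguishes hospitals from pharmacies
    @Enumerated(EnumType.STRING)
    private Place_type place_type;

    private String address;

    private String tel;

    // Longitude/latitude stored as a spatial POINT (SRID 4326)
    @Column(nullable = false, columnDefinition = "POINT SRID 4326")
    private Point coordinate;

}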

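The data has no proper unique identifier, so I plan to detect duplicates using name + address.

To store the CSV files, the approach I chose is to replace the files by hand and have the server read them every time it starts.

The reason is that these CSV files are not served through a public-data API; the institution posts them on a bulletin board.

So I will swap the files in manually whenever new data is posted.

For that reason, accuracy matters more than efficiency for this job.

Below is the code for the initial design.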

// Imports assume OpenCSV, JTS, Lombok, and Spring Boot.
import com.opencsv.CSVReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.locationtech.jts.geom.Coordinate;
import org.locationtech.jts.geom.GeometryFactory;
import org.locationtech.jts.geom.Point;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.core.annotation.Order;

@Slf4j
@RequiredArgsConstructor
@Order(1)
@DummyDataInit // project-specific annotation; its definition is not shown in this post
public class PlaceInitializer implements ApplicationRunner {
    private final PlaceRepository placeRepository;

    @Override
    public void run(ApplicationArguments args) throws Exception {
        if (placeRepository.count() > 0) {
            log.info("[Place] Dummy data already exists");
        } else {
            importPlace();
        }
    }

    private void importPlace() {
        // Hospitals and pharmacies are read from separate CSV files with different column layouts
        importCsvToPlace("data/병원정보.csv", 1, 28, 29, 3, 10, 11);
        importCsvToPlace("data/약국정보.csv", 1, 13, 14, 3, 10, 11);
    }

    private void importCsvToPlace(String filePath, int nameIdx, int longitudeIdx, int latitudeIdx, int placeTypeIdx,
                                  int addressIdx, int telIdx) {
        // All three resources are closed automatically; a missing file fails fast with a clear message
        try (InputStream inputStream = Objects.requireNonNull(
                 getClass().getClassLoader().getResourceAsStream(filePath), "CSV resource not found: " + filePath);
             InputStreamReader reader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
             CSVReader csvReader = new CSVReader(reader)) {

            List<Place> places = new ArrayList<>();
            String[] nextLine;

            // The first line is the header, so skip it
            csvReader.readNext();

            while ((nextLine = csvReader.readNext()) != null) {
                String name = nextLine[nameIdx];
                String placeTypeStr = nextLine[placeTypeIdx];
                String address = nextLine[addressIdx];
                String tel = nextLine[telIdx];

                Double longitude = null;
                Double latitude = null;

                try {
                    // Coordinates (longitude, latitude)
                    longitude = Double.parseDouble(nextLine[longitudeIdx]);
                    latitude = Double.parseDouble(nextLine[latitudeIdx]);
                } catch (NumberFormatException e) {
                    continue; // skip rows without usable coordinates
                }

                Place place = Place.builder()
                    .name(name)
                    .place_type(Place_type.valueOf(placeTypeStr))  // CSV value must match a Place_type constant
                    .address(address)
                    .tel(tel)
                    .coordinate(createPoint(latitude, longitude))
                    .build();

                places.add(place);
            }
            placeRepository.saveAll(places);
        } catch (Exception e) {
            log.error("Error while processing CSV file", e);
            throw new RuntimeException("Error while processing CSV file", e);
        }
    }

    private Point createPoint(double latitude, double longitude) {
        GeometryFactory geometryFactory = new GeometryFactory();
        // JTS coordinates are (x, y), i.e. (longitude, latitude)
        Point point = geometryFactory.createPoint(new Coordinate(longitude, latitude));
        point.setSRID(4326);
        return point;
    }
}

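As the code shows, it simply reads the CSV files and saves every row.

It does not handle duplicates or refreshes yet, and its main problem is that it has no strategy for foreign keys.

As the RDB schema above shows, my service has subscriptions and shared treatment records. Both reference a place's ID as a foreign key, so refreshing the data the way this code imports it would break the foreign key constraints. That is why duplicate detection and in-place updates are essential.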
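To make the foreign key concern concrete, here is a minimal in-memory sketch, not the project's code. `PlaceTable` is a hypothetical stand-in for the place table, and it assumes the id counter keeps counting after a delete, as auto-increment columns typically do. It shows only that a wipe-and-reimport hands out new ids, so anything that stored the old id no longer points at a row.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReimportIdDemo {
    // Hypothetical in-memory stand-in for the place table with an IDENTITY-style id.
    static final class PlaceTable {
        private long nextId = 1;
        final Map<Long, String> rows = new LinkedHashMap<>();

        long insert(String name) {
            long id = nextId++;
            rows.put(id, name);
            return id;
        }

        void deleteAll() {
            rows.clear(); // the id counter keeps counting, so old ids are never reused
        }
    }

    public static void main(String[] args) {
        PlaceTable places = new PlaceTable();
        // Imagine a subscription row that stores this id as its foreign key.
        long subscribedPlaceId = places.insert("Sample Pharmacy");

        // Naive refresh: wipe the table and import the same CSV row again.
        places.deleteAll();
        long newId = places.insert("Sample Pharmacy");

        System.out.println("subscription points at id " + subscribedPlaceId
            + ", place now has id " + newId
            + ", reference still valid: " + places.rows.containsKey(subscribedPlaceId));
    }
}
```

In a real database the foreign key constraint would normally reject the delete or cascade it, which is equally unwanted here. Either way, the existing row has to be updated in place rather than replaced.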

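📝Duplicate Detection and Refresh When the CSV Is Updated

There are now two things to handle.

Updating rows that already exist (duplicates)
Finding rows that have been removed and deleting them

📌Updating Duplicate Data

In theory, duplicates could be updated by simply calling save on every incoming row, but my CSV files have no unique identifier column. So rather than adding a separate column, I update duplicates by matching on name + address.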
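As a small illustration of matching on name + address, the sketch below mirrors the `generateUniqueKey` method from the full code later in this post (trim both parts, join with `_`, collapse whitespace runs). The class name and the sample rows are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PlaceKeyDemo {
    // Same normalization as generateUniqueKey in the full code:
    // trim both parts, join with "_", then replace every whitespace run with "_".
    static String uniqueKey(String name, String address) {
        return (name.trim() + "_" + address.trim()).replaceAll("\\s+", "_");
    }

    public static void main(String[] args) {
        // Hypothetical rows: {name, address, tel}
        String[][] rows = {
            {"Sample Pharmacy", "12 Main St", "tel-1"},
            {" Sample  Pharmacy ", "12 Main St", "tel-2"}, // same place, messy spacing
            {"Sample Pharmacy", "99 Side Rd", "tel-3"},    // same name, different address
        };

        Map<String, String> firstTelByKey = new LinkedHashMap<>();
        for (String[] row : rows) {
            // Keep the first row seen for each key, like the containsKey check in the post
            firstTelByKey.putIfAbsent(uniqueKey(row[0], row[1]), row[2]);
        }
        System.out.println(firstTelByKey);
    }
}
```

One thing to keep in mind with this scheme: because spaces and the separator both become `_`, a name of "A B" with address "C" yields the same key as name "A" with address "B C". For hospital and pharmacy data this is unlikely to matter, but it is a limit of using a concatenated key instead of a real identifier.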

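The steps are as follows.

Read the CSV files and store the rows in a Map<UniqueKey, Place>
Load the entire table
Use a Stream to check each row for duplicates (is the key in the Map, and if so, is the object the same?)
Collect the rows found by that check into an update list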
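The steps above can be sketched without Spring or JPA. In this minimal sketch, `Row`, `Plan`, and `diff` are hypothetical names, `Row` stands in for the `Place` entity, and only the phone number is compared to keep it short. The point it illustrates is that a changed row carries over the existing id, so it is updated in place instead of inserted again.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class PlaceDiffDemo {
    // Hypothetical stand-in for the Place entity; id is null for rows that only exist in the CSV.
    record Row(Long id, String name, String address, String tel) {}

    // Rows to insert (no id yet) and rows to update (existing id kept).
    record Plan(List<Row> inserts, List<Row> updates) {}

    static Plan diff(Map<String, Row> csvByKey, Map<String, Row> dbByKey) {
        List<Row> inserts = new ArrayList<>();
        List<Row> updates = new ArrayList<>();
        for (Map.Entry<String, Row> entry : csvByKey.entrySet()) {
            Row incoming = entry.getValue();
            Row existing = dbByKey.get(entry.getKey());
            if (existing == null) {
                inserts.add(incoming); // key not in the DB: new row
            } else if (!Objects.equals(existing.tel(), incoming.tel())) {
                // key in the DB but content differs: reuse the existing id
                updates.add(new Row(existing.id(), incoming.name(), incoming.address(), incoming.tel()));
            }
            // key in the DB and content equal: nothing to do
        }
        return new Plan(inserts, updates);
    }

    public static void main(String[] args) {
        Map<String, Row> db = new LinkedHashMap<>();
        db.put("A_addr1", new Row(1L, "A", "addr1", "111"));
        db.put("B_addr2", new Row(2L, "B", "addr2", "222"));

        Map<String, Row> csv = new LinkedHashMap<>();
        csv.put("A_addr1", new Row(null, "A", "addr1", "999")); // tel changed
        csv.put("B_addr2", new Row(null, "B", "addr2", "222")); // unchanged
        csv.put("C_addr3", new Row(null, "C", "addr3", "333")); // new

        Plan plan = diff(csv, db);
        System.out.println("insert: " + plan.inserts());
        System.out.println("update: " + plan.updates());
    }
}
```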

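Read the CSV files and store the rows in a Map<UniqueKey, Place>

First, I added a duplicate check to the existing CSV-reading method. The reason is that I found the same record appearing two or more times in the files, caused by duplicate registrations when hospital and pharmacy businesses were registered.

Also, some pharmacies in the pharmacy CSV file have no coordinates, so I handled that case as well.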

    private void syncPlaceData(String filePath, int nameIdx, int longitudeIdx, int latitudeIdx, int placeTypeIdx,
                               int addressIdx, int telIdx, Map<String, Place> csvDataMap) {
        try (
             // A missing file fails fast with a clear message
             InputStream inputStream = Objects.requireNonNull(
                 getClass().getClassLoader().getResourceAsStream(filePath), "CSV resource not found: " + filePath);
             InputStreamReader reader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
             CSVReader csvReader = new CSVReader(reader)) {

            csvReader.readNext(); // skip the header line

            String[] nextLine;
            while ((nextLine = csvReader.readNext()) != null) {
                String name = nextLine[nameIdx];
                String placeTypeStr = nextLine[placeTypeIdx];
                String address = nextLine[addressIdx];
                String tel = nextLine[telIdx];

                Double longitude = null;
                Double latitude = null;

                try {
                    longitude = Double.parseDouble(nextLine[longitudeIdx]);
                    latitude = Double.parseDouble(nextLine[latitudeIdx]);
                } catch (NumberFormatException e) {
                    continue; // skip rows without usable coordinates
                }

                String uniqueKey = generateUniqueKey(name, address); // build the unique key
                if (csvDataMap.containsKey(uniqueKey)) {
                    // The first row seen for a key wins; later duplicates are only logged
                    log.warn("Duplicate row found - unique key: {}", uniqueKey);
                    continue;
                }
                Place place = Place.builder()
                    .name(name)
                    .place_type(Place_type.valueOf(placeTypeStr))
                    .address(address)
                    .tel(tel)
                    .coordinate(createPoint(latitude, longitude))
                    .build();

                csvDataMap.put(uniqueKey, place);
            }

        } catch (Exception e) {
            log.error("Error while processing CSV file", e);
            throw new RuntimeException("Error while processing CSV file", e);
        }
    }
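The coordinate handling in the method above can also be isolated into a small helper, which makes the "skip rows without coordinates" rule easy to check on its own. This is only a sketch: `CoordinateParseDemo`, `LonLat`, and `parse` are hypothetical names, and treating blank or null cells as missing is an assumption about how the files mark absent coordinates.

```java
import java.util.Optional;

final class CoordinateParseDemo {
    // Hypothetical value type for a parsed longitude/latitude pair.
    record LonLat(double longitude, double latitude) {}

    // Returns an empty Optional when either cell is missing or not a number,
    // so the caller can skip the row instead of catching exceptions inline.
    static Optional<LonLat> parse(String longitudeCell, String latitudeCell) {
        if (longitudeCell == null || latitudeCell == null
                || longitudeCell.isBlank() || latitudeCell.isBlank()) {
            return Optional.empty();
        }
        try {
            return Optional.of(new LonLat(
                Double.parseDouble(longitudeCell.trim()),
                Double.parseDouble(latitudeCell.trim())));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("127.0276", "37.4979").isPresent()); // both cells are numeric
        System.out.println(parse("", "37.4979").isPresent());         // longitude cell is empty
        System.out.println(parse("n/a", "37.4979").isPresent());      // longitude cell is not a number
    }
}
```

In the reading loop, the caller would then continue to the next row whenever the returned Optional is empty.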

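Use a Stream to check each row for duplicates (is the key in the Map, and if so, is the object the same?)

I wrote a method that uses a Stream to check, for each row read from the CSV, whether its key already exists in the map of existing rows and, if it does, whether the two objects are the same.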

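    private void updateDatabase(Map<String, Place> csvDataMap) {
        // Load the existing rows
        List<Place> existingPlaces = placeRepository.findAll();

        // Index the existing rows by unique key.
        // The merge function keeps the first row so toMap does not throw if the table already holds duplicates.
        Map<String, Place> existingPlacesMap = existingPlaces.stream()
            .collect(Collectors.toMap(
                place -> generateUniqueKey(place.getName(), place.getAddress()),
                place -> place,
                (first, second) -> first
            ));

        // Find the rows to insert or update
        List<Place> placesToSave = csvDataMap.entrySet().stream()
            .filter(entry -> !existingPlacesMap.containsKey(entry.getKey()) || isUpdated(entry.getKey(), entry.getValue(), existingPlacesMap))
            .map(entry -> {
                Place incoming = entry.getValue();
                Place existing = existingPlacesMap.get(entry.getKey());
                if (existing == null) {
                    return incoming; // new row: the DB assigns the id
                }
                // Changed row: reuse the existing id so saveAll updates it in place
                // (without the id, a second row would be inserted and foreign keys would point at stale data)
                return Place.builder()
                    .id(existing.getId())
                    .name(incoming.getName())
                    .place_type(incoming.getPlace_type())
                    .address(incoming.getAddress())
                    .tel(incoming.getTel())
                    .coordinate(incoming.getCoordinate())
                    .build();
            })
            .collect(Collectors.toList());

        placeRepository.saveAll(placesToSave);
    }

    private boolean isUpdated(String key, Place newPlace, Map<String, Place> existingPlacesMap) {
        Place existingPlace = existingPlacesMap.get(key);

        if (existingPlace == null) {
            return true; // no existing row, so it has to be saved
        }

        // Field-by-field comparison
        return !Objects.equals(existingPlace.getName(), newPlace.getName())
            || !Objects.equals(existingPlace.getPlace_type(), newPlace.getPlace_type())
            || !Objects.equals(existingPlace.getAddress(), newPlace.getAddress())
            || !Objects.equals(existingPlace.getTel(), newPlace.getTel())
            || !existingPlace.getCoordinate().equalsExact(newPlace.getCoordinate());
    }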

📌Deleting Removed Data from the Table

Handling removed data is simpler than updating duplicates: delete every existing row whose key is not in the Map built from the CSV.

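    private void updateDatabase(Map<String, Place> csvDataMap) {
        // Load the existing rows
        List<Place> existingPlaces = placeRepository.findAll();

        // Index the existing rows by unique key.
        // The merge function keeps the first row so toMap does not throw if the table already holds duplicates.
        Map<String, Place> existingPlacesMap = existingPlaces.stream()
            .collect(Collectors.toMap(
                place -> generateUniqueKey(place.getName(), place.getAddress()),
                place -> place,
                (first, second) -> first
            ));

        // Find the rows to insert or update
        List<Place> placesToSave = csvDataMap.entrySet().stream()
            .filter(entry -> !existingPlacesMap.containsKey(entry.getKey()) || isUpdated(entry.getKey(), entry.getValue(), existingPlacesMap))
            .map(entry -> {
                Place incoming = entry.getValue();
                Place existing = existingPlacesMap.get(entry.getKey());
                if (existing == null) {
                    return incoming; // new row: the DB assigns the id
                }
                // Changed row: reuse the existing id so saveAll updates it in place
                return Place.builder()
                    .id(existing.getId())
                    .name(incoming.getName())
                    .place_type(incoming.getPlace_type())
                    .address(incoming.getAddress())
                    .tel(incoming.getTel())
                    .coordinate(incoming.getCoordinate())
                    .build();
            })
            .collect(Collectors.toList());

        // Find the rows to delete: present in the DB but missing from the CSV
        List<Place> placesToDelete = existingPlaces.stream()
            .filter(place -> !csvDataMap.containsKey(generateUniqueKey(place.getName(), place.getAddress())))
            .collect(Collectors.toList());

        placeRepository.saveAll(placesToSave);
        placeRepository.deleteAll(placesToDelete);

        log.info("[Place] Data sync complete - inserted/updated: {}, deleted: {}", placesToSave.size(), placesToDelete.size());
    }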
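Stripped of the entity and repository, the delete step is a set difference between the keys in the DB and the keys in the CSV. A minimal sketch, with `StaleKeyDemo` and `staleKeys` as hypothetical names and made-up keys:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

final class StaleKeyDemo {
    // Returns the keys that exist in the DB but no longer appear in the CSV, in DB order.
    static List<String> staleKeys(Collection<String> dbKeys, Set<String> csvKeys) {
        List<String> stale = new ArrayList<>();
        for (String key : dbKeys) {
            if (!csvKeys.contains(key)) {
                stale.add(key);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<String> dbKeys = List.of("A_addr1", "B_addr2", "C_addr3");
        Set<String> csvKeys = Set.of("A_addr1", "C_addr3");
        System.out.println(staleKeys(dbKeys, csvKeys)); // prints [B_addr2]
    }
}
```

Note that the difference is taken against the map built while reading, not against the raw file. A row that was skipped during reading, for example because its coordinates were missing, is absent from the map and is therefore treated as removed.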

Here is the full code.

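// Imports assume OpenCSV, JTS, Lombok, and Spring Boot.
import com.opencsv.CSVReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.locationtech.jts.geom.Coordinate;
import org.locationtech.jts.geom.GeometryFactory;
import org.locationtech.jts.geom.Point;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.core.annotation.Order;

@Slf4j
@RequiredArgsConstructor
@Order(1)
@DummyDataInit // project-specific annotation; its definition is not shown in this post
public class PlaceInitializer implements ApplicationRunner {
    private final PlaceRepository placeRepository;

    @Override
    public void run(ApplicationArguments args) throws Exception {
        if (placeRepository.count() > 0) {
            log.info("[Place] Refreshing existing data");
        } else {
            log.info("[Place] Inserting dummy data");
        }
        importPlace();
    }

    private void importPlace() {
        // Both files are read into one map so the DB is compared against the combined data
        Map<String, Place> csvDataMap = new HashMap<>();
        syncPlaceData("data/병원정보.csv", 1, 28, 29, 3, 10, 11, csvDataMap);
        syncPlaceData("data/약국정보.csv", 1, 13, 14, 3, 10, 11, csvDataMap);
        updateDatabase(csvDataMap);
    }

    private void syncPlaceData(String filePath, int nameIdx, int longitudeIdx, int latitudeIdx, int placeTypeIdx,
                               int addressIdx, int telIdx, Map<String, Place> csvDataMap) {
        try (
             // A missing file fails fast with a clear message
             InputStream inputStream = Objects.requireNonNull(
                 getClass().getClassLoader().getResourceAsStream(filePath), "CSV resource not found: " + filePath);
             InputStreamReader reader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
             CSVReader csvReader = new CSVReader(reader)) {

            csvReader.readNext(); // skip the header line

            String[] nextLine;
            while ((nextLine = csvReader.readNext()) != null) {
                String name = nextLine[nameIdx];
                String placeTypeStr = nextLine[placeTypeIdx];
                String address = nextLine[addressIdx];
                String tel = nextLine[telIdx];

                Double longitude = null;
                Double latitude = null;

                try {
                    longitude = Double.parseDouble(nextLine[longitudeIdx]);
                    latitude = Double.parseDouble(nextLine[latitudeIdx]);
                } catch (NumberFormatException e) {
                    continue; // skip rows without usable coordinates
                }

                String uniqueKey = generateUniqueKey(name, address); // build the unique key
                if (csvDataMap.containsKey(uniqueKey)) {
                    // The first row seen for a key wins; later duplicates are only logged
                    log.warn("Duplicate row found - unique key: {}", uniqueKey);
                    continue;
                }
                Place place = Place.builder()
                    .name(name)
                    .place_type(Place_type.valueOf(placeTypeStr))
                    .address(address)
                    .tel(tel)
                    .coordinate(createPoint(latitude, longitude))
                    .build();

                csvDataMap.put(uniqueKey, place);
            }

        } catch (Exception e) {
            log.error("Error while processing CSV file", e);
            throw new RuntimeException("Error while processing CSV file", e);
        }
    }

    private void updateDatabase(Map<String, Place> csvDataMap) {
        // Load the existing rows
        List<Place> existingPlaces = placeRepository.findAll();

        // Index the existing rows by unique key.
        // The merge function keeps the first row so toMap does not throw if the table already holds duplicates.
        Map<String, Place> existingPlacesMap = existingPlaces.stream()
            .collect(Collectors.toMap(
                place -> generateUniqueKey(place.getName(), place.getAddress()),
                place -> place,
                (first, second) -> first
            ));

        // Find the rows to insert or update
        List<Place> placesToSave = csvDataMap.entrySet().stream()
            .filter(entry -> !existingPlacesMap.containsKey(entry.getKey()) || isUpdated(entry.getKey(), entry.getValue(), existingPlacesMap))
            .map(entry -> {
                Place incoming = entry.getValue();
                Place existing = existingPlacesMap.get(entry.getKey());
                if (existing == null) {
                    return incoming; // new row: the DB assigns the id
                }
                // Changed row: reuse the existing id so saveAll updates it in place
                // (without the id, a second row would be inserted and foreign keys would point at stale data)
                return Place.builder()
                    .id(existing.getId())
                    .name(incoming.getName())
                    .place_type(incoming.getPlace_type())
                    .address(incoming.getAddress())
                    .tel(incoming.getTel())
                    .coordinate(incoming.getCoordinate())
                    .build();
            })
            .collect(Collectors.toList());

        // Find the rows to delete: present in the DB but missing from the CSV
        List<Place> placesToDelete = existingPlaces.stream()
            .filter(place -> !csvDataMap.containsKey(generateUniqueKey(place.getName(), place.getAddress())))
            .collect(Collectors.toList());

        placeRepository.saveAll(placesToSave);
        placeRepository.deleteAll(placesToDelete);

        log.info("[Place] Data sync complete - inserted/updated: {}, deleted: {}", placesToSave.size(), placesToDelete.size());
    }

    private boolean isUpdated(String key, Place newPlace, Map<String, Place> existingPlacesMap) {
        Place existingPlace = existingPlacesMap.get(key);

        if (existingPlace == null) {
            return true; // no existing row, so it has to be saved
        }

        // Field-by-field comparison
        return !Objects.equals(existingPlace.getName(), newPlace.getName())
            || !Objects.equals(existingPlace.getPlace_type(), newPlace.getPlace_type())
            || !Objects.equals(existingPlace.getAddress(), newPlace.getAddress())
            || !Objects.equals(existingPlace.getTel(), newPlace.getTel())
            || !existingPlace.getCoordinate().equalsExact(newPlace.getCoordinate());
    }

    // Key = trimmed name + "_" + trimmed address, with every whitespace run replaced by "_"
    private String generateUniqueKey(String name, String address) {
        return (name.trim() + "_" + address.trim()).replaceAll("\\s+", "_");
    }

    private Point createPoint(double latitude, double longitude) {
        GeometryFactory geometryFactory = new GeometryFactory();
        // JTS coordinates are (x, y), i.e. (longitude, latitude)
        Point point = geometryFactory.createPoint(new Coordinate(longitude, latitude));
        point.setSRID(4326);
        return point;
    }
}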

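With this, the CSV files are stored in the DB and, as required, duplicate rows are updated and removed rows are deleted. Because of how this public data is published, the CSV files themselves are still replaced by hand, but the update now runs automatically at server startup, which keeps the data accurate and safe.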
